g0b's comments

g0b · on Jan 10, 2024

It's still early days for Vcc; I outline the caveats on the landing page. While I'm confident the control-flow bits and whatnot will work robustly, there's a big open question when it comes to the fate of standard libraries: the likes of libstdc++ were not designed for this use-case.

We'll be working hard on it all the way to Vulkanized. If you have some applications you can get up and running by then, feel free to get in touch.

I think the driver ecosystem for Vulkan is rather high-quality, but that's more my (biased!) opinion than something I have hard data on. The Mesa/NIR-based drivers in particular are very nice to work with!

cloudhan · on Jan 10, 2024

Those "existing libraries" do not necessarily mean stdc++, but rather parallel primitives, which are essential to performance portability. For example, cub for scan and reduction, cutlass for dense linear algebra[1].
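For readers unfamiliar with the terms, here is a minimal host-side sketch of what "scan" and "reduction" compute, using the serial C++17 standard algorithms `std::reduce` and `std::inclusive_scan`. This is not cub's API and says nothing about how a GPU library implements these; the helper names `reduce_sum` and `inclusive_prefix_sum` are made up for illustration.

```cpp
#include <numeric>
#include <vector>

// Reduction: combine all elements into one value (here, their sum).
int reduce_sum(const std::vector<int>& v) {
    return std::reduce(v.begin(), v.end(), 0);
}

// Inclusive scan: out[i] = v[0] + v[1] + ... + v[i].
std::vector<int> inclusive_prefix_sum(const std::vector<int>& v) {
    std::vector<int> out(v.size());
    std::inclusive_scan(v.begin(), v.end(), out.begin());
    return out;
}
```

A GPU primitive library provides the same operations, but parallelized and tuned per architecture, which is why they matter for performance portability.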

> I think the driver ecosystem for Vulkan is rather high-quality

Sorry, I meant OpenGL. At the time of evaluation, the market share of Vulkan on Android devices was too small, and it was ruled out at a very early stage. I'd assume the state has changed a lot since then.

It is really good to see more projects take a shot at compiling C++ to GPU natively.

[1] cutlass itself is not portable, but the recently added cute is quite portable, as far as I evaluated. It provides a unified abstraction for hierarchical layout decomposition along with copy and gemm primitives.

mathiasgredal · on Jan 10, 2024

Will C++17 parallel algorithms be supported?

https://on-demand.gputechconf.com/supercomputing/2019/pdf/sc...

Edit: Never mind, I think I have misunderstood the purpose of this project. I thought it was a CUDA competitor, but it seems like it is just a shading language compiler for graphics.

pjmlp · on Jan 10, 2024

SYCL/DPC++ are the only viable CUDA competitors I would say, assuming that the tooling gets feature parity.

cloudhan · on Jan 10, 2024

Circle lang is also well worth checking out.

g0b · on Jan 10, 2024

> Vector processing differences.. shader stage concepts.. primitive types like textures/samplers.. bindings.. push constants.. uniforms.. recursion limitations.. dynamic array limitations.. invocation groups.. control flow limitations.. extension hell.. -- these aren't going anywhere

The point of Vcc/Shady is to address some of these, in particular recursion is now possible in Vulkan, and control-flow is no longer limited. A lot of those are just historical language limitations and can be eliminated with enough effort.

The elephant in the room is SIMT and the subgroup/workgroup considerations, which don't really require any changes to syntax but do need the programmer to have a good mental model. But I don't think that would be incompatible with raising the bar on expressiveness or host/device code interoperability!

emidoots · on Jan 10, 2024

> recursion is now possible in Vulkan, and control-flow is no longer limited.

If you care about mobile devices (even modern Android phones), I suspect this is not true.

g0b · on Jan 10, 2024

As others have commented, Shady is an IR that happens to have a Clang front-end bolted on; that combination is what I call Vcc. I really don't expect people to start writing shaders in the IR directly, even though it has a textual form!

It's very comparable to Rust-GPU, I actually know a really nice person who does key work on that (you know who you are!) and they're actually facing very similar challenges, and we get to exchange ideas regularly. I am confident it's in good hands.

g0b · on May 24, 2022

I'm in fact talking about post-Volta hardware there, but this is not about forward progress. I meant that using __ballot_sync() and getting it wrong (i.e. waiting on the __activemask() from outside an if, but only in one branch of the if, meaning some of the threads will never participate in the sync) will deadlock the GPU.

It's a powerful (since _different_ locations statically can sync with each other), but also risky abstraction to expose, as compared to GLSL where it's impossible to deadlock anything by using subgroup intrinsics.
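The mask mismatch described above can be sketched with a toy host-side model. This is plain C++, not CUDA, and it only models one rule: a masked sync completes when every lane named in its mask reaches it. The 32-lane width, the even/odd branch, and all names here (`LaneMask`, `lanes_where`, `sync_would_hang`, `simulate_branch`) are assumptions made up for illustration.

```cpp
#include <cstdint>

// Toy model of a 32-lane warp: bit i set means lane i is included.
using LaneMask = std::uint32_t;

// Lanes (among `active`) for which the predicate holds,
// similar in spirit to a ballot across the warp.
template <typename Pred>
LaneMask lanes_where(LaneMask active, Pred pred) {
    LaneMask result = 0;
    for (unsigned lane = 0; lane < 32; ++lane) {
        if (((active >> lane) & 1u) && pred(lane)) {
            result |= (LaneMask{1} << lane);
        }
    }
    return result;
}

// The sync waits for every lane in `expected`. If some expected lane
// never arrives, the arriving lanes wait forever.
bool sync_would_hang(LaneMask expected, LaneMask arriving) {
    return (expected & ~arriving) != 0;
}

struct BranchReport {
    bool stale_mask_hangs;  // mask sampled before the `if`
    bool fresh_mask_hangs;  // mask matching the lanes inside the branch
};

BranchReport simulate_branch() {
    const LaneMask before_if = 0xFFFFFFFFu;  // all lanes active outside the `if`
    // Only even lanes take the branch that contains the sync.
    const LaneMask in_branch =
        lanes_where(before_if, [](unsigned lane) { return lane % 2 == 0; });
    return {sync_would_hang(before_if, in_branch),
            sync_would_hang(in_branch, in_branch)};
}
```

In this model the stale mask names the odd lanes, which never enter the branch, so the sync can never complete. A mask that matches the lanes actually inside the branch does not have that problem.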

my123 · on May 24, 2022

That's indeed a quite raw abstraction, but is way too powerful performance-wise to not expose...

g0b · on May 24, 2022

Perhaps it makes sense for CUDA to expose it, but it certainly can't make sense for SPIR-V, which has to work for a variety of hardware, most of which doesn't do ITS (independent thread scheduling).

g0b · on May 24, 2022

This grew out of a much rantier (and worse) version of this article, which talked about the hurdles I faced when considering SPIR-V codegen for our research compiler (Thorin).

For a while I wanted to rewrite it, and ultimately to properly discuss what I wanted to discuss, I had to write something introductory in a much broader sense. So it sort of organically grew from there, and that's why the title is now a bit weird, but I plan for the rest of the series to continue with SPIR-V as some sort of central reference point.

It's fair to say DXIL suffers from this class of troubles too. There's a series by themaister where he describes the awfulness of the story over there, but I don't work with DX and I wanted to make as few comments on that as possible to avoid saying something wrong and unverified.

https://themaister.net/blog/2022/04/24/my-personal-hell-of-t...