What does GPU assembly look like?

Computer Graphics Asked by sebastien finor on August 27, 2021

I have played around with CPU assembly programming using assemblers like NASM, TASM, or MASM, but now I'm really curious to know how GPUs work.
However, I'm quite confused when I search online. I've heard about CUDA and OpenCL, but they're not what I'm looking for.
I'd like to know what GPU instructions look like in RAM... What are the NASM and MASM equivalents for most GPUs? What is the x86 or Z80 of GPUs (what are the different GPU families)? Do you know of a manufacturer's opcode reference manual?
I think I really need a comparison between the two processing units to make it clear, because GPU assembly programming seems to be an even harder subject to learn about online than CPU assembly programming. I've also read that "NVIDIA never released details on the instructions actually understood by their hardware", which seems pretty surprising to me. Full post here: https://stackoverflow.com/questions/4660974/how-to-create-or-manipulate-gpu-assembler?newreg=e31519279ce949f087df6322dbf2bf4d

Thanks for your help!

4 Answers

You're tilting at windmills trying to learn "GPU assembly", and that's because of differences in how CPUs and GPUs are made and sold.

Each CPU has what's called an instruction set architecture, for example x86 or ARMv8. The instruction set is the interface between the user of the CPU (i.e. the programmer) and the chip. The chip designer publishes the details of the instruction set so that compiler vendors can write compilers to target that instruction set. Any CPU that uses that instruction set can run the same binaries. (Extensions like SSE make that slightly untrue, but they only add new instructions: they don't change the structure of the binary.) When the vendor creates a new processor family, it may have a completely different micro-architecture internally, but it keeps the same instruction set.

GPUs are not like this at all. For best efficiency, the instruction set is usually closely tied to the micro-architecture of the GPU. That means each new family of GPUs has a new instruction set. GPUs can't run binaries made for different GPUs. For this reason, the instruction set usually isn't published: instead, your interface to the GPU is the driver published by the vendor for each graphics API (OpenGL, Vulkan, DirectX, &c.). This is why the graphics APIs have functions to take the source code of a shader and run it: the compiled shader only runs on the same model or family of GPU it was compiled for.
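For illustration, here is a minimal sketch of that runtime compilation through OpenGL's C API (the GLSL source is a trivial placeholder, and an OpenGL context with GL 2.0+ function pointers loaded, e.g. via GLAD or GLEW, is assumed):

    #include <GL/gl.h>  /* assumes a context exists and entry points are loaded */

    /* Placeholder GLSL fragment shader, shipped to the user as source text. */
    static const char *src =
        "#version 330 core\n"
        "out vec4 color;\n"
        "void main() { color = vec4(1.0, 0.0, 0.0, 1.0); }\n";

    GLuint compile_fragment_shader(void)
    {
        /* The driver, not the application, translates this source into
           the native instruction set of whatever GPU is installed. */
        GLuint shader = glCreateShader(GL_FRAGMENT_SHADER);
        glShaderSource(shader, 1, &src, NULL);
        glCompileShader(shader);

        GLint ok = GL_FALSE;
        glGetShaderiv(shader, GL_COMPILE_STATUS, &ok);
        return ok ? shader : 0;  /* glGetShaderInfoLog has the error text */
    }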

The closest you get to GPU assembly language is SPIR-V. This is an industry-standard intermediate representation, which GPU vendors are starting to support. It's like a GPU equivalent of LLVM's intermediate representation. It allows you to do the parsing and source optimization parts of compilation up-front to get a SPIR-V file, and then the GPU driver only needs to compile that to the GPU's instruction set at load time.
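As a sketch of the load-time half of that split, this is roughly how a precompiled SPIR-V blob is handed to a Vulkan driver for final translation to the GPU's native instruction set (error handling omitted; an already-created device and the loaded contents of a .spv file are assumed):

    #include <vulkan/vulkan.h>

    /* 'device' is an existing VkDevice; 'spirv_words' holds the contents
       of a .spv file compiled offline (e.g. from GLSL source). */
    VkShaderModule load_spirv(VkDevice device,
                              const uint32_t *spirv_words, size_t spirv_size)
    {
        VkShaderModuleCreateInfo info = {
            .sType    = VK_STRUCTURE_TYPE_SHADER_MODULE_CREATE_INFO,
            .codeSize = spirv_size,   /* size in bytes */
            .pCode    = spirv_words,  /* SPIR-V is a stream of 32-bit words */
        };

        VkShaderModule module = VK_NULL_HANDLE;
        /* Here (or at pipeline creation) the driver compiles the portable
           SPIR-V down to the GPU's private instruction set. */
        vkCreateShaderModule(device, &info, NULL, &module);
        return module;
    }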

Correct answer by Dan Hulme on August 27, 2021

For Nvidia GPUs there are several machine languages, all of which are documented here: https://docs.nvidia.com/cuda/cuda-binary-utilities/index.html#instruction-set-ref

Above the "Instruction Set Reference" section there are also some examples of SM70 SASS code, which is stored in ELF and dumped from there. It is real machine code: ELF has historically been a container for native, unmanaged code (unlike PE, which can carry managed CIL bytecode, though even then it must sit in a data section; ELF has no such feature at all). It is not managed or virtual bytecode, as some users claim, but genuinely native compiled device code for the GPU. The actual machine instructions are the ones written in capital letters with addresses in hexadecimal form (like /*0000*/ MOV R1, c[0x0][0x28] ;); the rest is PTX and isn't really relevant to the question.

As a drawback, the fatbin technique may be required if a distributor chooses to publish a GPU program in precompiled form rather than as source code or at least portable PTX. But that is very rare, as there are far fewer such programs around than there are binaries for CPUs. Such code is usually contained in cubin files, which are dynamically loaded by calling cuModuleLoad; you then retrieve a handle to the device function with cuModuleGetFunction and call cuLaunchKernel, which makes the driver send the binary code to the GPU and start execution from address 0, as all kernels do. (The ELF variant used by Nvidia contains many sections, but the driver only sends the desired kernel code along with its subroutines and data, such as switch tables.)

As for the instructions themselves, they are mostly RISC-fashioned. There is no div or mod; instead, the nvcc compiler must emit a sophisticated algorithm built from addition, subtraction, multiplication, shifts, and bitwise logical operations (mostly bitwise AND). Only the addressing modes hint at some CISC influence, as in "LD R2, [R2+0x1c];".
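A minimal sketch of that loading sequence through the CUDA driver API ("kernel.cubin" and "my_kernel" are hypothetical names, and error checking is omitted); the SASS inside such a file can be inspected with cuobjdump -sass kernel.cubin:

    #include <cuda.h>

    /* Loads a precompiled cubin and launches one kernel from it. */
    void launch_from_cubin(void)
    {
        cuInit(0);

        CUdevice dev;
        cuDeviceGet(&dev, 0);

        CUcontext ctx;
        cuCtxCreate(&ctx, 0, dev);

        /* Load the ELF container holding native SASS for this GPU family. */
        CUmodule mod;
        cuModuleLoad(&mod, "kernel.cubin");

        /* Resolve the device function handle by name... */
        CUfunction fn;
        cuModuleGetFunction(&fn, mod, "my_kernel");

        /* ...and make the driver upload the kernel's code and run it:
           1 block of 32 threads, no shared memory, no arguments. */
        cuLaunchKernel(fn, 1, 1, 1, 32, 1, 1, 0, NULL, NULL, NULL);
        cuCtxSynchronize();

        cuModuleUnload(mod);
        cuCtxDestroy(ctx);
    }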

Answered by Danil on August 27, 2021

As others have said, GPUs expose a higher-level language, which allows multiple different architectures (i.e. different vendors and different GPU generations) to all support the same applications.

[Update: Link to older ISA replaced with newer one]
If, however, you're still curious, documentation for the recent PowerVR instruction set is available - my favourite opcodes being "FRED" and "FARCTANC" :-)

Note that instruction sets can change from generation to generation, so this is mainly for illustrative purposes.

Answered by Simon F on August 27, 2021

What you've read is correct. In general, GPU vendors do not release lists of machine instructions for their GPUs.

That said, you can do something similar to assembly programming on the GPU by using the OpenGL ARB assembly language. It allows you to program in assembly style, writing one opcode and its operands per line, as in the sketch below. While this is not quite the same as writing assembly language for the CPU, it's as close as you're likely to get. I believe it's deprecated in modern OpenGL (though I'm not positive). It still worked as of OpenGL 2.1.
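Here is a minimal sketch of such a program and how it would be loaded through the old ARB_fragment_program extension (the tint constant is just an illustrative value, and the extension's entry points are assumed to be loaded):

    #include <string.h>
    #include <GL/gl.h>
    #include <GL/glext.h>  /* assumes ARB_fragment_program entry points are loaded */

    /* One opcode and its operands per line: tint the incoming
       fragment colour by an illustrative constant. */
    static const char *prog =
        "!!ARBfp1.0\n"
        "PARAM tint = { 1.0, 0.5, 0.5, 1.0 };\n"
        "MUL result.color, fragment.color, tint;\n"
        "END\n";

    void load_arb_program(void)
    {
        GLuint id;
        glGenProgramsARB(1, &id);
        glBindProgramARB(GL_FRAGMENT_PROGRAM_ARB, id);

        /* Even here, the driver compiles this text to the GPU's real,
           unpublished instruction set; it is assembly-style source,
           not native machine code. */
        glProgramStringARB(GL_FRAGMENT_PROGRAM_ARB, GL_PROGRAM_FORMAT_ASCII_ARB,
                           (GLsizei)strlen(prog), prog);
        glEnable(GL_FRAGMENT_PROGRAM_ARB);
    }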

Answered by user1118321 on August 27, 2021
