TransWikia.com

Reading from buffer versus calculating on the fly performance

Computer Graphics Asked by wduk on August 27, 2021

I am creating a fast fourier transform algorithm for the compute shader – i am no expert on how GPUs really run optimally so thought i would ask here.

I have the option to calculate on the fly or precompute a lot of the trig function (cos and sin) values and store it in a read only float buffer.

My question is, though i am aware cos() and sin() is fast, how do they compare to simply getting the value precomputed in a buffer instead?

I don’t mean a LUT here i do not need to interpolate between values from the buffer so i just get them values directly. I don’t know how to really bench test shaders properly so i am unsure which is faster.

Perhaps some one knows more info on this?

One Answer

My question is, though i am aware cos() and sin() is fast, how do they compare to simply getting the value precomputed in a buffer instead?

This will depend on your shader code and GPU model.

  • GPUs utilize latency hiding by executing independent ALU instructions while waiting for data to arrive from memory.
  • Accessing memory is usually the slowest operation on a GPU.

If your shader code allows for latency hiding, it may be faster.

The best option here is to profile the shader performance.

I don't know how to really bench test shaders properly so i am unsure which is faster.

I think the easiest way would be to download RenderDoc and get a frame capture of your application.

After that, you click on the timing icon (shown red) to get some measurements.

enter image description here

Although these are not super accurate measurements, they're good enough and sort of representative of the application performance.

For precise measurements, you'll need your GPU vendor's graphics profiling tools.


Update: Below are some captures from RGP, showing GPU assembly instructions and instruction timings of various instruction types. You can see latency hiding in action below:

  • vmem : texture read
  • smem : constant buffer / descriptor<texture/buffer> read
  • ALU : math ops (arithmetic / logic)
  • sync : waits for vmem/smem instructions to finish loading data from memory

enter image description here

Another capture showing instruction timings while first two texture loads not having latency hiding.

enter image description here

Correct answer by Varaquilex on August 27, 2021

Add your own answers!

Ask a Question

Get help from others!

© 2024 TransWikia.com. All rights reserved. Sites we Love: PCI Database, UKBizDB, Menu Kuliner, Sharing RPP