Anyway, as I mentioned recently, I have a new workstation that finally allows me to test our code using all three backends (#CUDA, #ROCm/#HIP and #CPU w/ #OpenMP) thanks to having an #AMD #Ryzen processor with an integrated #GPU in addition to a discrete #NVIDIA GPU.
Of course the iGPU is massively underpowered compared to the high-end dGPU workhorse, but I would expect it to outperform the CPU on most workloads.
And this is where things get interesting.
So, one of the reasons why we could implement the #HIP backend easily in #GPUSPH is that #AMD provides drop-in #ROCm replacements for much of the #NVIDIA #CUDA libraries, including #rocThrust, which (as I mentioned in the other thread) is a fork of #Thrust with a #HIP/#ROCm backend.
This is good as it reduces porting effort, but it also means you have to trust the quality of the provided implementation.
I've run into two issues with it so far: one is a build failure against my GPU (already reported, with a fix ready and pending release), and the other is … slow performance in one of the #Thrust API calls that we use!
Turns out, sort_by_key, at least in the way we use it, is somewhere between 25% and 50% slower on my #AMD iGPU when using the latest #rocThrust (from the 5.6.0 software stack) than it is on the CPU when using the latest #Thrust with the OpenMP backend!
The other one was #Thrust producing completely bogus results: https://github.com/NVIDIA/thrust/issues/1341
Interestingly, in both cases the issue wasn't with Thrust proper, but with complex interactions between our usage of the Thrust API, the compiler and/or the driver and the hardware.
Now, I don't know where the performance issues I'm seeing in #rocThrust are coming from, but I'm sure they'll get fixed soon.