lisyarus,
@lisyarus@mastodon.gamedev.place

So I was accumulating Monte Carlo samples using good old blending here. Now I've switched to a compute shader that explicitly blends into a storage texture, and it is ~10x slower. Interesting.

lisyarus,
@lisyarus@mastodon.gamedev.place

So I messed with it a little and it seems that the storage texture itself isn't the problem (writing random values to it runs at ~200 fps). The core loop over the scene geometry seems to be what's slower with compute. Interestingly, max ray depth doesn't affect the performance, as if it's prefetching all the data and then looping over it from cache.

#gpu and #graphics folks, does anybody have a clue what's going on? This is wgpu-native, btw.

lisyarus,
@lisyarus@mastodon.gamedev.place

@demofox @BartWronski
(sorry for the ping)

demofox,
@demofox@mastodon.gamedev.place

@lisyarus do I have this right?
You were doing a full screen triangle and doing work in a pixel shader, using the alpha channel of your written pixel color to blend the sample into the final target.
Now you are doing the same work in a compute shader, and manually doing the blending (averaging) yourself, but now with an extra texture?
(But you ruled out the extra texture and manual blending as being the source of the perf delta)

lisyarus,
@lisyarus@mastodon.gamedev.place

@demofox Almost! There's no extra texture in the second case, just a read - blend - write sequence in the compute shader with the same texture I was previously blending to.
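
For readers following along, here is a minimal sketch of what a read - blend - write accumulation step computes. It is plain Python arithmetic, not shader code, and the names `blend` and `sample_index` as well as the blend weight of 1/N are assumptions for illustration; the thread does not say which weight is actually used.

```python
import random

def blend(accumulated, sample, sample_index):
    """Fold one new sample into a running average.

    sample_index is 1 for the first sample. The weight 1/sample_index is
    an assumption for illustration, not a value taken from the thread.
    """
    alpha = 1.0 / sample_index
    # Same formula as standard alpha blending: dst * (1 - a) + src * a
    return accumulated * (1.0 - alpha) + sample * alpha

random.seed(0)
samples = [random.random() for _ in range(1000)]

# "Read - blend - write" for a single pixel, once per sample
pixel = 0.0
for i, s in enumerate(samples, start=1):
    pixel = blend(pixel, s, i)

# The running blend agrees with the plain mean up to rounding error
mean = sum(samples) / len(samples)
print(abs(pixel - mean) < 1e-9)
```

With that weight, blending sample N into the stored value gives the same result as averaging all N samples, which is why fixed-function alpha blending and a manual blend in a compute shader can produce the same image.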

BartWronski,
@BartWronski@mastodon.gamedev.place

@lisyarus @demofox I expect read-blend-write to be slower, but by maybe 10%, not 1000% 😅
Launch a profiler and see what's going on. Maybe you're launching 10x too much work?

demofox,
@demofox@mastodon.gamedev.place

@BartWronski @lisyarus good point. I was helping someone recently who was doing that. They were dispatching without dividing the dispatch size by the thread counts. "Why is this so slow?!" It was doing 64x the work needed.
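
To make the arithmetic behind that mistake concrete, here is a small sketch in plain Python (no GPU API involved). The image size, the 8x8 workgroup size, and the helper name `workgroup_count` are hypothetical; 8x8 is chosen only because it reproduces the 64x figure mentioned above.

```python
def workgroup_count(size, workgroup_size):
    """Number of workgroups needed to cover `size` threads (rounds up)."""
    return (size + workgroup_size - 1) // workgroup_size

# Hypothetical numbers, not taken from the thread
width, height = 1920, 1080
wg_x, wg_y = 8, 8  # workgroup size declared in the shader

pixels = width * height

# Bug: passing the image size itself as the workgroup count,
# so every workgroup adds wg_x * wg_y threads on top of that
wrong_threads = (width * wg_x) * (height * wg_y)

# Fix: divide by the workgroup size, rounding up
groups_x = workgroup_count(width, wg_x)
groups_y = workgroup_count(height, wg_y)
right_threads = (groups_x * wg_x) * (groups_y * wg_y)

print(wrong_threads // pixels)    # 64
print(groups_x, groups_y)         # 240 135
print(right_threads == pixels)    # True
```

When the image size is not a multiple of the workgroup size, rounding up launches a few extra threads along the edges, so the shader would typically need a bounds check before reading or writing the texture.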

lisyarus,
@lisyarus@mastodon.gamedev.place

@demofox @BartWronski Jesus f****** Christ, that's exactly what I was doing. Thank you so much! :) It even runs 15% faster now!

demofox,
@demofox@mastodon.gamedev.place

@lisyarus I would expect that it could be slower, because AFAIK there is dedicated hardware for blending on modern cards that would be bypassed by doing it yourself. But you ruled that out by writing values without your scene code, right? Hrm...
I'm not convinced this is it, but maybe the scene code plus custom blending takes more registers (vs. the built-in transparency path) and decreases occupancy (parallelism)?

lisyarus,
@lisyarus@mastodon.gamedev.place

Never mind, I was just incredibly stupid: https://mastodon.gamedev.place/@demofox/112350078834348618

Craigp,
@Craigp@mastodon.social

@lisyarus Something about the texture management on the backend?
