
nh

@nh@mastodon.gamedev.place

Math, compilers, GPUs, horrible puns, off-by-one errors, and Oxford commas.

This profile is from a federated server and may be incomplete. Browse more on the original instance.

dneto, to random
@dneto@mastodon.gamedev.place

Love to see it. Testify!

"Implicit Warp-Synchronous Programming is Unsafe"

Well, I'd personally say "ill-defined", but you be you.

https://developer.nvidia.com/blog/using-cuda-warp-level-primitives/#implicit_warp-synchronous_programming_is_unsafe

nh, (edited)

@dneto That's their design choice. It's not inherent to the problem space, and the explicit approach has its pain points as well. Raw masks are inherently error-prone to use.

I think they recognized that and that's why they added some higher level stuff on top, but I still don't find that solution very convincing.

Basically, their approach only makes sense if you think independent forward progress is important at a thread level.

nh,

@dneto And that's obviously a cool feature, but...

Now the hot thing is ML, where even divergent control flow is such a rarity that people invest in new languages like Triton that are the opposite of SIMT, in the sense that a single thread of control drives an entire workgroup.

nh, to random

I wonder if compilers could meaningfully benefit from smart cache blocking.

LLVM has function pass managers. The idea is that we keep a single function in cache while running many passes on it, then do the same for the next function, and so on.

This makes sense because compilers tend to be pointer chase nightmares. You want that sweet L1 cache hit latency.

But compilers also tend to be branch nightmares. What if you have many, many tiny functions?

nh,

You could bundle a small number of functions and run each pass on all of them before going to the next pass etc.

Once the pass sequence is done on one bundle of functions, grab the next bundle.

This is better in theory for I$ and branch prediction if you have a large compiler that runs many passes.

Is there a meaningful regime of function size where it'd be worth doing this?
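
To make the two loop orders concrete, here is a minimal C++ sketch. `Function`, `Pass`, `run_function_major`, and `run_bundled` are hypothetical stand-ins invented for illustration; they are not LLVM's pass manager API, and the sketch says nothing about where the break-even bundle size lies.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Hypothetical stand-ins for illustration; these are not LLVM types.
struct Function {
    std::string name;
    std::vector<std::string> log;  // names of the passes that ran, in order
};
using Pass = std::function<void(Function&)>;

// Function-major order: run every pass on one function, then move on to the
// next function. This keeps the function's IR hot in the data cache.
void run_function_major(std::vector<Function>& fns,
                        const std::vector<Pass>& passes) {
    for (Function& f : fns)
        for (const Pass& p : passes)
            p(f);
}

// Bundled order: take `bundle_size` functions, run each pass across the whole
// bundle, then grab the next bundle. Each pass's code runs several times in a
// row, which is the part that may help the I$ and branch predictors.
void run_bundled(std::vector<Function>& fns, const std::vector<Pass>& passes,
                 std::size_t bundle_size) {
    assert(bundle_size > 0);
    for (std::size_t begin = 0; begin < fns.size(); begin += bundle_size) {
        const std::size_t end = std::min(begin + bundle_size, fns.size());
        for (const Pass& p : passes)
            for (std::size_t i = begin; i < end; ++i)
                p(fns[i]);
    }
}
```

With a bundle size of 1 the bundled order degenerates to the function-major order, so the bundle size is the single knob to tune against function size.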

dneto, to llvm

So the thing about LLVM IR is, in reality it's a family of accidental and informally defined dialects. Every LLVM-based compiler for a particular machine target refines the input module down to its own idiosyncratic dialect.

Consequently there's way more latitude for confusion and bugs than you'd initially guess.

Try using an i65 type in an x86 backend. When I did that years ago it sailed right through and generated nonsense code.

#llvm #compilers

nh,

@dneto Totally. Which is why I would like to formalize this dialect-ness by borrowing some ideas from MLIR.

And why stuff like ClangIR makes me a little sad. Great idea, but build it natively on LLVM instead!

nh, to random

Atomic loads and stores are sufficient for multi-threaded programming without tears, but they can't guarantee multi-threaded programming without tears.
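
A small C++ sketch of the serious half of that joke: individual atomic loads and stores never produce torn values, yet a read-modify-write built from them can still lose updates. The interleaving of two logical threads A and B is written out by hand in a single thread so the outcome is deterministic; the function names are made up for this example.

```cpp
#include <atomic>

// Two logical threads, A and B, each "increment" the counter with a separate
// atomic load and atomic store. No value is ever torn, but one update is lost.
int lost_update_with_load_store() {
    std::atomic<int> counter{0};
    int a = counter.load();  // A reads 0
    int b = counter.load();  // B reads 0, before A has written back
    counter.store(a + 1);    // A writes 1
    counter.store(b + 1);    // B writes 1, overwriting A's update
    return counter.load();
}

// The same two increments as atomic read-modify-write operations.
int both_updates_with_fetch_add() {
    std::atomic<int> counter{0};
    counter.fetch_add(1);  // A
    counter.fetch_add(1);  // B
    return counter.load();
}
```

The first function returns 1 and the second returns 2, even though every individual access in both is atomic.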

nh, to random

Regardless of how the Scarlett Johansson / OpenAI thing ultimately turns out, it's a great reminder that the more pressing alignment problem isn't alignment of machine learning systems, it's alignment of the alien intelligences also known as "corporations".

gfxstrand, to random
@gfxstrand@mastodon.gamedev.place

This week's project: Reworking NVK cbuf support. We've had a lot of issues with too much internal stalling and I think a lot of them come down to the fact that we're re-binding cbufs every draw call.

My plan for root constants is to do inline updates with the LOAD_CONSTANT_BUFFER command. I don't know how much of a difference there is but I strongly suspect this pipelines much better.

For bound cbufs, I'm planning to just make our dirty tracking way more competent.

We'll see how it goes!
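
For readers unfamiliar with the term, dirty tracking in general looks roughly like the following C++ sketch: remember what is bound per slot and re-emit only the slots that actually changed since the last draw. Everything here (`CbufTracker`, `CbufBinding`, `kNumSlots`, the slot count of 8) is a hypothetical illustration and not NVK's actual code or data layout.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Assumption for this sketch only; not a statement about real hardware.
constexpr std::size_t kNumSlots = 8;

struct CbufBinding {
    std::uint64_t addr;
    std::uint32_t size;
    bool operator==(const CbufBinding& o) const {
        return addr == o.addr && size == o.size;
    }
};

class CbufTracker {
public:
    // Record a binding; mark the slot dirty only if it really changed.
    void bind(std::size_t slot, CbufBinding b) {
        assert(slot < kNumSlots);
        if (bound_[slot] == b) return;  // unchanged: nothing to re-emit
        bound_[slot] = b;
        dirty_ |= 1u << slot;
    }

    // Called at draw time: emit only dirty slots, then clear the dirty mask.
    // Returns how many slots had to be re-emitted.
    template <typename Emit>
    unsigned flush(Emit&& emit) {
        unsigned emitted = 0;
        for (std::size_t slot = 0; slot < kNumSlots; ++slot) {
            if (dirty_ & (1u << slot)) {
                emit(slot, bound_[slot]);
                ++emitted;
            }
        }
        dirty_ = 0;
        return emitted;
    }

private:
    std::array<CbufBinding, kNumSlots> bound_{};  // all zero: nothing bound
    std::uint32_t dirty_ = 0;
};
```

A draw that rebinds the same buffers as the previous draw then flushes zero slots instead of all of them.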

nh,

@castano @gfxstrand @Biovf Oh neat. How did that work? Location metadata of the image?

hal9000bma, to random German
@hal9000bma@lustigetiernamenbubble.de

Could there be a bigger "fuck you" directed at Bavaria?

nh,

@hal9000bma Weißwurst in the evening is by far the worse offense 😜

nh, to random

Some thoughts about Git history, and a kind of history that is missing from commit metadata today
http://nhaehnle.blogspot.com/2024/05/a-new-kind-of-git-history.html

dneto, to random

SSD stands for "solid state drive".

That makes no sense. What is being driven?
I know it's from hard drive, or floppy drive, where a disk was being spun.

nh,

@dneto Not the case for SSDs, but could you have a solid state engine propelling a vehicle? Hmm, maglev trains...

nh, to random

You may be nerdy, but you'll never be "build an entire fashion brand around the concept of taking the root of the previous result on your calculator" nerdy.

nh, to random

Anarchist communities are unpresidented.

dneto, to random

Yes, friends, today I wrote an email beginning with "Yikes!"

nh,

@dneto Yikes!

aras, to random
@aras@mastodon.gamedev.place

You know that Sia/Guetta song that goes like,

“Rotate me down, no gimbal lock
I am quaternion”

Right?

nh,

@aras I don't actually, and I'm curious.

nh, to random

Lots of propaganda for elliptic curves, yet RSA signatures have built-in Two-Factor Authentication.

nh, to random

A pity that KUKA was bought by Chinese investors. Had they been Indian, there might now be a KUKA-Raja.

NOTimothyLottes, to random
@NOTimothyLottes@mastodon.gamedev.place

Efficient Parallel GPU Allocation and Free (Wave Granularity)

Alloc:
(1.) Have bit array, one bit per item
(2.) A Wave64 GPU can grab two 128-byte cachelines at a time for a chunk of 2048 bits; loop until you get something non-zero (ballot), and have waves start at spaced-out intervals
(3.) Grab the DWORD from the first non-zero lane, findFirstOne/etc to find a bit
(4.) AtomicAnd to clear the bit in memory, check the atomic return to know the bit wasn't already cleared

Free:
(5.) AtomicOr the bit
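
The steps above can be sketched on the CPU with `std::atomic`. This is a simplified illustration under stated assumptions, not the GPU implementation: it scans one 32-bit word at a time instead of loading 64 words per wave and using a ballot, a set bit means "item is free", and `BitAllocator`, `kNoItem`, and the `start_word` parameter are names made up for this example.

```cpp
#include <atomic>
#include <cstddef>
#include <cstdint>
#include <vector>

// Sentinel returned when no free item was found (hypothetical name).
constexpr std::size_t kNoItem = static_cast<std::size_t>(-1);

class BitAllocator {
public:
    // One bit per item; a set bit means the item is free.
    explicit BitAllocator(std::size_t num_words) : words_(num_words) {
        for (auto& w : words_) w.store(~0u);
    }

    // Steps (2.) to (4.): find a non-zero word, pick a set bit, clear it
    // atomically, and use the returned old value to check who won the bit.
    std::size_t alloc(std::size_t start_word = 0) {
        for (std::size_t n = 0; n < words_.size(); ++n) {
            const std::size_t w = (start_word + n) % words_.size();
            std::uint32_t bits = words_[w].load();
            while (bits != 0) {
                unsigned bit = 0;
                while (((bits >> bit) & 1u) == 0) ++bit;  // find first set bit
                const std::uint32_t mask = 1u << bit;
                const std::uint32_t old = words_[w].fetch_and(~mask);
                if (old & mask) return w * 32 + bit;  // we cleared it: ours
                bits = old & ~mask;  // lost the race; retry with a fresher view
            }
        }
        return kNoItem;
    }

    // Step (5.): set the bit again to mark the item free.
    void free_item(std::size_t index) {
        words_[index / 32].fetch_or(1u << (index % 32));
    }

private:
    std::vector<std::atomic<std::uint32_t>> words_;
};
```

The check on the value returned by `fetch_and` is what makes concurrent allocation safe: two callers may pick the same bit, but only one of them observes it as still set.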

nh,

@NOTimothyLottes I've been wondering about this for a while. Point 3 means you can allocate ~once per L2 latency. Not good for contention.

There's probably some variation of the scheme in which waves pick a pseudo-random bit to decrease the chance of collision and get higher allocation throughput.

nh,

@NOTimothyLottes Could be as simple as logically rotating the words/bits based on a wave ID.
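
One way to read "logically rotating the bits based on a wave ID", sketched in C++ for a single 32-bit word: rotate the word so that each wave starts its search at a different bit position, then map the result back to the original bit index. `rotr32` and `pick_bit_rotated` are hypothetical helpers written for this example, and using `wave_id % 32` as the rotation amount is an assumption; the same idea could be applied to the word a wave starts at.

```cpp
#include <cstdint>

// Rotate a 32-bit value right by r (taken modulo 32).
inline std::uint32_t rotr32(std::uint32_t x, unsigned r) {
    r &= 31u;
    return r == 0 ? x : (x >> r) | (x << (32u - r));
}

// Returns the index of a set bit in `word`, searching upward from bit
// (wave_id % 32) and wrapping around. Returns -1 if no bit is set.
inline int pick_bit_rotated(std::uint32_t word, unsigned wave_id) {
    if (word == 0) return -1;
    const unsigned r = wave_id & 31u;
    const std::uint32_t rotated = rotr32(word, r);
    unsigned k = 0;
    while (((rotated >> k) & 1u) == 0) ++k;  // first set bit after rotation
    return static_cast<int>((k + r) & 31u);  // map back to the original index
}
```

With a fully free word, waves 0, 1, 2, ... pick bits 0, 1, 2, ... instead of all contending for bit 0.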

gfxstrand, to random

Do I have any USB folks following me? I've got a weird issue with my Alienware R14 laptop keyboard that I think might call for a new quirk.

Whenever the laptop fails to resume (it's Linux, this happens), I hard reset the laptop and then my keyboard fails to enumerate. I've tried every quirk in the list and nothing works.

Ya'know what does work? Suspending the laptop and coming out of suspend again. Seems to bring the keyboard back every time.

nh,

@gfxstrand I have a laptop whose keyboard very, very rarely gets into a state in which the keyboard is confused about which key is which. This happens at such a low level that the power button doesn't work anymore. The only known workaround I have is to let the battery run out. (At least ssh tends to still work in that case, so I can kick off something remotely that burns power...)

No, I don't think that helps you. I'm simply commiserating.

nh, to random

Building a ROCm/HIP environment from scratch (at least the user space parts)
http://nhaehnle.blogspot.com/2024/02/building-hip-environment-from-scratch.html

nh, to random

Every good dystopia is somebody's utopia.

jeffowski, to random
@jeffowski@mastodon.world
nh,

@Doomed_Daniel @jeffowski @korenchkin Yeah, there are important ways in which this is true, e.g. wealth and power scale, but also important ways in which it can be considered tone-deaf

nh,

@Doomed_Daniel @jeffowski @korenchkin The tricky part is: the former matters more for voting, but the latter matters massively, and perhaps more, for feelings

At least IMHO

nh, to random

Talking about dragons, it only takes a child to raze a village.
