nh

@nh@mastodon.gamedev.place

Math, compilers, GPUs, horrible puns, off-by-one errors, and Oxford commas.

dneto, 5 days ago to random

Love to see it. Testify!

"Implicit Warp-Synchronous Programming is Unsafe"

Well, I'd personally say "ill-defined", but you be you.

https://developer.nvidia.com/blog/using-cuda-warp-level-primitives/#implicit_warp-synchronous_programming_is_unsafe

nh, 5 days ago (edited 5 days ago)

@dneto That's their design choice. It's not inherent to the problem space, and the explicit approach has its pain points as well. Raw masks are inherently error-prone to use.

I think they recognized that and that's why they added some higher level stuff on top, but I still don't find that solution very convincing.

Basically, their approach only makes sense if you think independent forward progress is important at a thread level.

nh, 5 days ago

@dneto And that's obviously a cool feature, but...

Now the hot thing is ML, where even divergent control flow is such a rarity that people invest in new languages like Triton, which are the opposite of SIMT in the sense that a single thread of control drives an entire workgroup.

nh, 10 days ago to random

I wonder if compilers could meaningfully benefit from smart cache blocking.

LLVM has function pass managers. The idea being that we keep a single function in cache while running many passes on it, before then doing the same on the next function etc.

This makes sense because compilers tend to be pointer chase nightmares. You want that sweet L1 cache hit latency.

But compilers also tend to be branch nightmares. What if you have many, many tiny functions?

nh, 10 days ago

You could bundle a small number of functions and run each pass on all of them before going to the next pass etc.

Once the pass sequence is done on one bundle of functions, grab the next bundle.

This is better in theory for I$ and branch prediction if you have a large compiler that runs many passes.

Is there a meaningful regime of function size where it'd be worth doing this?
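
A minimal sketch of the two loop orders being compared, in Python for brevity. The names (`run_per_function`, `run_bundled`, `bundle_size`, `pass_switches`) and the toy passes are hypothetical and are not LLVM's pass manager API. It also assumes every pass is strictly function-local, otherwise the reordering would not be valid.

```python
from typing import Callable, List, Tuple

Function = List[str]                 # toy stand-in for one function's IR
Pass = Callable[[Function], None]    # a pass mutates one function in place
Trace = List[Tuple[int, int]]        # (pass index, function index) per step


def run_per_function(functions: List[Function], passes: List[Pass], trace: Trace) -> None:
    """Function-pass-manager order: all passes on one function, then the next."""
    for fi, fn in enumerate(functions):
        for pi, p in enumerate(passes):
            p(fn)
            trace.append((pi, fi))


def run_bundled(functions: List[Function], passes: List[Pass],
                bundle_size: int, trace: Trace) -> None:
    """Bundled order: each pass runs over a small bundle before the next pass."""
    for start in range(0, len(functions), bundle_size):
        bundle = range(start, min(start + bundle_size, len(functions)))
        for pi, p in enumerate(passes):
            for fi in bundle:
                p(functions[fi])
                trace.append((pi, fi))


def pass_switches(trace: Trace) -> int:
    """Count steps where the running pass changes (a crude proxy for I$ churn)."""
    return sum(1 for a, b in zip(trace, trace[1:]) if a[0] != b[0])
```

With function-local passes both orders produce the same IR; only the number of pass switches differs. Whether that pays off on real hardware depends on whether a bundle's data still fits in cache, which this sketch does not model.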

dneto, 11 days ago to llvm

So the thing about LLVM IR is, in reality it's a family of accidental and informally defined dialects. Every LLVM-based compiler for a particular machine target refines the input module down to its own idiosyncratic dialect.

Consequently there's way more latitude for confusion and bugs than you'd initially guess.

Try using an i65 type in an x86 backend. When I did that years ago it sailed right through and generated nonsense code.

#llvm #compilers

nh, 11 days ago

@dneto Totally. Which is why I would like to formalize this dialect-ness by borrowing some ideas from MLIR.

And why stuff like ClangIR makes me a little sad. Great idea, but build it natively on LLVM instead!

nh, 15 days ago to random

Atomic loads and stores are sufficient for multi-threaded programming without tears, but they can't guarantee multi-threaded programming without tears.

nh, 17 days ago to random

Regardless of how the Scarlett Johansson / OpenAI thing ultimately turns out, it's a great reminder that the more pressing alignment problem isn't alignment of machine learning systems, it's alignment of the alien intelligences also known as "corporations".

gfxstrand, 25 days ago to random

This week's project: Reworking NVK cbuf support. We've had a lot of issues with too much internal stalling and I think a lot of them come down to the fact that we're re-binding cbufs every draw call.

My plan for root constants is to do inline updates with the LOAD_CONSTANT_BUFFER command. I don't know how much of a difference there is but I strongly suspect this pipelines much better.

For bound cbufs, I'm planning to just make our dirty tracking way more competent.

We'll see how it goes!
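
To illustrate the general idea of dirty tracking (not NVK's actual code), here is a small Python sketch. The class `CbufTracker` and its methods are hypothetical names; the assumption is simply that a bind is only emitted for slots whose requested buffer differs from what was last bound.

```python
from typing import List, Optional, Tuple


class CbufTracker:
    """Toy dirty tracker: emit binds only for slots that actually changed."""

    def __init__(self, num_slots: int) -> None:
        self.bound: List[Optional[str]] = [None] * num_slots    # last emitted binding
        self.pending: List[Optional[str]] = [None] * num_slots  # what the next draw wants
        self.dirty = 0                                           # bitmask of changed slots

    def set(self, slot: int, buf: Optional[str]) -> None:
        self.pending[slot] = buf
        if self.bound[slot] != buf:
            self.dirty |= 1 << slot
        else:
            # Re-setting the already bound buffer must not trigger a re-bind.
            self.dirty &= ~(1 << slot)

    def flush(self) -> List[Tuple[int, Optional[str]]]:
        """Return the (slot, buffer) binds to emit before a draw."""
        cmds = []
        for slot in range(len(self.bound)):
            if self.dirty & (1 << slot):
                cmds.append((slot, self.pending[slot]))
                self.bound[slot] = self.pending[slot]
        self.dirty = 0
        return cmds
```

A draw that re-sets the same buffers then flushes to an empty command list, which is the case the naive "re-bind everything per draw" approach pays for.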

nh, 14 days ago

@castano @gfxstrand @Biovf Oh neat. How did that work? Location metadata of the image?

hal9000bma, 30 days ago to random German

Could there be a bigger "fuck you" aimed at Bavaria?

nh, 30 days ago

@hal9000bma Eating Weißwurst in the evening is by far the worse offense 😜

nh, 1 month ago to random

Some thoughts about Git history, and a kind of history that is missing from commit metadata today
http://nhaehnle.blogspot.com/2024/05/a-new-kind-of-git-history.html

dneto, 1 month ago to random

SSD stands for "solid state drive".

That makes no sense. What is being driven?
I know it's from hard drive, or floppy drive, where a disk was being spun.

nh, 1 month ago

@dneto Not the case for SSDs, but could you have a solid state engine propelling a vehicle? Hmm, maglev trains...

nh, 1 month ago to random

You may be nerdy, but you'll never be "build an entire fashion brand around the concept of taking the root of the previous result on your calculator" nerdy.

nh, 1 month ago to random

Anarchist communities are unpresidented.

dneto, 1 month ago to random

Yes, friends, today I wrote an email beginning with "Yikes!"

nh, 1 month ago

@dneto Yikes!

aras, 2 months ago to random

You know that Sia/Guetta song that goes like,

“Rotate me down, no gimbal lock
I am quaternion”

Right?

nh, 2 months ago

@aras I don't actually, and I'm curious.

nh, 3 months ago to random

Lots of propaganda for elliptic curves, yet RSA signatures have built-in Two-Factor Authentication.

nh, 3 months ago to random

A pity that KUKA was bought by a Chinese company. Had the buyers been Indian, we might now have a KUKA-Raja.

NOTimothyLottes, 3 months ago to random

Efficient Parallel GPU Allocation and Free (Wave Granularity)

Alloc:
(1.) Have bit array, one bit per item
(2.) A Wave64 GPU can grab two 128-byte cachelines at a time for a chunk of 2048 bits; loop until you get something non-zero (ballot), and have waves start at spaced-out intervals
(3.) Grab the DWORD from the first non-zero lane, findFirstOne/etc to find a bit
(4.) AtomicAnd to clear the bit in memory, check the atomic return to know the bit wasn't already cleared

Free:
(5.) AtomicOr the bit
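
The steps above can be sketched as a single-threaded Python simulation. This is an illustration under stated assumptions, not the author's implementation: `BitAllocator`, `_atomic_and`, and `_atomic_or` are hypothetical names, the "atomics" are ordinary functions that return the old value the way a GPU atomic would, and the wave-wide 2048-bit load plus ballot is modeled as a plain scan over 32-bit words.

```python
from typing import List, Optional

WORD_BITS = 32
MASK = (1 << WORD_BITS) - 1


class BitAllocator:
    def __init__(self, num_items: int) -> None:
        assert num_items % WORD_BITS == 0
        # Step 1: one bit per item; 1 means free, 0 means allocated.
        self.words: List[int] = [MASK] * (num_items // WORD_BITS)

    def _atomic_and(self, i: int, mask: int) -> int:
        old = self.words[i]
        self.words[i] = old & mask
        return old  # like a GPU atomic, return the pre-op value

    def _atomic_or(self, i: int, mask: int) -> int:
        old = self.words[i]
        self.words[i] = old | mask
        return old

    def alloc(self, start_word: int = 0) -> Optional[int]:
        n = len(self.words)
        for k in range(n):
            i = (start_word + k) % n
            while True:
                word = self.words[i]            # step 2: load, skip if all zero
                if word == 0:
                    break
                bit = (word & -word).bit_length() - 1   # step 3: first set bit
                old = self._atomic_and(i, ~(1 << bit) & MASK)  # step 4: clear it
                if old & (1 << bit):
                    return i * WORD_BITS + bit  # we cleared it, so we own it
                # Otherwise another wave won the race; reload and retry.
        return None                             # nothing free

    def free(self, item: int) -> None:
        # Step 5: set the bit again.
        self._atomic_or(item // WORD_BITS, 1 << (item % WORD_BITS))
```

In this sequential model the retry branch never fires; it is kept only to show where the check on the atomic's return value sits.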

nh, 3 months ago

@NOTimothyLottes I've been wondering about this for a while. Point 3 means you can allocate ~once per L2 latency. Not good for contention.

There's probably some variation of the scheme in which waves pick a pseudo-random bit to decrease the chance of collision and get higher allocation throughput.

nh, 3 months ago

@NOTimothyLottes Could be as simple as logically rotating the words/bits based on a wave ID.
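
One possible reading of that rotation idea, sketched in Python. Everything here is an assumption for illustration: `pick_bit` and `rotr` are hypothetical helpers, and using `wave_id % n` for the starting word and `wave_id % 32` for the bit rotation is just one arbitrary choice.

```python
from typing import List, Optional, Tuple

WORD_BITS = 32
MASK = (1 << WORD_BITS) - 1


def rotr(x: int, r: int) -> int:
    """Rotate a 32-bit value right by r bits."""
    r %= WORD_BITS
    return ((x >> r) | (x << (WORD_BITS - r))) & MASK


def pick_bit(words: List[int], wave_id: int) -> Optional[Tuple[int, int]]:
    """Return (word index, bit index) of a free bit, searched in a wave-dependent order."""
    n = len(words)
    start = wave_id % n          # rotate which word is looked at first
    rot = wave_id % WORD_BITS    # rotate which bit counts as "first" in a word
    for k in range(n):
        i = (start + k) % n
        w = rotr(words[i], rot)
        if w:
            low = (w & -w).bit_length() - 1
            # Undo the rotation to get the bit position in the stored word.
            return i, (low + rot) % WORD_BITS
    return None
```

With all bits free, waves 0 to 3 over two words pick four distinct bits, so their subsequent AtomicAnd operations would not collide on the same bit.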

gfxstrand, 3 months ago to random

Do I have any USB folks following me? I've got a weird issue with my Alienware R14 laptop keyboard that I think might call for a new quirk.

Whenever the laptop fails to resume (it's Linux, this happens), I hard reset the laptop and then my keyboard fails to enumerate. I've tried every quirk in the list and nothing works.

Ya'know what does work? Suspending the laptop and coming out of suspend again. Seems to bring the keyboard back every time.

nh, 3 months ago

@gfxstrand I have a laptop whose keyboard very, very rarely gets into a state in which the keyboard is confused about which key is which. This happens at such a low level that the power button doesn't work anymore. The only known workaround I have is to let the battery run out. (At least ssh tends to still work in that case, so I can kick off something remotely that burns power...)

No, I don't think that helps you. I'm simply commiserating.

nh, 4 months ago to random

Building a ROCm/HIP environment from scratch (at least the user space parts)
http://nhaehnle.blogspot.com/2024/02/building-hip-environment-from-scratch.html

nh, 4 months ago to random

Every good dystopia is somebody's utopia.

jeffowski, 4 months ago to random

#NoBillionaires

nh, 4 months ago

@Doomed_Daniel @jeffowski @korenchkin Yeah, there are important ways in which this is true, e.g. wealth and power scale, but also important ways in which it can be considered tone-deaf

nh, 4 months ago

@Doomed_Daniel @jeffowski @korenchkin The tricky part is: the former matters more for voting, but the latter matters massively, and perhaps more, for feelings

At least IMHO

nh, 4 months ago to random

Talking about dragons, it only takes a child to raze a village.
