pervognsen

@pervognsen@mastodon.social

Performance, compilers, hardware, mathematics, computer science.

I've worked in or adjacent to the video game industry for most of my career.

dotstdy, to random
@dotstdy@mastodon.social avatar

I feel like the most difficult part of subgroups, and GPU programming in general, is getting all the terminology straight in your head. Sometimes it seems like it would be easier just writing RDNA asm directly. :')

pervognsen,
@pervognsen@mastodon.social avatar

@dotstdy You know what has never helped? Every IHV and API having their own incompatible terminology!

pervognsen,
@pervognsen@mastodon.social avatar

@dotstdy It's my favorite too. It used to be fashionable to deride it as marketing wank but I think it's evocative and memorable and the hierarchy of the terms makes sense once you know that warp isn't a sci-fi word.

pervognsen,
@pervognsen@mastodon.social avatar

@dotstdy I think the marketing wank accusation is justified when we're counting each separate lane of a SIMD unit as a CUDA core. :)

pervognsen, to random
@pervognsen@mastodon.social avatar

I haven't done any real Vulkan programming since 1.0. Are there any good guides that skip all the legacy junk and only show the streamlined 1.3 way of doing things?

pervognsen,
@pervognsen@mastodon.social avatar

@zeux What's the compatibility landscape like for GPUs that support 1.3 but don't support bindless? I was hoping to just require bindless.

pervognsen,
@pervognsen@mastodon.social avatar

@zeux Oh, I just remembered you had your Niagara project. Do you recommend using that as a reference for good practices, etc?

dpiponi, to random
@dpiponi@mathstodon.xyz avatar

It's not like I had any chance of resisting when one of my favourite books is published in a fancy new hardback edition

pervognsen,
@pervognsen@mastodon.social avatar

@dpiponi Where are you on Player of Games vs Use of Weapons?

shriramk, to random
@shriramk@mastodon.social avatar

You can't be the financial capital of the world if you can't monetize everything in the news. (Hidden Grounds cafe, NYC.)

pervognsen,
@pervognsen@mastodon.social avatar

@shriramk I hope you put that $10 in the Kendrick jar.

pervognsen, to random
@pervognsen@mastodon.social avatar

Nothing is new: hash consing/value numbering in 1958. On Programming of Arithmetic Operations, A. P. Ershov, https://dl.acm.org/doi/10.1145/368892.368907
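A minimal sketch of the hash consing/value numbering idea in modern terms (not Ershov's formulation; his paper uses quite different terminology). Every expression becomes a key of (operator, operand value numbers), and a hash table hands back the existing number when the same key shows up again. The class name `ValueNumbering` and the commutativity handling are illustrative assumptions.

```python
# Hypothetical sketch: local value numbering by hash consing.
COMMUTATIVE = {"+", "*"}

class ValueNumbering:
    def __init__(self):
        self.table = {}   # (op, operand numbers...) -> value number
        self.nodes = []   # value number -> (op, operand numbers...)

    def intern(self, op, *args):
        if op in COMMUTATIVE:
            args = tuple(sorted(args))  # a+b and b+a get the same key
        key = (op, *args)
        vn = self.table.get(key)
        if vn is None:                  # first time we see this expression
            vn = len(self.nodes)
            self.nodes.append(key)
            self.table[key] = vn
        return vn

    def var(self, name):
        return self.intern("var", name)

vn = ValueNumbering()
a, b = vn.var("a"), vn.var("b")
s1 = vn.intern("+", a, b)
sq = vn.intern("*", s1, s1)
s2 = vn.intern("+", b, a)   # same value number as s1, no new node
```

The shared subexpression `a + b` is computed once; everything downstream (CSE, instruction scheduling) can key off value numbers instead of syntax.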

pervognsen, (edited )
@pervognsen@mastodon.social avatar

Ershov also independently invents open addressing with linear probing in that short paper, although Amdahl et al. had the idea a few years earlier, in 1954.
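For reference, a minimal sketch of open addressing with linear probing: on a collision you just try the next slot, wrapping around. Fixed capacity and no deletion are simplifying assumptions; `LinearProbingSet` is a hypothetical name.

```python
class LinearProbingSet:
    """Open addressing with linear probing. Fixed capacity, no deletion."""

    def __init__(self, capacity=8):
        self.slots = [None] * capacity   # None marks an empty slot

    def _probe(self, key):
        n = len(self.slots)
        i = hash(key) % n
        for _ in range(n):
            slot = self.slots[i]
            if slot is None or slot == key:
                return i                 # empty slot or the key itself
            i = (i + 1) % n              # collision: try the next slot
        return None                      # every slot holds another key

    def add(self, key):
        i = self._probe(key)
        if i is None:
            raise RuntimeError("table full")
        self.slots[i] = key

    def __contains__(self, key):
        i = self._probe(key)
        return i is not None and self.slots[i] == key

s = LinearProbingSet(8)
for k in (1, 9, 17):   # small ints hash to themselves, so all start at slot 1
    s.add(k)
```

The three keys collide at slot 1 and end up in slots 1, 2 and 3; a lookup for an absent key stops at the first empty slot.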

pervognsen,
@pervognsen@mastodon.social avatar

Let's also invent the Sethi-Ullman algorithm 12 years early while we're at it.

pervognsen,
@pervognsen@mastodon.social avatar

(I'm not sure how much credit he gets for that. I've always been amused that Sethi-Ullman gets to have a fancy name attached for something so simple and relatively limited in practice. Whereas value numbering/hash consing might be a simple idea but it's extremely powerful and far reaching. But it's nice to see him attack related parts of the problem at once in such a short paper, not just value numbering but instruction scheduling and register allocation, since they all affect each other.)
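To show how simple Sethi-Ullman is, here's a sketch of the textbook labeling in a simplified variant where every leaf needs one register (the classic version gives right leaves 0 when they can be memory operands). `Node`, `need` and `emit_order` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Node:
    op: str                          # operator, or variable name for a leaf
    left: Optional["Node"] = None
    right: Optional["Node"] = None

def need(n):
    """Registers needed to evaluate n without spilling (leaves need 1)."""
    if n.left is None:
        return 1
    l, r = need(n.left), need(n.right)
    return l + 1 if l == r else max(l, r)

def emit_order(n, out=None):
    """Post-order that evaluates the more register-hungry subtree first.
    Recomputes need() for brevity; a real pass would cache it."""
    if out is None:
        out = []
    if n.left is None:
        out.append(n.op)
        return out
    first, second = n.left, n.right
    if need(n.right) > need(n.left):
        # A real code generator would also track which register holds
        # which operand, since the op may not be commutative.
        first, second = second, first
    emit_order(first, out)
    emit_order(second, out)
    out.append(n.op)
    return out
```

For `a + (b * (c + d))` this evaluates `c + d` first and needs 2 registers; the balanced `(a + b) * (c + d)` needs 3.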

pervognsen, to random
@pervognsen@mastodon.social avatar

One of my favorite hip-hop instrumentals: https://www.youtube.com/watch?v=s6Yyb3N9IuA. I was listening to J Cole's Everybody Dies and a YouTube commenter had just written "Kenny Dope" without any further context or explanation and I immediately understood what it meant.

pervognsen,
@pervognsen@mastodon.social avatar

If you don't get the reference, listen to the two tracks back to back: https://www.youtube.com/watch?v=-5slZHLSnow. They both sample https://en.wikipedia.org/wiki/Inside_My_Love.

lritter, to random
@lritter@mastodon.gamedev.place avatar

interesting problem: progressively mapping a cosmically high number of unique strings of arbitrary length to an ordered set so that we can assign an index to each string, extract a substring from each index, and filter strings not in the set.

evidently, this approach requires compression. the compressed result is functionally equivalent to a regular expression, or a schema validation system.
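An uncompressed baseline for the three operations (string to index, index to string, membership filter) is just a sorted array with binary search. This is an illustrative sketch, not lritter's design; it ignores the compression the post says is needed, which is what transducer-style structures add by sharing prefixes and suffixes. Note that inserting progressively into a sorted array shifts later indices. `SortedStringSet` is a hypothetical name.

```python
import bisect

class SortedStringSet:
    """Uncompressed ordered set: rank, select and membership via bisection."""

    def __init__(self, strings):
        self.items = sorted(set(strings))

    def index_of(self, s):
        i = bisect.bisect_left(self.items, s)
        if i < len(self.items) and self.items[i] == s:
            return i
        return None                      # not in the set

    def at(self, i):
        return self.items[i]

    def filter(self, candidates):
        return [s for s in candidates if self.index_of(s) is not None]

ss = SortedStringSet(["pear", "apple", "fig", "apple"])
```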

pervognsen,
@pervognsen@mastodon.social avatar

@lritter You didn't define everything to the point where I'm completely sure what you're describing but maybe https://blog.burntsushi.net/transducers/ is relevant.

pervognsen,
@pervognsen@mastodon.social avatar

@lritter Alright, I thought you were talking about strings-strings. Carry on. :)

pervognsen,
@pervognsen@mastodon.social avatar

@lritter Definitely one of the best simple ideas in CS.

pervognsen,
@pervognsen@mastodon.social avatar

@lritter Yeah, that's why I said it's all hash consing. It's very general and goes at least as far back as a Russian paper in the early 60s on value numbering.

pervognsen,
@pervognsen@mastodon.social avatar

@lritter My bad, make that late 50s. https://dl.acm.org/doi/10.1145/368892.368907. Although I remember the terminology in that paper being somewhat impenetrable and the generality not so immediately apparent.

vurtun, to random
@vurtun@mastodon.gamedev.place avatar

Currently in the process of converting my tool code from using dynamic arrays for lists/tree views to using fixed-size arrays to complete the cycle.

Common problem is basically SELECT * FROM xxx WHERE xxx ORDER BY xxxx LIMIT off, count. So we only want count elements from offset off, sorted by a certain key.

Looked into it and with help from @slembcke found Floyd–Rivest and a heap to get n*log(k) runtime.

Written about here: https://gist.github.com/vurtun/7063afddcf1491af16037a207a167e49. Does anyone know something better than this?
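A minimal sketch of the heap variant of that query, assuming rows fit in memory and `where`/`key` are plain functions (hypothetical names). `heapq.nsmallest` keeps a bounded heap of the best `off + count` rows, which gives roughly the n*log(k) bound; a selection algorithm such as Floyd–Rivest could instead partition out the page in expected linear time and then sort only those `count` rows.

```python
import heapq

def page(rows, where, key, off, count):
    """Rows off .. off+count-1 of: SELECT * FROM rows WHERE where ORDER BY key."""
    k = off + count
    # nsmallest keeps only the k best rows seen so far: roughly O(n log k).
    best = heapq.nsmallest(k, (r for r in rows if where(r)), key=key)
    return best[off:k]
```

For example, `page(range(100), lambda x: x % 2 == 0, lambda x: -x, 2, 3)` returns the third through fifth largest even numbers.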

pervognsen,
@pervognsen@mastodon.social avatar

@vurtun Right, it's basically the same old IMGUI idea. It's just that sometimes people revert to indices that implicitly presume a fixed snapshot in time of something that in reality might be dynamically changing.

I think you can reconcile file-as-anchor with the traditional "jump to the 70th percentile of the list" UI because when you perform that kind of random access action you have no presumption that any particular entity is at that percentile so it's actually a different query altogether.

pervognsen,
@pervognsen@mastodon.social avatar

@vurtun In other words, that kind of percentile selection problem is actually where you need a selection algorithm like Floyd-Rivest. Because the query is inherently about percentiles. When your scroll window is anchored/focused on an item you're now in a fundamentally different situation. Another way to see that the situation is different is that the percentile query really is inherently about a snapshot in time whereas scrolling/pagination from a given point is something else.
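The two kinds of query can be sketched side by side. The percentile jump is a selection over a snapshot; plain quickselect stands in here for Floyd–Rivest, which picks better pivots by sampling. Anchored scrolling only asks for the next items after the anchor's key, so it's unaffected by items inserted or removed before it. All function names are hypothetical.

```python
import heapq
import random

def select_kth(items, k, key):
    """k-th smallest item (0-based) by key; quickselect, expected O(n)."""
    items = list(items)
    while True:
        pivot = key(random.choice(items))
        lo = [x for x in items if key(x) < pivot]
        eq = [x for x in items if key(x) == pivot]
        if k < len(lo):
            items = lo
        elif k < len(lo) + len(eq):
            return eq[0]
        else:
            k -= len(lo) + len(eq)
            items = [x for x in items if key(x) > pivot]

def jump_to_percentile(items, pct, key):
    """Percentile query over a snapshot; no particular item is expected there."""
    k = min(len(items) - 1, int(pct * len(items) // 100))
    return select_kth(items, k, key)

def page_after(items, anchor_key, count, key):
    """Anchored scrolling: the next count items after the anchor.
    (Duplicate keys would need a tiebreaker such as a unique id.)"""
    return heapq.nsmallest(count, (x for x in items if key(x) > anchor_key), key=key)
```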

steve, to random
@steve@discuss.systems avatar

A slight re-organization of Priest's "Efficient Scaling for Complex Division" to make it compatible with "try to divide the dumb fast way inline, then branch to rescale only if necessary" while preserving scale invariance of rounding.

Also fixes it up to work for Float16, which the original approach does not.

Further optimization possible and pretty straightforward.

https://github.com/apple/swift-numerics/pull/289
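To illustrate the "dumb fast way inline, rescale only if necessary" shape, here's a sketch for Python doubles. It is not Priest's algorithm and not the code in that PR; the slow path simply rescales by powers of two, which is exact unless a value becomes subnormal. Non-finite inputs get no special treatment, and `math.ldexp` raises `OverflowError` if the true quotient is out of range. `cdiv` is a hypothetical name.

```python
import math
import sys

def cdiv(a, b, c, d):
    """(a + bi) / (c + di): textbook formula first, rescale only if it fails."""
    if c == 0 and d == 0:
        raise ZeroDivisionError("complex division by zero")
    denom = c * c + d * d
    if math.isfinite(denom) and denom >= sys.float_info.min:
        re = (a * c + b * d) / denom
        im = (b * c - a * d) / denom
        if math.isfinite(re) and math.isfinite(im):
            return re, im                       # fast path succeeded
    return _cdiv_rescaled(a, b, c, d)

def _cdiv_rescaled(a, b, c, d):
    # max(|c|, |d|) = m * 2**e with 0.5 <= m < 1; same for the dividend.
    _, e = math.frexp(max(abs(c), abs(d)))
    _, k = math.frexp(max(abs(a), abs(b)))
    c, d = math.ldexp(c, -e), math.ldexp(d, -e)
    a, b = math.ldexp(a, -k), math.ldexp(b, -k)
    denom = c * c + d * d                       # now in [0.25, 2)
    re = (a * c + b * d) / denom
    im = (b * c - a * d) / denom
    return math.ldexp(re, k - e), math.ldexp(im, k - e)   # undo the scaling
```

Dividing by `1e300 + 1e300i` overflows `c*c + d*d` in the fast path, and dividing by `1e-200` underflows it; both fall through to the rescaled path and come back with finite, correctly scaled results.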

pervognsen,
@pervognsen@mastodon.social avatar

@saagar @steve @neilhenning That's affine algebra. Linear algebra is when you're stuck at y=mx.

amonakov, to random
@amonakov@mastodon.gamedev.place avatar

(prompted by discussion of detecting bitwise and-not earlier in GCC's optimization pipeline)

My ideal compiler IR would not have and/or/xor as distinct bitwise ops, just generic ternlog and probably the corresponding two-operand function ("bilog"?) too.
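A sketch of what generic ternlog/bilog ops mean, using the truth-table immediate convention of x86 VPTERNLOG (bit i of the result is the immediate's bit at index a_i·4 + b_i·2 + c_i). This is an illustration, not GCC's IR. The handy trick is that evaluating any Boolean function on 0xF0, 0xCC, 0xAA yields its immediate, so and, or, xor and and-not all become one op with different constants; and-not is bilog with immediate 0x4.

```python
def ternlog(imm8, a, b, c, width=64):
    """Generic 3-input bitwise op as a sum of minterms over all bits at once."""
    r = 0
    for idx in range(8):
        if imm8 >> idx & 1:
            ta = a if idx & 4 else ~a
            tb = b if idx & 2 else ~b
            tc = c if idx & 1 else ~c
            r |= ta & tb & tc
    return r & ((1 << width) - 1)

def bilog(imm4, a, b, width=64):
    """Two-input version: bit i of the result is imm4[a_i * 2 + b_i]."""
    r = 0
    for idx in range(4):
        if imm4 >> idx & 1:
            r |= (a if idx & 2 else ~a) & (b if idx & 1 else ~b)
    return r & ((1 << width) - 1)

def imm_of(f):
    """Truth-table immediate of a Boolean function f(a, b, c)."""
    return f(0xF0, 0xCC, 0xAA) & 0xFF

def imm2_of(f):
    """Truth-table immediate of a Boolean function f(a, b)."""
    return f(0b1100, 0b1010) & 0xF
```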

pervognsen,
@pervognsen@mastodon.social avatar

@amonakov For a reason I don't fully understand, this seems to be common in GPUs but not in CPUs. Even the VPTERNLOG instructions in AVX-512 were inherited from Larrabee AFAIK. Maybe GPU ISAs are less averse to many-operand instructions than CPU ISAs have traditionally been?

pervognsen,
@pervognsen@mastodon.social avatar

@amonakov Hmm, I just remembered Southern Islands had a metric truckload of 3 in, 1 out instructions and thought I remembered ternlog being in there. But looking through the ISA manual now I can't find it.
