pervognsen

@pervognsen@mastodon.social

Performance, compilers, hardware, mathematics, computer science.

I've worked in or adjacent to the video game industry for most of my career.

pervognsen, to random

I'd love to see a survey of non-gamedevs about the wider perception of ECS in the industry. The only AAA game I know of which is structured anything like the popular conception of ECS is Overwatch.

pervognsen, to random

Two things I hate with the same flavor: Programs that spin up too many compute-bound worker threads and too many parallel downloader threads. Thanks for the DoS, I didn't want to use my computer anyway.

pervognsen, (edited) to random

Was randomly reminded of the most baffling prescription in Uncle Bob's Clean Code book, as summarized here: "He says that an ideal function has zero arguments (but still no side effects?), and that a function with just three arguments is confusing and difficult to test." His actual before-and-after code examples involve regressing to a pre-structured form of programming where even local dataflow is implicit through member variables (i.e. shared state) rather than arguments and return values.

pervognsen, to random

I've written low-overhead tracers and this reposted reddit comment still managed to give me a couple of new ideas. Recommended reading if you're a systems programmer: https://ochagavia.nl/blog/low-latency-logging-in-rust/

pervognsen, to random

This came up in another thread today but I figured I'd throw a brief comment to the timeline. The concept of "grace periods" where you separate the logical and physical deletion of resources is something you see in RCU, EBR, QSBR, etc., but it's just as useful in single-threaded code where you only have pseudo-concurrency through subroutine calls. Like the age-old pattern of deferring deletion until the end of the game loop, or autorelease pools in Objective-C which get drained by the run loop.
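
A minimal single-threaded sketch of the deferred-deletion pattern, under stated assumptions: `World`, `destroy`, and `end_frame` are hypothetical names made up for illustration, not taken from any engine. `destroy` only records the intent (logical deletion); the slot is freed (physical deletion) when the frame ends, so code further up the call stack can keep reading the entity in the meantime.

```rust
/// Hypothetical entity store: deletion is logical first, physical later.
pub struct World {
    /// A slot is `Some(name)` while the entity is physically present.
    entities: Vec<Option<String>>,
    /// Entities that are logically deleted and wait for the end of the frame.
    pending_delete: Vec<usize>,
}

impl World {
    pub fn new() -> Self {
        World { entities: Vec::new(), pending_delete: Vec::new() }
    }

    pub fn spawn(&mut self, name: &str) -> usize {
        self.entities.push(Some(name.to_string()));
        self.entities.len() - 1
    }

    /// Logical deletion: record the request, free nothing yet.
    pub fn destroy(&mut self, id: usize) {
        if id < self.entities.len() && !self.pending_delete.contains(&id) {
            self.pending_delete.push(id);
        }
    }

    /// False as soon as the entity is logically deleted.
    pub fn is_alive(&self, id: usize) -> bool {
        self.get(id).is_some() && !self.pending_delete.contains(&id)
    }

    /// Still returns data during the grace period.
    pub fn get(&self, id: usize) -> Option<&str> {
        self.entities.get(id)?.as_deref()
    }

    /// End of the game loop iteration: the grace period is over, so the
    /// pending entities are physically freed. Returns how many were freed.
    pub fn end_frame(&mut self) -> usize {
        let mut freed = 0;
        for id in self.pending_delete.drain(..) {
            if self.entities[id].take().is_some() {
                freed += 1;
            }
        }
        freed
    }
}
```

The linear `contains` scan is only there to keep the sketch short; a flag per slot would do the same job.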

pervognsen, to random

Started to sketch out the parsing example with the left-fold postorder consumer design. I haven't had the time to write tests yet, but hey it passes the type checker. Figured I'd post up the WIP and I'll finish it off tomorrow.

https://gist.github.com/pervognsen/cf7f77a66e614fc4b32858e78918fdf5

pervognsen, to random

New video from Laura: the billion dollar decision that launched XNA, https://www.youtube.com/watch?v=wJY8RhPHmUQ

pervognsen, to random

It's pretty depressing to see what happens here on AdvSIMD/NEON. There's neither movmsk nor a scalar popcnt. When I first got my M1 and cared enough to work through the math for the different alternatives on M1, I think I came to the conclusion (ping @dougallj) that byte-indexed popcount LUTs were optimal for synthesizing wider popcounts, if you could justify the LUT size in your use case.
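
For readers who haven't seen the byte-indexed LUT idea, here is a plain scalar sketch of it. This is not the code behind the godbolt link and it says nothing about what is fastest on M1; `POPCNT8` and `popcount64_lut` are hypothetical names. The 256-entry table is built at compile time and a 64-bit popcount becomes eight table lookups.

```rust
/// POPCNT8[b] is the number of set bits in the byte b, built at compile time.
const POPCNT8: [u8; 256] = {
    let mut table = [0u8; 256];
    let mut i: usize = 1;
    while i < 256 {
        // The bit count of i is the bit count of i / 2 plus the low bit of i.
        table[i] = table[i >> 1] + (i as u8 & 1);
        i += 1;
    }
    table
};

/// Synthesizes a 64-bit popcount from eight byte-indexed lookups.
pub fn popcount64_lut(x: u64) -> u32 {
    x.to_le_bytes()
        .iter()
        .map(|&b| POPCNT8[b as usize] as u32)
        .sum()
}
```

The result should agree with the built-in `u64::count_ones` for every input, which makes for an easy cross-check.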

https://rust.godbolt.org/z/Eqceq61dn

pervognsen, to random

I still find it weird when people complain that immediate-mode UI is a toy model that doesn't scale to advanced use cases. This is also true for a lot of game editors (immediate mode or not) but I'm looking at the Rerun demos and the polish and sophistication are about 100x higher along any dimension I care about than the most advanced UIs I've seen people build in the modern crop of reactive UI libraries.

pervognsen, (edited) to random

Another thing I noticed in the SDL 3 preview (which was apparently already added in vestigial form later in SDL 2) is that they have support now for a callback-based life cycle so you can integrate properly with platforms which expect that without having to resort to inversion-of-control hacks like fibers for interop. While I always liked the "app controls the main loop" aspect of the classic SDL event pumping model, that ship has sailed if you care about cross-platform support.

pervognsen, to random

"The algorithm uses exactly the same terminology and is presented in bottom-up form. (If you prefer top-down design, please read the rest of this section backwards.)"

pervognsen, to random

It's common to use fixed magic numbers for things like allocation headers/footers to detect write-clobbering corruption but you can actually do one better with incrementally updatable checksums at a very manageable cost. An intentional field write removes the checksum contribution corresponding to its old value before writing the new value and then it adds in the checksum contribution corresponding to the new value. There's an obvious concurrency issue here, so reason and apply accordingly.
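
A rough single-threaded sketch of what that could look like. Everything here is an assumption for illustration: the `Header` layout, the `contribution` mixing function, and the choice of wrapping add/subtract to combine contributions. A stray write that bypasses `set` (simulated by `clobber`) leaves the stored checksum stale, which `is_intact` then detects.

```rust
/// Hypothetical allocation header guarded by an incrementally updated checksum.
pub struct Header {
    fields: [u64; 4],
    checksum: u64,
}

/// Per-field contribution. The index is mixed in so that the same value in a
/// different field contributes differently. Arbitrary invertible mixing.
fn contribution(index: usize, value: u64) -> u64 {
    (value ^ (index as u64 + 1).wrapping_mul(0x9E37_79B9_7F4A_7C15))
        .wrapping_mul(0xFF51_AFD7_ED55_8CCD)
        .rotate_left(31)
}

impl Header {
    pub fn new(fields: [u64; 4]) -> Self {
        let checksum = Self::full_checksum(&fields);
        Header { fields, checksum }
    }

    /// Recomputes the checksum from scratch. Only needed when verifying.
    fn full_checksum(fields: &[u64; 4]) -> u64 {
        fields
            .iter()
            .enumerate()
            .fold(0u64, |acc, (i, &v)| acc.wrapping_add(contribution(i, v)))
    }

    pub fn get(&self, index: usize) -> u64 {
        self.fields[index]
    }

    /// Intentional write: remove the old contribution, write, add the new one.
    pub fn set(&mut self, index: usize, value: u64) {
        let old = self.fields[index];
        self.checksum = self.checksum.wrapping_sub(contribution(index, old));
        self.fields[index] = value;
        self.checksum = self.checksum.wrapping_add(contribution(index, value));
    }

    /// Simulates a stray write that does not go through `set`.
    pub fn clobber(&mut self, index: usize, value: u64) {
        self.fields[index] = value;
    }

    pub fn is_intact(&self) -> bool {
        self.checksum == Self::full_checksum(&self.fields)
    }
}
```

The three steps in `set` are not atomic, which is the concurrency issue mentioned above.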

pervognsen, to random

The JetBrains pricing model is neat and I'm surprised I haven't seen it replicated more. The initial purchase gets you updates for a year. If you want updates for the second year you only pay 80% of the initial price. For the third and subsequent years, it's only 60%. Compare this to WholeTomato after they got bought by Embarcadero where VAX updates have gotten more and more expensive over time.

pervognsen, to random

I've never fully worked out how best to articulate my dissatisfaction with the usual way people talk about pluggable allocators in systems programming. Sure, I'd like to have some standard for fallible, pluggable allocation at the lower level of a language's standard library. But the entire mindset of plugging together allocators and data structures is something I find dubious and at best it feels like a poor compromise.

pervognsen, (edited) to random

How much RAM do you have in your dev workstation/laptop?

pervognsen, (edited) to random

@rygorous I vaguely remember you did a bunch of implementation experiments for histogramming. Do you remember how many parallel histograms you needed to avoid machine clears from memory disambiguation misspeculation? There's no loop-carried dependency for the load/store addresses so I guess you only need a couple (2?) to make sure back to back increments to the same slot won't cause clears. I know there's also a training effect where it will turn off memory disambiguation after those clears.
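
For context, a sketch of what "parallel histograms" means here: consecutive bytes increment different sub-histograms, so back-to-back increments never hit the same slot of the same table, and the tables are summed at the end. The count of 4 is an arbitrary assumption (the question above is exactly how few you can get away with), `histogram_interleaved` is a hypothetical name, and nothing here has been measured.

```rust
/// Byte histogram using four interleaved sub-histograms.
pub fn histogram_interleaved(data: &[u8]) -> [u32; 256] {
    let mut h = [[0u32; 256]; 4];

    let mut chunks = data.chunks_exact(4);
    for c in &mut chunks {
        // Four consecutive bytes go to four different tables, so two equal
        // neighbouring bytes never update the same memory location.
        h[0][c[0] as usize] += 1;
        h[1][c[1] as usize] += 1;
        h[2][c[2] as usize] += 1;
        h[3][c[3] as usize] += 1;
    }
    // Leftover bytes (fewer than four) all go to the first table.
    for &b in chunks.remainder() {
        h[0][b as usize] += 1;
    }

    // Merge the sub-histograms into the final result.
    let mut out = [0u32; 256];
    for i in 0..256 {
        out[i] = h[0][i] + h[1][i] + h[2][i] + h[3][i];
    }
    out
}
```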

pervognsen, to random

Days like today I look at my food log and realize being a fruitarian wouldn't be half bad.

pervognsen, to random

I think I've heard people use at least four significantly different pronunciations of "Dyck" when talking about "Dyck languages" which might be a new record.

pervognsen, to random

I'm thinking of getting back to streaming casually (probably on YouTube, not Twitch), in the vein of the old intermittent streams I did pre-Bitwise where each topic or mini-project would be one or a couple of videos until we reach a natural stopping point.

I'm curious what topics other people might be interested in me covering. I have my own ideas of course.

pervognsen, (edited) to random

Suppose you have a linked list with n nodes represented as indices i in 0..n and successors as next[i]. The indices are drawn from that filled range, but there's no guaranteed order to them. You should be able to break the "latency barrier" for linked list algorithms with similar tricks to pointer jumping in parallel list ranking. For example, pick 1 + 7 random indices and traverse the list forward (with instruction-level parallelism) from them until the segments join up.
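
One way the idea could be sketched, using list ranking (the position of every node) as the example task. The assumptions are mine: the end of the list is marked with a `NIL` sentinel, the head index is known, the list is acyclic and covers all n nodes, and the caller supplies the extra start indices (randomly chosen in practice). `list_ranks` is a hypothetical name. Each cursor walks its own segment until it reaches a node where another segment starts; the round-robin inner loop keeps the `next[...]` loads of different cursors independent of each other, though whether the hardware actually overlaps them is not measured here.

```rust
/// Sentinel marking the end of the list (an assumption of this sketch).
pub const NIL: usize = usize::MAX;

/// Returns rank[i] = position of node i in the list, with the head at 0.
pub fn list_ranks(next: &[usize], head: usize, extra_starts: &[usize]) -> Vec<usize> {
    let n = next.len();
    if n == 0 {
        return Vec::new();
    }
    // seg_of[i] is the segment id if node i starts a segment, else NIL.
    let mut seg_of = vec![NIL; n];
    let mut starts = Vec::new();
    for &s in std::iter::once(&head).chain(extra_starts) {
        if seg_of[s] == NIL {
            seg_of[s] = starts.len();
            starts.push(s);
        }
    }
    let k = starts.len();
    let mut cursor = starts.clone(); // current node of each segment walk
    let mut len = vec![0usize; k]; // nodes visited so far per segment
    let mut succ = vec![NIL; k]; // id of the segment that follows
    let mut local = vec![0usize; n]; // offset of a node within its segment
    let mut owner = vec![0usize; n]; // segment that visited the node

    let mut active = k;
    while active > 0 {
        // One step per live cursor; the k chains do not depend on each other.
        for s in 0..k {
            let i = cursor[s];
            if i == NIL {
                continue;
            }
            local[i] = len[s];
            owner[i] = s;
            len[s] += 1;
            let nx = next[i];
            if nx == NIL || seg_of[nx] != NIL {
                // Reached the end of the list or the start of another segment.
                succ[s] = if nx == NIL { NIL } else { seg_of[nx] };
                cursor[s] = NIL;
                active -= 1;
            } else {
                cursor[s] = nx;
            }
        }
    }

    // Stitch the segments together in list order, starting from the head's.
    let mut base = vec![0usize; k];
    let (mut s, mut acc) = (0, 0);
    while s != NIL {
        base[s] = acc;
        acc += len[s];
        s = succ[s];
    }
    (0..n).map(|i| base[owner[i]] + local[i]).collect()
}
```

With the list 3 → 0 → 4 → 1 → 2 stored as `next = [4, 2, NIL, 0, 1]`, any choice of extra starts should produce the same ranks as a plain sequential walk.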

pervognsen, to random

If you're writing a C library please don't require zero-terminated strings as inputs. It isn't even a good way of interfacing with other C code (e.g. I want to pass a substring of a zero-terminated string).
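
To make the alternative concrete, here is a sketch of a C-callable entry point that takes a pointer plus a length, written in Rust only to stay in one language with the other examples. `count_byte` is a made-up function, not part of any real library. Because the length is explicit, a caller can pass a substring of a larger buffer without copying it or patching in a terminator.

```rust
/// Hypothetical C-ABI function: counts occurrences of `needle` in the
/// `len` bytes starting at `data`. No zero terminator is required or read.
///
/// # Safety
/// `data` must point to at least `len` readable bytes, or be null with
/// `len == 0`.
pub unsafe extern "C" fn count_byte(data: *const u8, len: usize, needle: u8) -> usize {
    if data.is_null() || len == 0 {
        return 0;
    }
    // Build a slice over exactly the bytes the caller described.
    let bytes = unsafe { std::slice::from_raw_parts(data, len) };
    bytes.iter().filter(|&&b| b == needle).count()
}
```

Called with a pointer to the `w` in `"hello world\0"` and a length of 5, it only ever looks at `world`.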

pervognsen, (edited) to random

It looks a bit funny but Rc<Arc<T>> seems like a reasonable choice in a lot of cases. Specifically, you have locally shared ownership of a remotely shared resource instead of directly sharing ownership of the remote resource (which comes with contention issues). Most of the time you probably wouldn't literally have Rc<Arc<T>> but Rc<LocalStruct> where LocalStruct (transitively) has an Arc<T>. But same thing really.
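
A small sketch of the `Rc<LocalStruct>` shape, with hypothetical `Texture` and `LocalHandle` types standing in for the remote resource and the local wrapper. The point it tries to show: cloning the `Rc` only touches a thread-local, non-atomic counter, while the `Arc` count is bumped once when the local handle is created.

```rust
use std::rc::Rc;
use std::sync::Arc;

/// Hypothetical resource shared across threads.
pub struct Texture {
    pub name: String,
}

/// Hypothetical per-thread wrapper that owns one `Arc` to the shared resource.
pub struct LocalHandle {
    pub texture: Arc<Texture>,
}

/// Returns (local Rc strong count, remote Arc strong count) after creating
/// one local handle and sharing it twice more within the same thread.
pub fn demo_counts() -> (usize, usize) {
    let remote = Arc::new(Texture { name: "atlas".to_string() });

    // The only atomic reference count increment on the shared resource.
    let local = Rc::new(LocalHandle { texture: Arc::clone(&remote) });

    // Local sharing: cheap non-atomic increments, no cross-thread traffic.
    let _a = Rc::clone(&local);
    let _b = Rc::clone(&local);

    (Rc::strong_count(&local), Arc::strong_count(&remote))
}
```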

pervognsen, (edited) to random

You can't draw many conclusions from benchmarking a general purpose allocator versus a specialized allocator in a microbenchmark situation where you start with a fresh heap, especially if the allocations and deallocations are cleanly separated into disjoint phases. Most general purpose allocators will have a mostly-linear allocation pattern (often segregated by size class) under those conditions so you're likely only going to see smaller differences on the fast paths of the GPA.

pervognsen, to random

There's a retired couple living in my mom's apartment complex who seem to spend 12 hours every day sunbathing outside during the summer months. They did this when we were visiting last year and they're continuing the streak apparently. After you've lived in a warm climate for a while, the whole idea of sunbathing starts to seem obscene, but this is something else.

pervognsen, (edited) to random

I assume this isn't a problem for EEs but for CS types who are taught logic gates, etc., in their curriculum I wonder if timing should be included in a first course. I'm still trying to help the person I mentioned earlier in a private chat and it sounds like that's the source of almost all their confusion. They think logic gates are instant and one of the "counterexamples" they came up with for why delays seem logically inconsistent is y = xor(x, not(x)). Which is a standard edge detector.
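
A toy discrete-time simulation of that circuit, under simplifying assumptions: the NOT gate has a delay of exactly one time step, the XOR gate is treated as instant, and the circuit starts out settled. `simulate_edge_detector` is a hypothetical name. With instant gates y would be stuck at 1, since x xor not(x) is always true. With the delay, the XOR briefly sees the new x next to the stale inverted value, so y drops to 0 for one step after every edge of x.

```rust
/// Simulates y = xor(x, not(x)) where NOT lags its input by one time step.
/// Returns the value of y at every step of the input waveform `x`.
pub fn simulate_edge_detector(x: &[bool]) -> Vec<bool> {
    // Assume the circuit has settled on the first input value.
    let mut not_out = match x.first() {
        Some(&x0) => !x0,
        None => return Vec::new(),
    };
    let mut y = Vec::with_capacity(x.len());
    for &xt in x {
        // XOR sees the current x and the NOT output from the previous step.
        y.push(xt ^ not_out);
        // The NOT output catches up one step later.
        not_out = !xt;
    }
    y
}
```

For the input 0, 0, 1, 1, 1, 0, 0 the output is 1, 1, 0, 1, 1, 0, 1: one low pulse for the rising edge and one for the falling edge.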
