@pervognsen@mastodon.social avatar

pervognsen

@pervognsen@mastodon.social

Performance, compilers, hardware, mathematics, computer science.

I've worked in or adjacent to the video game industry for most of my career.


pervognsen, (edited ) to random

Idle thought: if you cache the min/max key bounds (in a pair of arrays) along the descending path when you do an ordered search tree descent, you can binary search a correlated query key against the nested bounds in O(log(log(n))) time to use as a local root/LCA. In practice binary search isn't necessary for any realistic n; fast linear search is better if you can use SIMD. There are no chained data dependencies for the search, unlike restarting the tree descent from the top.
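A minimal sketch of the bounds-cache search, with hypothetical naming: `path_bounds[d]` holds the cached (min, max) key bounds of the node at depth d from the previous descent. Because the bounds are nested, "contains the key" holds for a prefix of depths, which is what makes binary search on depth valid.

```rust
/// path_bounds[d] = (min, max) key bounds of the node at depth d on the
/// previous descent. Bounds are nested (root widest), so "contains key"
/// is true for a prefix of depths and we can binary search on depth.
/// Returns the deepest depth whose bounds contain `key` -- the local
/// root (LCA) from which a fresh descent for a correlated key can start.
fn local_root(path_bounds: &[(i64, i64)], key: i64) -> usize {
    // Find the first depth whose bounds do NOT contain the key.
    let (mut lo, mut hi) = (0, path_bounds.len());
    while lo < hi {
        let mid = (lo + hi) / 2;
        let (min, max) = path_bounds[mid];
        if min <= key && key <= max { lo = mid + 1; } else { hi = mid; }
    }
    // Assumes the root bounds cover every key, so in-range keys give lo >= 1.
    lo.saturating_sub(1)
}
```

As the post says, for realistic path lengths a SIMD linear scan over the two bounds arrays would beat this binary search; the point is only that the nested structure makes O(log(depth)) possible with no chained data dependencies.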

pervognsen, (edited )

Mostly just found this amusing because it's rare to see log log complexity outside of vEB trees, x-fast/y-fast tries, etc. There the nested logarithm also comes from binary searching on the depth of a log-depth tree, although in their case it's O(log(log(u))) where u is the size of the universe, not the number of keys in the tree.

pervognsen, to random

Every once in a while there's a bug on YouTube where I get the comments for the wrong video and it's never not hilarious. Anyone else see this occasionally?

pervognsen,

In this case I got the comments from a J Dilla track applied to a technical podcast. Sample comments:

"I listened to this on mushrooms, closed my eyes, and laid back on the couch."
"I recently found out that a vinyl can be made out of your ashes."
"His shit really have me ballin like a lil girl."

Must be one hell of a tech podcast to elicit those responses.

castano, to random

Is there a way not to burn GitHub's LFS quota on GitHub Actions?
I have a repo with GBs of LFS data, and it looks like the checkout action downloads the entire thing every time it's triggered.
Not only is this extremely wasteful, it also gets expensive very quickly!

pervognsen,

@wolfpld @castano I wonder if someone has done a cost comparison of GitHub vs Gitea on a DigitalOcean node or some other cheap host.

pervognsen,

@dotstdy @wolfpld @castano Oh, that works now? I could have sworn last time I looked into it, it was almost literally a "We have GitHub Actions at home" meme.

pervognsen, (edited )

@wolfpld @castano It looks like Hetzner is a better deal on dedicated vCPU slices. CCX13 is $15 for 2 vCPUs, 8 GB RAM, 80 GB disk, 20 TB traffic. BuyVM seems like a better deal if you want a lot of storage for cheap and "unlimited" bandwidth (personally not a fan of unlimited anything, when companies do that it usually means they're going to start throttling aggressively or fire you as a customer). Do you have good experiences with BuyVM?

pkhuong, to random

Work is both performance and liability^Wcorrectness oriented, and I noticed a common pattern: we'll generate commands with a fully deterministic program (i.e., a function), reify the command stream, and act on the commands. The IO monad is real!

pervognsen,

@pkhuong Game developers love command buffers as an API abstraction and for good reason. It's also great for testing of stateful systems so the tests don't have to make up their own reification of API commands.
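A toy sketch of the pattern (the command set and names are made up): a pure, deterministic planner emits commands as plain data, and a separate interpreter acts on them, so tests can assert on the reified stream directly instead of inventing their own reification.

```rust
use std::collections::BTreeMap;

// Commands as plain data: easy to log, diff, replay, and test against.
#[derive(Debug, PartialEq)]
enum Cmd {
    Put { key: &'static str, val: i64 },
    Del { key: &'static str },
}

// Deterministic planner: same input always yields the same command stream.
fn plan(updates: &[(&'static str, Option<i64>)]) -> Vec<Cmd> {
    updates
        .iter()
        .map(|&(key, val)| match val {
            Some(val) => Cmd::Put { key, val },
            None => Cmd::Del { key },
        })
        .collect()
}

// The only stateful part: acting on the reified commands.
fn apply(cmds: &[Cmd], state: &mut BTreeMap<&'static str, i64>) {
    for cmd in cmds {
        match *cmd {
            Cmd::Put { key, val } => { state.insert(key, val); }
            Cmd::Del { key } => { state.remove(key); }
        }
    }
}
```

Because `plan` is a function in the mathematical sense, the command stream itself is the natural golden output for tests of the stateful system.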

pervognsen, (edited )

@pkhuong I guess those wacky web people are used to this with gRPC-based web APIs and all that stuff? It's nice to have an explicit notion of "batch" or "transaction" though so you're not just reifying individual commands but a sequence. Or to take it a step further: an explicit dependency graph rather than a linear sequence. I think GPU APIs are a little stuck in the linear command list mentality owing to the original graphics-centric semantics where draw order was king.

pervognsen,

@pkhuong I mean, draw order is still meaningful on render queues in Vulkan but it doesn't make as much sense on compute queues. A lot of those other commands should probably just be explicit dependency based rather than trying to linearize the issue order with queues and then synchronize completion with semaphores and fences. That's my totally half-baked take, though.

pervognsen, (edited )

@pkhuong Yeah, at least you can control it. I'd still like to see what a Vulkan-level API would look like with explicit dependencies, though.

pervognsen,

@pkhuong When I've looked at the release notes for different versions of io_uring I can see he's trying to put in some stuff for dependencies and I wonder if and when there's going to be something more general. Last I checked (which isn't that recently) your only option was to make one command dependent on the previous command with a flag bit. But that doesn't seem sufficient for e.g. A->B, A->C where you don't want C to have a false dependency on B just to depend on A.
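A toy model (not the real io_uring API) of the false-dependency point: computing each command's earliest issue round from an explicit dependency list shows that a chain-to-previous encoding serializes C behind B, while explicit deps let B and C run concurrently after A.

```rust
// deps[i] lists the indices of earlier commands that command i depends on
// (so the list is topologically sorted by construction). Each command's
// earliest issue round is 1 + the max round of its dependencies.
fn rounds(deps: &[Vec<usize>]) -> Vec<usize> {
    let mut round = vec![0; deps.len()];
    for i in 0..deps.len() {
        round[i] = deps[i].iter().map(|&d| round[d] + 1).max().unwrap_or(0);
    }
    round
}
```

With a single "linked to previous" flag the only way to make both B and C depend on A is the chain A, B, C, which yields rounds [0, 1, 2]; the explicit graph A->B, A->C yields [0, 1, 1].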

pervognsen,

@pkhuong It just seems like the story for every kind of "decoupled" command list API is that you eventually want or need some more general way of expressing graph dependencies.

pervognsen,

@pkhuong I guess this is a lot of what this new thing is about and a good motivation for me to actually read through it properly: https://devblogs.microsoft.com/directx/d3d12-work-graphs/

pervognsen,

@zwarich @pkhuong It's funny to consider all the things people use on GPUs for things that could broadly be classed as "monadic actions". Most of them are not really monadic (although I think CUDA has some newer features where tasks can spawn tasks dynamically) in the traditional sense but they're essentially round-based: round 1 can emit commands to a command buffer which are executed in round 2, etc. You can do this with draw commands via indirect draw buffers.

pervognsen,

@zwarich @pkhuong On consoles you could also just issue unified memory writes directly from shaders and patch jump targets in the middle of command buffers that were already queued for execution, etc.

pervognsen,

@pkhuong Do you mean because the dependency graph does not expose pipelining? Or that it's the only way to achieve pipelining? You definitely want the graph API to support pipelining. If you use index-based back references to earlier commands, it's no problem to execute in a pipelined manner, i.e. you require a pre-topologically-sorted graph so you can issue commands as they arrive if their dependencies are already satisfied.

pervognsen, (edited )

@pkhuong Or was that problem with Azure just that their scheduler wasn't properly designed/implemented? On top of the topological sorting requirement, I think it would also be reasonable to only offer a specific sliding issue window, making the API user responsible for some basic level of static scheduling, so that you don't gunk up your dynamic scheduler's scoreboard with an unbounded pile of crap that won't be scheduled for an eternity.

pervognsen,

@pkhuong I.e. it's not the scheduler's job to notice that after 199K serially dependent tasks at the start of the command list, you have 1K independent commands at the end of the list that could start early. If you want those to start early, the user must make sure they're sorted near the start of the list. Basically like an out-of-order CPU core.
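A toy simulation of that idea (structure and names are hypothetical): each cycle the scheduler only scans a bounded window past the oldest unissued command, issuing those whose dependencies completed in earlier cycles. Independent work buried at the end of the list doesn't start early unless the user sorts it near the front, just like an out-of-order core's finite scheduling window.

```rust
// A command names its dependencies by index into the earlier part of the
// list (pre-topologically sorted by construction).
struct Node { deps: Vec<usize> }

// Returns the cycle in which each command issues, for a given window size.
fn simulate(cmds: &[Node], window: usize) -> Vec<usize> {
    let mut cycle_of = vec![usize::MAX; cmds.len()];
    let mut base = 0; // oldest not-yet-issued command
    let mut cycle = 0;
    while base < cmds.len() {
        cycle += 1;
        let hi = (base + window).min(cmds.len());
        for i in base..hi {
            // Ready if every dependency completed in an earlier cycle.
            let ready = cmds[i].deps.iter().all(|&d| cycle_of[d] < cycle);
            if cycle_of[i] == usize::MAX && ready {
                cycle_of[i] = cycle;
            }
        }
        // Retire the issued prefix so the window slides forward.
        while base < cmds.len() && cycle_of[base] != usize::MAX {
            base += 1;
        }
    }
    cycle_of
}
```

With a serial chain of three commands followed by one independent command, a window of 2 only lets the independent tail issue in the last cycle, while a window covering the whole list lets it issue in cycle 1.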

pervognsen,

@pkhuong Right, I guess that gets into a conversation we've had in the past about why schedulers need to be concerned with the right notion of forward progress, which is not really about individual tasks but something closer to user-level requests. And for dynamic scheduling where you can't do omniscient critical path scheduling of tasks even if you wanted to, focusing on a higher-level notion of requests/job progress can also help the user guide the scheduler towards the critical path, I guess?

pervognsen, (edited ) to random

For people who've been around much longer, have there been any retrospectives on Rust's decision to allow panics to unwind rather than abort? I've mostly come to terms with it in a practical sense but it's something that really "infects" the language and library ecosystem at a deep level, e.g. fn(&mut T) isn't "the same" as fn(T) -> T, and it's especially troublesome if you're writing unsafe library code and dynamically calling code through closures or traits that could potentially panic.
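A minimal illustration of the fn(&mut T) vs fn(T) -> T point, using the standard mem::take escape hatch: to run a by-value update through a `&mut T`, something valid has to occupy the slot while the closure runs, precisely because the closure may panic and the value may be observed (or dropped) during unwinding.

```rust
use std::mem;

/// Apply an `fn(T) -> T` update through a `&mut T`. This is only safe
/// because we park `T::default()` in the slot while `f` runs; if `f`
/// unwinds, the caller still sees a valid (if default) value. Helpers
/// without such an escape hatch (take_mut-style crates) have to abort
/// on panic instead -- exactly the kind of "infection" the post means.
fn update<T: Default>(slot: &mut T, f: impl FnOnce(T) -> T) {
    let old = mem::take(slot); // leaves T::default() in the slot
    *slot = f(old);            // if f panics here, the slot holds the default
}
```

With panic=abort none of this ceremony would be needed; the two function shapes would be freely interconvertible.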

pervognsen,

@glaebhoerl @foonathan @pkhuong Considering that most large scale, high reliability server users anecdotally seem to use panic=abort, I wonder if people like kornel are overestimating the extent to which you can build reliability based on catch_unwind/thread-level panic isolation. It's a much better situation than in C++ but I'm still suspicious of this approach when it comes to functional correctness "in the large". I feel much better relying on unwinding in cases where the stakes are low.

pervognsen,

@glaebhoerl @foonathan @pkhuong E.g. should_panic tests, Salsa's use of unwinding for catching query errors in a context where you don't have "internal" side effects, client-side/UI event loop error recovery, etc. Given that we're stuck with it, I'm happy to rely on unwinding in those cases, but that's about it for me personally.

pervognsen,

@dotstdy @glaebhoerl @foonathan @pkhuong I was thinking the other day about subprocess/sandbox isolation in the context of fallible allocations, too. There are at least two kinds of fallibility: the kind required to compose allocators (low-level), which doesn't infect the whole system, and the kind required to recover from allocation failure "anywhere", which does infect most of the system. The latter is really the kind of thing where you want something closer to sandboxing, IMO.

pervognsen,

@dotstdy @glaebhoerl @foonathan @pkhuong I realize that sandboxing (I'm intentionally using the term very loosely) isn't always feasible and so approximating the properties of sandboxing with unwind recovery, best-effort side effect isolation, etc, can be a valid and necessary alternative. But it does feel like it really wants to be a form of sandboxing. And if fault isolation and functional correctness were both of the highest importance for me, I'd probably want actual sandboxing.

aras, to Playdate

Because no one stopped me, I ported "Everybody Wants to Crank the World" #playdate demo to PC (Windows/Mac). https://github.com/aras-p/demo-pd-cranktheworld/pull/1 :playdate: :demoscene:

Using Sokol libraries by @floooh to do most of heavy lifting.

Fun fact: while the demo is running, it takes up as much CPU time as the Windows Task Manager on my PC.

pervognsen, (edited )

@zeux @floooh @aras I'm using Vivaldi (Chromium based) on Linux and also don't get any audio after clicking the screen. It shows the speaker icon on the tab as if it were playing sound, so not sure what's going on.
