I wish 'git push' had a less hardcore mode for -n.
It connects to the server using the push protocol, and stops short of actually uploading anything. So it uses the push URI instead of the pull one, if they're different.
This means it checks that the push URI actually works. Great!
But if the push URI needs awkward authentication (e.g. SSH key on a token) then I'd also like a mode where it uses the pull URI, so I can easily check 'is this command line asking for what I meant to ask for?'
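In the meantime, one way to approximate that mode (a sketch; the remote and branch names are placeholders) is to pass the fetch URL to the dry run explicitly, so no push-side authentication is needed:

```shell
# 'git remote get-url origin' prints the fetch (pull) URL;
# 'git remote get-url --push origin' would print the push URL.
# Passing the fetch URL explicitly makes the dry run talk to the pull side:
git push --dry-run "$(git remote get-url origin)" my-branch:remote-branch
```

The tradeoff is exactly the one above: you no longer check that the push URL itself works.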
@simontatham Here's a possible alternative. At work we have a repo created with 'git clone --mirror' from upstream. When I clone that repo for development, 'git push -n' from my worktree is a local operation, and I do 'git push upstream my-branch:remote-branch' when everything looks ok (after setting up the 'upstream' remote).
(if the upstream updated in the meantime, I have to 'git fetch' in the mirror repo, and 'git pull --rebase' in my tree)
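A sketch of that setup (the URL and branch names are placeholders):

```shell
# one-time setup: a local mirror of upstream, plus a working clone of it
git clone --mirror https://example.com/upstream.git mirror.git
git clone mirror.git work
cd work
git remote add upstream https://example.com/upstream.git

# day to day: this dry run is now a purely local operation
git push -n origin my-branch

# when everything looks ok, push to the real upstream
git push upstream my-branch:remote-branch

# if upstream moved in the meantime:
git -C ../mirror.git fetch
git pull --rebase
```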
There's a shared pool of on-chip memory on the shader core that can be dynamically split up to serve as register file, tile cache, shared memory, general buffer L1, or stack. Since it's dynamic even within the lifetime of a thread/wave, the registers can be allocated dynamically as the program needs them rather than needing registers to be statically-allocated up-front.
@aras @pervognsen @mjp for OoO CPUs you mainly look at critical path length and throughput limits without regard to the specific instruction schedule, which is way easier compared to in-order. It's not like you can add up instruction latencies for a pipelined in-order core to get a good estimate. @rygorous tweeted about this too, iirc.
(I don't suppose there's an implied ";)" after your "?")
Does anybody know what the current usable state-of-the-art method for converting an unsigned integer multiply by a constant is? I'd imagine it's not still a variant of Bernstein's.
I was a daily Mac user for several years from around 2008 (back when no-one else, except maybe ThinkPad, had figured out how to make laptops; but I can't stand trackpoints, and their touchpads sucked) and I've been back for the last few weeks while my gaming laptop is getting repaired. The absolute hardest part of switching back and forth is getting used to Cmd and how shortcuts are mapped. It reminds me of when I used to live in Emacs and had to use any non-Emacs text editor.
My insufferable brain keeps repeating to me (for the past week) that if 'x' has popcount 'p' then bit_gather(~0,x) is the integer with the lowest 'p' bits set.
@harold @mbr ahhh, now I realize your blog might be where I originally saw that!
do you know if anyone looked into optimal expansion of clmul by known constant via common bitwise ops (like expansion of integer multiplication by a constant on CPUs lacking a multiplier)?
Hmm... I'm tempted to make my emulator code clang-only and use the clang-vector extension SIMD stuff (might help in some exotic but expensive cases like counter/timer chips, sprite units, or pixel decoding)
@floooh I might be missing some context, but I'm not sure why you're jumping straight to Clang-only ext_vector_type instead of vector_size that is supported in both GCC and Clang? Are you anticipating using any features that would be available with ext_vector_type but not vector_size?
So! Did you notice old conversations/favorites/bookmarks being "broken" here after a while, if they involve people from other instances? Turns out there was a Mastodon setting that, kinda, "breaks them by design", and it was set to "plz break conversations after 14 days" on this instance. Now that's fixed and 🤞 it should be better from now on. I might have accidentally set that setting myself without realizing the implications, sorry!
@aras the design is bizarre: the references in local toots are modified, instead of just dropping the foreign toots from the local cache and re-fetching them if they're needed again. That would be a conventional cache.
(remember when I timidly attempted to ask you about this, before irritated people started throwing around heated epithets?)
you ever look at a university hostname in a url and think “I don’t recognize it but I am 100% confident that is the name of an especially obscure elf from the Silmarillion as picked by the IT intern in 1992”
@mbr @pervognsen @rygorous @jfbastien @regehr @steve @pkhuong can you elaborate on why not? Because it wouldn't do anything useful for floats on 64-bit architectures?
(points at x86 long double: this bad boy can fit the entire PC in its NaN payload)
@pervognsen @zeux My experience is further improved by the following (ymmv depending on your requirements):
-DLLVM_LINK_LLVM_DYLIB=ON
this links libLLVM.so just once, instead of linking hundreds of unit tests against big static libraries
-DCMAKE_CXX_FLAGS_DEBUG='-Og -g1'
I'd prefer to build everything with optimization and lightweight debug info (line info only), then rebuild the particular .o file with full debug info if/when I need it.
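Put together, the configure step might look something like this (the generator, source path, and other options are placeholders):

```shell
cmake -G Ninja -DCMAKE_BUILD_TYPE=Debug \
      -DLLVM_LINK_LLVM_DYLIB=ON \
      -DCMAKE_CXX_FLAGS_DEBUG='-Og -g1' \
      ../llvm
```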
From my time-based measurements (not recent), it seems like both Linux and Windows always force processes to synchronously unmap their pages before process exit. This doesn't seem strictly necessary: you could schedule the pages for deferred unmapping and apply backpressure to new mappings to bound the backlog. The best rationale I've come up with is that having the same process that mapped the pages "pay off" the deferred cost of unmapping them is better from a system-dynamics perspective.
Well, if you're okay with using some inline asm, there's apparently (?) no problem with a 1 GB .bss heap that you can load/store via RIP-relative addressing. Since your RIP-relative addressing range is +/- 2 GB, 1 GB seems like a reasonable upper limit. https://c.godbolt.org/z/fec6fMvxe
@pervognsen I can't decipher the context here. Why hide the array in a top-level asm? Also note that your 'extern mem[]' declaration seems to be missing __attribute__((visibility("hidden"))) and thus uses relaxable GOT-indirect access; not sure if that's intended.
@pervognsen Sorry, I don't see what that limit could be, or what you'd need section attributes for. I guess the source you fed to GCC was somehow different?
Don't suppose anyone knows a gdb trick to jump easily to the stack frame of a failing assertion, after gdb stops when your program receives SIGABRT?
The signal occurs several function calls deep in the assert() machinery. Some time in the last couple of years it changed from 4 frames deep to 7 frames deep. I don't want to have to remember whether to type "fr 4" or "fr 7" depending on libc version – I want to say "fr whichever has the assert in it"!
catch load libc.so
commands
silent
b __assert_fail
commands
up
end
c
end
explanation: assuming the assert macro invokes __assert_fail from libc.so, we add a catchpoint for the library load, and make it add a breakpoint on __assert_fail, which in turn runs the 'up' command when it fires!
The 'c' (continue) at the end automatically resumes execution when the catchpoint fires.
@simontatham re: nesting of 'commands', I wouldn't say it's something to "know"; I didn't know it either when answering. I came up with it from general principles and tried it; if it hadn't worked due to a parser limitation, I'd have worked around it with an auxiliary macro for the inner 'commands'.
with clang-14's new default, "yolo let's randomly use FMAs if we feel like it even though you didn't enable fast-math", you can get negative values. 👏 👏 👏
I still can't figure out the intended use-case for AMD's IBS (instruction-based sampling). You select a period N, and then for every Nth instruction you get info about that particular instruction (which caches it missed in, whether it was a branch, whether it was mispredicted, ...). Which seems... completely unworkable for rare events? If I want to sample mispredicted branches, and they account for 1% of all instructions, I'll have to discard 99% of the IBS data, and my effective sampling rate is 0.01 of nominal?
@pervognsen With back-end "Ops" IBS you have a choice of cycle-based or uop-based period, but the front-end "Fetch" IBS only does instruction-based sampling (but that's the side which sees the L1i and iTLB misses).
I had just written a toot about how I wish Rust automatically did niche filling via prefix sums / flattened discriminant ranges for nested enums, since I've gotten some good results doing that by hand in C code. Well, I felt confident I'd verified from codegen on previous occasions that it didn't do this, so color me surprised (and impressed): https://rust.godbolt.org/z/Ys5zsYGse
@pervognsen Do you know why it's ok to omit C::encode_a and C::encode_b from the asm output? The B::encode_a symbol is a plain global symbol, so I'd expect the C::encode_* symbols to be present as well (potentially as aliases of B::encode_a).
@pervognsen ah, I tried to uncheck the hiding options on Godbolt, but then you have to scroll all the way down past the unreadable debug info bits to find those .set directives — in GCC land usually there's nothing interesting after the debuginfo sections.
I probably already posted this, but since I'm working on a paper I'll say it again: one of the best paper-writing life hacks I've learned in a long time is this makefile target that makes latexmk rerun every time you touch a dependency and then pushes the result to the PDF viewer (which should be configured to re-render when the file changes):
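The target itself isn't attached here; a guess at roughly what it looks like ('paper.tex' is a hypothetical placeholder):

```make
# hypothetical sketch: -pvc makes latexmk watch all dependencies and re-run
# on any change; -view=none since the PDF viewer reloads the file itself
watch:
	latexmk -pdf -pvc -view=none paper.tex
```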
@regehr For me, latexrun is a recent life-changing discovery: the way it presents LaTeX warnings and errors in a nice readable form is unparalleled. It doesn't offer a "background monitor" mode like 'latexmk -pvc' on its own, but one can use inotifywait for that purpose. https://github.com/aclements/latexrun/