I wish 'git push' had a less hardcore mode for -n.
It connects to the server using the push protocol, and stops short of actually uploading anything. So it uses the push URI instead of the pull one, if they're different.
This means it checks that the push URI actually works. Great!
But if the push URI needs awkward authentication (e.g. SSH key on a token) then I'd also like a mode where it uses the pull URI, so I can easily check 'is this command line asking for what I meant to ask for?'
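In the meantime, one way to approximate that mode (a sketch; the remote and branch names are placeholders) is to pass the fetch URL to the dry run explicitly, so no push-side authentication is needed:

```shell
# 'git remote get-url origin' prints the fetch (pull) URL;
# 'git remote get-url --push origin' would print the push URL.
# Passing the fetch URL explicitly makes the dry run talk to the pull side:
git push --dry-run "$(git remote get-url origin)" my-branch:remote-branch
```

The tradeoff is exactly the one above: you no longer check that the push URL itself works.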
@simontatham Here's a possible alternative. At work we have a repo created with 'git clone --mirror' from upstream. When I clone that repo for development, 'git push -n' from my worktree is a local operation, and I do 'git push upstream my-branch:remote-branch' when everything looks ok (after setting up the 'upstream' remote).
(if the upstream updated in the meantime, I have to 'git fetch' in the mirror repo, and 'git pull --rebase' in my tree)
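A sketch of that setup (the URL and branch names are placeholders):

```shell
# one-time setup: a local mirror of upstream, plus a working clone of it
git clone --mirror https://example.com/upstream.git mirror.git
git clone mirror.git work
cd work
git remote add upstream https://example.com/upstream.git

# day to day: this dry run is now a purely local operation
git push -n origin my-branch

# when everything looks ok, push to the real upstream
git push upstream my-branch:remote-branch

# if upstream moved in the meantime:
git -C ../mirror.git fetch
git pull --rebase
```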
There's a shared pool of on-chip memory on the shader core that can be dynamically split up to serve as register file, tile cache, shared memory, general buffer L1, or stack. Since it's dynamic even within the lifetime of a thread/wave, the registers can be allocated dynamically as the program needs them rather than needing registers to be statically-allocated up-front.
@aras @pervognsen @mjp for OoO CPUs you mainly look at critical path length and throughput limits without regard to the specific instruction schedule, which is way easier compared to in-order. It's not like you can add up instruction latencies for a pipelined in-order core to get a good estimate. @rygorous tweeted about this too, iirc.
(I don't suppose there's an implied ";)" after your "?")
Does anybody know what the current usable state-of-the-art method for converting an unsigned integer multiply by a constant is? I'd imagine it's not still a variant of Bernstein's.
I was a daily Mac user for several years from around 2008 (back when no-one else, except maybe ThinkPad, had figured out how to make laptops; but I can't stand trackpoints, and their touchpads sucked) and I've been back for the last few weeks while my gaming laptop is getting repaired. The absolute hardest part of switching back and forth is getting used to Cmd and how shortcuts are mapped. It reminds me of when I used to live in Emacs and had to use any non-Emacs text editor.
My insufferable brain keeps repeating to me (for the past week) that if 'x' has popcount 'p' then bit_gather(~0,x) is the integer with the lowest 'p' bits set.
@harold @mbr ahhh, now I realize your blog might be where I originally saw that!
do you know if anyone looked into optimal expansion of clmul by known constant via common bitwise ops (like expansion of integer multiplication by a constant on CPUs lacking a multiplier)?
Hmm... I'm tempted to make my emulator code clang-only and use the clang-vector extension SIMD stuff (might help in some exotic but expensive cases like counter/timer chips, sprite units, or pixel decoding)
@floooh I might be missing some context, but I'm not sure why you're jumping straight to Clang-only ext_vector_type instead of vector_size that is supported in both GCC and Clang? Are you anticipating using any features that would be available with ext_vector_type but not vector_size?
So! Did you notice old conversations/favorites/bookmarks being "broken" here after a while, if they involve people from other instances? Turns out there was a Mastodon setting that, kinda, "breaks them by design", and it was set to "plz break conversations after 14 days" on this instance. Now that's fixed and 🤞 it should be better from now on. I might have accidentally set that setting myself without realizing the implications, sorry!
@aras the design is bizarre: the references in local toots are modified, instead of just dropping the foreign toots from the local cache and re-fetching them if they're needed again. That would be a conventional cache.
(remember when I timidly attempted to ask you about this, before irritated people started throwing around heated epithets?)
you ever look at a university hostname in a url and think “I don’t recognize it but I am 100% confident that is the name of an especially obscure elf from the Silmarillion as picked by the IT intern in 1992”
@mbr @pervognsen @rygorous @jfbastien @regehr @steve @pkhuong can you elaborate on why not? Because it wouldn't do anything useful for floats on 64-bit architectures?
(points at x86 long double: this bad boy can fit the entire PC in its NaN payload)
@pervognsen @zeux My experience is further improved by the following (ymmv depending on your requirements):
-DLLVM_LINK_LLVM_DYLIB=ON
this links libLLVM.so just once, instead of linking hundreds of unit tests against big static libraries
-DCMAKE_CXX_FLAGS_DEBUG='-Og -g1'
I'd prefer to build everything with optimization and lightweight debug info (line info only), then rebuild the particular .o file with full debug info if/when I need it.
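Put together, the configure step might look something like this (the generator, source path, and other options are placeholders):

```shell
cmake -G Ninja -DCMAKE_BUILD_TYPE=Debug \
      -DLLVM_LINK_LLVM_DYLIB=ON \
      -DCMAKE_CXX_FLAGS_DEBUG='-Og -g1' \
      ../llvm
```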
From my time-based measurements (not recent), it seems like both Linux and Windows always force processes to synchronously unmap their pages before process exit. This doesn't seem strictly necessary: you could schedule the pages for deferred unmapping and apply backpressure to new mappings to bound the backlog. The best rationale I've come up with is that having the same process that mapped the pages "pay off" the deferred cost of unmapping them is better from a system-dynamics perspective.
Well, if you're okay with using some inline asm, there's apparently (?) no problem with a 1 GB .bss heap that you can load/store via RIP-relative addressing. Since your RIP-relative addressing range is +/- 2 GB, 1 GB seems like a reasonable upper limit. https://c.godbolt.org/z/fec6fMvxe
@pervognsen I can't decipher the context here. Why hide the array in a top-level asm? Also note that your 'extern mem[]' declaration seems to be missing __attribute__((visibility("hidden"))) and thus uses relaxable GOT-indirect access; not sure if that's intended.
@pervognsen Sorry, I don't see what that limit could be, or what you'd need section attributes for. I guess the source you fed to GCC was somehow different?
Don't suppose anyone knows a gdb trick to jump easily to the stack frame of a failing assertion, after gdb stops when your program receives SIGABRT?
The signal occurs several function calls deep in the assert() machinery. Some time in the last couple of years it changed from 4 frames deep to 7 frames deep. I don't want to have to remember whether to type "fr 4" or "fr 7" depending on libc version – I want to say "fr whichever has the assert in it"!
catch load libc.so
commands
silent
b __assert_fail
commands
up
end
c
end
explanation: assuming the assert macro invokes __assert_fail from libc.so, we add a catchpoint for the library load, and make it add a breakpoint on __assert_fail, which in turn runs the 'up' command when it fires!
The 'c' (continue) at the end automatically resumes execution when the catchpoint fires.
@simontatham re: nesting of 'commands', I wouldn't say it's something to "know"; I didn't know it either when answering. I came up with it from general principles and tried it; if it hadn't worked due to a parser limitation, I'd have worked around it with an auxiliary macro for the inner 'commands'.
with clang-14's new default, "yolo let's randomly use FMAs if we feel like it even though you didn't enable fast-math", you can get negative values. 👏 👏 👏
I still can't figure out the intended use-case for AMD's IBS (instruction-based sampling). You select a period N, and then for every Nth instruction you get info about that particular instruction (which caches it missed in, whether it was a branch, whether it was mispredicted, ...). Which seems... completely unworkable for rare events? If I want to sample mispredicted branches, and they account for 1% of all instructions, I'll have to discard 99% of the IBS data, and my effective sampling rate is 0.01 of nominal?
@pervognsen With back-end "Ops" IBS you have a choice of cycle-based or uop-based period, but the front-end "Fetch" IBS only does instruction-based sampling (but that's the side which sees the L1i and iTLB misses).
I had just written a toot about how I wish Rust automatically did niche filling via prefix sums / flattened discriminant ranges for nested enums, since I've gotten some good results doing that by hand in C code. Well, I felt confident I'd verified from codegen on previous occasions that it didn't do this, so color me surprised (and impressed): https://rust.godbolt.org/z/Ys5zsYGse
@pervognsen Do you know why it's ok to omit C::encode_a and C::encode_b from the asm output? The B::encode_a symbol is a plain global symbol, so I'd expect the C::encode_* symbols to be present as well (potentially as aliases of B::encode_a).
@pervognsen ah, I tried to uncheck the hiding options on Godbolt, but then you have to scroll all the way down past the unreadable debug info bits to find those .set directives — in GCC land usually there's nothing interesting after the debuginfo sections.
I probably already posted this, but since I'm working on a paper I'll say it again: one of the best paper-writing life hacks I've learned in a long time is this makefile target that makes latexmk rerun every time you touch a dependency and then pushes the result to the PDF viewer (which should be configured to re-render when the file changes):
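The target itself isn't attached here; a guess at roughly what it looks like ('paper.tex' is a hypothetical placeholder):

```make
# hypothetical sketch: -pvc makes latexmk watch all dependencies and re-run
# on any change; -view=none since the PDF viewer reloads the file itself
watch:
	latexmk -pdf -pvc -view=none paper.tex
```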
@regehr For me, latexrun is a recent life-changing discovery: the way it presents LaTeX warnings and errors in a nice readable form is unparalleled. It doesn't offer a "background monitor" mode like 'latexmk -pvc' on its own, but one can use inotifywait for that purpose. https://github.com/aclements/latexrun/