An interesting concurrency operation that @tavianator mentioned is "acquire a lock and immediately release it". I've seen this in another context before.
I sort of wonder whether this idiom is really standing in for some other serialization operation, and what the simplest version of it would be.
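One way to read lock-then-immediately-unlock is as a quiescence wait: by the time the unlock returns, any critical section that was in flight when you started has finished, and its writes are visible to you. A minimal sketch of that reading (the names and setup here are mine, not from the original discussion):

```cpp
#include <atomic>
#include <mutex>
#include <thread>

std::mutex mu;
std::atomic<bool> writer_has_lock{false};
int data = 0;  // protected by mu

int demo() {
    std::thread writer([] {
        std::lock_guard<std::mutex> g(mu);
        writer_has_lock.store(true);
        data = 42;  // the critical section we want to wait out
    });
    while (!writer_has_lock.load()) {}  // ensure the writer acquired first

    mu.lock();    // blocks until the writer's critical section is done...
    mu.unlock();  // ...and we never needed the lock for anything else

    int seen = data;  // the mutex hand-off guarantees we see 42
    writer.join();
    return seen;
}
```

Under this reading the idiom really is a serialization operation in its own right (RCU's synchronize-style grace-period wait has a similar flavor), which may be why it keeps showing up.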
If you find yourself acquiring a lock just because the condition variable API requires you to have a lock, and not using it anywhere else, you should probably reconsider what you're doing.
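For contrast, here's the shape where the lock is earning its keep: the mutex passed to the condition variable is the same one that actually guards the shared state the wait predicate reads. A small sketch (names mine):

```cpp
#include <condition_variable>
#include <mutex>
#include <queue>

std::mutex mu;                // guards q -- the same mutex the condvar uses
std::condition_variable cv;
std::queue<int> q;            // the shared state the mutex really protects

void produce(int v) {
    {
        std::lock_guard<std::mutex> g(mu);
        q.push(v);
    }
    cv.notify_one();
}

int consume() {
    std::unique_lock<std::mutex> lk(mu);
    cv.wait(lk, [] { return !q.empty(); });  // predicate reads guarded state
    int v = q.front();
    q.pop();
    return v;
}
```

If the mutex guarded nothing and existed only to satisfy `cv.wait`'s signature, that's the smell: a semaphore or an eventcount is probably the primitive you actually wanted.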
When people advertise software as "written in X", I usually take that as a slightly negative signal (you don't have something more interesting than the language you used?). The main exception is Go, where it means there's a good chance they ship a simple static binary.
I liked @ptrschmdtnlsn's explanation for what's going on with Fibonacci vs. Galois LFSRs: Say you want to compute Fibonacci numbers (or some other linear recurrence). You have an infinite array of mutable cells, F[i], zero-initialized except F[0] = F[1] = 1.
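The quoted explanation cuts off there, but as I read the analogy, the punchline is a gather/scatter duality over that cell array: Fibonacci-style pulls each new cell from the cells behind it, while Galois-style pushes each finalized cell's contribution into the cells ahead of it. A sketch of both filling the same array (function names mine; this is my guess at where the explanation was going):

```cpp
#include <vector>

// "Fibonacci-style": each new cell gathers from the cells behind it.
long fib_gather(int n) {
    std::vector<long> F(n + 1, 0);
    F[0] = F[1] = 1;
    for (int i = 2; i <= n; i++)
        F[i] = F[i - 1] + F[i - 2];  // pull from i-1 and i-2
    return F[n];
}

// "Galois-style": once F[i] is final, scatter it forward.
long fib_scatter(int n) {
    std::vector<long> F(n + 3, 0);   // slack so pushes never run off the end
    F[0] = F[1] = 1;
    for (int i = 0; i <= n; i++) {   // F[i] is final by the time we reach it
        if (i + 1 >= 2) F[i + 1] += F[i];  // push one cell ahead...
        F[i + 2] += F[i];                  // ...and two cells ahead
    }
    return F[n];
}
```

Both produce the same sequence; they differ only in whether the data dependency points backward (gather) or forward (scatter), which is exactly the Fibonacci/Galois LFSR distinction.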
Apparently even asm(""); is treated as a compiler barrier by clang, causing it to spill registers to memory. https://godbolt.org/z/G3c778WqE
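For reference, a minimal shape of the experiment (the functions here are my own repro sketch, not the exact godbolt code): the empty asm statement has no instructions and, without a clobber list, shouldn't need to invalidate anything, yet clang apparently treats it as a barrier anyway. The conventional spelling adds the "memory" clobber explicitly.

```cpp
// An empty asm statement in a loop. Semantically a no-op; the observation
// is that clang still won't keep values cached in registers across it.
int sum_with_barrier(int n) {
    int acc = 0;
    for (int i = 0; i < n; i++) {
        acc += i;
        __asm__ __volatile__("");  // the barrier under discussion
    }
    return acc;  // same result as without it; only the codegen differs
}

// The explicit, conventional compiler barrier most code uses:
static inline void compiler_barrier() {
    __asm__ __volatile__("" ::: "memory");
}
```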
Is there a good overview reference for concurrent memory reclamation: the potential issues (ABA, use-after-free), the approaches people use (GC, epochs, hazard pointers, RCU-style read locks, etc.), and the tradeoffs between them?
In practice, is it reasonable to sometimes do an atomic exchange/store on 16 bits of a 32-bit value, and sometimes do CAS on the whole value? I assume it's not incorrect, but will the store buffer get mad at me and use some awful slow path? What about if the former case is rare?
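To make the question concrete, here's a sketch of the access pattern I mean, written with GCC/Clang __atomic builtins on overlapping union members (names mine). Note that mixed-size atomic access like this isn't something the C++ memory model formally blesses, which is why the question is really about what the hardware's store buffer does with it:

```cpp
#include <cstdint>

// A 32-bit word overlapped with its two 16-bit halves.
union Word {
    uint32_t whole;
    uint16_t half[2];
};

Word w{0};

// The (rare?) narrow operation: atomic store to 16 bits of the word.
void store_low_half(uint16_t v) {
    __atomic_store_n(&w.half[0], v, __ATOMIC_RELEASE);
}

uint16_t load_low_half() {
    return __atomic_load_n(&w.half[0], __ATOMIC_ACQUIRE);
}

// The common wide operation: CAS on the whole 32-bit word.
bool cas_whole(uint32_t expected, uint32_t desired) {
    return __atomic_compare_exchange_n(&w.whole, &expected, desired,
                                       /*weak=*/false,
                                       __ATOMIC_ACQ_REL, __ATOMIC_ACQUIRE);
}
```

The worry is the wide CAS hitting a store-to-load-forwarding miss (or worse) when a narrow store to one half is still sitting in the store buffer; if the narrow stores are rare, presumably the slow path is rare too, but I'd like to know rather than guess.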