An interesting concurrency operation that @tavianator mentioned is "acquire a lock and immediately release it". I've seen this in another context before.
I sort of wonder whether this is trying to be some other serialization operation, or what the simplest version of this is.
If you find yourself acquiring a lock just because the condition variable API requires you to have a lock, and not using it anywhere else, you should probably reconsider what you're doing.
When people advertise software as "written in X", I usually take that as a slightly negative signal (you don't have something more interesting than the language you used?). The main exception is Go, where it means there's a good chance they ship a simple static binary.
@shachaf Software or libraries? For libraries I do care. If they're written in C, they can be consumed from any language with relatively little effort. If they're written in X and I'm also using X, likewise.
I liked @ptrschmdtnlsn's explanation for what's going on with Fibonacci vs. Galois LFSRs: Say you want to compute Fibonacci numbers (or some other linear recurrence). You have an infinite array of mutable cells, F[i], zero-initialized except F[0] = F[1] = 1
Apparently even asm(""); is treated as some sort of compiler barrier by clang, causing it to spill registers to memory. https://godbolt.org/z/G3c778WqE
Is there a good overview reference for concurrent memory reclamation -- potential issues (ABA, use-after-free), and approaches people use (GC, epochs, hazard pointers, RCU-style read locks, etc.) and tradeoffs between them?
@shachaf yeah it's weird. There is power in making a language etc. that makes maximum sense to the hardware and then forcing everyone into that paradigm. There is also power in writing custom functionality for your specific need. Folks tout the efficiency of databases, but you will be hard-pressed to find a game that uses one for data or assets (outside of MMOs), and there's a real reason for that regarding efficiency.
In practice, is it reasonable to sometimes do an atomic exchange/store on 16 bits of a 32-bit value, and sometimes do CAS on the whole value? I assume it's not incorrect, but will the store buffer get mad at me and use some awful slow path? What about if the former case is rare?
@shachaf The trace of a rank 1 operator f = u v^T is Tr(u v^T) = v^T u. Since v^T is the "input part" and u is the "output part" of f this provides another way to think about how the trace connects the output of an operator back to its own input. Here I'm just using v^T as "abstract matrix notation" for a linear functional, so no inner product or basis is assumed.
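Spelling out the one-line computation in coordinates (the post deliberately avoids picking a basis, but the result $v^{\mathsf T} u$ is basis-independent, so any basis gives the same answer):

```latex
\operatorname{Tr}(u v^{\mathsf T})
  = \sum_i (u v^{\mathsf T})_{ii}
  = \sum_i u_i v_i
  = v^{\mathsf T} u
```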
@shachaf You can define F(n) as the number of tilings of a 1xn board (n-tilings) with 1x1 tiles (1-tiles) and 1x2 tiles (2-tiles). Then an (m+n)-tiling comes in two cases, depending on whether a 2-tile bridges the m part and the n part. If there's a 2-tile bridge, then there's an (m-1)-length remainder in the m part and an (n-1)-length remainder in the n part, hence the F(m-1) F(n-1) term. Otherwise it's a clean split and you get the F(m) F(n) term.