shachaf, EN In practice, is it reasonable to sometimes do an atomic exchange/store on 16 bits of a 32-bit value, and sometimes do CAS on the whole value? I assume it's not incorrect, but will the store buffer get mad at me and use some awful slow path? What about if the former case is rare?
Add comment