niconiconi: While debugging a NUMA first-touch problem, I accidentally found that my simulation runs significantly faster on garbage data, without memset(), even on non-NUMA systems... What?! Does the kernel provide a fast path for uninitialized memory that I've never heard of?
"a read from a never-written anonymous page will get every page copy-on-write mapped to the same physical page of zeros, so you can get TLB misses but L1d cache hits when reading it."
Yes... #hpc