In the process of debugging a NUMA first-touch problem, I accidentally found that my simulation becomes significantly faster when it runs on garbage data without memset() - even on non-NUMA systems... What?! Does the kernel provide a fast path for uninitialized memory that I've never heard of?
"a read from a never-written anonymous page will get every page copy-on-write mapped to the same physical page of zeros, so you can get TLB misses but L1d cache hits when reading it."
ICYMI: We're excited to announce the AlmaLinux community's newest Special Interest Group: The High Performance Computing and Artificial Intelligence SIG! 🎉
CERN trusts AlmaLinux as the base for offering access to non-CERN sites, as well as for the virtual machine and container images that will be distributed outside of CERN.
New to the Linux Australia #jobs board: the lovely folks at National Computational Infrastructure are looking for an #HPC #Linux administrator.
#NCI is the leading national provider of high-end computational and data-intensive services. It forms an integral part of the Australian Government's #research #infrastructure strategy.
--
For more details, or to apply, please visit the listing on the ANU Jobs portal at:
#TIL C++26 is planning to add a full BLAS-style linear algebra library (std::linalg) to the standard, built on top of std::mdspan (the standard multi-dimensional array view since C++23). C++ surely is a programming language that people throw everything imaginable into. #hpc
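For the curious, a minimal std::mdspan sketch - C++23, so it needs a recent standard library (e.g. libstdc++ 14 with -std=c++23); the std::linalg BLAS interface itself is C++26 (P1673) and not widely shipped yet, so it only appears in a comment:

```c++
// Sketch: viewing a flat buffer as a 2-D matrix with std::mdspan.
#include <cstdio>
#include <mdspan>
#include <vector>

int main() {
    std::vector<double> storage(3 * 4, 0.0);
    // CTAD deduces a row-major 3x4 view with dynamic extents.
    std::mdspan mat(storage.data(), 3, 4);
    for (std::size_t i = 0; i < mat.extent(0); ++i)
        for (std::size_t j = 0; j < mat.extent(1); ++j)
            mat[i, j] = static_cast<double>(i) * 10 + j;  // C++23 multi-arg operator[]
    std::printf("mat[2, 3] = %g\n", mat[2, 3]);
    // C++26's std::linalg is meant to consume such views, e.g.
    // std::linalg::matrix_vector_product(mat, x, y);  // not widely available yet
}
```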
Preliminary results from the CPU memory bandwidth micro-benchmark: if you're jumping around memory randomly, try number-crunching at least 2-4 KiB of contiguous data before jumping again, to amortize the latency penalty down to an acceptable level (throughput is ~70-80% of a sequential pattern). #hpc
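If I read the post right, the pattern is: jump to a random offset, then stream through a contiguous block before the next jump. A hedged sketch of that access pattern (the sizes and the timing harness are my assumptions, not the author's actual benchmark):

```c++
// Sketch: random jumps amortized over contiguous 4 KiB segments.
#include <chrono>
#include <cstdio>
#include <random>
#include <vector>

int main() {
    constexpr std::size_t N   = std::size_t{1} << 24;    // 16 Mi doubles, 128 MiB
    constexpr std::size_t SEG = 4096 / sizeof(double);   // 4 KiB contiguous block
    std::vector<double> data(N, 1.0);

    std::mt19937_64 rng(42);
    std::uniform_int_distribution<std::size_t> pick(0, N / SEG - 1);

    double sum = 0.0;
    auto t0 = std::chrono::steady_clock::now();
    for (std::size_t jump = 0; jump < N / SEG; ++jump) {
        const std::size_t base = pick(rng) * SEG;  // random jump...
        for (std::size_t i = 0; i < SEG; ++i)      // ...then crunch contiguously
            sum += data[base + i];
    }
    auto t1 = std::chrono::steady_clock::now();
    const double secs = std::chrono::duration<double>(t1 - t0).count();
    std::printf("%.2f GiB/s (checksum %g)\n",
                N * sizeof(double) / secs / (1u << 30), sum);
}
```

Shrinking SEG toward one cache line should reproduce the latency-bound case; growing it should converge on sequential throughput.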
It might just be that I'm more proficient at analyzing and working around #GPU quirks (happens when you do mostly #GPGPU for more than a decade) than #CPU ones, but there are so many weird things happening on this machine that I don't know where to start.
Just to mention one: why is it that the performance per core when using #OpenMP drops by 40% when switching from 1 to 2 threads, but only when using OMP_PROC_BIND=close and not when using OMP_PROC_BIND=spread? If anything I'd expect the reverse.
And then adding more threads gives me almost perfect scaling, at least up to 16 threads, before dropping again … WTH is happening here? Honestly wouldn't mind some #fediHelp with suggestions on what to look at/for … #HPC #askFedi do your magic!
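One hedged suggestion: with OMP_PROC_BIND=close, threads 0 and 1 can land on the two hardware threads (SMT siblings) of the same physical core and share its execution resources, which would look exactly like a ~40% per-core drop at 2 threads; spread places them on separate cores. A minimal probe (file name and build line are just an assumption) to see where threads actually land:

```c++
// Sketch: print each OpenMP thread's place and CPU to check binding.
// Build/run: g++ -fopenmp probe.cpp -o probe
//            OMP_PROC_BIND=close OMP_DISPLAY_ENV=true ./probe
#include <cstdio>
#include <omp.h>
#include <sched.h>   // sched_getcpu(), Linux-specific

int main() {
    #pragma omp parallel
    {
        #pragma omp critical
        std::printf("thread %d of %d: place %d, cpu %d\n",
                    omp_get_thread_num(), omp_get_num_threads(),
                    omp_get_place_num(), sched_getcpu());
    }
}
```

If the two CPUs reported at OMP_NUM_THREADS=2 are SMT siblings of one core under close binding, that would explain the asymmetry.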
Your periodic reminder that Stellarium is an amazing planetarium tool. Free, #OpenSource and very easy to use. There are desktop apps, mobile apps and a web version.
@astro_jcm I wonder how much digital astronomy is reproducible nowadays? Bioinformatics benefits from implementing pipelines on #HPC with #Guix. #Stellarium and #INDI are available as reproducible Guix packages.