ACM, We are sad to hear of the passing of Gordon Bell, a pioneer in high-performance and parallel computing and the visionary behind the ACM Gordon Bell Prize. His dedication to innovation inspired countless breakthroughs. Our deepest condolences to his loved ones.
HPC_Guru, Robert Dennard, the inventor of DRAM, and famous for Dennard scaling, died on April 23 at the age of 91
niconiconi, In the process of debugging a NUMA first-touch problem, I accidentally found my simulation becomes significantly faster when it's running on garbage data without memset() - even on non-NUMA systems... What?! Does the kernel provide a fast-path for uninitialized memory that I've never heard of?
"a read from a never-written anonymous page will get every page copy-on-write mapped to the same physical page of zeros, so you can get TLB misses but L1d cache hits when reading it."
Yes... #hpc
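A minimal sketch of the effect described in the quote, assuming a Linux-like system: it maps private anonymous memory, reads one byte per page before anything is written, and then writes to every page. The helper names and the 64 MiB size are illustrative, not from the thread. On Linux the resident-page count from /proc/self/statm would be expected to stay roughly flat after the reads and grow after the writes, but the sketch only reports the numbers and does not prove physical page sharing.

```python
import mmap

PAGE = mmap.PAGESIZE
SIZE = 64 * 1024 * 1024  # 64 MiB, an arbitrary size chosen for illustration


def resident_pages():
    """Resident set size in pages from /proc/self/statm (Linux only), else None."""
    try:
        with open("/proc/self/statm") as f:
            return int(f.read().split()[1])
    except OSError:
        return None


def read_untouched(size=SIZE):
    """Read, then write, one byte per page of a fresh private anonymous mapping."""
    buf = mmap.mmap(-1, size, flags=mmap.MAP_PRIVATE | mmap.MAP_ANONYMOUS)
    before = resident_pages()
    total = 0
    for off in range(0, size, PAGE):
        total += buf[off]          # read fault on a never-written page
    after_read = resident_pages()
    for off in range(0, size, PAGE):
        buf[off] = 1               # write fault: the page now needs its own frame
    after_write = resident_pages()
    buf.close()
    return total, before, after_read, after_write


if __name__ == "__main__":
    total, before, after_read, after_write = read_untouched()
    print("sum of bytes read before any write:", total)
    print("resident pages (before, after reads, after writes):",
          before, after_read, after_write)
```

Every byte read before the first write is zero, which is why a simulation fed never-written memory is effectively streaming a single cache-resident page.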
BenjaminHCCarr, #AlmaLinux forms a #SIG to advance interests around high performance computing (#HPC) and artificial intelligence (#AI) for this #RHEL-derived #operatingsystem.
The leader of this new AlmaLinux SIG is #HaydenBarnes, the #OpenSource Community Manager for AI at #HPE.
https://www.phoronix.com/news/AlmaLinux-HPC-AI-SIG
civodul, The videos from the March ORAP forum on #HPC & #reproductibilité (reproducibility) are online 👇
http://orap.irisa.fr/52ieme-forum-reproductibilite/
boegel, Miquel Pericàs (Chalmers University of Technology) did a great job during his keynote presentation at the 9th EasyBuild User Meeting in Sweden.
RISC-V is coming, and the European #HPC community is working hard to prepare for it via projects like EUPILOT & co.
linuxaustralia, New to the Linux Australia #jobs board: the lovely folks at National Computational Infrastructure are looking for an #HPC #Linux administrator.
#NCI is the leading national provider of high-end computational and data-intensive services. It forms an integral part of the Australian Government’s #research #infrastructure strategy.
--
For more details, or to apply, please visit the listing on the ANU Jobs portal at:
niconiconi, Preliminary results of the CPU memory bandwidth micro-benchmark: If you're jumping in memory randomly, try doing number-crunching on at least 2-4 KiB of contiguous segments of data before jumping again to amortize the latency penalty down to an acceptable level (throughput is ~70%-80% compared to a sequential pattern). #hpc
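A back-of-the-envelope model of the amortization described above, as a sketch only: each random jump is assumed to cost one fixed latency, after which the contiguous segment streams at full bandwidth. The 90 ns latency and 10 GB/s per-core bandwidth are illustrative assumptions, not numbers from the benchmark, and the model ignores prefetching and overlapping memory requests.

```python
def sequential_fraction(segment_bytes, latency_ns, bandwidth_gb_s):
    """Fraction of sequential throughput kept when every segment costs one jump.

    Assumed model: time per segment = latency + bytes / bandwidth.
    1 GB/s equals 1 byte/ns, so bytes / (GB/s) is already in nanoseconds.
    """
    transfer_ns = segment_bytes / bandwidth_gb_s
    return transfer_ns / (latency_ns + transfer_ns)


if __name__ == "__main__":
    LATENCY_NS = 90.0       # assumed DRAM access latency
    BANDWIDTH_GB_S = 10.0   # assumed single-core streaming bandwidth
    for size in (64, 256, 1024, 2048, 4096, 16384):
        frac = sequential_fraction(size, LATENCY_NS, BANDWIDTH_GB_S)
        print(f"{size:6d} B per jump -> {frac:.0%} of sequential throughput")
```

With these assumed inputs the model gives roughly 69% at 2 KiB and 82% at 4 KiB, in the same range as the measured 70%-80%; different latency or bandwidth assumptions shift the break-even segment size.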
azonenberg, @niconiconi @ignaloidas Going back to my claim from years ago:
DRAM is a block device. Change my mind.
whitequark, @azonenberg @niconiconi @ignaloidas well it's definitely not a character device so what else could it be? 😇
janekdererste, Love to #visualize stuff about our model. Here we have expected computational load for nodes in our simulation network.
I am actually doing something else, but wanted to see how this is distributed in our model.
janekdererste, @asltf @thijs_lucas The simulation program is called MATSim and can in principle compute all kinds of transport modes. The image shown includes car, freight, bicycle, and walking.
Normally we also include public transport, but not here, because this is more about the computer science aspect and less about the simulation study.
We can also model things like Demand Responsive Transport (taxi, Uber).
The original model is available here:
https://github.com/matsim-scenarios/matsim-berlin
prefec2, @janekdererste @asltf @thijs_lucas How cool is that? An open-source traffic simulator. 😁
fclc,
ProjectPhysX, @fclc FP4 arithmetic on Blackwell... 🖖🤣
Here are all possible values of the glorious FP4 format:
0111 +Inf
0110 +NaN
0101 +NaN
0100 +NaN
0011 +2.0
0010 +1.0
0001 +0.5
0000 +0
1000 -0
1001 -0.5
1010 -1.0
1011 -2.0
1100 -NaN
1101 -NaN
1110 -NaN
1111 -Inf
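For comparison with the tongue-in-cheek table above, here is a sketch that decodes a 4-bit code under the E2M1 layout (1 sign bit, 2 exponent bits, 1 mantissa bit, exponent bias 1, no Inf or NaN encodings) as defined for FP4 in the OCP Microscaling formats specification. That this is the layout Blackwell's FP4 uses is an assumption here, and the function name is illustrative.

```python
def decode_e2m1(code):
    """Decode a 4-bit E2M1 code (sign | 2 exponent bits | 1 mantissa bit).

    Assumes exponent bias 1 and no Inf/NaN encodings.
    """
    if not 0 <= code <= 0b1111:
        raise ValueError("code must fit in 4 bits")
    sign = -1.0 if code & 0b1000 else 1.0
    exponent = (code >> 1) & 0b11
    mantissa = code & 0b1
    if exponent == 0:
        magnitude = 0.5 * mantissa                       # subnormal: 0 or 0.5
    else:
        magnitude = 2.0 ** (exponent - 1) * (1 + mantissa / 2)
    return sign * magnitude


if __name__ == "__main__":
    for code in range(16):
        print(f"{code:04b} {decode_e2m1(code):+}")
```

Under these assumptions the eight non-negative codes decode to 0, 0.5, 1, 1.5, 2, 3, 4 and 6, so every bit pattern is a finite number.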
rygorous, @steve @amonakov @ProjectPhysX @fclc @fay59 I will say that, as a former PC GPU shader compiler dev, it was interesting talking to mobile GPU compiler devs in the early 2010s, where GLSL's FP rules boiled down to "don't be evil" on a coffee-stained napkin, whereas the requirements for our FP environment were quite a bit more nailed down, specifically the now-public https://microsoft.github.io/DirectX-Specs/d3d/archive/D3D11_3_FunctionalSpec.htm#3.1%20Floating%20Point%20Rules
HPC_Guru, Which much-hyped programming language had the least adoption?
Not a #HPC question. Any programming language.
HPC_Guru, 64K Kernel Page Size Performance Benefits For #HPC Refreshed
This round includes NVIDIA's GH200, along with the AMD & Intel CPUs
Linux 6.8 kernel performance with a 64K page size improved on average by about 15%
HPC_Guru, @feld My understanding is that to get a larger page size than 64K in Linux, one would have to enable huge pages
Huge pages can use larger memory blocks (e.g., 2MB or 1GB)
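One common explanation for why larger pages help is TLB reach: the amount of memory the TLB can cover is the number of entries times the page size. The arithmetic can be sketched as follows; the entry count of 1536 is a made-up illustrative figure, and real TLBs often have different entry counts per page size.

```python
KIB = 1024
MIB = 1024 * KIB
GIB = 1024 * MIB

PAGE_SIZES = {"4K": 4 * KIB, "64K": 64 * KIB, "2M": 2 * MIB, "1G": GIB}


def tlb_reach(entries, page_bytes):
    """Bytes of memory covered by `entries` TLB entries of one page size."""
    return entries * page_bytes


def pages_needed(working_set_bytes, page_bytes):
    """Number of pages (and so translations) needed to map a working set."""
    return -(-working_set_bytes // page_bytes)  # ceiling division


if __name__ == "__main__":
    ENTRIES = 1536  # hypothetical TLB size, for illustration only
    for name, size in PAGE_SIZES.items():
        reach_mib = tlb_reach(ENTRIES, size) / MIB
        print(f"{name:>3} pages: reach {reach_mib:g} MiB, "
              f"{pages_needed(GIB, size)} pages to map 1 GiB")
```

Under that assumed entry count, moving from 4K to 64K pages raises the reach from 6 MiB to 96 MiB, and 2M huge pages raise it to 3 GiB.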
HPC_Guru, #ISC24 Registration Now Open
Take advantage of early-bird pricing by registering before March 27
HPC_Guru, Intel CEO Pat Gelsinger: I hope to build chips for Lisa Su and AMD
Goal is to be the foundry for the world and that includes competitors
niconiconi, Memory Bandwidth Is All You Need #hpc
ignaloidas, @niconiconi If only...
I've been looking into ways to accelerate my SAT solving stuff, and there just isn't an easy hardware out to take...
rupdecat, Everything set up. Waiting for students.
First round of the #Snakemake for #HPC users course (in preparation).
#OpenScience and #Reproducibility do demand efforts! Teaching is important!
I am so excited. Like it's the first time in a classroom ... 😊
civodul, Attending the #HPC talk by my colleague Philippe Swartvagher, showcasing #Guix (and more!) as a foundation for #ReproducibleResearch workflows. Wo0t!
https://fosdem.org/2024/schedule/event/fosdem-2024-2651-making-reproducible-and-publishable-large-scale-hpc-experiments/
niconiconi, ProTip: Do NOT set OMP_PROC_BIND and OMP_PLACES globally. At least for GOMP, it breaks multiprocessing in many non-OpenMP applications. I wrote them into /etc/environment then I started wondering why Gentoo's code compilation is only able to use 1 CPU core. :woozypad: #hpc
azonenberg, @niconiconi Reminds me that OMP_WAIT_POLICY has to be PASSIVE for ngscopeclient to work properly and I never figured out why. I think it has to do with multiple threads spawning OpenMP tasks at once and getting confused, idk.
I'm gradually moving the project away from OMP and towards application-managed threading or GPU processing anyway so that might end up being the fix.
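One way to follow the OMP_PROC_BIND advice above is to set the variables only for the process that needs them instead of in /etc/environment. A minimal sketch, where the function names are illustrative and the values "close" and "cores" are just one valid choice:

```python
import os
import subprocess
import sys


def openmp_env(base=None, proc_bind="close", places="cores"):
    """Return a copy of the environment with OpenMP binding variables added.

    The caller's own environment is left untouched.
    """
    env = dict(os.environ if base is None else base)
    env["OMP_PROC_BIND"] = proc_bind
    env["OMP_PLACES"] = places
    return env


def run_pinned(cmd):
    """Run one command with the binding variables set for that child only."""
    return subprocess.run(cmd, env=openmp_env(), check=True,
                          capture_output=True, text=True)


if __name__ == "__main__":
    # Stand-in child process; a real OpenMP binary would go here instead.
    child = [sys.executable, "-c",
             "import os; print(os.environ['OMP_PROC_BIND'], os.environ['OMP_PLACES'])"]
    print(run_pinned(child).stdout.strip())
```

Unrelated programs started from the same shell, such as a compiler toolchain, never see the variables this way.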
civodul, 📺 Videos of the Nov. 2023 Workshop on Reproducible Software Environments for Research and High-Performance Computing are on-line!
https://hpc.guix.info/events/2023/workshop/program/
Videos include short interviews with the speakers. Tutorial material is also available from that page.
Many thanks to the speakers and to the video team at Institut Agro!
HPC_Guru, Today, Intel celebrated the opening of Fab 9 in Rio Rancho, New Mexico
The milestone is part of Intel's $3.5B investment to equip its New Mexico operations for the manufacturing of advanced semiconductor packaging technologies including Foveros
https://www.intel.com/content/www/us/en/newsroom/news/intel-opens-fab-9-new-mexico.html
niconiconi, Finally understood how to generalize the diamond tiling algorithm from 1D+1T spacetime to 2D+1T spacetime for stencil code. This mysterious diagram now makes perfect sense. But to the researchers at Keldysh Institute of Applied Mathematics, this algorithm has already been obsolete for 20 years. It was already in use in Russian HPC code of the late 1990s and was known as "ConeTur". In the meantime, they invented 3 generations of newer algorithms, which are even more incomprehensible. Keldysh is at least 10 years ahead of the rest of the world... #hpc
gorplop, @niconiconi oh i get it now, this is pretty neat! thanks :)
niconiconi, @gorplop A related field is polyhedral compilers, which can automatically do loop transformations in this manner and are the mainstream of research today. GCC's Graphite optimizer is an example, and there are many HPC-specific ones. The idea is that they are general-purpose: given an unmodified loop, they can apply extremely complicated patterns beyond human understanding. But the heavy focus on automatic code generation means that if your code doesn't match a pattern the compiler already knows, it likely won't perform very well. On the other hand, this Keldysh research team's algorithms are designed by hand for specific algorithms, all have geometric interpretations, and are meant for human use, an approach that appears to be rarely studied or implemented today by anyone else as they're just too difficult to reason about.
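To make the idea of tiling a stencil in both space and time concrete, here is a sketch of overlapped temporal tiling for a 1D three-point stencil with periodic boundaries. This is deliberately not diamond tiling or any of the Keldysh algorithms: it is a simpler variant that recomputes halo cells redundantly so that each tile can advance several time steps on its own. All names are illustrative.

```python
def step(u):
    """One time step of a 3-point averaging stencil with periodic boundaries."""
    n = len(u)
    return [(u[(i - 1) % n] + u[i] + u[(i + 1) % n]) / 3.0 for i in range(n)]


def naive(u, steps):
    """Reference: sweep the whole array once per time step."""
    for _ in range(steps):
        u = step(u)
    return u


def overlapped_tiled(u, steps, tile):
    """Advance each spatial tile by `steps` time steps independently.

    Each tile is copied together with a halo of `steps` cells per side.
    Every local step shrinks the valid region by one cell per side, so
    exactly the tile itself remains after `steps` steps.
    """
    n = len(u)
    out = [0.0] * n
    for start in range(0, n, tile):
        stop = min(start + tile, n)
        local = [u[i % n] for i in range(start - steps, stop + steps)]
        for _ in range(steps):
            local = [(local[j - 1] + local[j] + local[j + 1]) / 3.0
                     for j in range(1, len(local) - 1)]
        out[start:stop] = local
    return out


if __name__ == "__main__":
    u0 = [float(i * i % 7) for i in range(20)]
    assert overlapped_tiled(u0, 3, 6) == naive(u0, 3)
    print("tiled and naive sweeps agree")
```

The tiled version touches each tile's data once for several time steps, which is the cache-reuse benefit temporal tiling is after; diamond tiling gets the same reuse without the redundant halo work, at the cost of a more intricate iteration order.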
turniphead, Hello!
Having a second attempt at a tech-focused account. The previous one was on a server that was too small, and I missed a lot of interesting stuff.
By day I'm a software developer, mainly working in Go on #hpc and #containers stuff.
Partial to old Sun / SGI stuff, and the Atari ST. Have a love-hate relationship with #Linux and #FOSS
My main for non-tech stuff (homebrewing, music, Cornwall) is @dctrud
hyc, @turniphead @dctrud I wonder how many Atari ST fans are still around
turniphead, @hyc well... I'm not far into my 4th decade, so hopefully plenty of us are still around :-)