High-Performance Computing

ACM,
@ACM@mastodon.acm.org

We are sad to hear of the passing of Gordon Bell, a pioneer in high-performance and parallel computing and the visionary behind the ACM Gordon Bell Prize. His dedication to innovation inspired countless breakthroughs. Our deepest condolences to his loved ones.

HPC_Guru,
@HPC_Guru@mastodon.social

Robert Dennard, the inventor of DRAM, and famous for Dennard scaling, died on April 23 at the age of 91

https://lohud.com/obituaries/pnys0809210

niconiconi,

In the process of debugging a NUMA first-touch problem, I accidentally found my simulation becomes significantly faster when it's running on garbage data without memset() - even on non-NUMA systems... What?! Does the kernel provide a fast-path for uninitialized memory that I've never heard of?

"a read from a never-written anonymous page will get every page copy-on-write mapped to the same physical page of zeros, so you can get TLB misses but L1d cache hits when reading it."

Yes...
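
The quoted behaviour can be poked at from user space. Below is a minimal sketch, assuming Linux and Python's standard `mmap` module; the helper names are hypothetical. It only shows that untouched private anonymous pages read back as zeros and that a write is what forces the kernel to hand out a real page. It does not measure the cache effect itself.

```python
import mmap

PAGE = mmap.PAGESIZE


def map_private_anonymous(length):
    """Map private anonymous memory without touching it (Unix-only flags)."""
    return mmap.mmap(-1, length, flags=mmap.MAP_PRIVATE | mmap.MAP_ANONYMOUS)


def read_one_byte_per_page(buf):
    """Read-fault every page; never-written pages read back as zero."""
    total = 0
    for offset in range(0, len(buf), PAGE):
        total += buf[offset]
    return total


def write_one_byte_per_page(buf, value=1):
    """Write-fault every page, which forces a private copy of each one."""
    for offset in range(0, len(buf), PAGE):
        buf[offset] = value


if __name__ == "__main__":
    buf = map_private_anonymous(64 * PAGE)
    print("sum before any write:", read_one_byte_per_page(buf))
    write_one_byte_per_page(buf)
    print("sum after writing:", read_one_byte_per_page(buf))
```

Timing the read loop before and after the write pass on a much larger mapping is one way to see the difference the post describes, though the numbers will depend on the machine.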

BenjaminHCCarr,
@BenjaminHCCarr@hachyderm.io

#AlmaLinux Forms An #SIG to advance interests around high performance computing (#HPC) and artificial intelligence (#AI) for this #RHEL-derived #operatingsystem.
The leader of this new AlmaLinux SIG is #HaydenBarnes, the #OpenSource Community Manager for AI at #HPE.
https://www.phoronix.com/news/AlmaLinux-HPC-AI-SIG

civodul,
@civodul@toot.aquilenet.fr

The videos from the ORAP forum on & in March are online 👇
http://orap.irisa.fr/52ieme-forum-reproductibilite/

hpcnotes,
@hpcnotes@mast.hpc.social

Experiment comparing reactions across social media platforms ...

One or more of or or will be obsolete by 2030.

Ignoring this message counts as agreeing :-)

fclc,
@fclc@mast.hpc.social

@abarker @hpcnotes Depending on who you ask, C is already obsolete, and barely hanging on.

(CC: @thephd )

thephd,
@thephd@pony.social

@fclc @abarker @hpcnotes Can't really say when C will be obsolete. I'm certainly preventing it as best as I can, but...

boegel,
@boegel@mast.hpc.social

Miquel Pericàs (Chalmers University of Technology) did a great job during his keynote presentation at the 9th EasyBuild User Meeting in Sweden.

RISC-V is coming, and the European community is working hard to prepare for it via projects like EUPILOT & co.

https://easybuild.io/eum24/#program

linuxaustralia,
@linuxaustralia@fosstodon.org

New to the Linux Australia board: the lovely folks at National Computational Infrastructure are looking for a administrator.


NCI is the leading national provider of high-end computational and data-intensive services. It forms an integral part of the Australian Government’s strategy.

--
For more details, or to apply, please visit the listing on the ANU Jobs portal at:

🔗 http://jobs.anu.edu.au/cw/en/job/555156

niconiconi,

Preliminary results of the CPU memory bandwidth micro-benchmark: If you're jumping in memory randomly, try doing number-crunching on at least 2-4 KiB of contiguous segments of data before jumping again to amortize the latency penalty down to an acceptable level (throughput is ~70%-80% compared to a sequential pattern).
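
A minimal sketch of the access pattern described above, assuming 8-byte values so that 512 elements make up one 4 KiB segment. The function names and the `seed` parameter are hypothetical, and pure Python will not reproduce the bandwidth figures. The point is only the loop structure: one random jump, then a contiguous run of work.

```python
import random
from array import array

ITEMS_PER_4KIB = 4096 // 8  # 512 doubles per 4 KiB segment


def blocked_random_order(n_items, block_items, seed=0):
    """Yield (start, stop) ranges covering [0, n_items) in random block order."""
    starts = list(range(0, n_items, block_items))
    random.Random(seed).shuffle(starts)
    for start in starts:
        yield start, min(start + block_items, n_items)


def blocked_random_sum(data, block_items=ITEMS_PER_4KIB, seed=0):
    """Sum data by jumping to a random block, then walking it contiguously."""
    total = 0.0
    for start, stop in blocked_random_order(len(data), block_items, seed):
        for i in range(start, stop):  # contiguous inner loop amortizes the jump
            total += data[i]
    return total


if __name__ == "__main__":
    data = array("d", range(10_000))
    print(blocked_random_sum(data))
```

In a compiled language the same structure lets the hardware prefetcher stream each segment after paying the latency of the jump once.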

azonenberg,
@azonenberg@ioc.exchange

@niconiconi @ignaloidas Going back to my claim from years ago:

DRAM is a block device. Change my mind.

whitequark,
@whitequark@mastodon.social

@azonenberg @niconiconi @ignaloidas well it's definitely not a character device so what else could it be? 😇

janekdererste,
@janekdererste@det.social

Love to stuff about our model. Here we have expected computational load for nodes in our simulation network.

I am actually doing something else, but wanted to see how this is distributed in our model.

janekdererste,
@janekdererste@det.social

@asltf @thijs_lucas The simulation program is called MATSim and can, in principle, model all kinds of transport modes. The image shown includes car, freight, bicycle, and walking.

Normally we also include public transport, but not here, because this is more about the computer science aspect than the simulation study.

We can also model things like Demand Responsive Transport (taxi, Uber).

The original model is available here:
https://github.com/matsim-scenarios/matsim-berlin

prefec2,
@prefec2@norden.social

@janekdererste @asltf @thijs_lucas How cool is that? An open-source traffic simulator. 😁

fclc,
@fclc@mast.hpc.social

Ehhhh the newest big GPU has arrived!

And you can have two of them connected to Grace!

and for the GPU itself and the +2X GPU version

ProjectPhysX,
@ProjectPhysX@mast.hpc.social

@fclc FP4 arithmetic on Blackwell... 🖖🤣
Here are all the possible values of the glorious FP4 format:

0111 +Inf
0110 +NaN
0101 +NaN
0100 +NaN
0011 +2.0
0010 +1.0
0001 +0.5
0000 +0
1000 -0
1001 -0.5
1010 -1.0
1011 -2.0
1100 -NaN
1101 -NaN
1110 -NaN
1111 -Inf

rygorous,
@rygorous@mastodon.gamedev.place

@steve @amonakov @ProjectPhysX @fclc @fay59 I will say that, as a former PC GPU shader compiler dev, it was interesting talking to mobile GPU compiler devs in the early 2010s, when GLSL's FP rules boiled down to "don't be evil" on a coffee-stained napkin. The requirements for our FP environment were quite a bit more nailed down, specifically in the now-public https://microsoft.github.io/DirectX-Specs/d3d/archive/D3D11_3_FunctionalSpec.htm#3.1%20Floating%20Point%20Rules

HPC_Guru,
@HPC_Guru@mastodon.social

Which much-hyped programming language had the least adoption?

Not a question. Any programming language.

hyc,
@hyc@mastodon.social

@HPC_Guru Prolog?

HPC_Guru,
@HPC_Guru@mastodon.social

64K Kernel Page Size Performance Benefits For Refreshed

This round includes NVIDIA's GH200, along with the AMD & Intel CPUs

Linux 6.8 kernel performance with a 64K page size improved on average by about 15%

https://www.phoronix.com/review/aarch64-64k-kernel-perf

HPC_Guru,
@HPC_Guru@mastodon.social

@feld My understanding is that to get a larger page size than 64K in Linux, one would have to enable huge pages

Huge pages can use larger memory blocks (e.g., 2MB or 1GB)

https://blog.netdata.cloud/understanding-huge-pages/
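
One way to see why larger pages matter is to count how many pages, and therefore how many address translations, a given working set spans. This is a small arithmetic sketch; the function names are hypothetical, and the 4 KiB entry is included on the assumption that it is the usual default base page size.

```python
KIB = 1024
MIB = 1024 * KIB
GIB = 1024 * MIB

PAGE_SIZES = {
    "4 KiB": 4 * KIB,    # assumed default base page size
    "64 KiB": 64 * KIB,  # the larger kernel page size discussed above
    "2 MiB": 2 * MIB,    # huge page
    "1 GiB": 1 * GIB,    # huge page
}


def pages_needed(working_set_bytes, page_bytes):
    """Number of pages required to cover the working set (ceiling division)."""
    return -(-working_set_bytes // page_bytes)


def pages_by_size(working_set_bytes):
    """Map each page size label to the page count for this working set."""
    return {
        label: pages_needed(working_set_bytes, size)
        for label, size in PAGE_SIZES.items()
    }


if __name__ == "__main__":
    for label, count in pages_by_size(1 * GIB).items():
        print(f"{label:>6}: {count} pages")
```

For a 1 GiB working set this gives 262144 pages at 4 KiB, 16384 at 64 KiB, 512 at 2 MiB, and a single page at 1 GiB, which is why TLB pressure drops so sharply as the page size grows.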

HPC_Guru,
@HPC_Guru@mastodon.social

Registration Now Open

Take advantage of early-bird pricing by registering before March 27

https://isc-hpc.com/registration-2024.html

HPC_Guru,
@HPC_Guru@mastodon.social

Intel CEO Pat Gelsinger: I hope to build chips for Lisa Su and AMD

Goal is to be the foundry for the world and that includes competitors

https://tomshardware.com/pc-components/cpus/intel-ceo-pat-gelsinger-i-hope-to-build-chips-for-lisa-su-and-amd

Methylzero,


If you had to do a lot of linear least square solves, with potentially rank-deficient matrices, what would you use on a GPU? On CPUs, LAPACK's DGELSY does work, but most GPU libraries seem to not implement routines for rank-deficient matrices.

niconiconi,

Memory Bandwidth Is All You Need

ignaloidas,
@ignaloidas@not.acu.lt

@niconiconi If only...

I've been looking into ways to accelerate my SAT solving stuff, and there just isn't an easy hardware out to take...

rupdecat,
@rupdecat@fediscience.org

Everything set up. Waiting for students.

First round of the for users course (in preparation).

and do demand efforts! Teaching is important!

I am so excited. Like it's the first time in a classroom ... 😊

civodul,
@civodul@toot.aquilenet.fr

Attending the talk by my colleague Philippe Swartvagher, showcasing (and more!) as a foundation for workflows. Wo0t!
https://fosdem.org/2024/schedule/event/fosdem-2024-2651-making-reproducible-and-publishable-large-scale-hpc-experiments/

fclc,
@fclc@mast.hpc.social

This is an @dougall appreciation post, his SVE instruction visualizer is great https://dougallj.github.io/asil/

niconiconi,

ProTip: Do NOT set OMP_PROC_BIND and OMP_PLACES globally. At least for GOMP, it breaks multiprocessing in many non-OpenMP applications. I wrote them into /etc/environment and then started wondering why Gentoo's code compilation was only able to use 1 CPU core.
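
One way to follow this advice is to set the variables only for the process that needs them. The sketch below assumes a Python launcher script; `run_with_openmp_binding` is a hypothetical helper, and `close` and `cores` are just example values for the two variables.

```python
import os
import subprocess
import sys


def run_with_openmp_binding(cmd, proc_bind="close", places="cores"):
    """Run cmd with OMP_PROC_BIND and OMP_PLACES set for that child only."""
    env = dict(os.environ)  # copy, so the parent environment stays untouched
    env["OMP_PROC_BIND"] = proc_bind
    env["OMP_PLACES"] = places
    return subprocess.run(cmd, env=env, check=True, capture_output=True, text=True)


if __name__ == "__main__":
    # Stand-in for a real OpenMP binary: print what the child process sees.
    probe = [
        sys.executable,
        "-c",
        "import os; print(os.environ['OMP_PROC_BIND'], os.environ['OMP_PLACES'])",
    ]
    print(run_with_openmp_binding(probe).stdout.strip())
```

Because only the copied environment is modified, other programs started from the same shell or session are unaffected.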

azonenberg,
@azonenberg@ioc.exchange

@niconiconi Reminds me that OMP_WAIT_POLICY has to be PASSIVE for ngscopeclient to work properly and I never figured out why. I think it has to do with multiple threads spawning OpenMP tasks at once and getting confused, idk.

I'm gradually moving the project away from OMP and towards application-managed threading or GPU processing anyway so that might end up being the fix.

civodul,
@civodul@toot.aquilenet.fr

📺 Videos of the Nov. 2023 Workshop on Reproducible Software Environments for Research and High-Performance Computing are on-line!
https://hpc.guix.info/events/2023/workshop/program/

Videos include short interviews with the speakers. Tutorial material is also available from that page.

Many thanks to the speakers and to the video team at Institut Agro!

HPC_Guru,
@HPC_Guru@mastodon.social

Today, Intel celebrated the opening of Fab 9 in Rio Rancho, New Mexico

The milestone is part of Intel's $3.5B investment to equip its New Mexico operations for the manufacturing of advanced semiconductor packaging technologies including Foveros

https://www.intel.com/content/www/us/en/newsroom/news/intel-opens-fab-9-new-mexico.html

niconiconi,

Finally understood how to generalize the diamond tiling algorithm from 1D+1T spacetime to 2D+1T spacetime for stencil code. This mysterious diagram now makes perfect sense. But to the researchers at Keldysh Institute of Applied Mathematics, this algorithm has already been obsolete for 20 years. It was already in use in Russian HPC code of the late 1990s and was known as "ConeTur". In the meantime, they invented 3 generations of newer algorithms, which are even more incomprehensible. Keldysh is at least 10 years ahead of the rest of the world...
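
For readers who have not met the 1D+1T case, here is a minimal sketch of diamond tiling for a 3-point Jacobi stencil with fixed boundaries. It is a textbook-style illustration, not the ConeTur code: the function names and tile size are hypothetical, and the whole space-time grid is stored for clarity, whereas a real implementation would keep only a couple of time levels. Tiles are rectangles in the rotated coordinates a = t + i and b = t - i, so they appear as diamonds in (t, i) space, and every dependence points to a smaller or equal a and b.

```python
def jacobi_reference(u0, steps):
    """Plain time stepping: sweep the whole array once per time step."""
    prev = list(u0)
    for _ in range(steps):
        cur = list(prev)
        for i in range(1, len(prev) - 1):
            cur[i] = (prev[i - 1] + prev[i] + prev[i + 1]) / 3.0
        prev = cur
    return prev


def jacobi_diamond(u0, steps, tile=8):
    """Same stencil, visited tile by tile in rotated space-time coordinates."""
    n = len(u0)
    # u[t][i]; every row starts as a copy of u0, which keeps boundaries fixed.
    u = [list(u0) for _ in range(steps + 1)]
    a_lo, a_hi = 2, steps + n - 2  # range of a = t + i over interior points
    b_lo, b_hi = 3 - n, steps - 1  # range of b = t - i over interior points
    for a0 in range(a_lo, a_hi + 1, tile):
        for b0 in range(b_lo, b_hi + 1, tile):
            # One diamond-shaped tile: finish it before moving to the next.
            for a in range(a0, min(a0 + tile, a_hi + 1)):
                for b in range(b0, min(b0 + tile, b_hi + 1)):
                    if (a + b) % 2:
                        continue  # not a grid point
                    t, i = (a + b) // 2, (a - b) // 2
                    if 1 <= t <= steps and 1 <= i <= n - 2:
                        u[t][i] = (u[t - 1][i - 1] + u[t - 1][i] + u[t - 1][i + 1]) / 3.0
    return u[steps]


if __name__ == "__main__":
    u0 = [float(i % 7) for i in range(40)]
    print(jacobi_diamond(u0, 25, tile=6) == jacobi_reference(u0, 25))
```

Each tile advances a small patch of space through several time steps before the next patch is touched, which is where the cache reuse comes from.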

gorplop,
@gorplop@pleroma.m68k.church

@niconiconi oh i get it now, this is pretty neat! thanks :)

niconiconi,

@gorplop A related field is polyhedral compilers, which can automatically do loop transformations in this manner and are the mainstream of research today. GCC's Graphite optimizer is an example, and there are many HPC-specific ones. The idea is that they are general-purpose: given an unmodified loop, they can apply extremely complicated patterns beyond human understanding. But the heavy focus on automatic code generation means that if your code doesn't match a pattern the compiler already knows, it likely won't perform very well. On the other hand, this Keldysh research team's algorithms are designed by hand for specific algorithms, all have geometric interpretations, and are meant for human use - an approach that appears to be rarely studied or implemented today by anyone else, as the algorithms are just too difficult to reason about.

turniphead,

Hello!

Having a second attempt at a tech focused account. Previous was at a server that was too small, and missed a lot of interesting stuff.

By day I'm a software developer, mainly working in Go on and stuff.

Partial to old Sun / SGI stuff, and the Atari ST. Have a love-hate relationship with and

My main for non-tech stuff (homebrewing, music, Cornwall) is @dctrud

hyc,
@hyc@mastodon.social

@turniphead @dctrud I wonder how many Atari ST fans are still around

turniphead,

@hyc well... I'm not far into my 4th decade, so hopefully plenty of us are still around :-)
