@ProjectPhysX@mast.hpc.social

ProjectPhysX


Summa cum laude Physics #PhD 🖖🧐🎓 | Graduate at EliteNet Bavaria 🧬 & DLR 🛰 | Developer of #FluidX3D #CFD 🌊 | Khronos #OpenCL Advisor 💻 | #GPU Wizard at #Intel 🟦

https://github.com/ProjectPhysX/FluidX3D


ProjectPhysX, to random

@tagesschau @DLR @dlr_next @stim3on obligatory evidence photo that I really took these pictures^^ 🖖🧐

ProjectPhysX,

@tagesschau @DLR @dlr_next @stim3on 2 hours later, May 11 00:39-00:54, round 2. The aurora covers the entire sky over Bavaria, Germany. It's even visible to the naked eye when looking south! This is insanity!! 🖖😳
The first image even looks like my phone's camera sensor got hit by radiation particles.

10s exposure at 00:45, streamers dancing rapidly
10s exposure at 00:48
10s exposure at 00:54 looking east, the aurora covered the entire night sky over Bavaria

ProjectPhysX, to random

I found an interesting optimization for the marching-cubes algorithm today: since vertex interpolation happens on axis-aligned edges of the unit cube, it's sufficient to interpolate in 1D instead of 3D. The cheaper interpolation makes the conditions for which edges to interpolate unnecessary, so the edge table can be dropped entirely. That brings the implementation down to 73 lines, including the triangle table. 🖖🤠
https://github.com/ProjectPhysX/FluidX3D/commit/649fd40fa6270fbd0823a53b2a55f4194fc9510b#diff-464b1d19d4b616b9609031b48429081b2c215328d9f98bc5cbeac6b2b84fdbf3R456
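A minimal C++ sketch of the idea, assuming the common Paul Bourke corner and edge numbering (an assumption; FluidX3D's own ordering may differ, see the linked commit). Each of the 12 edges is stored as its lower corner plus the axis it runs along, so the crossing point only needs one scalar added to one coordinate, and all 12 edges can be interpolated unconditionally without consulting an edge table. The names `float3`, `kEdge` and `edge_vertices` are hypothetical.

```cpp
#include <array>
#include <cassert>

struct float3 { float x, y, z; };

// Unit-cube corners in the common Paul Bourke numbering (assumption).
constexpr std::array<float3, 8> kCorner = {{
    {0, 0, 0}, {1, 0, 0}, {1, 1, 0}, {0, 1, 0},
    {0, 0, 1}, {1, 0, 1}, {1, 1, 1}, {0, 1, 1}
}};

// Each edge: the corner with the lower coordinate, the other corner,
// and the axis (0 = x, 1 = y, 2 = z) along which the edge runs.
struct Edge { int lo, hi, axis; };
constexpr std::array<Edge, 12> kEdge = {{
    {0, 1, 0}, {1, 2, 1}, {3, 2, 0}, {0, 3, 1},
    {4, 5, 0}, {5, 6, 1}, {7, 6, 0}, {4, 7, 1},
    {0, 4, 2}, {1, 5, 2}, {2, 6, 2}, {3, 7, 2}
}};

// Iso-surface crossing on all 12 edges, computed unconditionally (no edge table).
// v: scalar values at the 8 corners, iso: iso-level.
std::array<float3, 12> edge_vertices(const std::array<float, 8>& v, float iso) {
    std::array<float3, 12> out{};
    for (int e = 0; e < 12; ++e) {
        const Edge& E = kEdge[e];
        const float d = v[E.hi] - v[E.lo];
        // Edges the surface doesn't cross are never referenced by the triangle
        // table, so their value is irrelevant; the check only avoids 0/0.
        const float t = (d != 0.0f) ? (iso - v[E.lo]) / d : 0.5f;
        float3 p = kCorner[E.lo];  // start at the lower corner...
        // ...and move by t along the single axis the edge runs on (1D lerp).
        if (E.axis == 0) p.x += t; else if (E.axis == 1) p.y += t; else p.z += t;
        out[e] = p;
    }
    return out;
}
```

The triangle table then just indexes into the returned array; a 3D lerp would do the same work on all three coordinates even though two of them never change along an edge.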

ProjectPhysX,

@nickserv that's a bug in the OpenCL runtime of the ARM Mali GPU: fused-multiply-add (fma) is somehow emulated with terrible performance. This is very similar to what @niconiconi found on the Nvidia CMP 170HX, where fma was disabled in the driver.
I've just fixed this in FluidX3D by macro-replacing fma with a*b+c. Performance went up by 8-13x on my Samsung S9+ (ARM Mali-G72 MP18) with this workaround.
https://github.com/ProjectPhysX/FluidX3D/commit/9ce2caecfc85e4fda50fed3350304b75b223b06b
cc @chipsandcheese
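One way such a macro workaround could be wired in, sketched in C++ under assumptions: the OpenCL C kernel source is held in a `std::string`, and the caller decides (for example from the device name) whether the device is affected. The helper name `patch_kernel_source` is hypothetical and not necessarily how the linked commit does it. Note that a*b+c rounds twice while fma rounds once, so results can differ in the last bit; as the follow-up post notes, mad is no faster on this runtime.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: returns the OpenCL C kernel source, with fma() replaced
// by a plain multiply-add via a macro if the device's fma is known to be slow.
std::string patch_kernel_source(const std::string& source, bool slow_fma) {
    if (!slow_fma) return source;
    // Parenthesized arguments keep precedence intact:
    // fma(x+1.0f, y, z) expands to ((x+1.0f)*(y)+(z)).
    return "#define fma(a, b, c) ((a)*(b)+(c))\n" + source;
}
```

Because the macro is prepended to the source, every fma call in the kernels is rewritten by the preprocessor without touching the kernel code itself.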

ProjectPhysX,

@niconiconi @chipsandcheese @nickserv mad is equally slow as fma 🐌

ProjectPhysX, to GraphicsProgramming

FluidX3D update alert! v2.15 speeds up the framerate of interactive graphics by 20-70%. 🖖🥳💻

How? Turns out iterating over 2 million pixels with a single CPU core is... really slow. I did that 3 times more than necessary for every frame rendered on screen! 🖖😆
I've now eliminated a memory copy of the frame (in favor of a pointer swap) and a clear of the frame/z-buffer on the CPU, since that's already done on the GPU.

Release notes: https://github.com/ProjectPhysX/FluidX3D/releases/tag/v2.15
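A minimal C++ sketch of the pointer-swap idea, assuming a hypothetical `FrameBuffers` struct with 32-bit pixels: the renderer writes into `back`, the window reads `front`, and presenting a frame swaps two pointers instead of copying ~2 million pixels (1920x1080) on one CPU core. The z-buffer clear is assumed to happen on the GPU, so no CPU loop is needed for it either.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical double buffer for one window frame (32-bit pixels).
// The renderer fills `back`; the window displays `front`.
struct FrameBuffers {
    std::vector<std::uint32_t> a, b;
    std::uint32_t* front;
    std::uint32_t* back;

    explicit FrameBuffers(std::size_t pixels)
        : a(pixels), b(pixels), front(a.data()), back(b.data()) {}
    // The pointers refer to this object's own vectors, so copying is disabled.
    FrameBuffers(const FrameBuffers&) = delete;
    FrameBuffers& operator=(const FrameBuffers&) = delete;

    // Present a finished frame: O(1) pointer swap instead of copying
    // every pixel from `back` to `front`.
    void present() { std::swap(front, back); }
};
```

The cost of presenting no longer scales with the resolution, which matters most at large window sizes.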

ProjectPhysX, to GraphicsProgramming

One of my papers got selected for the 2022 Best Paper Award of MDPI Computation! 🖖🥳📃🏆

That was a very bold publication for multiple reasons:

  • I solo-authored it
  • I wrote that paper in only 2 weeks
  • the title contains "Esoteric" twice
  • I submitted it on April 1st

It's serious science though: I discovered a simple algorithm to cut the memory demand of the lattice Boltzmann method (LBM) in half, allowing huge simulations on cheap GPUs. This is one of the key innovations in FluidX3D.

https://doi.org/10.3390/computation10060092
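Back-of-the-envelope arithmetic for the halving, assuming the D3Q19 velocity set with 32-bit floats and counting only the density distribution functions (DDFs); other per-cell arrays and any compression are ignored. A conventional two-grid scheme keeps two copies of all 19 DDFs, while an in-place streaming scheme keeps one. The helper names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Bytes per lattice cell used by the density distribution functions (DDFs) only.
// q: DDFs per cell (19 for D3Q19), bytes_per_ddf: 4 for FP32.
constexpr std::uint64_t ddf_bytes_per_cell(std::uint64_t q, std::uint64_t bytes_per_ddf,
                                           bool in_place) {
    // Two-grid streaming reads one copy and writes a second one;
    // in-place streaming works on a single copy.
    return q * bytes_per_ddf * (in_place ? 1u : 2u);
}

// Cells that fit into `vram_bytes` if DDFs were the only per-cell data.
constexpr std::uint64_t max_cells(std::uint64_t vram_bytes, std::uint64_t bytes_per_cell) {
    return vram_bytes / bytes_per_cell;
}

static_assert(ddf_bytes_per_cell(19, 4, false) == 152, "two-grid, D3Q19, FP32");
static_assert(ddf_bytes_per_cell(19, 4, true) == 76, "in-place, D3Q19, FP32");
```

With 24 GiB of VRAM this gives about 339 million cells with a single DDF copy versus about 170 million with two.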

ProjectPhysX, to GraphicsProgramming

How realistic can a simulation be? Here is a 1-billion-cell simulation of an impacting raindrop, fully raytraced in 8K. FluidX3D contains state-of-the-art volume-of-fluid and surface tension models for highly accurate free-surface simulations. Combined with my own rendering engine, results are rendered on the fly at resolutions as large as the remaining VRAM can hold. 🖖😋💧📺
https://youtu.be/MmLNQIW_Sic
FluidX3D is on GitHub: https://github.com/ProjectPhysX/FluidX3D

fclc, to hpc
@fclc@mast.hpc.social

Ehhhh the newest big GPU has arrived!

And you can have two of them connected to Grace!

and for the GPU itself and the +2X GPU version

ProjectPhysX,

@fclc FP4 arithmetic on Blackwell... 🖖🤣
Here are all possible values of the glorious FP4 format:

0111 +Inf
0110 +NaN
0101 +NaN
0100 +NaN
0011 +2.0
0010 +1.0
0001 +0.5
0000 +0
1000 -0
1001 -0.5
1010 -1.0
1011 -2.0
1100 -NaN
1101 -NaN
1110 -NaN
1111 -Inf

badlogic, to random
@badlogic@mastodon.gamedev.place

ML bubble, I need to spend some money and figured a little desktop machine with enough GPU power to train smaller models would be a fun thing to buy.

Suggestions? Full rig specs preferred! GPU wise there aren't many options other than A6000 and 4090 RTX it seems.

ProjectPhysX,

@badlogic for 2 GPUs, better go with a mainboard that supports PCIe bifurcation into two x8 slots connected to the CPU. Some Z690/Z790 mainboards support this, like the Taichi ones. And make sure the 4090s are only 3-slot, as all the 4-slot models will block the second PCIe slot.
For AI stuff, 3090s (non-Ti) will perform about the same but are cheaper and draw 2x 100W less power.

ProjectPhysX, to linux

Software should always "just work". To make compiling easier, I made the compile script smarter: it now automatically detects the operating system (Windows/Linux/macOS), X11 support on Linux, and whether GNU make is installed. 🖖🧐
https://github.com/ProjectPhysX/FluidX3D/commit/f990dfbe3f7a922d1cb6523e8e0b8e6d6cf8c905

azonenberg, to random
@azonenberg@ioc.exchange

So the RTX 2080 Ti in my current office workstation is starting to get a little cramped for me. If I have a large KiCAD design and a ngscopeclient session with a lot of waveforms and filters open simultaneously, I often run out of VRAM.

More compute would be nice as long as it doesn't come with a power budget much higher than my existing 2080 Ti (250W TDP). I plan to stick with NVIDIA since I'm very familiar with their shader debug tools etc.

The only option with more VRAM in the consumer space is the RTX 4090 which I'd like to avoid due to the ludicrous 450W TDP and incompatible power connector.

So that leaves RTX workstation cards.

ProjectPhysX,

@azonenberg Nvidia Ada has crippled VRAM bandwidth on all but the very expensive high-end cards. The cheapest good option is a second-hand 3090 (non-Ti), undervolted to reduce its TDP.

ProjectPhysX, to GraphicsProgramming

v2.13 is out, providing faster export with automatic SI unit conversion and a variety of bug fixes!
Full release notes: https://github.com/ProjectPhysX/FluidX3D/releases/tag/v2.13

ProjectPhysX, to intel

This is wild: FluidX3D can "SLI" together an 🔵 Arc A770 and a 🟢 Titan Xp, pooling 12GB+12GB of their VRAM for one large 450M-cell simulation. Top half on the A770, bottom half on the Titan Xp. They seamlessly communicate over PCIe. Performance is ~1.7x of what either could do on its own. 🖖😋🖥🔥
OpenCL shows its true power here: one implementation works on literally all GPUs at full performance, even at the same time. Happy !
https://youtu.be/PscbxGVs52o
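A minimal C++ sketch of how a grid could be split between GPUs like this, assuming a slab decomposition along z with one ghost (halo) layer per interface and non-periodic boundaries. Each GPU updates its own layers, then the halo layers are exchanged over PCIe after every time step. `SubDomain` and `split_z` are hypothetical names, not FluidX3D's API.

```cpp
#include <cassert>
#include <vector>

// Hypothetical description of one GPU's slab of a grid split along z.
struct SubDomain {
    unsigned z_begin, z_end;          // owned layers [z_begin, z_end)
    unsigned halo_below, halo_above;  // ghost layers copied from neighbouring GPUs
};

// Split nz layers into `gpus` contiguous slabs. Every interior interface
// gets one ghost layer on both sides, exchanged after each time step.
std::vector<SubDomain> split_z(unsigned nz, unsigned gpus) {
    assert(gpus > 0 && nz >= gpus);
    std::vector<SubDomain> d(gpus);
    for (unsigned i = 0; i < gpus; ++i) {
        d[i].z_begin = nz * i / gpus;
        d[i].z_end = nz * (i + 1) / gpus;
        d[i].halo_below = i > 0 ? 1u : 0u;
        d[i].halo_above = i + 1 < gpus ? 1u : 0u;
    }
    return d;
}
```

With two GPUs, each one only has to send and receive a single z-layer per step, which is why the PCIe traffic stays small compared to the per-GPU compute.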

ProjectPhysX, to Nvidia

Another day, another #Nvidia #GPU driver bug that needs a workaround: Nvidia's #OpenCL driver seems to suffer from a 32-bit uint overflow within the cl::CommandQueue::enqueueFillBuffer call! 🖖🤦‍♂️
https://github.com/ProjectPhysX/FluidX3D/commit/82976f15d2bd20b9188ea623cf0bac046c6c81ce
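The arithmetic behind this class of bug, sketched in C++: a byte count that exceeds 2^32 silently wraps when stored in a 32-bit unsigned integer. Where exactly the truncation happens inside Nvidia's driver is not known from the post; the hypothetical helpers `bytes64` and `bytes32` only illustrate the effect, and the actual workaround is in the linked commit.

```cpp
#include <cassert>
#include <cstdint>

// Correct size in bytes of n elements of elem_size bytes each (64-bit math).
constexpr std::uint64_t bytes64(std::uint64_t n, std::uint64_t elem_size) {
    return n * elem_size;
}

// The same size after being stored in a 32-bit uint: the value wraps
// modulo 2^32 (well-defined for unsigned types, so no error is raised).
constexpr std::uint32_t bytes32(std::uint64_t n, std::uint64_t elem_size) {
    return static_cast<std::uint32_t>(n * elem_size);
}

// 2^30 floats = 4 GiB, which wraps to 0 bytes in 32 bits.
static_assert(bytes64(1ull << 30, 4) == 4294967296ull, "4 GiB");
static_assert(bytes32(1ull << 30, 4) == 0u, "wrapped to zero");
```

A fill whose size is truncated this way would touch only a small part of a large buffer, or none of it, without reporting an error.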

ProjectPhysX,

Found and reported another bug in drivers: passing vector types like int3 as kernel parameters is broken. 🖖🙂

Techaltar, to random
@Techaltar@mas.to

The circle to search feature on the S24 series works so unreasonably well. Took this random 50x zoom photo, did a quick circle and right away got an answer.

This feature in particular uses Google for the actual searching, but even the completely self-developed Galaxy AI features worked surprisingly well. More thoughts in the Friday Checkout


ProjectPhysX,

@Techaltar such online AI features are a nightmare for data privacy. Send all your photos straight to Google, hallelujah. And of what use is this phone when I can't even plug in headphones? 🚫🎧

ProjectPhysX,

@Techaltar @champingsajt do you really think it performs the web search only upon request?
It looks more like it does it in the background for any new photo you take, sends all the data to Google, and caches the search URL so that when the request comes it instantly has the search result.

ProjectPhysX, to random

The final part of my thesis has now been accepted and published in Microplastics and Nanoplastics! 🖖🥳📃🎓
I'm proud to have coauthored this study by Lisa Marie Oehlschlägel. We looked at water-air transfer of microplastics during bubble bursting in lab experiments, with surprising results:
https://doi.org/10.1186/s43591-023-00079-x 🌊🫧💥

ProjectPhysX, to linux

v2.11 is out! This update makes the interactive graphics functionality and user interface fully match between Windows and Linux, and brings faster simulation startup and bug fixes. 🖖😎💻
Full release notes: https://github.com/ProjectPhysX/FluidX3D/releases/tag/v2.11

giuseppebilotta, to random
@giuseppebilotta@fediscience.org

OK so I'm ready for today's lesson with the new laptop. My only gripe for the lesson will be that in 23.2 doesn't support information. Apparently the feature was merged at a later commit
https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24101
and I even tried upgrading to my distro's experimental 23.3-rc1 packages, but trying to use rusticl on those packages segfaults. So either I've messed up something with this mixed upgrade, or I've hit an actual bug.

ProjectPhysX,

@giuseppebilotta yes they report dual-CUs instead of CUs for some reason. Estimating TFlops/s of hardware based on reported CUs and clock frequency has required a table of device name fragments before already, cores/CU can be 0.5, 1, 8, 16, 64, 128, 192, 256.
https://github.com/ProjectPhysX/OpenCL-Wrapper/blob/master/src/opencl.hpp#L56
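The estimate itself is simple once cores/CU is known, sketched here in C++: each core is assumed to retire one FP32 fused-multiply-add (2 FLOPs) per clock cycle. `estimate_tflops` is a hypothetical name, and the linked wrapper may compute it differently. A driver that reports dual-CUs halves the CU count, so the cores/CU factor from the name table has to double to give the same result.

```cpp
#include <cassert>

// Estimated peak FP32 throughput in TFLOPs/s, assuming each core retires
// one fused-multiply-add (2 FLOPs) per clock cycle.
// compute_units: CU count reported by the driver,
// cores_per_cu: from a device-name lookup (0.5, 1, 8, 16, 64, 128, ...),
// clock_mhz: clock frequency in MHz.
double estimate_tflops(unsigned compute_units, double cores_per_cu, unsigned clock_mhz) {
    assert(cores_per_cu > 0.0);
    const double cores = compute_units * cores_per_cu;
    return cores * 2.0 * clock_mhz * 1e-6;  // MHz * 1e6 Hz / 1e12 = * 1e-6
}
```

For an RTX 4090 (128 SMs with 128 FP32 cores each, 2520 MHz boost clock) this gives about 82.6 TFLOPs/s.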

giuseppebilotta, to Nvidia
@giuseppebilotta@fediscience.org

This thread
https://mk.absturztau.be/notes/9lain2utf5untfm4
by @niconiconi is both fascinating and frustrating. NVIDIA has a bad habit of doing market segmentation in software (there have been some infamous cases of NVIDIA releasing driver updates that just uncrippled their desktop GPU performance on new releases, bringing their performance on par with the equivalent workstation and server GPUs). I wouldn't be surprised if this were the case for the hardware @niconiconi is experimenting on.

ProjectPhysX,

@giuseppebilotta @niconiconi Nvidia have a long history of artificially crippling their GeForce lineup to not eat into the more profitable Quadro market. Either with drivers, or by motivating certain companies to make their "professional" software run like shit if the GPU name is not on the Quadro list. https://youtu.be/uwCu-b7htV8
I suspect similar marketing reasoning with software-crippling the mining cards. Software locks are a major root cause for e-waste unfortunately.

ProjectPhysX, to github German

has passed 2000 Stars! It is the most popular software on now! 🖖😊⭐️
https://github.com/ProjectPhysX/FluidX3D
Feeling blessed that my work is useful to so many people across the globe, with users in 75 countries already! 🌍
42% EU, 30% Americas, 25% Asia, 3% Oceania+Africa

ProjectPhysX,

The red lightning bolt continues: FluidX3D has passed 3000 stargazers on GitHub, from 82 countries! 🖖🥳⭐
Releasing this software for free really has turned out to be win-win: I've received so much valuable feedback, and answered with as many bug fixes and updates, with many more to come. I am enabling cutting-edge simulations for everyone, with very little hardware resources, on literally every computer that has a GPU, regardless of vendor.
👉 https://github.com/ProjectPhysX/FluidX3D
