amonakov

@amonakov@mastodon.gamedev.place

This profile is from a federated server and may be incomplete. Browse more on the original instance.

amonakov, 15 hours ago to random

PSA for people writing Arm SIMD code in C or C++: unlike x86, where you can cast any pointer to __m128* and be able to dereference it regardless of the dynamic type of the pointed-to memory, that is not the case on Arm: Neon types do not carry the may_alias attribute and standard type compatibility rules apply. Compare the differences between 'f' and 'g' on the first pic, and Arm codegen on the second pic.

image/png

reply

report

activity

copy /kbin url

copy original url

open original url

Loading...

+ pervognsen

pervognsen, 4 days ago to random

Digging through old stuff, here's a fairly extreme example of branch predictor training effects in a benchmark of Robin Hood linear probing vs conventional linear probing. Look at the branch misses for 1 vs 1000 outer loop iterations for RH (and the impact on wall clock time): https://gist.github.com/pervognsen/e818251d52f725db7c67e562577a12f6

reply

expand (5)

collapse (5)

report

activity

copy /kbin url

copy original url

open original url

Loading...

+ oblomov

amonakov, 4 days ago

@pervognsen Is that Rust's built-in perf_event interface in action for {branch,cache} miss counts?

reply

report

activity

copy /kbin url

copy original url

open original url

Loading...

amonakov, 12 days ago to random

In 'less', you can interactively add command-line arguments without leaving the pager by pressing '-': you can press '-S' to flip wrapping/chopping of long lines, and '-j11' to spawn 10 wor^W^W^W see extra ten lines of context above the match when searching!

reply

report

activity

copy /kbin url

copy original url

open original url

Loading...

vegard, 15 days ago to random

Are these all the special sections that run code on program init/exit? Did I miss any?

.section .preinit_array; .quad fun
.section .init, "ax"; callq fun
.section .init_array; .quad fun
.section .ctors, "aw"; .quad fun
.section .dtors, "aw"; .quad fun
.section .fini_array; .quad fun
.section .fini, "ax"; callq fun

reply

expand (5)

collapse (5)

report

activity

copy /kbin url

copy original url

open original url

Loading...

amonakov, 15 days ago

@vegard you absolutely did, because you're not asking the right question. The right question is, what makes code from those sections run at startup?

reply

report

activity

copy /kbin url

copy original url

open original url

Loading...

forrestthewoods, 21 days ago to random

One of the arguments against immediate mode UI is that it will drain your battery. I think that's bollocks!

Here is my "proof of life" test to measure power draw. Preliminary result is that Dear ImGui consumes less power than YouTube. That feels like a fairbar to compare against

I'll of course take some much more rigorous measurements and tests over longer periods of time. This is just the first time I have actual data.

reply

expand (8)

collapse (8)

report

activity

copy /kbin url

copy original url

open original url

Loading...

amonakov, 19 days ago

@wolfpld @forrestthewoods @castano I am well aware of the joke that goes like "to learn how to do <X> on Linux, loudly claim 'Linux cannot do <X>! WTF!' and wait to be corrected", and I am not sure what you were looking for when both perf and turbostat can work with RAPL sensors, but I'm curious how you solved it in the end.

reply

report

activity

copy /kbin url

copy original url

open original url

Loading...

amonakov, 21 days ago to random

shower thought re. rgb 565 vs. theoretical 555|1 (common least significant bit for each component): it introduces higher "distortion" for darker colors (i.e. the closest encodable to dark purple { 1, 0, 1 }/64 is dark grey { 1, 1, 1 }/64), but our vision loses color sensitivity in low-light conditions anyway, so that probably would have been a better fit

reply

expand (3)

collapse (3)

report

activity

copy /kbin url

copy original url

open original url

Loading...

+ oblomov, aeva

amonakov, 10 days ago

Oh, it's not theoretical at all: at least Sharp X68000 and Neo Geo employed it. Thanks, Wikipedia!

reply

report

activity

copy /kbin url

copy original url

open original url

Loading...

regehr, 25 days ago to random

this has to be one of my all-time favorite bug-finding techniques: in your widely deployed software, at very low probability, you put a new heap allocation next to a protected page. performance is unaffected and the bugs that you find are those that actually matter to users in practice.

https://arxiv.org/pdf/2311.09394v2.pdf

reply

expand (5)

collapse (5)

report

activity

copy /kbin url

copy original url

open original url

Loading...

+ c0de517e, pervognsen, slink, penguin42 +3 more

amonakov, 25 days ago

@regehr how do you rate this verbatim quote from the paper:

GWP-ASan is neither GWP nor ASan.

reply

report

activity

copy /kbin url

copy original url

open original url

Loading...

castano, 26 days ago to random

Apparently PVR has a Vulkan layer that enables support gpu timestamps. Has anybody had any success enabling it?

https://developer.imaginationtech.com/downloads/latest-release-notes/#pvrcarbon

reply

expand (7)

collapse (7)

report

activity

copy /kbin url

copy original url

open original url

Loading...

amonakov, 26 days ago

@castano What. Why is it a layer. Is it because it adds overhead to everything.

reply

report

activity

copy /kbin url

copy original url

open original url

Loading...

amonakov, 28 days ago to random

We can
typedef float vec4 attribute((ext_vector_type(4)));

But what if we could

typedef void lane_fn(...);
typedef lane_fn vec_fn attribute((ext_vector_size(4)));

reply

expand (1)

collapse (1)

report

activity

copy /kbin url

copy original url

open original url

Loading...

+ aeva

amonakov, 1 month ago to random

With xz backdoor opening an RCE pathway, have you thought "hey, it would be nice if the sshd sub-process doing the key/cert parsing would not be able to fork/exec anything?" Ideally the only thing it should be able to do is read/write to already-open fds and die a peaceful death, right?

Now, this particular backdoor was embedded deep enough that it might be able to workaround such privilege separation, but in general dropping privs for risky computations is an important part of defence-in-depth

reply

expand (1)

collapse (1)

report

activity

copy /kbin url

copy original url

open original url

Loading...

amonakov, 1 month ago

And that reminds me of another scenario where we parse untrusted certificates: WPA2-Enterprise authentication. Venerable wpa_supplicant does have some privilege-separation code (which I believe is rarely enabled on Linux), but what iwd does is completely incomprehensible to me: they pass certs from the access point straight to the kernel keyring subsystem, using the kernel as a fancy SSL library. Any weakness in the involved kernel code is thus open for exploitation by rogue access points.

reply

report

activity

copy /kbin url

copy original url

open original url

Loading...

+ ljrk

demofox, 1 month ago to random

reply

expand (1)

collapse (1)

report

activity

copy /kbin url

copy original url

open original url

Loading...

+ eniko

amonakov, 1 month ago

@demofox the adversity of the effect will fall more on the waterer than the wateree

reply

report

activity

copy /kbin url

copy original url

open original url

Loading...

amonakov, 1 month ago to random

Inside of you there are two reviewers

reply

report

activity

copy /kbin url

copy original url

open original url

Loading...

+ oblomov, aeva

fclc, 1 month ago to hpc

Ehhhh the newest big GPU has arrived!

And you can have two of them connected to Grace!

#B100 and #GB200 for the GPU itself and the #grace +2X GPU version

#hpc #GTC

reply

expand (14)

collapse (14)

report

activity

copy /kbin url

copy original url

open original url

Loading...

amonakov, 1 month ago

@ProjectPhysX @fclc @fay59
that's not for real, right? it does not follow IEEE conventions (Inf should be next to 2.0, and 2.0 should be 1.5), and surely for fp4 you want a two-bit exponent so you have only one NaN of each sign:
0000 0.0
0001 0.5
0010 1.0
0011 1.5
0100 2.0
0101 3.0
0110 +Inf
0111 +NaN

reply

report

activity

copy /kbin url

copy original url

open original url

Loading...

castano, 2 months ago to random

I don't get why IHVs hide their tools within layers of security and identity validation steps, that then don't work as intended and prevent developers from using those tools. What are they trying to protect?
On the other hand, AMD's approach is so refreshing.

reply

expand (7)

collapse (7)

report

activity

copy /kbin url

copy original url

open original url

Loading...

+ oblomov, ataylor

amonakov, 2 months ago

@castano So much this, but which facet of "AMD's approach" are you praising? I'm mostly observing from the perspective of compute workloads on Linux, where AMD consumers are too often left to their own devices.

reply

report

activity

copy /kbin url

copy original url

open original url

Loading...

demofox, 3 months ago to random

Anyone out there do IIR filters on images? In real time, or at all?

reply

expand (7)

collapse (7)

report

activity

copy /kbin url

copy original url

open original url

Loading...

amonakov, 3 months ago (edited 3 months ago)

@demofox Yes! Young and Van Vliet described a fast approximation for Gaussian filters via IIR. It is used in RawTherapee and uncovered a very peculiar bug in GCC!

(Young – Van Vliet gives separate IIRs for each dimension, so at least one of them is very SIMD-friendly)

reply

report

activity

copy /kbin url

copy original url

open original url

Loading...

amonakov, 3 months ago to random

My kingdom for a distro with smooth cross-installation (preparing a rootfs for a foreign architecture without qemu).

reply

expand (3)

collapse (3)

report

activity

copy /kbin url

copy original url

open original url

Loading...

amonakov, 3 months ago

@lanodan @wolf480pl good to know, thanks!

reply

report

activity

copy /kbin url

copy original url

open original url

Loading...

shafik, 4 months ago to programming

'long long long' is too long for GCC

is such a good ol'diagnostic

#programming

reply

expand (3)

collapse (3)

report

activity

copy /kbin url

copy original url

open original url

Loading...

+ jpm, lanodan

amonakov, 4 months ago

@shafik and yet sometimes a quad-long is okay with g++ if you twist its arm in exactly the right way:

reply

report

activity

copy /kbin url

copy original url

open original url

Loading...

+ lanodan

retr0id, 5 months ago to random

#define SIZEOF_CHAR (sizeof 'A')

reply

expand (7)

collapse (7)

report

activity

copy /kbin url

copy original url

open original url

Loading...

amonakov, 5 months ago

@retr0id hey look what Debian CodeSearch turns up

sizeof(';') in h1_encoder.c
sizeof(' ') in ngx_http_memc_request.c

reply

report

activity

copy /kbin url

copy original url

open original url

Loading...

+ penguin42, raptor

simontatham, 5 months ago to random

I wish 'git push' had a less hardcore mode for -n.

It connects to the server using the push protocol, and stops short of actually uploading anything. So it uses the push URI instead of the pull one, if they're different.

This means it checks that the push URI actually works. Great!

But if the push URI needs awkward authentication (e.g. SSH key on a token) then I'd also like a mode where it uses the pull URI, so I can easily check 'is this command line asking for what I meant to ask for?'

reply

expand (5)

collapse (5)

report

activity

copy /kbin url

copy original url

open original url

Loading...

+ quixoticgeek

amonakov, 5 months ago

@simontatham Here's a possible alternative. At work we have a repo created with 'git clone --mirror' from upstream. When I clone that repo for development, 'git push -n' from my worktree is a local operation, and I do 'git push upstream my-branch:remote-branch' when everything looks ok (after setting up the 'upstream' remote).

(if the upstream updated in the meantime, I have to 'git fetch' in the mirror repo, and 'git pull --rebase' in my tree)

reply

report

activity

copy /kbin url

copy original url

open original url

Loading...

amonakov, 5 months ago

@simontatham there's also 'git worktree add' if your git is not too old.

reply

report

activity

copy /kbin url

copy original url

open original url

Loading...

mjp, 6 months ago to random

My very brief summary of Apple's new "dynamic caching" as described in this video: https://developer.apple.com/videos/play/tech-talks/111375

There's a shared pool of on-chip memory on the shader core that can be dynamically split up to serve as register file, tile cache, shared memory, general buffer L1, or stack. Since it's dynamic even within the lifetime of a thread/wave, the registers can be allocated dynamically as the program needs them rather than needing registers to be statically-allocated up-front.

reply

expand (11)

collapse (11)

report

activity

copy /kbin url

copy original url

open original url

Loading...

+ demofox, bitinn, aras, pervognsen

amonakov, 6 months ago

@aras @pervognsen @mjp for OoO CPU you mainly look at critical path length and throughput limits without regards to specific instruction schedule, which is way easier compared to in-order. It's not like you can add up instruction latencies for a pipelined in-order core to get a good estimate. @rygorous tweeted about this too iirc.

(I don't suppose there's an implied ";)" after your "?")

reply

report

activity

copy /kbin url

copy original url

open original url

Loading...

mbr, 6 months ago to random

Does anybody know what the current usable state-of-the-art method of converting an unsigned integer multiply by constant is? I'd imagine it's not still a variant Bernstein.

reply

expand (5)

collapse (5)

report

activity

copy /kbin url

copy original url

open original url

Loading...

amonakov, 6 months ago

@mbr @fatlimey ”expanding” or ”synthesizing” multiplication via shifts/adds might be clearer :)

reply

report

activity

copy /kbin url

copy original url

open original url

Loading...