Posts - amonakov - kbin.social

amonakov, 24 days ago to random

(prompted by discussion of detecting bitwise and-not earlier in GCC's optimization pipeline)

My ideal compiler IR would not have and/or/xor as distinct bitwise ops, just generic ternlog and probably the corresponding two-operand function ("bilog"?) too.
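To make the idea concrete, here is a minimal C sketch of what such generic ops would compute. It assumes the truth-table convention of x86's VPTERNLOG (table index formed as a<<2 | b<<1 | c); the names `ternlog32` and `bilog32`, the `BILOG_*` constants and the 32-bit width are illustrative choices, not anything taken from GCC.

```c
#include <stdint.h>

/* Generic three-input bitwise op: bit i of the result is
   table[(a_i << 2) | (b_i << 1) | c_i]. */
uint32_t ternlog32(uint32_t a, uint32_t b, uint32_t c, uint8_t table)
{
    uint32_t r = 0;
    for (unsigned idx = 0; idx < 8; idx++) {
        if (!((table >> idx) & 1))
            continue;
        /* Masks of bit positions whose inputs match this table row. */
        uint32_t ma = (idx & 4) ? a : ~a;
        uint32_t mb = (idx & 2) ? b : ~b;
        uint32_t mc = (idx & 1) ? c : ~c;
        r |= ma & mb & mc;
    }
    return r;
}

/* Two-input counterpart ("bilog"): 4-entry table indexed by
   (a_i << 1) | b_i. */
uint32_t bilog32(uint32_t a, uint32_t b, uint8_t table)
{
    uint32_t r = 0;
    for (unsigned idx = 0; idx < 4; idx++) {
        if (!((table >> idx) & 1))
            continue;
        uint32_t ma = (idx & 2) ? a : ~a;
        uint32_t mb = (idx & 1) ? b : ~b;
        r |= ma & mb;
    }
    return r;
}

/* The familiar ops are just particular table values. */
enum {
    BILOG_AND    = 0x8, /* a & b  */
    BILOG_OR     = 0xE, /* a | b  */
    BILOG_XOR    = 0x6, /* a ^ b  */
    BILOG_ANDNOT = 0x4  /* a & ~b */
};
```

Under this encoding and-not is simply table value 0x4, so there is nothing to pattern-match: `bilog32(a, b, BILOG_ANDNOT)` equals `a & ~b`, and `ternlog32(a, b, c, 0x96)` equals `a ^ b ^ c`.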

rygorous, 20 days ago

@amonakov @pervognsen It's just about how early in their development they were committed to having 3 source operands. Same with 2-source-reg-plus-index-reg shuffles. The generic crossbar network is usually already there: sometime past the 8th dedicated unpack/pack/shuffle pattern for different type sizes, it's easier to build the general crossbar and just supply constants for the index vector in the "canned shuffle" cases.

pervognsen, 20 days ago

@rygorous @amonakov What about the increased RF/operand forwarding port pressure from three input operands? Don't GPU cores usually have some additional tricks they can play with RF ports due to their latency-tolerant design? Does this figure into the CPU vs GPU difference at all?

amonakov, 26 days ago to random

PSA for people writing Arm SIMD code in C or C++: unlike x86, where you can cast any pointer to __m128* and be able to dereference it regardless of the dynamic type of the pointed-to memory, that is not the case on Arm: Neon types do not carry the may_alias attribute and standard type compatibility rules apply. Compare the differences between 'f' and 'g' on the first pic, and Arm codegen on the second pic.

image/png
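One way to sidestep the problem is to go through memcpy, which is valid whatever the dynamic type of the memory is. A minimal sketch follows; it uses the GCC/Clang generic vector extension as a stand-in for a Neon type such as float32x4_t so that it compiles on any target, and the names `v4f`, `load_v4f`, `sum4_from_bytes` and `demo_sum` are made up for illustration.

```c
#include <string.h>

/* Stand-in for float32x4_t: a 16-byte vector type declared WITHOUT
   may_alias (GCC/Clang vector extension, assumed here for portability). */
typedef float v4f __attribute__((vector_size(16)));

/* Load four floats from memory of any dynamic type.
   The risky alternative would be *(const v4f *)p. */
static inline v4f load_v4f(const void *p)
{
    v4f v;
    memcpy(&v, p, sizeof v);
    return v;
}

/* The buffer's declared type is unsigned char, not float. */
float sum4_from_bytes(const unsigned char *bytes)
{
    v4f v = load_v4f(bytes);
    return v[0] + v[1] + v[2] + v[3];
}

/* Demo helper: store 1, 2, 3, 4 as raw bytes, then sum them as a vector. */
float demo_sum(void)
{
    float src[4] = { 1.0f, 2.0f, 3.0f, 4.0f };
    unsigned char buf[16];
    memcpy(buf, src, sizeof buf);
    return sum4_from_bytes(buf);
}
```

Compilers usually turn a fixed-size memcpy like this into a single vector load, so the safe form should not cost anything in practice, though that is worth checking in the generated code.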

amonakov, 1 month ago to random

In 'less', you can interactively add command-line arguments without leaving the pager by pressing '-': you can press '-S' to flip wrapping/chopping of long lines, and '-j11' to spawn 10 wor^W^W^W see extra ten lines of context above the match when searching!

amonakov, 1 month ago to random

shower thought re. rgb 565 vs. theoretical 555|1 (common least significant bit for each component): it introduces higher "distortion" for darker colors (i.e. the closest encodable to dark purple { 1, 0, 1 }/64 is dark grey { 1, 1, 1 }/64), but our vision loses color sensitivity in low-light conditions anyway, so that probably would have been a better fit
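The dark purple example can be checked by brute force. The sketch below assumes a hypothetical bit layout for 555|1 (five high bits per component, then one shared least significant bit) and measures "closest" by squared Euclidean distance in 6-bit component space; both choices, and the names `rgb6`, `decode_5551` and `nearest_5551`, are assumptions for illustration.

```c
#include <stdint.h>

/* Hypothetical 555|1 layout: rrrrr ggggg bbbbb s, where s is the least
   significant bit shared by all three 6-bit components. */
typedef struct { unsigned r, g, b; } rgb6;

rgb6 decode_5551(uint16_t v)
{
    unsigned s = v & 1u;
    rgb6 c = {
        ((v >> 11) & 31u) << 1 | s,
        ((v >> 6) & 31u) << 1 | s,
        ((v >> 1) & 31u) << 1 | s,
    };
    return c;
}

/* Try all 65536 codes and return the one whose decoded color is closest
   to (r, g, b), each in 0..63, by squared Euclidean distance.
   Ties go to the lowest code. */
uint16_t nearest_5551(unsigned r, unsigned g, unsigned b)
{
    uint16_t best = 0;
    long best_d = -1;
    for (uint32_t v = 0; v <= 0xFFFFu; v++) {
        rgb6 c = decode_5551((uint16_t)v);
        long dr = (long)c.r - (long)r;
        long dg = (long)c.g - (long)g;
        long db = (long)c.b - (long)b;
        long d = dr * dr + dg * dg + db * db;
        if (best_d < 0 || d < best_d) {
            best_d = d;
            best = (uint16_t)v;
        }
    }
    return best;
}
```

For { 1, 0, 1 } the search lands on { 1, 1, 1 }: its squared distance is 1, while the best candidate with the shared bit cleared, { 0, 0, 0 }, is at squared distance 2.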

pervognsen, 1 month ago (edited 1 month ago)

@amonakov The X68k is still my dream retro computer. One day...

pervognsen, 1 month ago (edited 1 month ago)

@amonakov Also, I was so happy when I saw an X68k easter egg the first time I played one of my now-favorite indie games, ZeroRanger:

amonakov, 1 month ago to random

We can
typedef float vec4 __attribute__((ext_vector_type(4)));

But what if we could

typedef void lane_fn(...);
typedef lane_fn vec_fn __attribute__((ext_vector_size(4)));

oblomov, 1 month ago

@amonakov you mean as a way to vectorize kernels in a GPGPU-like way?

amonakov, 2 months ago to random

With xz backdoor opening an RCE pathway, have you thought "hey, it would be nice if the sshd sub-process doing the key/cert parsing would not be able to fork/exec anything?" Ideally the only thing it should be able to do is read/write to already-open fds and die a peaceful death, right?

Now, this particular backdoor was embedded deep enough that it might be able to work around such privilege separation, but in general dropping privs for risky computations is an important part of defence-in-depth
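The "read/write to already-open fds and die" policy is roughly what Linux's seccomp strict mode enforces. Here is a minimal sketch of the pattern, assuming Linux with glibc; `risky_parse` and `parse_confined` are hypothetical names, and this is not how sshd's privilege separation is actually structured.

```c
#define _GNU_SOURCE
#include <linux/seccomp.h>
#include <sys/prctl.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical stand-in for the risky parsing step: count ':' bytes. */
static unsigned char risky_parse(const unsigned char *buf, ssize_t n)
{
    unsigned char count = 0;
    for (ssize_t i = 0; i < n; i++)
        count += (buf[i] == ':');
    return count;
}

/* Parse up to 256 bytes in a child confined by seccomp strict mode.
   Returns the child's one-byte answer, or -1 on any failure
   (including the kernel refusing to enable strict mode). */
int parse_confined(const void *data, size_t len)
{
    int in[2], out[2];
    if (len > 256 || pipe(in) != 0)
        return -1;
    if (pipe(out) != 0) {
        close(in[0]);
        close(in[1]);
        return -1;
    }
    /* Queue the input before forking; it fits in the pipe buffer. */
    ssize_t sent = write(in[1], data, len);
    close(in[1]);
    pid_t pid = (sent == (ssize_t)len) ? fork() : -1;
    if (pid == 0) {
        /* From here on only read, write, exit and sigreturn are allowed;
           anything else (fork, execve, open, ...) kills the process. */
        if (prctl(PR_SET_SECCOMP, SECCOMP_MODE_STRICT) != 0)
            syscall(SYS_exit, 2); /* refuse to parse unconfined */
        unsigned char buf[256];
        ssize_t n = read(in[0], buf, sizeof buf);
        unsigned char result = risky_parse(buf, n);
        if (write(out[1], &result, 1) != 1)
            syscall(SYS_exit, 1);
        /* glibc's _exit() uses exit_group, which strict mode forbids,
           so issue the plain exit syscall directly. */
        for (;;)
            syscall(SYS_exit, 0);
    }
    close(in[0]);
    close(out[1]);
    if (pid < 0) {
        close(out[0]);
        return -1;
    }
    unsigned char result = 0;
    ssize_t got = read(out[0], &result, 1);
    close(out[0]);
    int status = 0;
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    if (got != 1 || !WIFEXITED(status) || WEXITSTATUS(status) != 0)
        return -1;
    return result;
}
```

For example, `parse_confined("a:b:c", 5)` should return 2 where strict mode is available, and -1 where it is not (for instance inside a container that already runs under a seccomp filter). Strict mode is about as blunt as it gets; a real design would more likely use a seccomp filter or a similar mechanism with a slightly wider allowlist.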

amonakov, 2 months ago

And that reminds me of another scenario where we parse untrusted certificates: WPA2-Enterprise authentication. Venerable wpa_supplicant does have some privilege-separation code (which I believe is rarely enabled on Linux), but what iwd does is completely incomprehensible to me: they pass certs from the access point straight to the kernel keyring subsystem, using the kernel as a fancy SSL library. Any weakness in the involved kernel code is thus open for exploitation by rogue access points.

amonakov, 2 months ago to random

Inside of you there are two reviewers

amonakov, 4 months ago to random

My kingdom for a distro with smooth cross-installation (preparing a rootfs for a foreign architecture without qemu).

lanodan, 4 months ago

@wolf480pl @amonakov cross-compilation without qemu? (note: I'm including binfmt here)
For cross-install without qemu I guess you'd want something like Alpine, in fact pmbootstrap literally is a tool for a cross-install.

amonakov, 4 months ago

@lanodan @wolf480pl good to know, thanks!

amonakov, 8 months ago to random

I still can't figure out the intended use-case for AMD's IBS (instruction-based sampling). You select a period N, and then for every Nth instruction you get info about that particular instruction (which caches it missed in, whether it was a branch, whether it was mispredicted, ...). Which seems... completely unworkable for rare events? If I want to sample on mispredicted branches, and they account for 1% of all instructions, I'll have to discard 99% of IBS data, and my effective sampling rate is 0.01 of nominal?

pervognsen, 8 months ago

@amonakov (Only asking because if it's cycle based you'd expect mispredicts and cache misses to get higher weight with that kind of periodic sampling scheme.)

amonakov, 8 months ago

@pervognsen With back-end "Ops" IBS you have a choice of cycle-based or uop-based period, but the front-end "Fetch" IBS only does instruction-based sampling (but that's the side which sees the L1i and iTLB misses).

amonakov, 9 months ago to random

Fun fact: there are at least five distinct choices for type T (not counting typedefs) such that a C compiler targeting a POSIX system cannot optimize out the call to 'aux' in

void *alloc(T *psize)
{
    size_t sz = *psize;
    void *a = malloc(sz);
    if (sz != *psize)
        aux();
    return a;
}
