Posts

amonakov, to random
@amonakov@mastodon.gamedev.place

(prompted by discussion of detecting bitwise and-not earlier in GCC's optimization pipeline)

My ideal compiler IR would not have and/or/xor as distinct bitwise ops, just generic ternlog and probably the corresponding two-operand function ("bilog"?) too.
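A minimal sketch of what such generic ops could look like, written as plain C rather than as IR. The function names and the truth-table bit order are assumptions made for illustration (input `a` is taken as the most significant index bit, which is the same convention x86's vpternlog immediate uses); "bilog" is just the two-operand name floated in the post.

```c
#include <stdint.h>

/* Generic three-input bitwise function (illustrative sketch).
   For each bit position, the bits of a, b, c form a 3-bit index
   (a is the most significant bit); the result bit is bit `index`
   of the 8-bit truth table `imm`. */
static uint32_t ternlog(uint32_t a, uint32_t b, uint32_t c, uint8_t imm)
{
    uint32_t r = 0;
    for (unsigned idx = 0; idx < 8; idx++) {
        if (!((imm >> idx) & 1u))
            continue;
        /* Minterm for this truth-table row. */
        uint32_t ma = (idx & 4u) ? a : ~a;
        uint32_t mb = (idx & 2u) ? b : ~b;
        uint32_t mc = (idx & 1u) ? c : ~c;
        r |= ma & mb & mc;
    }
    return r;
}

/* Two-input counterpart with a 4-bit truth table
   (index = a:b, a is the most significant bit). */
static uint32_t bilog(uint32_t a, uint32_t b, uint8_t imm)
{
    uint32_t r = 0;
    for (unsigned idx = 0; idx < 4; idx++) {
        if (!((imm >> idx) & 1u))
            continue;
        r |= ((idx & 2u) ? a : ~a) & ((idx & 1u) ? b : ~b);
    }
    return r;
}
```

With this encoding, and/or/xor are bilog tables 0x8, 0xE and 0x6, and and-not (a & ~b) is 0x4, so it needs no special detection; three-way xor is ternlog table 0x96 and bitwise select (a ? b : c) is 0xCA.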

rygorous,
@rygorous@mastodon.gamedev.place

@amonakov @pervognsen It's just about how early in their development they were committed to having 3 source operands. Same with 2-source-reg-plus-index-reg shuffles. The generic crossbar network is usually already there: sometime past the 8th dedicated unpack/pack/shuffle pattern for different type sizes, it's easier to build the general crossbar and just supply constants for the index vector in the "canned shuffle" cases.

pervognsen,
@pervognsen@mastodon.social

@rygorous @amonakov What about the increased RF/operand forwarding port pressure from three input operands? Don't GPU cores usually have some additional tricks they can play with RF ports due to their latency-tolerant design? Does this figure into the CPU vs GPU difference at all?

amonakov, to random
@amonakov@mastodon.gamedev.place

PSA for people writing Arm SIMD code in C or C++: unlike x86, where you can cast any pointer to __m128* and be able to dereference it regardless of the dynamic type of the pointed-to memory, that is not the case on Arm: Neon types do not carry the may_alias attribute and standard type compatibility rules apply. Compare the differences between 'f' and 'g' on the first pic, and Arm codegen on the second pic.
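The functions 'f' and 'g' from the pictures are not preserved here, so the following is only a sketch of one portable way to stay within the type compatibility rules: copy the bytes with memcpy instead of dereferencing a cast pointer. The `f32x4` typedef is a GCC/Clang generic vector type used as a stand-in for a Neon type (like Neon types, it carries no may_alias attribute), and `load_f32x4` is a hypothetical helper name.

```c
#include <string.h>

/* GCC/Clang generic vector type: four floats, no may_alias attribute.
   Used here as a stand-in for a Neon vector type. */
typedef float f32x4 __attribute__((vector_size(16)));

/* Load 16 bytes from memory of any dynamic type.
   memcpy is exempt from the type-based aliasing rules, and compilers
   normally turn a fixed-size copy like this into a single vector load. */
static f32x4 load_f32x4(const void *p)
{
    f32x4 v;
    memcpy(&v, p, sizeof v);
    return v;
}
```

Writing `*(const f32x4 *)p` instead would only be valid when the pointed-to object is compatible with the vector type, which is the trap the post describes.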

image/png

amonakov, to random
@amonakov@mastodon.gamedev.place

In 'less', you can interactively add command-line arguments without leaving the pager by pressing '-': you can press '-S' to flip wrapping/chopping of long lines, and '-j11' to spawn 10 wor^W^W^W see extra ten lines of context above the match when searching!

amonakov, to random
@amonakov@mastodon.gamedev.place

shower thought re. rgb 565 vs. theoretical 555|1 (common least significant bit for each component): it introduces higher "distortion" for darker colors (i.e. the closest encodable to dark purple { 1, 0, 1 }/64 is dark grey { 1, 1, 1 }/64), but our vision loses color sensitivity in low-light conditions anyway, so that probably would have been a better fit
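To make the dark-purple example concrete, here is a small sketch of a 555|1 encoder and decoder. The bit layout (5 bits per component, shared least significant bit in bit 0) and all names are assumptions chosen for illustration. Since every decoded component ends in the shared bit, picking the majority parity of the three inputs leaves the fewest components off by one.

```c
#include <stdint.h>

typedef struct { uint8_t r, g, b; } rgb6;  /* each component in 0..63 */

/* Pack to the hypothetical 555|1 layout: rrrrr ggggg bbbbb s.
   The shared bit s is the majority parity of the three components,
   so at most one component is off by one after decoding. */
static uint16_t pack_555_1(rgb6 c)
{
    unsigned odd = (c.r & 1u) + (c.g & 1u) + (c.b & 1u);
    unsigned shared = odd >= 2;
    return (uint16_t)(((c.r >> 1) << 11) | ((c.g >> 1) << 6) |
                      ((c.b >> 1) << 1) | shared);
}

/* Decode: every component gets the shared bit as its least significant bit. */
static rgb6 unpack_555_1(uint16_t p)
{
    unsigned shared = p & 1u;
    rgb6 c = {
        (uint8_t)((((p >> 11) & 31u) << 1) | shared),
        (uint8_t)((((p >> 6) & 31u) << 1) | shared),
        (uint8_t)((((p >> 1) & 31u) << 1) | shared),
    };
    return c;
}
```

Round-tripping { 1, 0, 1 } through these two functions gives { 1, 1, 1 }, the dark grey from the post.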

pervognsen,
@pervognsen@mastodon.social

@amonakov The X68k is still my dream retro computer. One day...

pervognsen,
@pervognsen@mastodon.social

@amonakov Also, I was so happy when I saw an X68k easter egg the first time I played one of my now-favorite indie games, ZeroRanger:

amonakov, to random
@amonakov@mastodon.gamedev.place

We can

typedef float vec4 __attribute__((ext_vector_type(4)));

But what if we could

typedef void lane_fn(...);
typedef lane_fn vec_fn __attribute__((ext_vector_size(4)));

oblomov,
@oblomov@sociale.network

@amonakov you mean as a way to vectorize kernels in a GPGPU-like way?

amonakov, to random
@amonakov@mastodon.gamedev.place

With xz backdoor opening an RCE pathway, have you thought "hey, it would be nice if the sshd sub-process doing the key/cert parsing would not be able to fork/exec anything?" Ideally the only thing it should be able to do is read/write to already-open fds and die a peaceful death, right?

Now, this particular backdoor was embedded deep enough that it might be able to work around such privilege separation, but in general dropping privs for risky computations is an important part of defence-in-depth.
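On Linux, the "read/write to already-open fds and die" policy matches seccomp strict mode quite closely. The sketch below is not how sshd is structured; `run_confined_worker` is a hypothetical helper that forks a worker, locks it down, and reports how it ended. The `misbehave` flag and the use of getpid as a stand-in for fork/exec are illustration only.

```c
#include <linux/seccomp.h>
#include <signal.h>
#include <sys/prctl.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: fork a worker, confine it with seccomp strict mode,
   let it write a result to out_fd, and return the worker's wait status
   (or -1 if fork/waitpid failed). */
static int run_confined_worker(int out_fd, int misbehave)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        /* From here on only read, write, _exit and sigreturn are allowed;
           any other syscall gets the process killed with SIGKILL. */
        if (prctl(PR_SET_SECCOMP, SECCOMP_MODE_STRICT) != 0)
            _exit(2);  /* strict mode unavailable in this environment */

        static const char result[] = "ok\n";
        (void)!write(out_fd, result, sizeof result - 1);

        if (misbehave)
            syscall(SYS_getpid);  /* stand-in for fork/exec: not allowed */

        /* glibc's _exit() issues exit_group, which strict mode forbids,
           so call the plain exit syscall directly. */
        syscall(SYS_exit, 0);
    }
    int status = 0;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return status;
}
```

A well-behaved worker exits with status 0; one that attempts anything outside the allowlist is terminated by SIGKILL before the syscall runs. Real parsers usually need memory allocation and more, which is why practical sandboxes tend to use seccomp filter mode with a wider allowlist.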

amonakov,
@amonakov@mastodon.gamedev.place

And that reminds me of another scenario where we parse untrusted certificates: WPA2-Enterprise authentication. Venerable wpa_supplicant does have some privilege-separation code (which I believe is rarely enabled on Linux), but what iwd does is completely incomprehensible to me: they pass certs from the access point straight to the kernel keyring subsystem, using the kernel as a fancy SSL library. Any weakness in the involved kernel code is thus open for exploitation by rogue access points.

amonakov, to random
@amonakov@mastodon.gamedev.place

Inside of you there are two reviewers

amonakov, to random
@amonakov@mastodon.gamedev.place

My kingdom for a distro with smooth cross-installation (preparing a rootfs for a foreign architecture without qemu).

lanodan,
@lanodan@queer.hacktivis.me

@wolf480pl @amonakov cross-compilation without qemu? (note: I'm including binfmt here)
For cross-install without qemu I guess you'd want something like Alpine, in fact pmbootstrap literally is a tool for a cross-install.

amonakov,
@amonakov@mastodon.gamedev.place

@lanodan @wolf480pl good to know, thanks!

amonakov, to random
@amonakov@mastodon.gamedev.place

I still can't figure out the intended use-case for AMD's IBS (instruction-based sampling). You select a period N, and then for each N'th instruction you get info about that particular instruction (in which caches it missed, was it a branch, was it mispredicted, ...). Which seems... completely unworkable for rare events? If I want to sample on mispredicted branches, and they account for 1% of all instructions, I'll have to discard 99% of IBS data, and my effective sampling rate is 0.01 of nominal?
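The arithmetic behind that worry, as a tiny sketch. It assumes the events of interest are spread evenly among sampled instructions; the function and parameter names are made up for illustration.

```c
/* Expected number of IBS samples that land on the event of interest,
   assuming events are spread evenly among the sampled instructions. */
static double useful_samples(double instructions, double period,
                             double event_fraction)
{
    return instructions / period * event_fraction;
}

/* Average number of instructions between two useful samples. */
static double effective_period(double period, double event_fraction)
{
    return period / event_fraction;
}
```

For example, with 1e9 instructions, a nominal period of 100000 and events on 1% of instructions, about 10000 samples are taken but only about 100 are useful, which corresponds to one useful sample per 1e7 instructions.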

pervognsen,
@pervognsen@mastodon.social

@amonakov (Only asking because if it's cycle based you'd expect mispredicts and cache misses to get higher weight with that kind of periodic sampling scheme.)

amonakov,
@amonakov@mastodon.gamedev.place

@pervognsen With back-end "Ops" IBS you have a choice of cycle-based or uop-based period, but the front-end "Fetch" IBS only does instruction-based sampling (but that's the side which sees the L1i and iTLB misses).

amonakov, to random
@amonakov@mastodon.gamedev.place

Fun fact: there are at least five distinct choices for type T (not counting typedefs) such that a C compiler targeting a POSIX system cannot optimize out the call to 'aux' in

void *alloc(T *psize)
{
    size_t sz = *psize;
    void *a = malloc(sz);
    if (sz != *psize)
        aux();
    return a;
}