Playing with "overflow byte" approach to metadata+simd augmented hashtable design.
I've read that this is part of the F14 design, but I found the explanations for Boost's new unordered_flat_map to be what got the point across.
I implemented the core of this in my own way on top of my hashtable -- rather than stealing a byte from the metadata the way Boost does, I just tack on an array of 1-byte-per-group "probe markers" (a term that makes more sense to me than "overflow byte").
@pervognsen Not whose probe sequence starts, but reaches that group. Which is ... a bit weirder. Especially with quadratic probing.
The bits are only ever set once the group is full, but the theory, I think, is that even though the group is full, 7/8ths of probes should still terminate when they hit it.
But that doesn't seem to really materialize for my implementation. Now I'm wondering if I've got bugs or something, and this is just a weirder way of tracking "was there an empty metadata byte".
Interestingly, on Arm where I use a group size of 8 (and SWAR instead of SIMD as it's faster), I'm seeing the probe marker result in significantly more probing. More keys probe, and probe further (likely as a consequence). But only for some key data types. Weird. But not encouraging for this technique despite it being conceptually simpler.
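To make the idea concrete, here is a minimal sketch of how I understand the probe-marker scheme, not the actual implementation from these posts. It assumes an insert-only table, 8-slot groups, triangular probing over a power-of-two number of groups, and a marker bit picked by the low three hash bits. The names (`ProbeMarkerSet`, `Hash`, `used`) are hypothetical, and a plain occupancy count stands in for the SIMD/SWAR metadata bytes.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy insert-only set of 64-bit keys: 8-slot groups plus one "probe marker"
// byte per group. All names here are hypothetical.
class ProbeMarkerSet {
 public:
  explicit ProbeMarkerSet(std::size_t num_groups)
      : groups_(num_groups), markers_(num_groups, 0) {
    // The triangular probing below only covers every group for powers of two.
    assert(num_groups != 0 && (num_groups & (num_groups - 1)) == 0);
  }

  // Returns true if inserted; false if already present or the table is full.
  bool Insert(std::uint64_t key) {
    const std::uint64_t h = Hash(key);
    const std::uint8_t bit = static_cast<std::uint8_t>(1u << (h & 7));
    const std::size_t mask = groups_.size() - 1;
    std::size_t g = static_cast<std::size_t>(h >> 3) & mask;
    for (std::size_t step = 1; step <= groups_.size(); ++step) {
      Group& group = groups_[g];
      for (std::size_t i = 0; i < group.used; ++i)
        if (group.keys[i] == key) return false;
      if (group.used < kGroupSize) {
        group.keys[group.used++] = key;
        return true;
      }
      // The group is full and this probe continues past it: record that a
      // key with these low hash bits reached, and left, this group.
      markers_[g] |= bit;
      g = (g + step) & mask;
    }
    return false;
  }

  bool Contains(std::uint64_t key) const {
    const std::uint64_t h = Hash(key);
    const std::uint8_t bit = static_cast<std::uint8_t>(1u << (h & 7));
    const std::size_t mask = groups_.size() - 1;
    std::size_t g = static_cast<std::size_t>(h >> 3) & mask;
    for (std::size_t step = 1; step <= groups_.size(); ++step) {
      const Group& group = groups_[g];
      for (std::size_t i = 0; i < group.used; ++i)
        if (group.keys[i] == key) return true;
      // No key sharing our marker bit ever probed past this group, so the
      // key cannot be stored further along the sequence.
      if ((markers_[g] & bit) == 0) return false;
      g = (g + step) & mask;
    }
    return false;
  }

 private:
  static constexpr std::size_t kGroupSize = 8;
  struct Group {
    std::array<std::uint64_t, kGroupSize> keys{};
    std::size_t used = 0;  // Stand-in for per-slot metadata bytes.
  };

  // splitmix64-style finalizer, used only to spread the toy keys around.
  static std::uint64_t Hash(std::uint64_t x) {
    x ^= x >> 30;
    x *= 0xbf58476d1ce4e5b9ULL;
    x ^= x >> 27;
    x *= 0x94d049bb133111ebULL;
    x ^= x >> 31;
    return x;
  }

  std::vector<Group> groups_;
  std::vector<std::uint8_t> markers_;  // One probe-marker byte per group.
};
```

In this sketch a lookup stops at a full group whenever its own marker bit is unset, which is where the "7/8ths" intuition comes from when only one bit is set. Once several bits are set in a busy group, the early exits get rarer.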
It's not an accident that the image of the guy with the "change my mind" poster is a racist, homophobic, abuser. And most of the memes using that template were horrible. It was never an invitation to change his mind.
P.S.: Don't use that meme template. Was horrified when I learned the full story. And no, don't ask me to cite my sources, use a search engine.
@zwarich@chandlerc (BTW, I think that storage proposal would work at least as well in C++ as in Rust and I believe Matthieu said he had first used the idea in the C++ code base at his day job.)
More wild in some ways is one of our Nest cameras looking out over Half Moon Bay. It ... is sparkling. Like, you can see a few bright spots in this screen capture, but it's constantly sparkling throughout the dark regions (need to turn up brightness to see).
I don't have any explanation for this other than the CME... We've zoomed in like this to look at the lights from Half Moon Bay glowing in the night before, no sparkles. So weird.
This one discusses Carbon's "unformed state" as a way to address the safety risks of uninitialized data while maximizing control over exactly which tradeoff is taken, and maximizing the ability to diagnose or mitigate bugs.
And also just arrived in Vienna, Austria for EuroLLVM! Looking forward to seeing folks after way too long and catching up. You can even pester us with questions about unformed state on the Carbon panel!
Needed a term for the unsustainable, irrational, and often quite bumbling and foolish excitement when you're first fully healthy after being really sick for a few days...
ZOOMIES!!! YES I HAVE POST-SICK ZOOMIES!!!!
Now to go contain myself and not slide right back to being sick....
I continue to think this is one of the biggest insights I have had professionally:
Communicating "up" to "senior leadership"[1] is almost entirely about iterative synthesis and refinement of a reasonable abstraction for them to use to understand and communicate about what you're actually doing/proposing/asking/etc.
[1]: In whatever form this takes. But specifically folks as far away as what you might call "executives" in a business context, or at a similar breadth/scale in another org context.
Like, yes, you need to ask for a thing, or propose a thing, or ....
But 90% of the effort is causing there to be a suitable abstraction at the relevant leadership level for them to understand, make any decision, and communicate w/ peers or their leadership about it.
And mostly, I think finding that abstraction is the most difficult and most impactful part.
A random and weird AArch64 performance question I'm mulling...
Which instruction sequence in a linear dep chain is better?
a)
lsr ... #7
rbit ...
clz ...
b)
rev ...
clz ...
It seems like (b) should be clearly superior.... unless some uarch does fusion of rbit and clz to a single uop (ctz-esque). If there is fusion (a) could easily be better...
Anyone know of such a uarch? Worth avoiding (b) in case of a future one?
(I'm working on getting measurements for the M1...)
@chandlerc I'd stick with (b) – if there were fusion, then they'd be equal on uops, but (a) would be worse on instruction count and code size, so still worse in general. Latency should be 1c for each of these instructions just about everywhere, though there are a lot of different CPUs, so feel free to double check that.
I also don't believe such fusion exists – I spent time looking through various manuals to try to find patterns to test for on M1, and I don't recall anything like that.
This is also the direction I liked due to code size... But I also couldn't help wondering if I'm just too excited about the fun of using byte-reverse instead of bit-reverse to avoid moving the set bit around. ;] Good to know that you've not seen anything crop up that would point away from it or towards fusion of this case.
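For anyone following along, here is a rough sketch of why the two sequences are interchangeable. It assumes the input is a SWAR match mask where only the top bit (0x80) of each byte can be set, which is my reading of the `lsr ... #7`. The helper names are made up, `__builtin_bswap64` and `__builtin_clzll` are GCC/Clang builtins, and whether a compiler lowers these to exactly `rbit`/`rev`/`clz` is not something this snippet shows.

```cpp
#include <cassert>
#include <cstdint>

// Portable bit reverse: swap bits within each byte, then swap the bytes.
inline std::uint64_t BitReverse(std::uint64_t x) {
  x = ((x >> 1) & 0x5555555555555555ULL) | ((x & 0x5555555555555555ULL) << 1);
  x = ((x >> 2) & 0x3333333333333333ULL) | ((x & 0x3333333333333333ULL) << 2);
  x = ((x >> 4) & 0x0F0F0F0F0F0F0F0FULL) | ((x & 0x0F0F0F0F0F0F0F0FULL) << 4);
  return __builtin_bswap64(x);
}

// (a) lsr #7; rbit; clz: move each match bit to bit 0 of its byte, mirror
// all 64 bits, then count leading zeros.
inline int FirstMatchBitA(std::uint64_t mask) {
  assert(mask != 0);  // __builtin_clzll is undefined for zero.
  return __builtin_clzll(BitReverse(mask >> 7));
}

// (b) rev; clz: reversing the byte order keeps each match bit at bit 7 of
// its byte, so no shift is needed before counting leading zeros.
inline int FirstMatchBitB(std::uint64_t mask) {
  assert(mask != 0);
  return __builtin_clzll(__builtin_bswap64(mask));
}
```

Both return 8 times the index of the lowest matching byte: a match bit in byte k sits at bit 8k+7, and either route moves it to bit 63-8k before the `clz`.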
For C++ library folks -- should const on containers propagate to the elements in the container? why? (or why not?)
And if "yes", why should span (or equivalent) not take the same path? Or should it?
(To be clear, I have lots of my own thoughts on all of these questions. I'm not asking because I'm unaware of any possible answers, but to see how others think about them.)
@Paxxi@resistor He mentioned that to an extent, it's an analogous case of propagation of const making it harder to use.
But all I was clarifying is whether the requirement to reach for tools like mutable would be a negative aspect. Some folks see it as a code smell, others don't.
@chandlerc@resistor ahh right.
I'm fighting myself on this one. For e.g. String I'm thinking const should mean const but for collections I'm thinking const should not propagate to elements.
I don't have any real arguments either way, just my feelings 😀
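A small sketch of the two behaviors being contrasted, in case it helps frame the question. `std::vector` is the real thing; `View` is a hypothetical stand-in for a span-like type (so this doesn't need C++20), and it is only one possible design.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>
#include <utility>
#include <vector>

// Hypothetical non-owning view, standing in for a span-like type.
template <typename T>
struct View {
  T* data;
  std::size_t size;
  // Shallow const: a const View still hands out mutable element references.
  T& operator[](std::size_t i) const {
    assert(i < size);
    return data[i];
  }
};

// A const std::vector propagates const to its elements...
static_assert(
    std::is_same<decltype(std::declval<const std::vector<int>&>()[0]),
                 const int&>::value,
    "const vector yields const elements");

// ...while a const View, like a const pointer, does not.
static_assert(
    std::is_same<decltype(std::declval<const View<int>&>()[0]), int&>::value,
    "const view yields mutable elements");

inline int WriteThroughConstView() {
  int storage[3] = {1, 2, 3};
  const View<int> view{storage, 3};
  view[0] = 42;  // Compiles: const applies to the view, not the elements.
  return storage[0];
}
```

The owning container treats its elements as part of its own value, and the view treats them as something it merely points at. That is roughly the split the question is poking at.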
@TomF@steve I mean, I'm somewhat aware of the diversity of uarch's out there.... And I don't really want more knobs in the compiler. I hate them.
But I'm specifically saying that thresholds where encoding A vs. encoding B results in 2 vs. 1 uop seem very important to document and teach compilers about. Not every other difference. =D Nicer to not have them at all, but if they exist, we need to know? And this doesn't seem like a terribly frustrating threshold to model.
I'm really exhausted from complaining about the #Bazel project's auto-close response: its instructions for keeping open actual issues that are impacting users are ones no one outside the Bazel team can follow.
Has anyone played with Buck2? Any example C++ projects using it that I could look at?