@chandlerc@hachyderm.io avatar

chandlerc

@chandlerc@hachyderm.io

Software, performance, optimization, programming languages, security, open source, #CarbonLang lead, #LLVM, #Clang, C++. 🏳️‍🌈 http://pronoun.is/he or http://pronoun.is/they

This profile is from a federated server and may be incomplete. Browse more on the original instance.

steve, to TVTooHigh
@steve@discuss.systems avatar

TFW but you left multiple feet of wall above it anyway. Aim higher! Do better!

chandlerc,
@chandlerc@hachyderm.io avatar

@steve See, I feel like people hav ethe TV-too-high thing all wrong.

Y'all seem to be watching TV sitting upright like it's time to play the piano or something. You need to be laid back watching TV. The call it a recliner because it RECLINES people.

Then the TV's gonna be at a fine height, just tilt it down a bit. I'd rather a bit higher TV and not have to be looking down my nose all the damn time.

🤡

chandlerc, to random
@chandlerc@hachyderm.io avatar

Playing with "overflow byte" approach to metadata+simd augmented hashtable design.

I've read that this is part of the F14 design, but I found the explanations for Boost's new unordered_flag_map to be what got the point across.

I implemented the core of this in my own way on top of my hashtable -- rather than stealing a byte from the metadata the way Boost does, I just tack on an array of 1-byte-per-group "probe markers" (a term that makes more sense to me than "overflow byte".

chandlerc,
@chandlerc@hachyderm.io avatar

Initially, I just wanted to confirm the statistical benefits that are espoused for this approach.

Because I'm really quite bad at math, statistics, and modelling, the easiest way for me was to implement it and some detailed metrics, and then run my benchmarks and see what happens to average probe length.

The big idea is using one bit of the probe marker (or "overflow byte") to track whether a particular key needs to keep probing past the current group. How does this reduce probe length?

chandlerc,
@chandlerc@hachyderm.io avatar

My understanding (please lemme know if I've misunderstood):

You use more bits from the hash to select the bit of the byte for a particular key. This basically lets you probabilistically end the probe early.

Compared to the tombstone approach of SwissTable, where the only end to a probe sequence is a group with an empty slot. All the keys share this terminating condition, but with the 8 bits, you should only share the termination with 1/8th of keys?

And so it should terminate (much) sooner?

chandlerc,
@chandlerc@hachyderm.io avatar

This does seem to work. All my fancy testing passes.

But the probe distance improvements? Tiny. Like, miniscule. Way less than I would have hoped.

Or is it? sometimes I see 30% drop in average distance. But the average goes from 0.072 to 0.051...

Even the max distance reductions are very moderate.

I don't really see large probe distances. With the most probe-prone data set (identifier like strings), average across all keys is at worst 0.10.

chandlerc,
@chandlerc@hachyderm.io avatar

And the measured timings don't seem to improve.

I've not yet really looked at how optimal the generated code is yet because I was mostly hoping to see an improvement in the computed average probe distances first...

Maybe I will though. It is in some ways nicer code, simpler than searching for empty and tombstone metadata. And the added memory is tiny. But a little perplexed at the results.

chandlerc,
@chandlerc@hachyderm.io avatar

@pervognsen Not whose probe sequence starts, but reaches that group. Which is ... a bit weirder. Especially with quadratic probing.

The bits are only ever set once the group is full, but despite being full 7/8th of probes should terminate when hitting it is I think the theory.

But that doesn't seem to really materialize for my implementation. Now I'm wondering if I've got bugs or something, and this is just a weirder way of tracking "was there an empty metadata byte".

chandlerc,
@chandlerc@hachyderm.io avatar

Intersetingly, on Arm where I use a group size of 8 (and SWAR instead of SIMD as its faster), I'm seeing the probe marker result in significant more probing. More keys probe, and probe further (likely as a consequence). But only for some keys data types. Weird. But not encouraging for this technique despite it being conceptually simpler.

  • All
  • Subscribed
  • Moderated
  • Favorites
  • megavids
  • thenastyranch
  • rosin
  • GTA5RPClips
  • osvaldo12
  • love
  • Youngstown
  • slotface
  • khanakhh
  • everett
  • kavyap
  • mdbf
  • DreamBathrooms
  • ngwrru68w68
  • provamag3
  • magazineikmin
  • InstantRegret
  • normalnudes
  • tacticalgear
  • cubers
  • ethstaker
  • modclub
  • cisconetworking
  • Durango
  • anitta
  • Leos
  • tester
  • JUstTest
  • All magazines