@chandlerc@hachyderm.io avatar

chandlerc

@chandlerc@hachyderm.io

Software, performance, optimization, programming languages, security, open source, #CarbonLang lead, #LLVM, #Clang, C++. 🏳️‍🌈 http://pronoun.is/he or http://pronoun.is/they


chandlerc, to random

So, in non-tech news...

My partner and I are embarking on an exciting new adventure: building our dream home!

But by "dream home", it's looking like it will be more of a mountain-fortress-of-solitude... although we do hope to still entertain & throw parties there from time to time. =]

Current phase: find and acquire a site/lot. The region we're interested in is the Santa Cruz mountains where I can still commute to work.

We're struggling to evaluate two candidate sites...

1/

chandlerc, to random

Seems really hard to do low-latency small-size SIMD string manip on Apple's ARM cores.

AFAICT, you need something like fmov or umov to switch from SIMD to scalar even to do a conditional branch for the back edge of the loop.

And the tail of the loop really has to switch to scalar to compute any kind of byte offset.

The firestorm core has huge (10 cycle+) latency on this operation.

So short-string dominated string manip basically always loses with SIMD?

Am I missing anything?

chandlerc, to random

Why write blog posts when you can write PR descriptions instead?

https://github.com/carbon-language/carbon-lang/pull/3278

Also, I'm having entirely too much fun playing with Carbon's lexer while on vacation. Such a great little piece of puzzle code.

chandlerc, to random

C++ data structure API design question...

What are folks' favorite ways to design a data structure that supports users providing two closely coupled custom functions? Why that pattern?

Specifically, imagine a hash table data structure that wants to allow users to deeply customize both the hash function and the equality comparison.

Current ideas, w/o ranking or even saying I like them, and interested in others:

  • A type parameter with static functions
  • two lambda template parameters
  • CRTP

chandlerc, to random

For C++ library folks -- should const on containers propagate to the elements in the container? why? (or why not?)

And if "yes", why should span (or equivalent) not take the same path? Or should it?

(To be clear, I have lots of my own thoughts on all of these questions. I'm not asking because I'm unaware of any possible answers, but to see how others think about them.)
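
For readers less familiar with the distinction being asked about, here is a small sketch of the two behaviors. `ToySpan` and `BumpFirst` are hypothetical names; `ToySpan` is a minimal stand-in whose const `operator[]` returns `T&`, which mirrors how `std::span<T>` behaves, while `std::vector` gives `const T&` through a const container.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>
#include <utility>
#include <vector>

// Minimal non-owning view (hypothetical stand-in for a span-like type).
// Const on the view is "shallow": it does not propagate to the elements.
template <typename T>
class ToySpan {
 public:
  ToySpan(T* data, std::size_t size) : data_(data), size_(size) {}
  T& operator[](std::size_t i) const { return data_[i]; }
  std::size_t size() const { return size_; }

 private:
  T* data_;
  std::size_t size_;
};

// std::vector propagates const: indexing a const vector yields const int&.
static_assert(
    std::is_same_v<decltype(std::declval<const std::vector<int>&>()[0]),
                   const int&>);

// The view does not: indexing a const ToySpan<int> still yields int&.
static_assert(
    std::is_same_v<decltype(std::declval<const ToySpan<int>&>()[0]), int&>);

// Compiles and mutates the underlying element through a const view.
inline int BumpFirst(const ToySpan<int> view) {
  view[0] += 1;
  return view[0];
}
```

To get read-only elements from the view you would spell it `ToySpan<const int>`, the same way `std::span<const int>` is spelled.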

chandlerc, to llvm

Is there a good reason LLVM's AArch64 targeting doesn't seem to fold shifts into operands when it would require shifting in multiple operands?

I'm seeing lots of:

lsr xN, xN, #7  
and x?, x?, xN  
...  
and x?, x?, xN  

With no other uses of xN.

Is there a reason to prefer this over:

and x?, x?, xN, lsr #7  
...  
and x?, x?, xN, lsr #7  

While "duplicated", it seems like it would save an instruction at least in decode?

chandlerc, to random

Well, got my new fancy fast hashing integrated into my Abseil/SwissTable-like hash table (partially) and it works!

And the result is slightly faster, roughly as much as I expect given that only the hash function should really be different.

But I also connected it to LLVM's venerable DenseSet... And with this fast of a hash function that thing is preposterously fast for (very) small sizes.

...
...
...

Or is it?

chandlerc, (edited) to random

Noooo... cppreference is down, how will I write code?

EDIT: back up now for me it seems....

chandlerc, to random

Playing with "overflow byte" approach to metadata+simd augmented hashtable design.

I've read that this is part of the F14 design, but I found the explanations for Boost's new unordered_flat_map to be what got the point across.

I implemented the core of this in my own way on top of my hashtable -- rather than stealing a byte from the metadata the way Boost does, I just tack on an array of 1-byte-per-group "probe markers" (a term that makes more sense to me than "overflow byte").
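
A toy sketch of the general idea, under loud assumptions: this is not the implementation from the post, has no SIMD, no deletion, and no growth, and uses a deliberately trivial hash so the behavior is easy to trace. `ToyProbeMarkerSet`, `HomeGroup`, and `MarkerBit` are hypothetical names. When a key cannot fit in a group, the insert sets one bit in that group's marker byte before probing onward; a lookup that misses in a group only keeps probing if the key's bit is set there.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>

class ToyProbeMarkerSet {
 public:
  static constexpr std::size_t kGroupSize = 4;
  static constexpr std::size_t kNumGroups = 8;

  // Returns true if inserted; false if already present or the table is full.
  bool Insert(std::uint64_t key) {
    if (Contains(key)) return false;
    for (std::size_t step = 0; step < kNumGroups; ++step) {
      const std::size_t g = (HomeGroup(key) + step) % kNumGroups;
      for (std::optional<std::uint64_t>& slot : slots_[g]) {
        if (!slot) {
          slot = key;
          return true;
        }
      }
      // Group is full: record that a key with this marker bit probed past it.
      probe_markers_[g] |= MarkerBit(key);
    }
    return false;
  }

  // Optionally reports how many groups were examined.
  bool Contains(std::uint64_t key, std::size_t* groups_probed = nullptr) const {
    std::size_t probed = 0;
    bool found = false;
    for (std::size_t step = 0; step < kNumGroups; ++step) {
      const std::size_t g = (HomeGroup(key) + step) % kNumGroups;
      ++probed;
      for (const std::optional<std::uint64_t>& slot : slots_[g]) {
        if (slot && *slot == key) found = true;
      }
      // Stop on a hit, or when no key with our marker bit ever overflowed
      // this group (so the key cannot live further along the probe sequence).
      if (found || (probe_markers_[g] & MarkerBit(key)) == 0) break;
    }
    if (groups_probed) *groups_probed = probed;
    return found;
  }

 private:
  // Trivial "hash" for the sketch: low bits pick the home group, the next
  // three bits pick which of the 8 marker bits the key uses.
  static std::size_t HomeGroup(std::uint64_t key) { return key % kNumGroups; }
  static std::uint8_t MarkerBit(std::uint64_t key) {
    return static_cast<std::uint8_t>(1u << ((key / kNumGroups) % 8));
  }

  std::array<std::array<std::optional<std::uint64_t>, kGroupSize>, kNumGroups>
      slots_{};
  // The separate 1-byte-per-group array described in the post.
  std::array<std::uint8_t, kNumGroups> probe_markers_{};
};
```

After inserting 0, 8, 16, 24 (which fill group 0) and then 32 (which overflows into group 1), a lookup for 40 stops after one group because its marker bit was never set, while a lookup for 96 shares a marker bit with 32 and has to examine a second group before giving up.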

chandlerc, to random

What's the current state of hash flooding and DoSing?

Is this still something worth defending against in the face of hash table designs like SwissTable and HashBrown and F14 where collisions are handled well with a large constant factor?

chandlerc, to random

I'm thinking optimizing compilers were a mistake.

Or abstractions. Can't decide which one.

Sitting here trying to understand why extracting an inner loop to a helper function causes LLVM to generate bizarrely different instructions.

Like, reloading a value already loaded earlier in the loop, in a live register.

There are no stores, no pointers, no escape of anything.

And ... I think I know why this is happening (yay post-inlining vs. pre-inlining pass diffs)... but....

WTF!!@!!@$!

chandlerc, to random

Fixed the RSS feed on my blog, and hopefully can now post without weird clutter from slides!

https://chandlerc.blog/posts/fixing_rss_feed/

chandlerc, to random

FYI & note to future self for easier finding and referencing Arm and Neon intrinsics:

https://arm-software.github.io/acle/main/acle.html
https://arm-software.github.io/acle/neon_intrinsics/advsimd.html

These are much more effective than the ARM developer site -- can just use normal search, and they even include things bizarrely missing on the ARM developer site like vst1_*.

(I've probably been directed at these at least twice before, but maybe posting this will help me remember the right place to go for the reference...)

chandlerc, to random

Which bit ranges within a 64-bit pointer are randomized by ASLR? Interested in answers for the following permutations:

{heap, stack} x {x86, aarch64} x {Linux, Windows, macOS}

For "heap" I mean "mmap-to-acquire-heap" as typically used by memory allocators like tcmalloc and such.

I assume the answer is somewhat a function of page size -- I care about 4k, 64k (AArch64), and 2m (x86-64) page sizes.

Tried searching around for this, and all I find are heap and stack on 32-bit x86.

chandlerc, to random

A random and weird AArch64 performance question I'm mulling...

Which instruction sequence in a linear dep chain is better?
a)

lsr ... #7  
rbit ...  
clz ...  

b)

rev ...  
clz ...  

It seems like (b) should be clearly superior.... unless some uarch does fusion of rbit and clz to a single uop (ctz-esque). If there is fusion (a) could easily be better...

Anyone know of such a uarch? Worth avoiding (b) in case of a future one?

(I'm working on getting measurements for the M1...)
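
The post doesn't say what the input looks like. One setting where the two sequences would be interchangeable, and this is purely an assumption, is a 64-bit match mask in which only the top bit of each byte can be set, with the goal of finding the lowest matching byte. The sketch below uses portable stand-ins for the instructions (`Clz64`, `Rbit64`, `Rev64` are hypothetical helper names, not intrinsics) to show that both sequences give the same bit offset under that assumption.

```cpp
#include <cassert>
#include <cstdint>

// Portable stand-in for clz: number of leading zero bits, 64 for zero.
inline int Clz64(std::uint64_t x) {
  int n = 0;
  for (std::uint64_t bit = 1ull << 63; bit != 0 && (x & bit) == 0; bit >>= 1) {
    ++n;
  }
  return n;
}

// Portable stand-in for rbit: reverse all 64 bits.
inline std::uint64_t Rbit64(std::uint64_t x) {
  std::uint64_t r = 0;
  for (int i = 0; i < 64; ++i) {
    r = (r << 1) | (x & 1);
    x >>= 1;
  }
  return r;
}

// Portable stand-in for rev: reverse the 8 bytes.
inline std::uint64_t Rev64(std::uint64_t x) {
  std::uint64_t r = 0;
  for (int i = 0; i < 8; ++i) {
    r = (r << 8) | (x & 0xff);
    x >>= 8;
  }
  return r;
}

// (a) lsr #7; rbit; clz: count trailing zeros of the mask shifted so each
// byte's flag sits in its low bit.
inline int SequenceA(std::uint64_t mask) { return Clz64(Rbit64(mask >> 7)); }

// (b) rev; clz: byte-swap so the lowest byte becomes the highest, then
// count leading zeros up to that byte's top bit.
inline int SequenceB(std::uint64_t mask) { return Clz64(Rev64(mask)); }
```

Under the stated assumption both return 8 times the index of the lowest flagged byte (and 64 for an empty mask), so (b) saves one instruction in the dependency chain unless something like the rbit+clz fusion the post speculates about exists.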

chandlerc, to random

(Finally) posted one of my recent fun-coding results:
https://github.com/carbon-language/carbon-lang/pull/3327

(will be contributing back to Abseil as well if the benefits prove out)

chandlerc, to random

I'm still super new to Arm ISA-specific stuff, but maybe folks here can help me out with some maybe-basic questions?

Why does vget_lane_u64 produce fmov with Clang rather than umov? Maybe the two are actually the same instruction under the hood and the disassembler doesn't know which to use? Or should I worry about having the right architecture flags?

Also, is that instruction super slow in practice (I'm on an Apple M CPU)? Or does my sampling profiler just attribute lots to it?

chandlerc, to random

How does one get attention of the GitHub powers-that-be when your actions start queuing endlessly without progress?

chandlerc, to random

Northern lights in the SF bay area (up on our mountain), wild.

Thanks to @hanadusikova for having the right phone camera to get the great photo.

steve, to TVTooHigh

TFW but you left multiple feet of wall above it anyway. Aim higher! Do better!

chandlerc,

@steve See, I feel like people have the TV-too-high thing all wrong.

Y'all seem to be watching TV sitting upright like it's time to play the piano or something. You need to be laid back watching TV. They call it a recliner because it RECLINES people.

Then the TV's gonna be at a fine height, just tilt it down a bit. I'd rather a bit higher TV and not have to be looking down my nose all the damn time.

🤡

chandlerc, to random

I'm really exhausted from complaining about the Bazel project's auto-close response, which has instructions no one outside the Bazel team can follow for keeping actual issues that are impacting users open.

Has anyone played with Buck2? Any example C++ projects using it that I could look at?

chandlerc, to random

Love to see how mad the Onion is here. More folks should be this mad about this.

If you want a really solid critique, it's actually here too. They put their all into this piece and while it's sarcastic as usual, it really does cover so many aspects of how horrible the current trend of "news" in this space has been.

Go read it:
https://www.theonion.com/it-is-journalism-s-sacred-duty-to-endanger-the-lives-of-1850126997

chandlerc, to random

I continue to think this is one of the biggest insights I have had professionally:

Communicating "up" to "senior leadership"[1] is almost entirely about iterative synthesis and refinement of a reasonable abstraction for them to use to understand and communicate about what you're actually doing/proposing/asking/etc.

[1]: In whatever form this takes. But specifically folks as far away as you might call "executives" in a business context, or of similar breadth/scale in another org context.

chandlerc, to random

Second edition of the Carbon Copy is out!

https://github.com/carbon-language/carbon-lang/discussions/3869

This one discusses Carbon's "unformed state" as a way to address the safety risks of uninitialized data while maximizing control over exactly which tradeoff is taken, and maximizing the ability to diagnose or mitigate bugs.

chandlerc, to random

Saw this again, and it remains excellent.
