chandlerc

@chandlerc@hachyderm.io

Software, performance, optimization, programming languages, security, open source, #CarbonLang lead, #LLVM, #Clang, C++. 🏳️‍🌈 http://pronoun.is/he or http://pronoun.is/they


chandlerc, 11 months ago to random

So, in non-tech news...

My partner and I are embarking on an exciting new adventure: building our dream home!

But by "dream home", it's looking like it will be more of a mountain-fortress-of-solitude... although we do hope to still entertain & throw parties there from time to time. =]

Current phase: find and acquire a site/lot. The region we're interested in is the Santa Cruz mountains where I can still commute to work.

We're struggling to evaluate two candidate sites...

1/


chandlerc, 7 months ago to random

Seems really hard to do low-latency small-size SIMD string manip on Apple's ARM cores.

AFAICT, you need something like fmov or umov to switch from SIMD to scalar even to do a conditional branch for the back edge of the loop.

And the tail of the loop really has to switch to scalar to compute any kind of byte offset.

The firestorm core has huge (10 cycle+) latency on this operation.

So short-string dominated string manip basically always loses with SIMD?

Am I missing anything?


chandlerc, 8 months ago to random

Why write blog posts when you can write PR descriptions instead?

https://github.com/carbon-language/carbon-lang/pull/3278

Also, I'm having entirely too much fun playing with Carbon's lexer while on vacation. Such a great little piece of puzzle code.


chandlerc, 16 days ago to random

C++ data structure API design question...

What are folks' favorite ways to design a data structure that supports users providing two closely coupled custom functions? Why that pattern?

Specifically, imagine a hash table data structure that wants to allow users to deeply customize both the hash function and the equality comparison.

Current ideas, w/o ranking or even saying I like them, and interested in others:

A type parameter with static functions

two lambda template parameters

CRTP
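To make the first idea in the list concrete, here is a minimal sketch of "a type parameter with static functions". `CaseInsensitivePolicy` and `ToySet` are hypothetical names invented for illustration, and the chained-bucket table is a deliberately simple stand-in, not the hash table the post is about.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <cstdint>
#include <string>
#include <string_view>
#include <utility>
#include <vector>

// Hypothetical policy type: the hash and the equality live side by side, so a
// user cannot accidentally pair a case-insensitive hash with a case-sensitive
// comparison.
struct CaseInsensitivePolicy {
  static std::uint64_t Hash(std::string_view s) {
    std::uint64_t h = 14695981039346656037ull;  // FNV-1a 64-bit offset basis
    for (unsigned char c : s) {
      h ^= static_cast<std::uint64_t>(std::tolower(c));
      h *= 1099511628211ull;  // FNV-1a 64-bit prime
    }
    return h;
  }

  static bool Equal(std::string_view a, std::string_view b) {
    if (a.size() != b.size()) return false;
    for (std::size_t i = 0; i < a.size(); ++i) {
      if (std::tolower(static_cast<unsigned char>(a[i])) !=
          std::tolower(static_cast<unsigned char>(b[i]))) {
        return false;
      }
    }
    return true;
  }
};

// Toy chained set that takes both customizations through one type parameter.
template <typename Key, typename Policy>
class ToySet {
 public:
  explicit ToySet(std::size_t bucket_count = 16) : buckets_(bucket_count) {
    assert(bucket_count > 0);
  }

  // Returns true if the key was newly inserted.
  bool Insert(Key key) {
    std::vector<Key>& bucket = buckets_[Policy::Hash(key) % buckets_.size()];
    for (const Key& existing : bucket) {
      if (Policy::Equal(existing, key)) return false;
    }
    bucket.push_back(std::move(key));
    return true;
  }

  bool Contains(const Key& key) const {
    const std::vector<Key>& bucket =
        buckets_[Policy::Hash(key) % buckets_.size()];
    for (const Key& existing : bucket) {
      if (Policy::Equal(existing, key)) return true;
    }
    return false;
  }

 private:
  std::vector<std::vector<Key>> buckets_;
};
```

With `ToySet<std::string, CaseInsensitivePolicy>`, inserting "Hello" and then "HELLO" stores one element. The trade-off in this style is that the policy is stateless by construction; a seeded hash would need a different shape, such as a policy object stored in the table.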


chandlerc, 4 months ago to random

For C++ library folks -- should const on containers propagate to the elements in the container? why? (or why not?)

And if "yes", why should span (or equivalent) not take the same path? Or should it?

(To be clear, I have lots of my own thoughts on all of these questions. I'm not asking because I'm unaware of any possible answers, but to see how others think about them.)
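For readers who want the two behaviors side by side, a small sketch follows. `IntView` is a hypothetical stand-in for `std::span<int>`, used only so the example compiles without C++20; like `std::span`, its `operator[]` is a const member that returns a mutable reference.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>
#include <utility>
#include <vector>

// Hypothetical stand-in for std::span<int>: a non-owning pointer plus length.
struct IntView {
  int* data;
  std::size_t size;
  // Indexing is const-qualified but hands back a mutable reference, so const
  // on the view itself is shallow.
  int& operator[](std::size_t i) const { return data[i]; }
};

// Owning container: const on the container propagates to the elements.
static_assert(
    std::is_same_v<decltype(std::declval<const std::vector<int>&>()[0]),
                   const int&>);

// View: const on the view does not propagate to the elements.
static_assert(
    std::is_same_v<decltype(std::declval<const IntView&>()[0]), int&>);

// Writes through a const view. The same body would not compile if the
// parameter were a const std::vector<int>&.
inline void ZeroFirst(const IntView view) {
  assert(view.size > 0);
  view[0] = 0;
}
```

With `std::span`, element constness is spelled in the element type instead (`std::span<const int>`), which is the asymmetry the question is probing.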


chandlerc, 5 months ago to llvm
Is there a good reason #LLVM targeting #AArch64 doesn't seem to fold shifts into operands when it would require shifting in multiple operands?

I'm seeing lots of:
lsr xN, xN, #7  
and x?, x?, xN  
...  
and x?, x?, xN  
With no other uses of xN.

Is there a reason to prefer this over:
and x?, x?, xN, lsr #7  
...  
and x?, x?, xN, lsr #7  
While "duplicated", it seems like it would save an instruction at least in decode?

chandlerc, 7 months ago to random

Well, got my new fancy fast hashing integrated into my Abseil/SwissTable-like hash table (partially) and it works!

And the result is slightly faster, roughly as much as I expect given that only the hash function should really be different.

But I also connected it to LLVM's venerable DenseSet... And with this fast of a hash function that thing is preposterously fast for (very) small sizes.

...
...
...

Or is it?


chandlerc, 12 days ago (edited 12 days ago) to random

Noooo... cppreference is down, how will I write code?

EDIT: back up now for me it seems....


chandlerc, 6 days ago to random

Playing with "overflow byte" approach to metadata+simd augmented hashtable design.

I've read that this is part of the F14 design, but I found the explanations for Boost's new unordered_flat_map to be what got the point across.

I implemented the core of this in my own way on top of my hashtable -- rather than stealing a byte from the metadata the way Boost does, I just tack on an array of 1-byte-per-group "probe markers" (a term that makes more sense to me than "overflow byte").
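A rough sketch of the idea, under stated assumptions: `ProbeMarkerSet` is a hypothetical toy, not the author's table and not Boost's code. It assumes each marker byte works as a tiny 8-bit filter keyed by the low bits of the hash (my understanding of Boost's overflow byte; the post does not say how the author's markers are encoded), and it skips SIMD, metadata, deletion, and growth entirely.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy open-addressing set of nonzero 32-bit keys (0 marks an empty slot).
class ProbeMarkerSet {
 public:
  static constexpr std::size_t kGroupSize = 8;

  explicit ProbeMarkerSet(std::size_t num_groups)
      : slots_(num_groups * kGroupSize, 0), markers_(num_groups, 0) {
    assert(num_groups > 0);
  }

  // Returns true if the key is in the set afterwards.
  bool Insert(std::uint32_t key) {
    if (key == 0) return false;
    if (Contains(key)) return true;
    const std::uint64_t h = Hash(key);
    const std::size_t n = markers_.size();
    std::size_t g = static_cast<std::size_t>((h >> 32) % n);
    for (std::size_t i = 0; i < n; ++i, g = (g + 1) % n) {
      for (std::size_t s = 0; s < kGroupSize; ++s) {
        std::uint32_t& slot = slots_[g * kGroupSize + s];
        if (slot == 0) {
          slot = key;
          return true;
        }
      }
      // The group is full, so this key probes past it: leave a marker.
      markers_[g] |= MarkerBit(h);
    }
    return false;  // Every group is full.
  }

  bool Contains(std::uint32_t key) const {
    if (key == 0) return false;
    const std::uint64_t h = Hash(key);
    const std::size_t n = markers_.size();
    std::size_t g = static_cast<std::size_t>((h >> 32) % n);
    for (std::size_t i = 0; i < n; ++i, g = (g + 1) % n) {
      for (std::size_t s = 0; s < kGroupSize; ++s) {
        if (slots_[g * kGroupSize + s] == key) return true;
      }
      // No key with this marker bit ever probed past this group, so the key
      // cannot be stored further along the probe sequence: stop early.
      if ((markers_[g] & MarkerBit(h)) == 0) return false;
    }
    return false;
  }

 private:
  // Simple multiplicative mix, chosen only for illustration.
  static std::uint64_t Hash(std::uint32_t key) {
    return key * 0x9E3779B97F4A7C15ull;
  }

  static std::uint8_t MarkerBit(std::uint64_t h) {
    return static_cast<std::uint8_t>(1u << (h & 7));
  }

  std::vector<std::uint32_t> slots_;
  std::vector<std::uint8_t> markers_;  // One probe-marker byte per group.
};
```

The point of the marker is the early exit in `Contains`: a failed lookup can stop at the first group whose marker says nothing with a matching bit ever overflowed it, even when that group has no empty slot.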


chandlerc, 8 months ago to random

What's the current state of hash flooding and DoSing?

Is this still something worth defending against in the face of hash table designs like SwissTable and HashBrown and F14 where collisions are handled well with a large constant factor?


chandlerc, 7 months ago to random

I'm thinking optimizing compilers were a mistake.

Or abstractions. Can't decide which one.

Sitting here trying to understand why extracting an inner loop to a helper function causes LLVM to generate bizarrely different instructions.

Like, reloading a value already loaded earlier in the loop, in a live register.

There are no stores, no pointers, no escape of anything.

And ... I think I know why this is happening (yay post-inlining vs. pre-inlining pass diffs)... but....

WTF!!@!!@$!


chandlerc, 6 months ago to random

Fixed the RSS feed on my blog, and hopefully can now post without weird clutter from slides!

https://chandlerc.blog/posts/fixing_rss_feed/


chandlerc, 6 months ago to random

FYI & note to future self for easier finding and referencing Arm and Neon intrinsics:

https://arm-software.github.io/acle/main/acle.html
https://arm-software.github.io/acle/neon_intrinsics/advsimd.html

These are much more effective than the ARM developer site -- can just use normal search, and they even include things bizarrely missing on the ARM developer site like vst1_*.

(I've probably been directed at these at least twice before, but maybe posting this will help me remember the right place to go for the reference...)


+ Paxxi

chandlerc, 7 months ago to random

Which bit ranges within a 64-bit pointer are randomized by ASLR? Interested in answers for the following permutations:

{heap, stack} x {x86, aarch64} x {Linux, Windows, macOS}

For "heap" I mean "mmap-to-acquire-heap" as typically used by memory allocators like tcmalloc and such.

I assume the answer is somewhat a function of page size -- I care about 4k, 64k (AArch64), and 2m (x86-64) page sizes.

Tried searching around for this, and all I find are heap and stack on 32-bit x86.


chandlerc, 3 months ago to random
A random and weird AArch64 performance question I'm mulling...

Which instruction sequence in a linear dep chain is better?
a)
lsr ... #7  
rbit ...  
clz ...  
b)
rev ...  
clz ...  
It seems like (b) should be clearly superior.... unless some uarch does fusion of rbit and clz to a single uop (ctz-esque). If there is fusion (a) could easily be better...

Anyone know of such a uarch? Worth avoiding (b) in case of a future one?

(I'm working on getting measurements for the M1...)
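One plausible reading of why the two sequences are interchangeable, sketched in C++: if the input is a byte mask in which only bit 7 of each byte can be set (an assumption, the post does not state the input), both sequences return 8 times the index of the lowest marked byte. `SequenceA`, `SequenceB`, and `kHighBits` are hypothetical names, and the `__builtin_*` functions are GCC/Clang builtins standing in for the instructions.

```cpp
#include <cassert>
#include <cstdint>

// Bit 7 of every byte: the only bits the mask is assumed to use.
constexpr std::uint64_t kHighBits = 0x8080808080808080ull;

// (a) lsr #7 ; rbit ; clz. Reversing the bits and counting leading zeros is
// the same as counting trailing zeros.
inline int SequenceA(std::uint64_t mask) {
  assert(mask != 0 && (mask & ~kHighBits) == 0);
  return __builtin_ctzll(mask >> 7);
}

// (b) rev ; clz. rev reverses the byte order, which moves the lowest marked
// byte to the top with its bit 7 as the leading set bit.
inline int SequenceB(std::uint64_t mask) {
  assert(mask != 0 && (mask & ~kHighBits) == 0);
  return __builtin_clzll(__builtin_bswap64(mask));
}
```

The equivalence depends on that mask shape: with other bits set the two sequences can disagree, so this only illustrates the dependency-chain comparison and says nothing about which one is faster on a given core.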

+ chandlerc

chandlerc, 7 months ago to random

(Finally) posted one of my recent fun-coding results:
https://github.com/carbon-language/carbon-lang/pull/3327

(will be contributing back to Abseil as well if the benefits prove out)


chandlerc, 7 months ago to random

I'm still super new to Arm ISA-specific stuff, but maybe folks here can help me out with some maybe-basic questions?

Why does vget_lane_u64 produce fmov with Clang rather than umov? Maybe the two are actually the same instruction under the hood and the disassembler doesn't know which to use? Or should I worry about having the right architecture flags?

Also, is that instruction super slow in practice (I'm on an Apple M CPU)? Or does my sampling profiler just attribute lots to it?


chandlerc, 7 months ago to random

How does one get attention of the GitHub powers-that-be when your actions start queuing endlessly without progress?


chandlerc, 29 days ago to random

Northern lights in the SF bay area (up on our mountain), wild.

Thanks to @hanadusikova for having the right phone camera to get the great photo.


+ andrewwillmott

steve, 4 days ago to TVTooHigh

TFW #TVTooHigh but you left multiple feet of wall above it anyway. Aim higher! Do better!

#RealEstateShitposting


chandlerc, 4 days ago

@steve See, I feel like people have the TV-too-high thing all wrong.

Y'all seem to be watching TV sitting upright like it's time to play the piano or something. You need to be laid back watching TV. They call it a recliner because it RECLINES people.

Then the TV's gonna be at a fine height, just tilt it down a bit. I'd rather a bit higher TV and not have to be looking down my nose all the damn time.

🤡


chandlerc, 5 months ago to random

I'm really exhausted from complaining about the #Bazel project's auto-close response, which has instructions no one outside the Bazel team can follow for keeping open the actual issues that are impacting users.

Has anyone played with Buck2? Any example C++ projects using it that I could look at?


chandlerc, 1 year ago to random

Love to see how mad the Onion is here. More folks should be this mad about this.

If you want a really solid critique, it's actually here too. They put their all into this piece and while it's sarcastic as usual, it really does cover so many aspects of how horrible the current trend of "news" in this space has been.

Go read it:
https://www.theonion.com/it-is-journalism-s-sacred-duty-to-endanger-the-lives-of-1850126997


+ oblomov

chandlerc, 2 months ago to random

I continue to think this is one of the biggest insights I have had professionally:

Communicating "up" to "senior leadership"[1] is almost entirely about iterative synthesis and refinement of a reasonable abstraction for them to use to understand and communicate about what you're actually doing/proposing/asking/etc.

[1]: In whatever form this takes. But specifically folks as far away as you might call "executives" in a business context or of similar breadth/scale in another org context


chandlerc, 2 months ago to random

Second edition of the Carbon Copy is out!

https://github.com/carbon-language/carbon-lang/discussions/3869

This one discusses Carbon's "unformed state" as a way to address the safety risks of uninitialized data while maximizing control over exactly which tradeoff should be taken, and maximizing the ability to diagnose or mitigate bugs.


chandlerc, 28 days ago to random

Saw this again, and it remains excellent.
