chandlerc,
@chandlerc@hachyderm.io

Is there a good reason LLVM's AArch64 targeting doesn't seem to fold shifts into operands when that would require repeating the shift in multiple operands?

I'm seeing lots of:

lsr xN, xN, #7  
and x?, x?, xN  
...  
and x?, x?, xN  

With no other uses of xN.

Is there a reason to prefer this over:

and x?, x?, xN, lsr #7  
...  
and x?, x?, xN, lsr #7  

While "duplicated", it seems like it would save an instruction at least in decode?

steve,
@steve@discuss.systems

@chandlerc @TomF not sure it’s “the” reason, but a lot of arm64 designs will crack those shifted ands into 2 uOps, so you save a uOp by pulling it out.
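To make the trade-off above concrete, here is a minimal sketch of the uop arithmetic. It assumes, per the post above, that a shifted-operand `and` cracks into 2 uops on the design in question; the function names and the default of 2 are illustrative assumptions, not numbers from any vendor's documentation.

```python
def uops_hoisted(num_uses: int) -> int:
    """One standalone `lsr`, then one plain `and` per use of the shifted value."""
    return 1 + num_uses


def uops_folded(num_uses: int, uops_per_shifted_op: int = 2) -> int:
    """One `and ..., lsr #7` per use; each is assumed to crack into 2 uops."""
    return num_uses * uops_per_shifted_op


if __name__ == "__main__":
    for n in (1, 2, 3):
        print(f"uses={n} hoisted={uops_hoisted(n)} folded={uops_folded(n)}")
```

Under that assumption the two forms tie at a single use (where folding still saves an instruction), and hoisting the shift wins by `n - 1` uops once there are `n >= 2` uses.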

steve,
@steve@discuss.systems

@chandlerc @TomF it’s also pretty common for a design to allow “simple” ALU ops on every pipe and “complex” ops on only a subset, which would also tip the balance (see simple vs complex address generation on some x86 uArches, for example).

TomF,
@TomF@mastodon.gamedev.place

@steve @chandlerc The "free" shifter is one of those perfect examples where what was a GREAT idea on one implementation of the arch turns out to be a TERRIBLE idea on later ones.

Other examples are MIPS branch-delay slot, SPARC sliding register window, and every load-link/store-conditional implementation ever.

chandlerc,
@chandlerc@hachyderm.io

@TomF @steve Totally makes sense.

So from a performance perspective, seems reasonable to think of this purely as an encoding density hack with no real benefit once decoded compared to normal shifts?

(This isn't a case where I have any data or evidence that says anything to the contrary, I was just noticing it in the compiler output and wondered what was up, so appreciate the pointers.)

steve,
@steve@discuss.systems

@chandlerc @TomF yep, that’s exactly right

steve,
@steve@discuss.systems

@chandlerc @TomF at least some designs special-case left-shift by 1-3, however, since those come up all the time in addressing, and handle them like an unshifted op.
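A rough sketch of how a compiler heuristic might encode this special case. Everything here is hypothetical: `shifted_operand_uops` and `should_fold` are made-up names, and the costs (1 uop for an unshifted operand or a left shift by 1-3, 2 uops for any other shifted operand) are assumptions taken from this thread rather than from any specific core's documentation.

```python
def shifted_operand_uops(shift_kind: str, amount: int) -> int:
    """Assumed uop count for one ALU op with a shifted register operand."""
    if amount == 0:
        return 1  # no shift at all: ordinary ALU op
    if shift_kind == "lsl" and 1 <= amount <= 3:
        return 1  # assumed fast path, handled like an unshifted op
    return 2  # assumed to crack into a shift uop plus an ALU uop


def should_fold(shift_kind: str, amount: int, num_uses: int) -> bool:
    """Fold the shift into every use unless hoisting it costs fewer uops."""
    hoisted = 1 + num_uses  # one standalone shift plus one plain op per use
    folded = num_uses * shifted_operand_uops(shift_kind, amount)
    return folded <= hoisted  # ties go to folding: fewer instructions


if __name__ == "__main__":
    print(should_fold("lsr", 7, 2))  # the pattern from the original post
    print(should_fold("lsl", 2, 3))  # addressing-style shift
```

With these assumed costs, the `lsr #7` pattern with two or more uses stays hoisted, while small left shifts get folded regardless of how many uses there are.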

chandlerc,
@chandlerc@hachyderm.io

@steve @TomF Ooof, that makes perfect sense, but it also makes me really want that aspect of any u-arch to be clearly documented, given that the compiler needs a pretty sharply different strategy to generate efficient code there.

Maybe LLVM already has this info? I've not seen surprising stuff in more "normal" addressing instruction sequences so far.

TomF,
@TomF@mastodon.gamedev.place

@chandlerc @steve So... while this is a reasonable request from a "control ALL the things" perspective, just be aware that (a) there's a huge variety of uarchs out there (b) they are far more complex internally than you probably think and (c) 99.9999% of the time this will not affect your performance in any measurable way :-)

chandlerc,
@chandlerc@hachyderm.io

@TomF @steve I mean, I'm somewhat aware of the diversity of uarchs out there... And I don't really want more knobs in the compiler. I hate them.

But I'm specifically saying that thresholds where encoding A vs. encoding B results in 2 vs. 1 uop seem very important to document and teach compilers about. Not every other difference. =D Nicer to not have them at all, but if they exist, we need to know? And this doesn't seem like a terribly frustrating threshold to model.

chandlerc,
@chandlerc@hachyderm.io

@TomF @steve I'm much more salty about the uarch thresholds of "only N instructions fitting criteria X within each aligned 32-byte encoded sequence" on Intel CPUs which swing perf by 10% - 20% and are nigh impossible to model even when they are documented...

TomF,
@TomF@mastodon.gamedev.place

@steve @chandlerc Haha - "lea" is such an ugly weird little instruction, but it turns out it's so annoyingly useful it sneaks into every arch :-)
