whitequark,
@whitequark@mastodon.social avatar

is assembly language (for the sake of argument, x86 assembly as understood by an assembler released by intel) typed?

whitequark,
@whitequark@mastodon.social avatar

many of you have never implemented a CPU and it shows o:3

(that in itself isn't a problem, it's just funny to see someone who's never implemented a CPU have absolutely ironclad confidence about proprietary details of Intel's x86 implementation specifically)

demofox,
@demofox@mastodon.gamedev.place avatar

@whitequark how much can a div cost Michael, 1000 cycles?

natty,
@natty@astolfo.social avatar

@whitequark ​:neofox_melt_3:​ we gave up on trying to do FPGAs after uni made us do VHDL in a Windows XP VM running at 5 FPS and later failing finals on a randomly chosen question about FPGAs

The world doesn't want us to make chips

whitequark,
@whitequark@mastodon.social avatar

@natty this is why I dedicated a non-trivial fraction of my entire life to building tools like Amaranth and YoWASP, yeah

whitequark,
@whitequark@mastodon.social avatar

on the topic of types, the synthesis of the very large amount of replies I got so far is "assembly language has rules and classifying these rules into syntactic rules and typing rules (among others) requires drawing a clear line between the two, which can be hard; however, one could argue that type information has to propagate through the program, and assembly language generally lacks that"

whitequark,
@whitequark@mastodon.social avatar

the next time i want to feel awake but coffee isn't cutting it i'll ask a question about RISC vs CISC, i think

whitequark,
@whitequark@mastodon.social avatar

if I think of what a typechecker does, it involves maintaining some kind of type environment, where syntactic positions (such as names, but could be stack slots for example) are mapped to the types of the values stored in them

most assemblers don't do anything like that, so i think i went from "it's probably typed" to "it's probably not typed" on the basis of that, and despite the fact that there's several incompatible kinds of values involved
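A minimal sketch of the "type environment" idea described above, assuming a made-up three-instruction assembly. The mnemonics (`load_int`, `load_float`, `add_int`), the type names, and the `check` function are all hypothetical; no real assembler is claimed to work this way, which is rather the point of the post.

```python
# Toy sketch: a type environment for an invented assembly-like language.
# Every mnemonic and type name here is hypothetical, for illustration only.

def check(program):
    env = {}      # register name -> type of the value last stored there
    errors = []
    for lineno, (op, dst, src) in enumerate(program, start=1):
        if op == "load_int":
            env[dst] = "int"
        elif op == "load_float":
            env[dst] = "float"
        elif op == "add_int":
            # The check depends on what earlier instructions put in the
            # registers, i.e. type information propagates through the program.
            for reg in (dst, src):
                if env.get(reg) != "int":
                    errors.append(
                        f"line {lineno}: {reg} holds {env.get(reg)}, expected int"
                    )
            env[dst] = "int"
        else:
            errors.append(f"line {lineno}: unknown op {op}")
    return errors


program = [
    ("load_int", "r0", None),
    ("load_float", "r1", None),
    ("add_int", "r0", "r1"),
]
errors = check(program)
print(errors)
```

The third instruction is rejected only because of what the second one stored in `r1`; a check confined to a single instruction and its operands could not see that.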

raggi,
@raggi@rag.pub avatar

@whitequark yeah, that's definitely a fair abstraction as we tend to live there all the time and i would probably hold that position in some context too.

it's very tied to value use in a specific context though, and also breaks down at the other end - the very very high level space - where there are many types that never manifest and are never really subject to checks in the common ways, but humans talk about entirely independently of any program or implementation

whitequark,
@whitequark@mastodon.social avatar

@raggi certainly; it's not any more possible to say exactly what a type is than what a man is. but this was still insightful, and that was one of my goals!

WomanCorn,
@WomanCorn@schelling.pt avatar

@whitequark @raggi

What is a type? A miserable little pile of values!

(Sorry)

SludgePhD,
@SludgePhD@mastodon.social avatar

@whitequark I think the important part is that those values are typically incompatible on a syntactic level already, so there's no need for an additional type-checking layer

Paxxi,
@Paxxi@hachyderm.io avatar

@whitequark I'm guessing yes, kinda? Instructions are typed right? operating on a signed/unsigned of size X.
Do you get any help to track types? Not that I know of

mcc,
@mcc@mastodon.social avatar

@whitequark I haven't yet implemented a CPU but I HAVE implemented an x86-64 instruction decoder* and this is why (see other reply) I flat out refuse to attempt to answer questions about the x86 ISA

* This is a lie. I implemented the test suite for an x86-64 instruction decoder. This by itself was unspeakably complicated

chrisvest,
@chrisvest@mastodon.social avatar

@whitequark I have no idea how hardware is put together, but isn’t it just like you throw a bunch of numbers into a spreadsheet and then feed it to an eldritch abomination until a netlist comes out?

whitequark,
@whitequark@mastodon.social avatar

@chrisvest you got the basics down right!

chrisvest,
@chrisvest@mastodon.social avatar

@whitequark thinking about it, the closest I’ve ever gotten to hardware is probably that I’ve worked on the neo4j graph database, which IBM has used to do cycle timing verification of the Power9 chip (and maybe other stuff, I don’t know)

trcwm,
@trcwm@mastodon.social avatar

@whitequark no, it’s usually toggled from the front panel.

squid,

@whitequark Possibly a silly question, but does the answer to this change under CHERI, where there's a stronger separation between arbitrary bytestrings and actual pointers?

whitequark,
@whitequark@mastodon.social avatar

@squid yeah CHERI is pretty unambiguously dynamically typed

xgranade,
@xgranade@wandering.shop avatar

@whitequark It's typed, it's just that there's only one type.

(This is not serious, I have no earthly idea.)

whitequark,
@whitequark@mastodon.social avatar

@xgranade this is a take some people have very seriously and i think it's, at least, not well motivated

agocke,
@agocke@hachyderm.io avatar

@whitequark type checking is a form of semantic analysis and I’m not aware of any assemblers that do semantic analysis, but maybe I missed something.

whitequark,
@whitequark@mastodon.social avatar

@agocke does deciding whether the combination of prefixes, mnemonic, and operands is valid count as syntactic or semantic analysis? I could see it either way on its face

raggi,
@raggi@rag.pub avatar

@whitequark yes, trivial gnu example:

_start:
add r1,0x1234

<source>:2: Error: immediate expression requires a # prefix -- `add r1,0x1234'

whitequark,
@whitequark@mastodon.social avatar

@raggi isn't that a syntax error rather than a type error?

(also I don't think x86 has an r1?)

raggi,
@raggi@rag.pub avatar

@whitequark it’s valid syntax in other contexts, so the syntax indicates some kind of property about the value, but it isn’t the value itself. Doesn’t that approximate the behavior of a type?

whitequark,
@whitequark@mastodon.social avatar

@raggi to you, what is a type?

raggi,
@raggi@rag.pub avatar

@whitequark at a fundamental level a type defines a set of allowed values, and gives that set an identifier. at an increasing level of practicality typed operations accept values from a set of types. at a more practical level a type validator accepts or rejects types in program context based on some implementation's ability to handle them in that context

whitequark,
@whitequark@mastodon.social avatar

@raggi right! I guess if your context begins and ends at the boundary of a single operation it's hard to distinguish that from just syntax

mcc,
@mcc@mastodon.social avatar

@whitequark I feel like I have an answer to this but the problem is you specified x86 which is so large I don't feel I can make any particular statements about it with confidence

whitequark,
@whitequark@mastodon.social avatar

@mcc replace x86 with ARMv8?

mcc,
@mcc@mastodon.social avatar

@whitequark hm ok this is imo answerable but I am by the side of the road so it will be later

typeswitch,
@typeswitch@gamedev.lgbt avatar

@whitequark At the instruction level, there are a few types for operands, but we could call this syntax. It's not much of a type system if there's no propagation of type information throughout the program.

But the main thing that is untyped with assembler are the input & output states of basic blocks, or the arrangement of control flow, or the ABIs, and so on. This distinguishes assembler from a high-level language like C.

whitequark,
@whitequark@mastodon.social avatar

@typeswitch this is a really good point!

crzwdjk,
@crzwdjk@mastodon.social avatar

@whitequark It's a pretty weak type system but it's there, the thing is that it does implicit bitcasts between types. Though it even has for example different pointer types.

8051enthusiast,
@8051enthusiast@mastodon.social avatar

@whitequark hmm, would you consider "value has to be known at compiletime" to be part of the type system?

whitequark,
@whitequark@mastodon.social avatar

@8051enthusiast this just in: x86 assembly stabilized const generics decades before rust did

tsturm,
@tsturm@famichiki.jp avatar

@whitequark @steve On the most basic level, it’s all bytes and bits. I’d call that untyped. There were ways in 8-bit CPUs to deal with floating point numbers and strings through code, but the registers in these CPU had no type system.

Not an expert on modern x86, but all the way into the late 1990s that was true there, too.

whitequark,
@whitequark@mastodon.social avatar

@tsturm @steve on the most basic level Rust is all bytes and bits (types are erased during compilation) but no one would call Rust untyped

tsturm,
@tsturm@famichiki.jp avatar

@whitequark @steve Yeah, but the language (Rust or C) has types.

Assembly (at least for the old 8/16 bit CPUs) didn't have that concept. It's all just "move a byte from A to B" and "shift the bits to the left by one". No language concepts of what the bytes represent.

whitequark,
@whitequark@mastodon.social avatar

@tsturm @steve is assembly not a kind of programming language? (you'll note I've specified a very particular type and implementation of assembly in my post)

steve,
@steve@discuss.systems avatar

@tsturm @whitequark This would be equally true for C though. It's all char *.

whitequark,
@whitequark@mastodon.social avatar

@steve @tsturm should we be talking about strict aliasing here

julia,

@whitequark not really

whitequark,
@whitequark@mastodon.social avatar

@julia why?

is the C language typed? why?

julia,

@whitequark assembly works on bytes, you can tell it to use a floating point instruction on an integer all willy nilly, it doesn't care

whitequark,
@whitequark@mastodon.social avatar

@julia how would you use e.g. pshufb on an integer register (say, eax)?

julia,

@whitequark I don't mean a register when I say integer, I mean a byte in memory in the shape of an integer

julia,

@whitequark for whatever it's worth, I see C as weakly typed

whitequark,
@whitequark@mastodon.social avatar

@julia are registers the same type as memory?

julia,

@whitequark not really

whitequark,
@whitequark@mastodon.social avatar

@julia so, that's at least two different types then?

julia,

@whitequark they're not really types? they're just different regions of the hardware

whitequark,
@whitequark@mastodon.social avatar

@julia but i'm not talking about hardware (indeed, nothing requires that x86 be implemented in hardware); i'm talking about the abstraction that is the intel x86 assembly language

julia,

@whitequark I suppose there are "types" in the abstraction above the true CPU instruction set but shrug

whitequark,
@whitequark@mastodon.social avatar

@julia what's a true CPU instruction set?

julia,

@whitequark whatever instruction set is used by the CPU after it's JIT from x86

julia,

@whitequark Most modern CPUs don't actually execute x86 instructions directly, they translate it into another instruction set that is specific to that CPU- not too unlike webassembly.

whitequark,
@whitequark@mastodon.social avatar

@julia can you show me an example of an instruction in that ISA for any x86 CPU you like?

julia,

@whitequark they're usually proprietary and not very well documented, so no

whitequark,
@whitequark@mastodon.social avatar

@julia if you can't name a single instruction from it how can you be certain it's something that exists and can be considered an Instruction Set Architecture?

julia,

@whitequark okay, that's just outright conspiratorial- it's been publicly talked about quite a bit by both Intel and AMD.

whitequark,
@whitequark@mastodon.social avatar

@julia it's not, you're just repeating something you have no evidence of. I designed enough CPUs to know how they work; that knowledge leads me to conclude that comparing what's going on to "CISC JITted to RISC" is somewhere between "wrong" and "actively misleading"

julia,

@whitequark whatever, I'm playing stardew. I don't care to argue about it.

Modern CPUs run a whole damn OS under the hood, usually Minix. They're not as simple as just executing what they're given.

whitequark,
@whitequark@mastodon.social avatar

@julia that's just outright false

julia,

@whitequark google intel management engine and scheduler

whitequark,
@whitequark@mastodon.social avatar

@julia so, you're telling me that if I take apart the ME firmware image for my "Intel(R) Core(TM) i9-9880H CPU" that i have in this laptop, I'll find Minix in it, is that right? shall we place bets? ^_^

puppygirlhornypost,

@whitequark @julia And while these are sensationalist (AMT isn't enabled by default) the point stands that part of the module has MINIX3 running.

puppygirlhornypost,

@whitequark @julia It's not exactly like it's something we can look at? I mean Haswell maybe. Not modern ME though.

whitequark,
@whitequark@mastodon.social avatar

@puppygirlhornypost @julia yes, exactly my point--a few Intel SoCs definitely used to run Minix3, but as far as I know modern ones don't, and "modern CPUs run a whole damn OS under the hood, usually Minix" is just outright false

puppygirlhornypost,

@whitequark @julia well, the problem is that ME still exists? It does things? What it does we're not exactly sure. I have a laptop that is Haswell based and was bought under the dell program for intel me disablement. I can still play drm videos on it fine? The wikipedia page seems a bit off. Still, again it's not like we can take a look at what it's running. We'd destroy the damn thing. I find it funny that the wikipedia page makes it seem like those dell laptops are so hard to get.

whitequark,
@whitequark@mastodon.social avatar

@puppygirlhornypost @julia there's actually a TOCTTOU bug in firmware signature checking on Haswell (I think? I'd have to double-check but I think HSW is old enough and KBL is def too new) which lets you run arbitrary unsigned firmware on production-fused CPUs

so you could just... not put the ME firmware image in. if I recall, on many of the CPUs, if you don't load ME at all they shut themselves down 30 min after bootup

whitequark,
@whitequark@mastodon.social avatar

@puppygirlhornypost @julia (basically, it first reads the firmware to check its signature and then it reads it again to load the code. with an FPGA and very little effort you can make those two different, unrelated firmwares. it's like a weekend project)
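A minimal sketch of the check-then-reread pattern described above, assuming a storage device that can return different contents on each read. `SwitchingFlash`, `is_trusted`, and both loader functions are hypothetical names, and the SHA-256 comparison is only a stand-in for real signature verification; none of this reflects the actual firmware loader.

```python
import hashlib


class SwitchingFlash:
    """Hypothetical storage that serves a different image on each read."""

    def __init__(self, images):
        self._images = list(images)
        self._reads = 0

    def read(self):
        # Return the next image; keep returning the last one afterwards.
        image = self._images[min(self._reads, len(self._images) - 1)]
        self._reads += 1
        return image


def is_trusted(image, trusted_digest):
    # Stand-in for signature verification (assumption, not the real scheme).
    return hashlib.sha256(image).hexdigest() == trusted_digest


def load_vulnerable(flash, trusted_digest):
    if not is_trusted(flash.read(), trusted_digest):  # time of check
        raise ValueError("bad signature")
    return flash.read()                               # time of use: second read


def load_fixed(flash, trusted_digest):
    image = flash.read()                              # read once
    if not is_trusted(image, trusted_digest):
        raise ValueError("bad signature")
    return image                                      # use the checked copy


good = b"signed firmware"
evil = b"unsigned firmware"
digest = hashlib.sha256(good).hexdigest()

loaded_vulnerable = load_vulnerable(SwitchingFlash([good, evil]), digest)
loaded_fixed = load_fixed(SwitchingFlash([good, evil]), digest)
print(loaded_vulnerable, loaded_fixed)
```

The vulnerable loader ends up with the unchecked image because the data it verified and the data it used came from two separate reads; reading once and reusing that copy closes the gap.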

whitequark,
@whitequark@mastodon.social avatar

@julia an ISA is a contract between a producer and a consumer. in an x86 CPU, those are the same entity, and there isn't anything you can point at and conclusively call it an ISA since there's no meaningful boundaries between any of it

is the functional unit control word "the true ISA"? is the microcode word "the true ISA"? nah, it's a category error

toadjaune,
@toadjaune@hostux.social avatar

@whitequark @julia now you got me curious about what the microcode actually does.

I'm pretty sure my CS teachers told me it was doing on-the-fly-CISC-to-RISC-translation, back then, but well, that wouldn't be the first simplification teachers do ^^

By any chance, do you have any insights into this ?

whitequark,
@whitequark@mastodon.social avatar

@toadjaune the way microcode generally works is that some instructions (not all. e.g. xor eax, eax isn't going to be microcoded, although i think in case of modern x86 specifically, based on RE, there is a possibility to trap on most/any instructions and run microcode instead) cause a change of state where instead of decoding an instruction directly to a control word for execution units, it fetches that control word for execution units from the microcode ROM/RAM

whitequark,
@whitequark@mastodon.social avatar

@toadjaune there is often some possibility for branching (jumping and sometimes looping) in the microcode as well, like to implement rep movsb type instructions

the EU control word is rarely something that is recognized as "RISC" because it is usually quite wide and has all sorts of weird functionality in it that you wouldn't ever expose to an assembly language programmer in first place, unless you're doing VLIW
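A toy model of the dispatch described in the two posts above: some instructions decode straight to a control word, while others switch to fetching control words from a microcode store, which can loop. Everything here is invented for illustration (the control-word names, the ROM contents, the `LOOP_IF_ECX_NONZERO` marker, the `control_words` function); real control words are wide bit fields and real sequencers are far more involved.

```python
# Toy model only: control words are plain strings, and the table contents
# are hypothetical. This is not how any specific x86 CPU is organised.

HARDWIRED = {
    "xor eax, eax": "CW_ZERO_EAX",   # decoded directly, no microcode involved
}

MICROCODE_ROM = {
    "rep movsb": [
        "CW_LOAD_BYTE",
        "CW_STORE_BYTE",
        "CW_DEC_ECX",
        "LOOP_IF_ECX_NONZERO",       # sequencer branch, not sent to the EUs
    ],
}


def control_words(instruction, ecx=0):
    """Return the control words issued to the execution units."""
    if instruction in HARDWIRED:
        return [HARDWIRED[instruction]]

    rom = MICROCODE_ROM[instruction]
    issued = []
    pc = 0                           # index into the microcode routine
    while ecx > 0 and pc < len(rom):
        word = rom[pc]
        if word == "LOOP_IF_ECX_NONZERO":
            pc = 0 if ecx > 0 else pc + 1
            continue
        if word == "CW_DEC_ECX":
            ecx -= 1
        issued.append(word)
        pc += 1
    return issued


print(control_words("xor eax, eax"))
print(control_words("rep movsb", ecx=2))
```

With `ecx=2` the microcoded path issues the three-word body twice, while the hardwired instruction yields a single control word.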

whitequark,
@whitequark@mastodon.social avatar

@toadjaune this is like a bird's eye view. there is a decent amount of resources that go into more detail, plus some open source microcoded CPUs

righto.com has some really good historical examples, though modern ones would be fairly different

toadjaune,
@toadjaune@hostux.social avatar

@whitequark oh, ok, so if I get this right, what you mean is that it's not exactly CISC->RISC per se, more like a free-for-all where they do whatever they want, and only microcode when it's actually useful ?

That's kinda what I would have expected, I guess.

I'd be very curious to see what the resulting post-microcode IS actually looks like on modern hardware though. See what tradeoffs they make.

whitequark,
@whitequark@mastodon.social avatar

@toadjaune yes

I think Intel published some papers?

niconiconi,

@whitequark @toadjaune Intel CPUs have a performance counter IDQ.MS_UOPS for counting µops generated by the microcode sequencer. Hardware-based profilers (like perf, toplev.py and VTune) use them to generate two metrics called Retiring.Heavy_Operations.Microcode_Sequencer.{Assists,CISC}. This is usually a very rare occurrence. A typical "CISC" is for integer division before Ice Lake, a typical "Assist" is for denormalized floating-point values.

toadjaune,
@toadjaune@hostux.social avatar

@niconiconi @whitequark oh, I'm surprised that they expose that.

One would think that it could be used to reverse engineer what the microcode/cpu is actually doing, and considering the lengths they go to keep secrets in that space...

whitequark,
@whitequark@mastodon.social avatar

@toadjaune @niconiconi oh, people have reverse-engineered far more than that; look up Mark Ermolov's work

whitequark,
@whitequark@mastodon.social avatar

@julia is that something that exists? if it does, why is it more important than any other level of abstraction?

foobarsoft,
@foobarsoft@mastodon.social avatar

@whitequark I would argue yes. Different registers/names for different sizes and possibly even separate FPU registers.

You can’t meaningfully operate on a floating point number in BX, or use normal instruction on an ARM FPU register.

And matrix registers are their own thing too.

gsuberland,
@gsuberland@chaos.social avatar

@whitequark hmm. almost certainly in the context of SIMD. and arguable that most GPR instructions behave according to specific types. I guess "typed but with lots of implicit direct casting"?

gsuberland,
@gsuberland@chaos.social avatar

@whitequark hmm actually maybe that's not the best way to describe it. more like "almost no type equality enforcement".

thepi,
@thepi@urusai.social avatar

@gsuberland @whitequark square hole-ly typed language

gsuberland,
@gsuberland@chaos.social avatar

@thepi @whitequark haaaaa, yeah that's perfect

mega,
@mega@chaos.social avatar

@gsuberland @whitequark isn't that C too? (Automatic casts between integers and floats)

gsuberland,
@gsuberland@chaos.social avatar

@mega @whitequark hence the clarification reply

steve,
@steve@discuss.systems avatar

@whitequark Sure, you've got the GPR type and the SSE/AVX/etc type.

whitequark,
@whitequark@mastodon.social avatar

@steve right! that's my take on it as well. but I've also seen assembly used as an example of a prototypical untyped language, which just seemed wrong

tef,
@tef@mastodon.social avatar

@whitequark i think it depends on what structures you're modifying with assembly

it might be fairer to call it the byte/word oriented version of "stringly typed programming", rather than plain untyped

zarbet,

@whitequark @steve and then there's the VAX arithmetic shift and round a packed decimal string...

azonenberg,
@azonenberg@ioc.exchange avatar

@whitequark @steve I would say "minimally typed" for that reason.

There's a few special registers and GPRs, but there's no propagation of type information outside of register type (e.g. nothing stopping you from writing a bunch of fp64s to an AVX register and then using them as fp32).
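A small sketch of the fp64-as-fp32 point above, using Python's `struct` module on a plain byte buffer as a stand-in for a vector register (an assumption made for illustration; no actual AVX instructions are involved). The same 8 bytes are written as one little-endian double and then read back as two little-endian floats.

```python
import struct

# Write one fp64 value into an 8-byte buffer (standing in for register lanes).
raw = struct.pack("<d", 1.0)

# Nothing about the bytes records that they were written as fp64,
# so they can be reinterpreted as two fp32 values without complaint.
as_fp32 = struct.unpack("<2f", raw)

print(raw.hex())   # 000000000000f03f
print(as_fp32)     # (0.0, 1.875)
```

The reinterpretation succeeds silently and produces values unrelated to the original 1.0, which is the "no propagation of type information" behaviour the post describes.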
