janneke, (edited )
@janneke@todon.nl avatar

If you run "guix pull" today, you get a package graph of more than 22,000 nodes rooted in a 357-byte program---something that had never been achieved, to our knowledge, since the birth of Unix: a Full-Source Bootstrap.

Edit: Add blog post link inline https://guix.gnu.org/en/blog/2023/the-full-source-bootstrap-building-from-source-all-the-way-down/






@reproducible_builds
@fsf
@fsfe

bugaevc,
@bugaevc@floss.social avatar

@janneke not to downplay the importance and coolness of this achievement, here's an uninformed question that you probably get a lot: how does this work with respect to depending on a Linux kernel (which is tons of C), some basic userland (or can it run as PID 1-and-only?), and x86 hardware (which... who knows what it does) to run this 357-byte binary?

If you can't trust a compiler to build your program correctly, why can you trust a kernel and some hardware to run your binary correctly?

janneke,
@janneke@todon.nl avatar

@bugaevc
Good question! Of course: you can't.

There is currently no good answer to that, other than that we chose to start by getting rid of the obviously unnecessary and "easy" binary seeds first. Or: different people have different interests and competences; if we start, we'll probably get there someday. There are some ideas, though.

The least elegant but easiest "solution" would be to resort to Diverse Double-Compiling (DDC, https://dwheeler.com/trusting-trust/). The low-level tools (stage0, m2-planet, and mes) can easily do cross builds. You could build on different architectures, and on different kernels if you like, and compare package checksums.
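
As a rough illustration of the "compare package checksums" step, here is a minimal Python sketch. It assumes the two builds have already been produced on different machines and copied into two local directories; the function names `tree_checksums` and `compare_builds` are made up for this example and are not part of Guix, Mes, or stage0.

```python
import hashlib
from pathlib import Path


def tree_checksums(root):
    """Map each file's path (relative to root) to its SHA-256 hex digest."""
    root = Path(root)
    return {
        str(p.relative_to(root)): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def compare_builds(root_a, root_b):
    """Return the sorted relative paths whose presence or content differs."""
    a, b = tree_checksums(root_a), tree_checksums(root_b)
    return sorted(path for path in a.keys() | b.keys() if a.get(path) != b.get(path))
```

An empty result means the two output trees are bit-for-bit identical. Anything else is a path worth investigating, either as a reproducibility bug or as something worse.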

We did something like this for Mes (all x86_64-linux, though) at the fifth reproducible builds conference (RB-V, https://guix.gnu.org/en/blog/2019/reproducible-builds-summit-5th-edition/)

Running as PID 1: During the same RB-V conference, Ludovic Courtès prototyped building a Guix package in the initial ramdisk. After the build the package is discarded, but before that its checksum is printed and can be checked with a build under GNU/Linux.

People have been working to build tiny kernels, such as: https://github.com/ironmeld/boot2now.

Also, Stage0 was designed to run on the Knight VM too; one could imagine running that on simpler hardware, or running the VM on different machines/architectures, dunno.

civodul,
@civodul@toot.aquilenet.fr avatar

@bugaevc Speaking of the role of the kernel, an interesting question is how to implement isolated builds on the Hurd: see “Isolated build environments” at https://guix.gnu.org/en/blog/2020/childhurds-and-substitutes/ for an overview.

I’m curious what you think of this!

@janneke

bugaevc,
@bugaevc@floss.social avatar

@civodul hi!

I'm probably not Guix-savvy enough to fully comprehend the issue here, but as I understand it, you want to be super explicit about what each package needs to be built. Do you include libc, cc, and binutils in this list of dependencies? (I imagine you do, otherwise it wouldn't be reproducible.) Apparently you do include /bin/sh.

@janneke

bugaevc,
@bugaevc@floss.social avatar

@civodul

So yeah, the Hurd servers aren't much different or any more "external" to the environment than /bin/sh. I don't think you should be firmlinking stuff from the host; you should probably just spawn a mini subhurd for each build. You want pipes and fork/exec, so you need pflocal, proc, and exec servers.

@janneke

bugaevc,
@bugaevc@floss.social avatar

@civodul

(Also /servers/proc, mentioned in your mail, is not a thing, of course 🙂 — the proc server is one of the two servers, the other one being auth, that are not accessible through the file system, but only through _hurd_ports.)

@janneke

bugaevc, (edited )
@bugaevc@floss.social avatar

@civodul

Your mail about /bin/sh also raises an interesting topic of paths. Do you want to change /dev/null and /servers/exec to some other (hash-derived I would imagine) paths? Sounds wild but you totally could!

You could then either patch glibc (and everyone who expects to find /dev/null at its usual place), or provide symlinks. But then again I don't know enough about Guix to judge here.

@janneke

bugaevc,
@bugaevc@floss.social avatar

@civodul

Unfortunately all this wouldn't help you too much with bootstrapping from source, since you cannot do I/O easily on the Hurd like you can on Linux with a few instructions; you need to do RPCs and all that (even to get your argv). This is of course hidden from you when you're using glibc.

@janneke

civodul,
@civodul@toot.aquilenet.fr avatar

@bugaevc Exactly! So the question becomes: assuming you have nothing but the Mach syscalls at your disposal, what chain of programs building on each other would eventually let you run a proc and an exec server so you have the beginning of a POSIX build environment?

The whole stage0/M2/Mes story on Linux was quite a puzzle; its Hurd version would push it further. :-)

@janneke

bugaevc,
@bugaevc@floss.social avatar

@civodul

> Also, one could argue that things like /dev/null have a well-defined interface that’s set in stone and that, consequently, how they’re implemented does not matter at all.

Yes, but also no: there certainly can be differences in behavior that are allowed by the interface (where it explicitly doesn't guarantee something), but (due to bugs) can influence the outcome. For instance, does every write to /dev/null always write the whole buffer, or can there be short writes?

@janneke

bugaevc,
@bugaevc@floss.social avatar

@civodul

Or: can a signal interrupt a write to /dev/null? (On SerenityOS the answer used to be no, on the Hurd it's a resounding yes, dunno about Linux.)
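
To make the short-write and interrupted-write point concrete, here is a minimal Python sketch of the loop a careful caller needs if it cannot assume that one write consumes the whole buffer. The helper name `write_all` is hypothetical. Note that Python 3.5 and later already retry on EINTR internally, so the `except` branch is purely defensive.

```python
import os


def write_all(fd, data):
    """Write all of data to fd, looping over short writes."""
    view = memoryview(data)
    total = 0
    while total < len(view):
        try:
            # os.write may accept fewer bytes than offered (a short write).
            written = os.write(fd, view[total:])
        except InterruptedError:
            # EINTR: a signal arrived before any byte was written; retry.
            continue
        total += written
    return total
```

A program that calls write once and ignores the return value behaves identically only on systems where the device never returns early, which is exactly the kind of implementation detail that could leak into a build result.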

@janneke

civodul,
@civodul@toot.aquilenet.fr avatar

@bugaevc The Hurd code lives in /gnu/store/…-hurd-, but the translation points in the build environment would remain /dev/ and /servers/*. Changing that would be impractical and bring nothing.

@janneke

bugaevc,
@bugaevc@floss.social avatar

@civodul

Here's a fun little problem: if you have lost your proc and auth ports, but still have your fs root dir port, how can you recover those two?

@janneke

civodul,
@civodul@toot.aquilenet.fr avatar

@bugaevc Possibly (but not necessarily) by looking up /servers/proc for the first one; as for auth, it’s forever lost?

@janneke

civodul,
@civodul@toot.aquilenet.fr avatar

@janneke @bugaevc Actually I keep making the same mistake: there’s no /servers/proc but for some reason we have it in childhurds, just with no translator on it (I may be the guilty party :-)).

bugaevc,
@bugaevc@floss.social avatar

@civodul

Yes, /servers/proc is not it :)

I was thinking of the following scheme, which I have not tried, so this is just a theory.

You create an executable (perhaps as an unnamed file) that is setuid to yourself, and then exec it (not over your own task, unless you want that), without passing auth or proc ports (as you have none).

@janneke

bugaevc,
@bugaevc@floss.social avatar

@civodul

The translator notices this and creates a new auth handle based on its idea of your effective uids/gids (see libfshelp/exec-reauth.c); and then the exec server gives the new task a fresh proc port. You cannot access the new task because of setuid/EXEC_SECURE, but as you created the executable you still control what it does.

@janneke

bugaevc,
@bugaevc@floss.social avatar

@civodul

In particular it may send its proc/auth ports back to the original task, and the original proc port may then be recovered by a simple

proc_task2proc (other_proc, mach_task_self (), &my_proc)

The exact auth port I don't think can be recovered, but at least you now have another auth port with your effective uids/gids.

@janneke

civodul,
@civodul@toot.aquilenet.fr avatar

@bugaevc The build environment includes nothing but the explicitly-declared userland dependencies. If a package depends on GCC and Binutils, it gets them; if not, it doesn’t.

There’s no /bin/sh there—no /bin, no /usr, nothing.

On Linux, there’s /dev and /proc, but for separate namespaces.

@janneke

  • bugaevc,
    @bugaevc@floss.social avatar

    @theruran @janneke I was thinking something along these lines:

    find an "open source hardware" board where you can somehow verify the hardware isn't playing games on you (in particular not running all of your code in a nearly undetectable hypervisor, like we know Intel does...), probably some RISC-V board

    bugaevc,
    @bugaevc@floss.social avatar

    @theruran @janneke

    run your bootstrapping code on it with no OS whatsoever; hopefully it doesn't need much from the OS

    you'd have to build in a serial driver or something like that (blinking LEDs is cool but you can't input program source this way), not that I have any idea about hardware

  • theruran,

    @bugaevc @janneke and the Hurd could be another approach, right? it can host GCC to build Linux already?

    bugaevc,
    @bugaevc@floss.social avatar

    @theruran @janneke the Hurd surely can run GCC and cross-compile Linux; but I'm not sure you would be winning much, for two reasons:

    1. It's nowhere near as trivial to do "syscalls" as on Linux — on Linux you place some values into some registers and perform "int 0x80" or "syscall", and that's it, you've called write or exit. On the Hurd, these all are implemented in glibc on top of Mach IPC, and that needs quite a lot of code to happen.
    bugaevc,
    @bugaevc@floss.social avatar

    @theruran @janneke Here's a project of mine where I simply print "Hello world" without relying on glibc: https://github.com/bugaevc/hello-hurd — but that too is written in C, imagine writing it all in hex.

    2. Linux is huge, but you can build it in a minimal configuration (see https://tiny.wiki.kernel.org/). Mach may be a microkernel, but it's minimal in functionality, not size. In fact it's a meme in the microkernel community just how large for a microkernel Mach is. But I don't have any numbers to quantify this.
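
The first point above (a Linux syscall is just a number plus a few argument values) can be sketched in Python as follows. This is only an illustration, not a glibc-free program: it goes through libc's generic `syscall()` wrapper via ctypes, and the helper name `raw_write` and the `SYS_WRITE` table are assumptions made for this example. It also assumes the Python process runs in the kernel's native ABI (for example, not a 32-bit process on an x86_64 kernel).

```python
import ctypes
import platform
import sys

# Linux syscall numbers for write(2); they differ per architecture.
SYS_WRITE = {"x86_64": 1, "i386": 4, "i686": 4, "armv7l": 4,
             "aarch64": 64, "riscv64": 64}


def raw_write(fd, data):
    """Invoke the Linux write syscall by number through libc's syscall()."""
    if not sys.platform.startswith("linux"):
        raise OSError("this sketch only applies to Linux")
    number = SYS_WRITE[platform.machine()]  # KeyError on unlisted machines
    libc = ctypes.CDLL(None, use_errno=True)
    libc.syscall.restype = ctypes.c_long
    result = libc.syscall(
        ctypes.c_long(number), ctypes.c_long(fd),
        ctypes.c_char_p(data), ctypes.c_size_t(len(data)),
    )
    if result < 0:
        raise OSError(ctypes.get_errno(), "write syscall failed")
    return result
```

On Linux, `raw_write(1, b"Hello world\n")` prints the greeting with nothing but a syscall number, a file descriptor, a pointer, and a length. On the Hurd there is no such single number to call: the equivalent operation is an RPC over Mach IPC.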
    bugaevc,
    @bugaevc@floss.social avatar

    @theruran @janneke another issue that arises once you take hardware into the picture is: you cannot cross-compile hardware; nor can you hash your hardware and compare it to some known-good hardware.

    You may trust your board to really run the code you give it; you may manage to bootstrap a x86_64-linux-gnu GCC on it and confirm it's identical to what your distro ships. But that still doesn't guarantee you that your Intel processor actually runs what your binary says.

    civodul,
    @civodul@toot.aquilenet.fr avatar

    @bugaevc We need awareness without resignation: awareness that many other issues are yet to be addressed, while making progress in every way we can.

    That Linux userland is “addressed” by what @janneke et al. have been doing is already a huge step forward, one long considered unachievable. Others will work on hardware and someday folks will meet halfway. :-)

    @janneke @theruran

    akyle,

    @civodul @bugaevc @janneke @theruran

    For the hardware side of things, @bunnie has been making some awesome progress towards methods for trusting modern chips, for example the infra-red in situ method of silicon inspection: https://www.bunniestudios.com/blog/?p=6712

    janneke,
    @janneke@todon.nl avatar

    @akyle @civodul @bugaevc @theruran @bunnie
    Oh, that's waay cool!

    vertigo,

    @bugaevc @theruran @janneke I have ideas on how to handle hardware interfaces.

    I tried to realize this before with my Kestrel Computer Project. But, zero people showed any willingness to help, because it was still kinda sorta too abstract. Or, maybe I was just too nostalgic (the idea was to "revert progress" and develop a neo-retro platform from which we could once again move forward).

    One of my core ideas was to standardize (memory-mapped) I/O register interfaces for a wide variety of peripheral classes using an I/O fabric like (or even forking) RapidIO, kind of like how the most sophisticated video cards of today still support true-blue VGA emulation. I reject the idea that it's "impossible" to achieve with other classes of peripherals.

    But, that project is dead now due to (1) nobody willing to help out, and (2) me burning out.

    janneke,
    @janneke@todon.nl avatar

    @vertigo @bugaevc @theruran
    Yeah. One of the things that helped a lot, especially in the first couple of years, was the enormous amount of mental support that I got.

    stikonas,
    @stikonas@fosstodon.org avatar

    @bugaevc @janneke https://github.com/fosslinux/live-bootstrap project has some initial code to bootstrap Linux. It can build Linux but we still need to kexec into it (which shouldn't be too hard).

    davidak,

    @janneke what an impressive and historic achievement!

    I appreciate the efforts of all those involved. You have my greatest respect!

    NixOS is also implementing it: https://github.com/NixOS/nixpkgs/pull/227914

    @civodul

    civodul,
    @civodul@toot.aquilenet.fr avatar

    @davidak @janneke That’s great news!

    janneke,
    @janneke@todon.nl avatar

    @civodul @davidak
    If we knew we would have mentioned it in our blog post!

    Just found that it was @emilyposting who started this recent @nixos_org effort, yay!

    aziz,

    @janneke @fsf @fsfe @reproducible_builds very inspiring very happy best wishes

    janneke,
    @janneke@todon.nl avatar

    @aziz @fsf @fsfe @reproducible_builds
    Thank you! We (well, mostly @ekaitz_zarraga and @stikonas) are getting very close to bootstrapping tinycc on riscv64. Exciting times ahead!

    stikonas,
    @stikonas@fosstodon.org avatar

    @janneke @aziz @fsf @fsfe @reproducible_builds @ekaitz_zarraga Indeed! Right now we can bootstrap all the way from to , then use to build very first build of (we can call it mes-tcc). mes-tcc can then build the next build of tinycc (boot0-tcc). Unfortunately, at the moment boot0-tcc segfaults. Today, I fixed one crash which was due to Global Offset Table being all zeros but it turns out we are now hitting another segfault, so more work is needed.

    kirschwipfel,
    @kirschwipfel@nerdculture.de avatar

    Gigantic achievement! My respect and congratulations!
    @janneke @fsf @civodul @reproducible_builds

    janneke,
    @janneke@todon.nl avatar

    @kirschwipfel @fsf @civodul @reproducible_builds
    Thanks! Still lots to do, but yeah we're very happy with where we are now!

    janneke,
    @janneke@todon.nl avatar

    @mattodon @fsf @fsfe @reproducible_builds
    Thanks, we think so too 😄

    maltimore,
    @maltimore@social.tchncs.de avatar

    @janneke @fsf @fsfe

    What I don't understand: such a bootstrap must've been done at least once before in history, otherwise we wouldn't have any compiled programs, right? Why wasn't it possible to go the route of the previous bootstrap again? I'd appreciate a link to some further reading on this. 🙏

    janneke,
    @janneke@todon.nl avatar

    @maltimore @fsf @fsfe
    Some reasons for this include: the process wasn't documented, the code was lost, it used many different kinds of hardware, it took about 50 years, and untangling history is hard.

    For that last remark, just look at the Java or Rust bootstraps (they needed many, many steps) or the sheer impossibility of bootstrapping the NPM/Node disaster.

    stikonas,
    @stikonas@fosstodon.org avatar

    @janneke @maltimore @fsf @fsfe Also historical bootstrap might have used proprietary programs, so even historical bootstrap documentation might not help.

    janneke,
    @janneke@todon.nl avatar

    @stikonas @maltimore @fsf @fsfe
    When I said "source code was lost" I didn't even think of this, but you're right: GNU's not Unix!

    jonny,
    @jonny@neuromatch.social avatar

    @janneke
    @hipsterelectron this seems like something up ur alley

    csolisr,

    Today I learned that you managed to make the build of literally everything get based on a single binary file

    janneke,
    @janneke@todon.nl avatar

    @csolisr @fsf @fsfe
    Of course, lots of people were involved, but yeah \o/

    csolisr,

    @janneke @fsf @fsfe

    And here is where I lament the singular/plural you merger, meant y'all there, great job

    janneke,
    @janneke@todon.nl avatar

    @csolisr @fsf @fsfe ah, of course you did, oh well! ;)

    NGIZero,
    @NGIZero@mastodon.xyz avatar

    Wow congratulations @janneke that is truly impressive!

    janneke,
    @janneke@todon.nl avatar

    @NGIZero
    Thanks! And thank you so much for your continuing support!!!

    byterhymer,

    @janneke @fsf @fsfe

    What next? Pairing something such as this with LiveHD (e.g. https://github.com/masc-ucsc/livehd) to bootstrap the "hardware" from source too?

    janneke,
    @janneke@todon.nl avatar

    @byterhymer @fsf @fsfe
    For me personally (see the blog post) that would be: cleaning up the FSB (we cut quite some corners), getting rid of the ancient gcc-2.95.3 dependency (directly build gcc-4.6.4), getting Gash/Gash-Utils to run on top of Mes, and RISC-V support, hopefully followed by ARM/AArch64.

    But yeah, I really hope that others will address the hardware and kernel bits of the bootstrap!

    ArneBab,
    @ArneBab@rollenspiel.social avatar

    Congratulations @janneke ! It’s so awesome to see you reach this goal!

    And I’m very much looking forward to seeing what else you accomplish.

    janneke,
    @janneke@todon.nl avatar

    @ArneBab
    Thanks, Arne! Maybe something Hurd'ish?

    ArneBab,
    @ArneBab@rollenspiel.social avatar

    @janneke That would be very cool, yes!

    janneke,
    @janneke@todon.nl avatar

    @ArneBab
    I got Mes built on the Hurd once, so there is some support. No support in Stage0 yet, though.

    civodul,
    @civodul@toot.aquilenet.fr avatar

    @janneke It took a while to get it merged, but now we can celebrate! 🎉

    Also, great to see that is funding the next steps, with @ekaitz_zarraga and others hard at work!

    clacke,

    I was going to ask, @janneke , how much of your time in these last several years has been funded and how much have you done on your own time?

    janneke, (edited )
    @janneke@todon.nl avatar

    @clacke
    I'd like to think that I'm well funded which enables me to do things that are important and fun.
    I'd also like to think that all my time is my own; technically my current Hurd work isn't funded, but that's how Mes started off too, so yeah.

    joeyh,

    @janneke great accomplishment

    Needing a kernel is an important footnote though.

    giomasce,

    @joeyh @janneke I had begun working on that part a few years ago: https://gitlab.com/giomasce/asmc. Then life happened and I ran out of steam for that, but hopefully I was not too far from compiling and running Linux at computer boot time, from a handful of KB of binary code.

    dthompson,

    @janneke holy shit I didn't know that this much progress had been made in bootstrapping. I thought we were still maybe years away from full source bootstrap. incredible work!

    dthompson,

    @janneke does this work for all architectures that guix supports or a subset?

    janneke,
    @janneke@todon.nl avatar

    @dthompson This is currently only i686-linux and x86_64-linux.

    Work has been ongoing for ARMv7 (and AArch64) for quite some time now, but is stalled, probably until we have RISC-V.
