7️⃣ Here's the 7th installment of my series of posts highlighting key new features of the upcoming v256 release of systemd.
In systemd we put a lot of focus on operating with disk images, specifically file system images that carry an expressive GPT partition table – something that we call DDIs ("Discoverable Disk Images").
DDIs are supposed to carry dm-verity authentication information, i.e. every single access to them is typically cryptographically protected, and linked back to a set of signing keys maintained by the system (ideally in the kernel keyring). systemd uses DDIs for the system itself, for systemd-nspawn containers, for systemd portable services, for systemd-sysext system extensions, for systemd-confext configuration extensions and more.
Many of systemd's tools have a --image= switch that tells them to operate on a DDI rather than directly on the file system.
In my personal view, I am pretty sure an OS (specifically: all the code and immutable vendor shipped resources) should be composed entirely from DDIs, because they bring a very high security level (i.e. every single read is validated when it is made), but are nicely composable, …
… so that you can have the basic OS image, layers of extensions on top, and finally app images as payload – all shipped as DDIs with strongest cryptographic guarantees.
So, while systemd has been strong on DDIs already, there's one thing we did not provide until v256: the ability to work with DDIs from unprivileged code. Mounting file systems is after all a privileged operation on its lowest level and (with some exceptions) not accessible to unprivileged users.
And that's for a reason: kernel file system developers mostly do not consider attacks on the kernel through rogue file system images a security vulnerability. File systems are very complex data structures after all, and guaranteeing that a rogue fs image can't exploit the kernel (or even just guaranteeing algorithmic boundedness) is very very hard. Moreover, file systems can carry dangerous things, such as SUID and SGID binaries, or executables with file system capabilities set.
Allowing unprivileged users to just arbitrarily mount file systems is hence a security issue on many levels.
With v256 we are opening this up nonetheless – within limits. Specifically, there's now a small IPC interface to which clients can pass an fd to a disk image file, and get back a mount fd they can attach to a location in the file system. To lock this down securely, a couple of requirements are enforced however.
Primarily this means: the DDI must come with valid dm-verity data and a signature recognized by the system's keyring (well, if this is missing a polkit authorization is attempted – the user might possibly allow this anyway, if polkit is letting them). And the client must also pass in a user namespace fd (which cannot be the system's main one) to which the mount is restricted.
Various tools (including: systemd-nspawn, systemd-dissect, RootImage= in service files) have been updated to make use of this new IPC service, and thus can now operate without privileges. (Or in other words: there are now unprivileged systemd-nspawn containers. Yay!)
And that's all for today. See you soon for the 8th installment of this series.
6️⃣ Here's the 6th installment of my series of posts highlighting key new features of the upcoming v256 release of systemd.
In the 2nd installment of this series we have already discussed system and service credentials in systemd a bit. Quick recap: these are smallish blobs of data that can be passed into a service in a secure way, to parameterize, configure it, and in particular to pass secrets to it (passwords, PINs, private keys, …).
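To make the recap a bit more concrete, here is a minimal sketch of how a service could pick up a credential from the inside. It assumes the service manager exports the credential directory in the `$CREDENTIALS_DIRECTORY` environment variable; the helper name `read_credential` and the credential name `db-password` are made up for illustration.

```python
import os
from pathlib import Path


def read_credential(name, env=os.environ):
    """Return the raw bytes of the named credential, or None if unavailable.

    Assumption: credentials show up as regular files below the directory
    named in $CREDENTIALS_DIRECTORY. 'name' is whatever the unit file
    called the credential (hypothetical example: "db-password").
    """
    cred_dir = env.get("CREDENTIALS_DIRECTORY")
    if not cred_dir:
        # Not started with any credentials passed in.
        return None
    try:
        return (Path(cred_dir) / name).read_bytes()
    except FileNotFoundError:
        # This particular credential was not passed to us.
        return None
```

A service would call something like `read_credential("db-password")` once at startup, instead of looking for the secret in an environment variable.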
@flexagoon For now the focus was privileged services, but we laid some groundwork in v256 to eventually make this available to user services too. (i.e. there are user-scoped encrypted credentials now, see 2nd installment of this series)
I have no experience with podman. I doubt they support credentials though, the OCI container world is generally a bit "laissez faire" on security topics. i.e. env vars, because yolo.
5️⃣ Here's the 5th installment of my series of posts highlighting key new features of the upcoming v256 release of systemd.
I am pretty sure all of you are well aware of the venerable "sudo" tool that has been a key component of most Linux distributions for a long time. On the surface it's a tool that allows an unprivileged user to acquire privileges temporarily, from within their existing login sessions, for just one command, or maybe for a subshell.
@tsrberry sudo has a dlopen() based plugin interface. It is used to use LDAP directories as source for sudo rules. (I think that's pretty much its only use.)
For this, highly privileged sudo, running in an inherited execution context controlled by an unpriv user, is doing complex network protocol parsing. What could possibly go wrong?
@swick i have no idea what toolbox is but if that is some container tech that allows you to escape the container via sudo then I probably don't even want to know.
@quitelost it's a reference to SO_PEERCRED, SO_PEERPIDFD, SO_PEERSEC, SO_PEERGROUPS. It's how a process can securely determine the peer of an AF_UNIX socket, by asking the kernel.
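For illustration, a small Linux-only sketch of the SO_PEERCRED part: it asks the kernel who sits at the other end of an AF_UNIX socket. The function name `peer_credentials` is made up, and the `"iII"` layout is my assumption about how `struct ucred` (pid, uid, gid) is laid out on common Linux ABIs.

```python
import os
import socket
import struct

# Assumed layout of struct ucred on Linux: pid_t pid; uid_t uid; gid_t gid.
UCRED_FORMAT = "iII"


def peer_credentials(sock):
    """Return (pid, uid, gid) of the peer of a connected AF_UNIX socket.

    The values come from the kernel, not from the peer, so the peer
    cannot lie about them.
    """
    raw = sock.getsockopt(
        socket.SOL_SOCKET, socket.SO_PEERCRED, struct.calcsize(UCRED_FORMAT)
    )
    return struct.unpack(UCRED_FORMAT, raw)


# Demo: with a socketpair both ends belong to this very process.
left, right = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
try:
    pid, uid, gid = peer_credentials(left)
finally:
    left.close()
    right.close()
```

A privileged service would do the same on each accepted connection and base its authorization decision on the result, rather than on anything the client sends.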
@mattdm ah, i wasn't aware of what it did, i thought it was just visudo by another name.
My first reaction to the concept is that it wouldn't fit into run0 because run0 is really just a service mgr frontend and what you describe is more of a file system access thing.
I am curious though: if I use sudo -e on /etc/shadow, does it place a copy of it temporarily in /tmp chowned to my unpriv user? My first reaction to that is that it sounds dangerous, dunno, you just exfiltrated a file there.
@muvlon @TheStroyer there's a knob for enabling system-wide nnp in systemd's system.conf. but for a general purpose distro it's probably too early to enable. But i think getting there would be a very worthy goal.
@lachlan yeah, su and sudo function very similarly. both are suid binaries. The biggest differences are in the interface: sudo reads a complex config file, while su just takes commands from the cmdline. Other than that, on the technical level they are the same.
@funkylab @mattdm that wouldn't work. editors are generally expected these days to update files atomically, i.e. when saving an edited file, they create a new one, write the whole new data to it, copy access mode and so on over, and then move it over the old file. That means other tools either see the old or the new file, never a half-written one. But that of course falls apart with any memfd or /proc/pid/fd/ shenanigans.
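The save pattern described above can be sketched roughly like this. It's a simplified illustration, not what any particular editor does: the name `atomic_write` is made up, and only the access mode is copied over (real editors would also care about ownership, xattrs and so on).

```python
import os
import shutil
import tempfile


def atomic_write(path, data):
    """Replace the file at 'path' with 'data' so readers never see a partial file."""
    directory = os.path.dirname(os.path.abspath(path))
    # Create the new file in the same directory, so the final rename
    # stays within one file system and is atomic.
    fd, tmp = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())
        if os.path.exists(path):
            # Carry the access mode of the old file over to the new one.
            shutil.copymode(path, tmp)
        # Atomically move the new file over the old one.
        os.replace(tmp, path)
    except BaseException:
        try:
            os.unlink(tmp)
        except FileNotFoundError:
            pass
        raise
```

Note that the path ends up pointing at a different inode afterwards. That is exactly why handing the editor a memfd or a /proc/pid/fd/ path doesn't fit: the editor replaces the name, it does not write into the object you handed it.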
@mattdm well, exfiltrating explicitly is fine. run0 is a priv escalation tool, and if that's what you want to do with your privs then that's entirely fine.
I am more concerned about implicit exfiltration: i.e. you copy out protected data to /tmp of all places, and when your editor crashes it will be left there, and no one even thinks about it, because it's invisibly hidden in sudo.
But dunno, i am not totally against the scheme. It's probably better to accept this than making people…
But it certainly raises alarm bells. Probably means it's something to think about for a bit more, maybe we can figure out something better. Dunno.
I like the gvfs thing, it's conceptually quite smart. But of course, outside of the GNOME world not really gonna work. And as much as I am a GNOME guy I certainly would never have the idea to edit config files with GNOME's editor.
3️⃣ Here's the 3rd installment of my series of posts highlighting key new features of the upcoming v256 release of systemd.
You might have heard of the sd_notify() protocol that services running on systemd can use to notify the service manager about status changes, in particular about service readiness. systemd uses that to synchronize start, reload and stop operations between daemon code and service manager (as well as a multitude of other things).
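For those who haven't seen the protocol from the inside, here is a minimal sketch of the service side, covering only the classic AF_UNIX case: the service sends a datagram such as `READY=1` to the socket named in `$NOTIFY_SOCKET`. The function name mirrors the C API, but this is a simplified illustration rather than the reference implementation; other address types are deliberately not handled.

```python
import os
import socket


def sd_notify(state, env=os.environ):
    """Send a notification string (e.g. "READY=1") to the service manager.

    Returns False if $NOTIFY_SOCKET is unset, i.e. nobody asked to be
    notified. Only AF_UNIX addresses are handled in this sketch.
    """
    path = env.get("NOTIFY_SOCKET")
    if not path:
        return False
    if path.startswith("@"):
        # A leading "@" denotes a socket in the abstract namespace,
        # which is addressed with a leading NUL byte instead.
        path = "\0" + path[1:]
    with socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM) as sock:
        sock.sendto(state.encode(), path)
    return True
```

A daemon would call `sd_notify("READY=1")` once it has finished initializing, and can simply ignore a `False` return when it is not running under a service manager.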
@hsaliak unlike af_unix and af_vsock, af_inet is unreliable, needs encryption/authentication and so on, becomes available only during late boot, typically after dhcp and so on, and hence is a very very different beast.