Though these days you may want to look into things like systemd-nspawn instead of plain chroot.
This sounds very interesting! What's the scenario where you'd do this? Would you be, for example, emulating an ARM processor with qemu on an x86 computer and chrooting into Android on an eMMC?
~# sudo apt install qemu-user-static debootstrap
~# mkdir /tmp/arm
~# debootstrap --foreign --arch=armhf buster /tmp/arm http://deb.debian.org/debian
~# cp /usr/bin/qemu-arm-static /tmp/arm/usr/bin/
~# chroot /tmp/arm # from that point, you're running ARM!
~# /debootstrap/debootstrap --second-stage
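A side note on why the cp step above matters: unless the binfmt_misc entry was registered with the F ("fix binary") flag, the kernel resolves the interpreter path at exec time, which means inside the chroot. A minimal sketch for checking what is registered on the host. The entry name qemu-arm and the helper names are my assumptions, not something taken from the commands above:

```shell
#!/bin/sh
# binfmt_enabled ENTRY_FILE
# Succeeds if the first line of a binfmt_misc entry file is "enabled".
binfmt_enabled() {
    [ "$(head -n 1 "$1")" = "enabled" ]
}

# binfmt_interpreter ENTRY_FILE
# Prints the interpreter path recorded in a binfmt_misc entry file.
binfmt_interpreter() {
    awk '$1 == "interpreter" { print $2; exit }' "$1"
}
```

On a real system you would point both helpers at something like /proc/sys/fs/binfmt_misc/qemu-arm and then make sure the printed path also exists under /tmp/arm.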
Probably even faster on Asahi Linux, but having both macOS and a fast Debian at the same time is soo neat :)
systemd-nspawn is a great tool for this.
It's also useful for reverse-engineering router/IoT firmware.
systemd-nspawn --directory /tmp/rescue --boot -- --unit rescue.target
It should automatically find the boot partition and mount it as well.
Just looked and it looks like "Recovery Is Possible" hasn't been updated in a dozen years which dates my story, but I fondly remember overnight phone calls from panicked new sysadmins and telling them to be calm and "RIP it and get chrooted in" and then waking up to help them troubleshoot.
SR also has some rather handy SAM database editing facilities. Mount Windows at /mnt and then enable and reset the Administrator password. Jolly handy for getting super duper user access on a Windows box.
It's been a while since I installed Gentoo, but you can probably quite easily add more stuff to your Gentoo install CD. I don't know if Gentoo have added a script to do all the bind mounts, but "arch-chroot /bin/bash" is very convenient. I used to forget about /sys.
A linux-naive developer would expect to spawn a new process from a payload with access to nothing. It can't see other processes, it has a read only root with nothing in it, there are no network devices, no users, etc. Then they would expect to read documentation to learn how to add things to the sandbox. They want to pass in a directory, or a network interface, or some users. The effort goes into adding resources to the sandbox, not taking them away.
Instead there is this elaborate ceremony where the principal process basically spawns another version of itself endowed with all the same privileges and then gives them up, hopefully leaving itself with only the stuff it wants the sandboxed process to have. Make sure you don't forget to revoke anything.
A lot of things break if there's no /proc/self. A lot more things break if the terminfo database is absent. More things break if there's no timezone database. Finally, almost everything breaks if the root file system has no libc.so.6.
When you write Dockerfiles, you can easily do it FROM scratch. You can then easily observe whether the thing you are sandboxing actually works.
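If you want to see up front what a dynamically linked binary will miss in an empty root, one rough approach is to list the libraries that ldd reports. A small sketch, where needed_libs is a made-up helper name and the parsing assumes the usual glibc ldd output format:

```shell
#!/bin/sh
# needed_libs: read `ldd` output on stdin and print the absolute paths
# of the shared libraries and the dynamic loader, one per line.
needed_libs() {
    awk '
        # "libc.so.6 => /lib/.../libc.so.6 (0x...)"
        $2 == "=>" && $3 ~ /^\// { print $3; next }
        # "/lib64/ld-linux-x86-64.so.2 (0x...)"
        $1 ~ /^\// { print $1 }
    '
}
```

Typical use would be `ldd /bin/ls | needed_libs`, then copying each printed path to the same location under the new root. This only covers shared libraries; /proc, terminfo and the timezone database still have to be handled separately.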
> no users
Now you are breaking something as fundamental as getuid.
> no users
I mean running as root. I think all processes on Linux have to have a user id. Anything inside a sandbox should start with all the permissions for that environment. If the sandbox process wants to muck around with the users/groups authorization model then it can create those resources inside the sandbox.
What I think you might mean is something like: "in modern statically linked applications written with languages like Go and Zig, it is much less likely for them to call on OS services that require these sorts of resources".
The flags to unshare are copies of clone3 args, so you're actually free to do this. There's some song and dance though, because it's not actually possible to exec an arbitrary binary with access to nothing.
But I think the big discrepancy is that there is inherently a two-step process behind "spawn a new process with a new executable." It doesn't work as one step: you clone3/fork into a new child process, inheriting what you will from the parent based on the clone args/flags (which could be everything, could be nothing), do some setup work, and then exec.
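The same fork, setup, exec shape shows up even in plain shell, without any namespaces. A minimal sketch, where spawn_minimal is a hypothetical helper and the fixed PATH is just an assumption for the example:

```shell
#!/bin/sh
# spawn_minimal DIR CMD [ARGS...]
# The subshell is the forked child; it inherits everything from the parent.
spawn_minimal() {
    dir=$1
    shift
    (
        # Setup work: these changes affect only the child process.
        cd "$dir" || exit 1
        umask 077
        # env -i drops the inherited environment, exec replaces the child
        # with the target program.
        exec env -i PATH=/usr/bin:/bin "$@"
    )
}
```

For example, `spawn_minimal / sh -c 'pwd; env'` runs with / as its working directory and a nearly empty environment, while the calling shell keeps its own directory and variables. Namespaces add more knobs to the setup step, but the order of operations is the same.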
What bothers me most about sandboxing with linux namespaces is that edge cases keep turning up that allow them to trick the kernel into granting more privileges than it should.
I wonder if Landlock can/will bring something more like FreeBSD jails to the table. (I haven't made time to read about it in detail yet.)
posix_spawn can do much, but not all, of what is possible with clone + exec. Presumably the standard editors have been scared to add too complex function parameters for its invocation, though that should not have been a problem if all parameters had reasonable default values.
I was pretty puzzled when Docker and LXC came around as this whole new thing believed to have "never been done before"; FreeBSD had supported a very similar concept for years before control groups (cgroups) were added in Linux.
Jails and ezjail were stellar to make mini no-overhead containers when running various services on a server. Being able to archive them and expand them on a new machine was also pretty cool (as long as the BSD version was the same.)
Nobody with knowledge of sandboxing believed this, Virtuozzo and later OpenVZ had been on Linux for a long time after all. Virtuozzo was even from a similar time frame as FreeBSD jails (2000-ish).
The key innovation of Docker was to provide a standardized way to build, distribute, and run container images.
unshare --mount
Most examples you'll find put it in the context of containers, like https://www.redhat.com/en/blog/mount-namespaces
Among many other things, Docker (and Podman etc) has
1. Images and OverlayFS
2. Networking
3. User namespace mappings
4. Resource management
---
If all you want is file system isolation, then Docker (and Podman, etc.) is massive overkill; chroot is the correct tool.
Shame Plan9 blew its weirdness budget.
People liked Unix because it was free – either really free, via BSD, or as a Unix derivative provided at no cost when people bought their workstations. A new revolutionary operating system had absolutely no reason for anybody to buy it: No commercial developers wanted to develop to a platform without users, and no users wanted a platform without software.
Plan 9 only changed their license many years later, when it was too late for anybody to care, and Unix had become the established standard.
I wrote https://github.com/aidanhs/machroot (initially forked from bubblewrap) a while ago to lean into the pure "pretend I see another filesystem" aspect of chroot with additional conveniences (so no security focus). For example, it allows setting up overlay filesystems and mounting squashfs filesystems with an overlay on top, and because it uses a mount namespace, you don't need to tear down the mount points - just exit the command and you're done.
The codebase is pretty small so I just tweaked it with whatever features I needed at the time, rather than try and make it a fully fledged tool.
(honestly you can probably replicate most of it with a shell script that invokes unshare and appropriate mount commands)
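Something along these lines, as a rough sketch rather than what machroot actually does. ns_chroot and the DRY_RUN switch are names I made up for the example, and the real thing needs root:

```shell
#!/bin/sh
# Print the command when DRY_RUN is set, otherwise execute it.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# ns_chroot ROOT CMD [ARGS...]
# Do the bind mounts inside a private mount namespace, then chroot.
# The mounts disappear when the last process in the namespace exits,
# so there is nothing to unmount afterwards.
ns_chroot() {
    run unshare --mount sh -c '
        root=$1; shift
        for f in proc sys dev; do
            mount --rbind "/$f" "$root/$f" || exit 1
        done
        exec chroot "$root" "$@"
    ' sh "$@"
}
```

Running `DRY_RUN=1 ns_chroot /tmp/root /bin/sh` only prints the unshare command line, which is handy for checking the quoting before running it for real as root.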
Besides that I have a simple script that starts an ephemeral docker with debian-full + tools.
I also have some scripts that leverage macOS's 'sandbox-exec'
https://github.com/mtseet/proot-docker
We need more people to improve it!
But most of the professional world uses systemd to bootstrap isolated processes nowadays, which is kind of what you are hinting at. cgroups v2 and namespaces are what you want.
Oh wow, wow. That all sounded so intensely complex, incomprehensible. What we are going to need to do is build a program to handle all that, highly formalized. Let's make it so formalized it's one of those things like taxes or AWS where people can just make a living from understanding the beast. It can be like systemd meets multics meets java. Have its own various complicated commands, complicated file formats, and so on. The chroot() is only historically understood by everyone, so let's steal a page from the java playbook and just rename everything with our own terminology. The product will be so outstanding, wow, I call it "Shocker"
That requirement is pretty legitimate, since it's easier and suitable enough for many applications for which we currently use OCI containers. For example, isolated builds, development environments, sandboxes etc. (I have an isolated build tool for Gentoo).
But Linux already has multiple solutions that fit the bill, like systemd-nspawn, LXC, bubblewrap, etc. Too bad, they aren't as widely known as chroot.
It sounds like people want "better exec"
This was my original motivation in creating machroot (mentioned elsewhere in this thread) and having it use namespaces.
[1] https://www.usenix.org/legacy/event/lisa04/tech/full_papers/...
This is frequently very convenient if you want to install Gentoo by compiling everything from sources on a cheap and small computer, e.g. one with some Intel Atom CPU.
Instead of compiling anything on the resource-constrained computer, you install a fresh Gentoo system for it, in a chroot environment on a fast desktop computer, which supports a superset of the ISA of the small computer, so you can still execute the programs intended for the target computer.
Then you just copy the installation result over the SSD/HDD of the destination computer. If you have many identical computers, you can copy the installed Gentoo over all of them without any problems, removing the need for multiple installations.
If desired, you can keep the chroot environment with the installation result and perform any later updates on it. Then you synchronize the updated Gentoo from the chroot with the one or more target computers.
I just wish the script could figure out a BTRFS drive without me manually mounting volumes :(
for btrfs, if you use a consistent volume mapping on your systems, it's pretty easy. in arch setup i typically enable ssh and have a pretty simple set of bashisms for target device, compression, and mount points. then it's copy-pasta since it's boilerplate for when i need to recover or fresh install
not glanced at manjaro so not familiar with its install-methods
> sudo mount /dev/nvme0n1p3 /rescue/boot
This is a little extra. What you can generally do is immediately after chroot just run 'mount -a' to mount everything from the chroot's fstab. The empty `/boot` probably already exists.
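If you'd rather see what the fstab says before trusting 'mount -a', a tiny lookup does it. A sketch only: fstab_source is a made-up helper name, and it just prints the first field of the first matching non-comment line:

```shell
#!/bin/sh
# fstab_source FSTAB MOUNTPOINT
# Print the device/UUID/LABEL that FSTAB assigns to MOUNTPOINT, if any.
fstab_source() {
    awk -v mp="$2" '$1 !~ /^#/ && $2 == mp { print $1; exit }' "$1"
}
```

From outside the chroot, `fstab_source /rescue/etc/fstab /boot` prints the source for /boot, or nothing if the system has no separate boot partition.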
- the lack of access to the EFI subsystem from WSL means you need to pass some extra flags to help grub/etc along, and you may need to set it as the boot partition in the BIOS manually
- you'll have to mount the drive to WSL with `wsl --mount <DiskPath> --bare`, after finding the right DiskPath with `Get-CimInstance -query "SELECT * from Win32_DiskDrive"`, and you might have to offline the disk first in Windows disk manager
Now I'm just wondering if it would work from a Linux running in the browser, accessing the USB through WebUSB ;D
I've got a version of the mounting command that I think is easier to use:
for f in proc sys dev run dev/pts ; do mount --bind /$f /mnt/$f ; done
Change the "/mnt/$f" to whatever mountpoint you're using, which would be "/rescue/$f" to align with TFA. I don't know what difference it makes to have /run mounted, but once you chroot into the mountpoint you can mount the boot partition etc. and run whatever grub or mkinitramfs command you need to fix stuff.
I would leave the /boot mounting to later in the process - after you chroot. This way you can easily check /etc/fstab for where the boot partition lives (or if there is one), so you only need to locate the root partition initially which is generally easy to figure out from the disk sizes.
There are extra steps needed, however, if the system uses LVM.
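For completeness, here is the loop above with a matching teardown, since unmounting has to happen in the opposite order (dev/pts before dev). A sketch only: bind_all, unbind_all and the DRY_RUN switch are names invented for this example:

```shell
#!/bin/sh
# Print the command when DRY_RUN is set, otherwise execute it.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# bind_all ROOT: bind-mount the host's pseudo-filesystems into ROOT,
# parents before children.
bind_all() {
    for f in proc sys dev dev/pts run; do
        run mount --bind "/$f" "$1/$f" || return 1
    done
}

# unbind_all ROOT: unmount in reverse order, children before parents.
unbind_all() {
    for f in run dev/pts dev sys proc; do
        run umount "$1/$f" || return 1
    done
}
```

`DRY_RUN=1 bind_all /rescue` prints the five mount commands without touching anything; drop DRY_RUN and run as root to actually mount.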
There are various handy (chroot) techniques that are probably considered "old school" now. For example, having a "rescue partition" which can be booted into remotely, and from there reinstall or repair the "main os". This is necessary when repartitioning remotely, for example.
It’s still used for recovery but recovery partitions have kinda gone out of fashion as the ability to consistently boot has gotten better. Additionally, thumb drives and net boot make the partition a little less necessary.
If a system is screwed up enough, then a chroot strategy won't work because it relies on the path you are chrooting to to be generally valid and functional. If it's missing libraries you may well be screwed.
If you're wanting to do something like that fairly regularly, then it'd be easier to just run virtual machines (e.g. QEMU/KVM) to do that kind of thing.
arch-chroot [1], despite its name pretty much does all the `mount -t proc` stuff the post says. It's also available on other distros like debian [2]. I have used it in the past to chroot into fedora as well.
[1] https://man.archlinux.org/man/arch-chroot.8 [2] https://packages.debian.org/arch-install-scripts
This is now my default tool for setting up Raspberry Pi-like computers.
We used rcp to keep passwords in sync. Add the account on the main machine, rcp the password file to the other machine. sudo rcp /etc/passwd other:/etc/passwd was muscle memory.
One day, someone was getting added to the groups file to be able to work in the server web project. sudo rcp /etc/group other:/etc/passwd
Ooops. Couldn't log in to fix it.
"Is anyone logged into the other machine?" (someone said yes). "Type while 1 sync" ... (ok) ... And we flipped the power switch and brought it up in single user mode (since the password file was invalid). Next, need to establish a minimal /etc/passwd ... emacs /etc/passwd (nope) vi /etc/passwd (nope - invalid terminal 300h not in termcap). "Uhm... cat > /etc/passwd ?" (possible, but a PITA when there is a typo in transcription)
I was a wizard on a lpmud. "I know ed".
And we got a minimal password file restored while reading the hashed values over (no way were we going to have root::0:0:... as the file even for a second) and then rcp'ed the proper /etc/passwd and /etc/group files over to the other machine.
Such systems are rarely very pragmatic, but do show off a bunch of weird concepts.
On non Linux systems, it's fancy chroot inside a Linux VM.
Another semi-related but equally odd view: Docker is an operating system and containers are processes.
The difference between Docker and chroot, by the way, is that Docker does a bunch more system calls to ch the root of several other things that are not the filesystem. It also sets up things inside those roots for you by default, so your containers can access the internet, for example.
https://github.com/p8952/bocker
Basically, it runs docker containers using chroot to prove that it's possible.
Docker uses layered filesystems (multiple filesystems mounted on top of one another at the same mount point), a technique that used to be mostly used with a read-only filesystem, like a CD-ROM, where you mount a writable directory over the same mount point to make it writable.
Bocker does a similar mounting of the layers with chroot to run docker containers.
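To make the layering idea concrete, this is roughly what stacking directories looks like with the kernel's overlay filesystem. It's a sketch of the general technique, not a description of what Docker or Bocker do internally; overlay_opts is a hypothetical helper and the paths in the usage note are placeholders:

```shell
#!/bin/sh
# overlay_opts UPPER WORK LOWER...
# Build the option string for `mount -t overlay`. Lower layers are
# read-only and listed top-most first; UPPER receives all writes and
# WORK is an empty scratch directory on the same filesystem as UPPER.
overlay_opts() {
    upper=$1
    work=$2
    shift 2
    lower=
    for layer in "$@"; do
        lower=${lower:+$lower:}$layer
    done
    printf 'lowerdir=%s,upperdir=%s,workdir=%s\n' "$lower" "$upper" "$work"
}
```

As root you would then run something like `mount -t overlay overlay -o "$(overlay_opts /layers/rw /layers/work /layers/app /layers/base)" /merged` and chroot into /merged.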