NixOS has great out-of-the-box support for ARM64v8 systems, but that comes with a catch: you have to use the prebuilt images to install the system, which are (obviously) not customizable and come without OpenSSH enabled by default. Unfortunately, this requires attaching a display to the Raspberry Pi to complete an installation – not ideal! This article is the story of my journey to build a custom NixOS image for my Raspberry Pi, with all the pitfalls and errors I had to solve to eventually reach the objective.
NOTE: if you just want to have a working image quickly, then head over to this GitHub repo and follow the instructions. If you don’t want to use Docker, you might want to just jump to The VM approach: using Vagrant. Or, if you feel brave and want to get this done in absolutely the quickest way possible, check out how to build an SD image natively on EC2 in 5 minutes.
Table of contents
- An introduction to Nix
- NixOS on a Raspberry Pi
- Emulating AArch64: QEMU and binfmt_misc
- The VM approach: using Vagrant
- Nix packages and image configuration
- Building, failing and building again
- One step forward: the Docker approach
- Pushing it to the limit: native AArch64 build on EC2 in 5 minutes
An introduction to Nix
The world of system administration (a term now falling into disuse) has seen dramatic changes over the years. The incredibly quick rise of the “cattle, not pets” mantra has permanently changed the way people deploy their applications, and for a good reason.
Nowadays, in such a stateless and containerized world, a technology is increasingly gaining traction: Nix. The promise is simple: take a package manager, add a functional and declarative programming language to define packages on top, and season with an isolated and sandboxed build process. Voilà! You now have a way to build reproducible, stable and easily rollbackable packages.
Now take the same recipe and extend the concept to an entire Linux distribution: that’s exactly what NixOS aims to do. You can have a configuration as minimal as:
```
{
  boot.loader.grub.device = "/dev/sda";
  fileSystems."/".device = "/dev/sda1";
  services.sshd.enable = true;
}
```
(taken from https://nixos.org/nixos/about.html)

And you’re just a command away from having a fully functional system with OpenSSH already set up.
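To make that concrete: on a machine that already runs NixOS, applying a configuration like the one above (saved as /etc/nixos/configuration.nix) boils down to a single command – a minimal sketch, not a step from this article:

```
# rebuild the system from the declarative configuration and activate it
sudo nixos-rebuild switch
```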
NixOS on a Raspberry Pi
Given the obvious advantages of all of this, it seems natural to extend the concept even to a platform like the Raspberry Pi. A practical example of why this could be particularly useful is given by the fragility of SD cards: in the event of a catastrophic filesystem failure due to an SD card committing seppuku, having a .nix file which holds everything needed to get my Pi from zero to ready sounds quite amazing.
Fortunately, it looks like the people and contributors behind NixOS thought the same and have provided first-class support for AArch64 (ARM64v8), with the Raspberry Pi use case in mind. They also provide ready-to-boot SD card images on their build system. If you just want to try NixOS on your AArch64 system, just take the last successful build of the latest stable release of NixOS (look here for the stable release at the time of writing, 20.03) and flash it onto your SD card. It’s that easy! Keep in mind, however, that you have to attach an HDMI display and a keyboard to your RPi.
Note that at the time of writing, according to the unofficial wiki, NixOS has first-class support only for the Raspberry Pi 3, but there’s loads of activity regarding the Raspberry Pi 4, and configurations which seem to work for other users.
However, I want my SD image to be as close to the final configuration as possible – at the very minimum, I want OpenSSH already set up with my SSH key so I don’t have to painstakingly connect my RPi to a display. But there’s a catch: images for foreign architectures can only be built by a system which understands that architecture.
This leaves three ways to build a custom AArch64 image with your own configuration. In ascending order of complexity:
- Use a remote builder, such as an actual Raspberry Pi with Nix on it, or ask for access to the community aarch64 builder (a sketch of this approach follows the list).
- Just build it on the Raspberry Pi (i.e. without a remote build). Be sure to use a fair amount of swap if you do that, as the image creation process is memory hungry.
- Emulate AArch64 and comfortably build the image on my PC, which is x86_64.
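For reference, the remote-builder route might be kicked off with something along these lines – an untested sketch, assuming a Raspberry Pi running Nix is reachable as nixos@pi.local over passwordless SSH and that your configuration lives in ./sd-image.nix (an example of such a file appears later in this post):

```
# hand every aarch64-linux build to the remote machine; --max-jobs 0 forbids local builds
nix-build '<nixpkgs/nixos>' -A config.system.build.sdImage \
  -I nixos-config=./sd-image.nix \
  --option system aarch64-linux \
  --option builders 'ssh://nixos@pi.local aarch64-linux' \
  --max-jobs 0
```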
Clearly, the only viable option for me was the last one: building on my x86_64 machine, which requires emulation. Buckle up, this is going to be quite a ride!
Emulating AArch64: QEMU and binfmt_misc
Fortunately, there are loads of good indications on the wiki which I used as the basis for my endeavor.
To do this, there are two ways:
- booting an emulated AArch64 system (system emulation), so basically an emulated VM. This is fine but on the heavier side of things, as you need to allocate a fixed amount of RAM to the guest system and need a boot image.
- using QEMU in user emulation mode, which allows it to execute foreign binaries on the fly, without requiring a running guest system (a one-line illustration follows the list).
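To illustrate the difference: user-mode emulation means running a single foreign executable directly, with no guest OS involved. The binary name below is just a placeholder, and the emulator may be called qemu-aarch64-static depending on how it was packaged:

```
# system emulation would mean booting a whole guest with qemu-system-aarch64;
# user-mode emulation just runs one AArch64 program on the spot:
qemu-aarch64 ./some-aarch64-binary
```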
At this point, I decided to proceed with the user emulation way, but there is one more trick needed to make the whole thing work: binfmt_misc. binfmt_misc is an awesome capability of the Linux kernel which allows the kernel to understand foreign executable file formats and delegate execution to any userspace program. When you try to execute something which the kernel does not know how to handle, it will try to see if it matches any of the binfmt_misc handlers via its “magic” – if it does, it calls the specified executable (which will be the emulator in our case) with the original command line. Pretty cool! The kernel uses a very similar mechanism to parse the shebangs on top of your scripts. Pair this with QEMU, an incredibly extensive emulator, and you get the possibility to run AArch64 binaries on your box! Keep in mind, however, that since this will emulate the actual architecture it will be quite slow.
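On a system where the QEMU handlers are already registered you can peek at what the kernel knows; the entry name qemu-aarch64 below is the one Debian’s packages (and QEMU’s own script) conventionally use, so treat it as an example:

```
ls /proc/sys/fs/binfmt_misc/                 # one file per registered format, plus "register" and "status"
cat /proc/sys/fs/binfmt_misc/qemu-aarch64    # shows the magic, mask, interpreter and flags
```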
The VM approach: using Vagrant
To start, my first approach was to use Vagrant to spin up a VM.
```
vagrant init generic/debian9
```
Make sure to start the VM with enough RAM – the image building step of the process takes more than 4 GiB of RAM, but if you have enough swap it’s alright. If you’re using VirtualBox as the backend:
```
patch Vagrantfile <<EOF
52c52
< # config.vm.provider "virtualbox" do |vb|
---
> config.vm.provider "virtualbox" do |vb|
57,58c57,59
< # vb.memory = "1024"
< # end
---
> vb.memory = "4096"
> vb.cpus = 4
> end
EOF
```
Setup
After I got into the VM with vagrant ssh, I set up QEMU:
```
sudo apt update
sudo apt install -y qemu-user-aarch64 binfmt-support qemu-user-static
```
Then I checked that QEMU was correctly registered as a binfmt_misc handler:
```
$ sudo update-binfmts --display
[...]
qemu ... (enabled):
[...]
```
At this point, I set up Nix:
```
# installs Nix (run this as a normal user)
sh <(curl https://nixos.org/nix/install) --no-daemon
# loads nix into the env without reopening the shell
. $HOME/.nix-profile/etc/profile.d/nix.sh
```
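A quick sanity check (an extra step I'm adding, not strictly required) to confirm the environment is loaded:

```
nix-env --version   # should print the version the installer just fetched
```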
Nix packages and image configuration
It is now time to choose what revision of nixpkgs to use. In this case, I want to build the latest stable release of NixOS (20.03 at the time of writing), so I cloned the release-20.03 branch of nixpkgs:
```
git clone --depth=1 -b release-20.03 https://github.com/NixOS/nixpkgs
```
So far so good! I then set up a basic configuration:
```
cat > $HOME/sd-image.nix <<'EOF'
{ lib, ... }:
{
  imports = [ <nixpkgs/nixos/modules/installer/cd-dvd/sd-image-aarch64.nix> ];

  # The installer starts with a "nixos" user to allow installation, so add the SSH key to
  # that user. Note that the key is, at the time of writing, put in `/etc/ssh/authorized_keys.d`
  users.extraUsers.nixos.openssh.authorizedKeys.keys = [ "ssh-ed25519 ..." ];

  # bzip2 compression takes loads of time with emulation, skip it.
  sdImage.compressImage = false;

  # OpenSSH is forced to have an empty `wantedBy` on the installer system[1], this won't allow it
  # to be started. Override it with the normal value.
  # [1] https://github.com/NixOS/nixpkgs/blob/9e5aa25/nixos/modules/profiles/installation-device.nix#L76
  systemd.services.sshd.wantedBy = lib.mkOverride 40 [ "multi-user.target" ];

  # Enable OpenSSH out of the box.
  services.sshd.enable = true;
}
EOF
```
To use the cloned nixpkgs, Nix needs to be instructed on the location of the default expressions. By default, it will use ~/.nix-defexpr, which is not what I want, so while still in my home directory I executed this:
```
# with this, `<nixpkgs>` will resolve to the checkout of `nixpkgs` in this directory rather
# than ~/.nix-defexpr
# NOTE: this needs to be executed in the parent directory of the Git repo
export NIX_PATH="$(pwd)"
```
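To double-check that the override took effect, something like this should print the path of the checkout cloned above (a quick verification step I’m adding, not part of the original instructions):

```
nix-instantiate --eval -E '<nixpkgs>'
```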
Building, failing and building again
Everything is now ready to go! Time to kick-start the build:
```
cd nixpkgs/nixos
nix-build -A config.system.build.sdImage --option system aarch64-linux -I nixos-config=$HOME/sd-image.nix default.nix
```
This took quite a while, but eventually it reached the stage where it started to create the .img file. cptofs (the utility used to copy the system files to the image being built) is an incredible memory hog and it’s the most memory-intensive process, in my case peaking at about 8 GiB of used RAM. Then, suddenly, it started spitting out loads of this:
```
error while reading directory /nix/store/[...]: Cannot allocate memory
```
Ouch, that doesn’t look good. I monitored RAM usage and the system was definitely not out of memory (nor swap), so something weird was going on. However, the build did not error out; it produced an image anyway.
Some Googling revealed very few results, except for some poor souls on the IRC channels #nixos / #nixos-aarch64 who had the same issues, reporting an inability to boot the resulting images, and no solution [1] [2] [3]. Other sources say that the build actually works anyway, but I didn’t feel comfortable booting an (apparently) half-baked image. Not wanting to give up, I intensified my Googling and found similar issues that people had hit with other software when emulating. Specifically, Debian maintainers found that on PIE-compiled binaries, allocations made with brk(2) were failing randomly. The issue, originally reported in 2018, was fixed in January 2020. Still not sure whether this was going to be the fix for what was going on, and using a distro not exactly known for its up-to-date software, I decided to build the latest stable release of QEMU from source. At the time of writing, v5.0.0 had just come out!
Note: it is very much possible that a distribution with more up-to-date packages won’t need this. Attempt a normal build first!
Note: Debian unstable already bundles QEMU 5.0 – it’s perfectly sufficient to use the official package if available.
```
# remove system QEMU
sudo apt remove qemu-system-aarch64 qemu-user-static
# clone qemu
git clone --depth=1 -b v5.0.0 https://git.qemu.org/git/qemu.git; cd qemu
# install deps
sudo apt install git libglib2.0-dev libfdt-dev libpixman-1-dev zlib1g-dev
# configure minimally and build
./configure --enable-linux-user --target-list=aarch64-linux-user --disable-bsd-user \
  --disable-system --disable-vnc --disable-curses --disable-sdl --disable-vde \
  --disable-kvm --static --disable-tools --cpu=x86_64
make -j$(( $(nproc --all) + 1 ))
```
A bit of time later I had a working QEMU binary in ./aarch64-linux-user/qemu-aarch64. However, I had lost the binfmt_misc registration done by the Debian package. Fortunately, QEMU comes with a script to set up binfmt_misc out of the box. I altered it slightly to only register the signature for aarch64 binaries:
```
patch scripts/qemu-binfmt-conf.sh <<'EOF'
4,7c4,8
< qemu_target_list="i386 i486 alpha arm armeb sparc sparc32plus sparc64 \
< ppc ppc64 ppc64le m68k mips mipsel mipsn32 mipsn32el mips64 mips64el \
< sh4 sh4eb s390x aarch64 aarch64_be hppa riscv32 riscv64 xtensa xtensaeb \
< microblaze microblazeel or1k x86_64"
---
> qemu_target_list="aarch64"
EOF
sudo ./scripts/qemu-binfmt-conf.sh --qemu-path $(pwd)/aarch64-linux-user
```
Note that this won’t persist after a reboot and isn’t the “standard way” of doing it, but it’s more than sufficient for a quick build.
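If you ever need to undo the registration by hand before rebooting, the kernel interface makes that easy too; the entry name follows the qemu-<arch> convention used by qemu-binfmt-conf.sh:

```
# writing -1 to a handler's file asks the kernel to drop that handler
echo -1 | sudo tee /proc/sys/fs/binfmt_misc/qemu-aarch64
```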
And voilà! Time to attempt a new build:
```
nix-build -A config.system.build.sdImage --option system aarch64-linux -I nixos-config=$HOME/sd-image.nix default.nix
```
This time, no allocation failures! Woo-hoo, thanks QEMU contributors! Unfortunately, my celebrations were quickly stopped by what I saw next:
```
building '/nix/store/q4kcsy4f1jcxxa2kc6x02rjhg8z1911y-ext4-fs.img.zst.drv'...
[...]
copying store paths to image...
copying files to image...
e2fsck 1.45.5 (07-Jan-2020)
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
NIXOS_SD: 93738/177408 files (0.1% non-contiguous), 581365/708701 blocks
Resizing to minimum allowed size
resize2fs 1.45.5 (07-Jan-2020)
Please run 'e2fsck -f temp.img' first.

builder for '/nix/store/q4kcsy4f1jcxxa2kc6x02rjhg8z1911y-ext4-fs.img.zst.drv' failed with exit code 1
cannot build derivation '/nix/store/ari7yzjkhif5d7q256dy5rrdfkjhqb8f-nixos-sd-image-20.09pre221814.10100a97c89-aarch64-linux.img.drv': 1 dependencies couldn't be built
error: build of '/nix/store/ari7yzjkhif5d7q256dy5rrdfkjhqb8f-nixos-sd-image-20.09pre221814.10100a97c89-aarch64-linux.img.drv' failed
```
This proved quite difficult to debug and solve, and I split the investigation of this issue into another post: “Why doesn’t resize2fs resize my filesystem?”.
The gist of it is that I had to patch nixpkgs to sort it out – check this PR to see if it was merged and you don’t need to take care of it anymore. Otherwise, applying the patch to the local checkout is pretty easy:
```
curl -L "https://github.com/NixOS/nixpkgs/pull/86366.patch" | git am
```
After applying the patch and building again, the build finally finished, leaving me with a fancy .img file:
```
$ ls result/sd-image/
nixos-sd-image-20.03pre-git-aarch64-linux.img
```
Flashing
After getting the image file back to a machine where I had an SD card reader (a MacBook), I took the device name of my SD card (which was /dev/disk2, retrievable via “Disk Utility”) and flashed it:
```
sudo gdd if=nixos*.img of=/dev/rdisk2 bs=64K status=progress
```
Note: this permanently destroys data on the target device. Use with care.
Note: gdd comes from the brew package coreutils, and basically corresponds to an up-to-date GNU version of dd which has the status= parameter. It is perfectly fine to use the built-in one as well.
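On Linux, the equivalent step would look roughly like this – the device name is an example, so double-check it with lsblk before writing anything:

```
lsblk                                                                    # identify the SD card, e.g. /dev/mmcblk0
sudo dd if=nixos*.img of=/dev/mmcblk0 bs=64K conv=fsync status=progress
```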
After plopping the SD card back into my Raspberry Pi, I connected it to the network and powered it up. Crossing all the fingers that I have, I waited for it to come online, and then it was moment-of-truth time…
```
$ ssh nixos@10.0.0.x
Enter passphrase for key '<key>':

[nixos@nixos:~]$ uname -a
Linux nixos 5.4.33 #1-NixOS SMP Fri Apr 17 08:50:26 UTC 2020 aarch64 GNU/Linux
```
Eureka! It’s alive, and it was born with my SSH public key! It was not without effort, but it was a success.
One step forward: the Docker approach
I could have stopped here, but it wasn’t fun enough! Since I know that I won’t be the only one who wants to do this, I want to at least make it as easy as possible for the next person who wants to do the same. Thus, I decided to create a Dockerfile which, along with some docker-compose magic, allows building a NixOS SD image with one command in about 15-20 minutes.
It’s available on GitHub, and the documentation should be easy to follow. You can stop reading this post now if you just need to build NixOS – keep reading for some background.
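In practice the workflow boils down to something like the sketch below; the clone URL is omitted on purpose (use the repository linked above) and the exact configuration file to edit may differ, so defer to the README:

```
git clone <repository URL linked above>
cd nixos-docker-sd-image-builder
# put your SSH public key into the provided configuration, then:
docker-compose up
```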
Background
Originally, I was planning to use multiarch/qemu-user-static from the official Docker Hub to get QEMU and binfmt_misc set up for free. Unfortunately, it has not been updated to QEMU 5.0 yet, which according to my testing is the only version that does not throw random memory errors when copying the files to the filesystem. I looked into contributing to their repo, but unfortunately they depend on a package supplied by Fedora, which has only been updated to the latest release candidate of QEMU 5.0, rather than the final version which came out a few days ago.
To solve this, I built my own Dockerfile (with blackjack and hookers) which downloads a statically compiled usermode QEMU 5.0, exclusively for AArch64, from the official Debian repositories. After verifying its integrity, it also downloads the official script to enable binfmt_misc from the main repository of QEMU and, via the privileged Docker flag, enables it on the host system.
I used a few tricks to make the thing really painless:
- I employed the fix-binary flag when registering the QEMU binary via binfmt_misc. The kernel usually loads the “interpreter” (the emulator, in this case) lazily, reading and executing it for each invocation – instead, this flag makes the kernel open and retain the QEMU binary in memory. This means that the actual executable can be removed without any issues, which is exactly what happens since the container is short-lived. Pretty amazing!
- Since containers do not wait for each other to finish, I compiled a very small AArch64 binary which simply runs a printf. This is used by the builder to wait until QEMU has been correctly set up before starting the build.
- I wanted the whole process to be as secure as possible, so the containers that interact with the host kernel (using the privileged flag) are only in charge of setting up QEMU: the actual build is done in another container which does not require special privileges.
- The previous point, however, imposed another roadblock: containers are started in parallel and execution is (rightfully) not sequential. I wanted to make sure to clear up the binfmt_misc handler once the build was done, to get rid of the only residue that is effectively left on the host system. To do that, there is a final container (using the same image as the one that sets up QEMU) which waits on its local network to be notified by the main container when the build is over. I achieved that by simply listening on a TCP port with nc (which comes out of the box on Alpine, thanks busybox!) in the cleanup container, which patiently waits until the builder sends a message to that port (a tiny sketch of this follows the list).
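A tiny sketch of that synchronization trick, with a made-up port number and container hostname (the real compose file wires these up differently):

```
# cleanup container (privileged, root): block until anything connects, then drop the handler
nc -l -p 9999 >/dev/null && echo -1 > /proc/sys/fs/binfmt_misc/qemu-aarch64

# builder container: poke the cleanup container once the image has been produced
echo done | nc cleanup 9999
```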
With this, I had all the necessary ingredients to make this painless for everyone else. Check it out on GitHub if you haven’t already.
Pushing it to the limit: native AArch64 build on EC2 in 5 minutes
Not too long ago, Amazon added native AArch64 machines to EC2, which are perfect for our use case. The cost is already pretty low, but you can go even lower with Spot instances!
Initially, I just spun up an EC2 instance, cloned the repository mentioned in the previous paragraph and let it run. As a sneak peek, here is exactly how much time it took to build the entire image:
```
$ time sudo docker-compose up
...
build-nixos_1  | /nix/store/56khbas8w2y9xv5m6lihpmadw73nfvkd-nixos-sd-image-20.03pre-git-aarch64-linux.img
nixos-docker-sd-image-builder_build-nixos_1 exited with code 0

real    5m16.864s
user    0m1.519s
sys     0m0.194s
```
Did I already mention that this includes the time to build the Docker image too? That’s pretty good!
However, I decided to go one last step forward and create a Terraform configuration which automatically creates an EC2 Spot instance and builds an SD image of NixOS on it. It works almost magically, as all it takes is cloning nixos-docker-sd-image-builder and doing the following:
```
cd terraform
terraform init
terraform apply
./pull_image.sh
terraform destroy
```
Check out the documentation if you want to learn more!
Conclusion
Woah, this was one hell of a journey! Running into weird, hard-to-debug problems, and being willing to always go one step further than before, certainly made it very interesting.
I hope this was informative – feel free to send me an e-mail if you have any questions or comments!