Nix: Hypervisor, Kubernetes, and Containers
This guide accompanies my 2023 Kubecon talk, Nix, Kubernetes, and the Pursuit of Reproducibility.
Nix, the language, packages, and operating system, is seeing increased popularity with its promise of providing a highly-composable way to create reproducible software. I've been intrigued by the Nix ecosystem for some time, as I often struggle to reproduce the exact configuration found on my VMs and, at times, hypervisors. The reasons for this struggle vary, but often it's simply laziness in ensuring changes I've made on a host make it into automation like Packer or Ansible. Trying to build my environment with Nix seemed like a good way to rid myself of this bad habit, while also learning a new stack.
This guide will demonstrate the various configurations I've made in transitioning to Nix(OS). I'll attempt to keep this guide concise and focused on the actual steps taken, but see my Kubecon presentation for additional context (and some jokes) around the use case:
Hypervisor
My hypervisors run on an amalgamation of computers. These computers range from decommissioned enterprise gear to old consumer hardware (often utilizing a laptop or two). For evidence of the "jankiness", see below:
Given that hardware may come and go, it's important I can provision the hypervisor consistently. For setting up a hypervisor, the exercise largely entails configuring networking and qemu/libvirt. Of course, this all comes after the base-OS installation.
OS Install via ISO
Since the hypervisor is installed on bare metal, we need to start by installing the base NixOS operating system. Then we'll bring the custom configuration in. I'll leave creating install media and installing the base OS to you. The Nix community also has this documented in their manuals.
One convenience I've created is a filesystem provisioning script and a base install script. These scripts can be run from a remote machine. When you boot your machine from the ISO, sshd will already be enabled. To utilize SSH, run passwd and set a password for root; then you can SSH in from any remote machine.
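As a concrete sketch (the address matches the one used later in this guide, and the minimal ISO logs you in as the nixos user with passwordless sudo), the flow looks something like:

# on the console of the machine booted from the NixOS ISO
sudo passwd          # sets a password for root

# from the remote machine
ssh root@192.168.33.23

With that access in place, the shell scripts I run from the remote machine look as follows.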
#!/bin/sh
# inspired by https://github.com/mitchellh/nixos-config/blob/0547ecb427797c3fb4919cef692f1580398b99ec/Makefile#L51-L77
# Wipes /dev/sda, creates a GPT layout (ESP + root), formats both partitions,
# and mounts them at /mnt ready for nixos-install.
ssh root@${NIXADDR} " \
  umount -R /mnt; \
  wipefs -a /dev/sda; \
  parted -s /dev/sda -- mklabel gpt; \
  parted /dev/sda -- mkpart primary 512MiB 100\%; \
  parted /dev/sda -- mkpart ESP fat32 1MiB 512MiB; \
  parted /dev/sda -- set 2 esp on; \
  sleep 1; \
  mkfs.ext4 -L nixos /dev/sda1; \
  mkfs.fat -F 32 -n boot /dev/sda2; \
  sleep 1; \
  mount /dev/disk/by-label/nixos /mnt; \
  mkdir -p /mnt/boot; \
  mount /dev/disk/by-label/boot /mnt/boot; \
"
The above sets up the boot and root partitions, with root taking any remaining space. For simplicity I've left volume encryption out of this guide; however, you may want to consider setting it up. The next script performs the base install:
#!/bin/sh
# Generates the default NixOS config, appends a few options (flakes, SSH access,
# an initial root password) after the stateVersion line, then installs to /mnt.
ssh root@${NIXADDR} " \
  nixos-generate-config --root /mnt; \
  sed --in-place '/system\.stateVersion = .*/a \
    nix.extraOptions = \"experimental-features = nix-command flakes\";\n \
    services.openssh.enable = true;\n \
    services.openssh.passwordAuthentication = true;\n \
    services.openssh.permitRootLogin = \"yes\";\n \
    users.users.root.initialPassword = \"root\";\n \
  ' /mnt/etc/nixos/configuration.nix; \
  nixos-install --no-root-passwd; \
"
Assuming the above scripts are called setup-filesystem.sh and install-nix.sh respectively, I can install against the hypervisor by running the following from a remote client.
NIXADDR=192.168.33.23 ./setup-filesystem.sh
NIXADDR=192.168.33.23 ./install-nix.sh
Now we have NixOS installed and laid out on a simple partition scheme. From here we can reboot and start configuring the system.
Networking and Software
For my hypervisors, the networking configuration involves a bridge interface that acts as a virtual switch for the VMs. The bridge interface is connected to a physical NIC's interface. In this model, VMs get a routable (LAN) IP address from DHCP. For details on hypervisor networking, you may find my post VM Networking interesting. Visually, you can think of the networking setup as follows:
After installing the base system, a file named /etc/nixos/configuration.nix was created. This is the file to update with networking and software details. I like to keep this file as is, but add an extra one at /etc/nixos/extra.nix. I went ahead and added a # TODO(you) comment to each area you're likely to need to update.
{ config, lib, pkgs, ... }:

{
  # TODO(you): switch this for kvm-amd if using AMD instead.
  boot.kernelModules = [ "kvm-intel" ];

  # disable dhcpcd since we'll use systemd-networkd
  networking.useDHCP = lib.mkDefault false;

  systemd.network = {
    enable = true;
    netdevs = {
      # Create the bridge interface
      "20-br0" = {
        netdevConfig = {
          Kind = "bridge";
          Name = "br0";
        };
      };
    };
    networks = {
      # TODO(you): update `30-enp2s0` to your NIC's interface (run `ifconfig`)
      # Connect the bridge ports to the bridge
      "30-enp2s0" = {
        # TODO(you): update `enp2s0` to your NIC's interface (run `ifconfig`)
        matchConfig.Name = "enp2s0";
        networkConfig.Bridge = "br0";
        linkConfig.RequiredForOnline = "enslaved";
      };
      "40-br0-dhcp" = {
        matchConfig.Name = "br0";
        networkConfig = {
          DHCP = "ipv4";
        };
      };
    };
  };

  virtualisation.libvirtd.enable = true;
  virtualisation.libvirtd.allowedBridges = [
    "br0"
  ];
  # required by libvirtd
  security.polkit.enable = true;

  environment.systemPackages = with pkgs; [
    neovim
    wget
    jq
    curl
    virt-manager
    htop
    prometheus
    prometheus-node-exporter
    prometheus-process-exporter
  ];
  environment.variables.EDITOR = "nvim";

  services.tailscale.enable = true;

  # Enable the OpenSSH daemon.
  services.openssh.enable = true;
  services.openssh.passwordAuthentication = true;
  services.openssh.permitRootLogin = "yes";
  services.openssh.extraConfig = ''
    AllowStreamLocalForwarding yes
    AllowTcpForwarding yes
  '';

  # note this setting, in case you wish to be more restrictive at the OS-level.
  networking.firewall.enable = false;
}
Reading the configuration above tells most of the story, but let's call out a few things.

- We set networking.useDHCP to false (disabling dhcpcd) in favor of systemd-networkd; otherwise it would default to true.
- Enabling virtualisation.libvirtd handles much of the plumbing and ensures the associated packages are installed.
- If your networking does not work on boot, use journalctl -u systemd-networkd to view the logs (see the example after this list).
- Nix performs a hardware scan, which imports the majority of the required hardware-specific configuration; however, we do enable KVM above (change this setting if using AMD).
To use the extra.nix file in our system build, we need to import it in /etc/nixos/configuration.nix.
{ config, pkgs, ... }:

{
  imports =
    [ # Include the results of the hardware scan.
      ./hardware-configuration.nix
      ./extra.nix
    ];
  # (rest of configuration.nix unchanged)
Now we're set to rebuild the operating system. Normally these changes can happen in place, but since we're configuring network interfaces, I recommend a full reboot.
nixos-rebuild switch
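Since a reboot is part of the plan anyway, another option (an alternative, not what I did above) is to stage the new generation and let it activate on the next boot:

# build the new generation and make it the boot default without activating it live
nixos-rebuild boot
reboot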
VM
Now let's create VM images capable of running Kubernetes. There are a variety of ways to approach this, one of which is to use the Kubernetes modules provided by NixOS. However, in my lab, I prefer setting up the base packages and unit files, then leveraging kubeadm to manage the Kubernetes bits. Admittedly, mutating the system with kubeadm is a little impure as far as Nix is concerned, but I'm OK with that. So, for the VMs, the following things are going to be set up:
- kubeadm.
- Kubernetes bits (e.g. kubelet, kubectl).
- A container runtime (containerd).
These have some dependencies we need to consider as well, but for now, let's examine a configuration.nix file that may encompass the above.
{ config, pkgs, ... }:

{
  # text that shows up when you ssh in. Makes for an easy parameter to change
  # when testing builds too.
  users.motd = "Hello Kubecon Chicago!!!";

  # intentionally empty; the hostname-init service below writes /etc/hostname on first boot.
  networking.hostName = "";
  system.stateVersion = "23.05";

  virtualisation.containerd = {
    enable = true;
    configFile = ./containerd-config.toml;
  };

  # kernel modules and settings required by Kubernetes
  boot.kernelModules = [ "overlay" "br_netfilter" ];
  boot.kernel.sysctl = {
    "net.bridge.bridge-nf-call-iptables" = 1;
    "net.bridge.bridge-nf-call-ip6tables" = 1;
    "net.ipv4.ip_forward" = 1;
  };

  # List packages installed in system profile. To search, run:
  # $ nix search wget
  environment.systemPackages = with pkgs; [
    neovim
    wget
    ripgrep
    prometheus-node-exporter
    prometheus-process-exporter
    kubernetes
    containerd
    cri-tools
    ebtables
    ethtool
    socat
    iptables
    conntrack-tools
    (import ./hostname.nix)
  ];

  systemd.services.kubelet = {
    enable = true;
    description = "kubelet";
    serviceConfig = {
      ExecStart = "${pkgs.kubernetes}/bin/kubelet $KUBELET_KUBECONFIG_ARGS $KUBELET_CONFIG_ARGS $KUBELET_KUBEADM_ARGS $KUBELET_EXTRA_ARGS";
      Environment = [
        "\"KUBELET_KUBECONFIG_ARGS=--bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/etc/kubernetes/kubelet.conf\""
        "\"KUBELET_CONFIG_ARGS=--config=/var/lib/kubelet/config.yaml\""
        "PATH=/run/wrappers/bin:/root/.nix-profile/bin:/etc/profiles/per-user/root/bin:/nix/var/nix/profiles/default/bin:/run/current-system/sw/bin"
      ];
      EnvironmentFile = [
        "-/var/lib/kubelet/kubeadm-flags.env"
        "-/etc/default/kubelet"
      ];
      Restart = "always";
      StartLimitInterval = 0;
      RestartSec = 10;
    };
    wantedBy = [ "network-online.target" ];
    after = [ "network-online.target" ];
  };

  systemd.services.hostname-init = {
    enable = true;
    description = "set the hostname to the IP";
    serviceConfig = {
      ExecStart = "/run/current-system/sw/bin/hostname-init";
    };
    wantedBy = [ "network-online.target" ];
    after = [ "network-online.target" ];
  };

  environment.variables.EDITOR = "nvim";

  users.users.root.initialPassword = "root";

  # Enable the OpenSSH daemon.
  services.openssh.enable = true;
  services.openssh.passwordAuthentication = true;
  services.openssh.permitRootLogin = "yes";

  networking.firewall.enable = false;
}
The above references some external files we'll examine now. The first is containerd-config.toml. This is needed as there is some specific configuration required per the Kubernetes docs, namely the cgroup driver. The config file is really long, so in this guide I'll pull it from a link:
wget https://raw.githubusercontent.com/joshrosso/hypernix/main/vm-images/containerd-config.toml
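For context, the main thing the Kubernetes docs call out is enabling the systemd cgroup driver for the runc runtime. In a containerd 1.x config, that portion looks roughly like the excerpt below; treat the linked file above as the authoritative version.

# excerpt only; the full containerd-config.toml linked above contains much more
version = 2

[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc]
  runtime_type = "io.containerd.runc.v2"

  [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc.options]
    SystemdCgroup = true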
The last extra config is hostname.nix. This is...less than ideal...but works really well for me until I have a more capable DHCP setup. The idea is that when a host boots, the unit file resolves the DHCP-provided IP address and generates a hostname from it. It then reboots the system and continues forward. This approach (hack?) assumes the IP address will never change for the life of the VM. The nix definition is:
with import <nixpkgs> {};
writeShellApplication {
  name = "hostname-init";
  text = ''
    #!/bin/sh
    SN="hostname-init"

    # do nothing if /etc/hostname exists
    if [ -f "/etc/hostname" ]; then
      echo "''${SN}: /etc/hostname exists; noop"
      exit
    fi

    echo "''${SN}: creating hostname"

    # set hostname to the dash-separated primary IP address
    /run/current-system/sw/bin/ip -o -4 addr show | /run/current-system/sw/bin/awk '!/ lo | docker[0-9]* /{ sub(/\/.*/, "", $4); gsub(/\./, "-", $4); print $4 }' > /etc/hostname

    if [ -f "/etc/hostname" ]; then
      /run/current-system/sw/bin/reboot
    fi
  '';
}
If you use the above and it gives you problems, use journalctl -u hostname-init to see the logs.
The above expresses a base configuration that could be used in a variety of image formats. Since our hypervisor stack is built on KVM/qemu/libvirt, we'll plan to use qcow2 as the format. The nixos-generators project is where I went to get details on how to build for a variety of outputs. I've since translated some things into this repo, which, if cloned down, lets you run the following command:
nix-build ./nixos-generate.nix \
  --argstr formatConfig /root/hypernix/vm-images/formats/qcow.nix \
  -I nixos-config=configuration.nix \
  --no-out-link \
  -A config.system.build.qcow
However, rather than using mine, you may wish to see if you can get the nixos-generators project to work, since it's actually maintained. Once the above command builds, you'll see an output in the /nix/store, as seen below.
[root@nixos:~/hypernix/vm-images] ls /nix/store/aa494ixhf52l295c7isdkylvr135j84q-nixos-disk-image
nixos.qcow2
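If you do go the nixos-generators route instead, my understanding is the equivalent invocation looks roughly like the following; double-check the project's README for the current flag names, as this is an assumption on my part.

# assumes nixos-generators is available on your PATH (e.g. installed via nix-env)
nixos-generate --format qcow --configuration ./configuration.nix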
I tend to move these base images into /var/lib/libvirt/images/{SOME_DIRECTORY}, although the exact location is entirely up to you.
With the base image in place, it's a matter of standing up virtual machines. I control this through Terraform, but that's a bit too long for this exercise. Instead, here's a script that will spawn 3 VMs based on the image generated above.
# TODO(you): change this to where you move your image to.
PATH_IMG=/var/lib/libvirt/images/
# TODO(you): the name you gave the base qcow2 image (without the extension).
NAME=k8s_base

for i in {1..3}
do
  cp -v ${PATH_IMG}/${NAME}.qcow2 ${PATH_IMG}/${NAME}-${i}.qcow2
  virt-install \
    --name kubecon_$i \
    --ram 6000 \
    --vcpus 2 \
    --os-variant generic \
    --console pty,target_type=serial \
    --bridge=br0 \
    --graphics=vnc,password=foobar,port=592${i},listen=0.0.0.0 \
    --disk=${PATH_IMG}/${NAME}-${i}.qcow2 \
    --import &
done
Once the VMs are up, determine their IPs, SSH into them, and run kubeadm init and kubeadm join to create your multi-node cluster.
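The exact flags depend on your cluster, so treat the following as a rough sketch rather than something this guide prescribes; the VM names come from the script above, and the join command is printed by kubeadm init itself.

# a serial console into a VM (named kubecon_1..3 by the script above)
virsh console kubecon_1

# on the VM chosen as the control plane
kubeadm init

# kubeadm init prints a join command; it can be regenerated later with:
kubeadm token create --print-join-command

# on each of the remaining VMs, run the printed command, e.g.:
kubeadm join <control-plane-ip>:6443 --token <token> \
  --discovery-token-ca-cert-hash sha256:<hash>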
Containers
Containers will be run in the cluster via a Pod. Nix can be used to create OCI-compliant container images, which then run via a container runtime such as containerd. As far as I know, pkgs.dockerTools is the most ubiquitous way to produce images with Nix. It has a bunch of mappings that relate to what you'd expect in a Dockerfile, and many more benefits, such as the ability to ensure each /nix/store asset built is put in its own layer. Don't let the docker part of the tool name throw you off. The outputted image will run with any container runtime that supports OCI-based images (which should be most/all of them).
Below is an example of an image building nginx; this is largely copied from the upstream examples. You can put the example in any directory and name it nginx-container.nix.
{ pkgs ? import <nixpkgs> { }
, pkgsLinux ? import <nixpkgs> { system = "x86_64-linux"; } }:

let
  conf = {
    nginxWebRoot = pkgs.writeTextDir "index.html"
      " <html><body><center><marquee><h1>all ur PODZ is belong to ME</h1></marquee> <img src=\"https://m.media-amazon.com/images/M/MV5BYjBlODg3ZTgtN2ViNS00MDlmLWIyMTctZmQ2NWYwMzE2N2RmXkEyXkFqcGdeQVRoaXJkUGFydHlJbmdlc3Rpb25Xb3JrZmxvdw@@._V1_.jpg\" width=\"100%\"></center></body></html>\n";
    nginxPort = "80";
    nginxConf = pkgs.writeText "nginx.conf" ''
      user nobody nobody;
      daemon off;
      error_log /dev/stdout info;
      pid /dev/null;
      events {}
      http {
        access_log /dev/stdout;
        server {
          listen ${conf.nginxPort};
          index index.html;
          location / {
            root ${conf.nginxWebRoot};
          }
        }
      }
    '';
  };
in pkgs.dockerTools.buildLayeredImage {
  name = "joshrosso/kubecon";
  tag = "1.4";
  contents = [ pkgs.fakeNss pkgs.nginx ];

  extraCommands = ''
    mkdir -p tmp/nginx_client_body

    # nginx still tries to read this directory even if error_log
    # directive is specifying another file :/
    mkdir -p var/log/nginx
  '';

  config = {
    Cmd = [ "nginx" "-c" conf.nginxConf ];
    ExposedPorts = { "${conf.nginxPort}/tcp" = { }; };
  };
}
With the above declared, we can run nix-build and create an image.
nix-build nginx-container.nix
This builds a multi-layer image and creates a symlink named result. You can now load the tarball into a container tool like docker and push it to a remote repository.
docker load < result
docker push joshrosso/kubecon:1.4
In my Kubecon talk (video above) I validated the container's functionality by deploying the pod, then port-forwarding to it and opening it in a web browser.
The manifest looked as follows:
apiVersion: v1
kind: Pod
metadata:
  name: a-message-from-the-underworld
spec:
  containers:
  - name: message
    image: joshrosso/kubecon:1.4
    ports:
    - containerPort: 80
The port-forward command:
kubectl port-forward --address 0.0.0.0 pods/a-message-from-the-underworld 8080:80
The app could then be opened in a web browser as seen below.
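If you'd rather verify from a terminal, a plain curl against the forwarded port should return the HTML defined in the image:

curl http://localhost:8080/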
Next Steps
There's a lot more to explore in the Nix ecosystem, but here are some specific things you may wish to look into if you decide to build on what's in this guide.
- Add non-root users to your hypervisor and VMs.
- Read nix-pills to better understand the language, packages, and OS.
- Check out farcaller's NixCon talk on Kubernetes deployments with Nix; it covers using Nix for manifest generation.
- Consider running NixOS on your desktop/laptop.