An update on my BHyve hypervisor setup
Intro
I should give this some proper context; I recently moved to using FreeBSD exclusively after using Linux exclusively for something like 20 years. There’s a variety of reasons why, but mainly I just like FreeBSD better. Sometime I think back in 2004, I took a free class / workshop on FreeBSD that took place over about 3-4 days. I had to get my ass up at the ass crack of dawn to travel from Bellevue to downtown Seattle, but it was worth it because it gave me a perspective that I might not have gotten anywhere else, from a guy who just really loved FreeBSD. This workshop was really above board. Arguably it was better than what you might pay money for these days. Each participant received their own copy of the FreeBSD manual in a 3 ring binder that contained literally everything there is to know about installing and configuring every aspect of the complete operating system. You can still find that manual here: https://docs.freebsd.org/en/books/handbook/
It hasn’t changed a hell of a lot over the years, and looking through it I can’t think of anything that needs to change about it. I might be stretching the truth a bit, but old versions are available and you can figure that one out for yourself. FreeBSD is characteristically a complete operating system, and the same can’t really be said for Linux. There is a concept called “Software of Unknown Pedigree”; if the opposite, “Software of a Known Pedigree”, were a thing, it would say a lot about what you can expect in FreeBSD, with the exception of FreeBSD Ports, which is software of sometimes unknown and often known pedigree. From the kernel down to the base userland, things are reasonably consistent. If you’ve come back to FreeBSD after 20 years, you might also say it’s very familiar despite the gap in years:
/etc/rc.conf
/etc/rc.local
/etc/make.conf
/boot/loader.conf
/usr/src/
/usr/src/sys
/usr/src/sys/<arch>/conf
/usr/ports
/usr/local
What is new for me are the amenities that I’ve come to expect for a modern operating system:
- ZFS; datasets and zvols, where zvols can operationally function as geoms
- pkg, for when you just really don’t feel like building ports
- Routing tables; the so-called setfib(1) or setfib(2), as opposed to just a single routing table (Linux has this, too)
- Paravirtualization; BHyve, similar to KVM (also uses virtio)
- Containers; BSD has had this for a long time called “jails” and there is even an OCI runtime in the works: https://github.com/dfr/ocijail
- Reliable contributions and continued development
Developing for FreeBSD also feels incredibly human; I can’t even explain it:
https://github.com/paigeadelethompson/exfat/commit/187c6694c68554f7961b427501373984a0742366
I made this using an LLM, and the process is very straightforward: between having an NFS mount of this source code where I can work on it with Cursor and an NFS mount on a dev FreeBSD VM, I can make the changes in Cursor and build/test it, and there’s no extra bullshit that I’ve gotta do. If it crashes I get a kernel debugger on the serial console. If I really wanted to push the development of this to completion, I could; I just don’t really have a need at the moment. All of the build files for this kernel module and its associated userland tools (newfs etc.) extend the build assets of /usr/src, and it totally builds out of tree as-is, and it’s JUST bsdmake.
I won’t get too much into what issues I have with the Linux kernel, but allmodconfig is not what it sounds like; if it was, I wouldn’t have to always compile some stupid module for one of these weirdo wireless USB adapters. The TL;DR of this is that there’s a lot of weird shit in the Linux kernel. Here’s another example:
https://docs.kernel.org/admin-guide/ufs.html
Why is UFS2 write support “experimental” in 2025? https://code.fe80.eu/lynxis/linux/-/blob/v2.6.34-rc1/Documentation/filesystems/ufs.txt?ref_type=tags
And then there’s Linus, dude. I’ve got nothing against the guy really, but usually every time I hear his name come up it is the subject of some open source community schadenfreude, or it’s about him blowing up and going Gordon Ramsay on some poor asshole. From my perspective, the whole build config system is woefully neglected and is long overdue for a redesign. It doesn’t scale: you can’t realistically build a kernel config with scripts/config anymore, and essentially everybody is redistributing the .config, you know, the one whose header says “Automatically generated file; DO NOT EDIT.” To me the glaring DO NOT EDIT seems to be a sign that it should always just be generated, rather than redistributed. But more to my original point: if maintainability really mattered, it makes you wonder how the actual fuck UFS2 (of all things) write support has been experimental for 18 years, doesn’t it?
And sure, I’m really one to talk about being professional, but I’m not Linus Torvalds, that’s for damn sure, and I don’t personally attack people or make ridiculous claims about which Intel instruction sets are just a scam while being as ubiquitous as Linus Torvalds; I’m basically nobody and my reputation is already in the gutter. One thing that is shockingly apparent to me, though, is how little it matters. Very few people have their own opinions about anything anymore, which is sad. Unfortunately it kinda seems to be the sentiment of the open source community as of late, which is to say:
- Everything is too complicated just take my word for it
- Everything is too complicated to do so don’t even bother
- Being wrong about something is unforgivable; this includes suggesting new ideas that are wrong
- Be afraid of everything, everything is a conspiracy, LLMs are just plagiarism machines
- The foot eater was “right” about “everything”
Just absolute horse shit that makes no sense coming from everybody.
It just seems to me if Linus really gave a shit about code maintainability he could catch more flies with honey than shit if he just said “why don’t you guys start your own fork of the kernel and leave mine the hell alone” because that’s actually more likely what it’s really about anyway. And really that’s the only issue that I have with Linus. I sure as hell don’t look up to him, but a lot of people do and I think it’s ridiculous that somebody who people look up to so much would be such a dick to people. Totally fair to have his own interests at heart though, he’s definitely earned it.
There’s tons of stuff in the Linux kernel that I’ve wanted to see improved that really just seem hopeless like:
- VRF
- nftables
- i915 11th gen was supposed to have SRIOV then they pushed it up to 12, now we’re up to gen 15 and I still don’t think they’ve even got it. One of Linus’ pet trolls, Phoronix even pointed this out, at least we can agree on something: https://www.phoronix.com/news/Intel-More-i915-For-Linux-6.7 looking at my wrist wondering where my watch is..
Hard to want to approach any of these with as many problems as Linux has going on right now. Git has this wonderful thing called “subtrees”, and that’s probably more characteristic of what the Linux kernel should look like in 2025 given the sheer volume of shit that is there: each subtree is a repository for a specific subsystem or driver that really has no relation to the core tree except for being a subtree, and the entire build system should be based loosely on that idea. People have tried to make offloading kernel development easier with shit like dracut, and dracut is an absolute piece of crap (no offense to the dude who wrote it), but I hate it and it makes me cringe every time I see the name. I also think that building the kernel from a source tree just shouldn’t be as cumbersome as it is, but it’s really no surprise given that it’s literally the same shit that has been there since 2.6, and a lot of it probably predates even that.
With something like subtrees, you could probably inject them into a build process. I sorta always liked golang’s rudimentary dependency management where packages are <repository>/owner/package, but I think in the case of golang it might be too little for some. I think that could work for something like the Linux kernel, though, because without injecting anything into the tree, you’d be left with only what lives up to Linus’ standard, and essentially that is what is most important after all. At the same time I think it’s valuable to be able to build kernel code out of tree when possible, and that should be taken into consideration as well.
git subtree is available in stock version of Git since May 2012 – v1.
This is not to be confused with git submodule; they are two separate things entirely:
https://stackoverflow.com/questions/12349931/what-came-first-git-subtree-merge-strategy-or-git-submodule
My BHyve setup
Yeah anyway, sometimes you just need a different perspective; libvirt, KVM/QEMU and Linux VRF/NetNS weren’t doing it for me. I really don’t like ip rule on Linux either, and there’s none of that on FreeBSD. When it matters, as far as routing tables are concerned, your interfaces have either fib or tunnelfib, and interfaces which possess the ability to specify tunnelfib can operate on two different FIBs at the same time. Another thing that makes a lot more sense to me, and did even before I really got deep into Linux networking back in 2005, is bridges on FreeBSD, where “enslaved” interfaces are still configurable as interfaces while enslaved by the layer 2 bridge, and this is intended. You can also assign an address to the bridge in addition to assigning addresses to your enslaved devices, which is nice in my opinion.
A FIB or routing table alone is not quite the same thing as a VRF, but it is essentially how I’ll be using them. There is a small part of this, as it relates to FRR that I still need to figure out because the concept of a VRF in FRR still needs to apply and each VRF will need to identify with a particular FIB on FreeBSD (more on that later.)
vm-bhyve alone wasn’t enough
All of the configuration I’ll be referencing is available here: https://gist.github.com/paigeadelethompson/a94ff2e7cc4916d7feecef96936bb2d7 Typically to create a VM with my setup I run either:
./create_freebsd_vm.sh FBSDDEV1 -t fbsd-dev -v 14.2 -i tap5
./create_void_vm.sh SWARM3 -t swarm -i tap2
So it is necessary for now to create a tap interface and assign it to the correct FIB before running these:
ifconfig tap<N> create fib <FIB>
I wanted something that will:
- Create a VM from scratch (both Linux and FreeBSD) without needing to boot an ISO and manually go through an installer, partitioning, etc. (I made my own scripts for both Void and FreeBSD, which I’ve already mentioned)
- Handle FIBs; vm switch (from vm-bhyve) doesn’t support FIB specification, but it doesn’t really need to, because I can do all of the networking setup that I need in rc.conf:
chronyd_enable=YES
dnsmasq_enable=YES
sshd_enable=YES
hostname=stelleri.netcrave.network
powerd_enable=YES
moused_nondefault_enable=NO
dumpdev=NO
zfs_enable=YES
gateway_enable=YES
#ipv6_gateway_enable=YES
lldpd_enable=YES
linux_enable=YES
pf_enable=YES
nfs_server_enable=YES
nfsv4_server_enable=YES
nfsuserd_enable=YES
rpcbind_enable=YES
mountd_enable=YES
mountd_flags=-r
vm_enable=YES
vm_dir=zfs:storage/vm
frr_enable=YES

# LAN
ifconfig_ix1="inet 192.168.1.128/24 fib 0"
#ifconfig_ix1_ipv6="inet6 fcff:fff0::/64 fib 0"

# Docker swarm
ifconfig_igb0="inet 198.18.2.1/23 fib 8"
#ifconfig_igb0_ipv6="inet6 fcff:8::/64 fib 8"

# Home servers
ifconfig_igb1="inet 192.168.65.129/25 fib 10"
#ifconfig_igb1_ipv6="inet6 fcff:12::/64 fib 10"

# Docker swarm servers VGW
ifconfig_epair0a="192.0.0.0/31 fib 0 up"
#ifconfig_epair0a_ipv6="inet6 fcff:ffff:8::a/64 fib 0 up"
ifconfig_epair0b="192.0.0.1/31 fib 8 up"
#ifconfig_epair0b_ipv6="inet6 fcff:ffff:8::b/64 fib 8 up"

# Home servers VGW
ifconfig_epair1a="192.0.0.2/31 fib 0 up"
#ifconfig_epair1a_ipv6="inet6 fcff:ffff:10::a/64 fib 0 up"
ifconfig_epair1b="192.0.0.3/31 fib 10 up"
#ifconfig_epair1b_ipv6="inet6 fcff:ffff:10::b/64 fib 10 up"

# Tailscale VGW
ifconfig_epair2a="192.0.0.4/31 fib 0 up"
#ifconfig_epair2a_ipv6="inet6 fcff:ffff:12::a/64 fib 0 up"
ifconfig_epair2b="192.0.0.5/31 fib 12 up"
#ifconfig_epair2b_ipv6="inet6 fcff:ffff:12::b/64 fib 12 up"

# VM interfaces (FIB assignment)
ifconfig_tap0="fib 8 up" # SWARM1
ifconfig_tap1="fib 8 up" # SWARM2
ifconfig_tap2="fib 8 up" # SWARM3
ifconfig_tap3="fib 10 up" # HOME1
ifconfig_tap4="fib 12 up" # TAILSCALE1
ifconfig_tap5="fib 10 up" # FBSDDEV1

# Docker swarm virtual switch
ifconfig_bridge0="198.18.0.1/23 fib 8 up"
#ifconfig_bridge0_ipv6="inet6 fcff:8::1/64 fib 8 up"
ifconfig_bridge0_aliases="inet 169.254.169.254/16 alias addm igb0 addm tap0 addm tap1 addm tap2"

# Home servers virtual switch
ifconfig_bridge1="192.168.64.129/25 fib 10 up"
#ifconfig_bridge1_ipv6="inet6 fcff:10::1/64 fib 10 up"
ifconfig_bridge1_aliases="inet 169.254.169.254/16 alias addm igb1 addm tap3 addm tap5"

# Tailscale virtual switch
ifconfig_bridge2="192.0.2.1/30 fib 12 up"
#ifconfig_bridge2_ipv6="inet6 fcff:12::1/64 fib 12 up"
ifconfig_bridge2_aliases="inet 169.254.169.254/16 alias addm tap4"

# This must list all interface variables for interfaces that don't exist yet
cloned_interfaces="bridge0 bridge1 bridge2 epair0 epair1 epair2 \
 tap0 tap1 tap2 tap3 tap4"

# Core routes (FIB 0)
route_fib0_swarm="-fib 0 -net 198.18.0.0/23 192.0.0.1" # 198.18.0.0 - 198.18.1.255
#ipv6_route_fib0_swarm="-fib 0 -6 fcff:8::/48 fcff:ffff:8::b"
# Was 192.168.64.128/24, which has host bits set; /25 matches bridge1 and pf.conf
route_fib0_home="-fib 0 -net 192.168.64.128/25 192.0.0.3" # My 192.168.64.0/20 (2nd /25 of 1st /24 of /20)
#ipv6_route_fib0_home="-fib 0 -6 fcff:10::/48 fcff:ffff:10::b"
route_fib0_ts="-fib 0 -net 192.0.2.0/30 192.0.0.5" # Tailscale VRF
#ipv6_route_fib0_ts="-fib 0 -6 fcff:12::/48 fcff:ffff:12::b"
route_fib0_egr_ts="-fib 0 -net 100.64.0.0/10 192.0.0.5" # Tailscale uses 100.64.0.0/10
#ipv6_route_fib0_egr_ts="-fib 0 -6 fd7a:115c::/32 fcff:ffff:12::b"

# Default egress (For all FIBs)
route_fib0_default="-fib 0 default 192.168.1.1"
route_fib8_default="-fib 8 default 192.0.0.0"
#ipv6_route_fib8_default="-fib 8 -6 fcff::/7 fcff:ffff:8::a"
route_fib10_default="-fib 10 default 192.0.0.2"
#ipv6_route_fib10_default="-fib 10 -6 fcff::/7 fcff:ffff:10::a"
route_fib12_default="-fib 12 default 192.0.0.4"
#ipv6_route_fib12_default="-fib 12 -6 fcff::/7 fcff:ffff:12::a"

# Egress to Tailscale (FIB 12)
route_fib12_egr_ts="-fib 12 -net 100.64.0.0/10 192.0.2.2"
#ipv6_route_fib12_egr_ts="-fib 12 -6 fd7a:115c::/32 fcff:12::192:0:2:2"

# Null routes (All FIBs)
route_fib8_null_fib0="-fib 8 -net 192.168.0.0/16 -reject" # Swarm to UDM & Home (and anything else)
#ipv6_route_fib8_null_fib0="-fib 8 -6 fcff::/48 -reject"
route_fib10_null_fib8="-fib 10 -net 198.18.0.0/15 -reject" # Home servers to Swarm
#ipv6_route_fib10_null_fib8="-fib 10 -6 fcff:8::/48 -reject"
route_fib12_null_fib0="-fib 12 -net 192.168.0.0/20 -reject" # 192.168.0.0/20 UDM Networks(LAN/WiFi/etc)
#ipv6_route_fib12_null_fib0="-fib 12 -6 fcff::/48 -reject"
route_fib0_null_vgw="-fib 0 -net 192.0.0.0/24 -reject" # Prevent forwarding for VGW addresses
#ipv6_route_fib0_null_vgw="-fib 0 -6 fcff:ffff::/32 -reject"
route_fib0_null_ll="-fib 0 -net 169.254.0.0/16 -reject" # Prevent forwarding for link-local

# This must list all route variables
static_routes="fib0_swarm fib0_home fib0_ts fib0_egr_ts fib0_default fib8_default \
 fib10_default fib12_default fib12_egr_ts fib8_null_fib0 \
 fib0_null_vgw fib0_null_ll fib10_null_fib8 fib12_null_fib0"

# ipv6_static_routes="fib0_swarm fib0_home fib0_ts fib0_egr_ts fib8_default \
# fib10_default fib12_default fib12_egr_ts fib8_null_fib0 \
# fib0_null_vgw fib10_null_fib8 fib12_null_fib0"
Continuing the list:
- Zeroconf networking; this is made possible using lldpd on both the host and the guest, as well as avahi-autoipd on the guest. I can run lldpctl on the host to retrieve information about the guest:
-------------------------------------------------------------------------------
Interface: tap5, via: LLDP, RID: 13, Time: 0 day, 06:41:58
 Chassis:
 ChassisID: mac 58:9c:fc:0b:39:9f
 SysName: FBSDDEV1
 SysDescr: FreeBSD 14.2-RELEASE FreeBSD 14.2-RELEASE FreeBSD 14.2-RELEASE releng/14.2-n269506-c8918d6c7412 GENERIC amd64
 MgmtIP: 169.254.10.136
 MgmtIface: 1
 MgmtIP: fe80::5a9c:fcff:fe0b:399f
 MgmtIface: 1
 Capability: Bridge, off
 Capability: Router, off
 Capability: Wlan, off
 Capability: Station, on
 Port:
 PortID: mac 58:9c:fc:0b:39:9f
 PortDescr: vtnet0
 TTL: 120
 PMD autoneg: supported: yes, enabled: yes
 MAU oper type: 10GigBaseCX4 - X copper over 8 pair 100-Ohm balanced cable
-------------------------------------------------------------------------------
and thus in addition to being able to vm console the guest, I can also SSH into it before it’s even set up:
➜ /etc setfib -F 10 ssh -i /vm/FBSDDEV1/id_ed25519 root@169.254.10.136 "uname -a"
FreeBSD FBSDDEV1 14.2-RELEASE FreeBSD 14.2-RELEASE releng/14.2-n269506-c8918d6c7412 GENERIC amd64
You may also notice that every bridge has the same IP address, 169.254.169.254, specified, and this is possible because the bridges are not on the same routing table, so the addresses can’t conflict. In order to tell the operating system which routing table should be used when looking up the route for the network, the setfib command is used. It’s essentially the same thing as ip vrf exec or ip netns exec, if you’re familiar.
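As a rough sketch of what that looks like day to day, a tiny wrapper makes it explicit which routing table a command runs against. The in_fib function and the DRY_RUN switch are hypothetical names for illustration only, not part of FreeBSD or vm-bhyve, and the sketch assumes the kernel has enough FIBs available (the net.fibs tunable in /boot/loader.conf):

```shell
#!/bin/sh
# Hypothetical helper: run a command against a specific FIB.
# DRY_RUN defaults to 1, so this sketch only prints what it would run.
DRY_RUN=${DRY_RUN:-1}

in_fib() {
    fib=$1
    shift
    if [ "$DRY_RUN" = 1 ]; then
        echo "setfib -F $fib $*"
    else
        setfib -F "$fib" "$@"
    fi
}

# Show the routing table of the swarm FIB, then reach a guest on the home FIB.
in_fib 8 netstat -rn
in_fib 10 ssh root@169.254.10.136 uname -a
```

With DRY_RUN=0 the same calls would actually execute through setfib(1), which is the only real command used here.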
You might also be wondering “wtf is LLDP” and you should, because it’s bad ass: https://en.wikipedia.org/wiki/Link_Layer_Discovery_Protocol If ever there was a layer 2 protocol that I would want to have on everything it’d be this. It’s a decent compromise for lack of having SNMP and it just makes my life easier. You’ll also find that a lot of high end switches / routers also use LLDP:
-------------------------------------------------------------------------------
Interface: ix1, via: LLDP, RID: 2, Time: 3 days, 23:46:31
 Chassis:
 ChassisID: mac 78:45:58:6a:e2:b9
 SysName: USW-Aggregation
 SysDescr: UBNT-USL8A
 Capability: Bridge, on
 Port:
 PortID: local Port 3
 PortDescr: SFP_ 3
 TTL: 120
 VLAN: 1, pvid: yes
 LLDP-MED:
 Device Type: Network Connectivity Device
 Capability: Capabilities, yes
 Capability: Policy, yes
-------------------------------------------------------------------------------
Serial console is always an option:
➜ /etc vm console FBSDDEV1
Connected


FreeBSD/amd64 (FBSDDEV1) (ttyu0)

login: root
Apr 24 00:22:10 FBSDDEV1 login[772]: ROOT LOGIN (root) ON ttyu0
FreeBSD 14.2-RELEASE (GENERIC) releng/14.2-n269506-c8918d6c7412

Welcome to FreeBSD!

Release Notes, Errata: https://www.FreeBSD.org/releases/
Security Advisories: https://www.FreeBSD.org/security/
FreeBSD Handbook: https://www.FreeBSD.org/handbook/
FreeBSD FAQ: https://www.FreeBSD.org/faq/
Questions List: https://www.FreeBSD.org/lists/questions/
FreeBSD Forums: https://forums.FreeBSD.org/

Documents installed with the system are in the /usr/local/share/doc/freebsd/
directory, or can be installed later with: pkg install en-freebsd-doc
For other languages, replace "en" with a language code like de or fr.

Show the version of FreeBSD installed: freebsd-version ; uname -a
Please include that output and any error messages when posting questions.
Introduction to manual pages: man man
FreeBSD directory layout: man hier

To change this login announcement, see motd(5).
root@FBSDDEV1:~ # ^D

FreeBSD/amd64 (FBSDDEV1) (ttyu0)

login:

FreeBSD/amd64 (FBSDDEV1) (ttyu0)

login: ~
[EOT]
➜ /etc
The setup scripts
On a high level, the create_void_vm.sh and create_freebsd_vm.sh scripts both do the following:
- Create the VM; disk and configuration file from templates stored in /vm/.templates
- Each VM disk is a zvol, configured in geom mode
- The VM disk is partitioned with a GPT scheme; FAT32 for the EFI partition and UFS2 for the root filesystem. FreeBSD doesn’t have userland tools for creating ext4 or btrfs filesystems (at least not in base userland), so I opted for using UFS2, which unfortunately requires an experimental option for write support in Linux at the moment.
- The filesystems are formatted and mounted to a chroot path
- The base userland is downloaded and extracted in the chroot path
- A setup.sh script is created in the chroot path, as well as an SSH authorized_keys file; an id_ed25519 is also generated and stored in the VM configuration directory in /vm
- A chroot is performed (in the case of Linux, FreeBSD has Linux binary compatibility, which makes this possible)
- The setup.sh script runs in the chroot and sets up everything; in the case of Linux it also compiles a custom kernel with as many networking options as I could possibly scrape, like a stoner desperately scraping weed resin to smoke from a pipe. Total disaster, but I think I just about got everything that is needed for networking, iptables and nftables (Docker works at least.) Obviously not ideal, but it bypasses the need for an initramfs, and the “experimental” UFS2 write support is enabled, allowing this to work.
- For FreeBSD, the bootloader is a little more straightforward: cp /boot/loader.efi /boot/efi/efi/boot/bootx64.efi. On Linux, though, running GRUB or efibootmgr inside of a chroot on FreeBSD under Linux binary compatibility is a bit of a stretch, so I had to get creative, and luckily BHyve’s UEFI firmware (TianoCore EDK II) comes with something called EFIShell: https://github.com/tianocore/tianocore.github.io/wiki/Efi-shell
EFIShell is the default EFI application when no boot device is specified in EFIVars. It first checks to see if a startup.nsh exists, and runs it if it does:
fs0:\efi\boot\vmlinuz console=ttyS0 root=/dev/vda2 rootflags=ufstype=ufs2 rootfstype=ufs
That’s right, the Linux kernel itself works as an EFI application thanks to CONFIG_EFI_STUB. There’s admittedly a few different ways to make this work, including just naming the vmlinuz bootx64.efi, although I found this way to be the most convenient for passing the kernel cmdline, rather than an accompanying boot config; see:
- CONFIG_BOOT_CONFIG
- CONFIG_BOOT_CONFIG_FORCE
- CONFIG_BOOT_CONFIG_EMBED
for more information about that.
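To make the startup.nsh idea a bit more concrete, here is a minimal sketch of generating that one-liner from a script. The make_startup_nsh function and the ESP variable are hypothetical names for illustration (the real creation scripts are in the gist and may do this differently), and writing the file to the root of the EFI system partition is an assumption about where the shell will look for it:

```shell
#!/bin/sh
# Hypothetical helper: print a startup.nsh that boots an EFI-stub kernel.
# Arguments: root device, then any extra kernel command-line options.
make_startup_nsh() {
    root_dev=$1
    shift
    # printf turns each \\ into a single backslash for the EFI path.
    printf 'fs0:\\efi\\boot\\vmlinuz console=ttyS0 root=%s rootflags=ufstype=ufs2 rootfstype=ufs' "$root_dev"
    # Append extra options, if any, separated by single spaces.
    for opt in "$@"; do
        printf ' %s' "$opt"
    done
    printf '\n'
}

# ESP is an assumed mount point of the guest's EFI system partition:
# make_startup_nsh /dev/vda2 > "$ESP/startup.nsh"
make_startup_nsh /dev/vda2
```

Keeping the command line in a generated file like this means a per-guest change (a different root device, an extra option) doesn’t require rebuilding the kernel.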
And so bootstrapping this was pretty straightforward, both for FreeBSD and Linux. Continuing with the VM creation scripts:
- After the chroot completes, the filesystems are unmounted, and the tap<N> interface that was specified to the creation script is appended to the VM configuration.
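The disk preparation steps from the list above can be sketched roughly as follows. This is not lifted from the actual scripts: plan_disk, the disk0 volume name, and the 260M / 20G sizes are assumptions for illustration, and the function only prints the commands instead of running them. It also assumes the per-VM dataset under storage/vm already exists:

```shell
#!/bin/sh
# Hypothetical sketch: print (not run) the disk preparation steps for one guest.
# POOL_PATH matches vm_dir=zfs:storage/vm from rc.conf; sizes are assumptions.
POOL_PATH=storage/vm

plan_disk() {
    name=$1
    size=$2
    vol="$POOL_PATH/$name/disk0"
    # A zvol in geom mode shows up as a disk that gpart can partition.
    echo "zfs create -V $size -o volmode=geom $vol"
    echo "gpart create -s gpt zvol/$vol"
    echo "gpart add -t efi -s 260M zvol/$vol"
    echo "gpart add -t freebsd-ufs zvol/$vol"
    # FAT32 for the EFI partition, UFS2 with soft updates for the root.
    echo "newfs_msdos -F 32 -c 1 /dev/zvol/${vol}p1"
    echo "newfs -U /dev/zvol/${vol}p2"
}

plan_disk SWARM3 20G
```

Printing the plan first is a cheap way to review what would happen to a zvol before committing to it.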
The VMs boot, they work on the correct networks, and I can find them with lldpctl (if not just vm console <VM>) and ssh into them, which is preferred because console attach kinda sucks. So what’s left?
Route leaking
This part I am still working on, because I want to use VRF in FRR and I want per-VRF BGP/OSPF. For the time being, there is a simple approach that seems to work:
Gives you something that looks kinda like this:
stelleri.netcrave.network# show ip rip
Codes: K - kernel route, C - connected, L - local, S - static,
 R - RIP, O - OSPF, I - IS-IS, B - BGP, E - EIGRP, N - NHRP,
 T - Table, v - VNC, V - VNC-Direct, A - Babel, F - PBR,
 f - OpenFabric, t - Table-Direct
Sub-codes:
 (n) - normal, (s) - static, (d) - default, (r) - redistribute,
 (i) - interface

 Network Next Hop Metric From Tag Time
C(i) 192.168.1.0/24 0.0.0.0 1 self 0
R(s) 192.168.64.128/29 0.0.0.0 1 self 0
And it appears to work as expected:
 00:00:10.177590 98:b7:85:1e:de:4e > 01:00:5e:00:00:09, ethertype IPv4 (0x0800), length 66: (tos 0xc0, ttl 1, id 49765, offset 0, flags [none], proto UDP (17), length 52, bad cksum 0 (->5462)!)
 192.168.1.128.520 > 224.0.0.9.520: [bad udp cksum 0xa263 -> 0x5645!]
 RIPv2, Response, length: 24, routes: 1 or less
 AFI IPv4, 192.168.64.128/29, tag 0x0000, metric: 1, next-hop: self
 0x0000: 0202 0000 0002 0000 c0a8 4080 ffff fff8
 0x0010: 0000 0000 0000 0001
Ideally I want to use BGP/OSPF, but also to be able to specify just a VRF from which to leak routes and a list of routes that shouldn’t be leaked (e.g. default, 192.0.0.0/24, 169.254.0.0/16, etc.). Sometimes simpler is better, though.
pf.conf
The virtual gateways need to be able to route traffic to the internet:
table <resvd_networks> { 0.0.0.0/8 10.0.0.0/8 100.64.0.0/10 127.0.0.0/8 169.254.0.0/16
 172.16.0.0/12 192.0.0.0/24 192.0.2.0/24 192.88.99.0/24
 192.168.0.0/16 198.18.0.0/15 198.51.100.0/24 203.0.113.0/24
 224.0.0.0/4 233.252.0.0/24 240.0.0.0/4 255.255.255.255/32 }

nat on ix1 inet from 198.18.0.0/23 to !<resvd_networks> -> ix1
nat on ix1 inet from 192.168.64.128/25 to !<resvd_networks> -> ix1
nat on ix1 inet from 192.0.2.0/30 to !<resvd_networks> -> ix1
The epair interfaces
These are used to create a link (single hop) between FIB 0 (the default FIB) and each of the other FIBs. It’s possible to configure a link between two other FIBs as well, but in this configuration currently I don’t have any use for ad-hoc networks. To prevent undesired routing between FIBs, null routes are used.
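The pattern in the rc.conf above is regular enough to generate: epair N gets a /31 out of 192.0.0.0/24, with the a side in FIB 0 and the b side in the guest FIB. A minimal sketch of that, where vgw_stanza is a hypothetical helper (not something in the gist) that just prints the lines for one link:

```shell
#!/bin/sh
# Hypothetical generator for the rc.conf lines that link FIB 0 to another FIB.
# Arguments: epair index, target FIB, guest network (CIDR), short route name.
vgw_stanza() {
    idx=$1 fib=$2 net=$3 name=$4
    a=$((idx * 2))      # FIB 0 side of the /31
    b=$((a + 1))        # guest FIB side of the /31
    echo "ifconfig_epair${idx}a=\"192.0.0.${a}/31 fib 0 up\""
    echo "ifconfig_epair${idx}b=\"192.0.0.${b}/31 fib ${fib} up\""
    # FIB 0 reaches the guest network through the b side ...
    echo "route_fib0_${name}=\"-fib 0 -net ${net} 192.0.0.${b}\""
    # ... and the guest FIB sends everything else back through the a side.
    echo "route_fib${fib}_default=\"-fib ${fib} default 192.0.0.${a}\""
}

vgw_stanza 0 8 198.18.0.0/23 swarm
```

The output for index 0 and FIB 8 reproduces the swarm lines from the config. The new epair and route names would still need to be added to cloned_interfaces and static_routes by hand, and the null routes are deliberately left out since those depend on which networks a FIB must not reach.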
What’s left?
- IPv6; currently there is a problem with IPv6 and it doesn’t work with epairs the way I would expect:
➜ stelleri ifconfig epair128 create
epair128a
➜ stelleri ifconfig epair128b inet6 fcff::b/64 fib 128
➜ stelleri ifconfig epair128a inet6 fcff::a/64 fib 0
➜ stelleri ping6 -S fcff::a fcff::b
PING(56=40+8+8 bytes) fcff::a --> fcff::b
^C
--- fcff::b ping statistics ---
3 packets transmitted, 0 packets received, 100.0% packet loss
➜ stelleri ndp -a
Neighbor Linklayer Address Netif Expire S Flags
fe80::df:98ff:feaf:9e0a%epair128a 02:df:98:af:9e:0a epair128a permanent R
fcff::a 02:df:98:af:9e:0a epair128a permanent R
fe80::df:98ff:feaf:9e0b%epair128b 02:df:98:af:9e:0b epair128b permanent R
fcff::b 02:df:98:af:9e:0b epair128b permanent R
Moving epair128b back to FIB 0, we can get a different result:
➜ stelleri ifconfig epair128b inet6 fcff::b/64 fib 0
➜ stelleri ping6 -S fcff::a fcff::b
PING(56=40+8+8 bytes) fcff::a --> fcff::b
16 bytes from fcff::b, icmp_seq=0 hlim=64 time=0.107 ms
^C
--- fcff::b ping statistics ---
1 packets transmitted, 1 packets received, 0.0% packet loss
round-trip min/avg/max/stddev = 0.107/0.107/0.107/0.000 ms
Neighbor Linklayer Address Netif Expire S Flags
fe80::df:98ff:feaf:9e0a%epair128a 02:df:98:af:9e:0a epair128a permanent R
fcff::a 02:df:98:af:9e:0a epair128a permanent R
fe80::df:98ff:feaf:9e0b%epair128b 02:df:98:af:9e:0b epair128b permanent R
fcff::b 02:df:98:af:9e:0b epair128b permanent R
Looking at the NDP entries you can’t really tell that anything is wrong; however, if you move epair128b back to FIB 128:
➜ stelleri ifconfig epair128b inet6 fcff::b/64 fib 128
➜ stelleri ping6 -S fcff::a fcff::b
PING(56=40+8+8 bytes) fcff::a --> fcff::b
16 bytes from fcff::b, icmp_seq=0 hlim=64 time=0.112 ms
16 bytes from fcff::b, icmp_seq=1 hlim=64 time=0.102 ms
16 bytes from fcff::b, icmp_seq=2 hlim=64 time=0.104 ms
16 bytes from fcff::b, icmp_seq=3 hlim=64 time=0.100 ms
16 bytes from fcff::b, icmp_seq=4 hlim=64 time=0.128 ms
^C
--- fcff::b ping statistics ---
5 packets transmitted, 5 packets received, 0.0% packet loss
round-trip min/avg/max/stddev = 0.100/0.109/0.128/0.010 ms
➜ stelleri ndp -a
Neighbor Linklayer Address Netif Expire S Flags
fe80::df:98ff:feaf:9e0a%epair128a 02:df:98:af:9e:0a epair128a permanent R
fcff::a 02:df:98:af:9e:0a epair128a permanent R
fe80::df:98ff:feaf:9e0b%epair128b 02:df:98:af:9e:0b epair128b permanent R
fcff::b 02:df:98:af:9e:0b epair128b permanent R
It works, so I do believe this is an issue with NDP but I need to find somebody to help me triage this.
Impressions
This rocks, really don’t know how the hell I would get libvirt to do this but thankfully I don’t even have to think about it because I have this and it works. Creating a Docker network driver is also a pain in the ass, but I also don’t really need to do that either and really the only thing I care about is that there is some basic isolation between the swarm network and my home network:
➜ stelleri setfib -F 8 ssh -i /vm/SWARM1/id_ed25519 admin@198.18.0.2 "sudo docker node inspect rkssiknkct3cc6tlg1nb5ptfw" | jq '.[] | .ManagerStatus'
{
 "Leader": true,
 "Reachability": "reachable",
 "Addr": "100.97.94.117:2377"
}
➜ stelleri setfib -F 8 ssh -i /vm/SWARM1/id_ed25519 admin@198.18.0.2 "sudo ping 100.97.94.117"
PING 100.97.94.117 (100.97.94.117) 56(84) bytes of data.
64 bytes from 100.97.94.117: icmp_seq=1 ttl=60 time=162 ms
64 bytes from 100.97.94.117: icmp_seq=2 ttl=60 time=159 ms
^C
➜ stelleri setfib -F 8 ssh -i /vm/SWARM1/id_ed25519 admin@198.18.0.2 "sudo ping 192.168.1.1"
PING 192.168.1.1 (192.168.1.1) 56(84) bytes of data.
From 198.18.0.1 icmp_seq=1 Destination Host Unreachable
100.97.94.117 is a Tailscale address.
Things I’d like to improve
- Adding another FIB for HE.net (need ICMP on WAN, and for IPv6 to work correctly with FIBs on FreeBSD)
- Offloading some of the networking to another router; creating more of a buffer between the server that hosts live guests and a host that is separately responsible for isolating the networks from each other. This will be very easy to do with or without VLANs; I would just have to add another device, such as my Zimaboard, which has been sitting on my shelf doing nothing for two years. In terms of what I want to host I’ll follow up in another blog post, but essentially I’m looking at something like:
  - HTTP/HTTPS/3:
Internet -> Cloudflare -> (Origin certificate / mTLS authenticated) -> Traefik (docker swarm) -> *shrug* Matrix server I guess?
  - I would like to set up a BBS with Synchronet, though I think Cloudflare’s free tier will be out of the question; at least I haven’t seen anything that would lead me to believe that I can set up SNI routing, and it relies on the service itself supporting PROXY. My HE.net tunnel would be good for this; it would just be IPv6-only.
  - Plenty of topics for other blog posts
- I’d like to also add a FIB dedicated to a Squid forward proxy, and remove the default routes from the SWARM and HOME FIBs; this way they don’t have direct access to the internet but rather have to use HTTP_PROXY, which would be the Squid proxy. This would be a nice way to air gap these networks and would allow for more control over what is actually reachable from these networks.
- sysctl net.inet.ip.accept_sourceroute; I haven’t tested this from a source spoofing perspective yet, but I’ll bet there are problems.
- Multicast routing; pimd doesn’t quite work with FreeBSD’s FRR port atm: https://troglobit.com/howtos/pimd-on-freebsd/
Anyway, everything is working fine for the most part, and I’m pretty satisfied with it:
➜ pub vm list
NAME DATASTORE LOADER CPU MEMORY VNC AUTO STATE
FBSDDEV1 default uefi 4 2048M - No Running (35391)
HOME1 default uefi 4 2048M - No Running (17220)
SWARM1 default uefi 4 2048M - No Running (3770)
SWARM2 default uefi 4 2048M - No Running (3528)
SWARM3 default uefi 4 2048M - No Running (3286)
TAILSCALE1 default uefi 2 512M - No Running (8945)