This is going to be a bit of a long post, so I’ll be brief in the intro.
Over the last few years, I’ve become enamored with routing. As part of that, I’ve spent a lot of time exploring the various virtual routing platforms, trying to eke out the best performance possible. So I decided to put them all head-to-head in a virtual setting and see how they fare.
The victims:
- pfSense 2.4.4-p3
- OPNSense 19.7
- VyOS 1.2.2
- Mikrotik Cloud Hosted Router 6.45.3, unlimited license (CHR)
- Just basic Debian Buster running FRR
- All tests were done with `iperf3 -c IP -P2`. I tried a number of different combinations of window sizes, numbers of streams, etc., and everything was pretty close to equal. UDP obviously gave much bigger numbers, but I’m interested in TCP for right now.
For the purpose of testing and to make all things equal, pfSense and OPNSense will be running with NAT disabled and `pfctl -d`, to remove the potential slow-downs from packet inspection. The CHR, Debian, and VyOS installs won’t have firewalls running.
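For reference, the whole test boils down to a couple of commands (the IPs here are the ones from the VyOS run below; `pfctl -d` only applies to the two BSD-based firewalls):

```
# on the receiving VM (10.222.222.12)
iperf3 -s

# on the sending VM (10.223.223.11), two parallel TCP streams through the router
iperf3 -c 10.222.222.12 -P 2

# on pfSense/OPNSense only: turn pf off entirely so filtering can't skew the numbers
pfctl -d
```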
If you are just interested in the final results, they are HERE. Or maybe you’d like the results when I ran the test on ESXi HERE.
The Layout
They say a picture tells a thousand words, so I think a simple network diagram should illustrate the setup:
I’ll be replacing the firewall image in this picture with the various victims.
I’ve got a LOT of routing going on in my network. At last count, I think I have at least 8 hops from one end of my network to the other, so OSPF simplifies adding new subnets behind various routers. In most cases, I don’t even bother to add default routes to new hosts as OSPF floods the default route from my edge device.
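If you want to replicate that default-route flooding on an FRR-based edge router, it’s a one-liner in OSPF. This is just an illustrative sketch, not my actual edge config:

```
! in frr.conf (or via vtysh) on the edge router:
! advertise 0.0.0.0/0 into OSPF so downstream routers learn it automatically
router ospf
 default-information originate
```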
The Hardware
For these tests, I wanted to use KVM. The network stack on ESXi has been driving me crazy lately, so I was hoping KVM would give me some decent results. Proxmox makes this pretty trivial, so that’s what I set up.
- Proxmox 6
- Root-on-ZFS on an SSD
- 64GB of RAM
- Xeon D-1521, with dual X550/X552 10GBase-T ports
This is a more-than-capable machine for these tests, though I will say I’ve gotten better results on my Broadwell E5.
The networking config on the host is pretty simple. To further isolate things, I have two different ports running as trunks: Proxmox networking goes over one, and the networking for this test lab goes out the other.
I did try using OVS, just to see if the results were any different, but they weren’t. So on Proxmox, the network config is pretty simple:
auto lo
iface lo inet loopback

iface eno4 inet manual

auto ens2
iface ens2 inet manual

auto eno3
iface eno3 inet manual

auto vmbr0.3
iface vmbr0.3 inet static
    address 10.3.1.84/24
    gateway 10.3.1.1

auto vmbr0
iface vmbr0 inet manual
    bridge_ports eno3
    bridge_stp off
    bridge_fd 0
    bridge_vlan_aware yes

auto vmbr1
iface vmbr1 inet manual
    bridge_ports ens2
    bridge_stp off
    bridge_fd 0
    bridge_vlan_aware yes
All VMs are connected to `vmbr1`.
The VMs should be fairly equal. CPU-wise, I’ve gone with 2c2t. RAM is 2GB, which is overkill for basically any of these platforms.
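For the curious, attaching each router VM to the lab bridge on Proxmox looks roughly like this (the VM ID is a made-up example; no VLAN tag is set on the NIC because the router VMs tag their own VLANs off the trunk):

```
qm set 101 --cores 2 --memory 2048
qm set 101 --net0 virtio,bridge=vmbr1
```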
The Linux crowd
The first batch of routers are all based on Linux. The VyOS version I used (1.2.2) is based on Debian Jessie with a modern 4.19 kernel and runs FRR. The CHR is based on an ancient 3.x Linux kernel. The plain Linux install will be the latest Debian Buster, with the repo version of FRR, 6.0.2.
VyOS
As a contributor, I have access to the current VyOS Crux version, 1.2.2. For routing, it uses FRR 7.0.1.
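If you ever want to double-check which FRR build a given VyOS image ships, vtysh is right there in the shell:

```
vtysh -c 'show version'
```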
config
The config is painfully simple: three interfaces (WAN/LAN/LAN2), OSPF, DHCP on the LAN interfaces (for ease of use), and SSH enabled.
As initially mentioned, there’s been no firewall configured here. I’m only interested in pure routing performance.
interfaces {
    ethernet eth0 {
        duplex auto
        smp-affinity auto
        speed auto
        vif 500 {
            address 10.253.253.2/30
        }
        vif 2222 {
            address 10.222.222.1/24
        }
        vif 2223 {
            address 10.223.223.1/24
        }
    }
    loopback lo {
    }
}
protocols {
    ospf {
        area 0.0.0.0 {
            network 10.253.253.0/30
        }
        redistribute {
            connected {
                metric-type 2
            }
        }
    }
}
service {
    dhcp-server {
        shared-network-name VLAN2222 {
            subnet 10.222.222.0/24 {
                default-router 10.222.222.1
                dns-server 10.53.53.53
                range 0 {
                    start 10.222.222.10
                    stop 10.222.222.200
                }
            }
        }
        shared-network-name VLAN2223 {
            subnet 10.223.223.0/24 {
                default-router 10.223.223.1
                dns-server 10.53.53.53
                range 0 {
                    start 10.223.223.10
                    stop 10.223.223.200
                }
            }
        }
    }
    ssh {
        port 22
    }
}
boot time
VyOS generally boots pretty quickly, at least with a simple config. 37 seconds from starting the VM to routing packets. That’s also with the bootloader timeout.
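I won’t pretend the timing was rigorous; if you want a repeatable “power-on to routing” number, something along these lines works (a sketch; the VM ID and the far-side test IP are placeholders, and whatever runs the ping needs a route into the lab subnets):

```
start=$(date +%s)
qm start 101
# wait until a ping *through* the router to the far-side VM succeeds
until ping -c1 -W1 10.222.222.12 >/dev/null 2>&1; do sleep 1; done
echo "boot-to-routing: $(( $(date +%s) - start ))s"
```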
iperf3 results
Here are the results of a simple iperf3 test, from `10.223.223.11` to `10.222.222.12`:
[ ID] Interval Transfer Bitrate
[ 5] 0.00-10.04 sec 7.17 GBytes 6.14 Gbits/sec receiver
[ 8] 0.00-10.04 sec 7.22 GBytes 6.18 Gbits/sec receiver
[SUM] 0.00-10.04 sec 14.4 GBytes 12.3 Gbits/sec receiver
I would be lying if I said I wasn’t at least a little bit disappointed with those results. Especially since a direct VM-to-VM test on the same layer 2 shows quite a bit more performance:
[ ID] Interval Transfer Bitrate
[ 5] 0.00-10.04 sec 11.8 GBytes 10.1 Gbits/sec receiver
[ 8] 0.00-10.04 sec 11.7 GBytes 10.0 Gbits/sec receiver
[SUM] 0.00-10.04 sec 23.5 GBytes 20.1 Gbits/sec receiver
I guess it’s just the hardware.
I have a few VyOS VMs on ESXi as my main intervlan routing devices on bigger hardware, and those are capable of 15-30Gbps routing between VMs on the same host depending on how busy the host is.
speedtest
I have gigabit Internet, so I wanted to make sure the results are at least close to what I would get natively.
Given that this lab is behind a few routers, and my network is fairly busy at any point in time, these results are close enough to gigabit to count.
CPU usage
I was curious what kind of CPU usage there would be under max routing load. Obviously adding a firewall would add to this a bit:
Debian/FRR
This was just a plain Debian Buster install with FRR added. Honestly, most of the results are throwaway, since they are largely identical to VyOS.
config
There isn’t much in the way of special config here. Enabling `net.ipv4.ip_forward = 1` in `/etc/sysctl.conf` is a must.
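In other words:

```
echo 'net.ipv4.ip_forward = 1' >> /etc/sysctl.conf
sysctl -p                      # apply without a reboot
sysctl net.ipv4.ip_forward     # sanity check, should print "= 1"
```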
Networking (`/etc/network/interfaces`):
# The loopback network interface
auto lo
iface lo inet loopback

# The primary network interface
auto ens18
iface ens18

auto vlan500
iface vlan500
    bridge-ports ens18.500
    bridge-stp on
    address 10.253.253.2/30
    gateway 10.253.253.1
    dns-nameservers 10.53.53.53

auto vlan2222
iface vlan2222
    bridge-ports ens18.2222
    bridge-stp on
    address 10.222.222.1/24

auto vlan2223
iface vlan2223
    bridge-ports ens18.2223
    bridge-stp on
    address 10.223.223.1/24
FRR (`/etc/frr/frr.conf`):
frr version 6.0.2
frr defaults traditional
hostname ovs
log syslog informational
service integrated-vtysh-config
!
router ospf
redistribute connected
network 10.253.253.0/30 area 0
!
line vty
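One gotcha if you’re replicating this: Debian’s FRR package ships with all of the protocol daemons switched off, so OSPF won’t actually start until ospfd is enabled in `/etc/frr/daemons` and FRR is restarted:

```
sed -i 's/^ospfd=no/ospfd=yes/' /etc/frr/daemons
systemctl restart frr
vtysh -c 'show ip ospf neighbor'   # confirm the adjacency comes up
```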
isc-dhcpd (`/etc/dhcp/dhcpd.conf`):
option domain-name-servers 10.53.53.53;
default-lease-time 600;
max-lease-time 7200;
subnet 10.222.222.0 netmask 255.255.255.0 {
    range 10.222.222.10 10.222.222.200;
    option routers 10.222.222.1;
}
subnet 10.223.223.0 netmask 255.255.255.0 {
    range 10.223.223.10 10.223.223.200;
    option routers 10.223.223.1;
}
boot time
This is the one metric that isn’t throwaway compared to VyOS. 19 seconds from power-on to routing is FAST, and that’s with 5 seconds of grub time. This means if you are interested in super fast boot times, it’s probably worth your time learning how to set up a router/firewall on unadulterated Linux.
iperf3 results
As hinted at, the results are very similar to the VyOS results. Not surprising, considering the kernel versions are the same, both run FRR, etc.
[ ID] Interval Transfer Bitrate
[ 5] 0.00-10.04 sec 7.14 GBytes 6.11 Gbits/sec receiver
[ 8] 0.00-10.04 sec 7.08 GBytes 6.06 Gbits/sec receiver
[SUM] 0.00-10.04 sec 14.2 GBytes 12.2 Gbits/sec receiver
speedtest
With these speedtest results, I must have found a rare moment when the rest of my network was mostly idle:
CPU usage during heavy routing
Again, similar and expected results:
All in all, the differences between FRR and VyOS were pretty minimal. Despite the longer boot times, I would probably just use VyOS, for the CLI and ease of configuration.
CHR
I’m a heavy Mikrotik user otherwise (multiple RB4011s, CRS317, 326, hEX, hAPs, etc), and CHRs are something I feel should really perform well.
This is where things started to get a bit funky. I’ll be honest here: the CHR results were the ones I was most interested in. I have NEVER been able to get decent routing results out of a CHR, and I was hoping the proverbial “blank slate” would give me the real picture.
The VM was a freshly installed (via SystemRescueCd) 6.45.3 CHR, unlocked with a purchased unlimited license.
config
The config here was painfully simple again. A few interfaces and DHCP servers.
/interface ethernet
set [ find default-name=ether1 ] disable-running-check=no name=Trunk
/interface vlan
add interface=Trunk name=LAN vlan-id=2222
add interface=Trunk name=LAN2 vlan-id=2223
add interface=Trunk name=WAN vlan-id=500
/interface wireless security-profiles
set [ find default=yes ] supplicant-identity=MikroTik
/ip pool
add name=dhcp_pool0 ranges=10.222.222.2-10.222.222.254
add name=dhcp_pool1 ranges=10.223.223.2-10.223.223.254
/ip dhcp-server
add address-pool=dhcp_pool0 disabled=no interface=LAN name=dhcp1
add address-pool=dhcp_pool1 disabled=no interface=LAN2 name=dhcp2
/routing ospf instance
set [ find default=yes ] redistribute-connected=as-type-2 router-id=10.253.253.2
/ip address
add address=10.253.253.2/30 interface=WAN network=10.253.253.0
add address=10.222.222.1/24 interface=LAN network=10.222.222.0
add address=10.223.223.1/24 interface=LAN2 network=10.223.223.0
/ip dhcp-server network
add address=10.222.222.0/24 gateway=10.222.222.1
add address=10.223.223.0/24 gateway=10.223.223.1
/ip dns
set servers=10.53.53.53
/routing ospf network
add area=backbone network=10.253.253.0/30
/system clock
set time-zone-name=America/Chicago
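A couple of quick sanity checks from the RouterOS CLI once OSPF is up (these aren’t part of the export above):

```
/routing ospf neighbor print
/ip route print where ospf
```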
boot time
Frankly, the boot time was about the only decent thing in my CHR tests: 12 seconds, which is in line with the other Linux results; there’s just no 5-second GRUB timeout to contend with here.
iperf3 results
These results really made me question my testing methodology. Abysmal is about all I can say about it:
[ ID] Interval Transfer Bitrate
[ 5] 0.00-10.04 sec 1.84 GBytes 1.58 Gbits/sec receiver
[ 8] 0.00-10.04 sec 1.67 GBytes 1.43 Gbits/sec receiver
[SUM] 0.00-10.04 sec 3.51 GBytes 3.01 Gbits/sec receiver
I really spent a lot of time fighting these results. They felt way off. Some of the troubleshooting:
- Messing with offload settings, both on the host and in the VM (see the sketch after this list).
- Using different NIC types. Maybe the CHR didn’t like virtio.
- Using a Mellanox vs an Intel NIC. The results were the same whether the traffic stayed on the same bridge on Proxmox, went to another physical host over the 10Gb trunk link, etc.
- Changing around all the involved VMs’ CPU cores, RAM, NIC types, and anything else I could think of.
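The host-side offload fiddling was along these lines (a sketch; `tap101i0` stands in for whatever tap device backs the CHR’s virtio NIC, and `ens2` is the physical NIC behind vmbr1):

```
ethtool -K ens2 tso off gso off gro off
ethtool -K tap101i0 tso off gso off gro off
```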
In the end, I couldn’t do anything to really change the results. They held pretty steady at 3Gbps.
speedtest
As mentioned above, probably a throwaway test. I just happened to run this test when my network was a little bit busier.
cpu usage
Interestingly enough, the CPU usage on the CHR is quite a bit higher, despite it also being Linux-based. I’m guessing this comes down to the many networking enhancements and optimizations that have landed in newer Linux kernels, plus FRR.
I was REALLY disappointed with the Mikrotik results here. I know RouterOS is more than capable of going faster than those speeds, but I’m not sure what kind of tweaking it takes to get it. Maybe you need lots of sources and destinations.
I know my RB4011 can do almost full 10Gb (unless you are on IPv6 grumble grumble), CCRs can do way more, so I don’t know why this thing is so slow. And it’s been like that every time I’ve tested it, whether on KVM or ESXi.
FreeBSD
It definitely wouldn’t be a complete test without also testing the two popular players in the FreeBSD world: OPNSense and pfSense.
For these tests, a number of important configuration options were applied:
- All hardware offload stuff disabled in the VMs.
- Three VLAN interfaces, WAN/LAN/LAN2 as mapped out above.
- FRR plugin installed.
- NAT completely disabled.
- `pfctl -d`. While it didn’t really impact these tests, I wanted to make sure it was a level playing field.
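From the FreeBSD shell, the offload and pf parts look roughly like this (a sketch assuming the virtio NIC shows up as `vtnet0`; the same offload toggles exist as checkboxes in both web GUIs):

```
ifconfig vtnet0 -rxcsum -txcsum -tso -lro   # hardware offloads off
pfctl -d                                    # packet filter off entirely
```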
pfSense
I’ll be honest. I think these *Sense platforms aren’t really great routers. Can they route intervlan? Sure, but even with FRR somewhat native now, it’s just a pain and very clunky.
With pfSense, I never actually did get FRR running. It starts, but it just doesn’t do anything. Not to mention the GUI for it absolutely sucks and is just a confusing mess. It doesn’t actually impact these tests, but I annoyingly had to create some static routes on my other routers to get everything working.
boot time
Not fast, 37 seconds with a few seconds at the bootloader. I’m also positive I’ve seen this take considerably longer when it’s not a basic config.
iperf3 results
Again, these results made me question my testing methodology a bit:
[ ID] Interval Transfer Bitrate
[ 5] 0.00-10.04 sec 1.04 GBytes 891 Mbits/sec receiver
[ 8] 0.00-10.04 sec 1.97 GBytes 1.68 Gbits/sec receiver
[SUM] 0.00-10.04 sec 3.01 GBytes 2.57 Gbits/sec receiver
It’s worth noting that I’ve personally seen about 6Gbps on decent hardware with `pf` enabled, so I know some of this is due to the hardware. But I also did try some troubleshooting:
- A variety of tweaks on both the host and in the VM with hardware offloading
- Using E1000 NICs. Yeah, that didn’t turn out too well.
- Breaking out the VLANs on the host instead of in the VM, and passing in three individual NICs.
Nothing really made too much of a difference.
speedtest
Welcome back to the land of probably-not-very-useful-benchmarks. As before, this easily represents gigabit; my network was just busy.
cpu usage during routing
These CPU results are more in line with the CHR results above. Meaning, resource-wise, pfSense takes a bit more to route at these high-ish (crappy) speeds.
OPNSense
For the most part, OPNSense and pfSense performed mostly identically, to the point where it’s almost not worth posting the results. But I will.
I will point out that I was able to get FRR going in about 20 seconds on OPNSense, whereas on pfSense, after hacking around with it for 20 minutes, I still couldn’t get it running.
I’ll also say that I like some things about the OPNSense UI and absolutely hate others. There are definitely some things buried in weird and unintuitive places.
boot time
At 34 seconds with 2 seconds of bootloader, this is a few seconds quicker than pfSense.
iperf3 results
Within the same range as pfSense, similarly disappointing:
[ ID] Interval Transfer Bitrate
[ 5] 0.00-10.03 sec 1.29 GBytes 1.10 Gbits/sec receiver
[ 8] 0.00-10.03 sec 1.80 GBytes 1.54 Gbits/sec receiver
[SUM] 0.00-10.03 sec 3.09 GBytes 2.65 Gbits/sec receiver
speedtest
Yet another basically-gigabit screenshot:
cpu during routing
As with pfSense, the CPU definitely spikes and stays a bit higher during heavy routing.
Final Results
After all these tests, it appears I’ll probably be sticking to VyOS or Debian for my routing needs. For pure routing tasks, which is a lot of what I end up labbing, Linux just seems to do it a bit better.
| Router | Boot Time* | Intervlan iperf3 | WAN Speedtest** |
|---|---|---|---|
| VyOS | 37 seconds*** | 12.3 Gbps | 878Mbps down / 918Mbps up |
| Debian/FRR | 19 seconds | 12.2 Gbps | 957Mbps down / 932Mbps up |
| CHR | 12 seconds | 3.01 Gbps | 845Mbps down / 868Mbps up |
| OPNSense | 34 seconds | 2.65 Gbps | 888Mbps down / 930Mbps up |
| pfSense | 37 seconds | 2.57 Gbps | 874Mbps down / 896Mbps up |
- * boot times all include bootloader waits. Linux-based was usually about 5 seconds (except CHR). FreeBSD-based is 2 seconds.
- ** All results are within a margin of error that says “gigabit-capable”. My network was somewhat busy at the time I ran the speedtests, not to mention this lab is buried behind two other different routers.
- *** I’ve seen VyOS take considerably longer once more complex configurations are applied. Upwards of a minute or more.
As mentioned, I’ve seen pfSense run at up to 6-7Gbps. So the slow speeds on pfSense and OPNSense here are almost certainly due to the host. Along the same lines, my VyOS routers that run on my E5-2640v4 can route at 30Gbps or faster, so the 12Gbps observed here is slow.
As far as the CHR is concerned, I’m not sure. I’ve done extensive testing with them, and even with the unlimited license, I’ve never seen very good performance out of them, at least not without 100 hosts behind them.
The ESXi Mutation
So before I close this up, I decided to spin up one more set of tests on ESXi. The painfully low pfSense and CHR numbers made me really believe that there was an incompatibility somewhere. Maybe CHR and pfSense just really don’t like the virtio drivers.
I also only tested VyOS, CHR, and pfSense, since the Debian and OPNSense numbers were largely duplicative.
| Router | Boot Time | Intervlan iperf3 |
|---|---|---|
| VyOS | 32 seconds | 17.3 Gbps |
| CHR | 17 seconds | 7.63 Gbps |
| pfSense | 33 seconds | 5.61 Gbps |
As far as boot times are concerned, the shorter times mostly come down to faster storage, though I don’t know why the CHR decided to be an outlier.
For VyOS, the slightly increased number tracks with my expectations due to CPU and memory bandwidth. My ESXi server is an E5-2640v4, which runs circles around a D-1521.
For the CHR and pfSense, I wouldn’t expect the numbers to be over double just due to the platform change, so I have to believe it’s mostly due to some driver situation.