Battle of the Virtual Routers

Battle of the Virtual Routers

This is going to be a bit of a long post, so I’ll be brief in the intro.  

Over the last few years, I’ve become enamored with routing.  As part of that, I’ve spent a lot of time exploring the various virtual routing platforms, trying to eek out the best performance possible.  So I decided to put them all head-to-head in a virtual setting and see how they fare.  

The victims:

  • pfSense 2.4.4-p3
  • OPNSense 19.7
  • VyOS 1.2.2
  • Mikrotik Cloud Hosted Router 6.45.3, unlimited license (CHR)
  • Just basic Debian Buster running FRR
  • All tests were done with iperf3 -c IP -P2.  I tried a number of different combinations of window sizes, number of streams, etc, and everything was pretty close to equal.  UDP obviously gave much bigger numbers, but I’m interested in TCP for right now.  

For the purpose of testing and to make all things equal, pfSense and OPNSense will be running with NAT disabled and pfctl -d, to remove the potential slow-downs from packet inspection.  The CHR, Debian, VyOS installs won’t have firewalls running.

If you are just interested in the final results, they are HERE.  Or maybe you’d like the results when I ran the test on ESXi HERE.


The Layout

They say a picture tells a thousand words, so I think a simple network diagram should illustrate the setup:

I’ll be replacing the firewall image in this picture with the various victims.

I’ve got a LOT of routing going on in my network.  At last count, I think I have at least 8 hops from one end of my network to the other, so OSPF simplifies adding new subnets behind various routers.  In most cases, I don’t even bother to add default routes to new hosts as OSPF floods the default route from my edge device.  


The Hardware

For these tests, I wanted to use KVM.  The network stack on ESXi has been driving me crazy lately, so I was hoping KVM would give me some decent results.  Proxmox makes this pretty trivial, so that’s what I set up.

  • Proxmox 6
  • rootonzfs on an SSD
  • 64GB of RAM
  • D-1521, with Dual X550/552 10G-BaseT ports

This is a more-than-capable machine for these tests, though I will say I’ve gotten better results on my Broadwell E5.

The networking config on the host is pretty simple. To further isolate things, I have two different ports running as Trunks.  Proxmox networking goes over one, and the networking for this test lab goes out of the other.

I did try using OVS, just to see if the results were any different, but they weren’t.  So on Proxmox, the network config is pretty simple:

auto lo
    iface lo inet loopback

    iface eno4 inet manual

    auto ens2
    iface eno2 inet manual

    auto eno3
    iface eno3 inet manual

    auto vmbr0.3
    iface vmbr0.3 inet static
       address 10.3.1.84/24
       gateway 10.3.1.1


    auto vmbr0
    iface vmbr0 inet manual
        bridge_ports eno3
        bridge_stp off
        bridge_fd 0
        bridge_vlan_aware yes

    auto vmbr1
    iface vmbr1 inet manual
        bridge_ports ens2
        bridge_stp off
        bridge_fd 0
        bridge_vlan_aware yes

All  VMs are connected to vmbr1.

The VMs should be fairly equal.  CPU-wise, I’ve gone with 2c2t.  RAM is 2GB, which is overkill for basically any of these


The Linux crowd

The first batch of routers are all based on Linux.  The VyOS version I used (1.2.2), is based on Debian Jessie with a modern 4.19 kernel and runs FRR.  The CHR is based on an ancient 3.x Linux kernel.  The plain Linux install will be the latest Debian Buster, with the repo version of FRR, 6.0.1.


VyOS

As a contributor, I have access to the current VyOS crux version, 1.2.2.  For routing, it uses FRR 7.0.1.

config

The config is painfully simple.  Three interfaces, WAN/LAN/LAN2, OSPF, DHCP on the LAN interfaces (for ease of use) and enabling SSH.

As initially mentioned, there’s been no firewall configured here.  I’m only interested in pure routing performance.

interfaces {
         ethernet eth0 {
             duplex auto
             smp-affinity auto
             speed auto
             vif 500 {
                 address 10.253.253.2/30
             }
             vif 2222 {
                 address 10.222.222.1/24
             }
             vif 2223 {
                 address 10.223.223.1/24
             }
         }
         loopback lo {
         }
     }
     protocols {
         ospf {
             area 0.0.0.0 {
                 network 10.253.253.0/30
             }
             redistribute {
                 connected {
                     metric-type 2
                 }
             }
         }
     }
     service {
         dhcp-server {
             shared-network-name VLAN2222 {
                 subnet 10.222.222.0/24 {
                     default-router 10.222.222.1
                     dns-server 10.53.53.53
                     range 0 {
                         start 10.222.222.10
                         stop 10.222.222.200
                     }
              }
              shared-network-name VLAN2223 {
                 subnet 10.223.223.0/24 {
                     default-router 10.223.223.1
                     dns-server 10.53.53.53
                     range 0 {
                         start 10.223.223.10
                         stop 10.223.223.200
                     }
                 }
             }
         }
         ssh {
             port 22
         }
     }

boot time

VyOS generally boots pretty quickly, at least with a simple config.  37 seconds from starting the VM to routing packets.  That’s also with the bootloader timeout.  

iperf3 results

Here are the results of a simple iperf3 test.  From 10.223.223.11 to 10.222.222.12  :

     [ ID] Interval           Transfer     Bitrate
     [  5]   0.00-10.04  sec  7.17 GBytes  6.14 Gbits/sec                  receiver
     [  8]   0.00-10.04  sec  7.22 GBytes  6.18 Gbits/sec                  receiver
     [SUM]   0.00-10.04  sec  14.4 GBytes  12.3 Gbits/sec                  receiver

I would be lying if I said I wasn’t at least a little bit disappointed with those results.  Especially since the direct VM->VM on the same layer2 test shows quite a bit more performance:

    [ ID] Interval           Transfer     Bitrate
    [  5]   0.00-10.04  sec  11.8 GBytes  10.1 Gbits/sec                  receiver
    [  8]   0.00-10.04  sec  11.7 GBytes  10.0 Gbits/sec                  receiver
    [SUM]   0.00-10.04  sec  23.5 GBytes  20.1 Gbits/sec                  receiver

I guess it’s just the hardware.  

I have a few VyOS VMs on ESXi as my main intervlan routing devices on bigger hardware, and those are capable of 15-30Gbps routing between VMs on the same host depending on how busy the host is.

speedtest

I have gigabit Internet, so I wanted to make sure results are at least close to what I would get natively.  

Given that this lab is behind a few routers, and my network is fairly busy at any point in time, these results are close enough to gigabit to count.  

CPU usage

I was curious what kind of CPU usage there would be under max routing load.  Obviously adding a firewall would add to this a bit:


Debian/FRR

The was just a plain Debian Buster install with FRR installed.  Honestly, most of the results are throwaway, since they are largely identical to VyOS.

config

There isn’t much in the way of special config here.  Enabling net.ipv4.ip_forward = 1 in /etc/sysctl.conf is a must.

Networking ( /etc/network/interfaces ):

# The loopback network interface
auto lo
iface lo inet loopback

# The primary network interface
auto ens18
iface ens18


auto vlan500
iface vlan500
  bridge-ports ens18.500
  bridge-stp on
  address 10.253.253.2/30
  gateway 10.253.253.1
  dns-nameservers 10.53.53.53


auto vlan2222
iface vlan2222
  bridge-ports ens18.2222
  bridge-stp on
    address 10.222.222.1/24

auto vlan2223
iface vlan2223
  bridge-ports ens18.2223
  bridge-stp on
    address 10.223.223.1/24

FRR ( /etc/frr/frr.conf ):

frr version 6.0.2
frr defaults traditional
hostname ovs
log syslog informational
service integrated-vtysh-config
!
router ospf
 redistribute connected
 network 10.253.253.0/30 area 0
!
line vty

isc-dhcpd ( /etc/dhcpd.conf ):

option domain-name-servers 10.53.53.53;

default-lease-time 600;
max-lease-time 7200;

subnet 10.222.222.0 netmask 255.255.255.0 {
    range 10.222.222.10 10.222.222.200;
    option routers 10.222.222.1;
}

subnet 10.223.223.0 netmask 255.255.255.0 {
    range 10.223.223.10 10.223.223.200;
    option routers 10.223.223.1;
}

boot time

This is the one metric that isn’t throwaway compared to VyOS.  19 seconds from power-on to routing is FAST, and that’s with 5 seconds of grub time.  This means if you are interested in super fast boot times, it’s probably worth your time learning how to set up a router/firewall on unadulterated Linux.

iperf3 results

As hinted at, the results are very similar to the VyOS results.  Not surprising considering same kernel versions, both are FRR, etc.

 [ ID] Interval           Transfer     Bitrate
 [  5]   0.00-10.04  sec  7.14 GBytes  6.11 Gbits/sec                  receiver
 [  8]   0.00-10.04  sec  7.08 GBytes  6.06 Gbits/sec                  receiver
 [SUM]   0.00-10.04  sec  14.2 GBytes  12.2 Gbits/sec                  receiver

speedtest

With these speedtest results, I must have found a rare moment when the rest of my network was mostly idle:

CPU usage during heavy routing

Again, similar and expected results:

All in all, the differences between FRR and VyOS were pretty minimal.  Despite the longer boot times, I would probably just  use VyOS, for the CLI and ease of configuration.


CHR

I’m a heavy Mikrotik user otherwise (multiple RB4011s, CRS317, 326, hex, haps, etc), and CHRs are something I feel like should really perform well.

This is where things started to get a bit funky.  I’ll be honest here, the CHR results were the ones I was most interested in.  I have NEVER been able to get decent routing results out of a CHR, I was hoping the proverbial “blank slate” would allow me to get the real picture.

The VM was a freshly installed (via systemrescuecd), 6.45.3 CHR, unlocked with a purchased unlimited license.  

config

The config here was painfully simple again.  A few interfaces and DHCP servers.

/interface ethernet 
set [ find default-name=ether1 ] disable-running-check=no name=Trunk
/interface vlan
add interface=Trunk name=LAN vlan-id=2222
add interface=Trunk name=LAN2 vlan-id=2223
add interface=Trunk name=WAN vlan-id=500
/interface wireless security-profiles
set [ find default=yes ] supplicant-identity=MikroTik
/ip pool
add name=dhcp_pool0 ranges=10.222.222.2-10.222.222.254
add name=dhcp_pool1 ranges=10.223.223.2-10.223.223.254
/ip dhcp-server
add address-pool=dhcp_pool0 disabled=no interface=LAN name=dhcp1
add address-pool=dhcp_pool1 disabled=no interface=LAN2 name=dhcp2
/routing ospf instance
set [ find default=yes ] redistribute-connected=as-type-2 router-id=10.253.253.2
/ip address
add address=10.253.253.2/30 interface=WAN network=10.253.253.0
add address=10.222.222.1/24 interface=LAN network=10.222.222.0
add address=10.223.223.1/24 interface=LAN2 network=10.223.223.0
/ip dhcp-server network
add address=10.222.222.0/24 gateway=10.222.222.1
add address=10.223.223.0/24 gateway=10.223.223.1
/ip dns
set servers=10.53.53.53
/routing ospf network
add area=backbone network=10.253.253.0/30
/system clock
set time-zone-name=America/Chicago

boot time

Frankly, the boot time was really about the only decent thing about my CHR tests.  12 seconds, which is in-line with the other Linux results, there’s just no 5 second timeout in grub to contend with here.

iperf3 results

These results really made me question my testing methodology.  Abysmal is about all I can say about it:

[ ID] Interval           Transfer     Bitrate
[  5]   0.00-10.04  sec  1.84 GBytes  1.58 Gbits/sec                  receiver
[  8]   0.00-10.04  sec  1.67 GBytes  1.43 Gbits/sec                  receiver
[SUM]   0.00-10.04  sec  3.51 GBytes  3.01 Gbits/sec                  receiver

I really spent a lot of time fighting these results.  They felt way off.  Some of the troubleshooting:

  • Messing with offload settings, both in the host and in the VM.  
  • Using different NIC types.  Maybe the CHR didn’t like virtio.
  • Using a Mellanox vs an Intel NIC.  The results were the same whether between the same bridge on Proxmox, to another physical host over the 10Gb Trunk link, etc.
  • Changing around all the VM’s involved CPU cores, RAM, NIC types, anything else I could think of.

In the end, I couldn’t do anything to really change the results.  They held pretty steady at 3Gbps.

speedtest

As mentioned above, probably a throwaway test.  I just happened to run this test when my network was a little bit busier.

cpu usage

Interestingly enough, the CPU usage on the CHR is quite a bit higher, despite also Linux.  I’m guessing this is attributed to the fact that quite a few networking enhancements and optimizations have occurred in later Linux kernel versions + FRR.

I was REALLY disappointed with the Mikrotik results here.  I know RouterOS is more than capable of going faster than those speeds, but I’m not sure what kind of tweaking it takes to get it.  Maybe you need lots of sources and destinations.  

I know my RB4011 can do almost full 10Gb (unless you are on IPv6 grumble grumble), CCRs can do way more, so I don’t know why this thing is so slow.  And it’s been like that every time I’ve tested it, whether on KVM or ESXi.  


FreeBSD

It definitely wouldn’t be a complete test without also testing two popular players in the FreeBSD world.  OPNSense and pfSense.

For these tests, a number of important configuration options were applied:

  • All hardware offload stuff disabled in the VMs.
  • Three VLAN interfaces, WAN/LAN/LAN2 as mapped out above.
  • FRR plugin installed.
  • NAT completely disabled.
  • pfctl -d .  While it didn’t really impact these tests, I wanted to make sure it was a level playing field.

pfSense

I’ll be honest.  I think these *Sense platforms aren’t really great routers.  Can they route intervlan?  Sure, but even with FRR somewhat native now, it’s just a pain and very clunky.

With pfSense, I never actually did get FRR running.  It starts, but it just doesn’t do anything.  Not to mention the GUI for it absolutely sucks and is just a confusing mess.  It doesn’t actually impact these tests, but I annoyingly had to create some static routes on my other routers to get everything working.  

boot time

Not fast, 37 seconds with a few seconds at the bootloader. I’m also positive I’ve seen this take considerably longer when it’s not a basic config.

iperf3 results

Again these results made me question my testing methodogy a bit:

[ ID] Interval           Transfer     Bitrate
[  5]   0.00-10.04  sec  1.04 GBytes   891 Mbits/sec                  receiver
[  8]   0.00-10.04  sec  1.97 GBytes  1.68 Gbits/sec                  receiver
[SUM]   0.00-10.04  sec  3.01 GBytes  2.57 Gbits/sec                  receiver  

It’s worth noting that I’ve personally seen about 6Gbps on decent hardware, with pf enabled, so I know some of this is due to to the hardware.  But I also did try some troubleshooting:

  • A variety of tweaks on both the host and in the VM with hardware offloading
  • Using E1000 NICs.  Yeah, that didn’t turn out too well
  • Breaking out the VLANs on the host instead of in the VM, and passing in three individual NICs.

Nothing really made too much of a difference.

speedtest

Welcome back to the land of probably-not-very-useful-benchmarks.  As before, this easily represents gigabit; my network was just busy.

cpu usage during routing

These CPU results are more within line of the CHR results above.  Meaning resource-wise, pfSense takes a bit more to route at high-ish (crappy) speeds.  


OPNSense

For the most part, OPNSense and pfSense performed mostly identically, to the point where it’s almost not worth the posting the results.  But I will.

I will point out that I was able to get FRR going in about 20 seconds on OPNSense.  Whereas on pfSense after hacking around with it for 20 minutes I still couldn’t get it running.

I’ll also say that I like some things about the OPNSense UI and absolutely hate others.  There are definitely some things buried in weird and unintuitive places.  

boot time

At 34 seconds with 2 seconds of bootloader, this is a few seconds quicker than pfSense.

iperf3 results

Within the same range as pfSense, similarly disappointing:

[ ID] Interval           Transfer     Bitrate
[  5]   0.00-10.03  sec  1.29 GBytes  1.10 Gbits/sec                  receiver
[  8]   0.00-10.03  sec  1.80 GBytes  1.54 Gbits/sec                  receiver
[SUM]   0.00-10.03  sec  3.09 GBytes  2.65 Gbits/sec                  receiver

speedtest

Yet another basically-gigabit screenshot:

cpu during routing

As with pfSense, the CPU definitely spikes and stays a bit higher during heavy routing.  


Final Results

After all these tests, it appears I’ll probably be sticking to VyOS or Debian for my routing needs.  Mostly just for pure routing tasks, which is a lot of what I end up labbing up,  Linux seems to do it a bit better.

RouterBoot Time*Intervlan iperf3WAN Speedtest**
VyOS37 seconds***12.3 Gbps878Mbps down/918Mbps up
Debian/FRR19 seconds12.2 Gbps957Mbps down/932Mbps up
CHR12 seconds3.01 Gbps845Mbps down/868Mbps up
OPNSense34 seconds2.65 Gbps888Mbps down/930Mbps up
pfSense37 seconds2.57 Gbps874Mbps down/896Mbps up
  • * boot times all include bootloader waits.  Linux-based was usually about 5 seconds (except CHR).  FreeBSD-based is 2 seconds.
  • ** All results are within a margin of error that says “gigabit-capable“.  My network was somewhat busy at the time I ran the speedtests, not to mention this lab is buried behind two other different routers.
  • *** I’ve seen VyOS take considerably longer once more complex configurations are applied. Upwards of a minute or more.  

As mentioned, I’ve seen pfSense run up to 6-7Gbps.  So the slow speeds on pfSense and OPNSense here are almost certainly due to the host.  Along the same line, my VyOS routers that run on my E5-2540v4 can route at 30Gbps or faster, so the 12Gbps observed here is slow.  

As far as the CHR is concerned, I’m not sure.  I’ve done extensive testing with them, and even with the unlimited license, I’ve never see very good performance out of them, at least not without 100 hosts behind them.  


The ESXi Mutation

So before I close this up, I decided to spin up one more set of tests on ESXi.  The painfully low pfSense and CHR numbers made me really believe that there was an incompatibility somewhere.  Maybe  CHR and pfSense just really don’t like the virtio drivers.

I also only tested VyOS, CHR, and pfSense, since the Debian and OPNSense numbers were largely duplicative.

RouterBoot TimeIntervlan iperf3
VyOS32 seconds17.3 Gbps
CHR17 seconds7.63 Gbps
pfSense33 seconds5.61 Gbps

As far as boot times are concerned, the shorter times are caused by being on faster storage, though I don’t know why the CHR decided to be an outlier.  

For VyOS, the slightly increased number tracks with my expectations due to CPU and memory bandwidth.  My ESXi server is an E5-2640v4, which runs circles around a D-1521.  

For the CHR and pfSense, I wouldn’t expect the numbers to be over double just due to the platform change, so I have to believe it’s mostly due to some driver situation.

Please follow and like us: