Migrating the lab BACK to ESXi after a Proxmox Experiment


A few months ago I migrated my home lab to Proxmox. After fighting with a few things, it’s time to move back to ESXi.


It must have been about two months ago that I decided I wanted to move away from ESXi. My VMUG renewal is coming up, I wasn’t very happy with the bloated beast that vCenter has become, and I found I very much enjoyed Proxmox from my usage of it on the NucNucNuc.

Proxmox

Here is a screenshot of my Proxmox cluster at its height.

With the way that Proxmox clustering works, I was able to re-use an old AspireRevo R3700 (the acropolis) with a D525 Atom as a cluster witness.

For me, one of the best parts of Proxmox was Linux containers. I had a few running, but in particular, my Plex container just felt so clean. Instead of running a huge bloated OS, I just ran a simple little container.

The dark underbelly of Proxmox

Unfortunately, things weren’t perfect. The immediate issue I noticed was that my cluster was drawing noticeably more electricity than the same servers, same basic VMs, and same setup under ESXi. As I learned, Proxmox doesn’t do any power management out of the box, whereas ESXi does. And while you can install cpufrequtils and set up CPU frequency governors yourself, doing so can apparently have a very negative impact on your VMs.
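If you want to see what that looks like in practice, it boils down to something like this on a node (a sketch only; which governors are available depends on the CPU driver, and again, it can hurt your VMs):

    # On a Proxmox node (plain Debian underneath); a sketch, not a recommendation
    apt install cpufrequtils

    # Check which governor each core is currently using
    cat /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor

    # Ask for the "ondemand" governor everywhere and make it persist across reboots
    echo 'GOVERNOR="ondemand"' > /etc/default/cpufrequtils
    service cpufrequtils restart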

My Proxmox cluster ran a solid 342 watts. Right now, the exact same hardware running under ESXi is at 303 watts. That works out to roughly an 11% savings ((342 - 303) / 342 ≈ 11.4%), which is not insignificant, especially when a focus of my lab has been keeping power consumption down.

Console my console

A HUGE negative for me was the console. I don’t use many Windows VMs, but the ones I do use need to work. Coming from ESXi, the noVNC/SPICE consoles in Proxmox are, at best, a hack. And finally, I didn’t want to mess with RDP.

Using VMware Fusion, a console connection to a Windows VM almost feels native. With SPICE, I ended up needing to reboot Windows often because it would act like the CONTROL key was stuck down. And a noVNC connection to a Windows VM was just impossible to use, as the mouse cursor was unpredictable and often unusable.

This poor console experience also translated elsewhere. Proxmox doesn’t do well if you venture away from the official support list. I wanted to play with Solaris and OpenIndiana, and at least under Proxmox, it was not really workable. As I sometimes use “off the beaten path” software, this was difficult for me as well.

Bad bad Network

Proxmox had another serious negative for me: I HATED the networking. I guess that’s the downside of coming from a polished system like ESXi, but in Proxmox it just felt uncomfortable. I even set up Open vSwitch networking instead of plain Linux bridges, and while it was better, I only liked it slightly more. Not to mention it appeared to have some issues once I started to subject it to serious load.
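To give a flavor of it, here is roughly what the hand-edited config looks like once you go the Open vSwitch route; a sketch with placeholder interface names, addresses, and VLAN tags, not my exact setup:

    # /etc/network/interfaces (excerpt), Open vSwitch style
    auto eno1
    iface eno1 inet manual
        ovs_type OVSPort
        ovs_bridge vmbr0

    auto vmbr0
    iface vmbr0 inet manual
        ovs_type OVSBridge
        ovs_ports eno1 mgmt0

    # The management IP rides on an OVS "internal port" tagged onto a VLAN
    auto mgmt0
    iface mgmt0 inet static
        ovs_type OVSIntPort
        ovs_bridge vmbr0
        ovs_options tag=10
        address 192.168.10.5/24
        gateway 192.168.10.1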

Any time I had to mess with the networking, touch any VLANs, etc., I just groaned. I’ve got at least 25 years of Linux experience, so it wasn’t difficult. It was just annoying.

There was also a bit of lag in almost all of my VMs that I was never able to attribute to anything. All the networking was set up properly, access to the “datastores” was fast and reliable, and all the VMs had sufficient resource allocation. Linux VMs ran slow and sluggish, and even with SPICE marginally working, Windows VMs were like clicking around in molasses.

The final straw

The quorum system is a bit of black magic to me. In more advanced setups, best practice says you give the cluster (corosync) traffic completely separate NICs, switches, etc., to make sure your cluster doesn’t lose quorum. Because bad things happen when it does.
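For what it’s worth, on recent Proxmox (VE 6 and later, with corosync 3) you can at least hand the cluster traffic a second, dedicated link when you build or join the cluster. A sketch with made-up addresses:

    # Sketch only; the addresses are made up. corosync 3 supports multiple links,
    # so cluster traffic can fail over to a second NIC if the first one drops.
    pvecm create homelab --link0 192.168.1.11 --link1 10.10.10.11    # first node
    pvecm add 192.168.1.11 --link0 192.168.1.12 --link1 10.10.10.12  # each joining node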

Trust me, I found out.

The reality is, especially when you are using some NUCs and an Atom D525 net-top, it’s hard to set up “proper” redundancy. I was forced to use a bunch of USB NICs to try and achieve acceptable levels of redundancy, which was tenuous at best.

I’ve got a box of these guys now:

usb nics

So what happens when 3/6 of your quorum members are on the other end of a 4x1Gbps LAG and the uplink gets saturated even for a second during a VM migration? You lose quorum and bad things happen.
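If you’ve never had it happen: when quorum goes, /etc/pve flips to read-only and you can’t even start a VM until the cluster agrees with itself again. The triage ends up looking something like this (a sketch; the last command is a single-node escape hatch, not a fix):

    # Is this node quorate, and how many votes does it think exist?
    pvecm status

    # What did corosync actually see? (link flaps, token timeouts, etc.)
    journalctl -u corosync -b

    # Last resort on an isolated node: tell it to expect only its own vote.
    # Only sane if the other nodes really are down, not just unreachable.
    pvecm expected 1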

Let me just say: this does NOT mean it is a broken system. It just means it wasn’t working in MY setup. More than anything, it just felt like a bunch of extra work to keep things running at acceptable levels.

With enterprise hardware and proper setup, I’m still planning a production Proxmox migration later this year.

The Migration

Migration between hypervisors is actually surprisingly easy, assuming you have multiple hosts and can run the new hypervisor on one of them.

  • Boot original VM with a Clonezilla ISO.
  • Create a VM on the new hypervisor. Make the new drive 1 GB larger than the original.
  • Boot the new VM with the Clonezilla ISO.
  • Go through the process of cloning. This involves working through a few menus on the original host and typing two commands at the command line on the new host.
  • Shutdown old VM.
  • Install any drivers, change any network interface names, and install the relevant guest agent (qemu-guest-agent or open-vm-tools); a rough sketch of this step for a Linux guest follows the list.
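That last item is the only fiddly part. For a Debian-ish Linux guest it’s roughly this (a sketch; package names and interface names will vary by distro and virtual hardware):

    # Inside the freshly cloned guest, now running on ESXi
    apt purge qemu-guest-agent      # only useful back on Proxmox/KVM
    apt install open-vm-tools       # the VMware guest agent

    # The NIC name usually changes with the virtual hardware (for example,
    # ens18 on Proxmox virtio vs. ens192 on a vmxnet3 adapter), so fix the
    # network config to match.
    ip link                         # find the new interface name
    editor /etc/network/interfaces  # or netplan/NetworkManager, as appropriate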

With a decent Ethernet connection between the servers, the whole cloning process of a VM really only takes a few minutes.

Even if you don’t have multiple servers, as long as you have some sort of networked or external storage, you can Clonezilla all your VMs to flat file images, install the new hypervisor, and then Clonezilla the VMs back from the disk images.
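Clonezilla is menu-driven, but it’s all just ocs-sr underneath, and the wizard prints the exact command it is about to run. Treat these lines as illustrative rather than something to copy blindly:

    # Save the whole first disk to an image in the mounted image repository
    ocs-sr -q2 -j2 -z1p -p true savedisk vm-image sda

    # Later, restore that image onto the new VM's (slightly larger) blank disk
    ocs-sr -g auto -r -j2 -p true restoredisk vm-image sda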

Hello ESXi my old friend

I could never really get ESXi out of my system. Even under Proxmox, I had set up a small virtualized ESXi cluster to play with things. It’s just such a polished system, and it’s what my business uses.

After the split-brain quorum situation, I quickly made the decision to migrate back to ESXi fully. With the Clonezilla solution outlined above, it took all of about two hours.

The aftermath

Unfortunately for the poor D525, the “acropolis”, that means it’s the end of the road. This processor does not even support the VT-x required for virtualization. Even under Proxmox, at the very best it could run a very light container or two.
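(If you ever want to check a box of unknown vintage for this, it’s a one-liner.)

    # Count the CPU virtualization flags; on the Atom D525 this prints 0,
    # meaning no VT-x (or AMD-V) at all.
    grep -Ec '(vmx|svm)' /proc/cpuinfo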

I was excited to give this ancient hardware new life as a quorum witness, but with ESXi, there is no place for it.

The Acropolis stands alone, waiting for the quorum which will never return.

rip acropolis

I haven’t had the heart to wipe it yet, so back on a shelf it went.
