[ HOMELAB ]·#0001·7 min read··status: resolved

How One NIC Took Down My Entire Proxmox Cluster

I run a two-node Proxmox cluster at home called ulster, with pve on 10.10.10.30 and pve2 on 10.10.10.20. It hosts everything that matters on my network: DNS, the reverse proxy, Nextcloud, a stack of containers, and the VMs I actually get work done in.

Last month I did something completely routine. I shut one node down, slotted in an extra network card, and powered it back on.

The node came back with no network, no cluster quorum, and no storage. Three separate subsystems fell over from one hardware change, and each failure was hiding the next one behind it. This is the write-up I wish I’d found while I was staring at a dead console at 11pm.

The thirty-second version

Adding a PCIe NIC reshuffled the PCI bus, which renamed my onboard interface. Because Proxmox’s network config refers to interfaces by name, the node booted with its bridge attached to a port that no longer existed, so the cluster IP was unreachable. With no usable IP, corosync couldn’t join the ring and the cluster lost quorum. On top of that, a stale ZFS cache file meant the data pool didn’t auto-import, so nothing that lived on it would start.

One change, three failures, stacked. Let’s take them in order.

The setup

Nothing exotic. Each node is a single box with ZFS on root (rpool) and a separate data pool (tank) for VM and container disks. The two nodes talk to each other over 10.10.10.0/24, and corosync uses those addresses for its cluster ring. Standard homelab Proxmox.

The only change was physical: one extra NIC, seated in the first free PCIe slot on pve.

What actually broke

1. Networking — the root cause

Modern Linux uses predictable interface names, things like enp2s0 (PCI bus 2, slot 0), derived from where the card sits on the PCI bus. That’s normally a good thing. The catch that’s easy to miss: “predictable” means derived from hardware position, and adding a card can change hardware positions.

When I seated the new NIC, the kernel re-enumerated the bus. My onboard adapter, which had always been enp2s0, came back as enp5s0. My /etc/network/interfaces still said enp2s0. That interface no longer existed, so the vmbr0 bridge came up with no physical port attached and the node was unreachable on the network.

Everything downstream flowed from this.
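
A mismatch like this is easy to detect once you know to look for it. Here’s a minimal sketch that lists any bridge port named in the config that the kernel doesn’t currently expose. The function name missing_ports is mine, and it assumes ports are declared on bridge-ports lines the way Proxmox writes them; it doesn’t understand VLAN sub-interfaces or bonds.

```shell
# Print every bridge port named in an interfaces file that the kernel
# does not currently expose under /sys/class/net.
# Both arguments are optional; they exist so the check can be dry-run
# against copies of the real files.
missing_ports() {
    conf="${1:-/etc/network/interfaces}"
    sysnet="${2:-/sys/class/net}"
    awk '($1 == "bridge-ports" || $1 == "bridge_ports") {
             for (i = 2; i <= NF; i++) if ($i != "none") print $i
         }' "$conf" |
    sort -u |
    while read -r port; do
        [ -e "$sysnet/$port" ] || echo "$port"
    done
}
```

Run with no arguments on the node itself, it prints nothing when every port exists. On my broken boot it would have printed enp2s0.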

2. corosync — the collateral damage

corosync binds its ring to the node’s cluster IP. No IP, no ring. From pve2’s point of view its partner had simply vanished, and a two-node cluster that loses a node loses quorum. pvecm status on the surviving node showed one vote where there should be two, and the cluster configuration under /etc/pve went read-only to protect itself. Anything HA-managed got fenced.

Worth being clear about the causation here: corosync wasn’t broken. It was doing exactly its job. It just had no network to do it on, because of failure #1.
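
The arithmetic behind that is worth spelling out. Quorum is a strict majority of the expected votes, so with one vote per node a two-node cluster needs both. A small sketch of the calculation, with function names of my own choosing:

```shell
# Votes needed for quorum: a strict majority of the expected votes.
quorum_needed() {
    echo $(( $1 / 2 + 1 ))
}

# Succeeds when the votes still present reach quorum.
# Usage: has_quorum <expected_votes> <votes_present>
has_quorum() {
    [ "$2" -ge "$(quorum_needed "$1")" ]
}
```

With two expected votes, quorum_needed 2 gives 2, so has_quorum 2 1 fails: that was my cluster with one node dark. With a third vote in play, quorum_needed 3 still gives 2, so has_quorum 3 2 succeeds and losing one node no longer takes quorum with it.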

3. ZFS — the one that stopped the boot

This is where the evening got longer. On the reboot, the data pool tank didn’t auto-import. Proxmox relies on a cache file at /etc/zfs/zpool.cache to know which pools to bring up early, and after the recent 8.4 → 9.1 upgrade plus the hardware change, that cache was stale. On one boot the root pool import stalled too, and I got dropped to an initramfs shell before the system was even up.

So now I had a node that, on a bad boot, wouldn’t fully come up at all — which neatly hid the networking problem underneath it.

Getting back in

The golden rule when three things are broken: get to a console that doesn’t depend on any of them. Physical monitor and keyboard, or IPMI/iKVM if the board has it. Do not try to fix a networking problem over the network you just broke.

If the boot drops to initramfs (root pool didn’t import), import it and continue:

# from the initramfs shell
zpool import -N rpool      # import without mounting, -f if it insists
exit                        # let the boot continue

If the box boots but is unreachable, log in at the local console as root and work from there. If you need to intervene before services start — for example the boot is hanging waiting on the network — interrupt GRUB, edit the boot entry, and append to the linux line:

init=/bin/bash

Press Ctrl-X to boot into a raw root shell, then make the root filesystem writable before you change anything:

mount -o remount,rw /

The fix, step by step

Step 1 — find out what your interfaces are actually called now

ip -br link                        # short list of every interface + state
udevadm info /sys/class/net/*      # the detail, including the MAC per device
dmesg | grep -i -E 'eth|enp|link'  # what the kernel named things at boot

Note the MAC address of the interface you care about. The name changed; the MAC didn’t. The MAC is the stable identity you’re going to anchor everything to.
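
If you’d rather have just the name-to-MAC pairs, sysfs has them directly. This is a rough sketch (physical_macs is a name I made up) that skips bridges, loopback, and tap devices by only listing interfaces with a device link behind them, which is how hardware-backed NICs show up in sysfs.

```shell
# List "name mac" for every hardware-backed network interface.
# The optional argument points at an alternative sysfs net directory.
physical_macs() {
    sysnet="${1:-/sys/class/net}"
    for dev in "$sysnet"/*; do
        # Hardware-backed NICs have a "device" link; bridges, lo and taps do not.
        [ -e "$dev/device" ] && [ -r "$dev/address" ] || continue
        printf '%s %s\n' "${dev##*/}" "$(cat "$dev/address")"
    done
}
```

The filter matters because a bridge normally takes on the MAC of its port, so an unfiltered list shows the same address twice.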

Step 2 — get the network back up (the quick fix)

Point /etc/network/interfaces at the new name (enp5s0 in my case), then reload. Proxmox uses ifupdown2, so:

ifreload -a
ip a                # confirm the bridge has your cluster IP again
ping -c3 10.10.10.1 # confirm you can reach the gateway

That gets you online. But it’s a trap to stop here — the name will drift again the next time you touch the hardware. On to the real fix.
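
For the edit itself, I’d rather generate the new file and look at it than edit the live one by hand at 11pm. A sketch of that, assuming GNU sed (which is what a Debian-based Proxmox node ships); retarget_iface is a hypothetical helper, not a Proxmox tool:

```shell
# Print a copy of an interfaces file with one interface name swapped
# for another. Nothing is modified in place.
# Usage: retarget_iface <old_name> <new_name> <file>
retarget_iface() {
    old="$1"; new="$2"; file="$3"
    # \b is a GNU sed word boundary, so enp2s0 does not match inside enp2s0f0.
    sed "s/\b${old}\b/${new}/g" "$file"
}
```

Write the output to a scratch file with retarget_iface enp2s0 enp5s0 /etc/network/interfaces > /root/interfaces.new, diff it against the original, and only then copy it into place and run ifreload -a.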

Step 3 — pin the interface name to its MAC (the permanent fix)

This is the whole point of the post. Instead of hoping the kernel names things consistently, you tell it. A systemd .link file matches on the MAC and forces a name that never changes:

# /etc/systemd/network/10-nic-cluster.link
[Match]
MACAddress=aa:bb:cc:dd:ee:ff
# Type=ether keeps the rule off bridges and VLANs that inherit this MAC
Type=ether

[Link]
Name=cluster0

Now reference cluster0 in /etc/network/interfaces instead of any enpXsY name. Rebuild the initramfs so the rule applies early in boot:

update-initramfs -u -k all

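From here on, that physical port is cluster0 whatever slot anything else is plugged into, for as long as the card keeps that MAC. You can add one .link file per NIC and stop thinking about PCI enumeration entirely. One caveat, as far as I can tell from the Proxmox documentation: the web UI only recognises physical NICs whose names start with en or eth, so a name like encluster0 may be the safer choice if you manage networking from the GUI.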
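
With more than one NIC, writing these files by hand gets repetitive. Below is a small generator sketch. make_link is my own name for it, and the Type=ether line is there so the rule doesn’t also match a bridge that has inherited the port’s MAC. It refuses malformed MACs and names longer than the 15 characters Linux allows for an interface.

```shell
# Emit the body of a systemd .link file that pins a NIC name to a MAC.
# Usage: make_link <mac> <name>
make_link() {
    mac="$1"; name="$2"
    # Reject anything that is not six colon-separated hex pairs.
    printf '%s\n' "$mac" | grep -Eqi '^([0-9a-f]{2}:){5}[0-9a-f]{2}$' || {
        echo "not a MAC address: $mac" >&2
        return 1
    }
    # Linux interface names must be 1 to 15 characters long.
    [ -n "$name" ] && [ "${#name}" -le 15 ] || {
        echo "bad interface name: $name" >&2
        return 1
    }
    printf '[Match]\nMACAddress=%s\nType=ether\n\n[Link]\nName=%s\n' "$mac" "$name"
}
```

Usage would be make_link aa:bb:cc:dd:ee:ff cluster0 > /etc/systemd/network/10-nic-cluster.link, followed by the update-initramfs run. To see which .link file udev would actually apply to a device before rebooting, udevadm test-builtin net_setup_link /sys/class/net/enp5s0 is a useful dry run.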

Step 4 — make ZFS import reliably

Rewrite the cache file so the pools are known at boot, and make sure the import services are enabled:

zpool set cachefile=/etc/zfs/zpool.cache rpool
zpool set cachefile=/etc/zfs/zpool.cache tank
systemctl enable zfs-import-cache.service zfs-import.target
update-initramfs -u -k all     # so a root-pool import is baked into initramfs
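
After the next reboot, it’s worth checking that the pools you expect actually came up on their own. A minimal sketch, with missing_pools as a made-up helper that reads imported pool names on stdin and prints any expected pool that isn’t among them:

```shell
# Print each expected pool that is absent from the list on stdin.
# Usage: zpool list -H -o name | missing_pools rpool tank
missing_pools() {
    imported="$(cat)"
    for want in "$@"; do
        # -x matches whole lines only, -F treats the name literally.
        printf '%s\n' "$imported" | grep -qxF -- "$want" || echo "$want"
    done
}
```

Silence means everything imported. On my bad boot this would have printed tank.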

Step 5 — pin a known-good kernel

Because this all overlapped with the 8.4 → 9.1 upgrade, I wanted to remove “new kernel” as a variable while I stabilised. Proxmox makes this clean — list what’s installed, then pin the one you trust:

proxmox-boot-tool kernel list
proxmox-boot-tool kernel pin <your-known-good-version>
proxmox-boot-tool refresh

Unpin later with proxmox-boot-tool kernel unpin once you’re happy the newer kernel behaves.

Verifying it’s genuinely fixed

Don’t trust it until all three subsystems report healthy:

ip -br a                 # cluster0 is up with the right IP
pvecm status             # two nodes, quorum present
corosync-cfgtool -s      # ring is connected, no faults
zpool status             # both pools ONLINE
systemctl --failed       # nothing failed at boot

Then — and this is the part it’s tempting to skip — reboot on purpose and watch it come up clean. A fix you haven’t rebooted into is a hypothesis, not a fix.
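
To make that post-reboot check repeatable, the commands above can be wrapped into something that fails loudly. This is a sketch under assumptions: it relies on pvecm status printing a “Quorate: Yes” line and on zpool status -x printing “all pools are healthy” when nothing is wrong, which matches what I see on my nodes but is worth confirming on yours. The function names are my own.

```shell
# Succeeds when `pvecm status` output on stdin reports a quorate cluster.
is_quorate() {
    awk '$1 == "Quorate:" { ok = ($2 == "Yes") } END { exit !ok }'
}

# Succeeds when `zpool status -x` output on stdin reports no problems.
pools_healthy() {
    grep -qx 'all pools are healthy'
}

# Run the checks against the live system; only meaningful on a Proxmox node.
post_reboot_check() {
    rc=0
    pvecm status | is_quorate       || { echo "FAIL: no quorum"; rc=1; }
    zpool status -x | pools_healthy || { echo "FAIL: zpool reports problems"; rc=1; }
    [ -z "$(systemctl --failed --no-legend)" ] || { echo "FAIL: failed units"; rc=1; }
    return "$rc"
}
```

Calling post_reboot_check after the deliberate reboot gives a single exit code to trust instead of five outputs to eyeball.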

What I took away from it

  • A NIC is a config change, not just a hardware change. Any card that alters the PCI bus can rename every interface on the box. Treat “I’m adding a network card” with the same caution as “I’m editing /etc/network/interfaces”, because functionally that’s what you’re doing.
  • Pin interface names to MACs before you need to. The .link file takes two minutes and it’s the difference between a five-minute hardware swap and a lost evening. If I’d done this on day one, none of the rest would have happened.
  • Stacked failures hide each other. The dead network was invisible until I fixed the ZFS import that was stopping the boot. When multiple things are broken, work from the bottom of the boot up — storage, then network, then cluster.
  • Two-node clusters are quorum-fragile by design. Losing one node costs the survivor its quorum. A QDevice on a third low-power box (even a Pi) gives you a tie-breaker vote and is on my list precisely because of this.

The two-minute version of prevention

If you take one thing from this: before you ever open the case, write a .link file pinning each NIC to its MAC, and confirm your zpool.cache is current. Do that and a renamed interface or a stale cache can’t start this chain of failures in the first place.

I’ve broken plenty of things in this homelab over the years. This one taught me the most per minute of downtime — which is exactly why it’s the first thing I’ve written up here.