KVM High Availability Clustering Using DRBD and OpenSVC on Ubuntu 24.04
Introduction
This document provides a step-by-step guide for deploying a high-availability (HA) KVM cluster using DRBD 9 from Linbit and OpenSVC on Ubuntu 24.04. DRBD provides network-based block replication, effectively acting as shared storage comparable to RAID1 over TCP/IP.
This setup enables:
- Automatic failover of KVM virtual machines when a node becomes unavailable
- Live migration of virtual machines with zero downtime
Prerequisites
This guide uses the following environment:
- Two Ubuntu 24.04 servers: n1 and n2
- Same network segment: 10.30.0.0/24
- IP addressing: n1 → 10.30.0.11, n2 → 10.30.0.12, kvm1 → 10.30.0.20, gateway → 10.30.0.1
- CPU virtualization extensions enabled (VT-x for Intel, AMD-V for AMD)
- Sudo or root access
- Sufficient disk space available on both nodes
Installation
DRBD
DRBD is a product of the Linbit company. Ubuntu distributions ship it as version 8.x by default, while this guide requires DRBD version 9.x at minimum.
Linbit exposes a public DRBD PPA repository, where DRBD release candidates are regularly pushed. We will use this repository to install DRBD version 9.
Add Linbit DRBD9 PPA
Let's add the Linbit DRBD9 public PPA on both nodes, and install DRBD.
sudo add-apt-repository ppa:linbit/linbit-drbd9-stack
sudo apt update
sudo apt -y install drbd-dkms drbd-utils
modinfo drbd | grep ^version
⚠️ Warning: The Linbit public PPA is not recommended for production. For production-grade repositories and support, contact Linbit directly.
Verify DRBD version
opensvc@n1:~$ modinfo drbd | grep ^version
version: 9.2.15
Ensure DRBD systemd unit is disabled
DRBD systemd unit must remain disabled on cluster nodes.
⚠️ Warning: OpenSVC must be the only component responsible for DRBD resource management.
opensvc@n1:~$ systemctl status drbd
○ drbd.service - DRBD -- please disable. Instead, use the per resource drbd@.target template unit.
Loaded: loaded (/usr/lib/systemd/system/drbd.service; disabled; preset: disabled)
Active: inactive (dead)
Docs: man:drbd.service(7)
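If the drbd.service unit happens to be enabled on your system, disable and stop it before going further:
sudo systemctl disable --now drbd.service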
System Upgrade
Ensure your operating system is up to date
sudo apt update
sudo apt -y dist-upgrade
sudo reboot
Time Synchronization (NTP)
Ensure all nodes have NTP enabled. Ubuntu defaults are usually sufficient.
opensvc@n1:~$ sudo timedatectl status
Local time: Fri 2025-10-31 11:52:34 CET
Universal time: Fri 2025-10-31 10:52:34 UTC
RTC time: Fri 2025-10-31 10:52:34
Time zone: Europe/Paris (CET, +0100)
System clock synchronized: yes
NTP service: active
RTC in local TZ: no
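If the NTP service is reported inactive, it can usually be enabled with systemd-timesyncd:
sudo timedatectl set-ntp true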
Check server name resolution
It is very important that both nodes are correctly configured for name service resolution.
Check that:
- n1 can resolve n2 to its IP address
- n2 can resolve n1 to its IP address
- both nodes can resolve kvm1 to its IP address
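A quick check, using the addresses from the prerequisites (if you have no DNS records for these names, adding them to /etc/hosts on both nodes is enough):
getent hosts n1
getent hosts n2
getent hosts kvm1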
KVM Installation
Verify CPU Virtualization Support
Check that your processors support hardware virtualisation extensions.
You must see either vmx or svm in the output of the egrep --color "(vmx|svm)" /proc/cpuinfo command.
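For example, a non-zero count confirms the extensions are exposed:
egrep -c "(vmx|svm)" /proc/cpuinfo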
Install Hypervisor Components
You need to install the following packages to run KVM virtual machines
sudo apt install -y qemu-kvm virtinst virt-manager libguestfs-tools bridge-utils
Once done, ensure that the libvirtd systemd unit is enabled and started
sudo systemctl enable --now libvirtd.service
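A quick sanity check that libvirt answers (the domain list is expected to be empty at this point):
sudo virsh list --all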
Local Storage Setup
DRBD requires a block device. For this guide, we use an LVM volume group on /dev/vdb.
opensvc@n1:~$ sudo pvcreate -f /dev/vdb
opensvc@n1:~$ sudo vgcreate data /dev/vdb
opensvc@n1:~$ sudo vgs
VG #PV #LV #SN Attr VSize VFree
data 1 0 0 wz--n- <5.00g <5.00g
root 1 1 0 wz--n- <26.95g 0
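The drbd pool declared later in this guide references the data volume group on every node, so the same commands are presumably needed on n2 as well, assuming an identical /dev/vdb disk there:
opensvc@n2:~$ sudo pvcreate -f /dev/vdb
opensvc@n2:~$ sudo vgcreate data /dev/vdb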
Setup firewall
If your network enforces filtering, ensure the following flows are allowed:
- n1 <=> 22/tcp <=> n2: ssh communication between nodes
- n1 <=> 1215/tcp <=> n2: API communication between nodes
- n1 <=> 10000/tcp <=> n2: unicast heartbeat communication between nodes
- n1 <=> 7289/tcp <=> n2: DRBD communication between nodes for the first DRBD resource
- n1, n2 <=> icmp <=> 10.30.0.0/24: OpenSVC security checks, one of which uses icmp ping
⚠️ Warning: Each DRBD resource needs a dedicated TCP port to communicate with the peer node. If you plan to deploy 10 different DRBD resources, you have to open ports 7289 through 7298 (7289+9).
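As an illustration only, assuming ufw is the local firewall, the TCP flows above could be opened as follows (adapt to your own filtering solution, and allow icmp according to its policy):
sudo ufw allow proto tcp from 10.30.0.0/24 to any port 22
sudo ufw allow proto tcp from 10.30.0.0/24 to any port 1215
sudo ufw allow proto tcp from 10.30.0.0/24 to any port 10000
sudo ufw allow proto tcp from 10.30.0.0/24 to any port 7289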
Network Bridge Setup
The KVM virtual machine we are about to deploy requires a direct network connection to our subnet 10.30.0.0/24.
A network bridge has to be configured on both cluster nodes.
The netplan configuration is described below
opensvc@n1:~$ sudo cat /etc/netplan/br-prd.yaml
network:
version: 2
renderer: networkd
ethernets:
enp1s0:
match:
macaddress: "22:23:24:1e:00:11"
set-name: enp1s0
dhcp4: no
dhcp6: no
bridges:
br-prd:
interfaces:
- enp1s0
parameters:
stp: false
forward-delay: 0
accept-ra: no
addresses:
- 10.30.0.11/24
nameservers:
search:
- demo.com
addresses:
- 8.8.8.8
- 8.8.4.4
routes:
- to: 0.0.0.0/0
via: 10.30.0.1
metric: 100
Adjust the bridge configuration to your environment, and then apply it
sudo netplan generate --debug
sudo netplan apply --debug
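To verify the bridge came up with the expected address (interface names from this guide):
ip -br addr show br-prd
bridge link show dev enp1s0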
OpenSVC
Installation
The OpenSVC server installation is described below
Import opensvc gpg signing keys
curl -s -o- https://packages.opensvc.com/gpg.public.key.asc | \
sudo gpg --dearmor --output /etc/apt/trusted.gpg.d/opensvc-package-pub.gpg --yes
Add the opensvc repository to apt sources
cat - <<EOF | sudo tee /etc/apt/sources.list.d/opensvc.list
deb https://packages.opensvc.com/apt/ubuntu uat-opensvc-v3-noble main
deb-src https://packages.opensvc.com/apt/ubuntu uat-opensvc-v3-noble main
EOF
Install the opensvc server
sudo apt update
sudo apt -y install opensvc-server
🛈 Info: Although not mandatory, the opensvc-client package should also be installed. It installs the ox command, a fancy text user interface tool to manage OpenSVC clusters.
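If you want it, the client can be installed from the same repository, assuming the opensvc-client package is published alongside opensvc-server:
sudo apt -y install opensvc-client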
Once the installation is done, ensure systemd unit opensvc-server is enabled and started
sudo systemctl enable --now opensvc-server.service
Finally, check that OpenSVC is running fine on both nodes
opensvc@n1:~$ sudo om mon
Threads n1
daemon | O
collector | /
hb |
hb#1.rx running unicast | /
hb#1.tx running unicast | /
listener | 1215
scheduler running |
Nodes n1
score | 97
load15m | 0.0
mem | 20%2.83g<98%
swap | 0%3.56g<90%
state |
Objects matching * n1
Set a custom cluster name (optional)
It is a best practice to set a custom cluster name, especially if you plan to deploy multiple clusters
On node n1:
opensvc@n1:~$ sudo om cluster config update --set cluster.name=drbd4kvm
committed
opensvc@n1:~$ sudo om cluster config get --kw cluster.name
drbd4kvm
Join cluster nodes
We join both nodes from our laptop:
On the laptop, ask for a join token, and store it in a variable:
token=$(ssh opensvc@n1 sudo om daemon auth --role join)
and then ask n2 to join n1 using the token:
ssh opensvc@n2 "sudo om cluster join --node n1 --token $token"
The log should look like below
john@laptop:~$ token=$(ssh opensvc@n1 sudo om daemon auth --role join)
john@laptop:~$ ssh opensvc@n2 "sudo om cluster join --node n1 --token $token"
Fetch cluster config from n1
Add localhost node to the remote cluster configuration on n1
Daemon join
Cluster nodes updated
Fetch cluster from n1
Fetch system/sec/ca from n1
Fetch system/sec/cert from n1
Fetch system/sec/hb from n1
Draining node
Stop daemon
Dump all configs
Save configs to /var/lib/opensvc/backup/.pre-daemon-join-2025-11-04T10:40:51.json
Install fetched config system/sec/cert
Install fetched config system/sec/hb
Install fetched config cluster
Install fetched config system/sec/ca
Start daemon
Joined
And the om mon command executed on one of the cluster nodes should look like:
root@n1:~# om mon
Daemon n1 n2
uptime | 3m23 3m24
state | dns dns
hb queue | 0 0
hb rx | O O
hb tx | O O
Nodes n1 n2
uptime | - -
score | 95 95
load15m | 0.2 0.2
mem | 30%2.83g<98% 27%2.83g<98%
swap | 1%3.56g<90% 1%3.56g<90%
state |
Objects ~ * n1 n2
🛈 Info: We can see that a unicast heartbeat (hb rx and hb tx) has been autoconfigured. By default, it exchanges information on TCP port 10000.
Heartbeat details can be seen using the om daemon hb status command.
root@n1:~# om daemon hb status
RUNNING BEATING ID NODE PEER TYPE DESC CHANGED_AT
O O hb#1.rx n1 n2 unicast :10000 ← n2 2025-11-28T09:35:08.603796279+01:00
O O hb#1.tx n1 n2 unicast → n2:10000 2025-11-28T09:35:08.444365461+01:00
O O hb#1.rx n2 n1 unicast :10000 ← n1 2025-11-28T09:35:11.706616707+01:00
O O hb#1.tx n2 n1 unicast → n1:10000 2025-11-28T09:35:11.463338726+01:00
⚠️ Warning: The default heartbeat configuration is minimal. For a real production deployment, it is highly recommended to add other heartbeat kinds (relay, disk, multicast, unicast), and to enable quorum arbitration.
Configure SSH trust
Enable automatic SSH trust between nodes (used for internal safety mechanisms).
This configuration is automated using the command below.
om cluster ssh trust
The configuration creates a dedicated opensvc ssh key pair on each node, and trusts the public keys on all cluster nodes
root@n1:~# ls -l ~/.ssh/opensvc*
-rw------- 1 root root 452 Nov 4 11:06 /root/.ssh/opensvc
-rw-r--r-- 1 root root 133 Nov 4 11:06 /root/.ssh/opensvc.pub
root@n1:~# cat ~/.ssh/authorized_keys
ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIMgCrZCehGhMPJK5j1ImjfzEKCaoSTiUBweLHteT8Hxd opensvc@n1 sshkey=opensvc 2025-11-04T11:06:31+01:00
ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIKRBR+niCfqJPDINuIeZjQo17FhYK0FpQMutb3474p48 opensvc@n2 sshkey=opensvc 2025-11-04T11:06:31+01:00
Test:
root@n1:~# ssh -i /root/.ssh/opensvc n2 hostname
Configure DRBD storage pool
As this tutorial demonstrates KVM live migration, we have to inject a custom DRBD configuration to be used by the OpenSVC pool.
First we create this configuration:
cat >| live-migration <<EOF
resource {{.Name}} {
{{range \$node := .Nodes}}
on {{\$node.Name}} {
device {{\$node.Device}};
disk {{\$node.Disk}};
meta-disk internal;
address {{\$node.Addr}};
node-id {{\$node.NodeId}};
}
{{end}}
connection-mesh {
hosts{{range \$node := .Nodes}} {{\$node.Name}}{{end}};
}
net {
allow-two-primaries yes;
}
}
EOF
To store this configuration in the OpenSVC cluster, we have to create a special configmap system/cfg/drbd, which is a cluster-wide replicated object, and then load the drbd config as a configmap key.
root@n1:~# om system/cfg/drbd create
root@n1:~# om system/cfg/drbd key add --name live-migration --from live-migration
16:19:49.969 INF system/cfg/drbd: set key live-migration
root@n1:~# om system/cfg/drbd key list
live-migration
root@n1:~# om system/cfg/drbd key decode --name live-migration
resource {{.Name}} {
{{range $node := .Nodes}}
on {{$node.Name}} {
device {{$node.Device}};
disk {{$node.Disk}};
meta-disk internal;
address {{$node.Addr}};
node-id {{$node.NodeId}};
}
{{end}}
connection-mesh {
hosts{{range $node := .Nodes}} {{$node.Name}}{{end}};
}
net {
allow-two-primaries yes;
}
}
To keep the DRBD configuration simple, we declare a DRBD storage pool in the cluster, and specify that the pool must use the custom DRBD configuration template named live-migration.
root@n1:~# om cluster config update --set pool#drbdvm.type=drbd --set pool#drbdvm.vg=data --set pool#drbdvm.template=live-migration
committed
root@n1:~# om cluster config show --section pool#drbdvm
[pool#drbdvm]
type = drbd
vg = data
template = live-migration
Once done, we can query the cluster about free space in the pool
root@n1:~# om pool list --name drbdvm
NAME TYPE CAPABILITIES HEAD VOLUME_COUNT BIN_SIZE BIN_USED BIN_FREE
drbdvm drbd rox,rwx,roo,rwo,snap,blk,shared data 0 10g 0 10g
By default the pool is cluster-wide, so the reported space is aggregated across nodes. To see local space, we have to specify the target node
root@n1:~# om pool list --name drbdvm --node n1
NODE NAME TYPE CAPABILITIES HEAD VOLUME_COUNT BIN_SIZE BIN_USED BIN_FREE
n1 drbdvm drbd rox,rwx,roo,rwo,snap,blk,shared data 0 5g 0 5g
Configure OpenSVC service
We first deal with the DRBD storage part, and then with VM provisioning
Create service: storage part
First we create a service named kvm1 in the demo namespace, with automated orchestration, spanning all cluster nodes
root@n1:~# om demo/svc/kvm1 create --kw nodes={clusternodes} --kw orchestrate=ha
Then we add storage resources to the service
om demo/svc/kvm1 config update --set volume#data.name={name} --set volume#data.shared=true --set volume#data.size=4G --set volume#data.format=false --set volume#data.pool=drbdvm
We have the following config at the moment
root@n1:~# om demo/svc/kvm1 config show
[DEFAULT]
nodes = {clusternodes}
orchestrate = ha
id = a6867a1e-e8b3-46eb-919f-78d4c60ca9c9
[volume#data]
name = {name}
shared = true
size = 4G
format = false
pool = drbdvm
🛈 Info: The volume#data config snippet requests a storage device from the cluster, which is provisioned in the drbd pool because of the shared=true parameter. If you have several pools in the cluster config, it is possible to add the pool=drbdvm parameter to force the target pool used for volume creation.
Now provision the storage part
root@n1:~# om demo/svc/kvm1 provision --wait
OBJECT ORCHESTRATION_ID ERROR
demo/svc/kvm1 e332c71f-1f6b-4aed-b7fe-345db3fb00a9 -
During the DRBD device initial synchronisation, we see the transitional states below:
root@n1:~# drbdadm status
kvm1.demo.vol.drbd4kvm role:Primary
disk:UpToDate open:no
n2 role:Secondary
replication:SyncSource peer-disk:Inconsistent done:30.84
root@n1:~# om demo/svc/kvm1 instance status -r
demo/svc/kvm1 up warn
└ instances
├ n2 down idle
└ n1 up warn idle started
└ resources
└ volume#data ........ up kvm1
warn: Volume demo/vol/kvm1 has warnings
Once the synchronisation is finished, the states look like:
root@n1:~# drbdadm status
kvm1.demo.vol.drbd4kvm role:Primary
disk:UpToDate open:no
n2 role:Secondary
peer-disk:UpToDate
root@n1:~# om demo/svc/kvm1 instance status -r
demo/svc/kvm1 up
└ instances
├ n2 down idle
└ n1 up idle started
└ resources
└ volume#data ........ up kvm1
The DRBD volume creation is fully automated. The DRBD configuration is located in /etc/drbd.d/kvm1.demo.vol.drbd4kvm.res
root@n1:~# cat /etc/drbd.d/kvm1.demo.vol.drbd4kvm.res
resource kvm1.demo.vol.drbd4kvm {
on n1 {
device /dev/drbd0;
disk /dev/data/kvm1.demo.vol.drbd4kvm;
meta-disk internal;
address ipv4 10.30.0.11:7289;
node-id 0;
}
on n2 {
device /dev/drbd0;
disk /dev/data/kvm1.demo.vol.drbd4kvm;
meta-disk internal;
address ipv4 10.30.0.12:7289;
node-id 1;
}
connection-mesh {
hosts n1 n2;
}
}
The command below displays the devices involved in the service. We can see that the resulting exposed DRBD device is /dev/drbd0
root@n1:~# om demo/svc/kvm1 instance device list
OBJECT RESOURCE DRIVER_GROUP DRIVER_NAME ROLE DEVICE
demo/svc/kvm1 volume#data volume exposed /dev/drbd0
demo/svc/kvm1 volume#data volume base /dev/data/kvm1.demo.vol.drbd4kvm
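As an extra sanity check, the exposed device and the DRBD role can be inspected directly with standard tools, using the device and resource names from the outputs above:
lsblk /dev/drbd0
sudo drbdadm role kvm1.demo.vol.drbd4kvm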
Create service: vm part
Now that the DRBD block device is ready, we can focus on VM deployment. Four steps are needed:
- Download a qcow2 image and customize it
- Dump image content into DRBD device
- Create the kvm guest
- Declare the kvm resource in OpenSVC service
Download a qcow2 image and customize it
We start by downloading a VM image, Ubuntu 25.10 (Questing Quokka) for the demo. We store it in /var/lib/libvirt/images/questing.kvm1.img.
curl -L -o /var/lib/libvirt/images/questing.kvm1.img https://cloud-images.ubuntu.com/questing/current/questing-server-cloudimg-amd64.img
For customization purposes, the virt-customize command is used to:
- set the root password to drbdrocks
- set the hostname to kvm1
- copy the file enp1s0.yaml into the image at /etc/netplan/enp1s0.yaml
First create the file enp1s0.yaml to configure network at boot
cat >| enp1s0.yaml <<EOF
network:
version: 2
renderer: networkd
ethernets:
enp1s0:
dhcp4: no
dhcp6: no
addresses:
- 10.30.0.20/24
routes:
- to: 0.0.0.0/0
via: 10.30.0.1
metric: 100
EOF
Then actually customize the image file
root@n1:~# virt-customize -a /var/lib/libvirt/images/questing.kvm1.img \
--root-password password:drbdrocks \
--hostname kvm1 \
--copy-in enp1s0.yaml:/etc/netplan
🛈 Info: Try with --no-network if virt-customize fails.
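Optionally, the customization can be spot-checked with virt-cat from libguestfs-tools (installed earlier):
virt-cat -a /var/lib/libvirt/images/questing.kvm1.img /etc/hostname
virt-cat -a /var/lib/libvirt/images/questing.kvm1.img /etc/netplan/enp1s0.yaml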
Dump image content into DRBD device
Let’s convert the qcow2 image to raw format, and write it into the DRBD device
IMAGE_LOC=/var/lib/libvirt/images/questing.kvm1.img
DRBD_DEV=/dev/drbd0
qemu-img convert -f qcow2 -O raw $IMAGE_LOC $DRBD_DEV
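It is worth checking that the image's virtual size fits in the 4G DRBD device before writing it; a quick comparison:
qemu-img info $IMAGE_LOC
sudo blockdev --getsize64 $DRBD_DEV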
Create the KVM guest
Run the virt-install command to instantiate and boot the VM
OS_VARIANT=ubuntu25.10
VM_BRIDGE=br-prd
virt-install \
--name kvm1 \
--ram 2048 \
--vcpus 2 \
--cpu=host \
--disk path=$DRBD_DEV \
--os-variant $OS_VARIANT \
--network bridge:$VM_BRIDGE,model=virtio \
--graphics none \
--audio none \
--console pty,target_type=serial \
--boot hd \
--noautoconsole \
--import
🛈 Info: The OS_VARIANT value can be identified using the command virt-install --osinfo list. The VM_BRIDGE value corresponds to the bridge that we created earlier in the demo.
The virtual machine boots in the background. The console is available using the command virsh console kvm1, and can be exited with Ctrl+]. If you have an X server available, the graphical virt-manager interface can also be used.
After a few seconds, the virtual machine is available
root@n1:~# ping -c3 kvm1
PING kvm1 (10.30.0.20) 56(84) bytes of data.
64 bytes from kvm1 (10.30.0.20): icmp_seq=1 ttl=64 time=0.255 ms
64 bytes from kvm1 (10.30.0.20): icmp_seq=2 ttl=64 time=0.794 ms
64 bytes from kvm1 (10.30.0.20): icmp_seq=3 ttl=64 time=0.545 ms
--- kvm1 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2021ms
rtt min/avg/max/mdev = 0.255/0.531/0.794/0.220 ms
Declare the kvm resource in OpenSVC service
Once the virtual machine is operational, we have to add it to the OpenSVC service configuration, as a new resource:
root@n1:~# om demo/svc/kvm1 config update --set container#1.type=kvm --set container#1.name=kvm1
committed
root@n1:~# om demo/svc/kvm1 instance status -r
demo/svc/kvm1 up
└ instances
├ n2 down mix-provisioned idle
└ n1 up idle started
└ resources
├ volume#data ........ up kvm1
└ container#1 ........ up kvm kvm1
⚠️ Warning: As the VM provisioning was done manually, the mix-provisioned state appears on the n2 instance. We have to inform the n2 OpenSVC agent that the virtual machine resource is actually provisioned.
root@n1:~# om demo/svc/kvm1 instance provision --state-only --node n2
OBJECT NODE SID
demo/svc/kvm1 n2 d24fb390-8d62-4fa8-80cb-77da3980778c
root@n1:~# om demo/svc/kvm1 print status -r
demo/svc/kvm1 up
└ instances
├ n2 down idle
└ n1 up idle started
└ resources
├ volume#data ........ up kvm1
└ container#1 ........ up kvm kvm1
The service state is now normal on both nodes.
Tests
Test OpenSVC service
It is now possible to test service failover between nodes. Two methods are possible:
- VM switch with downtime
- VM switch with kvm live migration (no downtime)
VM switch with downtime
The switch action can be submitted with the command om demo/svc/kvm1 switch
The following steps are orchestrated:
- n1: stop the VM
- n1: bring down drbd device (secondary)
- n2: bring up drbd device (primary)
- n2: start the vm
root@n1:~# om demo/svc/kvm1 instance ls
OBJECT NODE AVAIL
demo/svc/kvm1 n1 up
demo/svc/kvm1 n2 down
root@n1:~# om demo/svc/kvm1 switch --wait
OBJECT ORCHESTRATION_ID ERROR
demo/svc/kvm1 983c32b5-96c4-4930-8eb2-670c5fae0cc8 -
root@n1:~# om demo/svc/kvm1 instance ls
OBJECT NODE AVAIL
demo/svc/kvm1 n1 down
demo/svc/kvm1 n2 up
🛈 Info: This process can take several seconds or even minutes depending on the kvm domain workload.
VM switch with kvm live migration (no downtime)
The switch action can be submitted with the command om demo/svc/kvm1 switch --live
The following steps are orchestrated:
- n1: enable drbd dual primary
- n1: launch virsh migrate --live --persistent
- n1: disable drbd dual primary
root@n1:~# om demo/svc/kvm1 instance ls
OBJECT NODE AVAIL
demo/svc/kvm1 n1 up
demo/svc/kvm1 n2 down
root@n1:~# om demo/svc/kvm1 switch --live --wait
OBJECT ORCHESTRATION_ID ERROR
demo/svc/kvm1 c898ddc6-f827-48b5-8686-21ddf500e56e -
root@n1:~# om demo/svc/kvm1 instance ls
OBJECT NODE AVAIL
demo/svc/kvm1 n1 down
demo/svc/kvm1 n2 up
🛈 Info: This process is transparent, without any network disruption.