Quorum
When a peer is flagged as stale by all heartbeats, the daemon assumes the cluster is in a split-brain situation, as it cannot determine whether the stale peer has failed or is isolated.
OpenSVC minimizes the likelihood of a split-brain scenario by leveraging multiple independent heartbeats.
Enabling Quorum Enforcement
Users who prefer to have a cluster segment shut down in such situations can enable quorum by setting cluster.quorum to true:
om cluster config update --set cluster.quorum=true
By default, the system allows split nodes to take over services, which may result in services running on multiple isolated segments. To revert to the default behavior, use:
om cluster config update --unset cluster.quorum
To check the current quorum configuration:
om cluster config get --kw cluster.quorum
Quorum Behavior
If the cluster is configured for quorum and a split-brain situation occurs, a node will shut down if the number of reachable nodes (including itself) plus arbitrators is less than half of the total cluster and arbitrator nodes.
Frozen nodes do not evaluate quorum: they will not shut down on split-brain.
Frozen nodes still count as votes in the quorum evaluation of their peers.
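The survival rule above can be sketched as a small decision function. This is illustrative only, not OpenSVC code; the function name and signature are hypothetical:

```python
def has_quorum(reachable_nodes: int, reachable_arbitrators: int,
               total_nodes: int, total_arbitrators: int) -> bool:
    """Per the rule above: a node shuts down when its votes
    (reachable nodes, including itself, plus reachable arbitrators)
    fall below half of the total cluster and arbitrator nodes."""
    votes = reachable_nodes + reachable_arbitrators
    total = total_nodes + total_arbitrators
    return votes >= total / 2

# 2-node cluster with 1 arbitrator, interconnect split:
print(has_quorum(1, 1, 2, 1))  # the side reaching the arbitrator stays up: True
print(has_quorum(1, 0, 2, 1))  # the isolated side shuts down: False
```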
Example Arbitrator Requirements
To survive an interconnect outage:
- In a 2-node cluster, a single node requires 1 arbitrator vote to survive the split.
- In a 3-node cluster, a single node requires 2 arbitrator votes.
- In a 4-node cluster, a single node requires 3 arbitrator votes.
- In a 5-node cluster, a single node requires 3 arbitrator votes.
To survive an interconnect outage, plus the outage of all peers in the same availability zone:
- In a 2-node cluster, a single node requires 1 arbitrator vote to survive the split.
- In a 3-node cluster, a single node requires 2 arbitrator votes.
- In a 4-node cluster, a single node requires 3 arbitrator votes.
- In a 5-node cluster, a single node requires 4 arbitrator votes.
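The figures in the first list can be cross-checked against the stated rule with a short script. This is an illustrative sketch, not OpenSVC code: it takes the arbitrator counts as the configured totals and finds the smallest number of arbitrator votes that keeps a single node above the half-of-total threshold:

```python
def min_arbitrator_votes(total_nodes: int, total_arbitrators: int) -> int:
    """Smallest number of arbitrator votes k such that a single node
    (1 vote) plus k stays at or above half of all nodes and arbitrators."""
    for k in range(total_arbitrators + 1):
        if 1 + k >= (total_nodes + total_arbitrators) / 2:
            return k
    raise ValueError("a single node can never reach quorum in this layout")

# Interconnect outage: (cluster nodes, configured arbitrators)
for n, a in [(2, 1), (3, 2), (4, 3), (5, 3)]:
    print(f"{n}-node cluster: {min_arbitrator_votes(n, a)} arbitrator vote(s)")
```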
Configuring Arbitrators
Any OpenSVC agent can act as an arbitrator, and multiple arbitrators can be configured. For example, to configure an arbitrator:
Use an HTTPS server as an arbitrator:
[arbitrator#a1]
uri = https://dev2n1:1215/metrics
#insecure = true
Use a TCP server as an arbitrator:
[arbitrator#a2]
uri = dev2n2:22
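Following the --set syntax shown earlier for cluster.quorum, the same stanzas could presumably be created without editing the file directly. The section and keyword names below simply mirror the fragments above; check them against your agent version:

```shell
om cluster config update \
    --set arbitrator#a1.uri=https://dev2n1:1215/metrics \
    --set arbitrator#a2.uri=dev2n2:22
```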
Testing Arbitrators
Alive test of an arbitrator:
$ om node ping --node a1
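For a TCP arbitrator such as a2 above, the alive test amounts to checking that the socket accepts a connection. A rough standalone equivalent (illustrative only, not the agent's implementation):

```python
import socket

def tcp_arbitrator_alive(host: str, port: int, timeout: float = 5.0) -> bool:
    """Consider a TCP arbitrator alive if its socket accepts a connection."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# e.g. for the a2 arbitrator configured above:
# tcp_arbitrator_alive("dev2n2", 22)
```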
The om mon output shows the alive state of every arbitrator from the point of view of each node.
$ om mon
...
Arbitrators n1 n2
a1 warn | X X
a2 warn | X X
a3 | O O
...
Best Practices
- Configure the cluster node count minus 1 arbitrators
- Host all arbitrators on the same 3rd site
- Use one of the arbitrators as a relay for the relay heartbeat driver
- Disable quorum or freeze all nodes when doing a relayout of the cluster