Let’s implement a basic leader election algorithm using Go with RPC 🚀

Published in

ITNEXT

5 min readAug 7, 2023

Motivation

When I read a book called Database Internals, I read a chapter about leader election. During the reading, to learn more, I decided to implement the most straightforward algorithm, called the bully algorithm.

Working Demo

https://asciinema.org/a/600162

Source Code

https://github.com/Abdulsametileri/leader-election-bully-algorithm

Introduction

In distributed systems, a leader is a concept used to manage coordination and communication among multiple nodes or servers. Some essential functions of a leader in distributed systems include:

Consensus: The leader facilitates reaching an agreement (consensus) among the participating nodes about the state of the system or the data it manages.
Coordination: The leader coordinates the actions of other nodes, ensuring that they all work together in harmony to achieve the system’s goals.
Data Management: The leader may be responsible for managing the distribution, replication, and consistency of data across the nodes in the system.
Failover and High Availability: In the case of node failures, the leader may be responsible for detecting failures and initiating failover processes to ensure the system remains operational and highly available.

How does it work?

In the bully algorithm, the fundamental idea is rank. It assumes that every node has a rank within the cluster, and the leader must be the highest. So it uses the node’s rank value during the election.

There are two situations for election.

The system is newly initialized, so there is no leader
One of the nodes notices that the leader is down.

The election is implemented as follows:

The node sends ELECTION messages to the other nodes with higher ranks than their own.
The node waits for ALIVE responses.

If no higher-ranked node responds, it makes itself a leader.
Otherwise, it is notified of the new leader who has the highest rank.

Let’s illustrate these scenarios:

We assumed that the highest rank order, like: node-04 > node-03 > node-02 > node-01

If the system is newly initialized

Because node-04 is the highest-ranked node in this cluster, it didn’t get any ALIVE messages, so node-04 became a leader and broadcasted the ELECTED message to notify other nodes about the election results.

When the leader is down

The other nodes periodically send PING messages to notice the leader is down and wait for the leader’s PONG responses.

If the leader is down and the first node doesn’t get a PONG message, that node starts the election process again.

Problems in this algorithm

One of the apparent problems with this algorithm is that it violates the safety guarantee (that, at most, one leader can be elected at a time) in the presence of network partitions. It is easy to end up where nodes split into two or more independently functioning subsets, and each subset elects its leader. This situation is called split brain. [1]
Another problem with this algorithm is a strong preference toward high-ranked nodes, which becomes an issue if they are unstable and can lead to a permanent state of reelection. An unstable high-ranked node proposes itself as a leader, fails shortly after that, wins reelection, fails again, and the whole process repeats. [2] But this is not specific to the bully algorithm.

Implementation

I used docker compose to simulate a couple of nodes in the cluster. All nodes are based on this docker file.

To implement the algorithm, every node must be aware of each other. This requires a service discovery implementation. I didn’t want to implement it, so at the beginning, each node was aware of the network information of other nodes. In the past, I implemented a basic service discovery mechanism and wrote an article about it; if you are interested in the concept, you can check it out :)

I hardcoded all nodes as follows

// nodeAddressByID: It includes nodes currently in cluster
var nodeAddressByID = map[string]string{
 "node-01": "node-01:6001",
 "node-02": "node-02:6002",
 "node-03": "node-03:6003",
 "node-04": "node-04:6004",
}

at the startup based on this map; every node tries to send a PING message to the others.

func (node *Node) ConnectToPeers() {
 for peerID, peerAddr := range nodeAddressByID {
  if node.IsItself(peerID) {
   continue
  }

  rpcClient := node.connect(peerAddr)
  pingMessage := Message{FromPeerID: node.ID, Type: PING}
  reply, _ := node.CommunicateWithPeer(rpcClient, pingMessage)

  if reply.IsPongMessage() {
   node.Peers.Add(peerID, rpcClient)
  }
 }
}

After awareness of nodes, the election process is started. In this step, the node sends an ELECTION message to the other nodes with a higher rank than its own. If no higher than a node is available, it elects itself a leader and broadcasts an ELECTED message.

func (node *Node) Elect() {
 isHighestRankedNodeAvailable := false

 peers := node.Peers.ToList()
 for i := range peers {
  peer := peers[i]

  if node.IsRankHigherThan(peer.ID) {
   continue
  }

  electionMessage := Message{FromPeerID: node.ID, Type: ELECTION}
  reply, _ := node.CommunicateWithPeer(peer.RPCClient, electionMessage)

  if reply.IsAliveMessage() {
   isHighestRankedNodeAvailable = true
  }
 }

 if !isHighestRankedNodeAvailable {
  leaderID := node.ID
  electedMessage := Message{FromPeerID: leaderID, Type: ELECTED}
  node.BroadcastMessage(electedMessage)
 }
}

As soon as the leader is elected, the LeaderElected event is fired via the event bus, and because of the subscribe step, the nodes start to send PING messages continuously.

func (node *Node) PingLeaderContinuously(_ string, payload any) {
 leaderID := payload.(string)

ping:
 pingMessage := Message{FromPeerID: node.ID, Type: PING}
 reply, err := node.CommunicateWithPeer(leader.RPCClient, pingMessage)
 if err != nil {
  log.Info().Msgf("Leader is down, new election about to start!")
  node.Peers.Delete(leaderID)
  node.Elect()
  return
 }

 if reply.IsPongMessage() {
  time.Sleep(3 * time.Second)
  goto ping
 }
}

During this communication, if the leader is down, its connection is dropped and returns an error so that a new election process will start. You can check the live demo.