Introducing Consistent Hashing

Teiva Harsanyi · Published in ITNEXT · Apr 30, 2020 · 11 min read

With the rise of distributed architectures, consistent hashing became mainstream. But what is it exactly and how is it different from a standard hashing algorithm? What are the exact motivations behind it?

First, we will describe the main concepts. Then, we will dig into existing algorithms to understand the challenges associated with consistent hashing.

Main Concepts

Hashing

Hashing is the process of mapping data of arbitrary size to fixed-size values. Each algorithm has its own specification:

  • MD5 produces 128-bit hash values.
  • SHA-1 produces 160-bit hash values.
  • etc.

Hashing has many applications in computer science. One of them is the checksum: to verify the integrity of a dataset, we can use a hashing algorithm. A server hashes a dataset and shares the hash value with a client. Then, the client hashes its version of the dataset and compares the two hash values. If they are equal, the integrity should be verified.
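To make this concrete, here is a minimal Go sketch of the idea (the dataset literal is made up for the example), computing a SHA-256 checksum on both sides and comparing them:

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

func main() {
	// The "dataset" sent by the server, and the copy received by the client.
	serverData := []byte("some dataset")
	clientData := []byte("some dataset")

	// Both sides hash their version of the data.
	serverSum := sha256.Sum256(serverData)
	clientSum := sha256.Sum256(clientData)

	// If the checksums match, the integrity *should* be verified
	// (barring a collision).
	fmt.Println("integrity check:", serverSum == clientSum)
}
```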

The “should” here is important. The worst-case scenario is a collision: two distinct pieces of data having the same hash value. Let’s take a real-life example by defining the following hashing function: given a person, it returns their birthday (day and month of birth). The birthday paradox tells us that with only 23 people in a room, the probability of two people having the same birthday (hence a collision) is more than 50%. Therefore, the birthday function is probably not a good hashing function.

A poor hashing function

To close this quick introduction to hashing, it is important to understand that the main idea is to spread values across a domain. For example:

  • MD5 spreads out values across a 128-bit space domain
  • A hashtable (or hashmap) backed by an array of 32 elements has an internal hashing function that spreads out values to any index (from 0 to 31).

Load Distribution

Load distribution can be defined as the process of spreading load across nodes. The term node here is interchangeable with server or instance. It’s a computation unit.

Load balancing is one example of load distribution. It is about distributing a set of tasks over a set of resources. For example, we use load balancing to distribute API requests among web server instances.

When it comes to data, we use the term sharding instead. A database shard is a horizontal partition of data in a database. A typical example is a database partitioned into three shards, where each shard holds a subset of the total data.

Load balancing and sharding share some common challenges, for example spreading data evenly to guarantee that no node is overloaded compared to the others. In some contexts, load balancing and sharding also need to keep associating tasks or data with the same node:

  • If we need to serialize the operations for a given consumer, handling them one by one, we must route the requests to the same node.
  • If we need to distribute data, we must know which shard is the owner for a particular key.

Does it sound familiar? In these two examples, we spread values across a domain. Be it tasks spread across server nodes or data spread across database shards, we find the same idea that underlies hashing. This is the reason why hashing can be used in conjunction with load distribution. Let’s see how.

Mod-n Hashing

The principle of mod-n hashing is the following. Each key is hashed using a hashing function to transform an input into an integer. Then, we perform a modulo based on the number of nodes.

Let’s see a concrete example with 3 nodes. Here, we need to distribute the load among these nodes based on a key identifier. Each key is hashed then we perform a modulo operation:
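A minimal Go sketch of mod-n hashing (the keys foo, bar and baz and the choice of FNV-1a are only illustrative; the actual node indices depend on the hash function used):

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// hash transforms a key into an integer using FNV-1a
// (any reasonably uniform hash function would do).
func hash(key string) uint32 {
	h := fnv.New32a()
	h.Write([]byte(key))
	return h.Sum32()
}

// node returns the node index for a key, given the number of nodes.
func node(key string, numNodes uint32) uint32 {
	return hash(key) % numNodes
}

func main() {
	for _, key := range []string{"foo", "bar", "baz"} {
		fmt.Printf("%s -> node %d\n", key, node(key, 3))
	}
}
```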

The benefit of this approach is its statelessness. We don’t have to keep any state to remind us that foo was routed to node 2. Yet, we need to know how many nodes are configured to apply the modulo operation.

Then, how does the mechanism work in the event of scaling out or in (adding or removing nodes)? If we add another node, the modulo operation is now based on 4 instead of 3:

As we can see, the keys foo and baz are no longer associated with the same nodes. With mod-n hashing, there is no guarantee to maintain any consistency in the key/node association. Is this a problem? It might be.

What if we implement a datastore using sharding, based on the mod-n strategy to distribute data? If we scale the number of nodes, we need to perform a rebalancing operation. In the previous example, rebalancing means:

  • Moving foo from node 2 to node 0.
  • Moving baz from node 2 to node 3.

Now, what happens if we had stored millions or even billions of keys and we needed to rebalance almost everything? As we can imagine, this would be a heavy process. Therefore, we must change our load distribution technique to make sure that upon rebalancing:

  • The distribution remains as uniform as possible based on the new number of nodes.
  • The number of keys we have to migrate is limited. Ideally, only 1/n of the keys would move, where n is the number of nodes.

This is exactly the purpose of consistent hashing algorithms.

The term consistent might be somewhat confusing though. I have met engineers who assumed that such algorithms kept associating a given key to the very same node even in the face of scaling events. This is not the case. The association only has to remain consistent up to a certain point, in order to keep the distribution uniform.

Now, it’s time to dig into some solutions.

Rendezvous

Rendezvous was the first algorithm ever proposed to solve our problem. Though the original study, published in 1996, did not mention the term consistent hashing, it does provide a solution to the challenges we described. Let’s see one possible implementation in Go (a sketch that hashes the concatenation of the key and the node identifier with FNV-1a):
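```go
package main

import (
	"fmt"
	"hash/fnv"
)

// hash produces an integer based on a key and a node identifier.
// The easiest approach: hash the concatenation of the two strings.
func hash(key, node string) uint32 {
	h := fnv.New32a()
	h.Write([]byte(key))
	h.Write([]byte(node))
	return h.Sum32()
}

// getNode returns the node with the highest hash value for the given key,
// hence the name "highest random weight hashing".
func getNode(key string, nodes []string) string {
	if len(nodes) == 0 {
		return ""
	}
	bestNode := nodes[0]
	bestHash := hash(key, nodes[0])
	for _, node := range nodes[1:] {
		if h := hash(key, node); h > bestHash {
			bestHash = h
			bestNode = node
		}
	}
	return bestNode
}

func main() {
	nodes := []string{"node0", "node1", "node2"}
	fmt.Println(getNode("foo", nodes))
}
```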

How does it work? We traverse each node and compute its hash value. The hash value is returned by a hash function that produces an integer based on a key (our input) and a node identifier (the easiest approach is to hash the concatenation of the two strings). Then, we return the node with the highest hash value. This is the reason why the algorithm is also known as highest random weight hashing.

The main downside of rendezvous is its O(n) time complexity, where n is the number of nodes. It is very efficient if we have a limited number of nodes. Yet, if we start maintaining thousands of nodes, it might start causing performance problems.

Ring Consistent Hash

The next algorithm was released in 1997 by Karger et al. in this paper. This study mentioned the term consistent hashing for the first time.

It is based on a ring (an end-to-end connected array). Though it’s the most popular consistent hashing algorithm (or at least the best known), the principle is not always well understood. Let’s dive into it.

The first operation is to create the ring. A ring has a fixed length. In our example, we partition it into 12 parts:

Then, we position our nodes. In our example we will define N0, N1 and N2.

The nodes are distributed evenly for the time being. We will come back to this point later on.

Then, let’s see how to represent the keys. First, we need a function f that returns a ring index (from 0 to 11) based on a key. We can use mod-n hashing for that; as the ring length is constant, it will not cause us any problems.

In our example we will define 3 keys: a, b and c. We apply f on each one. Let’s assume we have the following results:

  • f(a) = 1
  • f(b) = 5
  • f(c) = 7

Therefore, we can position the keys on the ring this way:

How do we associate a given key with a node? The main logic is to move forward: from a given key’s position, we return the first node we find while progressing forward on the ring:

In this example, we associate a with N1, and b and c with N2.
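Here is a minimal Go sketch of this lookup. The node positions mirror the evenly spread N0, N1 and N2 above; note that the indices an FNV-based f returns for a, b and c depend on the hash function, so they will not necessarily be 1, 5 and 7 as in the example above:

```go
package main

import (
	"fmt"
	"hash/fnv"
	"sort"
)

const ringLength = 12

// f returns a ring index (from 0 to 11) based on a key.
// Mod-n hashing is fine here because the ring length is constant.
func f(key string) int {
	h := fnv.New32a()
	h.Write([]byte(key))
	return int(h.Sum32() % ringLength)
}

type ring struct {
	positions []int          // sorted node positions on the ring
	nodes     map[int]string // position -> node name
}

// getNode moves forward from the key position and returns
// the first node found (wrapping around the ring).
func (r ring) getNode(key string) string {
	pos := f(key)
	// First node position >= pos; wrap to the beginning if there is none.
	i := sort.SearchInts(r.positions, pos)
	if i == len(r.positions) {
		i = 0
	}
	return r.nodes[r.positions[i]]
}

func main() {
	// Positions chosen to mirror the example: N0, N1 and N2 evenly spread.
	r := ring{
		positions: []int{0, 4, 8},
		nodes:     map[int]string{0: "N0", 4: "N1", 8: "N2"},
	}
	for _, key := range []string{"a", "b", "c"} {
		fmt.Printf("%s (index %d) -> %s\n", key, f(key), r.getNode(key))
	}
}
```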

Now, let’s see how rebalancing is managed. We define another node, N3. Where should we position it? There is no room for the overall distribution to be uniform anymore. Should we reorganize the nodes? No, otherwise we wouldn’t be consistent anymore, would we? To position a node, we reuse the same hashing function f we introduced. Instead of being called with a key, it can be called with a node identifier. So the position of the new node is effectively random.

One question arises then: what should we do with a, as the next node is not N1 anymore?

The solution is the following: we must change its association and have a associated with N3:

As we discussed previously, an ideal algorithm should rebalance 1/n of the keys on average. In our example, as we are adding a fourth node, we should have 25% of the keys reassociated to N3. Is this the case?

  • N0 from indices 8 to 12: 33.3% of the total keys
  • N1 from indices 2 to 4: 16.6% of the total keys
  • N2 from indices 4 to 8: 33.3% of the total keys
  • N3 from indices 0 to 2: 16.6% of the total keys

The distribution is not uniform. How can we improve it? The solution is to use virtual nodes.

Let’s say that instead of positioning a single spot per node, we can position three. Also, we need to define three different hashing functions. Each node is hashed three times so that we get three different indices:

We can apply the same algorithm by moving forward. Yet, a key is associated with the underlying node regardless of which virtual node it meets.

In this very example, the distribution would now be the following:

  • N0: 33.3%
  • N1: 25%
  • N2: 41.6%
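A possible way to sketch virtual nodes in Go. A common shortcut (an assumption here, not the only option) is to hash the node identifier suffixed with the virtual node index instead of defining three distinct hashing functions:

```go
package main

import (
	"fmt"
	"hash/fnv"
)

const (
	ringLength   = 12
	virtualNodes = 3
)

// f returns a ring index based on a key.
func f(key string) int {
	h := fnv.New32a()
	h.Write([]byte(key))
	return int(h.Sum32() % ringLength)
}

func main() {
	// Ring index -> physical node; each physical node gets several
	// positions (virtual nodes) on the ring.
	positions := make(map[int]string)
	for _, node := range []string{"N0", "N1", "N2"} {
		for v := 0; v < virtualNodes; v++ {
			// Suffixing the identifier stands in for using several
			// distinct hashing functions. In practice the ring space is
			// much larger than 12, which makes position collisions unlikely.
			index := f(fmt.Sprintf("%s-%d", node, v))
			positions[index] = node
			fmt.Printf("virtual node %s-%d -> index %d\n", node, v, index)
		}
	}
	// The lookup is then the same forward walk as before: a key maps to
	// the physical node behind whichever virtual node it meets first.
}
```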

The more virtual nodes we define per node, the more uniform the distribution should be. On average, with 100 virtual nodes per server, the standard deviation is about 10%. With 1000, it is about 3.2%.

This mechanism is also useful if we have nodes with different sizing. For example, if a node is configured to theoretically handle twice the load of the others, we could spin up twice as many virtual nodes for it.

Yet, the main downside of virtual nodes is the memory footprint. If we have to handle thousands of servers, it can require megabytes of memory.

Before moving on, it’s interesting to note that sometimes an algorithm can be substantially improved by changing a small part. This is the case, for example, with an algorithm published by Google in 2017 called consistent hashing with bounded loads. This version is based on the same ring idea that we described. Yet, their approach is to perform a small rebalancing upon any update (a new key or node added/deleted). This version outperforms the original one in terms of standard deviation, with the tradeoff of a worse time complexity.
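To give an idea of the bounded-loads principle, here is a rough Go sketch, an illustration rather than the exact algorithm from the paper: each node accepts at most ceil((1 + ε) × keys/nodes) keys, and if the node a key lands on is already full, we keep walking the ring to the next one:

```go
package main

import (
	"fmt"
	"hash/fnv"
	"math"
	"sort"
)

const epsilon = 0.25 // tolerated imbalance above the average load

func hash(key string) uint32 {
	h := fnv.New32a()
	h.Write([]byte(key))
	return h.Sum32()
}

func main() {
	nodes := []string{"N0", "N1", "N2"}
	keys := []string{"a", "b", "c", "d", "e", "f"}

	// Place each node once on a 32-bit ring.
	positions := make([]uint32, len(nodes))
	byPosition := make(map[uint32]string, len(nodes))
	for i, n := range nodes {
		positions[i] = hash(n)
		byPosition[positions[i]] = n
	}
	sort.Slice(positions, func(i, j int) bool { return positions[i] < positions[j] })

	// Each node accepts at most ceil((1+epsilon) * keys/nodes) keys.
	capacity := int(math.Ceil((1 + epsilon) * float64(len(keys)) / float64(len(nodes))))
	load := make(map[string]int)

	for _, key := range keys {
		// Forward walk, as in the plain ring version...
		i := sort.Search(len(positions), func(j int) bool { return positions[j] >= hash(key) })
		i %= len(positions)
		// ...but skip nodes that are already at capacity.
		for load[byPosition[positions[i]]] >= capacity {
			i = (i + 1) % len(positions)
		}
		node := byPosition[positions[i]]
		load[node]++
		fmt.Printf("%s -> %s\n", key, node)
	}
	fmt.Println("load per node:", load)
}
```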

Jump Consistent Hash

In 2014, two engineers from Google published jump consistent hash. Compared to the ring-based algorithm, this implementation “requires no storage, is faster and does a better job of evenly dividing the key space among the buckets and of evenly dividing the workload when the number of buckets changes”. Said differently, it improves the distribution of the workload among the nodes (a bucket is the same concept as a node) without any memory overhead.

Here is the algorithm, which fits in 7 lines of C++ in the original paper 🤯.
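To stay consistent with the other examples in this post, here is a possible Go port of that function:

```go
package main

import "fmt"

// jumpConsistentHash is a Go port of the algorithm published in the paper
// (the original is 7 lines of C++). It maps a 64-bit key to a bucket
// index in [0, numBuckets).
func jumpConsistentHash(key uint64, numBuckets int32) int32 {
	var b, j int64 = -1, 0
	for j < int64(numBuckets) {
		b = j
		key = key*2862933555777941757 + 1
		j = int64(float64(b+1) * (float64(int64(1)<<31) / float64((key>>33)+1)))
	}
	return int32(b)
}

func main() {
	// With 10 buckets, a given key is always mapped to the same bucket.
	fmt.Println(jumpConsistentHash(42, 10))
}
```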

In ring consistent hash, with 1000 virtual nodes, the standard deviation was about 3.2%. In jump consistent hash, we don’t need the concept of virtual nodes anymore, and the standard deviation remains below 0.0000001%.

There is one limitation though: the nodes must be numbered sequentially. If we have a list of servers foo, bar and baz, we cannot remove bar, for example. Yet, if we configure a data store, we can apply the algorithm to get the shard index based on the total number of shards. Therefore, jump consistent hash can be useful in the context of sharding, but not load balancing.

What is the Perfect Consistent Hashing Algorithm?

Now that we have some experience with consistent hashing, let’s take a step back and see what the perfect algorithm would look like:

  • Only 1/n of the keys would be remapped on average, where n is the number of nodes.
  • An O(n) space complexity, where n is the number of nodes.
  • An O(1) time complexity per insertion/removal of a node and per key lookup.
  • A minimal standard deviation to make sure that a node is not overloaded compared to another one.
  • It would allow associating a weight to a node to cope with different node sizing.
  • It would allow arbitrary node names (not numbered sequentially) to support both load balancing and sharding.

Unfortunately, this algorithm does not exist. Based on what we saw:

  • Rendezvous has a linear time complexity per lookup.
  • Ring consistent hash has a poor minimal standard deviation without the concept of virtual nodes. With virtual nodes, its space complexity is O(n*v), with n the number of nodes and v the number of virtual nodes per node.
  • Jump consistent hash does not have a constant time complexity, and it does not support arbitrary node names.

The topic is still open and there are recent studies worth a look. For example, multi-probe consistent hash, released in 2015. It keeps an O(n) space complexity (a single position per node, without virtual nodes). Yet, to achieve a standard deviation of ε, it requires O(1/ε) time per lookup. For example, to achieve a standard deviation of less than 0.5%, it requires hashing the key about 20 times. Therefore, we can get a minimal standard deviation, but at the cost of a higher lookup time complexity.
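As a rough illustration of the principle (a sketch, not the paper’s exact algorithm): each node is placed once on the ring, the key is hashed several times with different seeds, and we pick the node that sits closest to any of the probes:

```go
package main

import (
	"fmt"
	"hash/fnv"
	"sort"
)

const probes = 21 // more probes -> lower imbalance, slower lookups

// hash combines a key with a probe seed (an illustrative choice).
func hash(key string, seed int) uint64 {
	h := fnv.New64a()
	fmt.Fprintf(h, "%s-%d", key, seed)
	return h.Sum64()
}

type ring struct {
	positions []uint64          // each node is placed once on the ring
	nodes     map[uint64]string // position -> node name
}

func newRing(nodes []string) ring {
	r := ring{nodes: make(map[uint64]string)}
	for _, n := range nodes {
		p := hash(n, 0)
		r.positions = append(r.positions, p)
		r.nodes[p] = n
	}
	sort.Slice(r.positions, func(i, j int) bool { return r.positions[i] < r.positions[j] })
	return r
}

// getNode hashes the key several times and returns the node that sits
// closest (moving forward) to any of the probe points.
func (r ring) getNode(key string) string {
	var best string
	bestDistance := ^uint64(0) // max uint64
	for seed := 0; seed < probes; seed++ {
		p := hash(key, seed)
		i := sort.Search(len(r.positions), func(j int) bool { return r.positions[j] >= p })
		i %= len(r.positions)
		distance := r.positions[i] - p // wraps around, which is fine on a ring
		if distance < bestDistance {
			bestDistance = distance
			best = r.nodes[r.positions[i]]
		}
	}
	return best
}

func main() {
	r := newRing([]string{"node0", "node1", "node2"})
	fmt.Println(r.getNode("foo"))
}
```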

As we said in the introduction, consistent hashing has become mainstream. It is now used in countless systems such as MongoDB, Cassandra, Riak, Akka, etc., be it in the context of balancing load or distributing data. Yet, as is often the case in computer science, every solution has its tradeoffs.

Speaking of tradeoffs, if you want a follow-up, you may want to take a look at the excellent post on consistent hashing written by Damian Gryski.

Some other resources are also worth a look.
