Consistent Hashing

NamyaLG
3 min readDec 27, 2022

--

Problem Statement

In the case of horizontal scaling, there are multiple servers that are a replica of the intended functionality.
It may be needed to route a certain client’s requests to the same server, possibly in the case of a transaction, or if the same request needs to be serviced many times, it can be cached, thereby reducing the overhead. There can be various other scenarios where maintaining a persistent client-server connection is needed.

To achieve persistence, a hashing function is used. The client information is hashed and based on the hash value obtained, it is always sent to the same server. A hash function with minimal collisions is chosen to ensure equal distribution of load.

Consider a case where the load is distributed amongst 3 servers

Servers are identified by the modulo value (0, 1 and 2)

The incoming client information is “abc”, “mno” and “rst”, on using the inbuilt hash function in python, the hash value is found, further, the server index is obtained by the modulo 3 (since there are 3 servers)

Now on adding 2 more servers, the hashes obtained are as shown, this indicates a drastic change, i.e the servers that the requests need to be routed to changes for all clients.

Servers are identified by the modulo value (0, 1, 2, 3, and 4). In the new arrangement, two new servers are added

This complete redistribution is not a feasible option, as based on the requirements and the health of the servers, servers can be added and removed.

Consistent Hashing

Consistent hashing is a way to ensure that there is minimal disruption in the routing of requests to the servers when there is an addition or deletion of servers to the server pool. There is a redistribution of only a fraction of the keys, which reduces the overhead.

The server ids (maybe an IP address) are hashed and placed in a ring-like structure

The blue boxes represent server hashes, i.e there are 3 servers, server with hash 100, will service requests that have a hash value from 1 to 100, server 2 will service requests with a hash value of 101 to 200, and server 3 with hash values 201 to 0

Server IDs hashed, resulting in values 0, 100 and 200

An incoming client request will be hashed and sent to the server with the nearest hash value.
Examples :

Now in the case where 2 new servers are added, the distribution changes to

Server IDs hashed, resulting in values 0, 60, 120, 180 and 240

Examples :

Thus, here it can be seen, that though the problem of the remapping of servers cannot be removed completely, it can be reduced to a large extent.

References

https://www.youtube.com/watch?v=zaRkONvyGr8&list=PLMCXHnjXnTnvo6alSjVkgxV-VH6EPyvoX&index=4

https://www.youtube.com/watch?v=p6wwj0ozifw

https://www.youtube.com/watch?v=UF9Iqmg94tk

--

--

NamyaLG

Tech-enthusiast | Runner_for_life | #NoHumanIsLimited