2021-12-24
This time is a little different. Here's a little demo of consistent hashing built with Vanilla Js.
Consistent Hashing is one technique that is used when sharding elements. One of its benefits is that it reduces the number of elements moved when resharding. An example use case for sharding would be in a database storing a number of users.
In the below image, we choose to shard the users by their first name. We do this by running their first name through SHA-256 and then taking the modulus 36000 and multiplying by 100 to map it to the unit circle in degrees. Now, imagine we have 3 shards. We can do a similar hashing with the shard number "shard_number:0" and map those to degrees as well. Then, we determine which shard a user belongs to by looking counter clockwise for the closest shard to it. For example, "Hugh" maps to 13.76° and we see that it fits in Shard 0.
Go to the next page to see what happens when we add a new shard
Now, we've added a new shard, Shard 3.
We see that it covers the angles 130.56°-206.4° and that "Mark" and "Rowan" belongs in that range. So, in adding a new shard, we merely have to move "Mark" and "Rowan" from Shard 2 to Shard 3. This minimizes the number of elements moved.
With a naive hashing algorithm, every time the number of shards changes, the majority of elements have to be moved. So, one benefit is that consistent hashing does well with resharding.
One downside with this current setup is that the number of elements per shard is imbalanced. Shard 0 contains 4 elements, but Shard 2 only has 1 element. Go to the next page for how this can be resolved.
On the previous page, we saw that often times the number of elements per shard are not very evenly distributed. In our database example, that could lead to undesirable characteristics such as one shard being overloaded with queries.
One way to solve this issue is using so-called "virtual nodes". Instead of each shard corresponding to a single angle, we can generate a large number of nodes for each shard. For example, if we have 100 virtual nodes, we can SHA-256 has "shard_number:99" to get the location of the 100th virtual node and "shard_number:49" to get the location of the 50th virtual node. Then each shard will be much more evenly spread out throughout the unit circle. This also means that each shard will take up a much more even amount of the unit circle.
Now that we have 100 virtual nodes, we have a pretty even distribution of elements per shard and so the likelihood of a hot spot is significantly lower.
The next page lets you play with the settings to explore for yourself how consistent sharding works.
Feel free to play around with the various parameters. For example, look at the distribution of users when there are 2 shards, 2 virtual nodes, and 100 users. It's not very evenly distributed at all! Increasing the number of virtual nodes helps balance the distribution significantly.
Warning: large numbers of users make the graphic hard to read.
Any error corrections or comments can be made by sending me a pull request.