Abstract
ZeroMove hashing is a novel data distribution technique for distributed
systems that offers several key benefits. In contrast to the consistent
hashing algorithm, which requires data migration when scaling the
system, ZeroMove hashing enables the addition of clusters of nodes on
demand without the need to move data between nodes. A cluster is located
using an encoded unique identifier, while a node is identified with a
hash function within a cluster. This approach ensures that data remains
in the node where it is hashed, thereby increasing availability and
improving system performance. Furthermore, the ZeroMove hashing
technique can significantly reduce facility and administrative expenses,
making it an excellent option for large-scale distributed systems. Our
tests on consistent hashing and ZeroMove hashing have shown that scaling
from one node to six nodes with 480,000 data records took 6,100 seconds
in a system based on consistent hashing. In contrast, it took only 1.2
seconds for ZeroMove hashing to achieve similar scaling under the same
settings. With consistent hashing, the time taken and amount of data
moved increase proportionally with the amount of data stored in the
system. With ZeroMove hashing, however, these values do not increase
in proportion to the amount of data stored, because
ZeroMove hashing exchanges only a small amount of metadata
between nodes during scaling.
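
The two-level lookup described above (cluster from the encoded identifier, node from a hash within that cluster) can be illustrated with a minimal sketch. This is not the authors' implementation: the identifier format (`<cluster index>:<random hex>`), the use of SHA-256, the class name `ZeroMoveSketch`, and the policy of placing new records in the most recently added cluster are all assumptions made for illustration only.

```python
import hashlib
import uuid


class ZeroMoveSketch:
    """Hypothetical two-level placement: ID prefix -> cluster, hash -> node."""

    def __init__(self):
        # Each cluster is a fixed list of node names; clusters are only appended.
        self.clusters = []

    def add_cluster(self, nodes):
        """Scale out by appending a whole cluster; existing clusters are untouched."""
        self.clusters.append(list(nodes))
        return len(self.clusters) - 1

    def new_id(self, cluster_index=None):
        """Create an ID that encodes its cluster index (assumed format)."""
        if cluster_index is None:
            cluster_index = len(self.clusters) - 1  # assumed policy: newest cluster
        return f"{cluster_index}:{uuid.uuid4().hex}"

    def locate(self, record_id):
        """Return (cluster index, node name) for a record ID."""
        prefix, _, _ = record_id.partition(":")
        cluster_index = int(prefix)
        cluster = self.clusters[cluster_index]
        digest = hashlib.sha256(record_id.encode()).digest()
        node = cluster[int.from_bytes(digest[:8], "big") % len(cluster)]
        return cluster_index, node


# Usage: placements made before scaling are unchanged after a cluster is added.
system = ZeroMoveSketch()
system.add_cluster(["node-0", "node-1"])
record_ids = [system.new_id() for _ in range(1000)]
before = {rid: system.locate(rid) for rid in record_ids}

system.add_cluster(["node-2", "node-3", "node-4"])
moved = sum(1 for rid in record_ids if system.locate(rid) != before[rid])
print("records moved after scaling:", moved)
```

Because the cluster index is fixed inside the identifier and the hash is taken modulo the size of that cluster only, adding a cluster never changes where an existing record resolves. In this sketch the only state that changes during scaling is the cluster list, which plays the role of the small metadata exchange mentioned above.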