Distributing Data with Zero Migration

Jonathan Yue

doi:10.36227/techrxiv.22974845.v1

Abstract

ZeroMove hashing is a novel data distribution technique for distributed systems. Unlike consistent hashing, which requires data migration whenever the system is scaled, ZeroMove hashing allows clusters of nodes to be added on demand without moving data between nodes. A cluster is located using an encoded unique identifier, while a node within a cluster is identified with a hash function. This approach ensures that data remains in the node to which it was originally hashed, thereby increasing availability and improving system performance. ZeroMove hashing can also significantly reduce facility and administrative expenses, making it an excellent option for large-scale distributed systems. In our tests, scaling from one node to six nodes with 480,000 data records took 6100 seconds in a system based on consistent hashing, whereas ZeroMove hashing achieved similar scaling under the same settings in only 1.2 seconds. With consistent hashing, the time taken and the amount of data moved increase in proportion to the amount of data stored in the system. With ZeroMove hashing, these values do not increase in proportion to the amount of data stored, because scaling involves only the exchange of a small amount of metadata between nodes.
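
The two-level lookup described in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation. The class name `ZeroMoveSketch`, the identifier format (a hexadecimal cluster index followed by a random suffix), the use of SHA-256, and the policy of placing new records in the newest cluster are all assumptions made for illustration. The point it shows is that an identifier that carries its own cluster, combined with a hash taken over a fixed-size cluster, keeps every existing record in place when a new cluster is added.

```python
import hashlib
import uuid


class ZeroMoveSketch:
    """Toy model (hypothetical): an ID encodes its cluster; a hash picks the node."""

    def __init__(self):
        # cluster_sizes[i] is the node count of cluster i; assumed fixed once created.
        self.cluster_sizes = []

    def add_cluster(self, node_count):
        """Scale out by appending a cluster; only this metadata list changes."""
        if node_count < 1:
            raise ValueError("a cluster needs at least one node")
        self.cluster_sizes.append(node_count)
        return len(self.cluster_sizes) - 1

    def new_id(self):
        """Create an ID whose prefix encodes the target cluster.

        Assumption: new records are routed to the most recently added cluster.
        """
        if not self.cluster_sizes:
            raise RuntimeError("no cluster has been added yet")
        cluster = len(self.cluster_sizes) - 1
        return f"{cluster:04x}-{uuid.uuid4().hex}"

    def locate(self, record_id):
        """Return (cluster, node) for an ID: decode the cluster, hash for the node."""
        cluster_hex, _ = record_id.split("-", 1)
        cluster = int(cluster_hex, 16)
        digest = hashlib.sha256(record_id.encode("utf-8")).digest()
        node = int.from_bytes(digest[:8], "big") % self.cluster_sizes[cluster]
        return cluster, node


if __name__ == "__main__":
    system = ZeroMoveSketch()
    system.add_cluster(1)                      # start with a single node
    ids = [system.new_id() for _ in range(1000)]
    before = {i: system.locate(i) for i in ids}

    system.add_cluster(5)                      # scale out to six nodes in total
    after = {i: system.locate(i) for i in ids}

    moved = sum(before[i] != after[i] for i in ids)
    print("records relocated by scaling:", moved)
```

Because the modulus in `locate` depends only on the size of the cluster named in the identifier, adding a cluster never changes the result for an existing identifier. A plain `hash(key) % total_nodes` scheme, by contrast, would remap most keys when the node count changes.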