
Outperforming Sequential Full-Word Long Addition with Parallelization and Vectorization
  • Andrey Chusov
Far-Eastern Federal University

Corresponding Author:[email protected]



The paper presents algorithms for parallel and vectorized full-word addition of big unsigned integers with carry propagation. Because carry propagation creates data dependencies between the digits of the operands, software parallelization and vectorization of non-polynomial addition of big integers have long been considered impractical. The presented algorithms are based on parallel and vectorized detection of carry origins within the elements of vector operands, masking of the bits that correspond to those elements, and subsequent scalar addition of the resulting integers. The bits obtained in this way are then used to adjust the sum using the Kogge-Stone method.
Essentially, the paper formalizes and experimentally verifies parallel and vectorized implementations of carry-lookahead adders applied at arbitrary granularity of data. This approach is noticeably beneficial for manycore, CUDA, and vectorized implementations using AVX-512 with masked instructions.
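The mask-based carry handling described above can be illustrated with a minimal, portable C++ sketch. It is not the paper's implementation: the function name `add_blocked`, the block size of 8 limbs (chosen to mirror the eight 64-bit lanes of an AVX-512 register), and the use of plain loops instead of intrinsics are all assumptions made for illustration. Each block computes limb-wise sums independently, records which limbs generate a carry and which would propagate one, and then resolves every carry in the block with a single scalar addition on the two bit masks.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper (not from the paper): adds two little-endian arrays of
// 64-bit limbs of equal length n, writes the sum to out, returns the carry-out.
inline unsigned add_blocked(const std::uint64_t* a, const std::uint64_t* b,
                            std::uint64_t* out, std::size_t n) {
    assert(n == 0 || (a && b && out));
    constexpr std::size_t BLOCK = 8;            // assumed lane count per vector
    constexpr std::uint64_t ALL_ONES = ~std::uint64_t{0};
    unsigned carry = 0;                         // carry into the current block
    for (std::size_t base = 0; base < n; base += BLOCK) {
        const std::size_t len = (n - base < BLOCK) ? (n - base) : BLOCK;
        std::uint64_t sum[BLOCK];
        std::uint32_t gen = 0;                  // bit i: limb i overflowed
        std::uint32_t prop = 0;                 // bit i: limb i is all ones
        // Limb-wise work with no carry dependency between limbs; in a
        // vectorized version this would be one add and two mask compares.
        for (std::size_t i = 0; i < len; ++i) {
            sum[i] = a[base + i] + b[base + i];
            gen  |= std::uint32_t(sum[i] < a[base + i]) << i;
            prop |= std::uint32_t(sum[i] == ALL_ONES) << i;
        }
        // Scalar addition on the masks: a carry entering a run of
        // "propagate" bits ripples through it inside this one addition.
        const std::uint32_t g = (gen << 1) | carry;
        const std::uint32_t cin = (g + prop) ^ prop;   // bit i: carry into limb i
        // Adjust each limb by its incoming carry (a masked add when vectorized).
        for (std::size_t i = 0; i < len; ++i) {
            out[base + i] = sum[i] + ((cin >> i) & 1u);
        }
        carry = (cin >> len) & 1u;              // carry out of this block
    }
    return carry;
}
```

In this sketch the only sequential dependency between blocks is the single `carry` bit; the per-limb loops are the parts that a vector unit or a group of threads could execute in lockstep.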
Published 01 Dec 2022 in IEEE Transactions on Parallel and Distributed Systems, volume 33, issue 12, pages 4974-4985. DOI: 10.1109/TPDS.2022.3211937