TechRxiv

Outperforming Sequential Full-Word Long Addition with Parallelization and Vectorization

preprint
posted on 13.01.2022, 17:09 by Andrey Chusov
The paper presents algorithms for parallel and vectorized full-word addition of big unsigned integers with carry propagation. Because carry propagation creates data dependencies between the digits of the operands, software parallelization and vectorization of non-polynomial addition of big integers have long been considered impractical. The presented algorithms are based on parallel and vectorized detection of carry origins within the elements of vector operands, masking the bits that correspond to those elements, and subsequent scalar addition of the resulting integers. The bits obtained in this way are then used to adjust the sum using the Kogge-Stone method.
Essentially, the paper formalizes and experimentally verifies parallel and vectorized implementations of carry-lookahead adders applied at arbitrary data granularity. The approach is noticeably beneficial for manycore and CUDA implementations and for vectorized implementations using AVX-512 with masked instructions.
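
The idea summarized above can be illustrated with a small sketch. It is not the paper's implementation: the function names (`add_limbs`, `to_limbs`, `from_limbs`), the 64-bit limb size and the little-endian limb order are assumptions made for illustration, and the per-limb phase is written as an ordinary Python loop where a real implementation would use SIMD lanes or threads. Carries are resolved here by one scalar addition of the "generate" and "propagate" bitmasks, a common carry-lookahead trick; the sketch does not build a Kogge-Stone prefix network.

```python
MASK64 = (1 << 64) - 1  # value of an all-ones 64-bit limb (assumed limb width)


def add_limbs(a, b, carry_in=0):
    """Add two equal-length little-endian lists of 64-bit limbs.

    Returns (sum_limbs, carry_out). Hypothetical helper for illustration.
    """
    if len(a) != len(b):
        raise ValueError("operands must have the same number of limbs")
    n = len(a)

    # Phase 1: per-limb work with no dependency between limbs.
    # Each iteration could run in its own SIMD lane or thread.
    partial = [(x + y) & MASK64 for x, y in zip(a, b)]
    generate = 0   # bit i set: limb i overflowed, so it originates a carry
    propagate = 0  # bit i set: limb i is all ones, so it forwards an incoming carry
    for i, (x, s) in enumerate(zip(a, partial)):
        if s < x:
            generate |= 1 << i
        if s == MASK64:
            propagate |= 1 << i

    # Phase 2: resolve every carry chain with one scalar addition of the masks.
    # Bit i of `injected` is a carry arriving at limb i from limb i - 1
    # (or from carry_in for limb 0). Adding it to `propagate` ripples the
    # carry through runs of all-ones limbs; XOR with `propagate` leaves
    # bit i set exactly when limb i receives a carry.
    injected = (generate << 1) | carry_in
    carries = (propagate + injected) ^ propagate

    # Phase 3: independent per-limb adjustment of the partial sums.
    result = [(s + ((carries >> i) & 1)) & MASK64 for i, s in enumerate(partial)]
    carry_out = (carries >> n) & 1
    return result, carry_out


def to_limbs(value, n):
    """Split a non-negative int into n little-endian 64-bit limbs."""
    return [(value >> (64 * i)) & MASK64 for i in range(n)]


def from_limbs(limbs):
    """Reassemble a non-negative int from little-endian 64-bit limbs."""
    return sum(limb << (64 * i) for i, limb in enumerate(limbs))


if __name__ == "__main__":
    x = (1 << 200) - 1
    y = 12345
    limbs, carry = add_limbs(to_limbs(x, 4), to_limbs(y, 4))
    print(from_limbs(limbs) + (carry << 256) == x + y)
```

Only the second phase is sequential, and it works on one bit per limb rather than on full words, which is what makes the first and third phases candidates for vector or multi-threaded execution.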

Email Address of Submitting Author

chusov.and@gmail.com

ORCID of Submitting Author

0000-0002-7931-5368

Submitting Author's Institution

Far-Eastern Federal University

Submitting Author's Country

Russian Federation