Data Amplification Using the Iterated Newcomb-Benford Distribution
The Newcomb-Benford (NB) distribution of first digits has been applied widely in areas ranging from engineering to the natural and biological sciences to investigate self-similarity and randomness. In this article, we consider systems for which there is too little data to obtain proper first-digit statistics, and we propose an iterated version of the distribution in which the statistics are aggregated over different scales. The justification is twofold: the first-digit distribution is approximately scale invariant across a wide range of phenomena, and scaling the data and recomputing the first digits is not a linear process, so it generates new data. We provide examples of the iterated test in two biological applications, namely the secretome and the genetic code, in both of which the raw data does not include all nine possible first digits. The paper also includes proposals for further research on data amplification using scaling transformations.
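The aggregation over scales described above can be illustrated with a minimal Python sketch. It is not the paper's exact procedure: the toy data, the choice of integer scale factors, and the function names `first_digit`, `nb_expected`, and `iterated_first_digit_counts` are all assumptions made for illustration. The sketch multiplies each value by every scale factor, recomputes the first digit, and pools the counts for comparison with the NB probabilities log10(1 + 1/d).

```python
import math
from collections import Counter


def first_digit(n):
    """Leading decimal digit of a nonzero integer."""
    return int(str(abs(n))[0])


def nb_expected():
    """Newcomb-Benford first-digit probabilities, P(d) = log10(1 + 1/d)."""
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}


def iterated_first_digit_counts(values, scales):
    """Pool first-digit counts of values * s over all scale factors s.

    The scale factors are an illustrative assumption; zeros are skipped
    because they have no leading digit.
    """
    counts = Counter()
    for s in scales:
        for v in values:
            if v != 0:
                counts[first_digit(v * s)] += 1
    return {d: counts.get(d, 0) for d in range(1, 10)}


# Hypothetical toy data in which the digits 5, 7, 8 and 9 never lead.
values = [1, 2, 4, 4, 6, 2, 3, 1]
scales = [1, 2, 3, 5, 7]  # assumed scale factors, chosen for illustration

raw = iterated_first_digit_counts(values, [1])
pooled = iterated_first_digit_counts(values, scales)
total = sum(pooled.values())

for d, p in nb_expected().items():
    print(d, raw[d], pooled[d], round(pooled[d] / total, 3), round(p, 3))
```

In this toy case the raw data covers only five of the nine digits, while the pooled counts cover all nine. Whether the pooled frequencies are then close to the NB probabilities depends on the data and on the scales chosen, and the sketch makes no claim about that.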