On the Maximum Size of a Prefix Code

A prefix code minimal with respect to a bitstring <inline-formula> <tex-math notation="LaTeX">$x$ </tex-math></inline-formula> is a prefix code where <inline-formula> <tex-math notation="LaTeX">$x$ </tex-math></inline-formula> is a concatenation of its codewords and it is minimal with respect to this property. What is the maximum size <inline-formula> <tex-math notation="LaTeX">$M(n)$ </tex-math></inline-formula> among all minimal codes over all bitstrings of length <inline-formula> <tex-math notation="LaTeX">$n?$ </tex-math></inline-formula> In this paper we determine the value of <inline-formula> <tex-math notation="LaTeX">$M(n)$ </tex-math></inline-formula> for all natural numbers <inline-formula> <tex-math notation="LaTeX">$n$ </tex-math></inline-formula>, discuss its computational complexity, relation to the Lambert function, provide tight upper bounds, and describe how the value of <inline-formula> <tex-math notation="LaTeX">$M(n)$ </tex-math></inline-formula> enables one to construct efficiently a Huffman code in the case of uniform probability distribution of the codewords.

of cryptosystems based on prefix codes. One such cipher is described in [6]. These cryptosystems typically use an unknown prefix code as a secret key and the ciphertext, a binary string of length n, is a concatenation of codewords of that prefix code. The security of such ciphers is based on the fact that the attacker does not know the prefix code and hence finds it difficult to divide the ciphertext into the corresponding codewords.
In order to find the secret prefix code used to create a ciphertext of length $n$ bits, the attacker can try to enumerate all prefix codes that could have been used to create it. The value $M(n)$ provides an upper bound on the number of codewords of the prefix codes the attacker has to consider, and is therefore directly connected to an upper bound on the complexity of such an attack.
In this paper we determine the value of $M(n)$ for all natural numbers $n$, discuss its computational complexity and its relation to the Lambert function, provide unified upper bounds, and describe how the properties of $M(n)$ enable one to construct efficiently a Huffman code in the case of a uniform probability distribution of the codewords.

II. PRELIMINARIES
In this section we state the basic concepts and notions, as well as the results that constitute key ingredients in our study of the function $M(n)$.
Definition 1: Let $A$ be a finite alphabet and $P$ be a set. Then a code is a bijection $\kappa : A \to P$; elements of $P$ are called codewords and $P$ is also called a dictionary of the code. Specifically, a prefix code is a code where no codeword is a prefix of another codeword. A message $x$ is a concatenation of finitely many words from the dictionary $P$.
Definition 2: Let $x$ be a binary string. Then a prefix code $P$ with $x \in P^+$ is called minimal with respect to $x$ if, for any $w \in P$, $x \notin (P - \{w\})^+$. The collection of all prefix codes minimal with respect to $x$ will be denoted by $V_x$.
The next theorem, stated in [6], plays a key role in determining the invariant $M(n)$. Therefore, for the reader's convenience, its proof is given here as well.
Theorem 4: Let x be a binary string. Then a prefix code P is minimal with respect to x if and only if P is a partition of x into substrings.
Proof: (⇐) Let the elements of a prefix code $P$ form a partition of $x$ into substrings, say
$$x = v_1 v_2 \cdots v_k,$$
where the $v_i$'s are not necessarily different. Assume, by contradiction, that $P$ is not minimal. Then there would be $w \in P$ such that $x \in (P - \{w\})^+$. This in turn implies that there exists a partition $x = v_{j_1} \cdots v_{j_t}$, where $v_{j_i} \neq w$ for all $1 \le i \le t$. If $|v_{j_1}| > |v_1|$ or $|v_{j_1}| < |v_1|$, then $v_{j_1}$ would be a prefix of $v_1$ or vice versa. Therefore, $|v_{j_1}| = |v_1|$, which implies $v_{j_1} = v_1$. Repeating this process one has to arrive at $i$, $1 \le i \le t$, for which $v_{j_i} = w$, a contradiction.
(⇒) If $P$ is a minimal prefix code with respect to $x$, then $x \in P^+$. Therefore, $P$ has to contain elements whose concatenation is $x$; these elements form a partition of $x$ into substrings. Because of the minimality of $P$, there is no other element there.
Now we recall the Kraft-Szilard inequality (Theorem 5) ([1], pp. 12-13, [12]), which is valid not only for prefix codes but, due to the McMillan theorem ([1], pp. 13-14, [13]), also for arbitrary uniquely decodable codes.
Theorem 5: Let $n_1, \dots, n_k$ be natural numbers. Then the inequality
$$\sum_{i=1}^{k} 2^{-n_i} \le 1 \tag{1}$$
is a necessary and sufficient condition for $n_1, \dots, n_k$ to be the lengths of codewords of a prefix code of size $k$.
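As a small illustration of the Kraft-Szilard condition $\sum_i 2^{-n_i} \le 1$, the following Python sketch evaluates the sum exactly with rational arithmetic and, separately, tests whether a given set of binary words is a prefix code. The function names `kraft_sum`, `satisfies_kraft` and `is_prefix_code` are illustrative and do not come from the paper.

```python
from fractions import Fraction


def kraft_sum(lengths):
    """Exact value of sum(2**-n_i) over the given codeword lengths."""
    return sum(Fraction(1, 2 ** n) for n in lengths)


def satisfies_kraft(lengths):
    """True if the lengths satisfy the Kraft-Szilard inequality."""
    return kraft_sum(lengths) <= 1


def is_prefix_code(words):
    """True if no word is a prefix of (or equal to) another word.

    After sorting, a word that is a prefix of any other word is also
    a prefix of its immediate successor, so adjacent pairs suffice.
    """
    ws = sorted(words)
    return all(not b.startswith(a) for a, b in zip(ws, ws[1:]))


print(kraft_sum([1, 2, 2]))              # 1
print(satisfies_kraft([1, 1, 2]))        # False
print(is_prefix_code(["0", "10", "11"]))  # True
```

The lengths 1, 2, 2 meet the bound with equality and are realised by the prefix code {0, 10, 11}, while the lengths 1, 1, 2 violate it.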
For the reader's convenience, at the end of this section the so-called Lambert $W$ function ([3], [8]) is recalled; the history of its name is described in [8]. The Lambert $W$ function is a multivalued function given implicitly by
$$x = W(x)\,e^{W(x)}.$$
This function has various applications in combinatorics, theoretical physics, and theoretical computer science. As we will be dealing only with positive real numbers, by the Lambert function we will mean $y = W_0(x)$, the principal branch of the function, and denote it simply by $y = W(x)$. Directly from its definition we have [...]
For the purposes of this paper we will also need a variant of the Lambert function. By $w(x)$ we will denote a function given implicitly by
$$x = w(x)\,2^{w(x)}.$$
Now the relation of the Lambert function and the function $w(x)$ will be formally expressed. We have
$$w(x)\ln 2 \cdot e^{w(x)\ln 2} = x\ln 2. \tag{5}$$
Further, $W(x)e^{W(x)} = x$ and thus
$$W(x\ln 2)\,e^{W(x\ln 2)} = x\ln 2. \tag{6}$$
Comparing (5) with (6) one gets
$$w(x)\ln 2 = W(x\ln 2),$$
as $W(x)$ is an increasing function. Thus
$$w(x) = \frac{W(x\ln 2)}{\ln 2}.$$

III. VALUES OF THE FUNCTION M(n)
We start by reformulating the invariant $M(n)$ by means of an integer partition of $n$; i.e., by expressing $M(n)$ as a number theory problem.
Definition 6: An integer $k$-partition of $n$ is a way of writing $n$ as a sum of $k$ natural numbers $n_1, \dots, n_k$; i.e., $n = n_1 + \cdots + n_k$.
A partition $n = n_1 + \cdots + n_k$ will be called a K-S partition (Kraft-Szilard partition) if the numbers $n_i$ satisfy inequality (1). Let
$$M^*(n) = \max\{k;\ \text{there exists a K-S } k\text{-partition of } n\}.$$
Now, it will be demonstrated that for each $n \ge 2$, $M(n) = M^*(n)$. Let $M(n) = t$. Thus, there exists a binary word $x$, $|x| = n$, and a prefix code $P$ minimal with respect to $x$ such that $|P| = t$. By Theorem 4 each minimal dictionary in $V_x$ is a partition of $x$ into codewords. If those codewords are of lengths $n_1, \dots, n_t$, then by Theorem 5 the partition $n = n_1 + \cdots + n_t$ is a K-S $t$-partition. Therefore, for each $n \ge 2$,
$$M(n) \le M^*(n).$$
On the other hand, assume $M^*(n) = t$. Hence, there is a K-S $t$-partition of $n$, and by Theorem 5 there is a prefix code $P$ whose codewords are of lengths $n_1, \dots, n_t$. Let $y$ be a concatenation of the codewords in $P$. By Theorem 4, $P$ is minimal with respect to $y$, which in turn implies
$$M^*(n) \le M(n).$$
In what follows, we will work with the new equivalent definition of $M(n)$:
$$M(n) = \max\{k;\ \text{there exists a K-S } k\text{-partition of } n\}.$$
The notion of a canonical partition will turn out to be essential for finding the values of $M(n)$. A K-S $k$-partition $n = n_1 + \cdots + n_k$ will be called canonical if $|n_i - n_j| \le 1$ for all $1 \le i, j \le k$. Specifically, a canonical partition with parts in $\{t, t+1\}$ will be denoted as a $(t, t+1)$ partition. Now it will be shown that one can confine oneself to canonical partitions.
Theorem 7: If there exists a K-S k-partition of n, then there is a canonical K-S k-partition of n as well.
Proof: Let $s, r$ be two terms of a K-S $k$-partition of $n$ such that $s - r \ge 2$. We consider a new partition obtained from the original one by replacing $s, r$ by the terms $s - 1$ and $r + 1$. Clearly, this new partition is a $k$-partition as well, and by
$$2^{-(s-1)} + 2^{-(r+1)} \le 2^{-s} + 2^{-r}$$
it is a K-S partition as well. Repeating the above procedure results in a canonical K-S $k$-partition of $n$.
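Theorem 7 suggests a direct, if naive, way of evaluating the maximum $k$ for small $n$: for each $k$, the $k$-partition of $n$ whose parts differ by at most 1 is unique up to order, so it suffices to test that single partition against the Kraft-Szilard inequality. The following Python sketch does this; the names `balanced_partition`, `is_ks` and `m_by_search` are illustrative, and the linear scan over $k$ is chosen for clarity rather than efficiency.

```python
from fractions import Fraction


def balanced_partition(n, k):
    """The k-partition of n with parts differing by at most 1 (1 <= k <= n)."""
    q, rem = divmod(n, k)
    return [q] * (k - rem) + [q + 1] * rem


def is_ks(parts):
    """True if the parts satisfy sum(2**-p) <= 1 (exact arithmetic)."""
    return sum(Fraction(1, 2 ** p) for p in parts) <= 1


def m_by_search(n):
    """Largest k for which the balanced k-partition of n is a K-S partition."""
    return max(k for k in range(1, n + 1) if is_ks(balanced_partition(n, k)))


print(balanced_partition(23, 7))  # [3, 3, 3, 3, 3, 4, 4]
print(m_by_search(23))            # 7
```

For $n = 23$ the balanced 8-partition consists of one part 2 and seven parts 3, whose Kraft-Szilard sum is $9/8 > 1$, so the search stops at $k = 7$.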
The following obvious observation will turn out to be useful.
Claim 8: The function $M(n)$ is non-decreasing for all $n \ge 2$.
Indeed, if $n - 1 = n_1 + \cdots + n_k$ is a K-S $k$-partition of $n - 1$, then obviously $n = n_1 + \cdots + (n_k + 1)$ is a K-S $k$-partition of $n$ as well.
The two statements below will be essential for determining the values of M (n).
Theorem 9: Let there exist a canonical K-S $k$-partition $n =$ [...]
Proof: A canonical K-S $k$-partition of $n$ can be written in the form
$$n = a + \cdots + a + (a+1) + \cdots + (a+1) = \alpha a + \beta(a+1),$$
where $a$, $\alpha$, $\beta$, $\alpha + \beta = k$, are natural numbers.
(a) Assume by contradiction that $M(n) = r > k$. Then, by Theorem 7, there is a canonical K-S $r$-partition $P$ of $n = m_1 + \cdots + m_r$, say, [...] Similarly, for $\gamma \ge \alpha$, $P$ would not be a K-S partition. We are left with the case $\gamma < \alpha$, which obviously implies that $m_1 + \cdots + m_r > n$.
(b) With respect to Claim 8, the case $M(n-1) > M(n)$ does not have to be considered. Assume by contradiction that $M(n-1) = M(n)$. Then, by Theorem 7, there exists a canonical K-S $k$-partition of $n - 1$, say,
$$n - 1 = m_1 + \cdots + m_k = d + \cdots + d + (d+1) + \cdots + (d+1) = \gamma d + \delta(d+1),$$
where $\gamma + \delta = k$. By the same token as in the proof of (a), for $d > a$, it would be $m_1 + \cdots + m_k > n$; while $d < a$ would imply [...]
The next theorem constitutes a main result of this paper. We note that, for any $n \ge 2$, there are uniquely determined natural numbers $t, s, r$ such that $t2^t \le n < (t+1)2^{t+1}$, $0 \le s < 2^t$, $0 \le r < t+2$, with
$$n = t2^t + s(t+2) + r.$$
Theorem 10 (Values of Function $M(n)$): For $n \ge 2$, $n = t2^t + s(t+2) + r$ as above, $M(n) = 2^t + s$. In particular, [...]
Clearly, $t$ mentioned above is uniquely determined by $n$. For $n \ge 2$, the computational complexity of determining $M(n)$, i.e., of finding the value of $t$ for given $n$, as well as the values of $s$ and $r$, will be discussed in Remark 13.
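The formula of Theorem 10 can be sketched in a few lines of Python. The function names `decompose` and `M` are illustrative, and $t$ is located here by the simplest possible incremental search, which is an assumption of this sketch rather than a recommended method.

```python
def decompose(n):
    """Return (t, s, r) with n = t*2**t + s*(t+2) + r,
    t*2**t <= n < (t+1)*2**(t+1), 0 <= s < 2**t, 0 <= r < t+2 (n >= 2)."""
    if n < 2:
        raise ValueError("n must be at least 2")
    t = 1
    # (t + 1) << (t + 1) equals (t+1) * 2**(t+1)
    while (t + 1) << (t + 1) <= n:
        t += 1
    s, r = divmod(n - (t << t), t + 2)
    return t, s, r


def M(n):
    """M(n) = 2**t + s, following the statement of Theorem 10."""
    t, s, _ = decompose(n)
    return (1 << t) + s


print(decompose(23))                # (2, 3, 3)
print([M(n) for n in range(2, 9)])  # [2, 2, 2, 3, 3, 3, 4]
```

For instance, $23 = 2 \cdot 2^2 + 3 \cdot 4 + 3$, so $t = 2$, $s = 3$, $r = 3$ and $M(23) = 2^2 + 3 = 7$.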
Theorem 11: Let $n = t2^t + s(t+2) + r$, where $t, s, r$ are as above. Then
$$n = (2^t - s - r)\,t + (2s + r)(t+1) \quad \text{for } r \le 2^t - s,$$
$$n = (2^{t+1} - r)(t+1) + (r - 2^t + s)(t+2) \quad \text{for } r > 2^t - s,$$
is a $(t, t+1)$ and a $(t+1, t+2)$ canonical $M(n)$-partition of $n$, respectively, where $M(n) = 2^t + s$. In particular, of the $(t+2)2^t$ numbers in $[t2^t, (t+1)2^{t+1})$ only $\binom{t+1}{2}$ have a $(t+1, t+2)$ canonical $M(n)$-partition; the others have a $(t, t+1)$ one.
Proof: By a trivial calculation, the expression for $n$ is a $2^t + s = M(n)$-partition of $n$, and obviously it is canonical. Hence, we only need to show that it is a K-S partition as well. In the proof of Theorem 10 it is shown that, for $r = 0$, $n = (2^t - s - r)t + (2s + r)(t+1)$ is a K-S $2^t + s = M(n)$-partition of $n$ (in this case, in addition, [...]). In order to obtain a canonical $M(n)$-partition $T$ for $n = t2^t + s(t+2) + r$, $r > 0$, the $r$ smallest terms of the $(t, t+1)$ canonical $M(n')$-partition $T'$ of $n' = t2^t + s(t+2)$ are increased by 1. Since $T'$ is a K-S partition, the obtained partition $T$ is a K-S partition as well ($\frac{1}{2^{m+1}} < \frac{1}{2^m}$). The last part of the proof follows from the observation that the inequality $2^t - s < r \le t + 1$ has $1 + 2 + \cdots + t = \binom{t+1}{2}$ solutions for $s < 2^t$.
Remark 12: We note that although, for all $0 \le s < 2^t$, the partition of $n = t2^t + s(t+2)$ given above is a $(t, t+1)$ canonical partition, the constructed partition of $n = t2^t + s(t+2) + r$, $r > 0$, might be a $(t+1, t+2)$ canonical partition. E.g., if $n = t2^t + (2^t - 1)(t+2)$, the canonical partition of $n$ comprises one term equal to $t$ and the other terms equal to $t+1$. Thus, a canonical $M(n)$-partition of $t2^t + (2^t - 1)(t+2) + 1$ has all terms equal to $t+1$, and a canonical $M(n)$-partition for [...]
In the following note, it will only be shown that the parameters $t, s, r$ can be calculated efficiently, without trying to find an optimal algorithm.
Remark 13: We recall that, for any $n \ge 2$, there are uniquely determined natural numbers $t, s, r$ such that $t2^t \le n < (t+1)2^{t+1}$, $0 \le s < 2^t$, $0 \le r < t+2$, with $n = t2^t + s(t+2) + r$. Once $t$ is found, $s, r$ can be calculated by integer division in $O(q\log_2 q)$ time, where $q$ is the number of digits of $n$. The complexity of determining $t$ is the complexity of calculating the largest $t$ such that $t2^t \le n$. From $t2^t \le n$ one gets $t + \log_2 t \le \log_2 n$. Thus $t$ is in the interval $[1, \log_2 n]$. For the sake of simplicity, we set $\log_2 n \doteq m$. Clearly, the complexity of calculating the expression $t2^t$ grows with the value of $t$; we denote by $C$ the complexity of calculating $m2^m$. Therefore, the complexity of a crude method that calculates the expression $t2^t$ until it finds $t_0$ with $t_0 2^{t_0} > n$ is the number of iterations $\times\, C \le mC$.
To calculate $m2^m$, we first compute $2^m$. Using binary exponentiation, this value can be calculated with $O(\log_2 m) = O(\log_2\log_2 n)$ multiplications. Using the Harvey-van der Hoeven algorithm [7], which multiplies two numbers with $q$ digits in $O(q\log_2 q)$ time, we get the complexity
$$O(\log_2\log_2 n \times \log_2 n \times \log_2\log_2 n) = O(\log_2 n \times (\log_2\log_2 n)^2)$$
of calculating $2^m$. Further, to determine $C$, the complexity of calculating $m2^m$, we multiply two numbers of at most $\log_2 n$ digits, i.e., the complexity of that multiplication is $O(\log_2 n \times \log_2\log_2 n)$. Altogether, the complexity $C$ of calculating $m2^m$ is therefore
$$C = O(\log_2 n \times (\log_2\log_2 n)^2),$$
and the total complexity of determining the largest $t$ such that $t2^t \le n$ is at most
$$mC = O((\log_2 n)^2 \times (\log_2\log_2 n)^2).$$
This shows that $t$ can be calculated efficiently even by a crude method. Using the bisection method would give the same complexity $C$ for each iteration, but the maximum number of iterations would be $O(\log_2\log_2 n)$ instead of $O(\log_2 n)$. It is likely that an even better computational complexity of determining $t$ would be obtained by applying Newton's method.
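The bisection variant mentioned in this remark can be sketched as follows in Python, working entirely over the integers. The name `largest_t` is illustrative; the upper end of the search interval uses the bit length of $n$, which exceeds $\log_2 n$ and is therefore a safe bound on $t$.

```python
def largest_t(n):
    """Largest integer t >= 1 with t * 2**t <= n, by bisection (n >= 2)."""
    if n < 2:
        raise ValueError("n must be at least 2")
    lo, hi = 1, n.bit_length()  # t <= log2(n) < n.bit_length()
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if mid << mid <= n:     # mid * 2**mid <= n
            lo = mid            # mid is feasible, search above it
        else:
            hi = mid - 1        # mid is too large
    return lo


print(largest_t(23), largest_t(24))  # 2 3
```

Because $t2^t$ is increasing in $t$, the feasibility test is monotone and the loop needs a number of iterations logarithmic in the bit length of $n$.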
We note that $t$ can also be determined by means of the Lambert function $y = W(x)$ given implicitly by $x = ye^y$, $x, y \in \mathbb{R}$, $x, y \ge 1$. In practice, the values of $W(x)$ can be calculated by using mathematical software, but the complexity of this computation depends on the corresponding implementation. In addition, the calculation has to be done over $\mathbb{R}$, and not over $\mathbb{N}$. Commonly, the numerical evaluation of $W(x)$ is done either by Newton's method or by Halley's method [3].
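To illustrate the real-valued route, the following Python sketch applies Newton's method to the equation $y + \log_2 y = \log_2 x$, which is the logarithmic form of $y\,2^y = x$. It assumes that the variant $w$ of the Lambert function is defined by $w(x)\,2^{w(x)} = x$, consistent with $\varepsilon(n) = 2^{w(n)}$ and $x = y\log_2 y$. The names `w` and `t_from_w`, the tolerance and the iteration cap are choices of this sketch; since floating-point evaluation can be off by one near $n = t2^t$, the estimate is corrected by exact integer comparisons.

```python
import math


def w(x, tol=1e-12, max_iter=60):
    """Solve y * 2**y = x for real y by Newton's method (x >= 2 assumed)."""
    target = math.log2(x)
    y = max(target, 1.0)  # starting guess at or above the root
    for _ in range(max_iter):
        f = y + math.log2(y) - target
        df = 1.0 + 1.0 / (y * math.log(2))
        step = f / df
        y -= step
        if abs(step) <= tol * max(1.0, y):
            break
    return y


def t_from_w(n):
    """Largest t with t * 2**t <= n: float estimate, then integer correction."""
    t = max(1, int(w(n)))
    while (t + 1) << (t + 1) <= n:
        t += 1
    while t > 1 and (t << t) > n:
        t -= 1
    return t


print(round(w(24), 9))  # 3.0
print(t_from_w(23))     # 2
```

The integer correction keeps the result exact even for very large $n$, where the floating-point value of `w(n)` alone would not be reliable.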

IV. TIGHT UPPER BOUNDS ON M(n)
The formula for $M(n)$ given by Theorem 10 depends on the parameter $t$ that has to be determined for the given $n$. Therefore, two tight unified (direct) bounds on $M(n)$ for all $n \in \mathbb{N}$ will be provided below. In order to simplify the formulation of the results and proofs in this section, a function $\varepsilon(x)$, given implicitly by $x = y\log_2 y$, will be used. Clearly, $\varepsilon(n) = 2^{w(n)}$.
Proof: Differentiating $\varepsilon(x)$ implicitly, that is, differentiating $x = y\log_2 y$, one gets
$$y' = \frac{\ln 2}{\ln y + 1} > 0 \quad \text{for } y \ge 1,$$
$$y'' = -\frac{(y')^2}{y(\ln y + 1)} < 0 \quad \text{for } y \ge 1.$$
By Remark 15, the upper bound $M(n) \le \varepsilon(n)$ is tight. On the other hand, the difference between $\varepsilon(n)$ and $M(n)$ can be arbitrarily large.
Proof: For $t \in \mathbb{N}$, we set $n_t \doteq t2^t + 2^{t-1}(t+2) = 2^t\left(\tfrac{3}{2}t + 1\right)$. By Theorem 10, $M(n_t) = 2^t + 2^{t-1} = \tfrac{3}{2}2^t$. The function $\varepsilon(n)$ satisfies $n = \varepsilon(n)\log_2\varepsilon(n)$. Hence, to prove that, for $t$ big enough, [...]
[...] length for $a_i$ is $n_i$, then the average length of a codeword is $L = \sum_{i=1}^{k} n_i\,p(a_i)$. One of the fundamental problems in source coding is to construct a binary code with minimum $L$. Such a code is obtained by the Huffman algorithm and is called the Huffman code ([1], pp. 17-21, [9]). A code with minimum $L$ is not unique in general.
An efficient way of constructing a Huffman code in the case of a uniform probability distribution of symbols in the source alphabet was likely already known to Huffman himself. Such an efficient construction is indicated in [4] and explicitly described in the accompanying solutions manual [5]. The purpose of this section is to exhibit the relation between the function $M(n)$ and the Huffman code in this special case, and to briefly describe how the properties of $M(n)$ lead to another efficient construction algorithm.
We recall that the value of $M(n)$ represents the largest number $k$ for which there exists a K-S $k$-partition of $n$; i.e., by Theorem 5, there exists a binary prefix code with $k$ codewords of total length $n$. By the same token, for $M(n) < s$, there is no binary prefix code with $s$ codewords of total length $n$.
Therefore, the minimum total length of codewords is attained for the smallest natural number $\nu$ with $M(\nu) = k$. We recall that $M(n) = k$ for all $n$ from an interval $t2^t + s(t+2) \le n < t2^t + (s+1)(t+2)$, where $t, s$ are suitable numbers. Hence, the smallest number $\nu$ such that $M(\nu) = k$ is of the form
$$\nu = t2^t + s(t+2),$$
where $t$ and $s$ satisfy $k = 2^t + s$, $0 \le s < 2^t$.
In what follows, technical details are left to the reader. Based on basic properties of the function $M$, the value of $t$ is given by $t = \lfloor\log_2 k\rfloor$.
By Theorem 11, the unique canonical K-S $k$-partition of $\nu$ consists of $2^{t+1} - k$ parts equal to $t$ and $2(k - 2^t)$ parts equal to $t + 1$. To construct $k$ codewords of the required lengths it suffices to take first all binary words of length $t$ and then replace $k - 2^t$ of them by two binary words of length $t + 1$ each, obtained by appending a 0 and a 1 as the last digit. We stress that all the calculation needed for the above construction of the code has boiled down to calculating the value of $t$ from $t = \lfloor\log_2 k\rfloor$.
As a prefix code with $k$ codewords of total length $l < \nu$ does not exist, the constructed code has to be a Huffman code.
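The construction can be sketched in Python as follows. The name `uniform_huffman` is illustrative, and the sketch makes one explicit choice: the extra bit is appended at the end of each split word, which keeps the resulting set prefix-free. It also chooses to split the first $k - 2^t$ words in lexicographic order; any other selection would serve equally well.

```python
def uniform_huffman(k):
    """Codewords of a minimum-total-length binary prefix code
    for k >= 2 equiprobable symbols."""
    if k < 2:
        raise ValueError("k must be at least 2")
    t = k.bit_length() - 1   # t = floor(log2 k)
    s = k - (1 << t)         # number of length-t words to split
    code = []
    for i in range(1 << t):
        word = format(i, f"0{t}b")  # i as a t-bit binary string
        if i < s:
            code += [word + "0", word + "1"]  # two words of length t+1
        else:
            code.append(word)                 # kept at length t
    return code


print(uniform_huffman(5))  # ['000', '001', '01', '10', '11']
```

For $k = 5$ one gets $t = 2$, $s = 1$, and the total length of the five codewords is $12 = t2^t + s(t+2)$.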