Decoding of NB-LDPC codes over Subfields

Non-binary low-density parity-check (NB-LDPC) codes offer promising performance advantages but suffer from high decoding complexity. To tackle this challenge, in this paper, we consider NB-LDPC codes over finite fields as codes over \textit{subfields} as a means of reducing decoding complexity. In particular, our approach is based on a novel method of expanding a non-binary Tanner graph over a finite field into a graph over a subfield. This approach offers several decoding strategies for a single NB-LDPC code, with varying levels of performance-complexity trade-offs. Simulation results demonstrate that in the majority of cases, the performance loss is minimal compared with the complexity gains.


I. INTRODUCTION
Low-density parity-check (LDPC) codes, first introduced by Gallager in 1962 [1], have become the error-correcting codes of choice for many practical applications, such as Ethernet, Wi-Fi, and digital television, due to their capacity-approaching performance and low-complexity decoding algorithms [2]. Davey and MacKay introduced the non-binary (NB) counterparts of these codes in 1998 [3], and it was soon realized that NB-LDPC codes outperform binary LDPC codes of comparable length, especially for short-to-moderate code lengths. However, these performance gains are yet to be realized in practice due to the high complexity of decoding algorithms.
The best-performing decoding algorithm for NB-LDPC codes is the Q-ary sum-product algorithm (QSPA), a generalization of the sum-product algorithm used with binary LDPC codes [3]. The complexity of QSPA is of the order O(q^2), where q is the cardinality of the algebraic structure over which the code is defined. This complexity is too high for most practical applications, and in addition, QSPA requires substantial hardware resources, particularly since the messages used in decoding are vectors of length q. The fast Fourier transform based implementation of QSPA (FFT-QSPA) reduces decoding complexity to O(q log q), but still requires similar levels of hardware resources [4]. Log-domain implementations of QSPA (LLR-QSPA) have also been considered in the literature [5].
In [5], the authors introduced a simplified version of LLR-QSPA, referred to as 'max-log-SPA', by extending the simplification used in min-sum decoding to NB-LDPC codes and QSPA. This approach was further developed in [6] with the introduction of the 'Extended Min-Sum' (EMS) algorithm. Instead of considering the complete length-q vectors at check node operations, EMS uses only the n_m most significant values of each vector, resulting in a complexity order of O(n_m q). A different approach to simplifying the operations of QSPA was proposed in [7], where the 'Min-max' algorithm was introduced; it has the same complexity order as QSPA, but requires only additions and comparisons. Efficient hardware implementations have been proposed for both the EMS and Min-max algorithms in [8], [9].
Expanding the parity-check matrix (PCM) of an NB-LDPC code into a binary one allows devising low-complexity bit-level decoding strategies for the non-binary code. Such an expansion, called the 'extended binary representation', was proposed in [10], along with a decoding algorithm for NB-LDPC codes over the binary erasure channel. This strategy was adapted to general channels in [11]. In [12], the authors used the binary image of the non-binary PCM to decode NB-LDPC codes.
While it is possible to construct NB-LDPC codes over many algebraic structures, they are often defined over finite fields, particularly those of characteristic 2 [3], i.e., F_{2^r}. A finite field F_{p^r} contains a unique subfield F_{p^m} for every m | r [13]. In this paper, we consider expanding the PCM of an NB-LDPC code over F_{p^r} to a matrix over any such subfield F_{p^m}. For codes over F_{2^r}, this includes the expansion into F_2, a binary expansion. We then propose a general decoding algorithm, usable with any of the many possible expansions. Since the operations of the decoder are now over a smaller field, significant gains in complexity are achievable, and simulation results demonstrate that the performance loss in comparison to QSPA is minimal. It also becomes possible to decode the same code over several different fields, each offering a different performance-complexity trade-off.
The remainder of the paper is organized as follows. Section II introduces the mathematical concepts used for the expansion, while Section III provides the expansion along with examples. Section IV presents the decoding strategy, and Section V includes simulation results. Section VI analyzes the complexity and resource requirements of the new decoding scheme, and Section VII concludes the paper.

II. α-CONNECTED SUBGROUPS
Consider a finite field of characteristic p, F_{p^r}, and one of its additive subgroups, G. We denote a primitive element of F_{p^r} by α. It is easy to verify that multiplying all the elements of G by some α^i ∈ F_{p^r} yields another additive subgroup. Then we have the following definition.

Definition 1. Additive subgroups G_1 and G_2 of F_{p^r}, where one can be obtained from the other by multiplying by some power of α, are called α-connected subgroups.
If subgroups G_1 and G_2 are α-connected, and so are G_2 and G_3, then clearly G_1 and G_3 are also α-connected. This yields the following definition.

Definition 2. A set S of additive subgroups of F_{p^r}, any two of which are α-connected, is called an α-connected set.

An α-connected set S can be generated from any G_i ∈ S, simply by multiplying with increasing powers of α. Each generated subgroup is added to the set until, for some power i*, α^{i*}·G_i results in G_i itself. This i* is the cardinality of the set S, which we denote by |S|. Lemma 1 considers the minimum possible cardinality of an α-connected set.
Lemma 1. Consider F_{p^r} and let m | r. Then the smallest possible α-connected set of additive subgroups of order p^{r−m} has a cardinality of (p^r − 1)/(p^m − 1).

Proof. Let G be an additive subgroup of F_{p^r} of order p^{r−m}. The minimum possible cardinality of an α-connected set is the minimum possible value i satisfying α^i G = G. Let i_m denote that minimum, and let S_{α^{i_m}} be the set of elements in F_{p^r} generated by α^{i_m}. Note that i_m is the minimum non-zero power of α in S_{α^{i_m}}, for which the following relation holds:

|S_{α^{i_m}}| = (p^r − 1)/i_m.   (1)

As G is an additive subgroup, it must contain the additive identity 0, and 0·S_{α^{i_m}} = {0}. Note that for g_1, g_2 ∈ G that are both ≠ 0, the sets g_1·S_{α^{i_m}} and g_2·S_{α^{i_m}} are of the same size, and they must either be the same set or disjoint. Then, as the order of G is p^{r−m}, disregarding 0, the following must hold for some integer n:

n·|S_{α^{i_m}}| = p^{r−m} − 1.   (2)

From (1) and (2), we see that |S_{α^{i_m}}| is a factor of both (p^r − 1) and (p^{r−m} − 1). Since (1) shows that i_m and |S_{α^{i_m}}| are inversely proportional, for i_m to be minimal, |S_{α^{i_m}}| must be the greatest common divisor of (p^r − 1) and (p^{r−m} − 1). We note that gcd(p^a − 1, p^b − 1) = p^{gcd(a,b)} − 1, and since m | r implies gcd(r, r − m) = m, we conclude that gcd(p^r − 1, p^{r−m} − 1) = (p^m − 1). Using (1), i_m = (p^r − 1)/(p^m − 1).

Even though Lemma 1 shows that the smallest α-connected set must be of cardinality (p^r − 1)/(p^m − 1), it does not reveal how to construct such a set. It should also be noted that the additive property of G is not necessary for the proof; only the existence of the additive identity is used.
Lemma 2 outlines a method to construct an α-connected set of subgroups of order p^{r−m}.

Lemma 2. Let G be a subgroup of order p^m of H′ = {F_{p^r}, +}, and let ψ : H′ → G be some surjective homomorphism. The kernels of the set of homomorphisms ψ_i(h′) = ψ(α^{−i} h′), for i ∈ {0, 1, ..., p^r − 2}, form an α-connected set of additive subgroups of order p^{r−m}.
The cardinality of an α-connected set generated as in Lemma 2 depends on the homomorphism ψ. Therefore, to construct the smallest α-connected set, one must find a suitable homomorphism. The homomorphism we use is based on the representation of F_{p^r} as an extension of the subfield F_{p^m}. The following lemma establishes the structure of F_{p^m} in F_{p^r}.

Lemma 3. Let S_β be the set of elements in F_{p^r} generated by β = α^{(p^r−1)/(p^m−1)}. Then S_β ∪ {0}, where 0 is the additive identity of F_{p^r}, is the subfield F_{p^m}.

Proof. Since m | r, F_{p^r} contains the subfield F_{p^m}. Let K′ = {F_{p^r}, ×} and K = {F_{p^m}, ×}. K is a subgroup of order (p^m − 1) of K′. Note that both K and K′ are cyclic. From the properties of subgroups of cyclic groups [13], there is only one subgroup of any given order in K′. The set of elements generated by β, S_β, is such a subgroup, of order (p^m − 1), and thus K = S_β. This allows the conclusion F_{p^m} = S_β ∪ {0}.
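Lemma 3 can be checked numerically for a small case. The sketch below assumes a specific construction of F_16 (primitive polynomial x^4 + x + 1, elements encoded as 4-bit integers, α = x encoded as 2) and verifies that the elements generated by β = α^5, together with 0, are closed under both field operations, i.e., form the subfield F_4.

```python
# Minimal numerical check of Lemma 3 for F_16 over F_4.
# Assumption: F_16 is built with the primitive polynomial x^4 + x + 1,
# elements are encoded as 4-bit integers, and alpha = x (integer 2).
MOD = 0b10011  # x^4 + x + 1

def mul(a, b):
    """Carry-less multiplication of two F_16 elements, reduced mod x^4 + x + 1."""
    res = 0
    while b:
        if b & 1:
            res ^= a
        b >>= 1
        a <<= 1
        if a & 0b10000:
            a ^= MOD
    return res

def power(a, e):
    res = 1
    for _ in range(e):
        res = mul(res, a)
    return res

ALPHA = 2
beta = power(ALPHA, (2**4 - 1) // (2**2 - 1))     # beta = alpha^5
S_beta = {power(beta, i) for i in range(2**2 - 1)}  # multiplicative group of F_4
subfield = S_beta | {0}

# S_beta ∪ {0} is closed under addition (XOR) and multiplication: it is F_4.
assert all((a ^ b) in subfield for a in subfield for b in subfield)
assert all(mul(a, b) in subfield for a in subfield for b in subfield)
assert len(subfield) == 2**2
```

Under this encoding the subfield comes out as the integer set {0, 1, 6, 7}; any other choice of primitive polynomial would give a different encoding of the same structure.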
We are interested in the polynomial representation of F_{p^r} as an extension of F_{p^m}. In such a representation, an element α^i ∈ F_{p^r} is represented by a polynomial E_{α^i}(x) over F_{p^m} of degree at most (r/m − 1). Elements belonging to F_{p^m} (i.e., the β^i) are represented by polynomials of degree 0. The primitive polynomial of the representation, Π(x), is of degree r/m. Also note that since Π(x) is irreducible, it must have a non-zero constant term. Based on this representation, we define a homomorphism ψ* between the additive groups of F_{p^r} and F_{p^m} as follows.

Definition 3. For γ ∈ F_{p^r} with polynomial representation E_γ(x) over F_{p^m}, ψ*(γ) = E_γ(0), i.e., the constant term of E_γ(x).
Lemma 4. Using the homomorphism ψ* in the method proposed in Lemma 2 generates an α-connected set of minimum cardinality.

Proof.
Let g_j ∈ ker(ψ*_0) and α^{i_m} g_j = γ_j (j = 1, ..., p^{r−m}). Let the polynomial representations (in the extension representation) of α^{i_m}, g_j, and γ_j be E_{α^{i_m}}(x), E_{g_j}(x), and E_{γ_j}(x), respectively. These are related as follows, where K_j(x) is some polynomial over F_{p^m}:

E_{α^{i_m}}(x) E_{g_j}(x) = K_j(x) Π(x) + E_{γ_j}(x).

Since g_j ∈ ker(ψ*_0), the constant term of E_{g_j}(x) is zero, which makes the constant term of E_{α^{i_m}}(x)E_{g_j}(x) zero as well. Note that for γ_j ∈ ker(ψ*_0), the constant term of E_{γ_j}(x) must be zero. As observed earlier, Π(x) has a non-zero constant term, and therefore, for γ_j ∈ ker(ψ*_0), K_j(x) must be a polynomial with a zero constant term. For α^{i_m} ker(ψ*_0) = ker(ψ*_0), this must hold for all j = 1, ..., p^{r−m}.

The polynomial representations of the elements in ker(ψ*_0) contain at least one polynomial of each possible degree, from 0 to r/m − 1. Then, if deg(E_{α^{i_m}}(x)) > 0, for at least one value of j, E_{α^{i_m}}(x)E_{g_j}(x) is of degree r/m. Since Π(x) is also of degree r/m, this requires K_j(x) to be a non-zero constant for that particular value of j, resulting in α^{i_m} ker(ψ*_0) ≠ ker(ψ*_0). Therefore, for γ_j ∈ ker(ψ*_0) to hold for all j = 1, ..., p^{r−m}, we need deg(E_{α^{i_m}}(x)) = 0. In that case, K_j(x) = 0 for all j. This requires α^{i_m} ∈ F_{p^m}, and since we require the minimum, i_m = (p^r − 1)/(p^m − 1). Thus, using the homomorphism ψ* as in Lemma 2, it is possible to construct an α-connected set of additive subgroups of order p^{r−m} that has the minimum cardinality (p^r − 1)/(p^m − 1), as proved in Lemma 1.
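The construction of Lemmas 1-4 can be exercised numerically for F_16 over F_4. The sketch below assumes the primitive polynomial x^4 + x + 1 (α encoded as 2) and the basis {1, α} over F_4, so that ker(ψ*_0) is the set of elements whose constant term over F_4 is zero; multiplying it by increasing powers of α yields exactly (2^4 − 1)/(2^2 − 1) = 5 distinct additive subgroups.

```python
# Sketch: build the smallest alpha-connected set for F_16 over F_4.
# Assumptions: primitive polynomial x^4 + x + 1, alpha = x (integer 2),
# and the basis {1, alpha} for F_16 over F_4.
MOD = 0b10011

def mul(a, b):
    """Carry-less multiplication in F_16, reduced mod x^4 + x + 1."""
    res = 0
    while b:
        if b & 1:
            res ^= a
        b >>= 1
        a <<= 1
        if a & 0b10000:
            a ^= MOD
    return res

def power(a, e):
    res = 1
    for _ in range(e):
        res = mul(res, a)
    return res

ALPHA = 2
F4 = {0, 1, power(ALPHA, 5), power(ALPHA, 10)}  # subfield F_4 (Lemma 3)

# ker(psi*_0): elements a + b*alpha with constant term a = 0,
# i.e. the F_4-multiples of alpha.
kernel = frozenset(mul(b, ALPHA) for b in F4)

# Multiply the kernel by increasing powers of alpha; collect distinct subgroups.
theta = {frozenset(mul(power(ALPHA, i), g) for g in kernel)
         for i in range(2**4 - 1)}

assert len(theta) == (2**4 - 1) // (2**2 - 1)  # = 5, the minimum of Lemma 1
# Every member is an additive subgroup of order 4 (closed under XOR).
assert all((a ^ b) in G for G in theta for a in G for b in G)
```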

III. GRAPH EXPANSION
In this section, we present how a graph over F_{p^r} can be expanded into a larger one over F_{p^m}, where m | r, using the smallest set of α-connected subgroups of order p^{r−m} in F_{p^r}, constructed as detailed in the previous section. We denote this special α-connected set by Θ_{p^{r−m}} from here onwards. The basic mathematical concepts used in the expansion are briefly overviewed in subsection A, while the expansion is presented in subsection B, along with an example.

A. Preliminaries
Consider some surjective homomorphism ψ : H′ → H, where H′ = {F_{p^r}, +} and H = {F_{p^m}, +}. As remarked earlier, ker(ψ) is a subgroup of H′ of order p^{r−m}. Since the homomorphism is surjective, according to the first isomorphism theorem [13], the quotient group Q_ψ = H′/ker(ψ) is isomorphic to H. Q_ψ contains the p^m cosets of ker(ψ), including the trivial coset (ker(ψ) itself). In the isomorphism between Q_ψ and H, this trivial coset maps to the identity element of H (the additive identity of F_{p^m}), and the other cosets map to the remaining elements of H. Let

Q_ψ = {C^0_ψ, C^1_ψ, ..., C^{p^m−1}_ψ},

where each C^j_ψ represents some coset of ker(ψ), with C^0_ψ representing the trivial coset. Cosets contain elements of F_{p^r}, and using the multiplicative properties of the field, we define a 'multiplication' operation on Q_ψ as follows.

Definition 4. The operation βQ_ψ, for some β ∈ F_{p^r}, is defined as

βQ_ψ = {βC^0_ψ, βC^1_ψ, ..., βC^{p^m−1}_ψ}, where βC^j_ψ = {βc : c ∈ C^j_ψ}.

Given two α-connected subgroups of H′, the respective quotient groups are related in a similar way, as shown in the following lemma.
Lemma 5. Let ker(ψ_1) and ker(ψ_2) be α-connected subgroups of H′ with ker(ψ_1) = α^k ker(ψ_2). Then α^k Q_{ψ_2} is a permutation of Q_{ψ_1} in which the trivial coset keeps its position.

Proof. Here the trivial cosets C^0_{ψ_j} are the subgroups themselves, and all the cosets can be represented using the respective subgroup and some coset leader term, i.e., C^j_ψ = c_j + ker(ψ), with c_0 = 0. Using the multiplication operation on Q_{ψ_2} yields α^k Q_{ψ_2} = {α^k C^0_{ψ_2}, ..., α^k C^{p^m−1}_{ψ_2}}. As cosets of any subgroup are mutually exclusive, and due to the multiplicative properties of the field, the set α^k Q_{ψ_2} consists of α^k ker(ψ_2) = ker(ψ_1) and its (p^m − 1) proper cosets, albeit with possibly changed coset leader terms. Thus, Q_{ψ_1} and α^k Q_{ψ_2} are the same sets. Comparing the original representation of Q_{ψ_1} with α^k Q_{ψ_2}, it is apparent that the position of the trivial coset is not changed, but there is no such guarantee for the other cosets. Thus, when the elements of the quotient groups are considered in some specific order, α^k Q_{ψ_2} is a permutation of Q_{ψ_1}.

The homomorphisms ψ*_i we use in constructing the smallest α-connected set, Θ_{p^{r−m}}, are all surjective. Θ_{p^{r−m}} consists of the kernels of these homomorphisms, and it is possible to construct a set of quotient groups with those kernels. Let that set be Θ^Q_{p^{r−m}}.

These quotient groups provide some insight on how to decode a code over F_{p^r} over one of its subfields F_{p^m}. Instead of the traditionally used symbol probabilities, we consider the probabilities of a variable node belonging to each coset of each quotient group in Θ^Q_{p^{r−m}}. Then, for each variable node, (p^r − 1)/(p^m − 1) probability vectors of length p^m are required, which we refer to as 'coset probability vectors' (CPVs). The complexity bottleneck in decoding NB-LDPC codes is the check node operations [6]-[7], and the advantages of our approach become apparent when the impact on that step is assessed. Check node operations in decoding NB-LDPC codes consist of two major sub-steps: permutation and convolution of probability vectors [4]. In the permutation sub-step, the simpler of the two, the symbol probability vectors received by the check node are permuted, where the permutations are defined by the respective edge weights. Since Θ^Q_{p^{r−m}} is constructed using the smallest α-connected set Θ_{p^{r−m}}, it is clear from Lemma 5 that CPVs will also have to be permuted similarly. Thus, the complexity of the permutation step is not significantly affected by the proposed approach.
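The quotient-group structure behind CPVs can be made concrete with a small example. Under the same assumed representation of F_16 used earlier (primitive polynomial x^4 + x + 1, α encoded as 2), the kernel of ψ*_0 comes out as the integer set {0, 2, 12, 14}; its additive cosets partition the field into p^m = 4 classes, so the associated CPV has one entry per coset.

```python
# Sketch: cosets of ker(psi*_0) partition F_16 into p^m = 4 classes.
# The kernel below is the one obtained under the assumed primitive
# polynomial x^4 + x + 1 with alpha = 2; a CPV holds one probability
# per coset, hence has length p^m = 4.
kernel = frozenset({0, 2, 12, 14})

# Additive cosets c + kernel (addition in F_16 is XOR on this encoding).
cosets = {frozenset(g ^ c for g in kernel) for c in range(16)}

assert len(cosets) == 4                        # p^m cosets -> CPV length 4
assert set().union(*cosets) == set(range(16))  # the cosets partition the field
assert kernel in cosets                        # the trivial coset
```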
In order to understand how our approach changes the convolution sub-step, consider the simple case of a degree-3 check node in a code over F_{p^r}, where the parity-check equation is v_1 + v_2 + v_3 = 0. A convolution has to be carried out using the incoming symbol probability vectors of v_1 and v_2 for computing the outgoing symbol probability vector of v_3, p^s_{v_3}. Since these vectors are of length p^r, the convolution is of complexity order O(p^{2r}). Now assume we have to compute some i'th CPV of v_3, p^c_{v_3,i}. This computation only requires the incoming i'th CPVs of the remaining two variable nodes. Note that these are of length p^m, where m | r. As all quotient groups in Θ^Q_{p^{r−m}} are isomorphic to the additive group of F_{p^m}, the computation of p^c_{v_3,i} is the same as the convolution sub-step at a check node of a code over F_{p^m}. Thus, the complexity is now only of order O(p^{2m}). However, with (p^r − 1)/(p^m − 1) quotient groups, that many CPVs have to be computed, resulting in an overall complexity of p^{2m} × (p^r − 1)/(p^m − 1) ≈ O(p^{m+r}). Nevertheless, particularly in the cases where m ≪ r, this is a significant reduction of complexity.
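The comparison above can be tabulated for a small case. The sketch below counts, for p = 2 and r = 6, the leading-order operation counts of a direct convolution over F_{p^r} against the per-coset convolutions of each possible subfield expansion; these are order-of-magnitude counts only, not a cycle-accurate hardware model.

```python
# Back-of-the-envelope convolution counts for a degree-3 check node, p = 2, r = 6.
# Direct: O(p^(2r)); expanded: (p^r - 1)/(p^m - 1) convolutions of cost p^(2m).
p, r = 2, 6
direct = p ** (2 * r)

ratios = {}
for m in (1, 2, 3):
    assert r % m == 0  # only subfields F_{p^m} with m | r exist
    n_cpv = (p**r - 1) // (p**m - 1)   # number of CPVs per variable node
    expanded = n_cpv * p ** (2 * m)    # total per-coset convolution cost
    ratios[m] = direct / expanded
    print(f"m={m}: {expanded:5d} ops vs {direct} direct ({ratios[m]:.1f}x fewer)")
```

For m = 3 this gives 9 × 64 = 576 operations against 4096, roughly a 7x reduction, with larger gains for smaller m.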
Motivated by the observation that using coset probability vectors instead of symbol probability vectors can allow faster decoding of NB-LDPC codes, we will provide a more detailed analysis of these complexity advantages in Section VI.In the following subsection, we explain how to expand a graph over F p r into one over F p m so that CPVs can be used in decoding.

B. Graph Expansion
We assume that a Tanner graph of a code over F_{p^r} is to be expanded into a graph over F_{p^m}, where m | r. The set of quotient groups, Θ^Q_{p^{r−m}}, is of cardinality (p^r − 1)/(p^m − 1). Each quotient group is isomorphic to H = {F_{p^m}, +}, and in decoding, an associated CPV has to be used. Observations on how CPVs impact decoding suggest that it is possible to simply replace each node in the original graph, i.e., the so-called F_{p^r} nodes, with (p^r − 1)/(p^m − 1) F_{p^m} nodes. Each variable node over F_{p^m} would represent some CPV, and check nodes would calculate their estimates. How the set of F_{p^m} variable and check nodes of a single neighboring variable-check node pair of the original graph are connected depends on the original edge weight, as evident from Lemma 5.
Consider a check node and a variable node in the original graph, connected with an edge of weight α^k ∈ F_{p^r}. According to Lemma 5, Q_i ∈ Θ^Q_{p^{r−m}} becomes a permutation of some Q_j ∈ Θ^Q_{p^{r−m}} when multiplied by α^k. Then, in the expansion, the F_{p^m} variable node representing the i'th CPV should be connected to the F_{p^m} check node calculating estimates of the j'th CPV. As Q_i turns into a permutation of Q_j, CPVs transmitted along this edge are permuted as well. Thus, this is a two-step process, where first the set of CPVs is permuted, and then each CPV is permuted within itself. From the point of view of the expansion, this is equivalent to connecting the set of F_{p^m} variable nodes with the set of check nodes using edges labeled with elements of F_{p^m}.
As an example, consider the parity-check equation ρ, given in (3), from a code over F_{2^4}, where α denotes a primitive element of the field.
Fig. 1 presents the initial expansion for ρ. The shaded graph is the original Tanner graph over F_{2^4}, and the graph beneath is the expansion over F_{2^2}. In both graphs, circles denote variable nodes and squares denote check nodes. Note that ω is a primitive element of F_{2^2} and that the edges in the expanded graph are labeled with F_{2^2} elements.
As each quotient group in Θ^Q_{p^{r−m}} contains different groupings of the same set of symbols, each CPV contains some information about all other CPVs. Unfortunately, the initial graph expansion is unable to capture these dependencies. In order to clearly visualize the relationships between CPVs, we propose an alternate representation of F_{p^r} symbols below.
As each quotient group in Θ^Q_{p^{r−m}} is isomorphic to H, any symbol γ ∈ F_{p^r} can be represented by a vector of length (p^r − 1)/(p^m − 1) over F_{p^m}, with the i'th element of the vector defined as the element of H that maps to the coset (of the i'th quotient group) containing γ.
In the general case of F_{p^r} and F_{p^m}, the alternate representation vectors form an r/m-dimensional vector space, or in other words a ((p^r−1)/(p^m−1), r/m) code, over F_{p^m}. The (p^r−1)/(p^m−1) F_{p^m} nodes of every F_{p^r} variable node form this code, and since each such instance only involves the set of F_{p^m} nodes of a single F_{p^r} variable node, we refer to it as the 'local code'. We propose using the parity-check matrix (PCM) of the local code, pH_L^{r,m}, to succinctly represent the dependencies between CPVs. Note that codes with parameters of the form ((p^r−1)/(p^m−1), r/m) are from the family of non-binary simplex codes. Since the dual of such a code is the Hamming code over F_{p^m} [14], the parity-check equations are Hamming codewords, and the PCM contains (p^r−1)/(p^m−1) − r/m of them. As an example, the 'local' PCM for the case of F_{2^4} and F_{2^2}, 2H_L^{4,2}, which consists of 3 codewords of the (5, 3) Hamming code over F_{2^2}, is given in (4).
From the perspective of the expansion, the parity-check equations of the local code are a set of additional check nodes, which have to be added to the expanded graph. Since the set of F_{p^m} nodes of every F_{p^r} variable node forms one instance of the local code, (p^r−1)/(p^m−1) − r/m additional check nodes representing the local PCM have to be added per variable node of the original graph. Adding such a large number of check nodes might seem to increase complexity, but it should be noted that these new nodes are of low degree. Since the dual is a Hamming code, it is always possible to find a set of degree-3 parity-check equations for the local PCM. As each new check node is only connected with a subset of the (p^r−1)/(p^m−1) nodes of one F_{p^r} variable node, they will be referred to as 'local check nodes' from here onward. Check nodes resulting from expanding F_{p^r} nodes will be called 'regular check nodes'.
Performance with any form of iterative message passing decoding depends on features of the graph used, and it is well known that short cycles in the graph negatively impact decoding. While with binary codes, where the edge labels are all 1, the effect of a cycle depends only on its length, the edge labels themselves have an impact in the non-binary case [15]. Particularly troublesome are the cycles created by sub-matrices of the PCM that are not of full rank [15]-[16]. Decoding performance of the expanded graph may be improved if the subgraph induced by the local PCM is kept as free of these undesirable structures as possible. Although one could use a canonical generator matrix of a Hamming code as the local PCM, the induced graph may not entirely suit iterative decoding. In such a scenario, row operations can be carried out on the matrix until a 'better' one is obtained. This is particularly important when the expansion results in a binary graph (p = 2 and m = 1), since then any short cycle is detrimental to decoding. We have explored this case separately in [17], where the expansion is arrived at in a slightly different way than the more general approach presented here. In the non-binary case (m > 1), it might not be possible to remove all short cycles, and one might have to be satisfied with a local PCM free only of those short cycles that do not satisfy the 'full-rank condition', such as 2H_L^{4,2} given in (4). The decoding scheme we propose in the next section employs a technique to further reduce the possible negative effects of cycles among local check nodes.
Local check nodes enable us to adequately capture the various dependencies between CPVs, and adding them to the expanded graph completes the expansion. The different steps necessary for expanding a graph over F_{p^r} into one over F_{p^m}, where m | r, can be summarized as follows.
1) Obtain the smallest set of α-connected subgroups Θ_{p^{r−m}}, using the homomorphism presented in Definition 3 and following the steps outlined in Lemma 4. Use that to derive the set of quotient groups Θ^Q_{p^{r−m}}.
2) Map the cosets of each quotient group to the elements of H = {F_{p^m}, +}. Use these isomorphisms to obtain the alternate representation vectors of F_{p^r} elements.
3) Find a PCM suited to iterative decoding for the code formed by the alternate representation vectors.
4) Expand each node in the original graph into (p^r−1)/(p^m−1) F_{p^m} nodes. Connect the new variable and check nodes and label the edges, based on the edge labels in the original graph.
5) Add local check nodes to represent the local PCM found in step 3.

Fig. 2 presents the complete expansion for the earlier example of parity-check equation ρ, given by (3). The F_{2^4} graph is shaded grey, and the expansion to F_{2^2} is depicted in white. Circles represent variable nodes, squares regular check nodes, and hexagons local check nodes. Note that, in the interest of a clearer figure, edge labels are only shown for the instances where they are ≠ 1.
IV. DECODING SCHEME
An iterative message passing decoding algorithm that utilizes the Tanner graph representation of NB-LDPC codes can be used with the expanded graph. The advantage here is that the expansion is over a smaller field than the original graph, leading to a lower decoding complexity. Note that a few different options are available for expanding a graph over F_{p^r}, one for each factor of r. Each of these offers a different complexity-performance trade-off, which may suit different applications.
Any generic decoding algorithm can be applied straightforwardly to the expanded graph with some simple modifications. In the following, we present these modifications and explain why they are required. Note that the explanation is from the perspective of a soft decision decoding (SDD) algorithm, such as QSPA [3] and its many variations [4]-[8], but the modifications can also be applied to other algorithms, such as majority-logic decoding [18].

1) Computing Channel Estimates:
Any SDD algorithm has to be initialized with probability estimates based on channel observations. In QSPA and its variants, variable nodes compute channel estimates in the form of symbol probability vectors to initialize the decoder. In the proposed expansion, each variable node represents some CPV. Thus, when using SDD algorithms on expanded graphs, initial estimates for CPVs have to be computed.
Note that each coset contains a subset of the elements of F_{p^r}. This makes computing initial estimates of CPVs quite straightforward: the probability of an (F_{p^r}) variable node belonging to a particular coset of some subgroup can be calculated by simply summing the probabilities of the symbols that belong to that coset. Equation (5) presents this computation, where p^s_n is the symbol probability vector of original variable node n, p^c_{n,i} is the i'th CPV of that node, C^j_i is the j'th coset in the i'th quotient group, and the a^j_k are the F_{p^r} elements in that coset:

p^c_{n,i}(j) = Σ_k p^s_n(a^j_k), a^j_k ∈ C^j_i.   (5)
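A minimal sketch of this initialization, for the F_16/F_4 example used throughout (kernel {0, 2, 12, 14} under the assumed primitive polynomial x^4 + x + 1); the symbol probabilities here are synthetic stand-ins for real channel estimates.

```python
# Sketch of CPV initialization (5): sum symbol probabilities over each coset.
import random

random.seed(1)
q = 16
p_sym = [random.random() for _ in range(q)]   # synthetic channel estimates
total = sum(p_sym)
p_sym = [x / total for x in p_sym]            # normalize to a distribution

# Cosets of one quotient group; the kernel is the one obtained under the
# assumed representation of F_16 (primitive polynomial x^4 + x + 1).
kernel = [0, 2, 12, 14]
cosets, seen = [], set()
for c in range(q):
    C = frozenset(g ^ c for g in kernel)      # additive coset c + kernel
    if C not in seen:
        seen.add(C)
        cosets.append(sorted(C))

# Coset probability vector: p_cpv[j] = sum of p_sym over the j'th coset.
p_cpv = [sum(p_sym[a] for a in C) for C in cosets]

assert len(p_cpv) == 4                # one entry per coset
assert abs(sum(p_cpv) - 1.0) < 1e-9  # a CPV is again a distribution
```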
In most practical applications, decoders operate in either the log or the log-likelihood ratio (LLR) domain, due to hardware stability concerns [19]. In such a case, p^c_{n,i} has to be converted to the desired domain, as in (6). For one F_{p^r} variable node, (p^r − 1)/(p^m − 1) F_{p^m} nodes that represent CPVs have to be initialized as in (5). Only a single symbol probability vector, corresponding to the single F_{p^r} element transmitted through the channel, is used for all those computations. This implies that the channel observation is only sufficient to initialize r/m F_{p^m} symbols. However, in this approach, there are (p^r − 1)/(p^m − 1) nodes that are initialized. Thus, channel observations are duplicated and dependencies are created between the initial estimates of CPVs. Any error in the channel estimates gets multiplied and propagates through the graph, leading to performance losses.
Recall that the set of F_{p^m} nodes of a single F_{p^r} variable node are 'connected' via the local code. The local code is a ((p^r−1)/(p^m−1), r/m) code, and therefore, r/m out of the (p^r−1)/(p^m−1) F_{p^m} nodes can be thought of as representing information symbols, and the others parity symbols. We propose first picking a suitable set of r/m nodes to represent information symbols, and initializing only these as in (5) and (6). For the rest of the nodes, those that represent parity symbols of the local code, an additional scaling factor δ (0 ≤ δ ≤ 1) is used in (6). Our simulation results show that this modification helps in reducing the propagation of errors in channel information, but δ has to be optimized per code. Equation (7) presents this modification.
After initialization, operations of the decoder would be similar to those of a decoder for a code over F p m except for a couple of minor modifications that are explained in the following.
2) Distinguishing Local Checks from Regular Checks: Expanded graphs contain two different types of check nodes: local check nodes, which represent dependencies between CPVs, and regular ones, which result from expanding check nodes in the original graph. Local check nodes are only connected with the F_{p^m} nodes of a single F_{p^r} variable node, whereas a regular check node is connected with only one F_{p^m} node of each neighboring F_{p^r} variable node. Thus, local check nodes do not represent relationships between different variables of the original code, and regular ones represent only those relationships. This means that the estimates from the two types of check nodes are based on two separate linear codes, and treating them identically may not be the best approach.
As discussed in Section III B, local PCM may contain some short cycles, and these would be present in the expanded graph among the local check nodes.Estimates computed by a check node involved in one such cycle in two different iterations will be correlated with each other to some degree.This can make the estimates 'over-confident' of a variable node taking a particular value.
Taking into consideration the need to distinguish between estimates of local and regular check nodes, and also since local check nodes could be involved in short cycles, we propose using another scaling factor ψ (0 < ψ < 1) with the estimates of local check nodes. Similar approaches have been taken in the literature to mitigate the effects of short cycles, with satisfactory results, for example in [20].
The combining of probability estimates with this modification, at some variable node i of the expanded graph during the k'th decoding iteration, is given by (8). There, L_i is the initial estimate for node i, R^{(k)}_i is the combined estimate, and r^{(k)}_{j→i} is the estimate sent from the j'th check node to the i'th variable node in the k'th iteration:

R^{(k)}_i = L_i + Σ_{j ∈ N_R(i)} r^{(k)}_{j→i} + ψ Σ_{j ∈ N_L(i)} r^{(k)}_{j→i},   (8)

where N_R(i) and N_L(i) denote the regular and local check node neighbors of i, respectively, and L_i, R^{(k)}_i, and r^{(k)}_{j→i} are in the log or LLR domain. Similar to the scaling factor δ used in initialization, ψ also has to be optimized per code.
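The combining step can be sketched as a small function; the function name and the toy message values below are hypothetical, and only the structure (a plain sum for regular check messages, a ψ-scaled sum for local check messages) follows the rule described above.

```python
# Hedged sketch of the combining step at a variable node, LLR domain.
def combine(L, regular_msgs, local_msgs, psi):
    """R = L + sum(regular messages) + psi * sum(local messages), elementwise
    over the CPV entries. `psi` down-weights local check estimates."""
    return [Lk
            + sum(msg[k] for msg in regular_msgs)
            + psi * sum(msg[k] for msg in local_msgs)
            for k, Lk in enumerate(L)]

# Toy example: CPV of length 2, one regular and one local incoming message.
R = combine([0.5, -1.0], [[1.0, 0.0]], [[2.0, -2.0]], psi=0.5)
assert R == [2.5, -2.0]
```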
3) Testing for Convergence: In iterative decoding of NB-LDPC codes, a tentative decision is taken by every variable node in each iteration to test whether the decoder has converged to a valid codeword. If so, then the check-sum at every check node should be zero, and the decoding process can be terminated. The same approach may be taken when decoding on expanded graphs. The tentative decision at each variable node would be the F_{p^m} element most likely for the node, and check-sums would be computed at all check nodes, including local ones. The output of the decoder would be a vector of F_{p^m} elements that is (p^r−1)/(p^m−1) times longer than the original code length. The original codeword can be recovered by mapping each set of (p^r−1)/(p^m−1) F_{p^m} elements to a single F_{p^r} element, via the 'local' code, as discussed in Section III B.
Even though we replace each F_{p^r} node with (p^r−1)/(p^m−1) F_{p^m} nodes, just r/m F_{p^m} elements are sufficient to represent a single F_{p^r} element, which is also evident from the local code. This observation leads to a slightly easier approach to checking convergence. Rather than deciding on all F_{p^m} nodes of a single F_{p^r} variable node, we propose only using the r/m nodes selected as the 'information symbols' of the local code. The most likely F_{p^m} elements of these map to a single F_{p^r} element, once more through the local code. Check-sums of the original parity-check equations can then be computed with these F_{p^r} elements. Note that even though the check-sums are now computed over the larger field, the computations involve only simple field arithmetic, and there is a significant reduction in the number of computations required when compared with the straightforward approach.
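Recovering an F_{p^r} element from the r/m 'information' F_{p^m} symbols is a change of basis. Under the assumed representation of F_16 over F_4 used in earlier sketches (primitive polynomial x^4 + x + 1, basis {1, α}, α encoded as 2, F_4 encoded as {0, 1, 6, 7}), the sketch below checks that the map (a, b) → a + b·α is a bijection, so two information symbols determine the field element uniquely.

```python
# Sketch: two F_4 symbols (a, b) determine one F_16 element gamma = a + b*alpha.
# Assumptions: primitive polynomial x^4 + x + 1, alpha = 2, F_4 = {0, 1, 6, 7}.
MOD = 0b10011

def mul(a, b):
    """Carry-less multiplication in F_16, reduced mod x^4 + x + 1."""
    res = 0
    while b:
        if b & 1:
            res ^= a
        b >>= 1
        a <<= 1
        if a & 0b10000:
            a ^= MOD
    return res

ALPHA = 2
F4 = [0, 1, 6, 7]

to_pair = {}
for a in F4:          # constant term
    for b in F4:      # coefficient of alpha
        gamma = a ^ mul(b, ALPHA)   # gamma = a + b*alpha
        to_pair[gamma] = (a, b)

# 16 distinct pairs map onto 16 distinct field elements: the map is a
# bijection, so the information symbols recover gamma uniquely.
assert len(to_pair) == 16
assert to_pair[0] == (0, 0)
```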
With these three modifications, any iterative soft-decoding algorithm [3]- [8] proposed for NB-LDPC codes may be used with expanded graphs.This allows a large number of decoding strategies.For applications where decoding latency is the primary concern, a simplification of QSPA, such as min-max decoding [7], can be used with an expanded graph, thereby achieving the complexity gains of both the simplification and the expansion.Section V presents some results from simulations where a few of these different strategies were evaluated.

V. SIMULATION RESULTS
In this section, we compare the error-correcting performance of the decoding schemes discussed in Section IV against some existing decoding algorithms for NB-LDPC codes. We consider different expansions of the same Tanner graph (different m for a fixed graph), and use QSPA [3] and one of its well-known simplifications, min-max decoding [7], with each expansion. QSPA and min-max decoding are also used on the original graph, along with the max-log-SP algorithm [5], which is a special case of the extended min-sum (EMS) algorithm [6] where n_m and n_c are set to their maximum possible values, the size of the field and the check-node degree, respectively. All algorithms were implemented in the LLR domain [5], and simulations were carried out over the BI-AWGN channel, with a maximum of 50 decoding iterations in all cases. The algorithms over expanded graphs were used with the modifications proposed in Section IV, and the scaling factors δ and ψ were optimized through simulations. In the following, we refer to each decoding setup by the algorithm along with the field size; for example, F_{p^r}-QSPA denotes QSPA on a graph over F_{p^r}. Fig. 3 shows the FER performance of the decoding schemes with C_1, a rate-0.89 code over F_{2^6} of length 1998 symbols. The code was generated through random re-labeling of a regular binary LDPC code of column weight 4, obtained from [21].
In Fig. 3, we observe that decoding algorithms over expanded graphs perform close to the best known decoder, QSPA over the original graph. In fact, QSPA over the F_{2^3} expansion performs within 0.2 dB of F_{2^6}-QSPA at a FER of 10^−4. When using the F_{2^2} expansion, this gap widens slightly to 0.3 dB. While min-max decoding over the original graph has a gap of only about 0.08 dB with F_{2^6}-QSPA, it should be noted that decoding is still over F_{2^6}, and thus it is more complex than QSPA over expanded graphs, as made evident in Section VI. Interestingly, the other simplification of QSPA, the max-log-SP algorithm, is outperformed by all proposed decoding schemes, although it operates in the original field. Max-log-SP shows a gap of about 0.55 dB with F_{2^6}-QSPA at a FER of 10^−3. We also evaluate the performance of min-max decoding over expanded graphs, which is quite satisfactory. In the case of the F_{2^3} expansion, min-max has a gap of only 0.06 dB with F_{2^3}-QSPA, while the gap between F_{2^2}-QSPA and F_{2^2}-min-max is around 0.1 dB. Interestingly, these two decoding setups, which enjoy the complexity advantages of both expansion and simplification, manage to outperform the max-log-SP algorithm over the original graph. Optimum values for the scaling factors (δ, ψ) were found to be (0.75, 0.25) for F_{2^3}-QSPA and F_{2^2}-QSPA, (0, 0.3) for F_{2^3}-min-max, and (0, 0.4) for F_{2^2}-min-max.
Fig. 4 illustrates the FER performance of the proposed schemes with C2, a rate-0.861 code over F_{2^4} of 1000 symbols in length. C2 was generated by re-labeling a regular binary graph of column weight 3, constructed with the progressive edge-growth algorithm [22]. For this code, we consider expansions over F_{2^2} and F_2. The expansion over F_2 is of special interest, since it results in a binary graph. When using this binary graph, we replace QSPA and min-max decoding with SPA and its well-known simplification, the min-sum algorithm (MSA). Unique features and advantages offered by the binary expansion have been explored separately in [17].
Fig. 4 shows that the performance losses of the proposed schemes are quite small in this case as well. The gap between using QSPA on the original graph and on its F_{2^2} expansion is less than 0.3 dB at a FER of 10^−4. The loss from replacing QSPA with its simplification, min-max decoding, is about 0.1 dB for both the original and expanded graphs. With C2, the max-log-SP algorithm seems to perform a bit better than with C1. Here, its performance is very similar to that of the min-max algorithm on the F_{2^2} expansion, with a gap of close to 0.4 dB with F_{2^4}-QSPA at a FER of 10^−4. When compared with QSPA on the original graph, using SPA on the binary graph results in a 0.5 dB loss in performance. Simplifying SPA to MSA loses only a further 0.05 dB. Although a 0.5 dB loss seems significant, as explored in [17], decoding on a binary expansion provides unique advantages in decoding complexity and hardware implementation. Optimum values for the scaling factors (δ, ψ) here were (0.5, 0.25) for F_{2^2}-QSPA, SPA, and MSA, and (0, 0.3) for F_{2^2}-min-max.
The simulation results show that decoding algorithms implemented on the proposed graph expansions are capable of performing quite close to those using the original graph. For any algorithm, the performance gap between decoding on the expanded graph and on the original one widens as the size of the field used for the expansion decreases. With a few different graph expansions possible, many decoding options become available for any given code. As discussed in the next section, all these decoding schemes provide attractive complexity gains, with different levels of performance-complexity trade-offs.

VI. DECODING COMPLEXITY
In the following, we analyze the complexities of some decoding schemes on expanded graphs. We consider implementing the two popular versions of QSPA, LLR-QSPA [5] and FFT-QSPA [4], as well as min-max decoding [7], on the proposed expansions, and compare them in terms of complexity with the same algorithms implemented on the original graph. Since NB-LDPC codes are most often defined over finite fields of characteristic 2 [3], a code over F_{2^r}, where r has a factor m, is used in the complexity analysis. The complexities of the two major steps in iterative decoding, check node operations and variable node operations, are compared separately. For the comparison, we consider the operations at a single node of each type during one iteration. Since the proposed expansions replace each node over F_{2^r} with E_f = (2^r − 1)/(2^m − 1) nodes, the total complexity of all these nodes gives the complexity of the decoding schemes on expanded graphs. As explained in Section III-B, these graphs also have the additional feature of local check nodes. Since L_f = (2^r − 1)/(2^m − 1) − r/m such nodes are included per variable node of the original graph, their complexities are included with those of the variable nodes.
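The node counts E_f and L_f are easy to check numerically; the small helper below is only a sanity check of the formulas above, not part of any decoder.

```python
def expansion_counts(r, m):
    """E_f = (2^r - 1)/(2^m - 1) subfield nodes replace each F_{2^r} node;
    L_f = E_f - r/m local check nodes are added per original variable node."""
    assert r % m == 0, "m must divide r"
    e_f = (2**r - 1) // (2**m - 1)
    l_f = e_f - r // m
    return e_f, l_f

# e.g. for a code over F_{2^6}:
#   expansion_counts(6, 3) -> (9, 7)
#   expansion_counts(6, 2) -> (21, 18)
```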
At the hardware level, apart from the number of operations, the type of operation also affects the complexity. It is well known that operations such as multiplications are more complex than comparisons [19]. Therefore, we consider the number of operations of a few different types: comparisons (Comp), additions/subtractions (Add), multiplications/divisions (Mult), and table look-ups (LUT). Note that the max* operation in LLR-QSPA can be performed with one comparison, two additions, and one table look-up [5], and that the transformation between the log and probability domains, required in FFT-QSPA, can be carried out with look-up tables. It has also been assumed that the forward-backward approach [7] is used in the check node operations of the three algorithms. Further, the cost of permuting probability vectors has been disregarded, since its impact on the total complexity is negligible.
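For concreteness, the max* (Jacobian logarithm) operation is max*(a, b) = max(a, b) + log(1 + exp(−|a − b|)); the sketch below tabulates the correction term so the cost is one comparison, two additions/subtractions, and one table look-up, as stated above. The table granularity is an illustrative choice, not prescribed by the paper.

```python
import math

STEP = 0.125  # illustrative table resolution
TABLE = [math.log1p(math.exp(-i * STEP)) for i in range(64)]

def max_star(a, b):
    """max*(a, b) via one comparison, two additions, one table look-up."""
    big, small = (a, b) if a >= b else (b, a)       # one comparison
    idx = int((big - small) / STEP)                 # one subtraction
    corr = TABLE[idx] if idx < len(TABLE) else 0.0  # one table look-up
    return big + corr                               # one addition
```

When |a − b| exceeds the table range, the correction term is negligible and max* reduces to a plain max, which is exactly the approximation the max-log-SP algorithm applies everywhere.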
When presenting the complexities of decoding schemes on these graphs, we let E_f, L_f, and d_v denote the number of new nodes per original node, the number of local check nodes, and the average variable node degree, respectively; substituting these values yields the expressions in Table III. Note that the degrees of regular check nodes in the expanded graphs remain d_c.
From Table III, it can be seen that the complexity gains of the proposed schemes at check node operations depend on the decoding algorithm being used. For both LLR-QSPA and min-max decoding, using an expanded graph instead of the original results in a significant reduction in complexity, while for FFT-QSPA, the gains are modest. In the case of LLR-QSPA, using the original graph requires approximately 3d_c × 2^{2r} comparisons, additions, and table look-ups, which results in an overall complexity of O(2^{2r}). However, with the expansion over F_{2^m}, there are only approximately 3d_c × 2^{r+m} operations of each type, which reduces the overall complexity to O(2^{r+m}). This is a significant gain, especially in cases with a large r, and we feel that, as a trade-off, the small performance losses observed in Section V are justifiable. Using an expanded graph can reduce the complexity order from O(2^{2r}) to O(2^{r+m}) in the check node operations of min-max decoding as well. It should be noted that although they are of the same complexity order, min-max decoding is simpler than LLR-QSPA, since only comparisons are required. The gains of the proposed scheme are smaller in the case of FFT-QSPA. Here, the numbers of multiplications and table look-ups required are almost the same (approximately 2d_c × 2^r) when using the original graph or an expanded one. There is a slight reduction in the number of additions though, from approximately 2d_c × 2^r × r to 2d_c × 2^r × m. Thus, the overall complexity of FFT-QSPA on an expanded graph is O(2^r m), slightly lower than the O(2^r r) on the original graph.
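The LLR-QSPA comparison can be made concrete by counting per-check-node operations on both graphs, using the approximate counts quoted above; each original check node becomes E_f = (2^r − 1)/(2^m − 1) check nodes of the same degree over F_{2^m}. The degree d_c below is an illustrative parameter.

```python
def check_ops_original(d_c, r):
    """~3*d_c*2^(2r) operations of each type per check node for LLR-QSPA
    with the forward-backward schedule, on the original F_{2^r} graph."""
    return 3 * d_c * 2 ** (2 * r)

def check_ops_expanded(d_c, r, m):
    """Total over the E_f replacement check nodes, each over F_{2^m} and
    each costing ~3*d_c*2^(2m) operations of each type."""
    e_f = (2 ** r - 1) // (2 ** m - 1)
    return e_f * 3 * d_c * 2 ** (2 * m)

# For r = 6, m = 3 the expansion cuts check-node work by roughly
# 2^(r-m) = 8x, matching the O(2^{2r}) -> O(2^{r+m}) order reduction.
```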
When considering the variable node operations of decoding schemes on expanded graphs, we include the complexity of the L_f local check nodes added for each original variable node. Note that the complexity of one such node can be derived by substituting d_l = 3 as the node degree and 2^m as the field size in the expressions for the respective algorithm in Table III. Due to this additional cost, the complexity at variable nodes is higher in the proposed schemes. However, this increase is not sufficiently high to completely offset the gain obtained at check node operations, especially for LLR-QSPA and min-max decoding. As Table IV shows, the complexity orders of these algorithms change from O(2^r) on the original graph to O(2^{r+m}) on an expanded one, while in Table III, this change is from O(2^{2r}) to O(2^{r+m}) at check node operations. Hence, the overall gain is still significant for LLR-QSPA and min-max decoding, especially for larger values of r. In the case of FFT-QSPA, the complexity increase is comparatively smaller, from O(2^r) to O(2^r m). Since its gain at check nodes was also quite modest, the overall complexity gain would be minimal.
Tables III and IV demonstrate that decoding on expanded graphs is advantageous in terms of asymptotic complexity, while the actual gains depend on the parameters of the code used, such as the field sizes, code length, rate, and average node degrees. In Table V, we consider the complexities of some decoding schemes used in Section V with C1, a code over F_{2^6} with codeword length 1998 and code rate 0.89. In this case, the original graph is over F_{2^6}, and expansions over F_{2^3} and F_{2^2} are used for decoding. Table V presents the complexities of using LLR-QSPA, FFT-QSPA, and min-max decoding on all three graphs, in terms of the number of operations of each type per iteration. For decoding schemes over expansions, we also present the number of operations required as a percentage of the requirement when using the same algorithm with the original graph.
In Table V, we observe that using LLR-QSPA on expanded graphs offers exceptional complexity gains for C1. Less than 20% of the operations for the original graph are required when using the F_{2^3} expansion. This reduces further with the F_{2^2} expansion, to less than 10%. These gains correspond to speedups of more than 5 times in the F_{2^3} case and more than 10 times in the F_{2^2} case. Considering that the performance losses, as shown in Section V, are only 0.2 dB and 0.3 dB, the complexity gains are very attractive. With FFT-QSPA though, using expansions is not particularly advantageous. The only gain of the F_{2^3} expansion, when compared with using the algorithm on the original F_{2^6} graph, is in the number of comparisons required. Both decoding setups use a similar number of additions, while the setup on the expanded graph needs significantly more multiplications and table look-ups. This is due to the operations of the local check nodes, which are absent in the original graph. With the F_{2^2} expansion, the number of comparisons reduces further, and the number of additions used is also slightly smaller than in the F_{2^6} case. However, since the F_{2^2} expansion has more local check nodes than the F_{2^3} one, the numbers of multiplications and table look-ups increase significantly. Thus, for C1, using FFT-QSPA with either of the two expansions is more complex than implementing it on the original graph. The case of min-max decoding is very similar to that of LLR-QSPA; the complexity gains are significant, and they are higher when the size of the field used is smaller. Due to the local check node operations, the number of additions in the proposed schemes is higher than in the original algorithm. Nevertheless, since the reduction in the number of comparisons is much larger in magnitude, min-max decoding on expanded graphs is significantly less complex.
The majority of existing algorithms are of complexity order O(2^{2r}) for a code over F_{2^r}, and implementing those algorithms on graph expansions results in significant complexity gains with minimal performance losses. For algorithms whose complexity is already of a lower order in the field size, such as FFT-QSPA, the new strategy may not be advantageous. But as [19] pointed out, of the two variants of QSPA, LLR-QSPA is more suitable for hardware implementations, due to the better numerical stability of LLR-domain operations. Therefore, the strategy proposed in this paper could be applied to reduce decoding complexity in most practical applications that adopt NB-LDPC codes. In particular, the proposed strategy makes it possible to decode a code defined over a large field using a graph over a much smaller field, while providing a good performance-complexity trade-off, leading to a practical solution to decoding NB-LDPC codes.

VII. CONCLUSIONS
In this paper, we proposed a new method to expand a Tanner graph of an NB-LDPC code over F_{p^r} into a graph over F_{p^m}, where m is a factor of r. Most decoding algorithms proposed for NB-LDPC codes can be adapted to use these expanded graphs with simple modifications. This offers a number of different decoding options for any given code, each with a different performance-complexity trade-off. Simulation results show that, in general, decoding on expanded graphs provides significant complexity gains, while the performance losses are minimal. It may be interesting to note that the proposed expansion could be useful in other applications beyond decoding NB-LDPC codes.
All messages are length-p^m vectors of log or LLR values. N_i^r and N_i^l are, respectively, the sets of regular and local check nodes in the neighborhood of node i.

TABLE II: Alternate Representations of Symbols in F_{2^4}

Table III lists the complexities of check node operations in each decoding setup, while Table IV considers variable node operations. The average degrees of a check node and a variable node in the original graph are denoted by d_c and d_v, while d_l denotes the average degree of a local check node. As discussed in Section III-B, the local PCM is formed with Hamming codewords, and therefore it should always be possible to set d_l = 3. Due to these new check nodes, the average variable node degree increases slightly in the expanded graphs, and we denote this new value by d_v.

TABLE III: Check Node Complexity

TABLE IV: Variable Node Complexity

TABLE V: Number of Operations per Iteration with C1