Active device detection and performance analysis of massive non-orthogonal transmissions in cellular Internet of Things

This paper investigates multiple access schemes for uplink and downlink transmissions in cellular networks with massive Internet of Things (IoT) devices. Recall that single-carrier frequency division multiple access and orthogonal frequency division multiple access, which are orthogonal multiple access (OMA) schemes, have been conventionally adopted for uplink and downlink transmissions in narrow-band IoT, respectively. Unlike these OMA schemes, we propose two non-orthogonal multiple access (NOMA) schemes for cellular IoT with short-packet transmissions. In particular, a generalized expectation consistent signal recovery (GEC-SR)-based algorithm is proposed to estimate active devices, channel state information (CSI), and data in uplink transmission, where all of the active devices are allowed to transmit their pilots and data through the same resource block without authorization. On the other hand, the active devices estimated during uplink transmission are grouped for downlink transmission with a trade-off between performance and detection complexity. Additionally, the data error rates are analyzed for both uplink and downlink transmissions with low-resolution analog-to-digital converters (ADCs), where the effects of critical parameters such as the estimation error, ADC bits, packet length, and message bits are revealed. Both simulation and analytical results are provided to demonstrate the performance of the proposed NOMA schemes and algorithms, especially for active device, channel, and data estimation. More importantly, the obtained results show that the data error rate performance of downlink NOMA is superior to that of OMA when the message bits of devices in one group are selected following the proposed strategy.


Introduction
Massive machine-type communication (mMTC) has been considered a representative service category in 5G networks [1][2][3] because of the wide range of Internet of Things (IoT) applications, such as smart cities, smart health care, factory automation, and autonomous driving [1,4]. Notably, the number of IoT devices is growing exponentially and will reach hundreds of billions in 2030 [5]. A key to enhancing connection density is to provide device access over a large range. The cellular technique is one of the main access techniques for IoT [5,6]. In narrow-band IoT, single-carrier frequency division multiple access (SC-FDMA) and orthogonal frequency division multiple access (OFDMA) have been adopted in uplink and downlink transmissions, respectively, which are based on the conventional granted orthogonal multiple
• Two short-packet non-orthogonal transmission schemes are proposed for uplink and downlink transmissions in cellular IoT. In particular, potentially active devices access the network without authorization, and the message bits of active devices are modulated as short packets. Low-cost algorithms based on the GEC-SR algorithm are proposed to estimate active devices, channels, and data from the ADC-quantized uplink received signals. The detected active devices are then grouped for downlink transmission based on the uplink estimations, where a trade-off between performance and complexity is revealed.
• With imperfectly estimated CSI, the bit error rate (BER) performance of uplink non-orthogonal short-packet transmission is obtained using a GEC-SR-based linear minimum mean square error (LMMSE) detector, which shows the impact of error propagation caused by the active device detection on uplink data detection. In addition, the average block error rate (BLER) of downlink NOMA with low-resolution ADCs is derived in approximate closed form, which quantifies the effect of channel estimation errors on system performance.
• By investigating the impact of the message bits of downlink transmission on the analyzed average BLER for a given block-length, we obtain a trade-off between reliability and effectiveness. In particular, a device pairing strategy based on message bits is proposed to guarantee the NOMA performance relative to OMA. Simulation results demonstrate the accuracy of the active device detection, the channel estimation, and the obtained analytical results. More importantly, the obtained results show that the BLER performance of downlink NOMA is superior to that of OMA when message bits are selected according to the proposed strategy.
The remainder of this paper is organized as follows. Section 2 describes the uplink and downlink short-packet non-orthogonal transmissions in large-scale cellular IoT. A low-complexity active device detection method based on GEC-SR and the BER of uplink data with the LMMSE detector are presented in Section 3, whereas the downlink channel estimation and the performance analysis of short-packet transmission are investigated in Section 4. In Section 5, numerical results and simulations are used to verify the performance of the proposed algorithms and the developed analysis. Finally, Section 6 concludes the paper.
Notations. The identity matrix, the all-one vector, and the all-zero vector of size $M$ are denoted as $\mathbf{I}_M$, $\mathbf{1}_M$, and $\mathbf{0}_M$, respectively. The distribution of a circularly symmetric complex (or real) Gaussian random vector $x$ with mean vector $m$ and covariance matrix $V$ is denoted as $\mathcal{N}_c(x; m, V)$ (or $\mathcal{N}(x; m, V)$). $\odot$ denotes componentwise multiplication and $\oslash$ denotes componentwise division.

System model
We consider a single-cell mMTC network, in which one single-antenna central base station (BS) and $N$ single-antenna IoT devices are located in the cell. A block fading channel model with $L$ symbol durations is considered. To support the massive number of device connections, different NOMA schemes are designed for uplink and downlink transmission in cellular networks. On one hand, a grant-free NOMA scheme is employed for uplink transmission, where unknown active devices are allowed to access the network without authorization or scheduling. On the other hand, the detected active devices are paired/grouped for downlink transmission due to the limited capacity of the IoT devices. A hybrid NOMA scheme is proposed to serve each group of active users. Specifically, the detected active devices are grouped based on uplink estimations. Orthogonal time slots are allocated to all active devices for channel estimation in the downlink training phase. The power-domain NOMA scheme is used within each group for information transmission, and orthogonal bandwidth resources are employed among different device groups [31]. The details of these two NOMA schemes for the uplink and downlink of the cellular network are shown in Figure 1 and described in the following two subsections, respectively.

Uplink NOMA scheme
In the uplink grant-free NOMA scheme, the transmission occurs in two phases. Figure 2 shows the time-domain structure of the received signal from multiple asynchronous devices. This asynchronous communication may occur since the signals from different users arrive at the receiver at different times. Without loss of generality, it is assumed that the signal transmitted by device 1 arrives at the BS prior to that transmitted by device $n$ ($n = 2, \ldots, N$) by a time offset $\Delta_n > 0$. Note that only a small fraction of potential devices are active and send their small packets sporadically in IoT [18]. In order to mitigate the asynchronous effect, a guard space is used to prevent interference between pilot and data symbols [23]. For example, the cyclic prefix (CP) can be utilized as the protection prefix if the non-orthogonal transmission is implemented on orthogonal OFDMA tones [32]. We first detect active devices and estimate their CSI in the training phase. The BERs of the active devices' data packets are then obtained in the uplink data transmission phase.

Uplink channel training phase
The BS assigns a pilot sequence $a_n = (a_{n,1}, \ldots, a_{n,L_p})^T \in \mathbb{C}^{L_p}$ to device $n$ ($n = 1, 2, \ldots, N$) in advance, where $L_p < N$ and $a_{n,l}$ is independently chosen from $\{-1, 1\}$ with equal probability. It is assumed that only a small portion of devices are active, and we define the device activity indicator as
$$\varpi_n = \begin{cases} 1, & \text{device } n \text{ is active}, \\ 0, & \text{otherwise}, \end{cases} \qquad (1)$$
for $n \in \mathcal{N} \triangleq \{1, 2, \ldots, N\}$. Each device decides whether or not to access the channel with probability $\epsilon$ in an independent manner [18]. Then, $\varpi_n$ can be modeled as a Bernoulli random variable such that $\Pr(\varpi_n = 1) = \epsilon$, $\Pr(\varpi_n = 0) = 1 - \epsilon$, $\forall n$. Let $T_p$, $T_d$, and $T_g$ represent the pilot transmission duration, the data transmission duration, and the guard interval duration, respectively. The receiver removes the samples in the guard interval and selects the remaining $L_p$ samples as the received signal in the training phase. There are therefore two types of observation windows. As shown in Figure 2, the part intercepted by the receiver in a type I observation window cannot completely contain the pilot sequences of the active devices, so intersymbol interference (ISI) is introduced. In contrast, a type II window completely contains the pilot sequences of the active devices and is not affected by data symbols. Therefore, the ISI can be perfectly eliminated if the guard interval is long enough to cover the asynchronous time offsets. The received signal at the BS for active device and channel estimation is
$$y^{u,p} = \sum_{n=1}^{N} a_n \varpi_n h_n^u + w^{u,p} = A x^p + w^{u,p}, \qquad (2)$$
where $y^{u,p} = (y^{u,p}_1, \ldots, y^{u,p}_{L_p})^T \in \mathbb{C}^{L_p}$, $w^{u,p} = (w^{u,p}_1, \ldots, w^{u,p}_{L_p})^T \in \mathbb{C}^{L_p}$ with $w^{u,p}_l$ ($l = 1, 2, \ldots, L_p$) being independent AWGN following a zero-mean complex Gaussian distribution with variance $\sigma_p^2$ 1), $A \triangleq [a_1, \ldots, a_N] \in \mathbb{C}^{L_p \times N}$ with $|a_n|^2 = 1$ is the collection of pilot sequences of all devices, $h_n^u$ is the channel between device $n$ and the BS, and $x^p = (x^p_1, \ldots, x^p_N)^T \in \mathbb{C}^N$ with $x^p_n \triangleq \varpi_n h_n^u$.
We define $h_n^u$ as $h_n^u = g_n^u / \sqrt{1 + d_n^\zeta}$, where $g_n^u \sim \mathcal{N}_c(0, 1)$, $d_n$ is the distance between device $n$ and the BS, and $\zeta$ denotes the path loss factor. Then we have $h_n^u \sim \mathcal{N}_c(0, \lambda_n)$ with $\lambda_n = 1/(1 + d_n^\zeta)$.
1) The variances of the zero-mean AWGNs in the pilot and data transmission phases are assumed to be $\sigma_p^2$ and $\sigma_0^2$, respectively.
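The training-phase model above can be illustrated with a small simulation. The following is a minimal sketch, not the authors' simulator: the function name `simulate_uplink_training`, the parameter values, and the uniform placement of devices are hypothetical, and the $1/\sqrt{L_p}$ scaling of the $\pm 1$ pilot entries is an assumption introduced here so that each pilot has unit norm.

```python
import math
import random

def simulate_uplink_training(N, L_p, eps, distances, zeta, sigma2_p, rng):
    """Draw one realization of y = A x + w for the uplink training phase."""
    # Pilots: i.i.d. +/-1 entries, scaled by 1/sqrt(L_p) so that each column
    # has unit norm (this scaling is an assumption of the sketch).
    A = [[rng.choice((-1.0, 1.0)) / math.sqrt(L_p) for _ in range(N)]
         for _ in range(L_p)]
    # Activity indicators: independent Bernoulli(eps) variables.
    active = [1 if rng.random() < eps else 0 for _ in range(N)]
    # Path-loss variances lambda_n = 1 / (1 + d_n^zeta).
    lam = [1.0 / (1.0 + d ** zeta) for d in distances]
    # Channels h_n ~ CN(0, lambda_n): each real/imaginary part has variance lambda_n / 2.
    h = [complex(rng.gauss(0.0, math.sqrt(l / 2)), rng.gauss(0.0, math.sqrt(l / 2)))
         for l in lam]
    # Effective unknowns x_n = activity_n * h_n (zero for inactive devices).
    x = [a * hn for a, hn in zip(active, h)]
    # Noise w ~ CN(0, sigma2_p I).
    s = math.sqrt(sigma2_p / 2)
    y = [sum(A[l][n] * x[n] for n in range(N)) + complex(rng.gauss(0.0, s), rng.gauss(0.0, s))
         for l in range(L_p)]
    return A, active, lam, x, y

# Hypothetical parameter values chosen only for illustration.
rng = random.Random(0)
N, L_p = 40, 20
distances = [rng.uniform(0.1, 1.0) for _ in range(N)]
A, active, lam, x, y = simulate_uplink_training(N, L_p, 0.1, distances, 3.0, 0.01, rng)
```

Because $L_p < N$, the system is underdetermined, and it is the sparsity of $x^p$ (most entries are zero) that makes recovery possible.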

Uplink data non-orthogonal transmission phase
Device $n$'s uplink message bits are modulated as $x_n^u$ chosen from the $M$-order constellation $\mathcal{A}$ when it is active; otherwise, we use $x_n^u = 0$ to represent that device $n$ is inactive. With the spreading sequence $s_n = (s_{n,1}, \ldots, s_{n,L_u})^T$, $s_{n,l} \sim \mathcal{N}_c(0, 1/L_u)$, $L_u = L - L_p$, the received uplink data at the BS is
$$y^{u} = \sum_{n=1}^{N} s_n h_n^u x_n^u + w^{u}, \qquad (3)$$
where $w^{u} \in \mathbb{C}^{L_u}$ is an AWGN vector in the uplink data transmission phase. In order to reduce the cost, the received signal in (2) is quantized by a uniform complex-valued ADC quantizer $\Phi_c$ [33] with $B$ bits and step size $\Delta_\tau$; the quantized signal is therefore
$$y^{u,p}_{\Phi_c} = \Phi_c(z + w^{u,p}), \qquad (4)$$
where $z = A x^p$. The $B$-bit uniform ADC quantizer with $2^B$ bins is characterized by a set of $2^B - 1$ thresholds $[\tau_1, \tau_2, \ldots, \tau_{2^B - 1}]$. Similarly, the quantized output signal of (3) is obtained by applying $\Phi_c$ to the received uplink data.
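To make the quantization step concrete, the following is a minimal sketch of a uniform complex-valued quantizer that acts separately on the real and imaginary parts. The mid-rise placement of the levels, the clipping of out-of-range inputs, and the function names are assumptions of this sketch; the exact thresholds used in the paper are not recoverable from the text above.

```python
import math

def quantize_real(v, B, step):
    """Uniform mid-rise quantizer with 2**B levels and step size `step`."""
    levels = 2 ** B
    # Index of the bin containing v, clipped to the outermost bins.
    idx = math.floor(v / step) + levels // 2
    idx = min(max(idx, 0), levels - 1)
    # The reconstruction point is the centre of the bin.
    return (idx - levels // 2 + 0.5) * step

def quantize_complex(z, B, step):
    """Apply the real quantizer to the real and imaginary parts independently."""
    return complex(quantize_real(z.real, B, step), quantize_real(z.imag, B, step))

def thresholds(B, step):
    """The 2**B - 1 decision thresholds tau_1 < ... < tau_{2**B - 1}."""
    levels = 2 ** B
    return [(k - levels // 2) * step for k in range(1, levels)]
```

For example, with $B = 2$ and unit step size, the three thresholds are $-1$, $0$, and $1$, and the four output levels are $\pm 0.5$ and $\pm 1.5$ per real dimension.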

Downlink NOMA scheme
For downlink transmission, we propose a hybrid NOMA scheme [31] to serve the active devices detected in the uplink, where the active devices are grouped due to the limited capacity of the IoT devices as well as the trade-off between performance and complexity. Here, we first focus on the two-device case; the results are extended to the general case in Subsection 4.2. Specifically, the power-domain NOMA scheme is used within each group for data transmission, and orthogonal bandwidth resources are employed among different groups. In addition, orthogonal time slots are allocated to the active devices of one group for downlink channel estimation before the non-orthogonal data transmission.

Downlink channel training phase
In the downlink NOMA scheme, each active device $i$ ($i = u, v$) in one group should estimate its channel response before signal detection. During the channel estimation phase, the BS sends a special pilot sequence $\phi_i \in \mathbb{C}^{L_q}$ ($2L_q < L$) to device $i$. The received training signal at the $i$-th device is denoted by $y_i^{d,p} \in \mathbb{C}^{L_q}$, in which $w_i^{d,p} \in \mathbb{C}^{L_q}$ denotes the AWGN vector at the $i$-th device during the downlink channel estimation phase and $h_i^d$ is the channel between device $i$ and the BS. Due to the ADC at the receiver, the device observes the quantized signal $y^{d,p}_{i,\Phi_c}$.

Downlink data transmission phase
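The message bits $B_i$ of device $i$ are encoded as a unit-power codeword $x_i^d$. For fairness and to enable superposition coding with successive interference cancellation (SIC) in downlink NOMA, we assume that the transmit powers at the BS are sorted as $p_v \le p_u$ based on the uplink estimated channel gain 3). Then the superposition codeword of one pair of devices at the BS is $\sqrt{p_u} x_u^d + \sqrt{p_v} x_v^d$. The received signal at the $i$-th user is given by
$$y_i^d = h_i^d (\sqrt{p_u} x_u^d + \sqrt{p_v} x_v^d) + w_i^d,$$
where $w_i^d$ denotes the AWGN at the $i$-th user. Then the output signal of the ADC is
$$y^d_{i,\Phi_c} = \Phi_c(y_i^d). \qquad (9)$$
At device $u$, the signal of device $v$, $x_v^d$, is always treated as interference. Then the received signal-to-interference-and-noise ratio (SINR) for decoding its own signal is $\gamma_{u \to u}$. The instantaneous BLER of device $u$ is approximated as
$$E_u \approx Q\left( \frac{C(\gamma_{u \to u}) - B_u / L_d}{\sqrt{V(\gamma_{u \to u}) / L_d}} \right), \qquad (10)$$
where $L_d$ is the block-length, $C(\gamma) = \log_2(1 + \gamma)$ is the capacity, $V(\gamma) = (1 - (1 + \gamma)^{-2})(\log_2 e)^2$ is the channel dispersion, and $Q(x) = \int_x^\infty \frac{1}{\sqrt{2\pi}} e^{-t^2/2} dt$. In contrast, SIC is performed at device $v$. In particular, device $v$ first decodes $x_u^d$ by treating $x_v^d$ as interference. The received SINRs for decoding $x_u^d$ and $x_v^d$ at device $v$ are $\gamma_{v \to u}$ and $\gamma_{v \to v}$, respectively. Similar to (10), we have the instantaneous BLERs $E_{v \to u}$ and $E_{v \to v}$. Then the instantaneous BLER of device $v$ is approximated as
$$E_v = E_{v \to u} + (1 - E_{v \to u}) E_{v \to v} \overset{(a)}{\approx} E_{v \to u} + E_{v \to v},$$
where step (a) holds for the case that the signal of device $u$ can be decoded successfully with high probability. Moreover, we will design a coding strategy for each device pair to guarantee transmission reliability later.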
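The two-device decoding procedure can be sketched numerically. The code below is illustrative only: it uses the standard normal approximation for finite block-length coding, and it assumes perfect CSI and an unquantized receiver, whereas the paper's analysis includes channel estimation errors and ADC distortion. The function names and arguments (`gain_u`, `gain_v` for $|h_u^d|^2$, $|h_v^d|^2$, and `noise` for the noise power) are hypothetical.

```python
import math

def q_func(x):
    """Gaussian Q-function, Q(x) = 0.5 * erfc(x / sqrt(2))."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def bler(gamma, bits, block_len):
    """Normal approximation of the block error rate at SINR `gamma`."""
    if gamma <= 0.0:
        return 1.0
    capacity = math.log2(1.0 + gamma)
    dispersion = (1.0 - (1.0 + gamma) ** -2) * math.log2(math.e) ** 2
    return q_func((capacity - bits / block_len) / math.sqrt(dispersion / block_len))

def noma_pair_bler(gain_u, gain_v, p_u, p_v, noise, bits_u, bits_v, block_len):
    """Instantaneous BLERs of the two devices in one NOMA pair (p_u >= p_v)."""
    # Device u decodes its own signal, treating device v's signal as interference.
    e_u = bler(p_u * gain_u / (p_v * gain_u + noise), bits_u, block_len)
    # Device v first decodes x_u with x_v as interference, then removes it (SIC).
    e_vu = bler(p_u * gain_v / (p_v * gain_v + noise), bits_u, block_len)
    e_vv = bler(p_v * gain_v / noise, bits_v, block_len)
    # Device v fails if SIC fails, or if SIC succeeds but its own decoding fails.
    e_v = e_vu + (1.0 - e_vu) * e_vv
    return e_u, e_v
```

When the rate $B/L_d$ equals the capacity, the approximation returns exactly 0.5, and the BLER increases with the number of message bits for a fixed block-length.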

Detection and performance analysis for uplink transmission
In this section, we propose GEC-SR-based detection methods for active device detection, channel estimation, and data detection in the uplink transmission phase. In particular, the updated messages of the proposed iterative algorithms are approximated by complex Gaussian distributions through projection operations, and a joint active device detection and channel estimation method is presented. With the estimated channel, a signal decoding method for the active devices is then proposed for the uplink non-orthogonal data transmission.

Active device detection and channel estimation
3) For average power allocation, we allocate the transmit powers based on E{|ĥ
Figure 3 (Color online) The factor graph, where circles refer to variable nodes and squares denote factor nodes. The message update rules are given in [20].
Note that the MMSE estimator of $x^p$ in (4) is given by [34]
$$\hat{x}^p = E\{x^p \mid y^{u,p}_{\Phi_c}\},$$
where the expectation is taken over the posterior distribution $p(x^p \mid y^{u,p}_{\Phi_c})$. The posterior probability in (13) is computationally intractable due to the discrete nature of the active pattern. Thus, we aim at constructing a multivariate Gaussian approximation of $p(x^p \mid y^{u,p}_{\Phi_c})$ and finding, in an iterative fashion, a mean and variance that are close to those of $p(x^p \mid y^{u,p}_{\Phi_c})$. As shown in Figure 3, we first initialize the forward messages. In the $t$-th iteration, the messages in the backward direction (red line) are updated first, since they involve the observed signals, and the messages in the forward direction (cyan line) are then updated. Note that the passing message in each factor node is approximated by the projection operation [20], which is defined as follows:
$$\mathrm{Proj}[p(x)] = \arg\min_{q(x) \in \Omega(x)} D_{KL}(p(x) \,\|\, q(x)),$$
where $D_{KL}$ is the Kullback-Leibler (KL) divergence and $\Omega(x)$ is the family of Gaussian distributions. With the factor graph shown in Figure 3 and the message update rules found in [20], a joint active device detection and channel estimation method is presented in Algorithm 1, and we provide some intuition to understand the algorithm. For the second layer in the backward direction, the projection of $p(y^{u,p}_{\Phi_c} \mid z)$ is calculated, and the mean vector and variance matrix are expressed as (A1) and (A2). Then the extrinsic information of $z$ is calculated by (A3) and (A4). For the first and the second layers in the backward direction, the passing mean vector and variance matrix can be obtained by (A5)-(A11). On the other hand, the projection of the prior probability of $x^p$, i.e., $p(x^p)$, of the first layer in the forward direction is evaluated, and the mean vector and variance matrix are expressed as (A12) and (A13). The extrinsic information of $x^p$ is calculated by (A14)-(A17). For the first and the second layers in the forward direction, the passing mean vector and variance matrix are obtained by (A18) and (A19).
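Minimizing the KL divergence over the Gaussian family amounts to matching the first two moments. As an illustration of the projection applied to the prior of $x^p$, the sketch below computes the posterior mean and variance of one component under the Bernoulli-Gaussian prior implied by the system model, $(1 - \epsilon)\delta(x) + \epsilon \mathcal{N}_c(x; 0, \lambda)$, combined with an incoming Gaussian message. The message parameters `r` and `v` and the function name are notation assumed for this sketch; they are not the expressions (A12) and (A13) of the paper.

```python
import math

def bg_moment_match(r, v, eps, lam):
    """Posterior mean and variance of x with prior (1-eps)*delta(x) + eps*CN(0, lam),
    given a Gaussian message equivalent to observing r = x + CN(0, v)."""
    # Likelihood ratio CN(r; 0, v) / CN(r; 0, lam + v) of "inactive" versus "active".
    ratio = (lam + v) / v * math.exp(-abs(r) ** 2 * (1.0 / v - 1.0 / (lam + v)))
    # Posterior probability that the component is non-zero.
    pi = eps / (eps + (1.0 - eps) * ratio)
    # Gaussian posterior of the component when it is active.
    mean_active = lam / (lam + v) * r
    var_active = lam * v / (lam + v)
    # First two moments of the resulting two-component mixture.
    mean = pi * mean_active
    var = pi * (abs(mean_active) ** 2 + var_active) - abs(mean) ** 2
    return mean, var, pi
```

The returned mean and variance define the projected Gaussian; the posterior activity probability `pi` is returned as well because it shows how a small $|r|$ pulls the estimate toward zero.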
Proof. See Appendix A.
Remark 1. When the number of ADC bits is infinite, we have $p(y^{u,p}_{\Phi_c} \mid z) = \mathcal{N}_c(z; y^{u,p}_{\Phi_c}, \sigma_p^2 I)$ and $v^{t+1,-}$
Proof. See Appendix B.
Therefore, the estimated signal in (13) is $\hat{x}^p = m^{T_{\max},+}_{x^p}$ with the variance matrix $v^{T_{\max},+}_{x^p}$. Then the log-likelihood ratio (LLR) test for active pattern estimation is given by
$$\mathrm{LLR}(\hat{x}^p_n) = \log \frac{p_{\hat{x}^p_n \mid \varpi_n}(\hat{x}^p_n \mid \varpi_n = 1)}{p_{\hat{x}^p_n \mid \varpi_n}(\hat{x}^p_n \mid \varpi_n = 0)}. \qquad (24)$$
With the estimated $\hat{x}^p$, the conditional probability of $\hat{x}^p_n$ given $\varpi_n$ depends on $v_n$, the $n$-th component of $v^{T_{\max},+}_{x^p}$. Evaluating the LLR in (24) with this conditional probability reduces the LLR test to a threshold test. Therefore, the device activity indicator of device $n$ is obtained as $\hat{\varpi}_n$, and the estimated CSI of device $n$ is $\hat{x}^p_n$ if $\hat{\varpi}_n = 1$.
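The threshold test can be sketched as follows. The explicit conditional distributions are not recoverable from the text above, so this sketch assumes the model commonly used for this kind of detector: $\hat{x}^p_n \sim \mathcal{N}_c(0, \lambda_n + v_n)$ when device $n$ is active and $\hat{x}^p_n \sim \mathcal{N}_c(0, v_n)$ when it is inactive, with the decision taken where the LLR crosses zero. Both the model and the zero threshold are assumptions, and the function names are hypothetical.

```python
import math

def activity_llr(x_hat, v, lam):
    """LLR of 'active' versus 'inactive', assuming x_hat ~ CN(0, lam + v) or CN(0, v)."""
    return math.log(v / (lam + v)) + abs(x_hat) ** 2 * (1.0 / v - 1.0 / (lam + v))

def energy_threshold(v, lam):
    """Value of |x_hat|^2 at which the LLR crosses zero."""
    return math.log(1.0 + lam / v) / (1.0 / v - 1.0 / (lam + v))

def detect_active(x_hat, v, lam):
    """Per-device activity decisions from the estimates and their variances."""
    return [1 if abs(x) ** 2 > energy_threshold(vn, ln) else 0
            for x, vn, ln in zip(x_hat, v, lam)]
```

Under these assumptions the LLR is monotonically increasing in $|\hat{x}^p_n|^2$, so comparing the LLR with zero is equivalent to comparing the estimated energy with a per-device threshold that depends only on $\lambda_n$ and $v_n$.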

Signal detection for uplink data transmission
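Define $\hat{\mathcal{N}} = \{n \mid \hat{\varpi}_n = 1, n = 1, 2, \ldots, N\}$ and $\hat{h}_n = \hat{x}^p_n$, $\forall n \in \hat{\mathcal{N}}$. The received signal of the uplink data transmission in (3), restricted to the detected active devices, can be written in terms of $\hat{h}^u_{\hat{\mathcal{N}}} = (\hat{h}^u_1, \ldots, \hat{h}^u_{|\hat{\mathcal{N}}|})^T$, $\hat{S} = [s_1, \ldots, s_{|\hat{\mathcal{N}}|}] \in \mathbb{C}^{L_u \times |\hat{\mathcal{N}}|}$, and $x^u = (x^u_1, \ldots, x^u_{|\hat{\mathcal{N}}|})^T$. If the linear MMSE (LMMSE) detector is employed at the BS, the detected data symbols are obtained as in [8] using $\hat{L}_{\hat{\mathcal{N}}} = \hat{S}\,\mathrm{Diag}(\hat{h}^u_{\hat{\mathcal{N}}})$, where $\Phi_{\mathcal{A}}$ is the quantization operator onto the constellation $\mathcal{A}$. An LMMSE detector based on GEC-SR is proposed to detect the data packet, which is summarized in Algorithm 2. Note that the error propagation caused by the active device detection is considered in this symbol detector.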
Algorithm 2 GEC-SR-based detection for uplink data transmission
1: Input: The spreading sequences $\{s_n\}$, the quantized signal $\tilde{y}$, the activity probability $\epsilon$, the channel variances $\{\lambda_n\}$, the constellation $\mathcal{A}$.
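For intuition about the LMMSE step, the sketch below implements a plain LMMSE detector followed by nearest-point slicing onto the constellation. It is not Algorithm 2: it ignores the ADC quantization and the GEC-SR iterations, and it simply treats the estimated channels as exact. The helper `solve` and the function `lmmse_detect` are hypothetical names introduced here.

```python
def solve(M, b):
    """Solve M x = b for a small complex system by Gaussian elimination with pivoting."""
    n = len(b)
    aug = [list(M[i]) + [b[i]] for i in range(n)]
    for c in range(n):
        # Partial pivoting: bring the largest remaining entry of column c to the diagonal.
        piv = max(range(c, n), key=lambda r: abs(aug[r][c]))
        aug[c], aug[piv] = aug[piv], aug[c]
        for r in range(c + 1, n):
            f = aug[r][c] / aug[c][c]
            for k in range(c, n + 1):
                aug[r][k] -= f * aug[c][k]
    # Back substitution.
    x = [0j] * n
    for r in range(n - 1, -1, -1):
        x[r] = (aug[r][n] - sum(aug[r][k] * x[k] for k in range(r + 1, n))) / aug[r][r]
    return x

def lmmse_detect(S_hat, h_hat, y, sigma2, constellation):
    """Slice (L^H L + sigma2 I)^{-1} L^H y onto the constellation, with L = S_hat Diag(h_hat)."""
    rows, K = len(S_hat), len(h_hat)
    L = [[S_hat[l][k] * h_hat[k] for k in range(K)] for l in range(rows)]
    gram = [[sum(L[l][i].conjugate() * L[l][j] for l in range(rows))
             + (sigma2 if i == j else 0.0) for j in range(K)] for i in range(K)]
    rhs = [sum(L[l][i].conjugate() * y[l] for l in range(rows)) for i in range(K)]
    soft = solve(gram, rhs)
    # Map each soft estimate to the nearest constellation point.
    return [min(constellation, key=lambda a: abs(a - s)) for s in soft]
```

A device that is wrongly declared active adds a spurious column to the effective channel matrix, and a missed device leaves unmodelled interference in the received signal, which is one way to see how activity detection errors propagate into the data detection.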

Design and performance analysis for downlink transmission
In this section, we focus on the design and analysis of the downlink hybrid NOMA scheme. Specifically, the downlink channel information is first estimated by the devices with low-resolution ADCs. Then, we derive the BLER of short-packet transmission for one pair of devices performing NOMA. In order to obtain insight, we further approximate the analytical BLERs. Moreover, we reveal the impact of message bits with a fixed block-length, which can be used to guide device pairing/grouping. Finally, the analytical results of the two-device case are extended to the general case with more than two devices in one group.

Downlink channel estimation
With the aid of the additive quantization noise model (AQNM) [35][36][37], the quantized vector $y^{d,p}_{i,\Phi_c}$ is decomposed as
$$y^{d,p}_{i,\Phi_c} = \alpha y^{d,p}_i + w^{d,p}_q,$$
where $w^{d,p}_q$ denotes the additive Gaussian quantization noise vector, which is uncorrelated with $y^{d,p}_i$, and $\alpha = 1 - \varsigma$, with $\varsigma$ denoting the distortion factor of the low-resolution ADC [37]. When the ADC resolution $B$ is large ($B > 5$), the distortion factor $\varsigma$ can be approximated as [38]
$$\varsigma \approx \frac{\pi\sqrt{3}}{2} 2^{-2B}.$$
Lemma 1 gives the channel estimate $\hat{h}^d_i$, its distribution, and the relationship between $\hat{h}^d_i$ and $h^d_i$. In addition, the estimation error is $e_i = h^d_i - \hat{h}^d_i$, which is independent of $\hat{h}^d_i$ and has variance $\sigma^2_{e_i} = E\{e_i e_i^H\} = \sigma_p^2 + \frac{\varsigma}{\alpha}(\lambda_i + \sigma_p^2)$.
Proof. See Appendix C.
The transmit SNR in the downlink channel training phase is denoted by $\varrho$. As shown in Figure 4 with $\lambda_i = 0.5$, the distribution of the estimated CSI matches well with the simulation.
Remark 2. Lemma 1 reveals the impact of the parameters on downlink channel estimation. From (75) in the proof of Lemma 1, one can see that the variance of the estimation error is determined by the quantization precision of the ADC and the noise power for a given length of the pilot sequence. Therefore, the channel estimation accuracy can be controlled by adjusting the parameters according to the relationship shown in Lemma 1.
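The dependence of the AQNM on the ADC resolution can be tabulated with a few lines of code. In the sketch below, the values for 1 to 5 bits are the distortion factors commonly quoted in the AQNM literature for Gaussian inputs; they are not taken from this paper and should be treated as an assumption, as should the function names. The per-sample quantization noise variance $\alpha(1 - \alpha)E\{|y|^2\}$ is the standard AQNM expression.

```python
import math

# Distortion factors commonly tabulated for 1- to 5-bit uniform quantizers
# with Gaussian inputs (values assumed from the AQNM literature).
_TABLE = {1: 0.3634, 2: 0.1175, 3: 0.03454, 4: 0.009497, 5: 0.002499}

def distortion_factor(B):
    """Distortion factor of a B-bit ADC (table for B <= 5, approximation otherwise)."""
    if B in _TABLE:
        return _TABLE[B]
    return math.pi * math.sqrt(3.0) / 2.0 * 2.0 ** (-2 * B)

def aqnm_noise_variance(B, signal_power):
    """Variance alpha * (1 - alpha) * E|y|^2 of the quantization noise per sample."""
    alpha = 1.0 - distortion_factor(B)
    return alpha * (1.0 - alpha) * signal_power
```

Each additional bit reduces the distortion factor by roughly a factor of four, which is consistent with the simulation observation that 5-bit ADCs perform close to the infinite-resolution case.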

Average BLER analysis
From (32) and (36), the output signal of the ADC in (9) is approximated as
$$y^d_{i,\Phi_c} \approx \alpha \hat{h}^d_i (\sqrt{p_u} x_u^d + \sqrt{p_v} x_v^d) + \Delta_{e_i,w}, \qquad \Delta_{e_i,w} = \alpha e_i (\sqrt{p_u} x_u^d + \sqrt{p_v} x_v^d) + \alpha w^d_i + w^d_q,$$
where $w^d_q$ follows a Gaussian distribution with zero mean and variance $\sigma^2_{q2}$. Note that $\Delta_{e_i,w}$ is also Gaussian with zero mean and variance $\sigma^2_{\Delta_{e_i,w}} = \alpha^2 \sigma^2_{e_i}(p_u + p_v) + \alpha^2 \sigma_0^2 + \sigma^2_{q2}$. The SINRs and SNR then follow from this signal model, and the average BLER is obtained by averaging the instantaneous BLER over the PDF $f_{\gamma_{i \to u}}(x)$ of $\gamma_{i \to u}$. We derive the average BLER of NOMA with imperfect CSI and finite ADC bits in Theorem 1.
Theorem 1. The average BLERs of device $u$ and device $v$ in NOMA with imperfect CSI are expressed as (40) and (41), respectively.
Similarly, the individual average BLER of two-device OMA with finite block-length coding (FBC) can be approximated. The developed approximate closed-form expressions of the average BLER in Theorem 1 can be easily used to evaluate the performance of NOMA with FBC.

Discussion on device grouping
From (10), we have the maximal achievable rate
$$\frac{B_j}{L_d} \approx C(\gamma_j) - \sqrt{\frac{V(\gamma_j)}{L_d}}\, Q^{-1}(E_j),$$
where $V(\gamma_j)$ denotes the channel dispersion. Note that $Q^{-1}(0.5) = 0$, which means that if $E_j = 0.5$ the maximal achievable rate $B_j / L_d$ is equal to the capacity $C(\gamma_j)$. In order to reveal the impact of device grouping, we define the outage probability as $P^{out}_j = \Pr(C(\gamma_j) < \hat{R}_j)$, where $\hat{R}_j = B_j / L_d$ is the target rate. We assume that there are $J$ devices in one group performing NOMA with power allocation coefficients $\alpha_1 > \cdots > \alpha_J$, $\sum_{j=1}^{J} \alpha_j = 1$. Then the outage probability of device $j$ ($1 \le j < J$) is given by (43), where $r_j = 2^{\hat{R}_j} - 1$. Note that the outage probability in (43) is always one when $\alpha_j - r_j \sum_{i > j} \alpha_i \le 0$. Therefore, the power allocation coefficients and the target rate of device $j$ should satisfy the following condition:
$$\alpha_j - r_j \sum_{i > j} \alpha_i > 0. \qquad (44)$$
By defining $\varphi = \max\left\{ \frac{r_1}{\alpha_1 - r_1 \sum_{i=2}^{J} \alpha_i}, \ldots, \frac{r_j}{\alpha_j - r_j \sum_{i > j} \alpha_i} \right\}$, the outage probability in (43) can be calculated as in (45). Similarly, the outage probability of device $j$ in OMA is expressed as (46). Based on (45) and (46), the condition for the superiority of NOMA over OMA for $1 \le j \le J - 1$ is given by (47). Similarly, the condition for the superiority of NOMA over OMA for device $J$ is given by (48). From the analytical results in (44), (47), and (48), we can draw the following conclusions:
(1) With the increase of the number of devices, to guarantee the reliability of NOMA, the target rate of each device in the group should be smaller.
(2) The power allocation and the target rates should satisfy the conditions in (47) and (48).
(3) The power allocation and the target rate selection are very complicated when there are more than two devices in one group.
For example, for the case $J = 3$, the power allocation coefficients and the target rates should satisfy the following conditions.
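The feasibility condition (44) and the quantity $\varphi$ can be checked numerically. The sketch below evaluates the outage probability of device $j$ as $\Pr(\rho |h_j|^2 < \varphi)$, assuming Rayleigh fading with $|h_j|^2$ exponentially distributed with mean $\lambda_j$, perfect CSI, and a transmit SNR $\rho$. These assumptions, the closed form $1 - \exp(-\varphi / (\rho \lambda_j))$, and the function name `noma_outage` are introduced here for illustration and are not the paper's expression (45).

```python
import math

def noma_outage(j, alphas, target_rates, snr, lam):
    """Outage probability of device j (1-based) in a J-device NOMA group,
    assuming Rayleigh fading with mean channel gain `lam` and transmit SNR `snr`."""
    r = [2.0 ** R - 1.0 for R in target_rates]
    phi = 0.0
    # Device j must successively decode the messages of devices 1, ..., j.
    for k in range(j):
        margin = alphas[k] - r[k] * sum(alphas[k + 1:])
        if margin <= 0.0:
            return 1.0  # condition (44) is violated, so outage is certain
        phi = max(phi, r[k] / margin)
    return 1.0 - math.exp(-phi / (snr * lam))
```

For instance, with coefficients (0.7, 0.3) and a target rate of 2 bits per channel use for the first device, the margin $0.7 - 3 \times 0.3$ is negative and the outage probability equals one regardless of the SNR, which illustrates why the target rates must shrink as more devices share a group.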

Simulations and numerical results
In this section, we provide numerical examples to verify the proposed algorithms and the analytical results for both uplink and downlink NOMA schemes with short-packet transmissions.

Uplink short-packet transmission
GAMP [20,39] is used as the benchmark for the uplink simulations, since the system is quantized by the ADC. The performance comparisons between GAMP and GEC-SR are shown in Figures 6(a) and (b), where the normalized mean squared error (NMSE) is defined as the MSE between the estimated $\hat{x}^p$ and the actual $x^p$. In Figure 6(a), we can observe that the NMSE converges for both GEC-SR and GAMP with different ADC bits at 25 dB. Moreover, compared to the GAMP algorithm, the converged NMSE value is much smaller for the GEC-SR-based algorithm. It is also observed that the performance of GAMP and GEC-SR with 5 ADC bits is close to that with infinite ADC bits. In Figure 6(b), the active device detection error probabilities for GAMP and GEC-SR are shown as functions of the SNR. One can observe that the active device detection error probability of GEC-SR is smaller than that of GAMP. Figure 7 shows the NMSE performance of GEC-SR as a function of the length of the pilot sequence, with GAMP as the benchmark algorithm.
It can be seen that GEC-SR performs better than GAMP. The NMSE decreases as the length of the pilot sequence increases and flattens out when the pilot sequence is long enough.
It can also be seen that a shorter pilot sequence can satisfy the same NMSE requirement for different device activity probabilities.
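The two uplink metrics discussed above can be computed as in the following sketch. Normalizing the squared error by the energy of the true vector and reporting the result in dB is the usual convention and is an assumption here, since the exact normalization is not stated in the text; the function names are hypothetical.

```python
import math

def nmse_db(x_hat, x_true):
    """Normalized MSE between the estimate and the true vector, in dB.
    Requires a non-zero error and a non-zero true vector."""
    err = sum(abs(a - b) ** 2 for a, b in zip(x_hat, x_true))
    ref = sum(abs(b) ** 2 for b in x_true)
    return 10.0 * math.log10(err / ref)

def detection_error_rate(active_hat, active_true):
    """Fraction of devices whose activity indicator is estimated incorrectly."""
    wrong = sum(1 for a, b in zip(active_hat, active_true) if a != b)
    return wrong / len(active_true)
```

In a Monte Carlo evaluation, both quantities would be averaged over many independent realizations of the activity pattern, channels, and noise.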
The impact of error propagation caused by active device detection errors on the BER performance of uplink data is shown in Figure 8, where the BER is computed from $L_{num}$, the number of erroneously detected data bits, and $d_{num}$, the number of active device detection errors, with $d_{num} = 0$ when the active device detection is perfect. As shown in Figure 8, the BER gap between imperfect and perfect active device detection increases with the activity probability due to the error propagation of the active device detection.

Downlink short-packet transmission
The NMSE of downlink channel estimation is shown in Figure 9, which includes the LS and LMMSE estimators at 20 dB. It shows that the NMSE performance of LS is better than that of LMMSE. One can observe that the gap between LS and LMMSE becomes larger when the parameter $\lambda_i$ is smaller. It is worth noting that the NMSE approaches $10^{-4}$ when the length of the pilot sequence is 60. Thus, we can control the variance $\sigma^2_{e_i}$ of the estimation error by adjusting the parameters, including the length of the pilot sequence, the SNR, and the ADC bits, according to the results in Lemma 1.
In Figure 10, we set $\varrho = 15$ dB, $\lambda_i = 1$, $L_q = 50$, and $B = 5$ bits for channel estimation, and $\alpha_u = 0.7$, $\alpha_v = 0.3$, and $L_d = 168$ for data transmission. The ADCs in the training and data phases have the same parameters.
In Figure 10(a), the analytical results of average BLERs in (40) and (41) are presented for downlink NOMA with FBC, where NOMA with imperfect CSI and perfect CSI is considered. The two curves show that the analytical results match well with the computer simulation results for the whole range of the SNR, which confirms the accuracy of the derived expressions. Furthermore, it can be observed from Figure 10(a) that the average BLER of NOMA with imperfect CSI can result in an error floor at the high SNR region.
In Figures 10(b) and (c), the average BLERs of the devices in one group are presented as functions of the message bits. As shown in Figure 10(b), the average BLER of device $u$ in one group increases as the message bits increase with a fixed block-length. It is important to note that the average BLER performance of device $u$ in NOMA is superior to that in OMA below the threshold, whereas the average BLER of device $u$ in NOMA becomes worse than that in OMA when the message bits exceed the threshold. More importantly, the average BLER of device $u$ in NOMA is always one if the message bits are sufficiently large. On the other hand, the impact of the message bits on the average BLER of device $v$ with a fixed block-length is shown in Figure 10(c). It can be observed that the average BLER performance of device $v$ in NOMA is better than that in OMA if the message bits are greater than the threshold, provided that device $v$ selects proper message bits. Otherwise, the OMA transmission scheme should be selected, since it performs better. Moreover, the performance of device $v$ in NOMA is always inferior to that in OMA if device $u$ selects its message bits improperly, because the error propagation becomes more severe. Finally, Figure 10

Conclusion
In this paper, we have proposed two NOMA schemes for uplink and downlink transmissions in cellular Internet of Things with short-packet transmission. In particular, a low-complexity algorithm based on GEC-SR has been proposed to detect active devices and estimate their CSI for uplink grant-free NOMA. Furthermore, we have obtained the BER of uplink data transmission with imperfectly estimated CSI and discussed the impact of error propagation caused by the device detection. On the other hand, a hybrid NOMA scheme has been proposed for downlink transmission. The CSI is estimated, and the BLERs of each pair of active devices with finite block-length coding are derived in closed form. With the analytical results, a message bit selection strategy for each pair of devices has been proposed to ensure better NOMA performance than OMA, and an extended strategy for device grouping has been proposed. Finally, we have presented simulation results to demonstrate the accuracy of the proposed algorithms and the obtained analytical results. More importantly, the obtained results show that the performance of NOMA is superior to that of OMA when the message bits are selected according to the proposed strategy, which can be used to guide device grouping in NOMA.