Efficient NOMA Design without Channel Phase Information using Amplitude-Coherent Detection

This paper presents the design and bit error rate (BER) analysis of a phase-independent non-orthogonal multiple access (NOMA) system. The proposed NOMA system can utilize amplitude-coherent detection (ACD), which requires only the channel amplitude for equalization purposes. In what follows, three different designs for realizing the detection of the proposed NOMA are investigated. One is based on the maximum likelihood (ML) principle, while the other two are based on successive interference cancellation (SIC). Closed-form expressions for the BER of all detectors are derived and compared with the BER of the coherent ML detector. The obtained results, which are corroborated by simulations, demonstrate that, in most scenarios, the BER is dominated by multiuser interference rather than the absence of the channel phase information. Consequently, the BERs using ML and ACD are comparable for various cases of interest. The paper also shows that the SIC detector is just an alternative approach to realize the ML detector, and hence, both detectors provide the same BER performance.

A. Al-Dweik and Y. Iraqi are with the Center for Cyber Physical Systems, Khalifa University, Abu Dhabi, UAE (E-mail: {arafat.dweik, youssef.iraqi}@ku.ac.ae). A. Al-Dweik is also with the Department of Electrical and Computer Engineering, Western University, London, ON, Canada (E-mail: dweik@fulbrightmail.org). K.-H. Park and M.-S. Alouini are with King Abdullah University of Science and Technology (KAUST), Thuwal, Makkah Province, Kingdom of Saudi Arabia (E-mail: {kihong.park, slim.alouini}@kaust.edu.sa). M. A. Al-Jarrah and E. Alsusa are with the School of Electrical and Electronic Engineering, University of Manchester, Manchester M13 9PL, U.K. (E-mail: {mohammad.al-jarrah, E.alsusa}@manchester.ac.uk).

IEEE TRANSACTIONS ON WIRELESS COMMUNICATIONS, VOL. XX, NO. Y, APRIL 2020


I. INTRODUCTION
The increasing number of mobile users and applications along with the development of the internet of things (IoT) pose challenging requirements for current and future wireless communications systems. Such challenges include high spectral efficiency, very low latency, massive device connectivity, high data rate, and long battery life [1]. Consequently, extensive research has been recently conducted to develop technologies that can fulfill such requirements within the limited available resources. One of the promising radio access techniques for next-generation wireless communications is the non-orthogonal multiple access (NOMA) technique, which has the potential to provide tangible performance enhancement for future communications systems [2], [3]. In fourth-generation (4G) communication systems, orthogonal multiple access (OMA) schemes were employed such as orthogonal frequency division multiple access (OFDMA) [4] and single carrier-frequency division multiple access (SC-FDMA) [5]. Although OMA works well in small communication environments and performs inter-user interference (IUI) elimination through utilizing orthogonal resource allocation at very low complexity, it does not support large scale networks due to the limited spectral resources [6]. On the other hand, NOMA enhancements of the multiuser communication systems include improved spectral efficiency and latency reduction [2]. Nevertheless, the IUI introduced by the non-orthogonal multiplexing is considered challenging [7].
Generally speaking, there are two main classes of NOMA, which are power-domain [8], [9], and code-domain [10], [11]. In the downlink power-domain NOMA, which is the main focus of this paper, multiple users are multiplexed at the transmitter side such that data symbols from different users are superimposed after being allocated different power levels according to their channel conditions [12], [13]. At the receiver side, the signals are separated and detected using coherent multiuser detection (MUD) algorithms, such as successive interference cancellation (SIC) [14]. Although coherent detection (CD) provides low bit error rate (BER), it requires accurate knowledge of the channel state information (CSI) at the receiver side. Channel estimation has received extensive attention in the literature as reported in [15] and the references listed therein, where it is shown that reliable channel estimation requires either highly complex signal processing or the utilization of pilot symbols, which degrades the spectral efficiency, particularly when multiple-input multiple-output (MIMO) systems are considered [16].
Non-coherent detection (NCD) on the other hand does not require CSI, and hence, it can be attractive for systems where low receiver complexity is paramount, or when phase estimation and tracking is exigent, as in the case of systems with significant phase noise [17]. Nevertheless, to the best of the authors' knowledge, NCD has rarely been considered for NOMA, except for the case of differential phase shift keying (PSK) for NOMA-based massive MIMO [16]. The main rationale for avoiding NCD with NOMA is that CSI is essential for interference cancellation at the SIC receiver.
Moreover, non-coherent frequency shift keying (FSK) has low spectral efficiency, and non-coherent amplitude shift keying (ASK) has inferior BER performance in fading channels. Partially-coherent detection (PCD) is another approach that does not require channel-phase information; instead, it requires knowledge of the CSI statistics. Therefore, Yang et al. [18] used PCD to overcome the performance deficiencies of NCD-based NOMA. Nevertheless, the results reported in [18] show that PCD performance is significantly inferior to CD, and hence, it is not capable of providing reliable BERs. It is also worth noting that power allocation in NOMA systems where only the CSI statistics are available at the transmitter has been considered in [19]-[21]. However, the system in such scenarios is different, as full CSI is assumed to be available at the receiver, while the feedback of CSI to the transmitter for power allocation is limited to the statistical features of the CSI.
In the recent literature, a new detection scheme, denoted as amplitude-coherent detection (ACD), was proposed, and it was deemed attractive as it offers the flexibility to trade off BER with complexity [22]-[24]. More specifically, ACD requires only the channel amplitude information, and no phase information, if unipolar M-ary ASK (MASK) is adopted as the modulation scheme at the transmitter. Unlike NCD, ACD enables using MASK over fading channels while providing reliable BER, particularly if some diversity or error correction techniques are used [24]. ACD has so far been considered only in OMA scenarios.

A. Motivation and Contribution
As can be noted from the aforementioned discussion, CSI is indispensable in NOMA systems because it is required for the SIC process. Thus, CD is the most appropriate choice given that CSI is available anyway at the receiver. As such, system designers have limited options to choose from when optimizing the design to suit a particular application. For example, in the downlink of the IoT, the receiving nodes typically have very limited computational and processing capabilities; therefore, having a low-complexity receiver is indispensable. Moreover, NOMA has been recently considered for visible light communications (VLC) [25]-[29], and hence, MASK detection without channel phase information is highly suitable for low-complexity VLC systems that adopt intensity modulation (IM) and direct detection (DD) [30]. Motivated by the above, we propose a NOMA system that does not require channel phase information to detect the users' signals. The proposed design is based on using MASK modulation at the transmitter and ACD at the receiver. Three different multi-user detectors are designed and their BER analysis is given in closed form. The obtained analytical and simulation results show that the ACD-NOMA can offer BER performance comparable to that of coherent NOMA, yet the power constraint for the users is different from coherent NOMA. Moreover, two SIC detectors are derived to flexibly enable the trade-off between BER and complexity reduction.

B. Paper Organization
The rest of the paper is organized as follows. Sec. II presents the system and channel models, where the transmitted and received signals are described. The section also presents the design of the various ACD receivers. Sec. III describes and compares the computational complexity of the considered detectors. The BER analysis of the considered detectors is presented in Secs. IV, V, and VI. Numerical results are presented in Sec. VII, and the conclusion and future work are presented in Sec. VIII. Appendix I presents the relation between $\beta_1$ and $\beta_2$, and finally, the BER analysis of the coherent MASK with NOMA is presented in Appendix II.

C. Notations
In what follows, unless otherwise specified, uppercase boldface and blackboard letters such as $\mathbf{H}$ will denote matrices, whereas lowercase boldface letters, such as $\mathbf{x}$, will denote row or column vectors. Symbols with a hat such as $\hat{x}$ will denote the estimate of $x$. Moreover, $x \sim \mathcal{N}, \mathcal{CN}, \mathcal{U}, \mathcal{R}$ will denote that the random variable $x$ follows the normal, complex normal, uniform, or Rayleigh distribution, respectively. Symbols such as $\mathcal{S}$ will denote sets with elements $\{S_0, S_1, \ldots\}$ or

II. SYSTEM AND CHANNEL MODELS
This work considers a downlink power-domain NOMA system, where a base station, Evolved Node B (eNB) in LTE terminology, and multiple user equipments (UEs) are each fitted with a single antenna. Without loss of generality, we assume that the channel attenuation coefficients for the $N$ UEs are given by $\alpha_1 < \alpha_2 < \cdots < \alpha_N$, which implies that UE 1 is the farthest from the eNB, or the weakest user. To provide reliable error rate performance for all users and enable using SIC receivers, the power for each user in power-domain NOMA is allocated such that higher power is assigned to the weak user than to the strong user [12], [13]. Therefore, given that the power factor for the $i$th user is denoted as $\beta_i$, then $\beta_1 > \beta_2 > \cdots > \beta_N$.

A. Transmitted Signal Model
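For systems with ACD, the transmitted information symbols belong to a unipolar MASK constellation [22]. Therefore, the transmitted symbol of the $n$th user is given by
$$\tilde{s}_{n,k_n} = k_n \tilde{\delta}_n, \quad k_n \in \{0, 1, \ldots, \bar{M}_n\}, \quad \bar{M}_n = M_n - 1, \qquad (1)$$
where $M_n$ is the modulation order and $\tilde{\delta}_n$ is the amplitude spacing between adjacent symbols, which is typically considered to be fixed, $\tilde{s}_{n,k_n+1} - \tilde{s}_{n,k_n} = \tilde{\delta}_n$. Therefore, $\tilde{s}_n \sim \mathcal{U}\{\tilde{s}_{n,0}, \tilde{s}_{n,1}, \ldots, \tilde{s}_{n,\bar{M}_n}\}$.
Given that the signal average energy of each user is normalized to unity, i.e., $\frac{1}{M_n}\sum_{k_n=0}^{\bar{M}_n} E_{k_n} = 1$ with $E_{k_n} = \tilde{s}_{n,k_n}^2$, it follows that
$$\tilde{\delta}_n = \sqrt{\frac{6}{(2M_n - 1)\bar{M}_n}}. \qquad (2)$$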
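The unit-energy normalization above can be checked numerically. The following is a minimal sketch, assuming the symbols are $k\tilde{\delta}$ for $k = 0, \ldots, M-1$ and the spacing is $\sqrt{6/((2M-1)(M-1))}$; the function names are illustrative and not taken from the paper.

```python
import math

def unipolar_mask(M):
    """Return the M amplitudes k*delta, k = 0..M-1, with unit average energy."""
    if M < 2:
        raise ValueError("modulation order must be at least 2")
    delta = math.sqrt(6.0 / ((2 * M - 1) * (M - 1)))
    return [k * delta for k in range(M)]

def average_energy(symbols):
    """Average energy of equiprobable real symbols."""
    return sum(s * s for s in symbols) / len(symbols)

for M in (2, 4, 8):
    s = unipolar_mask(M)
    print(M, [round(v, 4) for v in s], round(average_energy(s), 6))
```

For each modulation order the printed average energy should equal one up to rounding, which is the property the spacing is chosen to enforce.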
In power-domain NOMA systems, the baseband representation of the transmitted signal can be expressed as
$$\tilde{X} = \sum_{n=1}^{N} \sqrt{\beta_n}\,\tilde{s}_n, \qquad (3)$$
where $M = \prod_n M_n$ is the number of distinct NOMA symbols. The values of $\beta_n$ are typically selected such that $\sum_{n=1}^{N} \beta_n = 1$ to ensure that the average energy of the transmitted NOMA symbol is normalized to unity, i.e., $E[\tilde{X}^2] = 1$.
However, this approach is valid only for the case where $E[\tilde{s}_n] = 0 \ \forall n$, which is not the case for unipolar MASK. Therefore, an additional normalization factor should be derived to guarantee that $E[X^2] = 1$. Towards this goal, we define a new amplitude spacing factor $\delta_n$ that satisfies the normalization condition in (4), which after some straightforward manipulations gives the spacing in (5). Hence, the transmitted symbol is defined as $s_n = \delta_n \times k_n$, and the transmitted NOMA symbol can be expressed as
$$X = \sum_{n=1}^{N} \sqrt{\beta_n}\,\delta_n k_n. \qquad (6)$$
For $N = 2$, $X_k$ can be written as
$$X_k = \sqrt{\beta_1}\,\delta_1 \lfloor k/M_2 \rfloor + \sqrt{\beta_2}\,\delta_2\,(k \bmod M_2), \quad k \in \{0, 1, \ldots, M-1\}, \qquad (7)$$
where $\lfloor k/M_2 \rfloor$ and $k \bmod M_2$ are the symbol indices of UE 1 and UE 2, respectively. Because $s_n$ has an MASK constellation, the NOMA symbol $X$ will have an MASK constellation too, but with a constellation order $M$, and the symbol spacing will not be uniform as in the single-user case. Moreover, the relation between $\beta_1$ and $\beta_2$ depends on the detector used, hence it will be defined later. The NOMA symbol constellation for $N = 2$ is shown in Figs. 1 and 2 using $M_1 = M_2 = 2$ and 4, respectively. For $N = 2$ and $M_1 = M_2 = 2$, $X \in \{X_0, X_1, X_2, X_3\}$. As can be noted from the constellation diagram, the leftmost bits belong to UE 1 and the rightmost bits belong to UE 2.
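To illustrate why the extra normalization is needed, the sketch below builds the two-user superposed constellation from unit-energy unipolar MASK symbols and measures its average energy. It assumes that both users' spacings are rescaled by one common factor found numerically; this is a stand-in for the closed-form spacing derived in the paper, which is not reproduced here, and the function names are hypothetical.

```python
import math

def mask_spacing(M):
    """Spacing of a unit-energy unipolar M-ASK constellation."""
    return math.sqrt(6.0 / ((2 * M - 1) * (M - 1)))

def noma_constellation(beta1, beta2, M1, M2):
    """Superposed two-user amplitudes X_k, k = 0..M1*M2-1, scaled so E[X^2] = 1.

    Index convention: k // M2 is the UE 1 symbol index, k % M2 is the UE 2 index.
    Assumption: both spacings are rescaled by one common factor.
    """
    d1, d2 = mask_spacing(M1), mask_spacing(M2)
    raw = [math.sqrt(beta1) * d1 * (k // M2) + math.sqrt(beta2) * d2 * (k % M2)
           for k in range(M1 * M2)]
    mean_energy = sum(x * x for x in raw) / len(raw)
    scale = 1.0 / math.sqrt(mean_energy)
    return [scale * x for x in raw], mean_energy

X, raw_energy = noma_constellation(0.8, 0.2, 2, 2)
print("energy before rescaling:", round(raw_energy, 4))
print("normalized points:", [round(x, 4) for x in X])
```

With $\beta_1 = 0.8$ and $\beta_2 = 0.2$ the energy before rescaling exceeds one even though the power factors sum to one, because the unipolar symbols have a nonzero mean and the cross term does not vanish.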

B. Received Signal and Receiver Design
In flat fading channels, the received signal at UE $n$ can be represented as
$$r_n = h_n X + w_n, \qquad (8)$$
where $w_n$ represents the additive white Gaussian noise (AWGN), $w_n \sim \mathcal{CN}(0, 2\sigma_w^2)$, and $h_n$ denotes the SIC-ordered channel fading between the eNB and UE $n$. The unordered channel fading coefficients of the UEs are independent and identically distributed (i.i.d.) and follow the complex normal distribution $\mathcal{CN}(0, 2\sigma_h^2)$. Then, an amplitude-coherent detector (ACDr) can be used to recover the information symbols from (8). However, in the NOMA case, there are multiple detectors that can be used, each of which may offer a particular BER, complexity, and delay.
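The phase independence of the amplitude-coherent decision variable can be seen in a few lines. This is a minimal sketch, assuming the flat-fading model $r = hX + w$ and the decision variable $\zeta = |r|^2/|h|^2$ used by the detectors in this section; the helper names and numeric values are illustrative only.

```python
import random

def decision_variable(x, h, w):
    """Return zeta = |r|^2 / |h|^2 for r = h*x + w; only |h| is needed at the receiver."""
    r = h * x + w
    return abs(r) ** 2 / abs(h) ** 2

def complex_gaussian(rng, sigma):
    """Sample from CN(0, 2*sigma^2): independent N(0, sigma^2) real and imaginary parts."""
    return complex(rng.gauss(0.0, sigma), rng.gauss(0.0, sigma))

rng = random.Random(1)
h = complex_gaussian(rng, 1.0)
x = 1.2  # an arbitrary real, non-negative NOMA amplitude
zeta_clean = decision_variable(x, h, 0j)
zeta_rotated = decision_variable(x, h * 1j, 0j)  # same |h|, different phase
zeta_noisy = decision_variable(x, h, complex_gaussian(rng, 0.05))
print(zeta_clean, zeta_rotated, zeta_noisy)
```

Without noise, the decision variable equals the symbol energy $X^2$ regardless of the channel phase, which is why the unipolar (non-negative) constellation can be detected from the amplitude alone.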

1) Conventional ACDr:
A conventional ACDr simply considers the NOMA signal as a single-user signal, and then applies the optimal [22, eq. 19], suboptimal [22, eq. 24] or heuristic [22, eq. 26] ACD rules. Due to its low complexity, near-optimal performance, and tractable analysis, we consider the heuristic ACDr, which will be denoted as the near-maximum likelihood detector (NMLD) due to its similarity with the MLD. To avoid any overlap between the superposed constellation points, the power allocation factors for UE 1 and UE 2 should satisfy the relation in (9). For example, for $N = 2$ and $M_1 = M_2 = 2$, the relation reduces to $\beta_1 > \beta_2$. The NMLD for a NOMA signal can be formulated as
$$\hat{X} = X_{\hat{k}}, \quad \hat{k} = \arg\min_{k:\,E_k \in \mathcal{E}} \left|\zeta_n - E_k\right|^2, \qquad (10)$$
where $\mathcal{E} = \{E_0, E_1, \ldots, E_{\bar{M}}\}$, $\bar{M} = M - 1$, and $\zeta_n = |r_n|^2/\alpha_n^2$ [22]. Therefore, for all users, the detector computes the Euclidean distance (ED) between $\zeta_n$ and all $M$ possible constellation points in the NOMA symbol, and chooses the symbol with the minimum ED. Equivalently, the detector compares $\zeta_n$ with $\bar{M}$ thresholds and chooses the detected symbol such that
$$\hat{X} = X_k \ \ \text{if} \ \ t_k < \zeta_n \leq t_{k+1}, \qquad (11)$$
where
$$t_k = \frac{E_{k-1} + E_k}{2}, \quad k \in \{1, 2, \ldots, \bar{M}\}, \quad t_0 = 0, \quad t_M = \infty. \qquad (12)$$
For $N = 2$, the energy of the $k$th symbol can be computed as
$$E_k = X_k^2 = \left(\sqrt{\beta_1}\,\delta_1 \lfloor k/M_2 \rfloor + \sqrt{\beta_2}\,\delta_2\,(k \bmod M_2)\right)^2. \qquad (13)$$
Then, UE $n$ can extract its own symbol using the mapping $\hat{X} \rightarrow [\hat{s}_1, \ldots, \hat{s}_N]$. It is worth noting that all UEs have exactly the same receiver structure when the NMLD is considered. Moreover, all UEs will experience the same detection delay, which is generally small because all EDs can be calculated simultaneously.
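The two equivalent forms of the NMLD decision, minimum distance and threshold comparison, can be sketched as follows. The example assumes that the thresholds are the midpoints between adjacent sorted symbol energies, which is what the minimum-distance rule implies; the energy values are hypothetical and the function names are not from the paper.

```python
from bisect import bisect_left

def nmld_min_distance(zeta, energies):
    """Index k minimizing (zeta - E_k)^2 over the superposed symbol energies."""
    return min(range(len(energies)), key=lambda k: (zeta - energies[k]) ** 2)

def nmld_thresholds(zeta, energies):
    """Same decision via midpoints between adjacent (sorted) energies."""
    mids = [(a + b) / 2.0 for a, b in zip(energies, energies[1:])]
    # Number of midpoints strictly below zeta is the detected index.
    return bisect_left(mids, zeta)

def split_index(k, M2):
    """Map the NOMA symbol index to (UE 1 index, UE 2 index) for N = 2."""
    return k // M2, k % M2

energies = [0.0, 0.3, 1.1, 2.6]  # hypothetical sorted energies, M1 = M2 = 2
for zeta in (0.1, 0.2, 0.9, 5.0):
    k = nmld_min_distance(zeta, energies)
    print(zeta, k, split_index(k, 2), nmld_thresholds(zeta, energies))
```

Both functions return the same index away from exact ties; each UE then keeps only its own component of the detected pair.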
2) Conventional SIC-Based ACDr: The conventional SIC-based ACDr (C-SIC) is similar to coherent NOMA SIC detectors, where the signal with the maximum power is detected first while all other users' signals are treated as unknown additive noise. That is, UE 1 detects its own signal directly, whereas UE 2 detects its own signal by cancelling the interference caused by UE 1 using the estimated value of $s_1$ and considering the signals $s_n \ \forall n > 2$ as unknown additive noise, and so forth for all remaining users. For UE 1, ignoring the interference can be realized by assuming that $k_n = 0 \ \forall n > 1$. It is worth noting that assuming $\beta_n = 0 \ \forall n > 1$ could be misleading for UE 1. As can be noted from Fig. 3a, UE 1 can use a conventional ACDr to detect its own symbol,
$$\hat{s}_1 = \arg\min_{s_{1,k} \in \mathcal{S}_1} \left|\zeta_1 - \beta_1 s_{1,k}^2\right|^2, \qquad (14)$$
where $\hat{s}_1$ is the estimated data symbol of UE 1 and $\mathcal{S}_1 = \{s_{1,0}, s_{1,1}, \ldots, s_{1,\bar{M}_1}\}$. Equivalently, the detector may compare $\zeta_1$ with the following set of thresholds,
$$\lambda_m = \frac{E_{(m-1)M_2} + E_{mM_2}}{2}, \quad m \in \{1, 2, \ldots, \bar{M}_1\}, \qquad (15)$$
with $\lambda_0 = 0$ and $\lambda_{M_1} = \infty$, and perform the decision similar to (11).
As depicted in Fig. 3a, the C-SIC is expected to provide reliable BER at high SNRs when $\beta_1 \gg \beta_2$, because the constellation points are then generally far from the thresholds. However, as shown in Fig. 3b, decreasing $\beta_1$ slightly introduces severe interference because one or more superposed NOMA symbols become larger than the thresholds. As an example, Fig. 3b shows that $E_3 > \lambda_1$ and $E_7 > \lambda_2$. Therefore, although the NMLD can detect both configurations reliably, the C-SIC cannot, because the interference introduces high error floors. To avoid error floors, $\beta_1$ and $\beta_2$ should satisfy the relation in (16), whose derivation is given in Appendix I. Consequently, the C-SIC is expected to have worse BER performance than the NMLD.
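To derive the C-SIC for UE 2, we recall the coherent NOMA where UE 2 first eliminates the interference of UE 1 by computing $\acute{r}_2 = r_2/h_2 - \sqrt{\beta_1}\,\hat{s}_1$, and then it computes
$$\hat{s}_2 = \arg\min_{s_{2,k} \in \mathcal{S}_2} \left|\acute{r}_2 - \sqrt{\beta_2}\,s_{2,k}\right|^2, \qquad (17)$$
where $\mathcal{S}_2 = \{s_{2,0}, s_{2,1}, \ldots, s_{2,\bar{M}_2}\}$. By substituting $\acute{r}_2$ in (17) we obtain
$$\hat{s}_2 = \arg\min_{s_{2,k} \in \mathcal{S}_2} \left|\frac{r_2}{h_2} - \sqrt{\beta_1}\,\hat{s}_1 - \sqrt{\beta_2}\,s_{2,k}\right|^2. \qquad (18)$$
By replacing $r_2/h_2$ with $|r_2|^2/|h_2|^2 \triangleq \zeta_2$, and generalizing the result for UE $n$, we obtain the detector in (19). Interestingly, the detector in (19) can be used to detect the symbols for UE 2 to UE $N$ successively, which reduces the complexity as compared to (10). However, it is different from the conventional SIC detector used in coherent NOMA because the interference is not effectively cancelled.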
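The error-floor condition for UE 1 under the C-SIC can be explored numerically. The sketch below assumes that the UE 1 thresholds are the midpoints between the energies obtained with the UE 2 index set to zero, and that both users share a unit amplitude spacing; both choices are illustrative assumptions, as are the function names.

```python
import math

def energies_two_users(beta1, beta2, M1, M2, d1=1.0, d2=1.0):
    """E_k = (sqrt(beta1)*d1*(k // M2) + sqrt(beta2)*d2*(k % M2))^2."""
    return [(math.sqrt(beta1) * d1 * (k // M2)
             + math.sqrt(beta2) * d2 * (k % M2)) ** 2
            for k in range(M1 * M2)]

def csic_thresholds(E, M1, M2):
    """Finite UE 1 thresholds: midpoints between E_{(m-1)M2} and E_{m M2}."""
    return [(E[(m - 1) * M2] + E[m * M2]) / 2.0 for m in range(1, M1)]

def csic_is_floor_free(E, M1, M2):
    """True if the top point of every UE 1 group stays below the next threshold."""
    lam = csic_thresholds(E, M1, M2)
    return all(E[m * M2 - 1] < lam[m - 1] for m in range(1, M1))

for beta1 in (0.98, 0.94):
    E = energies_two_users(beta1, 1.0 - beta1, 4, 4)
    print(beta1, csic_is_floor_free(E, 4, 4))
```

Under these assumptions, $\beta_1 = 0.98$ keeps every superposed point on the correct side of the UE 1 thresholds, whereas $\beta_1 = 0.94$ pushes $E_3$ above $\lambda_1$ and $E_7$ above $\lambda_2$ even though the constellation points themselves do not overlap.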

C. Improved SIC Detector
As noted for the C-SIC case, ignoring the interference may cause severe BER degradation, even when the appropriate values of $\beta_1$ and $\beta_2$ are used. Therefore, we propose an improved SIC (I-SIC) detector that considers the interference to reduce the BER. Based on the bit mapping of the constellation diagram, it can be noted that UE 1 bits are always fixed over $M_1$ points. As an example, Fig. 1 shows the case where $M_1 = M_2 = 2$; therefore, the leftmost bit in the first two constellation points is 0, and it is 1 in the third and fourth constellation points. Therefore, the detector does not have to compute the distance to all $M$ constellation points as in the case of the NMLD. Instead, it is sufficient to compute the distance to the constellation points where the bits of UE 1 change their values, which correspond to the set $\mathcal{E}_I$ that has $2\bar{M}_1$ elements,
$$\mathcal{E}_I = \left\{E_{mM_2 - 1},\ E_{mM_2} : m = 1, 2, \ldots, \bar{M}_1\right\}. \qquad (20)$$
Consequently, the I-SIC detector can be expressed as
$$\hat{s}_1 = s_{1,\lfloor \hat{k}/M_2 \rfloor}, \quad \hat{k} = \arg\min_{k:\,E_k \in \mathcal{E}_I} \left|\zeta_1 - E_k\right|^2. \qquad (21)$$
For other UEs, the detector in (19) can be used since both detectors have generally the same principle.
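The claim that UE 1 only needs the points where its bits change can be checked empirically. The following sketch compares the UE 1 decision of a full minimum-distance search with a search restricted to the boundary points; it assumes unit amplitude spacing for both users and a non-overlapping, sorted constellation, and all names are illustrative.

```python
import math
import random

def energies_two_users(beta1, beta2, M1, M2):
    """Illustrative energies E_k with unit spacing for both users."""
    return [(math.sqrt(beta1) * (k // M2) + math.sqrt(beta2) * (k % M2)) ** 2
            for k in range(M1 * M2)]

def isic_reference_indices(M1, M2):
    """Indices where the UE 1 bits change: m*M2 - 1 and m*M2 for m = 1..M1-1."""
    idx = []
    for m in range(1, M1):
        idx.extend((m * M2 - 1, m * M2))
    return idx

def ue1_by_nmld(zeta, E, M2):
    """UE 1 index from a minimum-distance search over all points."""
    k = min(range(len(E)), key=lambda i: (zeta - E[i]) ** 2)
    return k // M2

def ue1_by_isic(zeta, E, M1, M2):
    """UE 1 index from a minimum-distance search over the boundary points only."""
    k = min(isic_reference_indices(M1, M2), key=lambda i: (zeta - E[i]) ** 2)
    return k // M2

E = energies_two_users(0.95, 0.05, 4, 4)
rng = random.Random(0)
trials = [rng.uniform(0.0, 1.2 * E[-1]) for _ in range(10000)]
agree = sum(ue1_by_nmld(z, E, 4) == ue1_by_isic(z, E, 4, 4) for z in trials)
print(isic_reference_indices(4, 4), agree, "of", len(trials))
```

For $M_1 = M_2 = 4$ the restricted search uses six reference points instead of sixteen, and the two UE 1 decisions coincide for every tested value of the decision variable.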

III. COMPUTATIONAL COMPLEXITY ANALYSIS
Because there are three possible ACDr implementations, it is necessary to evaluate and compare the performance of each detector. In this section, we discuss the complexity of the three detection designs.

A. NMLD
As can be noted from (10), all UEs should search for the symbol that has the minimum ED with respect to $\zeta_n$. Therefore, each UE should compute $\zeta_n$, and then compute $M$ EDs for each received symbol. The decision variable $\zeta_n$ should be computed once at each UE; hence, it requires computing two complex multiplications ($C_M$) and one real division ($R_D$). By noting that one $C_M$ requires four real multiplications ($R_M$) and three real additions ($R_A$), computing $\zeta_n$ requires eight $R_M$, one $R_D$, and six $R_A$. Using the mapping in [31], where one $R_M$ is equivalent to four $R_A$ and one $R_D$ is equivalent to 11 $R_A$, $\zeta_n$ requires 49 $R_A$. The next step is to compute the EDs between $\zeta_n$ and the constellation points, so the detector needs the values $E_0, E_1, \ldots, E_{\bar{M}}$. However, these values can be computed off-line, and should be updated only when $\beta$ or the modulation order of any user changes. Therefore, $\beta$ can be considered fixed for a large number of symbols, and hence, the complexity of updating the constellation points can be ignored. In such scenarios, computing each ED requires one $R_A$ and one $R_M$, which are equivalent to 5 $R_A$ [31]. The average complexity per user in terms of $R_A$ is $R_A^{G} = 5M + 49$. Although the NMLD can be performed by comparing $\zeta_n$ to $\bar{M}$ thresholds as described in (11), comparing two multi-bit numbers requires more hardware complexity as compared to five real additions. Therefore, we consider in this work that all detectors are based on ED computation.
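The operation counts above can be reproduced with simple arithmetic. This is a minimal sketch using the equivalences stated in the text (one real multiplication as four real additions, one real division as eleven); the constant and function names are illustrative.

```python
R_M_IN_R_A = 4    # one real multiplication counted as four real additions
R_D_IN_R_A = 11   # one real division counted as eleven real additions

def complex_mult_cost():
    """One complex multiplication: four real multiplications and three real additions."""
    return 4 * R_M_IN_R_A + 3

def zeta_cost():
    """Cost of zeta as described in the text: two complex multiplications and one real division."""
    return 2 * complex_mult_cost() + R_D_IN_R_A

def ed_cost():
    """One Euclidean distance: one real addition and one real multiplication."""
    return 1 + R_M_IN_R_A

def nmld_cost(M):
    """Real-addition equivalents per detected symbol for the NMLD with M points."""
    return ed_cost() * M + zeta_cost()

print(zeta_cost(), ed_cost(), nmld_cost(4), nmld_cost(16))
```

The first two printed values are the 49 and 5 real-addition equivalents quoted in the text, and the last two evaluate $5M + 49$ for $M = 4$ and $M = 16$.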

B. C-SIC
The complexity of the C-SIC depends on the UE order, where UE 1 has the lowest complexity because it needs to compute only $M_1$ EDs. Therefore, $R_A(\mathrm{UE}_1) = 5M_1 + 49$. The other users have to perform the SIC, whose complexity can be summarized as follows. For UE $n$, the receiver should initially compute $\{\hat{s}_1, \hat{s}_2, \ldots, \hat{s}_{n-1}\}$, and then compute its own symbol $\hat{s}_n$. Such a process should be performed by computing (19) successively, starting with UE 1 and ending with UE $n$. By considering that the interference and trial values of the $s_n$ term in (19) can be computed once off-line and stored in a look-up table (LUT), detecting $\hat{s}_n \,|\, \{\hat{s}_1, \hat{s}_2, \ldots, \hat{s}_{n-1}\}$ requires computing $M_n$ EDs. Consequently, UE $n$ needs to compute $M_1 + M_2 + \cdots + M_n$ EDs.

C. I-SIC
The I-SIC structure is generally similar to the C-SIC except that the number of EDs that UE $n$ should compute is given by $2(\bar{M}_1 + \bar{M}_2 + \cdots + \bar{M}_n) \ \forall n < N$, and $2(\bar{M}_1 + \bar{M}_2 + \cdots + \bar{M}_{N-1}) + M_N$ for $n = N$, where $\bar{M}_n = M_n - 1$. Therefore, the complexity for UE $n$ is given by (25), and the average complexity per user can be computed as in (26). It is also worth noting that some additional complexity reduction can be achieved for the C-SIC and I-SIC detectors, by noting that some of the EDs used to obtain $\hat{s}_i$ can be used to detect symbol $s_{i+1}$ as well. For example, UE 2 in Fig. 4 has to find $\hat{s}_1$ by computing the ED with reference points $E_3$, $E_4$, $E_7$, $E_8$, $E_{11}$ and $E_{12}$. Then, given that $\hat{s}_1 = s_{1,1}$, which corresponds to bits 01, the detector can use the already computed EDs for $E_4$ and $E_7$ to find $\hat{s}_2$. The C-SIC can also use the same approach, but it can benefit only from one ED. Having said that, by noting that the complexity of computing an ED is small because it is performed using real values, and the modulation orders are typically small, we consider the complexity for the C-SIC and I-SIC to be as described in (24) and (26), respectively. Table I presents $R_{A,n}$ for the three detectors using $N = 3$, and $M_n = 2$ and 4. As can be noted from the table, the complexity for the three detectors is roughly close for $M_n = 2$, but the difference becomes apparent for $M_n = 4$, particularly for the NMLD. For all scenarios, the C-SIC and I-SIC have a small complexity difference.
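The number of distance computations per user can be tabulated for the three detectors with a short script. The sketch below assumes the ED counts described in this section (all $M$ points for the NMLD, $M_1 + \cdots + M_n$ for the C-SIC, and boundary points only for the I-SIC except for the last user's own symbol), and it further assumes a cost model of five real additions per ED plus 49 for the decision variable for every detector. It is not intended to reproduce the entries of Table I.

```python
from math import prod

def ed_counts(orders):
    """EDs per UE for each detector; orders = [M_1, ..., M_N]."""
    N = len(orders)
    M = prod(orders)
    nmld = [M] * N
    csic = [sum(orders[:n]) for n in range(1, N + 1)]
    isic = []
    for n in range(1, N + 1):
        if n < N:
            isic.append(2 * sum(m - 1 for m in orders[:n]))
        else:
            isic.append(2 * sum(m - 1 for m in orders[:N - 1]) + orders[-1])
    return {"NMLD": nmld, "C-SIC": csic, "I-SIC": isic}

def to_real_additions(eds, ed_cost=5, zeta_cost=49):
    """Assumed cost model: 5 real additions per ED plus 49 for zeta."""
    return [ed_cost * e + zeta_cost for e in eds]

for M_n in (2, 4):
    counts = ed_counts([M_n] * 3)
    print(M_n, {name: to_real_additions(v) for name, v in counts.items()})
```

Under these assumptions the three detectors are close for $M_n = 2$, while for $M_n = 4$ the NMLD count grows with the product of the modulation orders and the two SIC variants stay within a few EDs of each other.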
Another aspect that the detectors should be compared for is time delay. While the NMLD can compute all EDs simultaneously, SIC-based detectors have to perform the detection process sequentially. Consequently, the hardware throughput may decrease significantly. The throughput in this context is defined as the number of clock cycles required to detect one information symbol. The relative throughput of the NMLD with respect to the SIC for UE $n$ is $1/n$.

TABLE II: Hamming distance $H_1(X_k, \hat{X}_m)$ for $M_1 = M_2 = 4$ (rows: transmitted $X_k$; columns: detected $\hat{X}_m$).

                      X̂_0 ··· X̂_3    X̂_4 ··· X̂_7    X̂_8 ··· X̂_11    X̂_12 ··· X̂_15
X_0 ··· X_3                0               1               2                1
X_4 ··· X_7                1               0               1                2
X_8 ··· X_11               2               1               0                1
X_12 ··· X_15              1               2               1                0

IV. NMLD BER ANALYSIS
Based on (10), the NMLD compares the decision variable $\zeta_n$ with all possible symbol energies, and selects the one with the minimum ED. Therefore, the bit error probability for UE $n$, $P_B^{(n)}$, can be defined as in (27), where $\Pr(X_k, \hat{X}_m) = \Pr(X = X_k, \hat{X} = \hat{X}_m)$, $\Pr(E_k) = 1/M$, and $H_n(X_k, \hat{X}_m)$ is the Hamming distance between $X_k$ and $\hat{X}_m$ with respect to UE $n$. As an example, given that $M_1 = M_2 = 4$ and the leftmost two bits correspond to UE 1, $H_1(X_k, \hat{X}_m)$ can be computed as shown in Table II.
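The Hamming distances with respect to UE 1 can be generated programmatically. The sketch below assumes that the UE 1 bits are Gray coded over the UE 1 symbol index (00, 01, 11, 10 for $M_1 = 4$); this is an assumption, although it is consistent with the distances 0, 1, 2, 1 listed for $M_1 = M_2 = 4$. The function names are illustrative.

```python
def gray(k):
    """Binary-reflected Gray code of integer k."""
    return k ^ (k >> 1)

def hamming_ue1(k, m, M2):
    """Hamming distance between the UE 1 bits of NOMA symbols k and m."""
    return bin(gray(k // M2) ^ gray(m // M2)).count("1")

def ue1_group_table(M1, M2):
    """Distance between UE 1 groups; every symbol in a group shares the UE 1 bits."""
    return [[hamming_ue1(a * M2, b * M2, M2) for b in range(M1)] for a in range(M1)]

for row in ue1_group_table(4, 4):
    print(row)
```

Each row corresponds to a block of four transmitted symbols that share the same UE 1 bits, and each column to a block of four detected symbols.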
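The conditional probability $\Pr(\hat{X}_m | X_k)$ at UE $n$ is given by
$$\Pr(\hat{X}_m | X_k) = \int_{t_m}^{t_{m+1}} f_n(\zeta_n | E_k)\, d\zeta_n, \qquad (28)$$
where the thresholds $t_m$ for $m \in \{1, 2, \ldots, \bar{M}\}$ are defined in (12). By noting that $\zeta_n$ depends on $\alpha_n \triangleq |h_n|$, $f_n(\zeta_n | E_k)$ is conditionally non-central Chi-squared with respect to $\alpha_n$ and $E_k$. Therefore,
$$f_n(\zeta_n | E_k, \alpha_n) = \frac{\alpha_n^2}{\omega} \exp\!\left(-\frac{\alpha_n^2 (\zeta_n + E_k)}{\omega}\right) I_0\!\left(\frac{2\alpha_n^2 \sqrt{\zeta_n E_k}}{\omega}\right), \qquad (29)$$
where $\omega = 2\sigma_w^2$ and $I_0(\cdot)$ is the modified Bessel function of the first kind and order zero. To satisfy the assumption that $\alpha_1 < \alpha_2 < \cdots < \alpha_N$, we consider that $\alpha_n$ has ordered statistics [12], [13]. Therefore, according to the order statistics theory, the probability density function (PDF) of $\alpha_n$ is given by [32]
$$f_n(\alpha_n) = \Psi_n\, f(\alpha_n)\, [F(\alpha_n)]^{n-1}\, [1 - F(\alpha_n)]^{N-n}, \qquad (30)$$
where $\Psi_n = \frac{N!}{(n-1)!\,(N-n)!}$, and $f(\alpha)$ and $F(\alpha)$ are the Rayleigh PDF and the cumulative distribution function (CDF), which are respectively given by
$$f(\alpha) = \frac{2\alpha}{\Omega}\, e^{-\alpha^2/\Omega} \qquad (31)$$
and
$$F(\alpha) = 1 - e^{-\alpha^2/\Omega}, \qquad (32)$$
where $\Omega = E[\alpha^2] = 2\sigma_h^2$. For $N = 2$, $\Psi_1 = \Psi_2 = 2$, and thus
$$f_1(\alpha_1) = \frac{4\alpha_1}{\Omega}\, e^{-2\alpha_1^2/\Omega}, \qquad (33)$$
$$f_2(\alpha_2) = \frac{4\alpha_2}{\Omega}\, e^{-\alpha_2^2/\Omega} \left(1 - e^{-\alpha_2^2/\Omega}\right). \qquad (34)$$
Therefore, the conditioning on $\alpha_n$ can be eliminated by averaging over the ordered Rayleigh PDFs in (33) and (34) for UE 1 and UE 2, respectively, which gives (35). To simplify the notation, we define the two indefinite integrals in (36) and (37). Consequently, the BER for UE $n$ can be expressed as in (38), with the auxiliary terms defined in (39).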
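The ordered Rayleigh densities used in the averaging step can be sanity-checked numerically. The sketch below assumes $N = 2$ and the standard order-statistics result with $\Psi_1 = \Psi_2 = 2$, namely $f_1 = 2f(1 - F)$ for the weaker user and $f_2 = 2fF$ for the stronger user. It verifies that both densities integrate to one and compares their second moments with a Monte Carlo estimate obtained by sorting two i.i.d. Rayleigh draws; the helper names and sample sizes are illustrative.

```python
import math
import random

def f1(a, omega):
    """PDF of the weaker of two i.i.d. Rayleigh amplitudes with E[alpha^2] = omega."""
    return (4.0 * a / omega) * math.exp(-2.0 * a * a / omega)

def f2(a, omega):
    """PDF of the stronger of two i.i.d. Rayleigh amplitudes."""
    e = math.exp(-a * a / omega)
    return (4.0 * a / omega) * e * (1.0 - e)

def integrate(fn, lo, hi, steps=20000):
    """Composite trapezoidal rule."""
    h = (hi - lo) / steps
    total = 0.5 * (fn(lo) + fn(hi)) + sum(fn(lo + i * h) for i in range(1, steps))
    return total * h

omega = 1.0
area1 = integrate(lambda a: f1(a, omega), 0.0, 10.0)
area2 = integrate(lambda a: f2(a, omega), 0.0, 10.0)
power1 = integrate(lambda a: a * a * f1(a, omega), 0.0, 10.0)
power2 = integrate(lambda a: a * a * f2(a, omega), 0.0, 10.0)

# Monte Carlo cross-check: sort two Rayleigh draws and average the squared amplitudes.
rng = random.Random(7)
sigma_h = math.sqrt(omega / 2.0)
pairs = [sorted(abs(complex(rng.gauss(0, sigma_h), rng.gauss(0, sigma_h)))
                for _ in range(2)) for _ in range(50000)]
mc1 = sum(p[0] ** 2 for p in pairs) / len(pairs)
mc2 = sum(p[1] ** 2 for p in pairs) / len(pairs)
print(area1, area2, power1, power2, mc1, mc2)
```

The average channel power of the weaker user comes out at about half of $\Omega$ and that of the stronger user at about one and a half times $\Omega$, so the two always sum to twice the unordered average.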

V. C-SIC BER ANALYSIS
For the C-SIC, UE 1 assumes that $s_2 = 0$. Therefore, the constellation points correspond to one of the energies $E_{mM_2}$, $m \in \{0, 1, \ldots, \bar{M}_1\}$, given in (40). Consequently, UE 1 uses the $M_1 + 1$ thresholds in (41) to detect the received signal. Following the same approach as for the NMLD, the BER for UE 1 can be computed as in (42), with the conditional probabilities given in (43). By substituting (43) in (42), and noting that $\Pr(E_{mM_2}) = 1/M_1$, we obtain (44). For UE 2, the same energies as in (13) will be used. However, in this case, both sets of thresholds (12) and (41) will be used. The BER for UE 2 can be computed in terms of the joint probabilities $\Pr(\hat{s}_{1,l}, \hat{s}_{2,q} | s_{2,v})$ and $\Pr(\hat{s}_{1,l}, \hat{s}_{2,q} | s_{1,m}, s_{2,v})$.

VI. I-SIC BER ANALYSIS
Unlike the NMLD, it can be noted from (21) that UE 1 is not required to decode the symbols of UE 2 to detect its own signals. Moreover, it is different from the C-SIC because the detection process partially considers the interfering signal of UE 2 . Nevertheless, it is straightforward to show that the detector described in (21) would actually produce the same result as the NMLD, which is due to the special structure of the bit mapping. Therefore, BER of UE 1 can be described as in (44). However, the BER derivation is different from the NMLD as described below.
For UE 1, $P_B^{(1)}$ is defined as in (51), where $H_1(s_{1,m}, \hat{s}_{1,l})$ denotes the Hamming distance between the transmitted and detected bits, and $\Pr(\hat{s}_{1,l} | s_{1,m}) \triangleq \Pr(\hat{s}_1 = s_{1,l} | s_1 = s_{1,m})$ is the probability of detecting $s_{1,l}$ given that symbol $s_{1,m}$ is transmitted. Following the same approach used for the C-SIC, $\Pr(\hat{s}_{1,l} | s_{1,m})$ can be calculated as in (52). Substituting (52) into (51) gives (53). As will be demonstrated in Sec. VII, although the two formulas for $P_B^{(1)}$ given in (53) and (38) are not identical, they provide the same numerical BER results.
For UE 2 , the detector has the structure given in (19), which corresponds to a SIC detector.
Therefore, the outcome of the detection process for UE 2 depends onŝ 1 . However,ŝ 1 is actually obtained using the NMLD and (19) is also based on the NMLD. Therefore, the I-SIC for UE 2 can be considered as a two-step NMLD. Consequently, the BER performance of UE 2 using the NMLD and I-SIC will be similar, which is derived below.

VII. NUMERICAL RESULTS
The analytical and simulated BER of the NMLD/I-SIC for UE 1 and UE 2 are presented in Figs. 5 and 6, respectively. The results are generated for $M_1 = M_2 = 2$ using several values of $\beta_1$ and $\beta_2$. As can be noted from the results, increasing $\beta_1$ consistently improves the BER of UE 1 because it increases the power of UE 1 and reduces the interference from UE 2. For UE 2, increasing $\beta_2$ increases the power of UE 2, but does not necessarily reduce the interference from UE 1, because decreasing $\beta_1$ deteriorates the success rate of the SIC process. Therefore, the BER of UE 2 mostly depends on the difference between the used $\beta_2$ and the optimum $\beta_2$, denoted as $\beta_2^O$, that minimizes the BER. In Fig. 6, $\beta_2^O$ is about 0.2, and thus, significant BER degradation can be noted when deviating from $\beta_2 = 0.2$. As can be noted from both figures, the analytical and simulation results match very well. Moreover, the BERs of the NMLD and I-SIC are identical.
With respect to UE 1, increasing $\beta_1$ improves the SNR and reduces the interference. For UE 2, increasing $\beta_2$ improves the SNR, but also increases the interference. Therefore, the BER is ordered according to the difference between $\beta_2$ and $\beta_2^O$; moreover, the BER is more sensitive if $\beta_2 > \beta_2^O$.
Figs. 9 and 10 compare the BER of the NMLD and the coherent detector. As can be noted from Fig. 9, the difference between the two detectors for UE 1 depends on the interference, which is inversely proportional to $\beta_1$. When the interference level is high, the impact of the interference becomes more significant than that of the channel phase information, and vice versa. Therefore, for high values of $\beta_1$, the interference is small and the BER difference is high. By decreasing $\beta_1$, the interference increases and the BER difference decreases. Such behavior can be justified by noting that the BER is dominated by the interference level, not the channel phase information.
For the BER of UE 2 in Fig. 10, the results follow the same trend observed in Fig. 9. However, the interference depends on the difference between $\beta_2$ and $\beta_2^O$, and on whether $\beta_2 < \beta_2^O$. Moreover, the BER difference is affected by the fact that the coherent detector and the NMLD have different optimum values of $\beta_2$. To clarify this point, Fig. 11 shows the BER of the NMLD and the coherent detector versus the normalized $\beta_1$, denoted $\bar{\beta}_1$, where $\bar{\beta}_1 = 10(\beta_1 - 0.9)$ for $M_1 = M_2 = 4$, and $\bar{\beta}_1 = 2(\beta_1 - 0.5)$ for $M_1 = M_2 = 2$. As can be noted from the figure, the difference for $\bar{\beta}_1 < 0.35$ ($\beta_1 < 0.935$, $\beta_2 = 0.065$) is negligible, and it becomes more significant when $\beta_2$ approaches $\beta_2^O$ for both detectors. In the worst case scenario, the difference is about 3 dB. The results generally follow those of the $M_1 = M_2 = 2$ case, where decreasing $\beta_1$ increases the interference level, and thus, the difference between the two detectors vanishes, particularly at high SNRs. At low SNRs, the coherent detector is more robust to AWGN, which is manifested as a slight BER improvement. Nevertheless, the BER for both detectors is relatively high at such SNRs.
It is also worth noting that the effect of interference becomes more significant at higher order modulation; therefore, the BER difference becomes less apparent. For example, the maximum BER difference is about 0.75 dB at $\beta_1 = 0.99$. For the case of UE 2, the BER difference between the two detectors is about 0.36 dB for all values of $\beta_2$, except for $\beta_2 = 0.09$, where it becomes roughly nil. Such behavior is due to the fact that higher order modulations are more sensitive to interference, and hence, the effect of the channel phase becomes less pronounced. Fig. 11 shows the impact on the BER of increasing the modulation order from 2 to 4. Fig. 14 presents the BER versus $\beta_2$ for the NMLD and C-SIC. The figure shows that UE 1 with the C-SIC has a clear disadvantage at high $\beta_2$ values, which is due to the fact that UE 1 ignores the interference.
As $\beta_2 \rightarrow 0$, the BERs of UE 1 using both detectors converge to the same values. For UE 2, the BER using both detectors is almost equal for $\beta_2 \leq 0.023$. For $\beta_2 > 0.023$, UE 2 fails to detect the signal of UE 1 successfully, and hence, the SIC process fails most of the time, which causes the BER of UE 2 to rise sharply. Therefore, the range of $\beta_1$ and $\beta_2$ for which the C-SIC provides reliable BERs is much narrower than that of the NMLD and I-SIC.

VIII. CONCLUSION
This paper presented a NOMA system that overcomes the need for knowledge of the channel phase information by utilizing an amplitude-coherent detector instead of the typically used coherent detector. Three different detectors were designed and their BER and complexity were analyzed. One of the derived detectors is based on the ML principle, denoted as the NMLD, while the other two are based on SIC and denoted as the C-SIC and I-SIC. The BER analysis, verified by Monte Carlo simulations, shows that the NMLD and I-SIC have the same BER performance, though the I-SIC has less complexity. On the other hand, while the C-SIC has the lowest complexity among the three detectors, this comes at the expense of some BER degradation. Moreover, the relation between $\beta_1$ and $\beta_2$ for the C-SIC is different from the other two. More specifically, $\beta_1/\beta_2$ for the C-SIC is larger than that for the I-SIC and NMLD. The BER of the proposed detectors also shows that the impact of the channel phase is small in NOMA systems, owing to the fact that multiuser interference outweighs any degradation due to the missing channel phase. As such, the proposed system is a potential alternative for systems where unipolar MASK is used.

APPENDIX I: RELATION BETWEEN $\beta_1$ AND $\beta_2$ FOR THE C-SIC
To avoid severe BER degradation for the C-SIC, we should have $E_{mM_2 - 1} < \lambda_m$. By substituting the symbol energies from (13), rearranging the terms, and noting that the edge constellation point corresponds to the worst case scenario, $m$ should be replaced by $M_1 - 1$, which yields the relation in (16).

APPENDIX II: BER ANALYSIS OF COHERENT DETECTION
For CD, the detector can be derived by applying the MLD rule. After phase equalization, the received signal $r_n$ in (8) can be written as $y_n = \alpha_n X_k + w_n$, $k \in \{0, 1, \ldots, \bar{M}\}$. Therefore, the MLD detector can be formulated as
$$\hat{X} = \arg\min_{X_k} \left|y_n - \alpha_n X_k\right|^2.$$
After some straightforward manipulations, it can be shown that the detector can be expressed as $\hat{X} = X_k$ if $\tau_k < a_n \leq \tau_{k+1}$, where $a_n = \mathrm{Re}(y_n)/\alpha_n$. The pairwise error probability (PEP) for UE $n$, $\Pr_n(X_k, \hat{X}_i) \triangleq \Pr(X_k \rightarrow \hat{X}_i \,|_{i \neq k})$, can be defined in terms of $F_{A_n}(\tau_i | X_k)$, which is the CDF of $a_n$ given $X_k$. By noting that $a_n \sim \mathcal{N}(X_k, \sigma_w^2/\alpha_n^2)$, $f_{A_n}(a_n | X_k)$ can be written as
$$f_{A_n}(a_n | X_k, \alpha_n) = \frac{\alpha_n}{\sqrt{\pi\omega}}\, e^{-\alpha_n^2 \theta_{n,k}^2/\omega}$$
and
$$F_{A_n}(a_n | X_k, \alpha_n) = \int \frac{\alpha_n}{\sqrt{\pi\omega}}\, e^{-\alpha_n^2 \theta_{n,k}^2/\omega}\, da_n = \frac{1}{2} - Q\!\left(\frac{\theta_{n,k}}{\sigma_w}\,\alpha_n\right),$$
where $\theta_{n,k} \triangleq a_n - X_k$. To compute the CDF for UE 1, the conditioning over $\alpha_n$ should be eliminated by averaging over the PDF of $\alpha_n$. The CDF for UE 2 is obtained in the same manner and can be solved in closed form. Finally, substituting (73) in (27) gives $P_B^{(n)}$.