A Tensor-Based Optical Camera Communication (OCC) System With Joint Data Detection and Video Restoration

The Optical Camera Communication (OCC) technology allows Visible Light Communications (VLC), one of the key technologies for the Sixth Generation (6G) of mobile communications, to be used with image sensors on the receiver side. For Screen-to-Camera (S2C) applications, in which Light-Emitting Diode (LED) screens are used to send information symbols, increasing either the transmitted image resolution or the video frame rate may negatively impact the bit error performance and the visual quality of the OCC-encoded videos. This work proposes a completely new OCC/S2C transmission and reception system, in which it is possible to recover the transmitted symbols and restore the video quality at the same time. For this dual task, a Parallel Factor Analysis (PARAFAC) decomposition is applied to model an OCC system for the first time in the literature. Computational simulations corroborate the correct formulation of the proposed models, propositions, and algorithms, and demonstrate that increasing the number of frames of the OCC-encoded video can generate diversity gains. This work also discusses the practical aspects of data rate, data volume, and computational complexity of the receiver, according to the system's parameters.


Index Terms
Optical communication, signal processing, camera, video signal processing

I. INTRODUCTION
Optical Camera Communication (OCC) is a recent technology with a high potential for rapid implementation in society. This technology was standardized in [1] as a sub-standard of the Visible Light Communication (VLC) technology. What differentiates OCC from VLC is that, in the former, the receiver is based not on discrete photodiodes (PDs) but on Image Sensors (IS), typically found in all modern smartphones [2]. Since the market penetration of these devices is expected to grow rapidly in the coming years, the OCC technology is highly attractive, with several applications already foreseen or tested, such as vehicular communications [3], [4], indoor positioning/localization [5], [6], broadcasting [7], and digital signage, among others [8]. OCC, and VLC as a whole, are considered potential technologies of the Sixth Generation (6G) of wireless communications [9], [10].
requirements. For the parallel task of video transmission, one must encode the binary message into the video pixels without impairing the subjective visual quality perceived by the human eye.
Due to the limitation of the camera's frame rate, on the order of less than a hundred frames per second for low-cost smartphones, the concept of Optical Multiple-Input Multiple-Output (MIMO) is usually exploited to enhance the data rate. Thanks to the high number of LEDs on a screen and the high number of photodetectors present in an image sensor, ultra-massive MIMO theoretical channel capacity can be achieved with S2C systems [13].
In spite of such promising benefits, the duality in the transmission of symbols and visible image conditions the performance of an S2C system to two distinct but related propagation issues. The first one concerns the so-called interpixel interference (IPI), which can generally be summarized as the case in which the light intensity of a pixel hinders the identification of the intensity of its neighbors. In addition, in general, IPI can be related both to the blooming effect and to classical image degradation phenomena, such as optical blurring or motion blurring.
For the blooming effect, the luminous power of a given pixel is strong enough to illuminate the neighboring captured pixels. The works [15], [16] propose ways to mitigate this issue. In this case, even classic image restoration methods can be used to mitigate the blurring effects, generated either by the low resolution or by the low capture rate of the optical camera.
The second propagation issue is the negative visual impact of embedding information symbols on the pixels of encoded video. Unlike the previous issue, inherently caused by hardware limitations or due to non-idealities of the optical channel, this issue is exclusively related to the OCC coding and modulation.
All these modulations, specifically proposed for VLC or OCC systems, inherently carry characteristics that affect the perception of light by human vision to a greater or lesser degree. For example, the CSK modulation prevents flickering by transmitting at constant power. In contrast, the UFSOOK modulation [21] cleverly reconciles the need for a high pulse rate in the transmission (to avoid flickering perception) with an undersampling approach at the receiver (to deal with the low capture rates of commercial cameras). While the accepted threshold of screen flickering is less critical in S2C systems [18], their satisfactory performance inevitably ends up being also defined by the deterioration of the subjective visual quality caused by the symbol-pixel modulation [22].
This work proposes a new S2C/OCC system based on tensor modeling. More precisely, this is the first time the OCC (or S2C) technology has been combined with tensor modeling. This tensor approach allows symbol detection and video restoration to be performed concomitantly on the video captured by the optical camera. In other words, it proposes a system with a better compromise between data estimation performance and video transmission quality.
The use of tensor models has seen considerable growth in the past decades, with applications in wireless communications [23], [24], signal processing and machine learning [25], Big Data analysis [26], [27], biology and health studies [28], [29], and many other fields.
Perhaps the most significant advantage of using tensors in system modeling lies in relaxing parameter estimation conditions. For example, in wireless communications, tensor modeling of the transmission system usually allows the receiver to estimate symbol and channel matrices from multidimensional signals under less stringent conditions than in purely matrix-based approaches.
Several tensor models have been proposed in the literature. Although the vast majority have been proposed in the last two decades [25], perhaps the best-known decomposition is the PARAFAC decomposition [30]. In the trilinear PARAFAC decomposition, up to three matrix factors can be jointly estimated, and the solution is unique under certain relaxed conditions.
DRAFT September 7, 2021
Briefly, the combination of tensor modeling and S2C/OCC systems is responsible for the following contributions of this article:
• A complete OCC/S2C communication system based on the PARAFAC decomposition is developed. In this work, information signals modulate each LED/pixel of a digital screen using a Khatri-Rao Space-Time (KRST) coding [31]. More precisely, arbitrary digital video frames, such as those from a movie, are encoded in blocks by sequences of symbols to generate a video tensor. The tensor's multidimensionality comes from the original video's pixel resolution, the video frames, and the symbol time-slots. System parameters, such as the number of video frames, define the trade-off between Symbol Error Rate (SER), video restoration performance, and receiver complexity.
• A non-iterative blind receiver named OCC-KRF is proposed for this scheme. Unlike other OCC receivers in the literature, OCC-KRF has a special dual function: it allows at the same time the estimation of information symbols and the image restoration of the captured video.
• Formulas for data rate, generated data volume, and computational cost are derived. Design rules are proposed to assure the essential uniqueness and identifiability conditions of the proposed system. It is also shown that the receiver easily avoids all ambiguities of the PARAFAC decomposition if it knows at least a single (pilot) frame of the original video.
This work is segmented into eight sections. In Section II, the mathematical notation used in this work is defined, along with a brief review of the PARAFAC tensor decomposition. In Section III, the proposed system is developed. Section IV describes how the OOK, PPM, and PWM modulations can be made compatible with the proposed scheme.
In Section V, the OCC-KRF algorithm is introduced, along with data rate, data volume, and receiver complexity calculations. In Section VI, numerical analysis through computer simulations is performed to solidify the contributions' validity. After the conclusion in Section VII, the Appendix brings three design rules in the form of mathematical propositions dedicated to establishing uniqueness and identifiability conditions for the models and systems of this work.

II. NOTATIONS AND REVIEW ON PARAFAC
Scalars, column vectors, matrices, and tensors are denoted by lower-case ($x$), boldface lower-case ($\mathbf{x}$), boldface capital ($\mathbf{X}$), and calligraphic ($\mathcal{X}$) letters, respectively. $\mathbf{X}^T$, $\mathbf{X}^*$, and $\mathbf{X}^\dagger$ are the transpose, the conjugate, and the pseudoinverse of $\mathbf{X}$, while $\hat{\mathbf{X}}$ denotes an eventual estimate. The $i$-th row vector of $\mathbf{X}$ is given by $\mathbf{X}_{i\cdot}$, and the $j$-th column by $\mathbf{X}_{\cdot j}$. Moreover, $\mathbf{1}_{I \times J}$ denotes the $I \times J$ all-ones matrix, and $\mathbf{I}_J$ is the identity matrix of size $J \times J$.
For a third-order tensor $\mathcal{X} \in \mathbb{C}^{I \times J \times K}$ with a PARAFAC decomposition [30], the mode-1, mode-2, and mode-3 unfoldings are expressed in terms of its matrix factors, where $R$ is the tensor rank and the number of columns of $\mathbf{A}^{(1)} \in \mathbb{C}^{I_1 \times R}$, $\mathbf{A}^{(2)} \in \mathbb{C}^{I_2 \times R}$, and $\mathbf{A}^{(3)} \in \mathbb{C}^{I_3 \times R}$.
The powerful essential uniqueness property of the PARAFAC decomposition is summarized in Theorem 1. Notations and definitions used in the following theorems follow those defined in the previous paragraphs of this section.
Theorem 1 (cf. [30]). Given the model $\mathcal{X} = [\![\mathbf{A}^{(1)}, \mathbf{A}^{(2)}, \mathbf{A}^{(3)}; R]\!]$, the essential uniqueness of the PARAFAC decomposition means that its matrix factors are unique up to column scaling and permutation ambiguities. In other words, there is a permutation matrix $\boldsymbol{\Pi}$ and diagonal scaling matrices $\boldsymbol{\Lambda}_A$, $\boldsymbol{\Lambda}_B$, and $\boldsymbol{\Lambda}_C$ that relate the estimates $\hat{\mathbf{A}}^{(n)}$ to the true factors $\mathbf{A}^{(n)}$, $n = 1, 2, 3$.
Sufficient uniqueness conditions are reviewed in Theorems 2, 3, and 4.
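The ambiguities described in Theorem 1 can be illustrated numerically. The following is a minimal sketch, assuming the usual form of the ambiguity in which the three factors share one column permutation and the three diagonal scalings multiply to the identity (this constraint is an assumption here, as are all variable names): the permuted and rescaled factors generate exactly the same tensor.

```python
import numpy as np

rng = np.random.default_rng(0)
I1, I2, I3, R = 4, 5, 6, 3

# Random factor matrices of a rank-R PARAFAC model
A1 = rng.standard_normal((I1, R))
A2 = rng.standard_normal((I2, R))
A3 = rng.standard_normal((I3, R))

def parafac(A, B, C):
    """Tensor whose (i, j, k) entry is sum_r A[i, r] * B[j, r] * C[k, r]."""
    return np.einsum("ir,jr,kr->ijk", A, B, C)

# One column permutation shared by the three factors
perm = np.array([2, 0, 1])
Pi = np.eye(R)[:, perm]

# Diagonal scalings chosen so that their product is the identity (assumption)
lam_a = np.array([2.0, -0.5, 4.0])
lam_b = np.array([0.25, 3.0, -1.0])
lam_c = 1.0 / (lam_a * lam_b)

A1_alt = A1 @ Pi @ np.diag(lam_a)
A2_alt = A2 @ Pi @ np.diag(lam_b)
A3_alt = A3 @ Pi @ np.diag(lam_c)

X = parafac(A1, A2, A3)
X_alt = parafac(A1_alt, A2_alt, A3_alt)

# Different factors, identical tensor
print(np.allclose(X, X_alt), np.allclose(A1, A1_alt))
```

A receiver that only observes the tensor therefore cannot distinguish between the two factor sets without side information.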
Then this decomposition is essentially unique, almost surely, if (

III. PROPOSED TENSOR-BASED OCC SYSTEM
In our proposed OCC/S2C system, each pixel of a digital video with a resolution of J by L pixels is modulated by a string of S symbols. Each symbol is formed by P samples (pulses), such that K = SP is the number of samples per pixel. In other words, real baseband symbols stored in the matrix $\mathbf{S} \in \mathbb{R}^{K \times JL}$ determine the information to be encoded into the JL pixels.
Let this video comprise a set of F frames, and let $\mathbf{x}_f \in \mathbb{R}^{JL \times 1}$ be the vectorized form of the f-th frame of the video in which it is desired to encode the message contained in $\mathbf{S}$.
The OCC encoding proposed here is based on the KRST coding [31], which yields the coded video for display given in (9). Thanks to the Khatri-Rao operation in (9), the F frames of $\mathbf{X}$ are replicated by a factor of K. In other words, the encoded video becomes K times longer than the original one, and must then be played at a speed K times greater.
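The replication effect of the Khatri-Rao operation can be sketched as follows. This is only an illustration under stated assumptions: the coded video is taken to be the transposed column-wise Kronecker product of the symbol matrix and the transposed video matrix, the time ordering (symbol sample first, frame second) is a choice made here, and the helper `khatri_rao` and all sizes are hypothetical.

```python
import numpy as np

def khatri_rao(A, B):
    """Column-wise Kronecker product of A (K x R) and B (F x R) -> (K*F x R)."""
    K, R = A.shape
    F, R2 = B.shape
    if R != R2:
        raise ValueError("A and B must have the same number of columns")
    return np.einsum("kr,fr->kfr", A, B).reshape(K * F, R)

rng = np.random.default_rng(1)
J, L, F = 2, 3, 4          # tiny screen and short video block
n_sym, P = 2, 2            # symbols per pixel and pulses per symbol
K = n_sym * P              # samples per pixel
JL = J * L

X = rng.uniform(0.1, 1.0, size=(JL, F))              # columns are vectorized frames
S = rng.integers(0, 2, size=(K, JL)).astype(float)   # binary pulses per pixel

# Assumed coded video: each of the K pulse samples scales all F frames
X_coded = khatri_rao(S, X.T).T                        # JL x (K*F)

print(X.shape, "->", X_coded.shape)
```

Column `k * F + f` of `X_coded` is frame `f` weighted pixel by pixel with the `k`-th pulse sample, which is why the coded video is K times longer than the original.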
Consider that during transmission, the coded video is degraded according to the degradation model (10), where $\mathbf{Y}$ is the captured video. The degradation matrix $\mathbf{H}$ may be given by (11), where $\mathbf{D} \in \mathbb{R}^{MN \times JL}$, $\mathbf{W} \in \mathbb{R}^{JL \times JL}$, and $\mathbf{B} \in \mathbb{R}^{JL \times JL}$ represent respectively the generic decimation, warping, and (optical and motion) blurring matrices [38]. Note that $\mathbf{H}$ introduces decimation if $MN < JL$, where $M \times N$ is the spatial resolution of the received video.
Combining (10) and (11), the noiseless video $\mathbf{Y}$ generated by the Image Sensor (IS) is given by (12), where the rows of $\mathbf{Y}$ represent the spatial domain (i.e., pixels) and the columns the time domain (i.e., video frames). In practice, $\mathbf{Y}$ is acquired by capturing KF video frames with a resolution of M by N pixels.
Comparing (12) with the transpose of (3), and making the appropriate correspondence between factors, one obtains the tensor model (13). The concept of constructing $\mathcal{Y}$ is illustrated in Fig. 1.

Figure 1: Construction of Y from the received video
Assuming the presence of additive noise and interference in the capturing process, the general system model is given by $\tilde{\mathcal{Y}} = \mathcal{Y} + \mathcal{N}$ (14), where $\mathcal{N} \in \mathbb{R}^{MN \times K \times F}$ is the additive noise tensor, which may comprise thermal noise, shot noise, and sampling noise, among others [17].
From (13), the mode-1, mode-2, and mode-3 unfoldings of $\mathcal{Y}$ are given respectively by (15)-(17). The proposed OCC system is summarized in Fig. 2.
A. On-Off Keying - OOK
In the OOK modulation, two LED states represent the two possible symbols. That is, Bit 1 can represent the non-null pulse and Bit 0 the null pulse (Fig. 3).
For OCC systems, uncoded OOK modulation is typically undesirable, as a long string of zeros can mislead human viewers into thinking that there are dead pixels. Appropriate compensation symbols or coding schemes can be used to control LED dimming [17], [40].

B. Pulse Width Modulation - PWM
In PWM modulation, the symbol determines the duration of the transmission pulses at high or low levels. The ratio of the time interval of the high-level pulse (non-zero pulse) to the symbol period is called the Duty Cycle (D), as illustrated in Fig. 4.
C. Pulse Position Modulation - PPM
In the PPM modulation of Fig. 5, the position of the low-level pulse is determined by the bit to be transmitted, while all the remaining pulses are at high level. Here, low level means that the pixel is off, while high level means that the pixel is on. For Bit 0, the low-level pulse occurs at the beginning of the symbol, while for Bit 1, the low-level pulse occurs at the last pulse.
As anyone might suspect, higher-order PPM can easily be implemented with few adjustments.
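The pulse mappings described in this section can be sketched in a few lines. The function names are hypothetical, and holding the OOK level for all P samples of a symbol is an assumption made here for illustration; the binary PPM mapping follows the description above (one low-level pulse, first for Bit 0 and last for Bit 1).

```python
def ook_pulses(bit, P):
    """OOK: Bit 1 -> non-null pulses, Bit 0 -> null pulses (P samples per symbol)."""
    return [1] * P if bit else [0] * P

def ppm_pulses(bit, P):
    """Binary PPM: a single low-level pulse, all remaining pulses at high level."""
    if P < 2:
        raise ValueError("binary PPM needs at least two pulses per symbol")
    pulses = [1] * P
    pulses[0 if bit == 0 else P - 1] = 0   # Bit 0: low pulse first, Bit 1: low pulse last
    return pulses

def duty_cycle(pulses):
    """Fraction of the symbol period spent at high level."""
    return sum(1 for p in pulses if p) / len(pulses)

P = 4
for bit in (0, 1):
    print(bit, ook_pulses(bit, P), ppm_pulses(bit, P), duty_cycle(ppm_pulses(bit, P)))
```

With this mapping, both PPM symbols have the same duty cycle, whereas the OOK duty cycle depends on the transmitted bit.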
In PPM, it is not possible to control LED dimming, as the direct current (DC) level of the signal is the same for any of the transmitted symbols. This characteristic is less critical for S2C systems than for conventional VLC, as ambient lighting is not the main goal.
The OCC-KRF receiver is based on a non-iterative framework, in which the degradation matrix $\mathbf{H}$ in (11) must be known a priori. The OCC-KRF algorithm is based on the Least Squares Khatri-Rao Factorization (LSKRF) algorithm adapted from [23], [24].
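Given that $\mathbf{H}$ is known, the Khatri-Rao product $\mathbf{S} \diamond \mathbf{X}$ in (15) can be estimated by least squares (Step 2 of Alg. 1) and then factorized (Step 3). After this factorization, there remains only an arbitrary diagonal matrix $\boldsymbol{\Delta} \in \mathbb{C}^{JL \times JL}$ that satisfies (8). The ambiguity $\boldsymbol{\Delta}$ is easily avoided if $\mathbf{X}_{\cdot f}$ is known beforehand by the receiver, because $\mathbf{X}_{\cdot f} = \hat{\mathbf{X}}_{\cdot f} \boldsymbol{\Delta}$.
The ambiguity removal is done in Steps 4 and 5 of Alg. 1. Note in (21) that $\hat{\mathbf{X}}_{\cdot f}$ (and $\mathbf{X}_{\cdot f}$) must not have any zero element.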
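The receiver chain described above (least-squares restoration, per-pixel rank-one factorization, pilot-based ambiguity removal) can be sketched as follows. This is not the authors' Alg. 1: the unfolding convention, the choice to keep the singular value on the symbol side, the function `occ_krf_sketch`, and the random stand-in for the known degradation matrix are all assumptions made for a noiseless illustration.

```python
import numpy as np

def khatri_rao(A, B):
    """Column-wise Kronecker product: (K x R), (F x R) -> (K*F x R)."""
    K, R = A.shape
    F, _ = B.shape
    return np.einsum("kr,fr->kfr", A, B).reshape(K * F, R)

def occ_krf_sketch(Y, H, K, F, pilot, f):
    """Hypothetical receiver sketch.

    Y     : captured video, shape (MN, K*F)
    H     : known degradation matrix, shape (MN, JL), full column rank
    pilot : known f-th original frame, shape (JL,), without zero entries
    """
    JL = H.shape[1]
    # Least-squares restoration of the coded video
    Z = np.linalg.pinv(H) @ Y
    S_hat = np.empty((K, JL))
    X_hat = np.empty((JL, F))
    for i in range(JL):
        # Row i of Z holds the K x F rank-one matrix s_i x_i^T (row-major)
        U, sv, Vt = np.linalg.svd(Z[i].reshape(K, F), full_matrices=False)
        s_i = sv[0] * U[:, 0]      # assumption: scale kept on the symbol side
        x_i = Vt[0]
        # Scaling ambiguity resolved with the known pilot pixel value
        delta = pilot[i] / x_i[f]
        X_hat[i] = delta * x_i
        S_hat[:, i] = s_i / delta
    return S_hat, X_hat

rng = np.random.default_rng(2)
JL, MN, K, F, f = 6, 8, 4, 5, 0
X = rng.uniform(0.1, 1.0, size=(JL, F))   # original frames as columns
S = rng.choice([0.0, 1.0], size=(K, JL))
S[0, :] = 1.0                             # avoid all-zero symbol columns
H = rng.standard_normal((MN, JL))         # stand-in for a known degradation
Y = H @ khatri_rao(S, X.T).T              # noiseless captured video

S_hat, X_hat = occ_krf_sketch(Y, H, K, F, pilot=X[:, f], f=f)
print(np.allclose(S_hat, S), np.allclose(X_hat, X))
```

The division by `x_i[f]` is where the requirement of a pilot frame without zero elements shows up in practice.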
A. Considerations on Frame Rate, Data Volume, and Symbol Rate
In S2C systems, information symbols are encoded into video pixels and then recovered by an optical camera receiver, a priori without impairing the perceived visual quality if the screen's refresh rate is greater than 70 Hz [41].
Let $r_S$ be the effective video playback rate, $r_V$ the original (native) uncoded video frame rate, and $r_C$ the capture rate of the IS installed at the receiver. Due to the KRST encoding, $r_S$ becomes K times greater than $r_V$, i.e., the video must be played at $r_S = K r_V$. On the other hand, by definition, $r_V = F/T$ frames per second, where T is the video block duration (in seconds). Finally, the acquisition by the IS relates $r_C$ to $r_S$: for $r_C > r_S$, the video $\mathcal{Y}$ is oversampled at the receiver.
For an IS resolution of M × N pixels, with b bits of quantization, the total volume of data $V_T$ (in bytes per second) to be processed by the receiver is given by (24). Regarding the symbol rate r, since the JL screen pixels carry S symbols each per T seconds, r is given by
$r = JLS/T = JLS\,r_V/F = JL\,r_S/(FP)$. (25)
In other words, comparing (24) to (25), increasing the number of frames F per block increases $V_T$ while reducing r. Note that the rates $r_S$ and $r_V$ depend on F/T, while $r_C$ depends only on the camera configuration.

Algorithm 1: OCC-KRF (excerpt)
3: iii. Find the i-th rows of $\hat{\mathbf{S}}$ and $\hat{\mathbf{X}}$ from the rank-one approximation, where $\mathbf{U}_{\cdot 1}$ and $\mathbf{V}_{\cdot 1}$ are respectively the first columns of $\mathbf{U}$ and $\mathbf{V}$. Neglect the singular values in $\mathbf{D}$.
4: For an arbitrary known f-th video frame, find the ambiguity matrix, where the operator ./ indicates element-wise division.
5: Remove the column ambiguities on $\hat{\mathbf{S}}$ and $\hat{\mathbf{X}}$.
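The rate relations of this subsection can be checked with a short calculation. The function name and the example values below are hypothetical; only the relations K = SP, r_V = F/T, r_S = K r_V, and r = JLS/T are taken from the text.

```python
def s2c_rates(J, L, S, P, F, T):
    """Rates for one video block of F frames lasting T seconds."""
    K = S * P              # samples (pulses) per pixel
    r_V = F / T            # native frame rate (frames/s)
    r_S = K * r_V          # playback rate of the coded video (frames/s)
    r = J * L * S / T      # symbol rate (symbols/s)
    return K, r_V, r_S, r

# Illustrative values: 32 x 32 pixels, 2 symbols of 4 pulses, 10 frames per 1 s block
K, r_V, r_S, r = s2c_rates(J=32, L=32, S=2, P=4, F=10, T=1.0)
print(K, r_V, r_S, r)

# Same symbol rate written in terms of the playback rate
r_alt = 32 * 32 * r_S / (10 * 4)
print(abs(r - r_alt) < 1e-9)
```

Doubling F at fixed native frame rate r_V doubles T and therefore halves r, which is the rate penalty traded against the diversity gain.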

B. OCC-KRF computational complexity
The complexity of OCC-KRF is based mainly on the SVD operations employed in Steps 2 and 3 of Alg. 1: the pseudoinverse computation in Step 2, whose complexity is given in (26), and the JL SVD calculations of F × K matrices in Step 3, whose complexity is given in (27). The screen's resolution JL appears in both (26) and (27), with a heavier weight in the former.
It is evident that the least-squares restoration in Step 2 of Alg. 1 is more costly than the SVD factorization in Step 3. However, with a time-invariant or very slowly varying degradation in (11), $(\mathbf{H}^T)^\dagger$ can be calculated only once for several transmission blocks.

VI. SIMULATIONS
In this section, numerical analyses demonstrate the proposed system's functionality and predict its behavior in terms of some of its parameters. Due to the absence of tensor-based works for Screen-to-Camera communications, the proposed system is compared to classical matrix-based schemes: in Section VI-B, the video restoration performance is contrasted with the classic degradation matrix inversion process, while in Section VI-C, symbol detection performance is measured against the conventional S2C approach. The simulations evaluate the Symbol Error Rate (SER), as well as the video reconstruction error, called here Video Normalized Mean Square Error (Video NMSE) and given by (28); a Symbol NMSE is also used for the symbol matrix.
The SNR is defined from (14), i.e., $\mathrm{SNR} = \|\mathcal{Y}\|_F^2 / \|\mathcal{N}\|_F^2$. The Frobenius norm of a tensor is defined in [25].
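A simulation loop of this kind needs two small utilities: noise scaled to a target SNR and an NMSE metric. The sketch below uses the SNR definition quoted above; the NMSE expression (squared Frobenius error normalized by the squared Frobenius norm of the reference) is the usual definition and is an assumption here, as are the function names.

```python
import numpy as np

def add_noise_at_snr(Y, snr_db, rng):
    """Return Y + N with ||Y||_F^2 / ||N||_F^2 equal to the requested SNR (in dB)."""
    noise = rng.standard_normal(Y.shape)
    target_power = np.linalg.norm(Y) ** 2 / 10 ** (snr_db / 10)
    noise *= np.sqrt(target_power) / np.linalg.norm(noise)
    return Y + noise, noise

def nmse(X, X_hat):
    """Normalized mean square error (assumed definition)."""
    return np.linalg.norm(X - X_hat) ** 2 / np.linalg.norm(X) ** 2

rng = np.random.default_rng(4)
Y = rng.uniform(size=(8, 4, 5))            # stand-in for a noiseless tensor (MN x K x F)
Y_noisy, N = add_noise_at_snr(Y, snr_db=15.0, rng=rng)

measured_snr_db = 10 * np.log10(np.linalg.norm(Y) ** 2 / np.linalg.norm(N) ** 2)
print(round(measured_snr_db, 6), nmse(Y, Y_noisy))
```

With no `ord` or `axis` argument, `np.linalg.norm` works on the flattened array, so the same call covers matrices and third-order tensors.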
By varying P, one may note that it brings a coding gain, mainly in the low-SNR regime, whereas increasing S does not. This insensitivity of the SER to the increase in S is evident from the superposition of the curves for P = 4. It is convenient to verify by definition that both S and P determine the size of the matrix $\mathbf{S}$ and the third dimension of the tensor $\mathcal{Y}$, but only the former increases the number of symbols to be estimated.
Without the OCC encoding, the system model reduces to one which is nothing but the conventional non-OCC video degradation model [38]. Then, the video NMSE curve for uncoded video estimation in Fig. 8 is obtained from the corresponding restoration inversion process.
Both P and S impact video restoration performance. As K = SP gives the number of rows of S, an improvement seen with increasing K was expected, primarily since the video matrix X does not depend on any of these parameters.
From a communication point of view, having a large S value, as in Fig. 7 and Fig. 8, is justified only by a higher symbol rate. However, beyond a certain point, increasing S does not bring benefits in symbol estimation (Fig. 7) or significant gains in video estimation (Fig. 8), both of which can be achieved by raising P. In the simulated range of Fig. 8, doubling P led to a consistent 3 dB gain in SNR, while doubling S from 50 to 100 for P = 4 led only to a marginal benefit.
C. Frame length F
Fig. 9 compares our proposed tensor-based scheme to a matrix-based approach. As conventional OCC matrix-based receivers usually estimate symbols on a frame-by-frame basis, making F = 1 in (16) yields the matrix formulation (32). In other words, the third-order tensor $\mathcal{Y} \in \mathbb{C}^{MN \times F \times K}$ is reduced to the matrix in (32). The first thing to note in Fig. 9 is that F brings a diversity gain, not achieved with P or S in Fig. 7.
As the number of unknowns in $\mathbf{S}$ does not depend on F, increasing this parameter brings the referred estimation gain. This behavior was expected and is typical of other communication schemes that use the PARAFAC model. It helps that the entries of $\mathbf{X}$ are random, drawn from a continuous uniform distribution, so this matrix naturally has a good condition number. As the data rate is inversely proportional to F, a trade-off between diversity gain and data rate exists, similar to countless other space-time block coding schemes.
Now, when the data rate must be kept constant, the results in Fig. 9 indicate that it may be convenient to compensate for the larger frame length by increasing the number of symbols per block. The same diversity gain with F in Fig. 9 is then obtained, while the negative effect of the rate reduction is avoided. On the downside, the increase in S impacts both the volume of data generated and the complexity of the receiver. In addition, this increase also requires a proportional increase in the camera's capture rate $r_C$, as mentioned in Sections V-A and V-B.
In terms of the video NMSE, the impact of F (and S) is the opposite of that in Fig. 9. While diversity gains are attributed only to the increase of F in that figure, this parameter does not improve the video estimation in Fig. 10. Once again, this behavior was expected, as the number of unknown symbols depends on S, while the length of the video is a function of F. In both Fig. 9 and Fig. 10, the best estimation scenario occurred when F and S increased proportionally, keeping α constant.

D. Comparison between modulations
The three modulations presented in Section IV serve to demonstrate the flexibility of the proposed transmission scheme for different pulse modulations.
Comparing video restoration performance using the three suggested modulations, Fig. 11 demonstrates that the PWM and PPM modulations have similar performance and are superior to the OOK modulation. Given the hypothesis that the generated and simulated symbols within an alphabet are equiprobable, there is a greater presence of zeros in the symbol matrix with the OOK modulation. This sparsity in $\mathbf{S}$ is likely the cause of the worse performance of OOK compared to PWM and PPM.

E. Real videos
The performance of the proposed system for real videos was also verified by numerical analysis. The results and conclusions were very similar to those seen in the previous simulation results. They are omitted here for the sake of conciseness only.
As an example, Fig. 12 shows

F. Overall discussion
The screen resolution is proportional to the maximum achievable symbol rate, and at the same time it produces more data to be processed by the receiver. This conclusion is valid for basically all OCC systems. However, there are no symbol or video estimation improvements when increasing this spatial resolution. It is noteworthy that, for now, the screen resolution favors multiplexing gains over spatial diversity gains, which could be obtained if a set of pixels could provide redundancy. The cubic-order complexity of OCC-KRF should not be neglected for high-resolution screens.
It was observed that increasing the number of symbol pulses K leads to estimation gains in both video reconstruction and symbol estimation. If the number of symbols S per video block is increased, there is a gain in symbol rate but only a marginal SNR gain in video restoration and no gain in symbol detection. On the other hand, the number of pulses per symbol P brings an SNR gain for both symbol and video estimation. Besides, a greater P can be used to provide room for dimming control [17], to improve synchronization [42], or to allow even more flexibility to implement other pulse modulations.
The essential uniqueness of the model (13), in terms of Theorem 1, is addressed by Proposition 2 and Proposition 3.
Proposition 1. Assume that the degradation matrix $\mathbf{H}$ is a full-rank matrix. A necessary and sufficient condition for identifiability of $\mathbf{S}$ and $\mathbf{X}$ using the OCC-KRF algorithm is that $MN \ge JL$.
Proof: The proof of Proposition 1 is straightforward. For OCC-KRF to work as intended, $(\mathbf{H}^T)^\dagger$ must also be known, such that the degradation can be mitigated, i.e., $\mathbf{H}^T (\mathbf{H}^T)^\dagger = \mathbf{I}_{JL}$ (Step 2 in Alg. 1). Since $\mathbf{H}$ is a full-rank matrix by assumption, $\mathbf{H}^T (\mathbf{H}^T)^\dagger = \mathbf{I}_{JL}$ requires the number of rows of $\mathbf{H}$ to be greater than or equal to its number of columns, i.e., $MN \ge JL$.
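The condition of Proposition 1 is easy to check numerically. In the sketch below, a random Gaussian matrix stands in for a full-rank degradation matrix (an assumption made for illustration, as is the function name): the product of the transposed matrix with its pseudoinverse equals the identity only when MN is at least JL.

```python
import numpy as np

def restoration_is_exact(MN, JL, rng):
    """Check whether H^T (H^T)^+ equals I_JL for a random full-rank H (MN x JL)."""
    H = rng.standard_normal((MN, JL))
    Ht = H.T                                   # JL x MN
    return bool(np.allclose(Ht @ np.linalg.pinv(Ht), np.eye(JL)))

rng = np.random.default_rng(5)
print(restoration_is_exact(MN=8, JL=6, rng=rng))   # camera resolves the screen
print(restoration_is_exact(MN=4, JL=6, rng=rng))   # decimation: MN < JL
```

In the decimated case the product is only a rank-MN projector, so part of the screen information cannot be restored by inversion alone.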
Proposition 1 suggests that using the OCC-KRF algorithm is linked to a proper resolution of the IS at the receiver. In other words, the Region Of Interest (ROI) (in pixels) of the captured video must be at least the size of the screen resolution. Mathematically, this hypothesis is ordinary if the spatial decimation in the image degradation model (11) is disregarded.
Overall, for Proposition 1, the hypothesis of the full rank of H is generically true for Linear Time-Invariant (LTI) degradation models. For LTI systems, H is typically a doubly-block Toeplitz (or Circulant) matrix, built from the elements of an image Kernel (e.g., blurring mask) [43].
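As an illustration of such a structured degradation, the sketch below builds the blurring matrix of a circular (wrap-around) 2-D convolution, which is block circulant with circulant blocks. The kernel values, the row-major vectorization, and the function name are assumptions chosen for the example; the center-weighted kernel makes the matrix strictly diagonally dominant and hence full rank.

```python
import numpy as np

def circular_blur_matrix(J, L, kernel):
    """Matrix B such that B @ img.ravel() equals the circular 2-D blur of img."""
    kh, kw = kernel.shape
    B = np.zeros((J * L, J * L))
    for j in range(J):
        for l in range(L):
            col = j * L + l                       # row-major index of the input pixel
            for a in range(kh):
                for b in range(kw):
                    jj = (j + a - kh // 2) % J    # wrap around the borders
                    ll = (l + b - kw // 2) % L
                    B[jj * L + ll, col] += kernel[a, b]
    return B

J, L = 4, 5
kernel = np.full((3, 3), 0.05)
kernel[1, 1] = 0.6                                # center-weighted blurring mask
B = circular_blur_matrix(J, L, kernel)

img = np.random.default_rng(3).uniform(size=(J, L))
direct = sum(kernel[a, b] * np.roll(img, (a - 1, b - 1), axis=(0, 1))
             for a in range(3) for b in range(3))

print(np.allclose(B @ img.ravel(), direct.ravel()), np.linalg.matrix_rank(B) == J * L)
```

Not every kernel yields a full-rank matrix (a uniform box filter can be singular for some image sizes), which is why the full-rank hypothesis is described as generic rather than universal.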
Proposition 2. Assume that $\mathbf{H}$ is a full-rank matrix, with $MN \ge JL$. Also, assume that $\mathbf{S}$ and $\mathbf{X}$ are random matrices with full rank, and that $F \gg 1$ and $JL \gg 1$. Under these hypotheses, the essential uniqueness of the model (13) is guaranteed almost surely if (33) holds. This is a sufficient, but not necessary, condition.
Proof: If one of the matrix factors of (13) has full column rank and the other factors are random, then a more relaxed sufficient uniqueness condition than Kruskal's condition in Theorem 2 can be used.
Therefore, from the hypothesis that $\mathbf{H}$ has full column rank, and that $\mathbf{S}$ and $\mathbf{X}$ are random, this relaxed condition leads to (34). In [34] it is emphasized that (34) might be used even if $\mathbf{S}$ or $\mathbf{H}$ were not random. Finally, considering that $F \gg 1$ and $JL \gg 1$, (34) becomes $K^2 - K \ge 2\left(\frac{JL}{F}\right)^2$, whose only positive integer solution is given by (33), which ends this proof.
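As a design aid, the smallest number of samples per pixel satisfying the approximate condition can be computed directly. The sketch assumes that the simplified inequality at the end of the proof reads K² − K ≥ 2(JL/F)²; this reading, the function name, and the example values are assumptions rather than results taken from the paper.

```python
import math

def min_samples_per_pixel(JL, F):
    """Smallest integer K with K*K - K >= 2 * (JL / F)**2 (assumed condition)."""
    c = 2.0 * (JL / F) ** 2
    # Positive root of K^2 - K - c = 0, rounded up
    k = max(1, math.ceil((1.0 + math.sqrt(1.0 + 4.0 * c)) / 2.0))
    # Guard against floating-point rounding in the closed form
    while k * k - k < c:
        k += 1
    while k > 1 and (k - 1) * (k - 1) - (k - 1) >= c:
        k -= 1
    return k

# Illustrative values: 64 pixels and 8 frames per block
print(min_samples_per_pixel(JL=64, F=8))
```

Under this reading, the bound on K grows roughly linearly with the ratio JL/F, so a higher screen resolution can be compensated by a longer frame block.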
As $F \gg 1$ by the hypothesis stated in Proposition 2, (36) becomes (35), ending the proof.
The conditions on K present in Proposition 2 are more relaxed than those presented in Proposition 3. In both cases, screens with higher resolutions force an increase in the number of frames captured by the optical camera.