Fully-Integrated SPAD-Based Receiver With Nanosecond Dead Time for Optical Wireless Communication

This paper presents the design and characterization of a fully-integrated receiver based on single-photon avalanche diodes (SPADs) with nanosecond dead time for high-speed, high-sensitivity optical wireless communication (OWC). The receiver consists of a 4x4 SPAD array that is based on a p-well/deep n-well (DNW) structure, and each SPAD is integrated with a tunable front-end circuit to perform quench and reset. In addition, an OR tree is designed to combine the 16 output channels of the front-end circuits into a single data stream for signal processing, and an output buffer is implemented as the interface to drive a 50-Ω load for testing purposes. Fabricated in a standard 180 nm CMOS process, the receiver achieves a minimum dead time of about 2.5 ns. Bit error rate (BER) measurement of the implemented receiver indicates a sensitivity of −31.6 dBm at 100 Mb/s for a BER of 2 × 10⁻³ and a wavelength of 520 nm, where on-off keying (OOK) modulation and a 2¹⁵ − 1 pseudorandom binary sequence (PRBS-15) are employed. To recover the transmitted data stream from the received signal, a signal processing flow specific to SPAD-based receivers is proposed and implemented.


I. INTRODUCTION
OPTICAL wireless communication (OWC) is an attractive technique to meet the ever-increasing demand for data traffic and to complement existing radio frequency (RF) systems in various applications, such as light fidelity (Li-Fi), underwater communication and satellite communication [1], [2], [3], [4], [5], [6], [7], [8]. OWC offers several advantages. First, it provides ultra-wide, unlicensed electromagnetic spectrum covering visible light, infrared and ultraviolet, and is capable of coexisting with RF systems. In addition, intensity modulation and direct detection (IM/DD) can be applied, resulting in efficient system architectures with low complexity. In an OWC system, a high-sensitivity receiver is critical to extend the link distance and reduce the system power consumption. Many existing OWC receivers are based on PIN photodiodes (PDs) and avalanche photodiodes (APDs) in linear mode, whose sensitivity is restricted by the limited gain of the photodetection devices and the noise generated by the front-end circuits [1], [2], [3], [4], [5], [9]. Considerable effort has been made to improve the sensitivity of OWC receivers [1], [10], [11], [12]. Single-photon avalanche diodes (SPADs) are p-n junctions that operate in Geiger mode and leverage a high electric field to yield a large avalanche multiplication gain. Therefore, SPAD-based receivers can detect weak optical signals and inherently achieve very high sensitivity, which is promising for communication [13], [14], [15], [16], [17], [18], ranging and imaging [19], [20], [21], [22], [23].
Since SPADs are biased beyond the breakdown voltage, quench and reset circuits are required to maintain proper operation. Specifically, a quench circuit is needed to stop the self-sustained avalanche current by reducing the bias to below the breakdown voltage. Both passive and active quench can be employed [13]. Reset is performed after quench to restore the initial bias of the SPADs for further signal detection. However, during the quench and reset process, there is a period of dead time in which the triggered SPADs are not sensitive to incident photons and are treated as idle elements. Therefore, multiple SPADs are usually grouped into one macro device to deal with the dead time and improve the operation speed [13], [14], [15], [16], [22]. Since SPADs are very sensitive devices, they suffer from non-ideal effects such as dark count and afterpulsing, where noise generated inside the SPADs or induced by the incident photons can trigger erroneous output pulses. Therefore, arranging SPADs as a macro device can also improve the quality of the received signal by employing concurrent detection [21].
Significant progress has been made in developing SPAD receivers for communication. In [13], a 32x32 SPAD array is integrated to achieve a sensitivity of −31.7 dBm at 100 Mb/s and 450-nm wavelength employing on-off keying (OOK) modulation, and the minimum dead time is 5.9 ns. In [14], an array of four SPADs in combination with the front-end circuits is presented, demonstrating an OOK sensitivity of −46.3 dBm at 100 Mb/s and −43.8 dBm at 200 Mb/s, respectively, for a wavelength of 635 nm and a dead time of 3.5 ns. In [15], a custom SPAD receiver containing 64x64 elements is implemented to support a data rate of 500 Mb/s and a sensitivity of −46.1 dBm at 450-nm wavelength, where four-level pulse amplitude modulation (PAM-4) and equalization are employed, and the dead time is 12 ns. In [16], a commercial module consisting of 5676 SPADs as one macro device is characterized, indicating an OOK sensitivity of −29 dBm at 2.4 Gb/s and 405-nm wavelength with equalization.
In this paper, a fully-integrated receiver based on SPADs with nanosecond dead time is presented for high-speed, high-sensitivity OWC. The receiver consists of a 4x4 SPAD array that is based on a p-well/deep n-well (DNW) structure, and each SPAD is integrated with a tunable front-end circuit to perform quench and reset. In addition, an OR tree is designed to combine the 16 output channels of the front-end circuits into a single data stream for signal processing, and an output buffer is implemented as the interface to drive a 50-Ω load for testing purposes. To recover the transmitted data stream from the received signal, a signal processing flow specific to SPAD-based receivers is proposed and implemented.
The remainder of this paper is organized as follows. Section II presents the design and implementation details of the 4x4 SPAD-based receiver, including the system architecture, the SPAD structure, the tunable front-end circuit and other building blocks. Section III elaborates the proposed signal processing flow specific for SPAD-based receivers. Experimental results are provided in Section IV. Section V concludes this paper.

A. System Architecture
As illustrated in Fig. 1, the fully-integrated SPAD-based receiver consists of a 4x4 SPAD array in which each SPAD is integrated with a tunable front-end circuit to perform quench and reset, an OR tree to combine the multiple outputs of the front-end circuits into a single data stream, and an output buffer to improve the driving capability for testing purposes. Although quench and reset are indispensable for the proper operation of SPADs, they introduce a period of dead time during which the SPADs cannot detect signals. Therefore, 16 SPADs are arranged in an array to deal with the dead time and improve the operation speed.

B. SPAD Structure
The SPADs are designed based on a p-n junction formed by the p-well and the DNW layer, as depicted in Fig. 2. The DNW not only works as a functional layer but also provides isolation between different SPAD devices. The diameter of the active area is 13 µm, and the passivation layer above the active area has been removed by defining a pad open window. When the SPADs are properly biased to operate in Geiger mode, a high electric field is generated at the interface between the p-well and the DNW. To prevent premature edge breakdown, the SPADs are implemented in a round shape. The anode P of each SPAD is connected to the p-well through a p+ region, while the cathode N is connected to the DNW through an n+ region and an n-well. The p-substrate is biased by node B through another p+ region.

C. Tunable Front-End Circuit
Each SPAD is required to integrate a tunable front-end circuit to perform quench and reset, so that any triggered avalanche current can be stopped promptly to prevent permanent damage to the device, and the initial bias status of the device can be restored. The schematic of the tunable front-end circuit is illustrated in Fig. 3(a), where mixed quench and active reset are employed to enable fast reaction with well-defined dead time [24]. In particular, mixed quench is implemented via R1, M1 and M2, while reset is achieved via M3. V_FE is the output of the front-end circuit.
The operating principle of the front-end circuit is described as follows, while related simulation results are presented in Fig. 3(b). Initially, V_A is below the threshold of inverter INV1, V_qc is high to switch on M1 while turning off M2, and V_rs is low to turn off M3. Therefore, the SPAD is biased above the breakdown voltage when a proper DC bias voltage V_B is applied to its cathode. Once a sustained avalanche process is triggered by incident photons, the generated current flows through R1, resulting in an increase of V_A, and passive quench is activated. As soon as V_A reaches the threshold of INV1, the output of the inverter flips and V_qc goes low to switch on M2 and switch off M1, which triggers the active quench. V_A is then quickly charged to V_DD, which is 1.8 V, through M2 based on positive feedback.
The duration of the active quench is set by a tunable delay unit D1 inserted after INV1 to generate V_qc with a controlled pulse width. As shown in Fig. 3(a), the tunable delay unit consists of M4a, M4b and a NOR gate to form a monostable circuit, while the delay and thus the negative pulse width of V_qc is controlled by M5-M7 via V_ctrl1 and V_ctrl2. Specifically, M5 forms a control branch for short pulse widths, while M6 and the diode-connected M7 are combined as the control branch for wide pulse widths. The advantage of combining dual control branches is that a variable pulse width ranging from 1.2 ns to beyond 10 ns can be achieved with control voltages ranging only from 0 V to 0.9 V, which allows a large overdrive voltage on M5 and M6 to enable stable control. When the control voltage is set to V_DD, the corresponding branch is disabled. The delay of the active quench is simulated to be about 99 ps. Following the end of the active quench process, when V_qc turns from low to high, an active reset process is triggered by pulling V_rs from low to high to switch on M3 and provide a low-resistance path between the anode of the SPAD and ground. Finally, V_A is low and the SPAD is restored to its initial status for further signal detection. Another delay unit D2 is added to tune the reset process of the SPAD.

D. Other Building Blocks
An OR tree is implemented to combine the outputs from the 4x4 SPAD array and generate a single data stream for signal processing. As illustrated in Fig. 4, the OR tree consists of two-input NOR gates and NAND gates based on complementary logic. To balance the delay, a special arrangement is made in both logic gates. Specifically, the pull-up network of the NOR gates and the pull-down network of the NAND gates are split into two parallel paths, and each input signal is connected to a different transistor in each path. This balanced connection ensures that, for a pair of input signals, there is no delay difference regardless of which one is connected to A (or B). Simulation of the OR tree indicates a maximum delay of about 286 ps between the input and the output.
A buffer is implemented as the output interface. It consists of four cascaded inverter stages of progressively scaled size to drive an external 50-Ω load for testing purposes, and AC coupling is employed. The output swing of the buffer loaded with a 50-Ω resistance is about 0.8 V_pp.

III. SIGNAL PROCESSING
Since the outputs from multiple SPADs are grouped into one to deal with the dead time issue and increase the operation speed, each transmitted bit is usually detected on the receiver side in the form of multiple pulses. In addition, the effect of erroneous output pulses, which are triggered by noise generated inside the SPADs or induced by the incident signal, needs to be reduced. Therefore, a signal processing flow specific to SPAD-based receivers is proposed and implemented. As shown in Fig. 5, the output of the receiver is first sampled by an oscilloscope and then transferred to MATLAB for signal processing. The processing flow of the received signal consists of three parts: waveform shaping, re-timing and data recovery.
First, the received waveform is shaped by comparing to a threshold and quantizing to either 1 or 0, where the threshold is set to be half of the peak-to-peak value of the received signal. This step can significantly reduce the volume of data to be processed and improve the effectiveness of edge detection afterwards.
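The waveform-shaping step can be sketched as follows. This is a minimal illustration in Python with NumPy rather than the MATLAB flow used for the measurements; the function name `shape_waveform` and the reading of "half of the peak-to-peak value" as the level midway between the minimum and maximum sample are assumptions, and the sample values are made up.

```python
import numpy as np

def shape_waveform(samples):
    """Quantize a sampled waveform to 0/1 around its mid-level."""
    samples = np.asarray(samples, dtype=float)
    # Assumed threshold: the level halfway between the lowest and the
    # highest sample of the record.
    threshold = samples.min() + 0.5 * (samples.max() - samples.min())
    return (samples > threshold).astype(np.uint8)

# Hypothetical oscilloscope samples in volts.
raw = np.array([0.02, 0.05, 0.71, 0.80, 0.78, 0.10, 0.01, 0.66, 0.03])
shaped = shape_waveform(raw)
print(shaped)
```

Storing one small integer per sample instead of a floating-point voltage is what reduces the data volume before edge detection.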
To extract the timing of the received signal, each pulse is first identified by detecting its rising and falling edges through differentiation. Pulse extension is then performed to extend the width of a pulse from its rising edge to the left or to the right by a pre-defined length, which is equal to the interval between the rising edges of two typical adjacent pulses. For any two adjacent pulses, if the extended falling edge of the former one reaches the un-extended latter one, or if the extended rising edge of the latter one reaches the un-extended former one, then the two pulses are treated as part of the same bit. After pulse extension, a specific data pattern that exists in the transmitted data can be used to further identify the timing of the received signal. Since multiple copies of the specific data pattern are distributed in the received signal, several timing reference points can be marked. If one of the reference points is corrupted by error pulses, it can be identified because it usually results in a wrong interval with respect to the other reference points. In addition, the small deviation of the selected reference points from the ideal case can also be utilized for the fine search of the correct timing. Finally, the timing of the received bit stream is extracted.
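Edge detection by differentiation and the pulse-extension rule can be sketched as below. This is a simplified illustration, not the authors' implementation: only the rightward extension from the rising edge is modeled, the pattern-based search for timing reference points is omitted, and the names `find_pulses`, `group_pulses` and `extend` are hypothetical.

```python
import numpy as np

def find_pulses(bits):
    """Return rising and falling edge indices of each pulse in a 0/1 array."""
    padded = np.concatenate(([0], np.asarray(bits, dtype=np.int8), [0]))
    d = np.diff(padded)                 # +1 at a rising edge, -1 at a falling edge
    rising = np.flatnonzero(d == 1)     # index of the first 1 of each pulse
    falling = np.flatnonzero(d == -1)   # index just past the last 1 of each pulse
    return rising, falling

def group_pulses(rising, falling, extend):
    """Merge adjacent pulses that are linked by pulse extension.

    A pulse is stretched from its rising edge to the right by `extend`
    samples; if that reaches the rising edge of the next pulse, both are
    assigned to the same group.
    """
    groups = []
    last_rise = None
    for r, f in zip(rising, falling):
        if groups and last_rise + extend >= r:
            groups[-1][1] = int(f)      # lengthen the current group
        else:
            groups.append([int(r), int(f)])
        last_rise = r
    return groups

shaped = [0, 1, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 1, 0]
rising, falling = find_pulses(shaped)
groups = group_pulses(rising, falling, extend=4)
print(rising, falling, groups)
```

In this toy record the first two pulses are four samples apart and merge into one group, while the third pulse starts a new one.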
To recover the data correctly, the received pulses after waveform shaping are segmented based on the extracted timing of the received bits. Following the segmentation, the number of quantized sampling points that are 1 within a bit can be counted, which is utilized to identify bit "1" and "0" by comparing to a threshold. For example, for a data rate of 100 Mb/s and a sampling rate of 10 GS/s, each received bit ideally contains 100 sampling points that are quantized to either 1 or 0. However, for a bit "1", since the corresponding received waveform typically consists of multiple short pulses, the number of sampling points that are 1 within this bit is usually smaller than 100. On the other hand, for a bit "0", the number of sampling points that are 1 within this bit can be larger than 1 due to erroneous triggering. Therefore, a threshold is needed to identify each received bit. To find an optimum threshold, histogramming is performed at the beginning by mapping the sum of quantized sampling points that are 1 within each received bit to the corresponding transmitted bit. After accumulating enough data on the histogram, the optimum threshold can be decided, which will be used for data recovery afterwards.
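The counting and histogram-based threshold selection described above can be sketched as follows. It is a minimal example under stated assumptions: the bit timing (`start`, `samples_per_bit`) is taken as already known, the threshold search simply minimizes decision errors on a training set with known transmitted bits, and all function names and the toy data (5 samples per bit instead of 100) are hypothetical.

```python
import numpy as np

def ones_per_bit(bits, start, samples_per_bit):
    """Count the samples equal to 1 inside each bit period."""
    bits = np.asarray(bits, dtype=np.int64)
    n_bits = (bits.size - start) // samples_per_bit
    segment = bits[start:start + n_bits * samples_per_bit]
    return segment.reshape(n_bits, samples_per_bit).sum(axis=1)

def best_threshold(counts, known_bits):
    """Pick the count threshold with the fewest errors on training data.

    A bit is decided as 1 when its count is >= threshold. Ties are
    resolved in favor of the smallest threshold.
    """
    counts = np.asarray(counts, dtype=np.int64)
    known_ones = np.asarray(known_bits, dtype=np.int64) == 1
    candidates = np.arange(counts.max() + 2)
    errors = [np.count_nonzero((counts >= t) != known_ones) for t in candidates]
    return int(candidates[int(np.argmin(errors))])

def recover_bits(counts, threshold):
    """Decide each bit by comparing its count with the threshold."""
    return (np.asarray(counts) >= threshold).astype(np.int64)

shaped = [1, 0, 1, 1, 0,  0, 0, 0, 0, 0,  0, 1, 0, 0, 0,  1, 1, 0, 1, 1]
counts = ones_per_bit(shaped, start=0, samples_per_bit=5)
threshold = best_threshold(counts, known_bits=[1, 0, 0, 1])
recovered = recover_bits(counts, threshold)
print(counts, threshold, recovered)
```

The third toy bit contains one erroneous sample but is still decided as "0", which is the purpose of using a count threshold rather than any single sample.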
To facilitate the comparison of the received and the transmitted data for bit error rate (BER) measurement, the transmitted data is also processed, as illustrated in Fig. 5, where waveform shaping and edge detection are similar to those on the receiver side.

IV. EXPERIMENTAL RESULTS
As shown in Fig. 6, the fully-integrated SPAD-based receiver has been fabricated in a standard 180 nm CMOS process and occupies a core area of 720 × 247 µm², where the 4 × 4 SPAD array, the tunable front-end circuits and the OR tree have a total area of 247 × 203 µm². Since each SPAD has an active area of 13 µm in diameter, the fill factor is about 4.2%. The typical power supply of the receiver is 1.8 V, while the cathode of the SPAD array is biased with a tunable supply V_B, as shown in Fig. 3(a). The breakdown voltage of the SPADs is obtained by first recording the I-V characteristics of a separate SPAD fabricated on the same substrate, and then identifying the bias voltage that causes a sharp change of the reverse current from tens of pA to tens of µA, which is treated as the breakdown voltage.
The optical experimental setup is presented in Fig. 7. On the transmitter side, a 2¹⁵ − 1 pseudorandom binary sequence (PRBS-15) is generated by an arbitrary waveform generator (AWG, Siglent SDG7052A) and then superimposed onto a DC voltage through a bias-tee to drive a laser diode with a peak emission wavelength of about 520 nm. The transmitted light is attenuated by a tunable optical attenuator and is incident on the SPAD-based receiver, whose output is sampled by an oscilloscope (LeCroy WaveRunner 9404) for further processing. To estimate the optical power received by the 4x4 SPAD array, the area of the laser light spot on the receiver side is recorded as A_LD, and the total optical power corresponding to the laser light spot is measured as P_total by an optical power meter (Thorlabs PM100D). Assuming a constant power density over the whole area of the laser light spot, the optical power received by the SPADs can be calculated as P_total · A_SPAD / A_LD, where A_SPAD is the total active area of the 16 SPADs.
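The fill factor and the received-power estimate can be checked with a short calculation. The SPAD diameter, SPAD count and the 247 × 203 µm² array area are taken from the text; the spot diameter (1 mm) and total power (1 µW) are hypothetical values chosen only to illustrate the scaling, and the function names are assumptions.

```python
import math

def spad_active_area_um2(diameter_um, n_spads):
    """Total active area of n circular SPADs, in square micrometers."""
    return n_spads * math.pi * (diameter_um / 2.0) ** 2

def received_power_w(p_total_w, a_spad_um2, a_spot_um2):
    """P_total * A_SPAD / A_LD, assuming uniform power density over the spot."""
    return p_total_w * a_spad_um2 / a_spot_um2

a_spad = spad_active_area_um2(13.0, 16)        # about 2124 um^2
fill_factor = a_spad / (247.0 * 203.0)         # about 0.042, i.e. 4.2%

# Hypothetical spot: 1 mm diameter carrying 1 uW in total.
a_spot = math.pi * 500.0 ** 2
p_rx = received_power_w(1e-6, a_spad, a_spot)  # a few nW reach the SPADs

print(f"A_SPAD = {a_spad:.0f} um^2, fill factor = {fill_factor:.1%}")
print(f"received power = {p_rx * 1e9:.2f} nW")
```

With these numbers the computed fill factor agrees with the quoted value of about 4.2%, which suggests it is defined with respect to the 247 × 203 µm² area.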
When measuring the dark count rate (DCR) of the receiver, no incident light is needed, and the receiver is placed in a black box to shield it from ambient light. This is the DCR of the whole SPAD array, recorded at the output of the buffer in Fig. 1. The measured DCR as a function of excess bias voltage V_EX is plotted in Fig. 8, indicating a DCR of 30 cps at a V_EX of 1.19 V. A lower DCR can be achieved when V_EX is further reduced, which is helpful in suppressing the erroneous output pulses of the receiver and improving its BER performance. As expected, the DCR increases with V_EX. One possible cause of the high DCR is that the SPAD array consists of 16 SPADs, and the breakdown voltage may vary between SPADs due to process variations. On the other hand, all SPADs share the same cathode bias voltage V_B, as shown in Fig. 3(a). As a result, some SPADs may bear a higher excess bias voltage than the others, which causes the high DCR. Therefore, the size of the SPAD array, the design of the SPADs and the front-end circuits could be further optimized to reduce the DCR. In a large-array design for practical applications, different SPADs and different arrays could indeed have different characteristics, including DCR. One possible solution is to employ a mechanism to detect the DCR or the breakdown voltage variation of each SPAD, and then rely on feedback control to tune the excess bias voltage of each SPAD locally. In the following measurements, V_EX is set to 1.1 V to minimize the effect of dark count unless otherwise specified. However, the low V_EX results in a photon detection probability (PDP) of less than 0.01%, which increases to about 2% when V_EX is 1.32 V. Here the PDP refers to that of the whole SPAD-based receiver. Although the PDP can be increased by increasing V_EX, the issue related to the high DCR needs to be addressed to avoid deteriorating the BER performance. Other possible causes of the low PDP could be the wavelength of the incident light and the relatively high switching threshold of INV1 in Fig. 3(a).
Afterpulsing probability can be measured by obtaining the histogram of the inter-arrival time (IAT) between the rising edges of consecutive output pulses. In this measurement, the transmitted light is kept constant without modulation, and is attenuated to maintain a low count rate of about 29 kcps, so that afterpulsing events can be detected effectively [25]. All the triggered pulses at the output of the receiver are recorded to find the IAT and construct the histogram, which is then utilized to estimate the afterpulsing probability based on exponential fitting. Fig. 9 shows the measurement of afterpulsing probability with a minimum dead time of about 2.5 ns. The red line is the exponential fitting of the measured histogram data, representing the case without afterpulsing, and the area between the histogram data and the red line is the afterpulsing probability, which is 2.67%. In addition, experimental results show a lower afterpulsing probability of 2.5% with a larger dead time of about 5.3 ns by increasing the active quench time. On the other hand, the afterpulsing probability increases with the excess bias voltage. When V_EX is changed from 1.1 V to 1.25 V, the afterpulsing probability is increased to larger than 5%.
Fig. 10 shows a 100 Mb/s OOK modulation signal on the transmitter side when employing PRBS-15, and the corresponding received signal at the output of the implemented receiver with a dead time of about 2.5 ns, where the pulse width of a single triggered SPAD is 1.2 ns.
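The IAT-histogram method for afterpulsing can be sketched as follows. This is an illustrative approximation rather than the measurement code: the exponential is fitted by a log-linear least-squares fit to the long-IAT bins, the excess is summed over the short-IAT bins only, and the data are synthetic (a 29 kcps Poisson process plus an injected 10,000 short intervals). The function name, bin settings and synthetic parameters are all assumptions.

```python
import numpy as np

def afterpulsing_probability(iat_us, bin_us, n_bins, fit_from_bin):
    """Fraction of intervals in excess of an exponential fit at short IAT."""
    iat_us = np.asarray(iat_us, dtype=float)
    hist, edges = np.histogram(iat_us, bins=n_bins, range=(0.0, n_bins * bin_us))
    centers = 0.5 * (edges[:-1] + edges[1:])
    # Fit log(counts) = intercept + slope * t using only the long-IAT bins,
    # where afterpulsing is assumed to be negligible.
    mask = (np.arange(n_bins) >= fit_from_bin) & (hist > 0)
    slope, intercept = np.polyfit(centers[mask], np.log(hist[mask]), 1)
    expected = np.exp(intercept + slope * centers)
    # Area between the histogram and the fit in the short-IAT region.
    excess = np.sum(hist[:fit_from_bin] - expected[:fit_from_bin])
    return max(float(excess), 0.0) / iat_us.size

# Synthetic data, for illustration only.
rng = np.random.default_rng(0)
primary = rng.exponential(scale=34.5, size=200_000)        # us, about 29 kcps
after = 0.0025 + rng.exponential(scale=0.05, size=10_000)  # us, just after a 2.5 ns dead time
p_ap = afterpulsing_probability(np.concatenate([primary, after]),
                                bin_us=1.0, n_bins=100, fit_from_bin=5)
print(f"estimated afterpulsing probability: {p_ap:.2%}")
```

With 10,000 injected intervals out of 210,000 the estimate should land near 4.8%, which gives a quick sanity check of the fitting procedure before it is applied to measured data.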
The measured BER as a function of the pulse width of a single triggered SPAD at a data rate of 25 Mb/s is illustrated in Fig. 11, where the average received optical power is −26.4 dBm. Since increasing the pulse width of a single triggered SPAD reduces the number of pulses contained in each received bit and also causes larger undesired width extension of some received bits, the BER deteriorates. Therefore, it is important to minimize the triggered pulse width and dead time for higher speed and better BER performance. Fig. 12 presents the measured BER versus average received optical power with a dead time of about 2.5 ns at 50 Mb/s, 100 Mb/s and 150 Mb/s, where the pulse width of a single triggered SPAD is 1.2 ns. The results indicate that the BER decreases with increased optical power, and a BER of 2 × 10⁻³ can be achieved at 100 Mb/s with a sensitivity of −31.6 dBm. The BER can be improved to 10⁻⁹ if proper forward error correction is applied [14]. In addition, the BER can also be improved with an enhanced PDP, assuming the DCR and the afterpulsing probability are not affected and the SPAD array is not saturated.
Table I summarizes the performance of typical SPAD-based receivers. The implemented receiver consisting of a 4x4 SPAD array achieves a very short dead time of about 2.5 ns, demonstrating promising potential for high-speed applications. Although [15] and [16] can support higher data rates, the size of the SPAD array is hundreds of times larger, resulting in significantly larger chip area and higher cost. In addition, equalization [15], [16] or higher-order modulation [15] is employed to improve the speed. Moreover, the implemented receiver achieves a sensitivity of −31.6 dBm at 100 Mb/s with PRBS-15 rather than the PRBS-7 employed in other works. Finally, the implemented receiver is fabricated in a standard 180 nm CMOS process. In contrast, the other works employ special processes, such as a CMOS imaging process or a CMOS process with an epitaxial layer, to optimize the performance of the photodetection devices, and thus achieve impressive sensitivity.
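To put the reported sensitivity in physical terms, the average received power can be converted into an average number of photons per bit period. This is a back-of-the-envelope sketch using only the quoted −31.6 dBm, 520 nm and 100 Mb/s together with the standard values of the Planck constant and the speed of light; the function names are hypothetical, and the result averages over "1" and "0" bits because the quoted power is an average.

```python
PLANCK_J_S = 6.62607015e-34     # Planck constant, J*s
LIGHT_M_S = 299_792_458.0       # speed of light in vacuum, m/s

def dbm_to_watts(p_dbm):
    """Convert a power level in dBm to watts."""
    return 1e-3 * 10.0 ** (p_dbm / 10.0)

def photons_per_bit(p_dbm, wavelength_m, bit_rate_bps):
    """Average number of photons arriving during one bit period."""
    photon_energy_j = PLANCK_J_S * LIGHT_M_S / wavelength_m
    energy_per_bit_j = dbm_to_watts(p_dbm) / bit_rate_bps
    return energy_per_bit_j / photon_energy_j

p_w = dbm_to_watts(-31.6)                       # about 0.69 uW
n_avg = photons_per_bit(-31.6, 520e-9, 100e6)   # on the order of 1.8e4 photons
print(f"{p_w * 1e6:.2f} uW, {n_avg:.0f} photons per bit on average")
```

Comparing this photon count with the handful of output pulses a 16-element array can produce in a 10 ns bit period illustrates why an enhanced PDP would directly benefit the BER.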

V. CONCLUSION
A fully-integrated SPAD-based receiver with nanosecond dead time is presented for OWC applications. The receiver consists of a 4x4 SPAD array that is based on a p-well/DNW structure, and each SPAD is integrated with a tunable front-end circuit to perform quench and reset. In addition, an OR tree is designed to combine the 16 channels of output from the front-end circuits, and an output buffer is implemented as the interface for testing. To recover the transmitted data stream from the received signal, a signal processing flow specific for SPAD-based receivers is proposed and implemented. Experimental results verify the characteristics and functionalities of the implemented receiver, demonstrating a short dead time and a high sensitivity for high-speed OWC applications.