Photonic perceptron based on a Kerr microcomb for high-speed, scalable, optical neural networks

Optical artificial neural networks (ONNs) offer significant potential for ultra-high computing speed and energy efficiency. We report a new approach to ONNs, based on integrated Kerr micro-combs, that is programmable, highly scalable and capable of ultra-high speeds. We demonstrate the building block of the ONN, a single-neuron perceptron, by mapping synapses onto 49 wavelengths to achieve a single-unit throughput of 11.9 Giga-OPS at 8 bits per OP, or 95.2 Gbps. We test the perceptron on handwritten-digit recognition and cancer-cell detection, achieving over 90% and 85% accuracy, respectively. By scaling the perceptron to a deep learning network using off-the-shelf telecom technology, we can achieve high-throughput matrix multiplication for real-time processing of massive data sets.


I. INTRODUCTION
Artificial Neural Networks (ANNs) have demonstrated unprecedented success in making predictions from, and capturing simpler representations of, complex high-dimensional data. When trained on enough data, ANNs can outperform humans and other computational algorithms [1][2][3][4][5] in tasks ranging from image recognition and language translation to risk evaluation and, interestingly, sophisticated board games [6]. The computing power and speed of ANNs are dictated by matrix multiplication operations. Current electronic devices designed for ANNs, including the IBM TrueNorth and Google TPU [7,8], generally employ ultra-large-scale arrays of processors, such as the systolic array [8], to enhance parallelism and reach computing speeds of over 180 Tera-FLOPS. However, they are subject to either relatively inefficient digital protocols or the electrical bandwidth bottleneck of each single processor (limited to ~700 MHz) [9]. Photonic ANN hardware, or optical neural networks (ONNs), are promising next-generation neuromorphic processors, since they potentially offer ultra-large optical bandwidths that can dramatically accelerate computing speeds [3]. The key is to realize the weighted synapses that connect the neurons and nodes. Unlike digital approaches, where the synapses are stored in memory, photonic approaches rely on physical embodiments of the synapses, so that the number of synapses (i.e., the network scale) is set by the physical parallelism of the hardware; moreover, the computation is inherently analog.
Significant progress has been made on ONNs that explore different multiplexing approaches to parallelize synapses. Spatially multiplexed schemes, such as integrated coherent photonic circuits [3] and diffractive frameworks [10], have successfully demonstrated classification tasks involving vowels and handwritten digits with low-power passive operation, although with a tradeoff between parallelism and footprint. Other approaches to ONNs, such as photonic reservoir computing [11][12][13] and spike processing [14][15][16][17], employ advanced multiplexing techniques to establish synapses with much more compact schemes. Photonic reservoir computing uses time-domain multiplexing to achieve large-scale input layers with hundreds of nodes. Spike processing employs wavelength-division multiplexing and has achieved pattern recognition tasks through the use of integrated phase-change devices [17], with a reconfigurable operation bandwidth [10][11][12]. However, most schemes have limitations: time-division multiplexed networks are difficult either to dynamically train or to scale to form deep (multi-layer) networks, while spike processing has so far been limited in its degree of parallelism by the use of discrete laser arrays. The simultaneous use of all three types of multiplexing (wavelength, time and spatial) would offer the greatest benefits in terms of scale, processing power and speed.

II. PERCEPTRON
Here, we demonstrate a new approach to ONNs based on integrated Kerr micro-combs that uses wavelength, time and spatial multiplexing to compute vector dot products. Matrix operations are performed at high speed, in a scalable and dynamically trainable network structure, by flattening the matrices into vectors. We demonstrate the key building block of the ONN, a single-neuron photonic perceptron with 49 synapses, that operates at a single-unit matrix multiplication speed of 11.9 Giga operations/s (OPS), which, at 8 bits per OP, corresponds to a bit rate of 95.2 Gbps. We achieve this by simultaneously weighting the synapses in the wavelength domain and scaling the input data in the time domain. We apply the single perceptron to standard benchmark tests, including the classification of handwritten digits, achieving > 93% accuracy, as well as to predicting benign/malignant cancer classes using a feature set extracted from microscopy images of biopsied tissue, achieving > 85% accuracy. Figure 1 shows the mathematical model of the single-neuron perceptron [18], while Figure 2 shows the detailed experimental configuration, based on an integrated optical micro-comb source. The perceptron uses simultaneous time and wavelength multiplexing across 49 wavelengths. Its core function is a matrix multiplication (the matrices are pre-flattened into vectors for calculation) between the input electronic data of the image to be analysed and the synaptic weights, which are implemented in a multiple-step approach in the optical domain. The raw input data for classification is a 28×28 matrix of electronic digital grey-scale values with 8-bit intensity resolution. We first resample this digitally (effectively performing digital down-sampling) into a 7×7 matrix, which is then rearranged into a 1D vector: X = [x(1), x(2), …, x(49)].
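As a concrete sketch of this pre-processing step: the exact down-sampling filter is not specified in the text, so 4×4 block averaging is assumed here purely for illustration.

```python
import numpy as np

def downsample_and_flatten(img):
    """Reduce a 28x28 grey-scale image to 7x7 by 4x4 block averaging
    (an assumed filter choice), then flatten row-by-row into the
    49-symbol vector X = [x(1), ..., x(49)]."""
    assert img.shape == (28, 28)
    # Reshape exposes each 4x4 block on axes 1 and 3; averaging them
    # performs the down-sampling.
    small = img.reshape(7, 4, 7, 4).mean(axis=(1, 3))
    return small.ravel()

x = downsample_and_flatten(np.arange(784, dtype=float).reshape(28, 28))
```

Any other length-preserving down-sampling (e.g. decimation) would serve the same role; only the output length of 49 matters for matching the number of wavelengths.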
This vector is then sequentially multiplexed in the time domain via a high-speed electrical digital-to-analog converter at a data rate of 11.9 Giga-baud, where each symbol corresponds to an 8-bit pixel of the input data and occupies one timeslot of length 84 ps, so that the entire waveform duration is N × 84 ps = 4.12 ns (N = 49). In traditional digital approaches, the input nodes to the neural network generally reside in electronic memories and are routed via memory addresses. In contrast, for our ONN the input nodes are defined by temporally multiplexed symbols that can be routed according to their temporal location. Next, the electronic time-division multiplexed input waveform is multicast onto all 49 wavelength channels (equal to the number of components of the X vector) from the micro-comb via an electro-optic modulator, such that each wavelength carries an identical replica of the temporal data waveform X. The optical power of each comb line is then weighted with an optical spectral shaper (Waveshaper) according to the trained synaptic weight vector W = [w(1), w(2), …, w(49)], which effectively multiplexes the synaptic weights in the wavelength domain. With X and W both 49×1 column vectors, the weighted replicas of the input X form the N×N matrix R = W·X^T, with elements R(n,k) = w(n)·x(k), where the nth row (n ∈ [1, N]) corresponds to the weighted temporal waveform replica at the nth wavelength channel. Hence, the diagonal elements denote the N weighted input nodes, i.e., the nth weighted input node is represented by the 8-bit symbol w(n)·x(n) residing in the nth timeslot of the nth wavelength channel.
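The timing figures quoted above follow directly from the symbol rate; a quick numerical check, using only values stated in the text:

```python
# Timing of the time-multiplexed input waveform (values from the text).
N = 49                    # vector length = number of wavelength channels
t_symbol = 84e-12         # one 8-bit symbol per 84 ps timeslot
baud = 1 / t_symbol       # symbol rate, ~11.9 Giga-baud
duration = N * t_symbol   # full 49-symbol waveform, ~4.12 ns
```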
The replicas then pass through a dispersive element providing second-order dispersion to progressively delay the weighted replicas so as to line up all of the diagonal elements into the same timeslot, with the delay step between adjacent wavelength channels satisfying delay(λk+1) − delay(λk) = τ = 84 ps (one timeslot). Thus, the dispersive element serves as a time-of-flight addressable memory that aligns the sequentially weighted temporal symbols w(1)·x(1), w(2)·x(2), …, w(49)·x(49) across the wavelength channels (Fig. 3). While this process does not increase the speed of the network, since only the diagonal elements are used, dramatic increases in speed can be realized by scaling to deep networks through simultaneous time, wavelength and spatial multiplexing. Finally, the optical intensities of the aligned timeslots are summed by photodetection and sampling to yield the matrix multiplication result (for 7×7 matrices, equivalent to the dot product of 49×1 vectors) of the neuron, given by y = Σ_{n=1}^{49} w(n)·x(n) = W^T·X. After the matrix multiplication, the weighted and summed output is biased and mapped onto a desired range through a nonlinear sigmoid function (achieved in this initial demonstration offline with digital electronics; see supplementary material for details), yielding the neuron (single-neuron perceptron) output. Finally, the prediction of the input data's category is generated by comparing the neuron output with the decision boundary, a hyper-plane in a 49-dimensional space found during offline digital learning that separates the two categories.
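The broadcast, weight, delay and sum steps can be modelled numerically. The sketch below is a simplified, noiseless model (not the experimental signal chain): it builds the replica matrix, applies the per-channel timeslot delays, and sums the channels as a photodetector would, recovering the dot product in the central timeslot.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 49
x = rng.random(N)   # input data vector (one symbol per timeslot)
w = rng.random(N)   # synaptic weights (comb-line powers)

# Multicast + weighting: every wavelength channel carries the same waveform
# scaled by its weight, giving the replica matrix R[n, k] = w[n] * x[k].
R = np.outer(w, x)

# Dispersive delay: channel n is shifted by (N - 1 - n) timeslots so that
# the diagonal elements w[n] * x[n] all land in the same central slot.
T = 2 * N - 1               # output sequence length: 48 + 1 + 48 symbols
summed = np.zeros(T)        # photodetection sums the channel intensities
for n in range(N):
    delay = N - 1 - n
    summed[delay:delay + N] += R[n]

# Sampling the central timeslot yields the 49-MAC vector dot product.
dot = summed[N - 1]
```

The off-centre timeslots carry partial correlations between X and W, which is why the output sequence is 97 symbols long even though only the central slot is sampled for a single perceptron.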

IV. RESULTS
We first tested the perceptron on several pairs of handwritten digits, using 500 figures for each digit, from which 920 figures were randomly selected for offline pre-training, leaving the remaining 80 figures for experimental testing. The 2D handwritten digit figures were pre-processed electronically using a down-sampling method to reduce the image size from 28×28 to 7×7, followed by transformation into a 1D array of 49 symbols. This was then time-multiplexed with ~84 ps timeslots for each symbol, equating to a modulation speed of 11.9 Giga-baud. For our perceptron, the dimension of the data vector must match the dimension of the weight vector, i.e., the number of wavelengths (49 in our case); the down-sampling reduces the vector length to 49 accordingly. The optical power of the 49 microcomb lines was shaped according to the pre-learned synaptic weights, boosting the parallelism and establishing the synapses of the neuron. The input data stream was then multicast onto all 49 shaped comb lines, followed by a progressive (linear with wavelength) delay using ~13 km of standard single-mode fibre (SMF), which served as the time-of-flight optical buffer via its 2nd-order dispersion (~17 ps/nm/km). Hence, the weighted symbols on different wavelength channels were aligned temporally, allowing them to be summed via photodetection and sampling of the central timeslot to generate the result of the multiply-and-accumulate (MAC) operation. The result was then compared with the decision boundary, a hyper-plane found during network training that best classifies the input samples distributed in a 49-dimensional hyper-space. The computed matrix multiplication results for multiple input data samples were compared in intensity with the decision boundary, yielding the final ONN predictions (Fig. 3).
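The fibre parameters quoted above fix the wavelength spacing required for the per-channel delay step to equal one timeslot. The comb spacing itself is not stated in this text, so the value below is a derived estimate, not a measured quantity.

```python
# Wavelength spacing needed so that the delay step from the fibre's
# 2nd-order dispersion equals one 84 ps timeslot.
D = 17.0                    # SMF dispersion, ps/nm/km (from the text)
L = 13.0                    # fibre length, km (~13 km, from the text)
tau = 84.0                  # required delay step, ps (one timeslot)
dlambda_nm = tau / (D * L)  # required channel spacing, ~0.38 nm
```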
We evaluated the performance of the perceptron on two standard benchmark classification cases: handwritten digits and cancer cells. In the first case, two categories of handwritten digits (0 and 6) were distinguished by the decision boundary. Our device achieved an accuracy of 93.75%, compared to 98.75% for the results calculated on a digital computer. We also classified cancer cells from tissue biopsy data (Fig. 3). Individual cell nuclei, from breast mass tissue extracted by fine-needle aspirate and imaged under a microscope, have previously been characterized in terms of 30 features such as radius, texture and perimeter. In our analysis, data for 521 cell nuclei were employed for pre-training, with another 75 used for experimental diagnosis, following a similar procedure to the handwritten digit test. We achieved an accuracy of 86.67%, compared to 98.67% for the results calculated on a digital computer. To quantify throughput, we follow the approach Intel has used to evaluate digital micro-processors [69]. Considering that in our system the input data and weight vectors for the MAC calculation originate from different paths and are interleaved in different dimensions (time, wavelength), we use the temporal sequence at the electrical output port to define the throughput. According to the broadcast-and-delay protocol, each computing cycle consists of a vector dot product between the 49-symbol data and weight vectors, which generates an output temporal sequence with a length of 48+1+48 symbols and thus a total duration of 97 × 84 ps. The 49th symbol corresponds to the desired vector dot product, the result of 49 MAC operations, so the throughput of our ONN is 49/(84 ps × 97) = 5.95 Giga-MACs/s. As each MAC operation contains two operations (a multiply and an accumulate), the throughput in OPS is twice the throughput in MACs/s, in our case (49×2)/(84 ps × 97) = 11.9 Giga-OPS.
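The decision stage can be sketched as a standard single-neuron classifier. The weight vector and bias below are placeholders; the real values come from the offline training described above, and the weighted sum is the quantity computed optically.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Placeholder parameters (hypothetical, for illustration only).
rng = np.random.default_rng(1)
w = rng.normal(size=49)   # synaptic weight vector (49 wavelengths)
b = 0.0                   # bias

def predict(x, threshold=0.5):
    """Weighted sum (the optically computed dot product), bias, sigmoid,
    then a threshold implementing the decision-boundary hyperplane."""
    y = sigmoid(w @ x + b)
    return int(y > threshold)
```

With a sigmoid activation, thresholding the output at 0.5 is equivalent to testing the sign of w·x + b, i.e. which side of the 49-dimensional hyperplane the sample falls on.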
The input data stream consisted of symbols with 8-bit (256 discrete levels) values, determined by both the original grey-scale values of the image pixels and the intensity resolution of our electronic arbitrary waveform generator. The optical spectral Waveshaper featured an attenuation range of 35 dB, supporting 11-bit resolution (10·log10(2^11) ≈ 33 dB). As such, each computing cycle also corresponded to an equivalent throughput of (49×2)×8 / (84 ps × 97) = 95.2 Gbps in terms of bit rate. For analog systems such as the one used here, the bit rate (intensity resolution) is limited by the signal-to-noise ratio of the system. Hence, to achieve 8-bit resolution, the system must feature a signal-to-noise ratio of 20·log10(2^8) ≈ 48 dB in electrical power. This is well within the capability of analog microwave photonic links, including the ONN system reported here (where our OSNR was >28 dB). Our results represent the fastest bit rate for an ONN, although a direct comparison of the widely varying systems is difficult. For example, while systems that use CW sources to perform single-shot measurements [4,10,17] have low latency, they have very low throughput since the input data cannot be updated rapidly. While the latency of our single perceptron is relatively high (~64 μs) due to the fibre spool, this does not affect the throughput of our system and could be reduced to <200 ps with compact devices such as sampled Bragg gratings or etalon tunable dispersion compensators [70-75].
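The resolution figures above are standard dB conversions, amplitude-style (20·log10) for the signal SNR and power-style (10·log10) for the shaper attenuation range; a short check:

```python
import math

def snr_db_for_bits(b):
    """Electrical SNR (dB) needed to resolve 2^b amplitude levels."""
    return 20 * math.log10(2 ** b)

def shaper_range_db(b):
    """Optical power attenuation range (dB) spanning 2^b weight levels."""
    return 10 * math.log10(2 ** b)
```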

V. CONCLUSIONS
We demonstrate a single-perceptron optical neural network based on an integrated optical Kerr micro-comb operating at a single-unit throughput of 11.9 Giga-OPS, or 95.2 Gbps. We perform standard real-life benchmark tasks, including the recognition of handwritten digits and the diagnosis of cancer cells. We propose architectures to realize a deep-learning ONN with greatly enhanced throughput and processing power, enabled by the high degree of parallelism achieved through simultaneous wavelength, time and spatial multiplexing. Our approach offers significant potential for real-time analysis of high-dimensional data in demanding applications.