Machine Learning Pattern Recognition Algorithm With Applications to Coherent Laser Combination

We analyze a new kind of machine learning algorithm designed to feedback stabilize coherently combined lasers. This algorithm learns differential, rather than absolute, values of action in phase space, in order to facilitate learning on initially unstable systems. Experiments have shown that this approach can control small-scale spatial beam combination with high stability. In this paper we analyze the algorithm’s performance and limitations in depth, showing that it can continuously learn during operation in order to track changes. Using simulation, we extend the application to temporal combination, and show that it scales to more complex instances by combining 81 beams.

the whole beam array be maintained against environmental perturbations using an active stabilization controller [9], [10], [11]. Often, there are challenges in identifying errors and building a deterministic error detector.

Control challenges in complex CBC lasers include large dimensionality in the control parameters. For example, in a two-dimensional N×M beam combination system with a diffractive optical element [8], [12], the number of input phase control variables is N×M, and the output/observable variables include (2N-1)×(2M-1) beams in an interference pattern. Optical coherence stabilization requires fast control to suppress noise from the environment with high bandwidth [12]. Also, measurement of laser intensity loses phase information when using an optical power measurement from cameras or photodiodes; phase information must then be retrieved with an iterative process, which tends to be slow [13]. Another consequence of measuring optical power is that there may be nonlinear responses and non-unique input conditions for each output pattern, making model-based active control difficult.

Stochastic parallel gradient descent (SPGD) is a general and commonly used solution for CBC laser control, which uses a simple, single detector [7], [10]. SPGD finds phase errors by dithering the input phases and measuring the combined beam power to search for the right direction to move in phase space. The number of correction steps SPGD takes to converge is approximately ten times the number of combined beams [11], which slows it down when scaling to many beams. Most importantly, SPGD introduces noise in the output power through its need for dithering and searching, adding noise to operational systems [12].
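As a point of comparison, the SPGD cycle described above can be sketched in a few lines. The following toy simulation uses equal-amplitude beams, a normalized on-axis power metric, and illustrative gain and dither values of our own choosing; it is a sketch of the generic algorithm, not the implementation used in the cited work:

```python
import numpy as np

rng = np.random.default_rng(0)
N = 8  # number of combined beams

def combined_power(phases):
    """Normalized combined on-axis power for equal-amplitude beams: |sum(e^{i*phi})|^2 / N^2."""
    return np.abs(np.sum(np.exp(1j * phases)))**2 / N**2

def spgd_step(phases, gain=0.2, dither=0.1):
    """One SPGD iteration: symmetric random dither, two power measurements, gradient estimate."""
    delta = dither * rng.choice([-1.0, 1.0], size=N)   # Bernoulli dither vector
    j_plus = combined_power(phases + delta)
    j_minus = combined_power(phases - delta)
    # Move along the dither direction, scaled by the measured power difference.
    return phases + gain * (j_plus - j_minus) * delta / dither**2

phases = rng.uniform(-np.pi, np.pi, N)  # unknown initial piston errors
for _ in range(1000):
    phases = spgd_step(phases)

print(round(combined_power(phases), 3))  # close to 1 after convergence
```

The dither-and-measure structure is what makes the method detector-simple but also what injects noise into the output power during operation, as noted above.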

Machine learning control (MLC) is a promising technique to provide the controls needed for complex systems [14]. MLC solves optimization control problems using machine learning (ML) methods. It takes advantage of both the data-driven field of ML and well-developed methods from control theory, and has already led to many exciting ideas and innovative applications in complex nonlinear systems [15], [16], [17]. Some preliminary results have indicated the value of MLC in complex CBC lasers. For the unsupervised approach, a deep reinforcement-learning (RL) controller has shown promising capabilities when applied to 2-beam spatial combination [18], with more beams combined in simulations [19]. A new algorithm combined RL with SPGD to demonstrate robust control in simulations with 128 beams in temporal combination [20]. With an RL controller, 100 beams have been spatially combined in a simplified experimental setup, using two spatial-light-modulators [21]. For supervised MLC, our previous work solved the problem of non-uniqueness of patterns, and showed that a simple, fully-connected neural network (NN) can be trained to combine 81 beams using interference pattern recognition in simulations [22]. However, while there is progress with MLC in CBC lasers, there are also significant challenges. The lack of labelled data for training prevents pattern-recognizing MLC from being applied to real systems [22]. Because most coherent combination systems depend on uncontrolled parameters which slowly drift [23], a machine trained to control the system in one state of key parameters may not be able to control the system if those parameters change.
There is also the problem of training the machine initially: if the system is not stable, then the absolute value of the input and output is always unknown due to random perturbations from

including 8 beams in a 3 × 3 array [12], and 81 beams in a 9 × 9 array [26]. Alternatively, it is desirable to implement a controller without having knowledge of the combiner itself. A machine learning controller has proven to be an effective solution where the mapping information can be learned from experimental data. It was shown that a simple, fully connected neural network (NN) can be trained to combine 81 beams [22] from diffractive pattern recognition. We have also shown previously in coherent spatial beam combining that the problems of non-uniqueness of the output state and the large number of dimensions associated with increased beam number can be mitigated by training only on data from a limited range of phase space [22].

There are other problems with the simplified approach outlined above. One is that we need to train the NN on a drifting system. If the system could be completely trained before any significant drift, that would work, but with the current sample rate (∼1 kHz) this is not possible. Several thousand samples are required, and with a sample time of 1 ms this requires several seconds, during which time the drift will be unacceptable [23].

Figure 1 shows experimentally measured phase drifts. If we use this data to try to label the absolute value of phase during pattern recognition, this will result in large error (Fig. 1a). One thousand samples exhibit a large random phase drift from the original setting, so the true absolute value of the phases is unknown. In contrast, we see a relatively small error from drift when we only label adjacent samples (Fig. 1b), which is the basis of DDRM as discussed below. For a realistic case, phase drift is only a few degrees during a millisecond sample delay [24].

The other problem is that parameters not controlled by the phase actuator (such as relative beam power) change during

Fig. 2 shows how the DDRM algorithm-based neural network and iteration are implemented in a feedback loop in a general coherent combining system, with one entire cycle around the loop counting as one feedback step. We take the in-time interference pattern together with a target pattern (ideal pattern) as input to the trained NN. The NN recognizes the phase error between the patterns, and sends error signals to the PID controller, which applies a phase correction to bring the current pattern toward the target pattern. The laser beams' phase is then updated, a new interference pattern is generated, and so on.
Once the phase matrix is close to the optimal point, the phase correction from the NN recognition algorithm is always less than the prediction error, i.e., the iterative process always keeps the optimal/stable state within a given tolerance.
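The feedback cycle just described can be sketched as follows, with the trained NN replaced by a noisy oracle (the true phase error plus Gaussian prediction noise) and the PID controller reduced to a proportional term. The gain, noise levels, and drift rate are illustrative assumptions, not the paper's values:

```python
import numpy as np

rng = np.random.default_rng(1)
N = 8

# Stand-in for the trained NN: a noisy oracle returning the true phase error
# plus Gaussian prediction noise sigma_pred. In the real system this would be
# the pattern-recognition network acting on measured and target patterns.
def nn_predict(phase_error, sigma_pred=0.05):
    return phase_error + rng.normal(0.0, sigma_pred, size=phase_error.shape)

phase = rng.uniform(-0.5, 0.5, N)   # start near optimum, within [-pi/2, pi/2]
target = np.zeros(N)
kp = 0.6                            # proportional gain (P-only controller here)
sigma_drift = 0.02                  # RMS phase drift per feedback step (rad)

for step in range(200):
    error = nn_predict(phase - target)        # NN recognizes the phase error
    phase -= kp * error                       # controller applies the correction
    phase += rng.normal(0.0, sigma_drift, N)  # environment keeps drifting

rms = np.sqrt(np.mean(phase**2))
print(round(rms, 3))
```

The loop settles into a steady state where the residual phase error is set by the prediction noise and the per-step drift, consistent with the tolerance argument above.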

Due to continuous drift in an unstabilized system, there is no way to label the unknown absolute phase values. As shown in Fig. 2, to obtain the data-set for NN training, we can inject a known phase dither and measure the diffraction pattern before and after (pattern A and pattern B), with a small time interval between the two samples. Then we can build a mapping between the phase space and the pattern space using the correlated data samples of pattern A, the phase dither, and pattern B. In the figure, A_l and B_l are states in phase space, which correspond to pattern A and pattern B in pattern space. The patterns are the input and the corresponding dither is the label for NN training. There is an error in the labelled phase dither between A and B, as A_l drifts to state A'_l at a random drift rate between the two samples. For samples spaced closely in time, the drift is small, as shown in Fig. 1b.
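A minimal sketch of this data-collection step, with a toy far-field model (zero-padded 2D FFT of the phasor grid) standing in for the real combiner. The array size, dither, and drift magnitudes follow the numbers quoted in the text, but the optics model itself is only illustrative:

```python
import numpy as np

rng = np.random.default_rng(2)
N = 3                              # 3x3 beam array
sigma_dither = np.deg2rad(30.0)    # RMS of the injected (known) dither
sigma_drift = np.deg2rad(4.0)      # RMS drift between the two samples

def pattern(phases):
    """Toy far-field intensity pattern of a 3x3 equal-amplitude array (5x5 pixels)."""
    field = np.exp(1j * phases)
    return np.abs(np.fft.fft2(field, s=(2 * N - 1, 2 * N - 1)))**2

def ddrm_sample(phases):
    """One DDRM training sample: (pattern A, pattern B) as input, known dither as label."""
    dither = rng.normal(0.0, sigma_dither, (N, N))
    drift = rng.normal(0.0, sigma_drift, (N, N))       # unknown; corrupts the label
    pat_a = pattern(phases)
    pat_b = pattern(phases + dither + drift)           # state A_l has drifted to A'_l
    x = np.concatenate([pat_a.ravel(), pat_b.ravel()]) # 1 x 50 input for an MLP
    return x, dither.ravel()                           # label = injected dither only

phases = rng.normal(0.0, 0.3, (N, N))   # near-optimal working point
x, y = ddrm_sample(phases)
print(x.shape, y.shape)
```

Note that only the injected dither is used as the label; the drift term is exactly the label error the text describes, which is why closely spaced samples are essential.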

It is useful to limit most of the training to a region of phase space near optimal, in order to minimize the number of needed samples and the corresponding training time. Any kind of

As shown in [22], samples used to train the NN models must be within [−π/2, π/2] around the optimal state, to avoid interference pattern ambiguity. Also, the RMS dither σ_dither must be much larger than the RMS drift rate σ_drift in order to provide labelled data with a high signal-to-noise ratio for NN training.
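The two constraints above can be expressed as a simple sample filter. The acceptance window follows the [−π/2, π/2] condition quoted from [22]; the SNR threshold value is an illustrative choice of our own:

```python
import numpy as np

rng = np.random.default_rng(3)
sigma_dither = np.deg2rad(30.0)
sigma_drift = np.deg2rad(4.0)

# Label signal-to-noise: the known dither must dominate the unknown drift.
snr = sigma_dither / sigma_drift
assert snr > 5   # 30/4 = 7.5 in the quoted configuration

def accept(total_phase):
    """Keep only samples whose phases stay within [-pi/2, pi/2] of optimum,
    avoiding the interference-pattern ambiguity described in [22]."""
    return np.all(np.abs(total_phase) <= np.pi / 2)

samples = [rng.normal(0.0, sigma_dither, 9) for _ in range(1000)]
kept = [s for s in samples if accept(s)]
print(len(kept) / len(samples))   # most 30-degree-RMS samples pass the window
```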

As reported in [27], randomly chosen samples can be used

In practice, closeness to an optimal state can be maintained and samples can be selected while using an SPGD controller and the incremental learning process as discussed above [24].

500 samples are used in the diffractive combining simulation, with a system drift rate of 4 degrees per sampling interval and an RMS dither σ_dither of 30 degrees for the initial exploration with selected samples. The RMS error between the predicted phase and the known phase drops as we train the NN, as shown in Fig. 3a. We then take training data based on the incremental learning process and plot the training curve in Fig. 3b, where new data is obtained from the corrections against system drift and thus lies in a very limited phase range. During incremental learning, new data updates the trained NN model, producing a clear drop of the RMS error.
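The incremental-learning idea can be sketched with a linear predictor standing in for the NN and a random linear map standing in for the optics: train on an initial batch, let the uncontrolled parameters drift, then update the same model with freshly collected samples. All dimensions, rates, and learning parameters here are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(4)
n_feat, n_phase = 20, 9

# Toy linear "optics": pattern features are a fixed linear map of the phases.
A = rng.normal(size=(n_feat, n_phase))

def batch(A, n, sigma=0.5):
    """Collect n (pattern, phase-label) pairs from the current system state."""
    phases = rng.normal(0.0, sigma, (n, n_phase))
    return phases @ A.T, phases

def sgd(W, X, Y, lr=0.05, steps=300):
    """Full-batch gradient descent on the mean-squared prediction error."""
    for _ in range(steps):
        W += lr * (Y - X @ W.T).T @ X / len(X)
    return W

def rms_err(W, A):
    X, Y = batch(A, 1000)
    return np.sqrt(np.mean((X @ W.T - Y)**2))

W = sgd(np.zeros((n_phase, n_feat)), *batch(A, 500))
err0 = rms_err(W, A)                             # after initial training

A_drifted = A + 0.2 * rng.normal(size=A.shape)   # uncontrolled parameters drift
err_jump = rms_err(W, A_drifted)                 # model no longer matches system

W = sgd(W, *batch(A_drifted, 500))               # incremental update, fresh samples
err_new = rms_err(W, A_drifted)
print(err0 < err_jump, err_new < err_jump)
```

The same `update with new data` pattern is what lets a DDRM controller keep tracking a system whose uncontrolled parameters slowly change.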

The values of the RMS errors in Fig. 3 are not directly relevant to the final combining efficiency and stability because they come from the NN training process, which compares the labelled data with the fitted data. Since our measured/labelled phase neglects system drift while the real phase includes it, to truly evaluate the prediction error of the NN model during feedback we need to test the model with data spanning a drift range, as discussed in the following section.

We have simulated three types of NN that have been trained for different applications: 3 × 3 diffractive combining, 9-beam temporal stacking, and 9 × 9 diffractive combining. We use Python code to simulate the optical transition process, to generate random input laser beams for variable measurable intensity patterns, and for feedback control tests. In each case the NN differs, as do the scanned parameters, including dither range and drift rate. For a given parameter set and a given application case, the trained NN model can always be reproduced by running the same simulation code.

Data used for training is always generated from scratch by running the simulation code based on the physics model presented in Section V-A for temporal stacking and in Section V-B for diffractive combining.

The outcomes of the training, testing, and validation differ for different NN types, training methods, and parameters used, although the convergence trends are all similar, as shown in Fig. 3. The drop of the RMS error curves indicates successful training of the NN model. Simulations described below indicate that models with selected samples and incremental learning can be successfully applied in a general feedback loop.

As we focus more on performance when implementing the trained NN in feedback control, we use the prediction error to judge performance instead of the direct training curves, since a larger prediction error leads to poor stability in the feedback control, as discussed in Section V-C.

The Gires-Tournois interferometer (GTI) based coherent pulse stacking scheme stacks a series of phase-modulated pulses into one, using a series of concatenated cavities [28]. Each cavity is comprised of one low-reflectivity input-output mirror and other high-reflectivity mirrors, forming a cavity with a round-trip delay equal to the pulse interval (or a multiple thereof).
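A single GTI cavity can be sketched as a time-domain recursion over the pulse train, with one round trip per pulse interval. The mirror reflectivity, input phases, and sign conventions below are illustrative assumptions, not the parameters of [30]:

```python
import numpy as np

def gti_cavity(a, r, phi):
    """One GTI cavity acting on a pulse train a[n]. r is the amplitude
    reflectivity of the input-output mirror, phi the round-trip phase, c the
    intracavity field; the recursion is lossless (unitary)."""
    t = np.sqrt(1.0 - r**2)
    b = np.zeros_like(a, dtype=complex)
    c = 0.0 + 0.0j
    for n in range(len(a)):
        b[n] = -r * a[n] + t * c                 # reflected output pulse
        c = np.exp(1j * phi) * (t * a[n] + r * c)  # field after one round trip
    return b

# Feed a short train of phase-modulated pulses through two cascaded toy cavities.
pulses = np.exp(1j * np.array([0.0, np.pi, 0.0]))  # illustrative input phases
out = pulses
for r, phi in [(0.6, 0.0), (0.6, 0.0)]:
    out = gti_cavity(out, r, phi)

# Sanity check: pad with zeros so the cavity rings down; total output energy
# must equal the input energy of 3 pulse units.
tail = gti_cavity(np.pad(pulses, (0, 200)), 0.6, 0.0)
print(round(np.sum(np.abs(tail)**2), 6))  # prints 3.0
```

Because the recursion is unitary, energy conservation after ring-down is a convenient check that a stacking simulation is wired correctly.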

For the stacking simulations with four identical, cascaded GTI cavities, we use the parameters listed in [30], with a stacking sequence of 9 approximately equal-amplitude pulses.

The ideal stacked output intensity versus the input intensity for our setup is shown in Fig. 4b.

The absolute pulse stacking efficiency is defined as: η = . For our control study, we didn't optimize

the approach needs fewer samples to get the same accuracy, since it has more information (four cavity outputs). 500 samples are used in the NN training, and Fig. 4c shows how the trained NN is applied to correct the intensity pattern to the optimized one, where the maximum peak-power enhancement factor is close to 8. The system has a drift rate of 80 mrad in this simulation, which leads to an instability of about 5%; this agrees with the analysis in [31]. A larger drift rate will cause larger instability, in a way similar to the diffractive combining system discussed in the following section.

The simulated 8 input beams have equal amplitude, and the ideal input beam phases match the DOE transfer function such

where stands for the phase function [8], [12]. Still, for the control studies we only present

Here, σ_φ, σ_pred, and σ_drift are uncorrelated vectors with the same dimension. We statistically analyze the NN prediction error σ_pred and the total phase error σ_φ at different drift rates σ_drift in the NN-based feedback simulations of the diffractive combiner (Section V-B). The tested NN models' dither amount was fixed at 30 degrees while the drift rate was varied.
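Eq. 3 itself is not reproduced in this excerpt. Given that σ_φ, σ_pred, and σ_drift are stated to be uncorrelated, a quadrature sum is the natural reading; the following reconstruction is our inference, not a quotation from the paper:

```latex
\sigma_{\phi} = \sqrt{\sigma_{\mathrm{pred}}^{2} + \sigma_{\mathrm{drift}}^{2}}
```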

Results are shown in Fig. 6a. The prediction error σ_pred during feedback operation is found by testing the trained NN models with samples within the correction region (RMS value equal to the RMS drift rate). Each point in the curve corresponds to an NN that is trained with a given drift rate. Results are reproducible when re-running the same simulation code. Based on the prediction error σ_pred (shown in blue), we can derive σ_φ using Eq. 3 for a given known drift rate σ_drift (shown in red).

It has been shown that η is related to the uncorrelated RMS piston phase errors σ_φ (in radians) from each channel, approximately as η = 1 − σ_φ² [32]. This approach works well for small perturbations, but is less accurate for larger σ_φ. In that case we can statistically derive η versus σ_φ, as well as the stability, from Monte-Carlo simulations of the physical model of our 8-way 3 × 3 diffractive combining system. For the Monte-Carlo simulations, we corrupted the laser phase with Gaussian-distributed noise of RMS σ_φ, and then performed the 2D convolution as in Eq. 2. We then derived statistics of combining efficiency and stability from 20k samples. Monte-Carlo results for given σ_φ are shown as dashed lines in Fig. 6b.

We can also statistically analyze the combining efficiency and the standard deviation of η, i.e., the stability after feedback, from DDRM-based feedback runs like those shown in Fig. 5b, Fig. 5c, and Fig. 5d.

in Fig. 7. The zoomed inset shows the stability and average normalized combining efficiency. For the case of RMS drift rate 5 degrees, the average combining efficiency is 99.7% and the RMS stability is 0.08%. For the case of RMS drift rate 10 degrees, the average combining efficiency is 99.4% and the RMS stability is 1.4%. We see that even at a 10 degree drift rate, the trained NN still works and typically converges in less than 200 steps.

Fig. 7. Scaling capability of ML-based pattern recognition: DDRM-based feedback in 9 × 9 diffractive combining with different drift rates. 100 random cases are shown with random initial states, and the zoomed insets show the curves near optimal.
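The small-perturbation relation η ≈ 1 − σ_φ² is easy to check with a short Monte-Carlo run. Here a simplified equal-amplitude combining model, η = |mean e^{iφ}|², stands in for the full 2D-convolution model of Eq. 2:

```python
import numpy as np

rng = np.random.default_rng(6)
N, trials = 8, 20000
sigma_phi = 0.1   # RMS piston phase error in radians (small-perturbation regime)

# Combining efficiency of N equal-amplitude beams with piston errors phi_k:
# eta = |mean(exp(i*phi_k))|^2, a simplified stand-in for the full DOE model.
phi = rng.normal(0.0, sigma_phi, (trials, N))
eta = np.abs(np.mean(np.exp(1j * phi), axis=1))**2

print(round(eta.mean(), 3), round(1 - sigma_phi**2, 3))
```

The standard deviation of the same `eta` array is the stability statistic used in the text; at larger σ_φ the Monte-Carlo mean falls below the 1 − σ_φ² approximation, which is why the paper switches to simulation there.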

The NN structure for the 9-beam/pulse case (in both diffractive and temporal stacking) is quite simple, with only 3 layers. Both multilayer perceptron (MLP) and convolutional neural network (CNN) types of neural network can implement DDRM and the feedback process. Structurally, there isn't a significant difference between the two approaches; for 3 × 3 coherent combining, the input data (double intensity patterns) dimension is 2 × 5 × 5 for the CNN and 1 × 50 for the MLP, respectively. For 9 × 9 combining, we use a 4-layer MLP with 1 × 578 (i.e., 2 × 17 × 17) input data. 500 training samples are enough for the 3 × 3 diffractive beam combining and 9-pulse temporal stacking. For the 81-beam diffractive case, training requires about 100k training samples. The inference latency (i.e., the amount of computation time required to turn a measured pattern into a phase error signal) is similar for both cases. For our 8-beam CBC in a 3 × 3 array with a 5 × 5 interference pattern, we found the inference time of our 4-layer MLP model is about 0.21 ms, while a CNN model with two convolutional layers takes about 0.33 ms on a typical CPU without any GPU acceleration. A larger inference time is also needed for the larger NN, about 10 to 20 milliseconds on a normal CPU without acceleration. For the inference speed of MLP models regarding scaling, if using single-threaded
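A minimal forward pass with the stated 3 × 3 dimensions (two flattened 5 × 5 patterns in, phase errors for 8 controlled beams out). The hidden-layer widths and activation are illustrative assumptions, and the timing loop mirrors the CPU-only latency measurement described above:

```python
import time

import numpy as np

rng = np.random.default_rng(7)

# Illustrative 4-layer MLP: 50 inputs (two flattened 5x5 intensity patterns),
# two hidden layers, 8 outputs (one phase error per controlled beam).
sizes = [50, 128, 128, 8]
weights = [rng.normal(0.0, 0.1, (m, n)) for n, m in zip(sizes[:-1], sizes[1:])]
biases = [np.zeros(m) for m in sizes[1:]]

def mlp(x):
    for W, b in zip(weights[:-1], biases[:-1]):
        x = np.tanh(W @ x + b)               # hidden activations
    return weights[-1] @ x + biases[-1]      # linear output: predicted phase errors

x = rng.normal(size=50)                      # a measured double-pattern input
t0 = time.perf_counter()
for _ in range(1000):
    y = mlp(x)
elapsed = time.perf_counter() - t0           # seconds per 1000 calls = ms per call
print(y.shape, round(elapsed, 3), "ms per inference")
```

Inference at this scale is dominated by two small matrix-vector products, which is why sub-millisecond CPU latencies of the kind quoted above are plausible without GPU acceleration.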