Formal Proofs of Orthogonality for Class-Incremental Learning for Wireless Device Identification in IoT
Yongxin Liu, Jian Wang, Jianqiang Li, Shuteng Niu, and Houbing Song, Senior Member, IEEE

Abstract-This document provides a formal proof and supplementary information for the paper "Class-Incremental Learning for Wireless Device Identification in IoT" [1]. The original paper focuses on providing a novel and efficient incremental learning algorithm. In this document, we explicitly explain why the memory representations (latent device fingerprints in our application) in Artificial Neural Networks approximate orthogonality, which provides the insight behind the invention of our Channel Separation Incremental Learning algorithm.
Index Terms-Internet of Things, Cybersecurity, Big Data Analytics, Non-cryptographic identification, Zero-bias Neural Network, Deep Learning, Memory orthogonality.
We reuse the existing proofs and formulas of the original class-incremental learning paper, but with slightly modified expressions that are more generalizable and explicit.
We use the term memory representations in place of the application-specific term device fingerprints [2]. The decisional memory representations usually reside within the last dense layer of a neural network. In this document, we do not consider bias neurons or amplificative attention, because we have proved that such a simplification does not impair the performance of neural networks [3], [4].

I. SEPARATION OF FINGERPRINTS AT CONVERGING POINT
Intuitively, if the memory representations (the devices' fingerprints) are distantly separated in the latent space, there is less chance of confusing different concepts (wireless devices). To quantify the separation, the sum of the mutual cosine distances of all memory representations (devices' fingerprints) in a classification model can be defined as:

$$TD(w_1, \dots, w_C) = \sum_{i=1}^{C}\sum_{j=i+1}^{C} w_i \cdot w_j \tag{1}$$

where $w_1, \dots, w_C$ are the devices' fingerprint vectors. Suppose we have $C$ devices with $N_1$-dimensional fingerprint vectors. Note that the fingerprints have been normalized into unit vectors. Therefore, to find the optimal value of $TD(\cdot)$, we need to incorporate the constraints:

$$w_k \cdot w_k = 1, \qquad k = 1, \dots, C \tag{2}$$

Equation (1) has now become a constrained optimization problem, which we solve with the method of Lagrange multipliers:

$$L(w_1, \dots, w_C, \lambda_1, \dots, \lambda_C) = \sum_{i=1}^{C}\sum_{j=i+1}^{C} w_i \cdot w_j + \sum_{k=1}^{C} \lambda_k (w_k \cdot w_k - 1) \tag{3}$$

And we need to solve:

$$\frac{\partial L}{\partial w_k} = 0, \qquad \frac{\partial L}{\partial \lambda_k} = 0, \qquad k = 1, \dots, C \tag{4}$$

which results in a linear system of equations. For each $k \in \{1, \dots, C\}$, we have:

$$\sum_{i \neq k} w_i + 2\lambda_k w_k = 0 \tag{5}$$

This is a homogeneous system of equations, and it is unlikely to admit only the trivial (all-zero) solution. Adding $w_k$ to both sides of Equation (5) gives, for every $k$:

$$\sum_{i=1}^{C} w_i = (1 - 2\lambda_k)\, w_k \tag{6}$$

Since the left-hand side of Equation (6) is identical for every $k$ while the fingerprints of a converged classifier point in different directions, the $C$ equations can be converted into one equation: $\sum_{i=1}^{C} w_i = 0$. We square this equation and expand it. According to the Multinomial Theorem [5], we have:

$$\Big\| \sum_{i=1}^{C} w_i \Big\|^2 = \sum_{i=1}^{C} w_i \cdot w_i + 2 \sum_{i=1}^{C}\sum_{j=i+1}^{C} w_i \cdot w_j = 0 \tag{7}$$

On the left of Equation (7), the first part is the sum of the squared magnitudes of the unit fingerprint vectors, and its value is $C$; the second part is exactly two times Equation (1). Now, we have:

$$TD(w_1, \dots, w_C) = -\frac{C}{2} \tag{8}$$

Remark 1. The sum of the mutual cosine distances of the memory representations (device fingerprints) of a DNN at a converging point is a predictable constant: $TD = -C/2$. When this value is reached, the separation of the memory representations is maximized in the latent space, indicating the lowest degree of conflict; such conflict corresponds to what neuroscience calls interference. We will use the term Degree of Conflict (DoC) to describe this characteristic of the zero-bias DNN. Note that the range of the DoC is from $-\frac{C}{2}$ to $\frac{C(C-1)}{2}$; the maximum value is reached when all fingerprints collide into one single vector.
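As a sanity check (ours, not part of the original proof), Remark 1 can be verified numerically. The sketch below minimizes Equation (1) over unit fingerprint vectors with projected gradient descent; the class count C, dimension N1, step size, and iteration budget are all illustrative assumptions.

```python
# Numerical sketch of Remark 1 (illustrative settings, not from the paper):
# minimize TD = sum_{i<j} w_i . w_j over unit fingerprints with projected
# gradient descent and compare against the predicted constant -C/2.
import numpy as np

rng = np.random.default_rng(0)
C, N1, lr = 8, 16, 0.05  # class count, fingerprint dimension, step size

W = rng.standard_normal((C, N1))
W /= np.linalg.norm(W, axis=1, keepdims=True)      # constraint (2): unit vectors

def TD(W):
    G = W @ W.T                                    # Gram matrix of mutual cosines
    return (G.sum() - np.trace(G)) / 2             # sum over pairs i < j, Eq. (1)

for _ in range(5000):
    grad = W.sum(axis=0, keepdims=True) - W        # dTD/dw_k = sum_{i != k} w_i
    W = W - lr * grad
    W /= np.linalg.norm(W, axis=1, keepdims=True)  # re-project onto the sphere

print(f"TD at convergence: {TD(W):+.4f} (predicted -C/2 = {-C / 2:+.4f})")
collided = np.tile(W[:1], (C, 1))                  # all fingerprints in one vector
print(f"DoC maximum: {TD(collided):.1f} (predicted C(C-1)/2 = {C * (C - 1) / 2})")
```

With these settings the optimizer should settle near TD = -4.0 for C = 8, and forcing all fingerprints onto a single vector yields the DoC maximum of 28, matching the stated range.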

II. ORTHOGONALITY APPROXIMATION
We define the averaged cosine distance between the $N_1$ initially trained classes as $D_0$. According to Remark 1, after the initial training we have:

$$D_0 = \frac{-N_1/2}{N_1(N_1 - 1)/2} = -\frac{1}{N_1 - 1} \tag{9}$$

If $N_1$ becomes larger, we will have:

$$\lim_{N_1 \to \infty} D_0 = \lim_{N_1 \to \infty} \Big(-\frac{1}{N_1 - 1}\Big) = 0 \tag{10}$$

and the averaged angle between device fingerprints (memory representations) approximates 90 degrees, i.e., they become approximately orthogonal. Conversely, if all memory representations (device fingerprints) are orthogonally distributed, then $D_0$ is exactly zero.
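The limit in Equation (10) can be made concrete with a few lines of arithmetic (our illustration): as N1 grows, D0 = -1/(N1 - 1) vanishes and the corresponding mean angle between fingerprints approaches 90 degrees.

```python
# Numerical check of Equations (9)-(10): the averaged cosine distance
# D0 = -1/(N1 - 1) tends to zero, i.e. the mean angle between device
# fingerprints approaches 90 degrees as the class count N1 grows.
import numpy as np

for N1 in (2, 4, 10, 100, 1000):
    D0 = -1.0 / (N1 - 1)                 # Equation (9)
    angle = np.degrees(np.arccos(D0))    # mean pairwise angle in degrees
    print(f"N1 = {N1:4d}  D0 = {D0:+.4f}  mean angle = {angle:6.2f} deg")
```

For N1 = 4 this reproduces the regular-simplex angle of about 109.47 degrees, and by N1 = 1000 the mean angle is within 0.06 degrees of exact orthogonality.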

III. INSIGHTS TO THE INVENTION OF NEW INCREMENTAL LEARNING ALGORITHMS
If the newly added memory representations are orthogonal to the existing ones, no conflict or interference is introduced. This is the essential finding that motivated the invention of Channel Separation Incremental Learning, in which the memories of different learning stages are organized into orthogonally separated subspaces. Biological evidence for our approach has also been revealed by the most recent advances in neuroscience [2], albeit via a totally different roadmap.
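The following sketch makes this insight tangible under our own assumptions (it is an illustration, not the paper's Channel Separation Incremental Learning implementation): fingerprints of a new learning stage drawn from the orthogonal complement of the existing fingerprints' span have zero cosine similarity with every old fingerprint, so no interference is introduced.

```python
# Minimal sketch of orthogonal channel separation (our illustration, not the
# paper's implementation): new-stage fingerprints sampled from the orthogonal
# complement of the old fingerprints' span do not conflict with old memories.
import numpy as np

rng = np.random.default_rng(1)
N = 32                                     # latent dimension (illustrative)
old = rng.standard_normal((4, N))          # fingerprints of the initial stage
old /= np.linalg.norm(old, axis=1, keepdims=True)

# Orthonormal basis of the complement of span(old) via a full QR decomposition.
Q, _ = np.linalg.qr(old.T, mode="complete")
complement = Q[:, old.shape[0]:]           # N x (N - 4) basis of the complement

new = (complement @ rng.standard_normal((complement.shape[1], 3))).T
new /= np.linalg.norm(new, axis=1, keepdims=True)

# Old/new cross-similarities vanish up to floating-point error: no conflict.
print(f"max |old . new| = {np.abs(old @ new.T).max():.2e}")
```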