Federated Learning using Distributed Messaging with Entitlements for Anonymous Computation and Secure Delivery of Model

Federated learning has become one of the most recent and widely researched areas of machine learning. Several machine-learning frameworks, such as Tensorflow Federated and PySyft and others have gained momentum in recent past and continue to evolve. Some of the frameworks involve techniques such as differential privacy, secure multi-party computation, gradient descent calculation over the network to achieve privacy of underlying data in federated learning. While these frameworks serve the need for a general-purpose federated learning model as per certain framework, in this paper we present a solution based on distributed messaging with appropriate entitlements that enterprises can leverage in a managed and permissioned network. The solution implements access controls on message source and destination in a decentralized network, which can implement any given data science model in the federated network to facilitate secure federated learning.


I. INTRODUCTION
The size of data, growing exponentially year-on-year, has created diversified opportunities to design complex artificial intelligence systems for personal and business usage, at scale. The idea of "big data" and "distributed computing" has together formed the noble field of distributed machine learning [6]. In the said method, the data from local devices are generally sent over to powerful and high compute oriented servers for combing through huge data, and deriving the global results. The resultant models are sent back to the individual devices [7]. This has made the low powered edge computers and mobile devices to use machine learning for predictive and classifying abilities, without worrying about the training process. Even though distributed machine learning enables a very efficient mode of computation with heterogeneous devices, it comes with its own challenge of data privacy and security. In the domain of finance, defence, government, health care, enterprise and many more, where data privacy is of utmost importance, distributed machine learning can't be used, where collaboration among the participants is expected, for the improvement of models, but data can't be shared.
With the advancement in computing resources across storage, cloud and mobile devices, machine learning in decentralized network is viable, along with improving the experience in artificial intelligence, both in enterprise and personal lives. Google first published a paper on the concept of "Federated Learning" [8] [9] in the year 2017, a special scenario of "collaborative learning", with multiple decentralized computing devices holding local data samples, without exchanging data for training. This satisfies the need for data privacy, which was missing in the distributed machine learning arrangement earlier.
In an operator managed, decentralized, peer-2-peer network, various regulations and compliances mandate participant's data privacy, from both "other participants" and "the network". However, by the same regulations and compliances, the network operator still is required to ensure the normal operation of the network by identifying any malicious activity and actors on the network. Further, the network participants want to learn and benefit from other participants' learning without knowing about their data. This poses a challenge to the network operator and diminishes the value for a participant to be part of such a network.
One of the potential solutions to this is federated (machine) learning, that allows for user data to be private and localized to itself while still allowing the analytics to be done on the individual data, without exposure. There are multiple federated learning frameworks -such as Tensorflow Federated [1], PySyft [2], etc. However, most of these frameworks are still evolving and have been adopted in enterprise networks and systems for their individual use-case, which is tightly coupled, and needs a lot of re-engineering for accommodating new machine learning models.
Federated learning works with both IID (independent and identically distributed) and Non-IID (non-independent and identically distributed) data [10]. Based on the data features, federate learning can be broadly classified into two groups. If the data across devices are consistent with the same feature set, it would be horizontal federated learning. And if data's feature set is not the same across all the devices, it would be vertical federated learning. We are proposing a generic framework in this paper, where homogenous models across the device are compatible for network-wide orchestration through a distributed messaging pattern. We are going to discuss both "Pub-Sub" and "Blockchain based private transaction" mechanisms of communication, which will maintain anonymity and secure distribution of models and messages. The mechanism is based upon the entitlements on communication channels and encryption of data in transit.
Federated computation is the crucial phase in the federated learning cycle, where the aggregation takes place. Mathematically, there are multiple ways of aggregating homogenous models across the network. "Federated averaging" [8], "Federated learning algorithm with periodic averaging and quantization" [11], "Dynamic model averaging" [12], data masking [18], noise addition [19] and many more are available in terms of federated aggregation algorithms. In our paper, we are going to propose a method to aggregate homogenous models, independently and asynchronously. An important fact is taken into consideration, the reliability of participants sending models for centralized aggregation. Collaborator needs to be aware of the ad-hoc scenarios and needs to adjust the computation dynamically, as per the network.

II. DISTRIBUTED MESSAGING
The core of our system, the distributed messaging [13] component, which serves as the communication and orchestration channel among the participant nodes and collaborator node. On the broader side, the nodes are classified as -• Collaborator node -for aggregation role • Participant node -participants in the network To facilitate the communication between multiple nodes and collaborator independently, we have implemented event-driven architecture. The events are being listened to by an individual threads/processes on the nodes, making it possible for noncollusive message delivery.
We use Apache Kafka [14], a popular pub-sub system for distributed messaging. Fig 1 depicts the communication topology for the same. Since messaging is the core of the whole service, high availability is achieved using a clustered setup of Kafka. We used 3 nodes setup for Kafka and Zookeeper cluster. "Topics" are the logical channels on Kafka, which separates the message flow. Every topic has been assigned a set of producers and consumers, based on the correct entitlements, which is visible in figure 1. We used a total of four topics, out of which two are used for plaintext messages and two for encrypted messages. We also implemented communication using blockchain, which allows the secure orchestration of models in the network, using encrypted channels.

A. Anonymous Computation
Using Kafka admin APIs, we set the role-based access controls (ACLs) on the topics. This essentially defines the which nodes can assume which role on the topics. Message packets don't contain any information about the source of origin in it, hence the messages are anonymous. The anonymity of message packets achieves the need for data privacy, as a part of federated learning. No participant node can consume messages from channels, on which participant nodes are producing, thus making it impossible for participants to read any other participating node's data. Security protocols based "Access Controls Lists" allows only valid participants to exchange messages on the messaging system. This takes care of data security. The messages exchanged on the network are encrypted, but still can maintain anonymity, because only one party's RSA key pair (only for collaborator) is static, whereas RSA key pairs generated by participants are rotating for every message. Details of cryptographic implementation have been explained in the subsequent section.

B. Secure Delivery
Along with the distributed message's channel-level security, through Kafka admin API [15] or through permissioned blockchain's channels, message level dynamic cryptography provides a secure communication layer with the least key exchange and setup overhead, making participation in the network more fluid in practice.
In "dynamic message encryption-decryption", there are two kinds of nodes involved -collaborator and participant. For both the sides, encryption and decryption functions are implemented, which is a combination of RSA [16] and AES (CBC mode) [17] algorithms. AES algorithm is used for encrypting the model, and the RSA algorithm is used for encrypting the key, being used for symmetric encryption.
In Fig-2, a standard "dynamic message encryptiondecryption" life cycle is depicted as DFA state diagram.
Description of the states are as follows - • S1 : Algorithm 7 -participant node encrypts the message with collaborator node's public key. • S2 : Algorithm 8 -Collaborator node decrypts the message using private key of itself • S3 : Algorithm 9 -Collaborator node encrypts the message using participant node's public key • S4 : Algorithm 10 -Participant node decrypts the message using the private key generated in the initial state S2 S3 S4 S1 where N is the total number of channels and P is the total number of participants. The multiplicand 2 signifies that for every logical channel, two sets of physical channels are defined. One for the direction participant node to collaborator node, and other being from collaborator node to participant node. 2 more channels are added which serves as the broadcast channels (both the directions).
Since we are using "dynamic message encryptiondecryption", all the messages are being consumed by all the entitled consumers of a given channel. But only the valid consumer, with the right set of RSA private key, can decrypt and read data. With the implementation of secure delivery, the problem of channel maintenance is also decreased to a certain extent. The total number of channels is 4 (2 for broadcast channels and 2 for the encrypted channel). Whenever a new participant is joining the network, or an old participant leaving the network, the only change needs to be performed from an infrastructure point of view is the update in access controlled lists for the channels. With the right set of entitlements, based on roles, the only collaborator can consume from channels, where participant nodes produce. This makes it impossible for any other sibling participant to read any other sibling participant's data. . . . return null Removing data end function framework proposed, the federation protocol facilitates the partial aggregation in the network [20] [21]. Equation 3 is the objective function, to be minimized on collaborator node. Participants would keep communicating their local models in the given frequency, and collaborator would aggregate the models. Once g(w) converges after multiple aggregation cycles, collaborator node would optimize the resource usage by pausing the global model update, until a better minimization of the g(w) is observed.
where 0 ≤ p ≤ 1, N c=1 p c = 1 and N denotes number of participants.
where W is the set of weight matrices in a given model, p is the weight score assigned (statically or dynamically computed) Algorithm 7 Encryption of message by participant node Require: M ∈ {collaborator | cardinality of set is 1} Require: N ∈ {participant | cardinality of set is ≥ 1} Require: K M pub ; K : Key of RSA pair Ensure: p = ∅; p : plaintext message to the participant. G(w) is the loss function [22] [23] of the given model, used for federated aggregation. If a subset of participants is unable to send the local model to the federated node, then the weight of that participant would be set to 0.
Weight of the participant plays an important role in federated learning environment [24] [25]. There are various ways to decide weight (priorities) of the participants, depending on the network setup like • Number of data points in the local data sample • Logical prioritization based on network and business use case • Reliability of local data • Frequency of participant's participation in the aggregation cycle Federated aggregation, as described in equation 5 gives a more generalized way of aggregation in case of artificial neural network/machine learning models. The framework proposed, supports various federated functions and their respective loss functions, as the aggregation for each machine learning model is independent of each other. The loss function used in the framework, as of now, given in equation 7 and equation 8, are "stochastic gradient descent" [26] and "adam" [27] loss functions respectively. The generic nature and flexibility of the framework supports the inclusion of heterogeneous machine learning models and loss functions, which can be extended, whenever required.
p c w c k , k → k th layer of model (5) where 0 ≤ p ≤ 1, N c=1 p c = 1.
where p is the scalar, priority value and w is the weight matrix, W c denotes the set of weights of all the layers of a given model in participant c, W G k denotes aggregated, global weight of layer k, of the model and W G is the set of all the aggregated, global weight layers of the given model.
where w t denotes weight matrix calculated through SGD optimizer for current time cycle. w t−1 denotes weight matrix of previous time cycle, α is the learning rate [26],ŷ i and y i are computed hypothesis and training target vectors respectively.
where w t denotes weight matrix calculated through Adam optimizer for current time cycle, w t−1 denotes weight matrix of the previous time cycle. η ∈ R ≥0 is the step size and e(x) is the error function [27] [28].

A. Experimental Setup
For validating the proposed solution and it's hypothesis in achieving the acceptable accuracy score in the federated network, we did set up the experimental environment. In the given set-up, we used two kinds of data -1) Synthetic data generated in a controlled environment, in real-time manner 2) MNIST data [29] We created a network with 5 participant nodes and 1 collaborator node, installed Apache Kafka with 4 topics, entitlements applied on the users and groups level. The Kafka is run in single-node mode for the simplicity of experimentation. We installed the Quorum permissioned blockchain network for private transactions. Needless to say, both Kafka and Quorum solve the same purpose, provisioning a secure and anonymized channel, hence we used both in separate experiment observations. Each node has RHEL 7.6 OS installed with Python 3.8 installed, along with all the supporting libraries like Kafkapython, Tensorflow 2.0, Web3Py, etc. The services are designed to run as a self-sufficient daemon in the node, which can run the analysis on real-time data. The pipelines are created for data ingestion, pre-processing, deep learning computations and finally visualization through dashboards. Each participant node has 4GB RAM with 2 core processor, whereas collaborator node has 16 GB RAM with 8 core processor. Figure 3 depicts the complete asynchronous network level architecture diagram of the setup, which accommodates the generalized (large number of ) participant nodes setup. The architecture can be logically thought of a star topology [30], which is depicted in figure 4. For testing framework's performance on various real-time data, we came up with a synthetic data generator, a simple python script to generate numeric data with given dimension and sample size. The synthetic data generated follows the gaussian distribution [31]. The data is continuously generated with a frequency of 1 min on each node. Since the data has time component in it, we treated it as time series analysis and implemented unsupervised anomaly detection using "Long Short-term Memory" deep neural network [32]. Anomaly detection's scores were then fed into the machine learning pipeline for supervised binary classification between true positives and false positives values. For binary classification, we used "Logistic regression based classification" technique [33].
Deep learning based anomaly detection is run every 5 minutes on the real-time data generated, and the result, of numeric data type, is passed on to the next stage in the pipeline. The anomaly detection model is not included in the federated learning, because it is of unsupervised nature. The model architecture for LSTM network we used is described in figure 5. The anomaly scores are calculated based on equation 9.
Logistic regression model is the one which participates in federated learning. A participant sends it's local model when the request is made by the collaborator. Once the aggregation, based on equation 5 is completed, the new updated global model is requested by the individual participant. Communication is secured through the set of RSA keys and entitlements based on encrypted channels. For visual analysis of the data and anomalies generated, figure 6 depicts one of the features, which plays the primary roles in synthetic anomaly injection. Figure 7 shows the anomalies generated by secondary features in the data. In figure 8, we can see the anomalies generated and true vs false positives decision prediction, which is computed through logistic regression model, using federated learning framework. Figure 9 depicts a common representation of 3 nodes' anomaly loss and logistic classifier's decision (0 being false-positive and 1 being true-positive) Fig. 6. The "magenta" colour regions in the graph are time regions where anomalies are synthetically generated. Spike in line graph shows the anomalies generated for "feature 1", depicted as "value1" Fig. 7. The "magenta" colour regions in the graph are time regions where anomalies are synthetically generated. Spike in line graph shows the anomalies generated for "feature 2", "feature 3", and "feature 4", depicted as "value2", "value3" and "value4"

C. Federated learning on MNIST data
In this section, we are describing the methodology of verifying our proposition on standard MNIST data [29]. We have used the same setup, described earlier, added artificial neural  During experimentation, we studied loss' dependence on learning rate of adam's optimizer, by changing the value of learning rate, as depicted in figure 11. We also studied accuracy and it's dependence on learning rate, as shown in figure  12. Table I represents loss and accuracy of all models in the federated setup. Table infers that using federated computation, we achieved better accuracy over time.  In this paper, we have presented the current common practices around federated learning in academia and enterprises. To overcome the generic challenge of secure and anonymized orchestration of models in the federated network, we are proposing an elegant and new method of realizing federated learning, independent of machine learning framework using entitled, anonymized and secure channels. We have emphasized the medium of communication would be a distributed messaging system, either like Kafka, NATS.io, pub-sub or like a permissioned blockhain/ledger like Quorum [34]. The system described in the given paper is flexible because of no dependency on any specific framework. It is highly available in nature because of distributed messaging through "highly available cluster" [35] or through "blockchain network"

A. Applications
In terms of application, the proposed federated framework is most applicable in an operator managed private, permis- Fig. 12. Accuracy (in percentage) vs learning rate of the neural network's Adam optimizer sioned, consortium decentralized networks. On a permissioned blockchain, we can categorize most of the activities in two broad categories: 1) Transactional data between two participants. Examples of this may include data related to trades, personal information, token, money transfer etc. The data could be classified as highly confidential and sensitive. 2) Network related meta-data such as count of a specific categories of transactions, frequency of transactions, throughput & latency of such transactions. Transactional data between two participants can be used to train machine learning models that can learn from such data to identify anomalies or specific patterns in the transactions. Such models, when fully trained, can make predictions of future transactional behaviour on the network.
To further illustrate anomalies, two participants have learnt that they have observed a large number of transactions for a specific currency in the last N hours or days that can be attributed to a specific geo-political event. Interestingly, other pairs of participants have observed similar behaviour/patterns across other events. However, because of privacy and compliance concerns, other than only those participants that were part of those such transactions have visibility into such patterns. So, how do other participants on the network benefit from such a data pattern on the network? The proposed solution can be implemented in this case where the network operator can ensure a secured (perhaps by operational procedures) messaging infrastructure (Kafka, blockchain or similar) and allow network participants to communicate with that messaging infrastructure along with proposed techniques for sharing the model. Further, the same infrastructure can be leveraged for unsupervised anomaly detection on the network. Some other applications on similar networks can include improving network efficiency by leveraging secured computation of network-wide aggregate metrics, providing transparency in the network without compromising the privacy of underlying data. As the number of enterprise blockchains and decentralized networks continue to evolve, along with increasing the volume of transactions on such networks. Further, it is evident that the more successful networks would provide its participants the benefit of the network effect and learning, without requiring the members of the network to share proprietary and competitive data to other members. It is also observed that most enterprises technology teams are quite familiar with public key infrastructure along with messaging systems, such as Kafka. A federated learning solution based on this technology stack can increase its adoption and usability.

B. Challenges and future scope
The proposed solution opens up scope for stringent safeguarding the security of encryption/decryption keys. Any compromise of the private keys of the aggregator by a malicious operator can potentially leak some data attributes, which in result can compromise data privacy in general over the network. The possible ways to safeguard the keys would be to use key management solutions [36] [37] like a vault. In the current model, there is only one and predefined collaborator, which is decided by the network operator. In future, the choice of collaborator can be based on a consensus algorithm among all of the participants. One potential algorithm could be RAFT based [38] [39], where the collaborator could be elected by the rest of the participants in the network for a certain duration. In such cases, it is possible to extend the solution to a true peer-2-peer decentralized network where there is no network operator responsible for the governance of the messaging infrastructure, which brings federated learning in more fluid and ad-hoc infrastructure.