SVFL: Efficient Secure Aggregation and Verification for Cross-Silo Federated Learning

Cross-silo federated learning (FL) allows organizations to collaboratively train machine learning (ML) models by sending their local gradients to a server for aggregation, without having to disclose their data. The main security issues in FL, that is, the privacy of the gradient and the trained model, and the correctness verification of the aggregated gradient, are gaining increasing attention from industry and academia. A popular approach to protect the privacy of the gradient and the trained model is for each client to mask their own gradients using additively homomorphic encryption (HE). However, this leads to significant computation and communication overheads. On the other hand, to verify the aggregated gradient, several verifiable FL protocols that require the server to provide a verifiable aggregated gradient were proposed. However, these verifiable FL protocols perform poorly in computation and communication. In this paper, we propose SVFL, an efficient protocol for cross-silo FL that supports both secure gradient aggregation and verification. We first replace the heavy HE operations with a simple masking technique. Then, we design an efficient verification mechanism that achieves the correctness verification of the aggregated gradient. We evaluate the performance of SVFL and show, by complexity analysis and experimental evaluations, that its computation and communication overheads remain low even on large datasets, with a negligible accuracy loss (less than 1%). Furthermore, we conduct experimental comparisons between SVFL and other existing FL protocols to show that SVFL achieves significant efficiency improvements in both computation and communication.


INTRODUCTION
Federated learning (FL) [29] is a promising collaborative machine learning (ML) framework allowing models to be trained on sensitive real-world data while preserving its privacy. The main feature of FL is that the training data does not need to leave its local repositories. That is, FL enables participating entities to train the model on their data independently and in parallel, which greatly reduces data privacy risks and improves training efficiency/scalability. Not surprisingly, FL has been widely used in various applications such as disease outbreak discovery [32], intrusion detection for the Internet of Things (IoT) [40], next-word prediction [35], and autonomous driving [47].
In FL, each client trains a copy of a global model locally on their data and computes a local gradient vector, which is then sent to a centralized server (i.e., aggregator). The server combines these gradient vectors and obtains an aggregated gradient, which is then sent back to all clients. Upon receiving the aggregated gradient, each client updates the global model and proceeds to the next training iteration. However, this FL process raises at least the following three important privacy concerns. 1) The server can learn information about the clients' local training data by analyzing their gradient vectors. This type of attack is often referred to as an inference attack [51]. 2) The server may manipulate the global model at will by providing each client with a malformed aggregated gradient. In particular, a "lazy" server may reduce the aggregation operation to save computational cost, or worse, maliciously forge an aggregated gradient. 3) The server may learn the trained model. This is because the server obtains the aggregated gradient, from which it can learn some information about the trained model. In some cases, clients may wish to keep the trained model private. In response to the above privacy issues, the concepts of secure aggregation [8], verifiable FL [57], and cross-silo FL [63] were proposed. In the following, we elaborate on these concepts in sequence.

Secure Aggregation
Secure aggregation entails computing a multiparty sum where no client reveals its local gradient vector in the clear (even to the server). In fact, the secure aggregation problem has been a research hotspot, and it has been addressed by different approaches, including secure multiparty computation (MPC) [9], partially/fully homomorphic encryption [39], [41], functional encryption [5], and double-masking [8], [10]. We refer to the Related Work section for details about these solutions.
Based on different application scenarios, FL can be roughly divided into two categories: cross-device FL, where clients are a large number of mobile or edge devices (e.g., smartphones, personal computers, and IoT devices) with unreliable communication and limited computing power [60], and cross-silo FL, where clients are a small number of organizations (e.g., hospitals, financial companies, and research institutions) with sufficient computing resources and reliable communications [27]. From the security perspective, the main difference between cross-device FL and cross-silo FL is that the server in cross-device FL can see the trained model, while in cross-silo FL only clients can see the trained model; that is, no external party, including the server, is allowed to access the trained model (or even the training model). In addition, in the cross-silo setting, we usually do not need to consider dropouts, which are common in the cross-device setting [60].
Most of the existing research on FL mainly addresses cross-device FL, but in recent years, cross-silo FL has been receiving increasingly wide attention, especially for healthcare applications such as medical imaging [26], diagnostic and prognostic biomarkers [50], and epidemic detection [13]. In this paper, we focus on cross-silo FL.
Recently, WeBank [4] developed an open-source cross-silo FL framework named FATE [1], which has built-in support for the Paillier cryptosystem [41], a well-studied additively homomorphic encryption (HE) scheme. Subsequently, by replacing the HE in FATE with an efficient batching encryption scheme, Zhang et al. [63] proposed BatchCrypt, which substantially improves training speed and reduces the communication overhead over FATE.

Verification
Another related concern is the verifiability of the aggregated gradient. As mentioned above, the server aggregates local gradients sent by the clients and obtains an aggregated gradient, which is then sent back to each client. This raises a natural question: how can the correctness of the aggregated gradient be verified? Clearly, if we cannot verify the correctness of the aggregated gradient, how can privacy issue 2) above be avoided? This led to the proposals of verifiable FL [17], [21], [57], which addresses the issue of correctness verification of the aggregated gradient. In verifiable FL, besides the masked gradient, each client also needs to send some auxiliary information to help the server generate a "proof" of correctness of the aggregated gradient when it is correctly computed. Then, the server returns an aggregated gradient and the "proof" to each client. Finally, each client checks whether the "proof" is correct for the aggregated gradient.

Contributions
In this paper, we propose SVFL, the first cross-silo FL protocol supporting secure aggregation and correctness verification of the aggregated gradient, with reduced communication and computation overheads compared to previous solutions. Table 1 compares SVFL with previous secure aggregation protocols, cross-silo FL, and verifiable FL. Technically, in the cross-silo setting, we use a simple masking technique called masking with one-time pads (Motp) to guarantee the privacy of the local gradient and the trained model. We also adopt a secure homomorphic network coding signature scheme to realize the correctness verification of the aggregated gradient. Our contributions can be summarized as follows. In the cross-silo setting, we replace the additively HE used by BatchCrypt [63] with a simple masking technique (Motp) to protect the privacy of the local gradient and the trained model. Compared with BatchCrypt, SVFL accelerates model training by about 414 to 605 times without adding any communication overhead. Furthermore, to achieve the correctness verification of the aggregated gradient, we design an efficient verification mechanism using an efficient homomorphic network coding signature scheme (HNsig [18]). We provide a comprehensive security analysis for SVFL, which demonstrates the privacy of the local gradient and the trained model as well as the verifiability (i.e., the correctness verification of the aggregated gradient). We provide a complexity analysis of the computational and communication costs of SVFL. We conduct extensive experiments on a convolutional neural network (CNN) with the MNIST dataset and AlexNet [31] with the CIFAR10 dataset. The experimental results show that SVFL is efficient in terms of both computation and communication, with a negligible accuracy loss (less than 1%). In particular, we compare SVFL with BatchCrypt [63], VFL [17], VerifyNet [57], and VeriFL [21]. The comparison results show that SVFL has significant efficiency advantages in both computation and communication. We remark that SVFL is also suitable for cross-device settings by using a Trusted Authority (TA) to initialize the ML model and generate the required parameters, which is commonly done in verifiable FL. We leave this as an extension of this work.

Organization
The remainder of the paper is organized as follows. We briefly outline related work in Section 2. In Section 3, we recall a homomorphic network coding signature over integers and present a simple masking technique called masking with one-time pads. We present the system architecture and the concrete construction of SVFL in Section 4, followed by the correctness and security analysis in Section 5. We evaluate the efficiency of SVFL from theoretical and experimental perspectives in Sections 6 and 7, respectively. In Section 8, we compare SVFL with some previous solutions, including cross-silo FL and verifiable FL. Finally, Section 9 concludes the paper.

RELATED WORK
In this section, we survey existing secure aggregation protocols, cross-silo FL, and verifiable FL.

Secure Aggregation
Secure Aggregation with Double-Masking. The first secure aggregation protocol was proposed by Bonawitz et al. [8], who used the double-masking technique (based on pseudorandom values), Shamir's secret sharing [49], a key agreement protocol [11], and symmetric encryption to protect the privacy of the local gradient and handle dropouts. However, their secure aggregation protocol requires at least 4 rounds of communication between each client and the server in every iteration. For resource-limited clients connected over a WAN, this communication overhead can be prohibitive.
Based on the framework proposed by Bonawitz et al. [8], Bell et al. [7] and Choi et al. [10] proposed secure aggregation protocols with polylogarithmic communication and computation overheads. Their protocols achieve better computational and communication efficiency than the secure aggregation protocol of [8]. The key idea behind [7] and [10] is to replace the complete communication graph of [8] with a sparse random graph and to use secret sharing only for a subset of clients instead of for all clients.
To address this efficiency challenge, So et al. [52] recently proposed Turbo-Aggregate, which uses a circular communication topology to reduce the computation and communication overheads of the secure aggregation protocol of [8]. Beguier et al. [6] proposed SAFER, a secure aggregation protocol between multiple servers with low computation and communication overheads. SAFER achieves these low costs by using model update compression and arithmetic sharing.
Kadhe et al. [25] proposed FastSecAgg, a secure (private) aggregation protocol based on Fast Fourier Transform multi-secret sharing. FastSecAgg is secure against adaptive adversaries, where clients can be adaptively corrupted during the execution of the protocol. Similarly, Fereidooni et al. [15] proposed SAFELearn, a generic design for secure (private) aggregation. SAFELearn can be instantiated with MPC or HE and only needs 2 rounds of communication in each training iteration.
Secure Aggregation with Encryption. Homomorphic encryption (HE) allows certain operations (e.g., addition) to be performed directly on encrypted data. This property is exactly what is needed for secure aggregation. A number of proposals [12], [23], [34], [42], [45], [53], [58] have been made to build secure aggregation protocols for FL using additively HE and multi-key HE.
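To illustrate why additive homomorphism fits secure aggregation, the following is a minimal textbook sketch of the Paillier cryptosystem [41] with deliberately tiny, insecure parameters of our own choosing (it is not the implementation used by any of the cited protocols): multiplying ciphertexts yields an encryption of the sum of the underlying gradients.

```python
import math
import secrets

# Toy Paillier cryptosystem (textbook scheme); the primes below are far too
# small for real use and serve only to illustrate additive homomorphism.
p, q = 293, 433            # demonstration-only primes
n = p * q
n2 = n * n
g = n + 1                  # standard choice of generator
lam = math.lcm(p - 1, q - 1)
# mu = (L(g^lam mod n^2))^-1 mod n, where L(u) = (u - 1) / n
mu = pow((pow(g, lam, n2) - 1) // n, -1, n)

def enc(m: int) -> int:
    r = secrets.randbelow(n - 1) + 1
    while math.gcd(r, n) != 1:
        r = secrets.randbelow(n - 1) + 1
    return (pow(g, m, n2) * pow(r, n, n2)) % n2

def dec(c: int) -> int:
    return ((pow(c, lam, n2) - 1) // n) * mu % n

# Each client encrypts its (quantized) gradient entry; the server multiplies
# ciphertexts, which decrypts to the sum without revealing any single gradient.
grads = [17, 25, 8]
agg_ct = 1
for w in grads:
    agg_ct = (agg_ct * enc(w)) % n2
assert dec(agg_ct) == sum(grads)
```

The server only ever sees ciphertexts and the product `agg_ct`; the expensive modular exponentiations in `enc` are exactly the per-gradient overhead that BatchCrypt and SVFL aim to reduce.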
Recently, Sav et al. [48] proposed POSEIDON, which uses a multiparty lattice-based homomorphic encryption scheme [38]. To improve the efficiency of POSEIDON, the authors provided a generic packing approach so that single-instruction-multiple-data (SIMD) operations can be efficiently performed on encrypted data.
Similarly, Xu et al. [59] proposed HybridAlpha, which uses a multi-input functional encryption scheme [5] and differential privacy (DP) [14]. With multi-input functional encryption, each client obtains a public key and uses it to encrypt the local gradient, while the server obtains a function key and uses it to compute the average cumulative sum of the clients' gradients.
MPC and/or DP. Other solutions [19], [37], [43], [54], [55] combine MPC techniques and/or DP with ML. However, these protocols often use heavy cryptographic primitives and are usually customized for specific ML algorithms, which limits their flexibility and scalability. Therefore, such approaches are not well suited to FL.

Cross-Silo FL
Fig. 1 depicts a typical architecture of cross-silo FL [60], [63], [64]. In cross-silo FL, a public/private key pair of an HE scheme is distributed to each client, and the public key is sent to the server. In each training iteration, the server randomly selects a client as the leader, who generates and distributes the public/private key pair of the HE scheme. The leader also initializes the ML model and sends the model parameters to all other clients. Upon receiving the model parameters and the public/private key pair, each client trains the model locally, computes the gradient, encrypts it with the public key, and sends the encrypted gradient to the server. The server aggregates all encrypted gradients and returns an aggregated encrypted gradient to all clients. Each client decrypts the aggregated encrypted gradient and updates the local ML model.
Based on the above framework, WeBank [4] developed an open-source cross-silo FL framework called FATE [1], where HE is implemented as a pluggable module. Recently, Zhang et al. [63] proposed BatchCrypt, which uses an efficient batch encryption technique. Compared to FATE, BatchCrypt achieves a 23×–93× training speedup while reducing the communication cost by 66×–101×. The key idea is to use a new batch encoding scheme to encode a batch of quantized gradients into a long integer and encrypt it in one go. Compared with full-precision encryption of a single gradient, this batch encryption substantially reduces encryption overhead and data transmission. However, compared with plain FL, batch encryption is still expensive in terms of computation and communication.
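BatchCrypt's gain comes from packing many quantized gradients into one plaintext before encryption. The exact codec in [63] also handles signs and overflow with dedicated padding bits; the following simplified lane-packing sketch is our own illustration of the general idea, not BatchCrypt's actual encoding: each value occupies a fixed-width lane with headroom bits so that a bounded number of additions of packed integers (what additive HE would perform under encryption) cannot carry across lanes.

```python
# Simplified lane-packing sketch: each quantized gradient occupies a
# fixed-width lane with headroom bits, so adding up to 2**HEADROOM - 1
# packed integers never carries between lanes.
VALUE_BITS = 8          # width of one quantized gradient
HEADROOM = 4            # extra bits per lane to absorb carries
LANE = VALUE_BITS + HEADROOM

def pack(values):
    acc = 0
    for i, v in enumerate(values):
        assert 0 <= v < 2 ** VALUE_BITS
        acc |= v << (i * LANE)
    return acc

def unpack(acc, count):
    mask = (1 << LANE) - 1
    return [(acc >> (i * LANE)) & mask for i in range(count)]

# Three clients' quantized gradient batches; summing the packed integers
# sums each lane independently, as additive HE would do on ciphertexts.
batches = [[200, 3, 77], [100, 50, 1], [55, 200, 9]]
total = sum(pack(b) for b in batches)
assert unpack(total, 3) == [355, 253, 87]
```

One encryption then covers a whole batch of lanes instead of a single full-precision gradient, which is the source of BatchCrypt's reported speedup over FATE.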
Remark 1. In cross-silo FL, the goal of the clients is to train a common ML model that is accessible only to the clients. Hence, it is usually assumed that the clients are honest-but-curious, i.e., they follow the protocol honestly, but will attempt to infer information about other clients' local gradients by colluding with other clients or even the server. However, if one of the clients does not follow the protocol honestly, then the clients cannot collectively train the desired model, which is to no one's advantage. In other words, active adversaries that deviate from the protocol by sending incorrect and/or arbitrarily chosen messages to honest clients, omitting messages, aborting, and sharing their entire view of the protocol with each other, and also with the server (if the server is an active adversary), are usually not considered in the cross-silo setting. Therefore, choosing any client as the leader does not lead to a higher security risk. In fact, such active adversary scenarios are challenging not only in the cross-silo setting, but also in the cross-device setting. The main reason is that the defenses (if they exist) can easily clash with the data privacy requirement; indeed, such defenses always need access to the clients' data to detect the clients' behaviors. In this work, we also choose the leader in this way.

Verifiable FL
The first verifiable FL protocol was proposed by Xu et al. in [57], called VerifyNet. The authors used the double-masking technique proposed by Bonawitz et al. [8], Shamir's secret sharing, a key agreement protocol, and symmetric encryption to protect the privacy of the local gradient and handle dropouts, similar to what Bonawitz et al. did in their secure aggregation protocol [8]. To verify the correctness of the aggregated gradient, they combined homomorphic hash functions [62] with pseudorandom functions [16]. However, VerifyNet incurs huge computational and communication costs, where the communication overhead increases linearly with the gradient size. Another limitation of VerifyNet is that it relies on a TA.
Another verifiable FL protocol, VeriFL, was proposed by Guo et al. in [21]; it reduces the computational and communication overhead required by VerifyNet. Unlike VerifyNet, VeriFL combines a linearly homomorphic hash with a commitment scheme to achieve the correctness verification of the aggregated gradient. VeriFL not only relies on a TA, but also requires another 3 rounds of communication for the correctness verification of the aggregated gradient.
Recently, Fu et al. [17] proposed VFL, which uses Lagrange interpolation and not only supports secure aggregation and correctness verification of the aggregated gradient, but also protects the trained model from being leaked to the server. However, due to the use of Lagrange interpolation, VFL incurs a large communication overhead that increases linearly with the degree of the polynomial involved in the interpolation. In addition, VFL does not consider dropouts. Like VerifyNet and VeriFL, VFL also relies on a TA.

PRELIMINARIES
In this section, we review the secure network coding signature over integers by Gennaro et al. [18], which will be used as a building block of SVFL, and present a simple masking technique.

Secure Homomorphic Network Coding Signature Over Integers

Network coding [33] provides an alternative, decentralized approach to traditional multicast routing. Gennaro et al. [18] focused on random linear network coding and proposed the first RSA-based homomorphic network coding signature scheme in the random oracle model.
Definition 2 (RSA Assumption [46]). Let N = pq be a composite number, where p, q are distinct primes, and let e be an exponent coprime to φ(N), where φ(·) is Euler's totient function. The RSA assumption states that it is infeasible to efficiently compute the e-th root modulo N of a random y ∈ Z*_N. For large RSA key sizes, e.g., 2048 bits, no efficient method to solve this problem is currently known. The subgroup of squares is denoted by QR_N = {x² mod N : x ∈ Z*_N}.

Definition 3. A homomorphic network coding signature scheme is a tuple of algorithms HNsig = (Setup, Sign, Vrfy, Combine), defined as follows:

Setup(1^λ, 1^N). On input a security parameter λ and an RSA modulus N = pq, where p, q are distinct safe primes (which makes QR_N cyclic and makes random elements of QR_N generators of QR_N with high probability), output a public key pk = (N, e, g_1, ..., g_n, H) and a private signing key sk = d, where e ∈ Z*_φ(N) is an exponent coprime to φ(N), (g_1, ..., g_n) are random generators of QR_N, H : {0,1}* → Z*_N is a collision-resistant hash function, and d ∈ Z*_φ(N) satisfies ed = 1 (mod φ(N)).

Sign(sk, fid, v^(i)). On input a signing key d, a random file identifier fid ∈ {0,1}*, and an augmented vector w^(i) = u^(i) || v^(i) = (u^(i)_1, ..., u^(i)_m, v^(i)_1, ..., v^(i)_n), where u^(i) is the i-th unit vector, output a signature σ_i = (H(i, fid) · ∏_{j=1}^n g_j^{v^(i)_j})^d (mod N).

Vrfy(pk, w, fid, σ). In particular, for any vector w = (u_1, ..., u_m, v_1, ..., v_n), output 1 if σ^e = ∏_{i=1}^m H(i, fid)^{u_i} · ∏_{j=1}^n g_j^{v_j} (mod N), and 0 otherwise.

Combine(pk, {(w^(i), σ_i, a_i)}_{i=1}^ℓ). On input a public key pk and ℓ triples (w^(i), σ_i, a_i) for ℓ ∈ [m], where a_i ∈ Z, check whether Vrfy(pk, w^(i), fid, σ_i) = 1 holds for each i ∈ [ℓ]. For the triples that pass verification, compute w = Σ_{i=1}^ℓ a_i w^(i) and σ = ∏_{i=1}^ℓ σ_i^{a_i} (mod N), and output the signature σ on w.

The correctness of the above signature scheme is straightforward. Its security says that any PPT adversary, given access to the public key and (adaptive) signature queries for a set of vectors v^(1), ..., v^(m) (corresponding to the augmented vectors w^(1), ..., w^(m)), cannot forge a valid signature σ* on a vector w* under a file identifier fid* ∈ {0,1}*, where fid* is one of the identifiers chosen by the challenger (signer) and the vector w* is not in the linear span of the vectors (w^(1), ..., w^(m)). We refer the interested reader to [18] for details about the security definition, and here we only recall the following theorem.
Theorem 4 ([18]). Under the RSA assumption, the above homomorphic network coding signature scheme HNsig is secure against any PPT adversary in the random oracle model.

Lemma 5. Given w^(i) = u^(i) || v^(i) ∈ Z^{m+n}, where u^(i) is the i-th unit vector, for any vectors w = Σ_{i=1}^ℓ a_i w^(i) and w' = Σ_{i=1}^ℓ b_i w^(i) with ℓ ∈ [m], if w = w', then a_i = b_i for all i ∈ [ℓ].
Proof. Since u^(i) is the i-th unit vector, the first m coordinates of w and w' are (a_1, ..., a_ℓ, 0, ..., 0) and (b_1, ..., b_ℓ, 0, ..., 0), respectively; hence w = w' implies a_i = b_i for all i ∈ [ℓ].

Lemma 5 indicates that any vector w in the linear span of the vectors (w^(1), ..., w^(m)) can be represented as a unique linear combination of (w^(1), ..., w^(m)). Combining Theorem 4 with Lemma 5, we obtain the following lemma.

Lemma 6. For a secure HNsig scheme, given a series of valid vector-signature pairs (w^(1), σ_1), ..., (w^(m), σ_m) under an identifier fid* ∈ {0,1}*, if σ* is a valid signature on a vector w* = (γ_1, ..., γ_m, w_1, ..., w_n) under the file identifier fid*, then the vector w* must be a unique linear combination of the vectors (w^(1), ..., w^(m)), i.e., w* = Σ_{i=1}^m γ_i w^(i).

Proof. First, by Theorem 4, if σ* is a valid signature on a vector w* = (γ_1, ..., γ_m, w_1, ..., w_n) under a file identifier fid* ∈ {0,1}*, then w* must be in the linear span of the vectors (w^(1), ..., w^(m)). Then, by Lemma 5, since w* is in the linear span of (w^(1), ..., w^(m)), it must be a unique linear combination of them.

Lemma 6 ensures that from a valid vector-signature pair (w*, σ*) we can derive the unique linear combination of the vectors (w^(1), ..., w^(m)) representing w*. This plays a crucial role in our SVFL.
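To make the algebra concrete, here is a toy sketch of the four HNsig algorithms with tiny RSA parameters of our own choosing (safe primes p = 23, q = 47; a real deployment would use a 2048-bit modulus and a full-domain hash). The code follows the Setup/Sign/Vrfy/Combine interface above, but the parameter sizes and the hash construction are illustrative assumptions only.

```python
import hashlib

# Toy HNsig over integers; insecure parameters, for illustration only.
p, q = 23, 47                      # safe primes: 23 = 2*11 + 1, 47 = 2*23 + 1
N = p * q                          # modulus (1081)
phi = (p - 1) * (q - 1)            # 1012
e = 3                              # public exponent, coprime to phi
d = pow(e, -1, phi)                # signing key: e*d = 1 (mod phi)
m_clients, n_dim = 2, 3
g = [pow(7 + j, 2, N) for j in range(n_dim)]   # squares, i.e., elements of QR_N

def H(i: int, fid: str) -> int:
    h = int.from_bytes(hashlib.sha256(f"{i}|{fid}".encode()).digest(), "big")
    return h % N or 1

def sign(i, v, fid):
    # sigma_i = (H(i, fid) * prod_j g_j^{v_j})^d mod N
    base = H(i, fid)
    for j in range(n_dim):
        base = base * pow(g[j], v[j], N) % N
    return pow(base, d, N)

def verify(w_units, w_vals, fid, sig):
    # sigma^e =? prod_i H(i, fid)^{u_i} * prod_j g_j^{v_j} mod N
    rhs = 1
    for i in range(m_clients):
        rhs = rhs * pow(H(i + 1, fid), w_units[i], N) % N
    for j in range(n_dim):
        rhs = rhs * pow(g[j], w_vals[j], N) % N
    return pow(sig, e, N) == rhs

fid = "iteration-1"
v1, v2 = [5, 1, 4], [2, 8, 6]      # two clients' vectors
s1, s2 = sign(1, v1, fid), sign(2, v2, fid)
a1, a2 = 3, 2                      # public weights
# Combine: the weighted vector sum pairs with the product of signature powers.
w_vals = [a1 * x + a2 * y for x, y in zip(v1, v2)]
sig = pow(s1, a1, N) * pow(s2, a2, N) % N
assert verify([a1, a2], w_vals, fid, sig)
```

Note how the unit-vector part of the combined augmented vector is exactly (a_1, a_2): by Lemma 6, a verifying signature pins down the weights the server used, which is what SVFL's verification exploits.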

Masking With One-Time Pads
We present a simple masking technique that we call masking with one-time pads, or Motp for short. Motp is similar to one-time pad (OTP) encryption [44] and the masking used in [8], [17], [57].
Given a vector x ∈ Z^n_R for some R ∈ N, choose a uniformly random vector r ∈ Z^n_R and mask x as z = x + r (mod R). Given the mask vector r ∈ Z^n_R, we can unmask the vector z ∈ Z^n_R as x = z − r (mod R).

Lemma 7. Let x_1, ..., x_ℓ ∈ Z^n_R be arbitrary vectors, let r_1, ..., r_ℓ ← Z^n_R be independent uniformly random vectors, and let z_i = x_i + r_i (mod R) for i ∈ [ℓ]. Then (z_1, ..., z_ℓ) ≈ (y_1, ..., y_ℓ), where y_1, ..., y_ℓ ← Z^n_R and "≈" indicates that the distributions are identical.

Proof. The proof is straightforward by mathematical induction on ℓ. First, when ℓ = 1, let z_1 = x_1 + r_1 (mod R); then z_1 ≈ y_1 due to the property of the uniform distribution, where y_1 ← Z^n_R. Now assume the claim holds for ℓ = k, i.e., (z_1, ..., z_k) ≈ (y_1, ..., y_k). For ℓ = k + 1, we have z_{k+1} = x_{k+1} + r_{k+1} (mod R) ≈ y_{k+1}, where y_{k+1} ← Z^n_R. On the other hand, r_{k+1} is independent of (r_1, ..., r_k); therefore (z_1, ..., z_{k+1}) ≈ (y_1, ..., y_{k+1}), which completes the induction.

To reduce the communication cost, we use a secure pseudorandom generator (PRG) [61] to generate a mask vector r = PRG(s) ∈ Z^n_R, where s ← {0,1}^l is a uniformly random seed. By the security of the PRG, we have r ≈_comp r', where r' ← Z^n_R and X ≈_comp Y indicates that X and Y are computationally indistinguishable, as long as s is kept secret from the PPT distinguisher.

To mask a vector x ∈ Z^n_R, we compute z = x + PRG(s) (mod R). Given the seed s, we can unmask the vector z ∈ Z^n_R as x = z − PRG(s) (mod R). Combining Lemma 7 with the security of the PRG yields the following lemma.
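The masking and unmasking equations above can be sketched as follows; the SHA-256 counter-mode construction is our illustrative stand-in for the secure PRG [61], and the ring size R and dimension n are arbitrary.

```python
import hashlib

R = 2 ** 16                     # modulus of the gradient ring Z_R
n = 4                           # gradient dimension

def prg(seed: bytes, length: int):
    # Toy PRG: SHA-256 in counter mode, chopped into Z_R elements.
    out, ctr = [], 0
    while len(out) < length:
        block = hashlib.sha256(seed + ctr.to_bytes(4, "big")).digest()
        out.extend(int.from_bytes(block[k:k + 2], "big") % R
                   for k in range(0, len(block), 2))
        ctr += 1
    return out[:length]

def mask(x, seed):
    r = prg(seed, len(x))
    return [(xi + ri) % R for xi, ri in zip(x, r)]   # z = x + PRG(s) (mod R)

def unmask(z, seed):
    r = prg(seed, len(z))
    return [(zi - ri) % R for zi, ri in zip(z, r)]   # x = z - PRG(s) (mod R)

x = [13, 9000, 0, 65535]
s = b"shared-secret-seed"
assert unmask(mask(x, s), s) == x
```

Only the short seed s needs to be communicated or stored, rather than the full n-element mask, which is the communication saving the PRG provides.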

THE PROPOSED PROTOCOL
In this section, we construct SVFL, an efficient cross-silo FL protocol supporting secure aggregation and verification, using the HNsig scheme and our Motp scheme. Client. The role of a client includes: 1) initializing the system by generating initial model parameters and some related parameters (one of the clients); 2) training the model locally, masking the local gradient, and signing the masked gradient; and 3) checking the correctness of the aggregated gradient. Server. The role of the server includes: 1) randomly selecting a leader from the clients; 2) aggregating the masked gradients; and 3) returning an aggregated masked-gradient along with an aggregated signature ("proof").

The SVFL Protocol
Let C = {C_1, ..., C_m} be a set of clients with corresponding local datasets D = {D_1, ..., D_m}. SVFL consists of four phases: Initialization, Model Training, Aggregation, and Verification and Update.

Initialization. In each training iteration, the server randomly selects a client as the leader to initialize the model and generate m seeds of the PRG and a public/secret key pair of the HNsig scheme. The leader also chooses a random string and m weights. Then, the leader sends the model parameters, the seeds, the public/secret key pair, the random string, and the m weights to all other clients, while sending the public key, the random string, and the m weights to the server.

Model Training. Upon receiving the model parameters, the m seeds, the public/secret key pair, the random string, and the m weights, each client trains the model locally, computes the local gradient, masks it using the Motp scheme, signs the masked gradient using HNsig, and sends the (masked) gradient-signature pair to the server.

Aggregation. Upon receiving the gradient-signature pair from each client, the server aggregates all masked gradients and computes an aggregated signature on the aggregated masked-gradient using HNsig. Finally, the server returns the aggregated masked-gradient and its aggregated signature to each client.

Verification and Update. After receiving the aggregated masked-gradient and its aggregated signature from the server, each client first validates the signature, then unmasks the aggregated masked-gradient, updates the local model, and starts the next iteration. Otherwise, the client notifies the other clients and terminates SVFL.

Clearly, SVFL also applies to the cross-device setting by assuming a TA, which initializes the model and generates the required parameters. We provide a general description of SVFL in Fig. 3. We also provide details about the masking and signing algorithm (see Algorithm 1), the verification and aggregation algorithm (see Algorithm 2), and the verification and update algorithm (see Algorithm 3). We reiterate that SVFL is designed for cross-silo FL, so the clients are a handful of organizations with adequate computing resources and reliable communications. This means that we do not need to consider dropouts. Moreover, the public/secret key pair of HNsig and the m weights can always be reused, but the random string fid and the m seeds of the PRG cannot be reused. Hence, in each iteration the server needs to randomly select a leader again to regenerate the random string and the m seeds. Of course, we could assume that there is a TA to initialize the trained model and generate these parameters, but it is reasonable to assume that some randomly selected client plays this role in the cross-silo FL setting (see Remark 1). After all, generating a public/secret key pair for HNsig is costly, but this cost is one-time; generating a random string and m seeds is a recurring yet simple task.
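Putting the pieces together, one training iteration can be sketched as follows (omitting the HNsig signatures, which the toy sketch in Section 3 covers); the dimensions, weights, and SHA-256 counter-mode PRG are illustrative assumptions, not the paper's concrete parameters.

```python
import hashlib
import secrets

R = 2 ** 20                           # chosen so the weighted sum cannot wrap
m, n = 3, 4                           # m clients, gradient dimension n

def prg(seed: bytes, length: int):
    # Toy PRG (SHA-256 in counter mode) standing in for the scheme's PRG.
    out, ctr = [], 0
    while len(out) < length:
        block = hashlib.sha256(seed + ctr.to_bytes(4, "big")).digest()
        out.extend(int.from_bytes(block[k:k + 4], "big") % R
                   for k in range(0, len(block), 4))
        ctr += 1
    return out[:length]

# Initialization: the leader draws m seeds and m public weights; the seeds
# go only to the clients, while the server sees just the weights.
seeds = [secrets.token_bytes(16) for _ in range(m)]
weights = [1, 2, 1]

# Model Training: each client masks its local gradient with its own seed.
local_grads = [[secrets.randbelow(2 ** 8) for _ in range(n)] for _ in range(m)]
masked = [[(g + r) % R for g, r in zip(local_grads[i], prg(seeds[i], n))]
          for i in range(m)]

# Aggregation: the server computes the weighted sum of masked gradients.
agg_masked = [sum(weights[i] * masked[i][j] for i in range(m)) % R
              for j in range(n)]

# Verification and Update: each client removes the aggregate mask
# sum_i a_i * PRG(s_i), which it can recompute from the shared seeds.
agg_mask = [sum(weights[i] * prg(seeds[i], n)[j] for i in range(m)) % R
            for j in range(n)]
agg = [(z - r) % R for z, r in zip(agg_masked, agg_mask)]

expected = [sum(weights[i] * local_grads[i][j] for i in range(m))
            for j in range(n)]
assert agg == expected
```

Because R exceeds the largest possible weighted sum, the modular arithmetic never wraps and the clients recover the exact aggregate, mirroring the range condition of Theorem 9.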

CORRECTNESS AND SECURITY ANALYSIS
In this section, we analyze the correctness and security of SVFL.

Correctness of SVFL
The correctness of SVFL requires that each client obtains the correct aggregated gradient and a valid aggregated signature, as long as each client and the server run the SVFL protocol honestly.
Theorem 9 (Correctness). Assume that the elements of the weighted gradients (i.e., a_i W_i) lie in the same range [0, B − 1] (e.g., 32 bits long) and that R = m(B − 1) + 1, where B ∈ N. Let λ be the security parameter and (s_1, ..., s_m) ∈ ({0,1}^l)^m be m random seeds of the PRG, where we let l = λ. Then, if each client and the server execute SVFL honestly, each client will get the correct aggregated gradient and a valid aggregated signature.

Proof. This is not hard to prove, owing to the correctness of the Motp and HNsig schemes. Specifically, given {(Ŵ_i, σ_i, a_i, i)}_{i=1}^m, where Ŵ_i = W_i + PRG(s_i) (mod R), the server can compute an aggregated masked-gradient and a corresponding aggregated signature by running Ŵ ← Σ_{i=1}^m a_i Ŵ_i and σ ← Combine(pk, {(Ŵ_i, σ_i, a_i)}_{i=1}^m), respectively. By the correctness of the HNsig scheme, σ is a valid signature on Ŵ. Moreover, Σ_{i=1}^m a_i Ŵ_i = Σ_{i=1}^m a_i W_i + Σ_{i=1}^m a_i PRG(s_i) − KR for some K ∈ N. Therefore, we have Ŵ − Σ_{i=1}^m a_i PRG(s_i) = Σ_{i=1}^m a_i W_i (mod R), and since Σ_{i=1}^m a_i W_i ∈ [0, m(B − 1)] ⊆ [0, R − 1], each client recovers the correct aggregated gradient.

Threat Model of SVFL
In a cross-silo FL system, we want to prevent each client's local gradient from being leaked to the other clients, and to prevent both the local gradients and the trained model (aggregated gradient) from being leaked to the server, while ensuring that the server aggregates the masked gradients honestly.
To preserve the privacy of the local gradients and the trained model, we consider the honest-but-curious setting, where both the clients and the server will follow the protocol honestly, but will attempt to infer information about the clients' local gradients.

Algorithm 3. Verification and Update Algorithm
Input: (Ŵ, σ), (s_1, ..., s_m), pk, (a_1, ..., a_m), M_t, η. Output: M_{t+1}. 1. Check whether the signature is valid: σ^e = ∏_{i=1}^m h_i^{a_i} · ∏_{j=1}^n g_j^{Ŵ_j} (mod N), where h_i = H(i, fid) and Ŵ = (Ŵ_1, ..., Ŵ_n); 2. Unmask the vector Ŵ: W ← Ŵ − Σ_{i=1}^m a_i PRG(s_i) (mod R); 3. Update the local model parameters M_t with the unmasked aggregated gradient W and the learning rate η to obtain M_{t+1}. However, unlike cross-device FL, in cross-silo FL we do not consider collusion between the server and the clients. Recall that all clients share the same information (i.e., the model parameters, the seeds, the public/secret key pair, the random string, and the m weights) distributed by the leader, while the server only obtains the public key, the random string, and the m weights. If the server colludes with some clients, then those clients can only provide the server with the public key, the random string, and the m weights, which the server already has, as providing any other information would result in the disclosure of not only the trained model (which is the common interest of all clients in the cross-silo setting) but also their own local gradients (this follows from the construction of SVFL). In other words, in cross-silo FL, collusion between the server and the clients does not provide any additional information to the server and therefore makes no sense for the server. But if the server is willing to collude with some clients despite this information asymmetry (i.e., the server cannot get any useful information from the clients), the colluding clients can compute other clients' local gradients by unmasking their masked gradients provided by the server.
Based on the above analysis, coupled with the fact that protecting the local gradients and the trained model from being learned by any external party (including the server) is the primary purpose of cross-silo FL, it is reasonable not to consider collusion between the server and the clients. For completeness, we discuss some potential approaches to defend against such collusion attacks in Remark 10. In fact, such collusion was also not considered in HE-based FL (e.g., [12], [23], [45], [53]). Therefore, we consider two types of attacks: one is initiated by honest-but-curious clients, where up to m − 2 clients collude to infer information about the other clients' local gradients, and the other is initiated by the honest-but-curious server, whose goal is to infer information about the clients' local gradients and the trained model.
As for the security of verification, like previous verifiable FL [17], [21], [57], we consider an active-adversary setting, in which the malicious server may not aggregate the masked gradients honestly and instead tries to forge an aggregated masked-gradient and its corresponding "proof". We point out that the concrete threat model depends on the specific technique used to achieve this.

Remark 10. One potential defense against such collusion attacks (note that here we do not consider the security of verification) is to use a TA to generate $m$ weights $\{\alpha_i \in \mathbb{N}\}_{i \in [m]}$ and $m+1$ random masking values $\{s_i \in \{0,1\}^l\}_{i \in [m]}$, $r \in \mathbb{Z}_R^n$ such that $\sum_{i=1}^{m} \alpha_i \mathrm{PRG}(s_i) = r$, and to distribute $(s_i, r)$ to the corresponding client $C_i$ for $i \in [m]$. Then, by the Motp scheme, each client $C_i$ uses the masking vector $\mathrm{PRG}(s_i)$ to mask its local gradient. Upon receiving the aggregated masked-gradient from the server, each client uses the vector $r$ to unmask the gradient and obtains the desired sum. This approach is highly efficient, but it relies on the TA. Another potential defense mechanism is to have the clients generate the above $m$ weights and vectors $(s_i, r)$ themselves by running an MPC protocol. While the efficiency of this approach is acceptable, it requires more communication rounds. Yet another potential defense mechanism is to use a distributed cryptographic primitive, such as a multi-key homomorphic encryption scheme. This approach may not require more communication rounds, but it is often inefficient.
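The TA-based defense above can be exercised end to end in a few lines. This is a toy sketch under stated assumptions: the PRG is a SHA-256 counter stand-in for the AES-CTR PRG used in the experiments, and all parameters ($m$, $n$, $R$, the weights, the gradients) are illustrative dummy values, not the paper's.

```python
import hashlib

# Toy parameters (illustrative only; the paper uses e.g. R = m(B-1)+1 with B = 2^32).
m, n, R = 3, 5, 2**16

def prg(seed: bytes, n: int, R: int) -> list[int]:
    """Toy PRG: expand a seed into n values in Z_R (SHA-256 counter mode stand-in)."""
    out, ctr = [], 0
    while len(out) < n:
        h = hashlib.sha256(seed + ctr.to_bytes(4, "big")).digest()
        out.extend(int.from_bytes(h[i:i + 4], "big") % R for i in range(0, 32, 4))
        ctr += 1
    return out[:n]

# TA setup: weights alpha_i, seeds s_i, and r = sum_i alpha_i * PRG(s_i) mod R.
alphas = [1, 2, 3]                              # hypothetical weights alpha_i
seeds = [bytes([i]) * 16 for i in range(m)]     # hypothetical seeds s_i
r = [sum(a * v for a, v in zip(alphas, col)) % R
     for col in zip(*(prg(s, n, R) for s in seeds))]

# Each client masks its local gradient with its own one-time pad PRG(s_i).
grads = [[(7 * i + j) % 100 for j in range(n)] for i in range(m)]  # dummy gradients
masked = [[(w + u) % R for w, u in zip(grads[i], prg(seeds[i], n, R))]
          for i in range(m)]

# Server computes the weighted aggregate of the masked gradients.
agg = [sum(a * w for a, w in zip(alphas, col)) % R for col in zip(*masked)]

# Any client unmasks with r alone and recovers sum_i alpha_i * W_i mod R.
unmasked = [(x - ri) % R for x, ri in zip(agg, r)]
expected = [sum(a * w for a, w in zip(alphas, col)) % R for col in zip(*grads)]
assert unmasked == expected
```

The key design point is that $r$ reveals only the sum of the pads, so no client learns another client's individual pad, yet unmasking the aggregate needs nothing else.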

Security of SVFL
We first show that SVFL can protect the privacy of the clients' local gradients and the aggregated gradient. Then, we show that SVFL can prevent the malicious server from forging an aggregated masked-gradient and its signature.
Privacy. In the honest-but-curious setting of our threat model, each client and the server follow the protocol honestly but try to infer the clients' local gradients and the aggregated gradient. Therefore, we can use a simulation-based proof, which is standard for MPC protocols, to prove the privacy of the clients' local gradients and the aggregated gradient.
We first consider privacy against the honest-but-curious clients, who have their own local gradients and the aggregated gradient. Given any subset $U \subset C$ of the clients, where $C$ is the set of all clients ($|C| = m$), let $\mathrm{REAL}^{C,U}(\{(\alpha_i, W_i)\}_{i \in C}, (s_1, \ldots, s_m))$ be a random variable representing the joint view of the clients in $U$. It suffices to prove that the joint view of the clients in $U$ can be simulated given only the gradients of the clients in $U$ and the aggregated gradient. This indicates that these honest-but-curious clients learn nothing more than their own gradients and the sum of the gradients of the other clients.
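The information-theoretic fact underlying this simulation argument, namely that a value masked with a fresh one-time pad modulo $R$ is uniformly distributed regardless of the value itself, can be checked by exhaustive enumeration for a toy modulus:

```python
R = 11  # toy modulus; SVFL uses R = m(B-1)+1 with B = 2^32

def mask(w: int, pad: int) -> int:
    """Motp-style masking of one gradient entry with a one-time pad."""
    return (w + pad) % R

# For any fixed entry w, sweeping the pad over all of Z_R makes the masked
# value take every element of Z_R exactly once: with a uniform pad, the
# masked entry is uniform and carries no information about w.
for w in (0, 3, 10):
    assert sorted(mask(w, pad) for pad in range(R)) == list(range(R))
```

This is exactly why the simulator can replace honest clients' masked gradients with uniformly random vectors without the distinguisher noticing.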
Then, we consider privacy against the honest-but-curious server, who receives $m$ masked gradients and their signatures from the clients. Note that the server does not collude with any client according to our threat model. In addition, since the signatures are computed over the masked gradients, they do not reveal any information about the local gradients. Therefore, it suffices to show that these masked gradients and any aggregated masked-gradient can be simulated given only public information. This indicates that the server learns nothing more than some random vectors. Here $W = \sum_{i \in C} \alpha_i W_i$.

Proof. Since the server is not involved, the joint view of the clients in $U$ does not depend on the local gradients of the clients not in $U$. Given $(W, \{(\alpha_i, W_i)\}_{i \in U})$, where $U \subset C$ and $W = \sum_{i \in C} \alpha_i W_i$, the simulator SIM can generate the simulated view of the clients in $U$ by running the clients in $U$ on their real local gradients $\{W_i\}_{i \in U}$ and the seeds $\{s_i\}_{i \in U}$, and the clients in $C \setminus U$ on uniformly random vectors. More specifically, the simulator SIM generates the real masked gradients $\{\widehat{W}_i\}_{i \in U}$ using $\{(s_i, \alpha_i, W_i)\}_{i \in U}$ for the clients in $U$. For the honest clients in $C \setminus U$ with local gradients $\{W_j\}_{j \in C \setminus U}$, instead of computing the masked gradient $\widehat{W}_j = W_j + \mathrm{PRG}(s_j) \pmod{R}$ for $j \in [|C \setminus U|]$, the simulator SIM selects $|C \setminus U|$ uniformly random vectors in $\mathbb{Z}_R^n$. Similarly, the simulator SIM can simulate the view of the honest-but-curious server by simply choosing $m$ uniformly random vectors $Z_i \in \mathbb{Z}_R^n$ for $i \in [m]$. Then, by Lemma 8, the simulated and real views are computationally indistinguishable for $i \in [m]$.
In addition, note that the server can aggregate any number of masked gradients, and the claim then follows from Eq. (2).

Verifiability. Following our threat model, we now show that the malicious server cannot pass each client's check.
Theorem 13. Under the same parameter selections as Theorem 9, each client can detect a forged aggregated masked-gradient.
Proof. Let $(\widehat{W}_i, \sigma_i, \alpha_i)$ for $i \in [m]$ be the (masked) gradient-signature tuples that the malicious server obtained from the clients. Recall that any signature satisfies the verification equation, where $\widehat{W}_i = (\widehat{W}_{i,1}, \ldots, \widehat{W}_{i,n})$. Let $(\overline{W}, \sigma)$ be an aggregated gradient-signature pair that the malicious server sends to each client during the Verification and Update phase. Each client checks whether the signature is valid, i.e., $\mathrm{Vrfy}(pk, W', fid, \sigma) \stackrel{?}{=} 1$, where $W' = (\sum_{i=1}^{m} \alpha_i u_i) \,\|\, \overline{W}$. Assume that $(W', \sigma)$ is a forgery; then the signature $\sigma$ passes each client's check but $\overline{W} \neq \sum_{i=1}^{m} \alpha_i \widehat{W}_i$. However, if the signature passes each client's check, then the verification equation holds with $h_i = H(i, fid)$ and $\overline{W} = (\overline{W}_1, \ldots, \overline{W}_n)$ (by Algorithm 3). This means that $\sigma$ is a valid signature on the vector $W' = (\alpha_1, \ldots, \alpha_m, \overline{W}_1, \ldots, \overline{W}_n) \in \mathbb{Z}^{m+n}$. Therefore, by Lemma 6, the vector $W'$ must be a unique linear combination of the vectors $(u_i \,\|\, \widehat{W}_i)_{i \in [m]}$, and hence $\overline{W} = \sum_{i=1}^{m} \alpha_i \widehat{W}_i$, which contradicts our earlier assumption.
This means that if the signature on an aggregated masked-gradient returned by the server passes each client's check, then the aggregated gradient is correct. In other words, no forged aggregated gradient-signature pair can pass each client's check. $\square$
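To illustrate why a forged aggregate cannot pass the check, the following is a toy, deliberately insecure RSA-style instantiation of a linearly homomorphic signature in the spirit of HNsig (tiny primes, stand-in hash values $h_i$; this is an illustrative sketch, not the paper's actual scheme):

```python
import math
import random

# Toy RSA-style parameters -- far too small to be secure; for illustration only.
p, q = 1009, 1013
N = p * q
lam = math.lcm(p - 1, q - 1)
e = 65537
d = pow(e, -1, lam)  # signing exponent, known only to the signer

m_clients, n = 3, 4
rng = random.Random(0)
g = [rng.randrange(2, N) for _ in range(n)]          # public generators g_j
h = [rng.randrange(2, N) for _ in range(m_clients)]  # stand-ins for h_i = H(i, fid)

def sign(i: int, W: list[int]) -> int:
    """Client i's signature on its (masked) gradient W: (h_i * prod g_j^W_j)^d mod N."""
    x = h[i]
    for gj, wj in zip(g, W):
        x = x * pow(gj, wj, N) % N
    return pow(x, d, N)

def verify(alphas: list[int], W_bar: list[int], sigma: int) -> bool:
    """Check sigma^e == prod_i h_i^alpha_i * prod_j g_j^W_bar_j (mod N)."""
    x = 1
    for hi, ai in zip(h, alphas):
        x = x * pow(hi, ai, N) % N
    for gj, wj in zip(g, W_bar):
        x = x * pow(gj, wj, N) % N
    return pow(sigma, e, N) == x

alphas = [2, 3, 5]
Ws = [[rng.randrange(100) for _ in range(n)] for _ in range(m_clients)]
sigs = [sign(i, Ws[i]) for i in range(m_clients)]

# Honest server: aggregate gradients and signatures homomorphically.
W_bar = [sum(a * w for a, w in zip(alphas, col)) for col in zip(*Ws)]
sigma = 1
for s, a in zip(sigs, alphas):
    sigma = sigma * pow(s, a, N) % N
assert verify(alphas, W_bar, sigma)

# A tampered aggregate fails the check.
forged = list(W_bar)
forged[0] += 1
assert not verify(alphas, forged, sigma)
```

The aggregate verifies because $\sigma^e = \prod_i (h_i \prod_j g_j^{W_{i,j}})^{\alpha_i} = \prod_i h_i^{\alpha_i} \prod_j g_j^{\sum_i \alpha_i W_{i,j}}$, so the exponent of each $g_j$ is forced to equal the correctly weighted sum.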

COMPLEXITY ANALYSIS
In this section, we provide a concrete complexity analysis of SVFL.

Performance Analysis of Client
Computation Cost. The computation cost of each client consists of: 1) masking the local gradient, which takes $O(n)$ time; 2) signing the masked gradient, which takes $O(n)$ time; 3) verifying the signature returned by the server, which takes $O(m+n)$ time; 4) unmasking the aggregated masked-gradient, which takes $O(n)$ time; and 5) updating the model, which takes $O(n)$ time. Overall, each client's computation is $O(m+n)$. In addition, the client selected as the leader needs to initialize the system, which takes $O(n)$ time.
Communication Cost. Each client only needs to send a masked gradient and its signature to the server, which requires an overall communication cost of $n\lceil\log_2 R\rceil + \lceil\log_2 N\rceil$ bits (see the parameter setting in Theorem 9). Moreover, the client selected as the leader also needs to send the public/secret keys, the $m$ seeds, the initialized model parameters, the random string, and the $m$ weights to all other clients, and the public key, the random string, and the $m$ weights to the server, which incurs an additional one-time setup cost. Therefore, compared with plain FL, this implies a communication expansion factor of $(n\lceil\log_2 R\rceil + \lceil\log_2 N\rceil)/(n\lceil\log_2 B\rceil)$. Since $R = m(B-1)+1$ (see Theorem 9), for $B = 2^{32}$ (i.e., 32 bits), $n = 2^{20}$ elements, $N = 2^{1024}$, and $m = 100$ clients, the expansion factor is 1.21.
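The expansion factor can be reproduced with a short calculation (the second decimal depends on how $\lceil\log_2 R\rceil$ is rounded):

```python
from math import ceil, log2

def expansion_factor(B: int, n: int, N_bits: int, m: int) -> float:
    """SVFL per-client communication relative to plain FL:
    (n*ceil(log2(R)) + N_bits) / (n*ceil(log2(B))), with R = m*(B-1)+1."""
    R = m * (B - 1) + 1
    return (n * ceil(log2(R)) + N_bits) / (n * ceil(log2(B)))

# Parameters from the text: B = 2^32, n = 2^20 entries, N = 2^1024, m = 100 clients.
f = expansion_factor(B=2**32, n=2**20, N_bits=1024, m=100)
# Each 32-bit entry grows to ceil(log2(R)) = 39 bits, plus one |N|-bit signature.
print(round(f, 2))
```

The factor is dominated by the 39/32 per-entry ratio; the single signature contributes a negligible $1024/(n \cdot 32)$ term.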
Storage Cost. Besides the local dataset, each client must store the model parameters, the public/secret keys, the $m$ seeds, a random string, and the $m$ weights, for a total storage cost on the order of $m+n$ values.

Performance Analysis of Server
Computation Cost. The computation cost of the server (see Footnote 1) consists of: 1) aggregating all masked gradients, which takes $O(m+n)$ time; and 2) computing an aggregated signature, which takes $O(m)$ time. Overall, the server's computation is $O(m+n)$.
Communication Cost. The server only needs to send the aggregated masked-gradient and its signature to each client, with an overall communication cost of $mn\lceil\log_2 R\rceil + m\lceil\log_2 N\rceil$ bits. Therefore, compared with plain FL, this implies a communication expansion factor of $(n\lceil\log_2 R\rceil + \lceil\log_2 N\rceil)/(n\lceil\log_2 B\rceil)$. For $B = 2^{32}$ (i.e., 32 bits), $n = 2^{20}$ elements, $N = 2^{1024}$, and $m = 100$ clients, the expansion factor is 1.21.
Storage Cost. The server must store the public key, the random string, and the $m$ weights, which takes $(n+1)\lceil\log_2 N\rceil + \lceil\log_2 \varphi(N)\rceil + m\lceil\log_2 B\rceil$ bits, along with a buffer for the masked gradients and their signatures, which takes $mn\lceil\log_2 R\rceil + m\lceil\log_2 N\rceil$ bits, so that the server can aggregate the masked gradients upon arrival.

PERFORMANCE EVALUATION
In this section, we evaluate the performance of SVFL in realistic scenarios. In particular, we train two practical ML models using SVFL and evaluate the accuracy of the results, as well as the incurred computation and communication overheads. The experimental results conform with the complexity analysis presented in Section 6.

Experimental Setup
Considering that secure aggregation and verification can be implemented as plug-in modules for SVFL, we run single-threaded simulations on a Linux virtual machine installed on a laptop with an Intel(R) Core(TM) i5-8250U CPU @ 1.60 GHz and 16.0 GB RAM. We implement SVFL in Python on TensorFlow Federated [2] and conduct the experiments on: 1) a convolutional neural network, consisting of two 5×5 convolution layers, a fully connected layer with 512 units, and a softmax output layer, on the popular MNIST dataset, and 2) an AlexNet [31], consisting of five convolutional layers and three fully connected layers, on the popular CIFAR10 dataset. In all the experiments below, we randomly partition the training dataset over all clients; for example, assuming the number of clients is 10, for the MNIST dataset with 70,000 images (60,000 training data and 10,000 test data) of 28×28 pixels, a client $C_i$ may be randomly assigned 10,000 training data and 1,000 test data. In addition, we use SHA-256 as the hash function $H$ and AES in counter mode as the PRG. We use the standard SGD algorithm for training, where the learning rate is set to $10^{-3}$ with a learning rate decay of $10^{-6}$, the batch size is 100, and the number of local epochs is 10.

Model Accuracy
Compared to plain FL with floating-point gradients, SVFL requires each floating-point gradient value to be represented as a 32-bit integer, which is the only factor that may degrade model accuracy. Prior work [22] shows that deep networks can be trained with only a 16-bit fixed-point number representation and achieve near-lossless accuracy. Therefore, from a theoretical point of view, the accuracy loss due to the 32-bit integer representation of the gradient is negligible.
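As a concrete illustration of this quantization step, the following sketch encodes float gradient entries as 32-bit integers using a hypothetical fixed-point scale and offset (the exact encoding used by SVFL is not specified in this section, so these choices are assumptions):

```python
B = 2**32          # 32-bit integer range, as in SVFL
SCALE = 2**20      # hypothetical fixed-point scale; assumes |g| stays well below 2^11
OFFSET = B // 2    # hypothetical offset shifting signed values into [0, B)

def encode(g: float) -> int:
    """Quantize a float gradient entry to an unsigned 32-bit integer."""
    return (round(g * SCALE) + OFFSET) % B

def decode(x: int) -> float:
    """Recover the (quantized) float value from its integer encoding."""
    return (x - OFFSET) / SCALE

grads = [0.12345678, -0.00054321, 1.5, -2.25]
roundtrip = [decode(encode(g)) for g in grads]
# Rounding error is bounded by 1/(2*SCALE), i.e. about 5e-7 per entry here,
# consistent with the negligible accuracy loss reported above.
assert all(abs(a - b) <= 0.5 / SCALE for a, b in zip(grads, roundtrip))
```

Because the encoding is affine, sums of encoded gradients (after correcting for the per-term offset) correspond to sums of the underlying floats, which is what the masked aggregation operates on.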
In this experiment, we simulate the training with 10 clients ($m = 10$) and compare the test accuracy of SVFL with that of plain FL, where no secure aggregation or verification is involved. Fig. 4 shows the comparison results. The experimental results demonstrate that the accuracy achieved by SVFL is very close to that of plain FL, reaching 96.75% compared to 97.23% on the MNIST dataset, and 78% compared to 79.2% on the CIFAR10 dataset, at the 500th round.
1. The process in which the server randomly selects a client as a leader does not involve a specific computation, and the time required is very small (it amounts to flipping a coin), which makes it negligible compared to the other computations (e.g., aggregation). Thus, we omit it.

Computational Overhead
We evaluate the computational cost of SVFL on five processes: masking the gradients, signing the masked gradients, aggregating the masked gradients, unmasking the aggregated masked-gradients, and verification. Note that we omit the running time for local model training and update (it depends on the specific training models and on the use of advanced optimizers such as Adam [28]), as this work focuses on secure aggregation and verification. Improving the training of local and global models is beyond the scope of the current work.
Table 2 shows the running time per round for both a single client and the server. As illustrated in Table 2, masking and unmasking consume very little time compared to signing and verification. Fig. 5 depicts the relationship between the total running time per round and the number of clients and the gradient vector size. As can be seen from Fig. 5, the running times of the client and the server increase linearly with both the number of clients and the gradient vector size. The figure also shows that the gradient vector size has a greater impact on the computation costs of both the client and the server than the number of clients.

Communication Overhead
We evaluate our experiments on a 50 Mbps communication channel. Table 3 shows the communication overhead and the corresponding time per round for both a single client and the server (to a single client). From Table 3, we observe that the communication costs of a single client and the server are almost the same. Fig. 6 depicts the relationship between the total amount of data transferred per round and the gradient vector size and the number of clients. As shown in Fig. 6, the communication overhead per client increases linearly with the gradient vector size, but only logarithmically with the number of clients. We do not plot the communication cost for the server, as it is essentially $m$ times the communication cost of a client.
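As a rough cross-check, the per-round client transfer time on a 50 Mbps link can be estimated from the bit counts of Section 6 (the 39-bit entry width assumes $m = 100$ and $B = 2^{32}$; the experiments in Table 3 may use different parameters, so this is an order-of-magnitude sketch only):

```python
n = 166_337                 # gradient vector entries (166.337K, as in the figures)
R_bits, N_bits = 39, 1024   # ceil(log2 R) assuming m = 100, B = 2^32; signature size
link_bps = 50e6             # 50 Mbps channel

# Per round, a client sends one masked gradient plus one signature.
client_bits = n * R_bits + N_bits
seconds = client_bits / link_bps
print(f"{client_bits / 8 / 1024:.1f} KiB, {seconds * 1000:.1f} ms per round")
```

The estimate lands in the hundreds-of-milliseconds range per round, dominated entirely by the $n\lceil\log_2 R\rceil$ gradient payload rather than the fixed-size signature.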

Comparison With Cross-Silo FL
By Table 1, we know that BatchCrypt [63] does not support correctness verification of the aggregated gradient, so we compare the computational costs of the masking, aggregation, and unmasking (i.e., the Motp scheme) of SVFL with the encryption, aggregation, and decryption overheads (i.e., the Paillier cryptosystem [41]) of BatchCrypt. In [63], the authors trained a 3-layer fully connected neural network on the FMNIST dataset [56] (101.77K gradients), an AlexNet model [31] on the CIFAR10 dataset [30] (1.25M gradients), and an LSTM model [24] on the Shakespeare dataset [3] (4.02M gradients). Thus, to compare SVFL and BatchCrypt in the same experimental environment, we evaluate the computational overhead of the masking, aggregation, and unmasking when the gradient vector size is set to 101.77K, 1.25M, and 4.02M, respectively. As depicted in Fig. 7, the computational costs of the masking, aggregation, and unmasking of SVFL are much lower than those of the encryption, aggregation, and decryption of BatchCrypt. Specifically, masking consumes 533-799 times less computation time than encryption, the aggregation of SVFL consumes 5.7-17 times less computation time than the aggregation of BatchCrypt, and unmasking consumes 49-77 times less computation time than decryption. This means that SVFL (without verification) is at least 74-112.6 times faster than BatchCrypt. In fact, as shown in [63], for Paillier with a 3074-bit key, encrypting a 6.87 MB plaintext takes 3111.14 s and decrypting it takes 993.8 s. Even with the maximum speedup of 93× claimed in [63], it would still take 33.45 s for encryption and 10.69 s for decryption. By contrast, for any 6.87 MB plaintext, the execution times for the masking and unmasking of SVFL are 0.047 s and 0.185 s, respectively. In terms of communication, BatchCrypt reports the network footprint incurred in one iteration, which is much greater than the communication overhead alone. Thus, we compare the communication cost between SVFL and
BatchCrypt using an analytical method. As noted in [63], Paillier encryption produces substantially larger ciphertexts, which expands the amount of data transferred by more than 150× compared to plain FL. In addition, compared with FATE [1], BatchCrypt reduces the network footprint by up to 101× for the LSTM model on the Shakespeare dataset, which is the best reduction that BatchCrypt achieves. Thus, BatchCrypt requires at least 1.48× more communication than plain FL. By contrast, by the theoretical analysis of the communication cost given in Section 6, the communication expansion factor of SVFL over plain FL is 1.21× for both the client and the server. Therefore, SVFL requires less traffic than BatchCrypt.
Remark 14 (Security Comparison with HE-based FL). In Section 5.3, we described two types of attacks applicable to SVFL: one due to the honest-but-curious clients and the other due to the honest-but-curious server, and we then proved that SVFL is secure against both attacks based on the security of the underlying masking with one-time pads. In comparison, the benchmark HE-based FL protocol [45] (and other follow-up works, e.g., [12], [23], [53], [63]) only considered and proved privacy against the honest-but-curious server, under the CPA security (security against chosen-plaintext attacks; we refer to [20] for its definition) of the underlying HE scheme. We remark that considering only an honest-but-curious server is relevant to scenarios in which the clients are organizations, such as hospitals and financial institutions, under the jurisdiction of federal governments.

Comparison With Verifiable FL
By Table 1, VFL [17], VerifyNet [57], and VeriFL [21] support validation of the correctness of the aggregated gradient. As noted in the Related Work (see Section 2), VFL, VerifyNet, and VeriFL add their verification mechanisms on top of [8], and they all use (double) masking and unmasking techniques similar to our Motp scheme for masking the local gradients. Therefore, we focus only on verification-related comparisons.
SVFL versus VFL. In VFL [17], the computational costs related to masking and verification are included in its encryption, verification, and decryption, so we compare the runtime of "masking"+"signing" and "verification"+"unmasking" of SVFL with the runtime of the encryption and "verification"+"decryption" of VFL, respectively. Fig. 8 reports the experimental results and shows that SVFL performs better than VFL.
On the other hand, the communication overhead of VFL increases linearly with the degree of the polynomial (VFL uses Lagrange interpolation to achieve the correctness verification of the aggregated gradient), which is tied to its security. By contrast, each client and the server in SVFL only need to send a signature, which is less than 1 KB in size, regardless of the number of clients and the gradient vector size. Therefore, we conclude that SVFL requires a lower communication cost than VFL.

SVFL versus VerifyNet. Recall that, in order to achieve the correctness verification of the aggregated gradient, the server needs to generate a "proof" and each client verifies the "proof". In VerifyNet [57], the authors compared the verification costs of each client and the server (i.e., computing the "proof") to the total cost. Thus, we compare the verification overhead of the client ("signing"+"verification") and the server (computing an aggregated signature) in SVFL with the verification overhead of the client and the server in VerifyNet (no dropouts), respectively. In terms of communication, we compare the communication overhead associated with verification in SVFL with that of VerifyNet. As shown in Fig. 9, SVFL is far superior to VerifyNet in terms of verification and communication overheads. This is mainly because VerifyNet relies on expensive bilinear operations, and the communication overheads of the client and the server in VerifyNet increase linearly with the gradient vector size. By contrast, in SVFL, each client only needs to generate a signature and verify an aggregated signature (where the size of any signature is less than 1 KB regardless of the gradient vector size), while the server only needs to generate the aggregated signature; to maintain a fair comparison, we do not consider mutual verification.
SVFL versus VeriFL. VeriFL [21] divides the aggregation and verification phases into four and three rounds, respectively; we observe that the second and fourth rounds of the aggregation phase and all rounds of the verification phase relate to the correctness verification of the aggregated gradient. Therefore, we focus on these rounds and compare the computation and communication overheads associated with verification in SVFL with those of VeriFL (no dropouts). As illustrated in Fig. 10, SVFL performs better than VeriFL in terms of verification efficiency, including both verification and communication overheads. This is mainly because, in the verification process of VeriFL, each client needs to verify the commitments of all clients, which takes a considerable amount of time and increases linearly with the number of clients. By contrast, in SVFL, each client only needs to verify an aggregated signature smaller than 1 KB in size, which is independent of the number of clients. Note that there is no verification overhead for the server in VeriFL when dropouts are not considered; in comparison, the server in SVFL needs to compute an aggregated signature for verification, but its computational cost is very small (see Fig. 10b).

CONCLUSION
In this paper, we proposed SVFL, a cross-silo federated learning protocol that provides secure aggregation and correctness verification of the aggregated gradient, while achieving improved computation and communication overheads compared to existing cross-silo FL and verifiable FL protocols. We showed, by complexity analysis and extensive experimental evaluation, that the computation and communication overheads of SVFL are very low, making it well suited for practical applications. The techniques used in this work are implemented as pluggable modules, which can be omitted depending on the application scenario and its requirements. Thus, we believe that SVFL is suitable for the more practical case of non-i.i.d. data in the cross-silo setting. However, since this work mainly focuses on secure aggregation and verification, we leave it as future work to demonstrate how SVFL can be applied to scenarios with non-i.i.d. data in the cross-silo setting. In addition, SVFL can also be used in the cross-device federated learning setting, but there it needs to rely on a TA. We believe it would be interesting to explore how to build a cross-device FL protocol that supports secure aggregation and verification without relying on a TA.

Fig. 2
Fig. 2 depicts SVFL, where the HNsig and Motp schemes are implemented as pluggable modules. SVFL involves two entities: the client and the server.

Fig. 5.
Fig. 5. Running time per round. (a) Running time per client, where the gradient vector size is fixed to 166.337K entries. (b) Running time for the server, where the gradient vector size is fixed to 166.337K entries. (c) Running time per client, where the number of clients is set to 10. (d) Running time for the server, where the number of clients is set to 10.

Fig. 6.
Fig. 6. Total data transfer per round. (a) Total data transfer as the gradient vector size increases, where the number of clients is set to 10. (b) Total data transfer as the number of clients increases, where the gradient vector size is fixed to 166.337K entries.

Fig. 9.
Fig. 9. Comparison of the computation and communication overheads between SVFL and VerifyNet, where the number of clients is fixed to 100. (a) Running time per round of the client. (b) Running time per round of the server. (c) Total data transfer for the client. (d) Total data transfer for the server.

Fig. 10.
Fig. 10. Comparison of the computation and communication overheads between SVFL and VeriFL, where the number of clients is fixed to 500. (a) Running time per round of the client. (b) Running time per round of the server. (c) Total data transfer for the client. (d) Total data transfer for the server.

TABLE 1
Let $\mathrm{REAL}^{C,S}(\{(\alpha_i, W_i)\}_{i \in C}, (s_1, \ldots, s_m))$ be a random variable representing the view of the server $S$.

Theorem 11 (Privacy Against Honest-but-Curious Clients). Under the same parameter selections as Theorem 9, there exists a PPT simulator SIM such that for all $(s_1, \ldots, s_m)$, $C$, $U$ such that $U \subset C$, and inputs $\{(\alpha_i, W_i)\}_{i \in C}$, the output of $\mathrm{SIM}^{C,U}$ is computationally indistinguishable from the output of $\mathrm{REAL}^{C,U}$: $\mathrm{SIM}^{C,U}(W, \{(\alpha_i, W_i)\}_{i \in U}, (s_1, \ldots, s_m)) \approx_{\mathrm{comp}} \mathrm{REAL}^{C,U}(\{(\alpha_i, W_i)\}_{i \in C}, (s_1, \ldots, s_m))$.

Proof. Since the server can only obtain $\{(\widehat{W}_i, \sigma_i, \alpha_i)\}_{i=1}^{m}$ from the clients, where $\widehat{W}_i$ is a masked gradient, ...

Theorem 12 (Privacy Against Honest-but-Curious Server). Under the same parameter selections as Theorem 9, there exists a PPT simulator SIM such that for all $(s_1, \ldots, s_m)$, $C$, $S$, and inputs $\{(\alpha_i, W_i)\}_{i \in C}$, the output of $\mathrm{SIM}^{C,S}$ is computationally indistinguishable from the output of $\mathrm{REAL}^{C,S}$: $\mathrm{SIM}^{C,S} \approx_{\mathrm{comp}} \mathrm{REAL}^{C,S}(\{(\alpha_i, W_i)\}_{i \in C}, (s_1, \ldots, s_m))$.

TABLE 2
Each entry represents the average over 10 iterations. The number of clients is set to $m = 10$ and the gradient vector size is 166.337K entries. "Aggregation" includes aggregation of the masked gradients and combination of the signatures.

TABLE 3
Communication Overhead and Runtime Per Round