Privacy-preserving quantum federated learning via gradient hiding

Changhao Li; Niraj Kumar; Zhixin Song; Shouvanik Chakrabarti; Marco Pistoia

doi:10.1088/2058-9565/ad40cc

1. Introduction

Quantum computing has experienced rapid advancements in recent years, and within this dynamic landscape, distributed quantum computing including quantum machine learning (QML) [1–9], has garnered considerable attention due to its remarkable capability to harness the collective power of distributed quantum resources, surpassing the limitations of individual quantum nodes. Distributed quantum computation usually involves generating and transmitting quantum states across multiple nodes leveraging the advancements in quantum communication technologies [10]. Remarkably, distributed quantum computing protocols offer a ray of hope in addressing privacy concerns in the presence of adversaries [10–14], while traditional classical methods have struggled to ensure the confidentiality of sensitive information during distributed processes. These adversaries not only involve third-party attacks that can be tackled with well-celebrated quantum communication technologies such as quantum key distribution [10, 11], but also include privacy concerns with untrusted computing nodes [12, 13].

A critical example of this vulnerability indeed lies in classical federated learning (FL) [15, 16], where multiple clients collaboratively train a machine learning model to optimize a given task while keeping their training data distributed without being moved to a single server or data center. A central server is assigned the responsibility of aggregating the client model updates, typically the model cost function gradients generated by the clients using their local data. However, this opens up a possibility of leaking client's sensitive data to the server using gradient inversion attacks [17–21]. While techniques employing homomorphic encryption or differential privacy [22, 23] have been introduced to tackle the problem, they usually demand additional computational and communication overhead or come at the expense of reduced model accuracy. To this end, quantum technologies could provide a natural embedding of privacy. To counteract the gradient inversion attack, one recent proposal [9] replaced the classical neural network in the FL model with variational quantum circuits built using expressive quantum feature maps such that the problem of a successful attack is reduced to solving high-degree multivariate Chebyshev equations. Other quantum-based proposals include adding a certain level of noise to the gradient values to reduce the probability of a successful gradient inversion attack [24], leveraging blind quantum computing [25], and others [26–29]. An alternative to the aforementioned methods is to encode the client's classical gradient values into quantum states and leverage quantum communication between the clients and server to transmit the states. This provides opportunities to hide the gradient values of individual clients from the server while allowing the server to perform the model aggregation using appropriate quantum operations on their end. In this case, the transmitted quantum states offer an inherent advantage in terms of privacy even without additional privacy mechanisms, as the classical information can be encoded in a logarithmic number of qubits and using Holevo's bound, the server could extract at most logarithmic number bits of classical information during each round of communication [6]. Moreover, we remark that the approach can be naturally integrated with quantum cryptographic techniques [10, 30] to become robust against third-party attacks.

In this work, we introduce protocols for the above approach, aiming to advance the capability of distributed quantum computing with quantum communication in the context of FL. Specifically, we propose two types of protocols: one based on private inner-product estimation to perform model aggregation, and the other based on the concept of incremental learning to encode the model aggregated sum in the phase of the quantum state. For the former, we transform the secure model aggregation task into a correlation estimation problem and generalize the recently developed blind quantum bipartite correlator (BQBC) algorithm [7] into multi-party scenarios. For m clients with d model parameters to be updated, the protocol involves a quantum communication cost of $\tilde{\mathcal{O}}(md/\epsilon)$ where ε is the standard model update error, and it is quadratically better in m compared to the analogous method based on classical secret sharing [31]. For the second type of protocol, similar to incremental learning, clients perform multi-party computation sequentially or simultaneously without having the server involved until the end of the protocol at which the server extracts the aggregated gradient information. For one of our proposed protocols within the framework of incremental learning, the secure multi-party summation algorithm achieves a similar quantum communication cost as the BQBC with the complexity being $\tilde{\mathcal{O}}(m d/\epsilon)$ .

These protocols are designed not only to bolster privacy but also to have an evaluation on the quantum communication costs. Through the application of quantum algorithms, this work aspires to unlock novel strategies that are capable of safeguarding sensitive information within the realm of distributed quantum computing while optimizing communication efficiency. Furthermore, it is noteworthy that the suggested protocols can seamlessly integrate with quantum key distribution protocols, thereby ensuring information-theoretic security against external eavesdropper attacks. Our work sheds light on designing efficient quantum communication-assisted FL algorithms and paves the way for secure distributed QML protocols.

2. Problem statement

2.1. Federated learning setup

We present the settings of the quantum communication-based FL scheme involving m clients and a central server. Consider the setup with each client $i \in [m]$ having N_i samples of the form,

$\begin{align*} \mathbf{X}^{\left(i\right)}, Y^{\,\left(i\right)} : \left\{\left(\mathbf{x}^{\left(i\right)}_j, y^{\left(i\right)}_j\right)\right\}_{j = 1}^{N_i}, \hspace{2mm} i \in \left[m\right] \end{align*}$

such that the total number of samples across all the clients is $N = \sum_{i\in [m]} N_i$ . Here each $\mathbf{x}_j^{(i)} \in \mathbb{R}^n$ and $y_j^{(i)} \in \mathcal{C}$ for finite set of output classes.

The aim is to learn a single, global statistical model such that the client data is processed and stored locally, with only the intermediate model updates being communicated periodically with a central server. In particular, the goal is typically to minimize a central objective cost function,

$\begin{equation} \text{min}_{\boldsymbol{\theta}} \left[\boldsymbol{\mathcal{L}}\left(\boldsymbol{\theta}\right) = \sum_{i = 1}^{m} w_i\boldsymbol{\mathcal{L}}_i\left(\boldsymbol{\theta}\right) \right] \end{equation} \tag{ 1 }$

where $\boldsymbol{\theta} = \{\theta_1, \ldots, \theta_d\} \in \mathbb{R}^d$ are the set of d trainable parameters of the FL model. The user-defined term $w_i \unicode{x2A7E} 0$ determines the relative impact of each client in the global minimization procedure with the most natural setting being $w_i = \frac{N_i}{N}$ . That is, here the weight w_i depends on the local data size of individual clients and is known to both the server and clients.

In the standard FL setup, at the tth iteration, the clients each receive the parameter values $\boldsymbol{\theta}^t \in \mathbb{R}^d$ from the server and their task is to compute the gradients with respect to $\boldsymbol{\theta}^t$ and send it back to the server. Here the superscript denotes the iteration step. Upon performing a single batch training, they compute the d gradient updates $\nabla\boldsymbol{\mathcal{L}}_i$ and share it with the server. The server's task is then to perform the gradient aggregation within a standard error bound ε to update the next set of parameters $\boldsymbol{\theta}^{t+1}$ using the rule,

$\begin{equation} \boldsymbol{\theta}^{t+1} = \boldsymbol{\theta}^{t} - \alpha \sum_{i = 1}^{m}w_i \nabla \boldsymbol{\mathcal{L}}_i\left(\boldsymbol{\theta^t}\right), \end{equation} \tag{ 2 }$

where for the rest of the work, we assume the relative impact $w_i = \frac{N_i}{N}$ , and α is the learning rate hyperparameter chosen by the server. The parameters $\boldsymbol{\theta}^{t+1}$ are then communicated back to the clients and the protocol repeats until a desired stopping criteria is reached.

We denote that in many cases, one is interested in learning $\alpha\sum_{i = 1}^{m}w_i \nabla \boldsymbol{\mathcal{L}}_i(\boldsymbol{\theta^t}) \mod 2\pi$ as the model parameters can have a 2π period, particularly in quantum circuits. We point out that here the local circuit model of both the server and clients could be either classical or quantum, but they both have the capability of encoding their local data into quantum states. Further, we consider a quantum communication channel between the server and m clients in order to facilitate the transmission of quantum states.

2.2. Data leakage in classical FL

The existing classical FL setup was built on the premise that sharing gradients to the server would not leak the local data information to the server. However, this notion of privacy has been challenged by the wider community [19]. Specifically led by the work of [20] and followed up by [17, 21, 32, 33], it is shown that it is possible for the honest-but-curious server (who strictly follows the protocol but is interested in learning clients' private data) to extract input data from model gradients. In fact, using the results of [18], we showcase in appendix A how to easily invert the gradients generated from a fully connected neural network model to learn the data.

While classical techniques including homomorphic encryption [22] and secret sharing [31] have been employed to tackle the challenge, they usually impose a significant overhead in communication and computation cost, limiting their applications for FL tasks. On the other hand, randomization approach employing differential privacy [23], while being simple to implement, usually leads to a reduced model accuracy and utility (see appendix B for details).

In this work, we address the concern of data leakage originating from gradients that are generated by either a classical neural network based model or a variational quantum circuit based model [34]. The primary objective is to facilitate a secure global parameter update without divulging individual clients' gradient information $\nabla \boldsymbol{\mathcal{L}}_i(\boldsymbol{\theta^t})$ to the server, thereby mitigating the risk of gradient inversion attacks. In order to hide the individual gradient information while still performing the model parameter update in equation (2), one can implement privacy in either multiplication between weights w_i and local gradient $\nabla \boldsymbol{\mathcal{L}}_i(\boldsymbol{\theta^t})$ , or summation among weighted gradients. In what followings, we will show protocols along these two ways: secure inner product estimation or secure weighted gradient summation (in analogy with incremental learning). Before diving into the details, we summarize the proposed protocols by listing the main privacy mechanism as well as quantum communication complexity and their requirements in table 1.

Table 1. Privacy and communication complexity of proposed gradient-hidden quantum federated learning protocols.

Protocol	Privacy mechanism	Communication complexity ^a	Additional requirement
baseline (classical)	classical secret sharing	CC: $\mathcal{O}((m+m^2)d)$	classical communication among clients
inner product estimation with classical secret sharing	classical secret sharing, amplitude encoding	CC: $\mathcal{O}((m + m^2)d)$ QC: $\mathcal{O}(\frac{d\log m}{\epsilon^2})$	classical communication among clients, quantum communication among clients
blind QBC algorithm	quantum encoding, $\quad$ random phase padding	QC: $\mathcal{O} (\frac{md}{\epsilon}\log( m \log(\frac{m}{\epsilon})))$ ^b	quantum communication among clients
GHZ-based phase encoding	phase accumulation	QC: $\mathcal{O}(\frac{md}{\epsilon^2} )$	global entanglement
multiparty quantum summation	phase accumulation	QC: $\mathcal{O}(\frac{md}{\epsilon} \log\frac{m}{\epsilon})$	quantum communication among clients

^aCC: classical communication complexity. QC: quantum communication complexity.^bAdditional classical communication complexity $\mathcal{O}(m)$ when random phase padding is used.

3. Protocol I: secure inner product estimation

In this section, we consider converting the model aggregation problem into task of distributed inner product estimation between server and clients where algorithms such as quantum bipartite correlator (QBC) [7, 8] could be employed. From the FL parameter update rule equation (2), we note that for each parameter index, $j \in [d]$ , the task for the server would be to perform multiplication between the weight w_i and local gradient $\nabla \mathcal{L}_{i,j}(\boldsymbol{\theta})$ , before summation of all weighted gradients to obtain $\theta_j^{t+1}$ .

In the following, we start from a baseline approach where the secure inner product is performed with the assistance of classical secret sharing (CSS). Following it, we utilize the blind quantum bipartite correlator algorithm and propose a scheme for secure inner product estimation with quadratically fewer communication cost in m.

3.1. Baseline: classical secret sharing assisted inner-product estimation

In this section, we start with a purely classical strategy to hide the gradients of the clients prior to sending the masked gradients to the server. We use this as a baseline to compare against the quantum gradient hiding strategies we develop over the next sections. The baseline strategy is built using the masking technique with one-time pads as introduced in Protocol 0 in [31]. For this protocol to succeed, we assume that each client is switched 'on' during the entirety of the protocol and further, has pairwise secure classical communication channels with each of the m − 1 other clients.

The protocol starts with each client i sampling m − 1 random values $s_{i,k} \in [0,R)$ for every other client indexed by k. Here R is the chosen upper limit of the interval as agreed by all the clients. Similarly, all other clients generate the random values in $[0,R)$ for every other client. Next, clients i and k exchange $s_{i,k}$ and $s_{k,i}$ over their secure channel and compute the perturbations $p_{i,k} = s_{i,k} - s_{k,i} (\text{mod}~ R)$ . We note that $p_{i,k} = -p_{k,i}$ , Further, $p_{i,k} = 0$ when i = k. The clients repeat the above procedure a total of d times (to mask each of the d gradient values $\nabla \mathcal{L}_{i,j}(\boldsymbol{\theta})$ ).

Next, for every parameter to be updated, each client sends masked gradient value to the server,

$\begin{equation} y_{i} = \nabla \mathcal{L}_{i}\left(\boldsymbol{\theta}\right) + \frac{1}{w_i}\sum_{k = 1}^m p_{i,k} \left(\text{mod}~ R\right). \end{equation} \tag{ 3 }$

Note that we drop the parameter index j hereafter for simplicity. The task of the server is then to perform a weighted aggregation of the gradients in order to obtain the next set of parameter values. It can be trivially checked that an honest server always succeeds in performing the correct aggregation, i.e.

$\begin{equation} \begin{split} \bar{y} & = \sum_{i = 1}^m w_i y_{i} \\ & = \sum_{i = 1}^m w_i \nabla \mathcal{L}_{i}\left(\boldsymbol{\theta}\right) + \sum_{i,k}\left(s_{i,k} - s_{k,i}\right) \left(\text{mod}~ R\right)\\ & = \sum_{i = 1}^m w_i \nabla \mathcal{L}_{i}\left(\boldsymbol{\theta}\right). \end{split} \end{equation} \tag{ 4 }$

Further, privacy is guaranteed due to the use of one-time pad masking of gradients which guarantees information-theoretic security against malicious server.

The above scheme requires a total of $\frac{m(m-1)}{2}\times d \log(R)\approx \mathcal{O}(m^2 d)$ classical bits of communication between the clients and a further $\mathcal{O}(md)$ bits of communication between the clients and server to achieve secure aggregation. Thus the total classical communication complexity required is,

$\begin{equation} \mathcal C_\mathrm{CSS}^C = \mathcal{O}\left(\left(m + m^2\right)d\right) \approx \mathcal{O}\left(m^2d\right). \end{equation} \tag{ 5 }$

We remark that the above classical secret sharing based scheme could be replaced with a quantum protocol with similar communication cost (figure 1(a)). Specifically, after obtaining the masked gradients as in equation (3), the clients can collaboratively encode their masked gradients in an amplitude encoded quantum state,

$\begin{equation} \vert \phi_c\rangle = \frac{1}{\mathcal N_c} \sum_{i = 1}^m y_{i} \vert i\rangle, \end{equation} \tag{ 6 }$

where $\mathcal N_c$ is the normalization factor. Note that this state could be prepared by having each client i send the classical masked gradient information y_i to one client who can prepare equation (6) locally. This state preparation process then involves a classical communication cost $\mathcal{O}(m)$ . This state is then sent to the server which can recover the weighted aggregate sum by performing the SWAP test-based discrimination [35] with their local state $\vert \phi_s\rangle = \frac{1}{\mathcal N_s} \sum_{i = 1}^m w_i \vert i\rangle$ . Here the main computational cost lies in the controlled-SWAP operation in the SWAP-test. In terms of communication cost, since the state in equation (6) requires only $\mathcal{O}(\log(m))$ qubits, the amount of communication between the server and clients can be reduced to $\mathcal{O}(\log (m)/\epsilon^2)$ , where ε is the error incurred in estimating the aggregated sum using the SWAP test. The total communication complexity with this scheme is,

$\begin{equation} \begin{aligned} & \mathcal C_\mathrm{CSS}^Q = \mathcal{O}\left(\left(\frac{\log m}{\epsilon^2} + m + m^2\right)d \right). \\ \end{aligned} \end{equation} \tag{ 7 }$

**Figure 1.** Diagram of QFL protocols based on secure inner product estimation. (a) CSS-assisted QFL protocol. The clients jointly prepare a state in which the amplitudes encode the masked gradients and then send it to the server. The gradient masking is achieved via classical secret sharing. (b) BQBC-based QFL protocol. We consider a central server with m clients and there are quantum channels among them. During each round of communication, each client encodes their local gradient information in specific phases of the received state and then send it back to the server.
Download figure:
Standard image High-resolution image

3.2. Model aggregation with blind quantum bipartite correlator algorithm

To reduce the communication cost, in this section, we propose a method for model updating based on quantum bipartite correlator algorithm [7, 8] that is designed to estimate inner product between remote vectors. The essential idea is a generalization of the recently proposed blind quantum bipartite correlator algorithm [7]: firstly, each client converts the gradient information into binary floating point numbers. Then, at each round of communication, the server passes the index qubit state that encodes weight information into each honest or honest-but-curious client and let them privately encode the gradient information into the phase of corresponding index qubits. Finally, the server receives back the index qubits and performs quantum counting algorithm to extract the desired aggregated gradient.

We now proceed to the implementation details of the algorithm. As mentioned, the goal is to have the sever perform the inner product estimation using the known weight information and gradient information that is only locally held by each client. For the kth client ( $k \in [m]$ ), both the weight w_k and the gradient $\nabla\boldsymbol{\mathcal{L}}_k$ can be expanded as binary bitstrings both with size l_k : a _k and b _k, such that $w_k \cdot \nabla\boldsymbol{\mathcal{L}}_k$ equals to the inner product $l_k \frac{1}{l_k}\sum_j a_{kj}b_{kj} = l_k \overline{a_k b_k}$ . One example of such expansions is to use the IEEE standard for floating-point arithmetic [36], where we can have

$\begin{equation} w_k = \sum_{i = 0}^{l_k} 2^{u-i} a_{ki}, \nabla\boldsymbol{\mathcal{L}}_k = \sum_{i = 0}^{l_k} 2^{v-i} b_{ki}. \end{equation} \tag{ 8 }$

Here u and v are the highest digits of w_k and $\nabla\boldsymbol{\mathcal{L}}_k$ , respectively, and are constants known to both server and clients. We then get

$\begin{align*} w_k \cdot \nabla\boldsymbol{\mathcal{L}}_k & = \sum_{i = 0}^{l_k} \sum_{j = 0}^{l_k}2^{u+v-\left(i+j\right)}a_{ki}b_{kj} \\ & = \sum_{\lambda = 0}^{l_k} 2^{u+v-\lambda} \sum_{i = 0}^{\lambda} a_{ki}b_{k\left(\lambda-i\right)}. \end{align*}$

In the following, we assume $l_k = l_0, \forall 1 \unicode{x2A7D} k \unicode{x2A7D} m$ for simplicity. Then the goal is to design a private inner product protocol to have the server evaluate

$\begin{equation} \begin{aligned} \sum_{k = 1}^m w_k \cdot \nabla\boldsymbol{\mathcal{L}}_k & = \sum_{k = 1}^m \sum_{\lambda = 0}^{l_0} 2^{u+v-\lambda} \sum_{i = 0}^{\lambda} a_{ki}b_{k\left(\lambda-i\right)} \\ & = 2^{u+v}\sum_{\lambda = 0}^{l_0} 2^{-\lambda} \sum_{k = 1}^m \sum_{i = 0}^{\lambda} a_{ki}b_{k\left(\lambda-i\right)}. \\ \end{aligned} \end{equation} \tag{ 9 }$

Here the lowest digit term in the equation above is given by $2^{u+v-l_0} \sum_{k = 1}^m \sum_{i = 0}^{l_0} a_{ki}b_{k(l_0-i)}$ . In the following, we consider the evaluation of $\sum_{k = 1}^m \sum_{i = 0}^{l_0} a_{ki}b_{k(l_0-i)} \equiv \sum_{k = 1}^{m} \sum_{j}^{l_0} \tilde{a}_{kj} \tilde{b}_{kj}$ as an example to showcase how the server could extract the above quantities. Note that there are l₀ terms of such summations that need to be estimated with $l_0 \approx \mathcal{O}(\log(m/\epsilon))$ .

We thus consider the following protocol: initially, the server prepares a quantum state with $\lceil \log (ml_0) \rceil$ index qubits $\frac{1}{\sqrt{2^{\lceil \log (ml_0) \rceil}}}\sum_{i = 1}^{2^{\lceil \log (ml_0) \rceil}}\vert i\rangle$ , and then applied controlled-gate to encode all w_k information on a single qubit o_a . The final state is

$\begin{equation} \sum_{k = 1}^m \sum_{i = 1}^{l_0}\vert k,i\rangle\vert a_{ki}\rangle_{o_a}, \end{equation} \tag{ 10 }$

where the index k denotes the kth client and index i is the index for bitstring with size l₀. We omit the normalization factor for above and following states in the protocol for simplicity.

Then, as shown in the diagram in figure 1(b), the server delivers the above $\lceil \log (ml_0) \rceil+1$ qubits to the first client. Note that a malicious server could prepare a state $\sum_{k = 1}^m \sum_{i = 1}^{l_0}c_{ki}\vert k,i\rangle\vert a_{ki}\rangle_{o_a}$ with non-uniform amplitude distribution of c_ki to extract clients' information of interest. To detect such attacks, the first client would firstly decode the ancillary qubit o_a (as the encoded weight information is known globally) and then measure the index qubits in X basis. In the honest server case where c_ki has a uniform distribution, measurement outcome should be all +1. That is, if a malicious server tries to extract certain gradient information by increasing the amplitude of corresponding bitstrings, the index qubit state without the ancilla would not be $\vert +\rangle^{\otimes \lceil \log (ml_0) \rceil }$ .

After successful verification and re-encoding of the weight information in ancillary qubit o_a , the first client encodes its local gradient information $\nabla\boldsymbol{\mathcal{L}}_1$ into the phase of the first part of index qubits, which leads to

$\begin{equation} \sum_{i = 1}^{l_0}\left(-1\right)^{a_{1i} b_{1i}}\vert 1,i\rangle\vert a_{1i}\rangle_{o_a} +\sum_{k = 2}^{m} \sum_{i = 1}^{l_0} \vert k,i\rangle\vert a_{ki}\rangle_{o_a}. \end{equation} \tag{ 11 }$

This could be done using CZ gate between qubit o_a and a local qubit held by the first client that encodes $\nabla\boldsymbol{\mathcal{L}}_1$ .

The first client then passes the above state to the second client, who then encodes its local gradient information $\nabla\boldsymbol{\mathcal{L}}_2$ into the phase of the second part of index qubits. The resulting state is

$\begin{equation} \sum_{i = 1}^{l_0}\left(-1\right)^{a_{1i}b_{1i}}\vert 1,i\rangle\vert a_{1i}\rangle_{o_a} + \sum_{i = 1}^{l_0}\left(-1\right)^{a_{2i} b_{2i}}\vert 2,i\rangle\vert a_{2i}\rangle_{o_a} + \sum_{k=3}^{m} \sum_{i = 1}^{l_0} \vert k,i\rangle\vert a_{ki}\rangle_{o_a} . \end{equation} \tag{ 12 }$

The above process is repeated until all the clients have encoded their local gradient information in the phase, resulting to a state

$\begin{equation} \sum_{k = 1}^{m} \sum_{i = 1}^{l_0} \left(-1\right)^{a_{ki} b_{ki}}\vert k,i\rangle\vert a_{ki}\rangle_{o_a}. \end{equation} \tag{ 13 }$

Finally, the state is returned to the server by the last client. Then the server runs quantum counting algorithm [7, 8] to evaluate $\frac{1}{m l_0}\sum_{k = 1}^{m} \sum_{j = 1}^{l_0} a_{kj} b_{kj} = \frac{1}{m}\sum_{k = 1}^m \overline{a_k b_k}$ . In order to perform the estimation algorithm, $\mathcal{O} (\frac{1}{\epsilon})$ rounds of communication is needed where ε is the standard estimation error. We remark that quantum counting algorithm is based on Grover's search algorithm and is advantageous compared with SWAP-test based algorithms [37] in terms of the error complexity. The computational cost is mainly from the inverse quantum Fourier transform (QFT) and the Grover oracle calls in the quantum counting algorithm, which is $\mathcal{O}(\frac{1}{\epsilon})$ here [8].

We present the takeaway of this method here. Firstly, the privacy is encoded in the index qubit states. When the server measures the index qubits, the probability of getting a specific index is simply $\frac{1}{m l_0}$ , which is small when client number m is large. Moreover, the server could not amplify the amplitude of a specific index by preparing a uniformly distributed superposition state, as the first client is capable of verifying it. Furthermore, even there are multi-round of communication and the server could perform collective attacks, by increasing l₀ or adding a random pad on the phase, it is still hard for the server to get individual client's information [7]. Note that here the privacy comes from the phase encoding, rather than summation of gradients as in incremental learning protocols.

We consider the quantum communication complexity to be the total number of qubits transmitted in order to estimate $\frac{1}{m}\sum_{k = 1}^m \overline{a_k b_k}$ in the protocol, which reads as

$\begin{equation} \mathcal C_\mathrm{BQBC} = \mathcal{O} \left(\frac{md\log\left(m l_0\right)}{\epsilon}\right) = \mathcal{O} \left(\frac{md}{\epsilon}\log\left( m \log\left(\frac{m}{\epsilon}\right)\right)\right) . \end{equation} \tag{ 14 }$

Again, here m is the number of clients, and l₀ is related with the precision of gradient $ml_0 = O(m\log(1/\epsilon_0)) = O( m \log(m/\epsilon))$ where ε₀ is the inner product estimation error bound for single clients. This is better than classical secret sharing which has a total complexity in $O(m^2)$ . We note that in the absence of random phase padding, the technique here does not require classical communication at each round. The incorporation of random, one-time phase pads for privacy enhancement necessitates an additional classical communication cost of $\tilde{\mathcal{O}}(m)$ , as each client would need to send the padding information to the server at last.

3.3. Redundant encoding

The privacy of the protocol above could be further enhanced with redundant encoding of gradient data into binary bitstrings [7]. In particular, we remark on the following theorem: (efficient redundant encoding).

Theorem 1 In the BQBC-based QFL protocol, given a fixed estimation error ε, there exists a redundant encoding method with a redundant parameter r, such that the probability that server learns client's information decreases polynomially in r, while the communication complexity increases only poly-logarithmically in r.

Proof. Following [7], we consider the following redundant encoding approach aimed at reducing the probability that a malicious server acquiring a specific b_i information with i being the pertinent index of interest. we describe the following protocol where both the client k and server encode their single bit local information b_ki and a_ki into bitstrings ${\left[b^{^{\prime}}_{ki,1},b^{^{\prime}}_{ki,2},\ldots,b^{^{\prime}}_{ki,r}\right]}$ and ${\left[a^{^{\prime}}_{ki,1},a^{^{\prime}}_{ki,2},\ldots,a^{^{\prime}}_{ki,r}\right]}$ with size r, where r is an integer and r > 1. The total amount of bits then increases from ml₀ to rml₀. For the weight information, we consider the following encoding rule

$\begin{equation} \begin{split} \boldsymbol{a}^{^{\prime}}_{ki,j} = a_{ki}; \quad k = 1,2,\ldots ,m; i = 1,2,\ldots ,l_0; j = 1,2, \ldots , r; \end{split} \end{equation} \tag{ 15 }$

which is a simply copy the bit a_ki for r times. Here k is the index for client and i is the index of bitstring held by each client.

On the other hand, for $\boldsymbol{b}^{^{\prime}}_k$ , the kth client can hide the information b_ki randomly in one of the r digits and let the other r − 1 digits to be all zero or one. That is, for bit index i, the kth client chooses either

$\begin{equation} \begin{split} &\boldsymbol{b}^{^{\prime}}_{ki,j} = \delta_{j, R_{ki}} \cdot b_{ki}; \\ i = 1,2,\ldots ,l_0; &j = 1,2, \ldots , r, R_{ki}\in\left\{1,2,\ldots ,r\right\}. \\ \end{split} \end{equation} \tag{ 16 }$

or

$\begin{equation} \begin{split} &\boldsymbol{b}^{^{\prime}}_{ki,j} = \left(1-\delta_{j, R_{ki}}\right) + \delta_{j, R_{ki}} \cdot b_{ki}; \\ i = 1,2,\ldots ,l_0; &j = 1,2, \ldots , r, R_{ki}\in\left\{1,2,\ldots ,r\right\}. \\ \end{split} \end{equation} \tag{ 17 }$

where R_ki is an random number $R_{ki} \in [r]$ and $\delta_{j, R_{ki}}$ is the Kronecker delta function.

Then, according to the above rules, by running the QBC algorithm, for each k the server would get

$\begin{equation} \frac{1}{rl_0}\sum_i^{l_0} a_{ki} b_{ki} \quad \text{or} \quad \frac{1}{rl_0}\sum_i^{l_0} a_{ki} b_{ki} + \frac{r-1}{rl_0}\sum_i^{l_0} a_{ki} \end{equation} \tag{ 18 }$

depending on whether the kth client chooses encoding method equations (16) or (17). The difference between the two extracted values is $\frac{r-1}{rl_0}\sum_i^{l_0} a_{ki}$ that the server would know. Note that this choice could vary for different client k. At the end of the protocol, each client can send an one-bit message via classical channel to the server and let server knows which one was used, after which the server could extract $\sum_{k = 1}^{m} \sum_{j = 1}^{l_0} a_{kj} b_{kj}$ . This process yields a classical communication $\mathcal{O}(m)$ .

We remark that at each communication round, the probability that the server samples a specific bit reduces from $\frac{1}{ml_0}$ to $\frac{1}{rml_0}$ with r > 1. Even though that r-times more communication round will be needed to achieve the same error bound ε as in the original QBC case, the server would not know which digit encodes the correct b_ki information as here R_kis are random numbers. Therefore, the probability that the server successfully gets a specific bit b_ki would be

$\begin{equation} P = \frac{1}{rm l_0} \times \frac{r}{\epsilon} \times \frac{1}{r} = \frac{1}{rm l_0 \epsilon}, \end{equation} \tag{ 19 }$

where the second term $\frac{r}{\epsilon}$ is the total number of communication rounds and the third term is $\frac{1}{r}$ is due to the randomness in R_ki.

It's clear to see that a larger value of r corresponds to a decreased probability for the server to successfully extract valuable information from the client through the attack strategy. The flexibility that the client can independently choose encoding method also protects the majority information of b , i.e. the client may choose equation (16) to encode data if the majority of b is 1 to decrease the probability that 1s are being detected. Nevertheless, the trade-off for employing this redundant encoding approach compared with the original one in section 3.2 manifests as an augmented quantum communication complexity, as the transmitted qubit number goes from $\log (m l_0)$ to $\log (r m l_0)$ now. To this end, with the above redundant encoding method, the communication complexity increases logarithmically in r, while the probability that server successfully gets a specific bit b_ki decreases polynomially in r. □

With theorem 1, we show that one can design a protocol such that the privacy goes polynomially better with a redundant encoding parameter while the communication cost only goes logarithmically or linearly with this parameter.

4. Protocol 2: incremental learning

The aforementioned protocols entail secure inner product estimation between the server and clients. An alternative approach involves the formulation of protocols with secure weighted gradient summation, ensuring that the server exclusively receives aggregated gradients rather than weighted gradients from individual clients. This concept aligns with the principles of incremental learning, wherein a model or parameter is iteratively trained or updated with new data, assimilating fresh information while preserving knowledge acquired from prior data. In this section, we introduce two such protocols: the first involves secure aggregation through a globally entangled state, while the second is grounded in secure multiparty gradient summation.

4.1. Secure aggregation based on global entanglement

We start by discussing a secure aggregation protocol using globally entangled state distributed among clients and server (figure 2(a)). Similar as [38], at each time step and for each parameter to be updated, we consider a global (m+1)-qubit GHZ state and each qubit is held by one party. This can be achieved by letting the server prepare a GHZ state locally and then distribute the m qubits to the m clients via quantum channels. In this case, the GHZ state can be simply prepared by applying a Hadamard gate on one qubit initialized in $\vert 0\rangle$ with a sequence of CNOT gates between this qubit and other qubits. Alternatively, in the presence of malicious server, we can consider each party holds one local qubit and remote entanglement is generated via quantum communication channels. Specifically, one can consider direct transmission of qubits between remote clients in order to apply local CNOT gates between them, or CNOT gate teleportation [39, 40] to generate remote entangled states. In both cases, the state preparation process involves a quantum communication complexity $\mathcal{O}(m)$ , as there are essentially m − 1 controlled gates to be implemented.

**Figure 2.** Diagram of QFL protocols that are similar as incremental learning. (a) Secure gradient aggregation based on global entanglement among clients. We consider GHZ states that are distributed by the server or trusted client. After each client encodes its local gradient information, the server performs measurement on the phase of the state. (b) Quantum FL with secure multiparty gradient summation. The ancillary h qubits (in purple) are sent to the rest $(m-1)$ clients by the first client for gradient summation, after which the first client sends the other h-qubit state (in orange) to the server.
Download figure:
Standard image High-resolution image

Nevertheless, after the preparation of the GHZ state, for a given parameter, the kth client encodes its local weighted gradient information $\nabla \mathcal{L}_k(\boldsymbol{\theta^t})$ into the phase of its local qubit by applying a phase gate. That is, the kth client would apply a Z-rotation with the rotation angle being the local weighted gradient information. This process can be either sequentially or simultaneously. The distributed qubits are then sent back to the server via the quantum channel. The resulting state now reads

$\begin{equation} \frac{1}{\sqrt{2}} \left( \vert 00\ldots 0\rangle_{s,1,2,\ldots ,m} + e^{-i \sum_k^m \nabla \mathcal{L}_k\left(\boldsymbol{\theta^t}\right)}\vert 11\ldots 1\rangle_{s,1,2,\ldots ,m} \right). \end{equation} \tag{ 20 }$

The server could firstly disentangle the m received qubits by performing sequential CNOT gates between the local qubit s with the rest m qubits, leading to

$\begin{equation} \frac{1}{\sqrt{2}} \left( \vert 0\rangle_s + e^{-i \sum_k^m \nabla \mathcal{L}_k\left(\boldsymbol{\theta^t}\right)}\vert 1\rangle_s \right)\otimes \vert 0\ldots 0\rangle_{1,2,\ldots ,m}. \end{equation} \tag{ 21 }$

Then, similar as it is done in a typical Ramsey interferometry experiment [41], the server would need to estimate the phase term $\sum_i^m \nabla \mathcal{L}_i(\boldsymbol{\theta^t})$ . This can be done by applying a Hardmard gate on qubit s, leading its state to

$\begin{equation*} \frac{1}{2} \left( \left(1 + e^{-i \sum_k^m \nabla \mathcal{L}_k\left(\boldsymbol{\theta^t}\right)}\right)\vert 0\rangle_s + \left(1-e^{-i \sum_k^m \nabla \mathcal{L}_k\left(\boldsymbol{\theta^t}\right)}\right)\vert 1\rangle_s \right) , \end{equation*}$

followed by projective measurement in computational basis. The probability of getting zero would simply be

$\begin{equation} P_k\left(0\right) = \frac{1+\cos \left(\sum_k^m \nabla \mathcal{L}_k\left(\boldsymbol{\theta^t}\right)\right)}{2}. \end{equation} \tag{ 22 }$

The above process is repeated $\mathcal{O}(\frac{1}{\epsilon^2})$ times until the desired error bound ε is met. The procedure is iteratively applied to update all d parameters.

We now perform security analysis of the gradient information. As all the local gradient is aggregated in the phase of the GHZ state, the server could not extract the gradient of single clients. Under malicious server setting, in contrast of the semi-honest assumption in [38], if the GHZ state is distributed by the server, the server could simply prepare the state where only the jth client's qubit is entangled with the server qubit while others are not entangled, for example,

$\begin{equation} \frac{1}{\sqrt{2}} \left(\vert 0\rangle_1 + \vert 1\rangle_1\right) \otimes \ldots \frac{1}{\sqrt{2}} \left(\vert 00\rangle_{s,j} + \vert 11\rangle_{s,j}\right) \otimes \ldots \frac{1}{\sqrt{2}} \left(\vert 0\rangle_m + \vert 1\rangle_m\right). \end{equation} \tag{ 23 }$

Then, the malicious server would extract the gradient information of client j by measuring the phase its local qubit. To tackle this adversary, the GHZ state could be distributed by a trusted client. Alternatively, the GHZ state can be generated by allowing communications among clients. That is, the honest (or honest-but-curious) clients could jointly prepares a m-qubit entangled state and then communicate with the server to reach the state equation (20), as described above.

The total communication complexity for the aforementioned distributed entangled state scenario would be decided by the qubit distribution at each round and number of communication rounds to estimate the phase. Specifically, the total quantum communication cost (including the initial state preparation process) would read

$\begin{equation} \mathcal C_\mathrm{GHZ} = \mathcal{O}\left(\frac{md}{\epsilon^2}\right) \end{equation} \tag{ 24 }$

with ε being the error bound for the phase estimation.

4.2. Secure multiparty gradient summation

We next introduce a gradient-hidden quantum FL protocol using phase accumulation and estimation. Inspired by the secure multiparty quantum summation protocol proposed in [42], we consider the following data encoding method. At each time step t, the weighted gradient information for parameter k and client l is $\nabla_k \mathcal{L}_l(\boldsymbol{\theta^t}) \in \{0, \delta, 2\delta, \ldots , 2\pi \}$ . Note that here we set the upper bound of each individual gradient to be 2π for simplicity and the condition can be relaxed.

As shown in the diagram in figure 2(b), the protocol starts from the first client, who encodes its local gradient information for a given parameter into a h-qubit state $\vert \nabla \mathcal{L}_1\rangle$ with $h = \lceil \log (2\pi/\delta) \rceil$ (we consider $\log (2\pi/\delta)$ being an integer for simplity in the following). A subsequent QFT would yield the following state

$\begin{equation} \vert \psi\rangle_{1} = \text{QFT} \vert \nabla \mathcal{L}_1\rangle = \frac{1}{\sqrt{2^h}} \sum_{l = 1}^{2^h} e^{i l\delta \nabla \mathcal{L}_1 } \vert l\rangle_{1}. \end{equation} \tag{ 25 }$

Then, the first client prepares a h-qubit ancillary state that encodes the same information as $\vert l\rangle_{1}$ above. This can be achieved by simply applying CNOT gates between the first h qubits in equation (25) and the ancillary h qubits. The resulting state reads

$\begin{equation} \vert \psi\rangle_{1,a} = \frac{1}{\sqrt{2^h}} \sum_{l = 1}^{2^h} e^{i l\delta \nabla \mathcal{L}_1 } \vert l\rangle_{1} \vert l\rangle_{1,a} \end{equation} \tag{ 26 }$

where the subscript a denotes ancilla. The first client then sends the h-qubit ancillary state to the second client via quantum communication. Similarly as the first client, the second client would first encode its local gradient information in another h-qubit state $\vert \nabla \mathcal{L}_2\rangle$ . In order to perform the summation of $\nabla \mathcal{L}_1$ and $\nabla \mathcal{L}_2$ , we consider the following operation for the second client: conditioned on the state of the received ancilla qubits, phase gates are applied on the local h qubit, such that the resulting ensemble state reads

$\begin{equation} \begin{split} \vert \psi\rangle_{12,a} & = C_{a}Z_{2} \frac{1}{\sqrt{2^h}} \sum_{l = 1}^{2^h} e^{i l\delta \nabla \mathcal{L}_1 } \vert l\rangle_{1} \vert l\rangle_{1,a} \vert \nabla \mathcal{L}_2\rangle \\ & = \frac{1}{\sqrt{2^h}} \sum_{l = 1}^{2^h} e^{i l\delta \left(\nabla \mathcal{L}_1 +\nabla \mathcal{L}_2\right) } \vert l\rangle_{1} \vert l\rangle_{1,a} \vert \nabla \mathcal{L}_2\rangle. \\ \end{split} \end{equation} \tag{ 27 }$

Note that the local state $\vert \nabla_k \mathcal{L}_2\rangle$ is not entangled with the rest of the system after the above operation. The second client would then pass the received h qubits to the next client and the above process repeats until all the m clients encode their local gradient information in the phase:

$\begin{equation} \vert \psi\rangle_{123\ldots m,a} = \frac{1}{\sqrt{2^h}} \sum_{l = 1}^{2^h} e^{i l\delta \sum_i^m \nabla \mathcal{L}_i } \vert l\rangle_{1} \vert l\rangle_{1,a} \vert \nabla \mathcal{L}_2\rangle \ldots \vert \nabla \mathcal{L}_{m-1}\rangle \vert \nabla \mathcal{L}_m\rangle. \end{equation} \tag{ 28 }$

The mth cient would return the ancillary qubits back to the first client, who will subsequently perform verification on the states to detect potential dishonesty of the involved parities. Specifically, the first client would first uncompute the ancillary qubits with CNOT gates, leading to

$\begin{equation} \vert \psi\rangle_{123\ldots m,a^{^{^{\prime}}}} = \frac{1}{\sqrt{2^h}} \sum_{l = 1}^{2^h} e^{i l\delta \sum_i^m \nabla \mathcal{L}_i } \vert l\rangle_{1} \vert 0\rangle_{1,a} \vert \nabla \mathcal{L}_2\rangle \ldots \vert \nabla \mathcal{L}_{m-1}\rangle \vert \nabla \mathcal{L}_m\rangle. \end{equation} \tag{ 29 }$

Then the ancillary qubits are measured. In the absence of malicious client that tries to extract the phase information of previous clients and perform projective measurements on the ancillary qubits in computational basis, the measurement should yield 0 for all the h qubits. As an example, we consider the case when a malicious client k tries to extract the aggregated phase $e^{i l\delta \sum_i^{k-1} \nabla \mathcal{L}_i }$ from the received state

$\begin{equation*} \frac{1}{\sqrt{2^h}} \sum_{l = 1}^{2^h} e^{i l\delta \sum_i^{k-1} \nabla \mathcal{L}_i } \vert l\rangle_{1,a}. \end{equation*}$

This malicious client k could perform an inverse QFT on the above state to get $\vert \sum_i^{k-1} \nabla \mathcal{L}_i \mod 2\pi\rangle_{1,a}$ . However, in this case, this h-qubit ancillary state cannot be reset to $\vert 0\rangle_{1,a}$ when the first client performs CNOT gates between it and $\vert l\rangle_1$ . That is, the first client would be able to detect this anomaly by checking the measurement results on the h-qubit ancillary state. Then, the first client would not run the remaining $\mathcal{O}(1/\epsilon)$ rounds of the process. Therefore, the malicious client k would not be able to estimate $( \sum_i^{k-1} \nabla \mathcal{L}_i \mod 2\pi)$ with an error bound ε.

Upon the verification, the first client would send the $\vert l\rangle_1$ state to the server, who can then apply inverse QFT to extract the accumulated gradient information.

$\begin{equation} \begin{split} \vert \psi\rangle_s & = \text{QFT}^{-1} \frac{1}{\sqrt{2^h}} \sum_{l = 1}^{2^h} e^{i l\delta \sum_i^m \nabla \mathcal{L}_i } \vert l\rangle_{1} \\ & = \vert \sum_i^m \nabla \mathcal{L}_i \mod 2\pi\rangle_1. \\ \end{split} \end{equation} \tag{ 30 }$

With the state $\vert \sum_i^m \nabla_k \mathcal{L}_i \mod 2\pi\rangle_1$ outlined above, the server could perform model aggregation and update the model accordingly. The same protocol applies for other parameters to be updated and different time windows. The main computational bottleneck in this protocol lies in the (inverse) QFT in equations (25) and (30), as well as the 2^h conditioned phase gates between the $h = \mathcal{O}(\log (m/\epsilon))$ -qubit ancillary state and the local h-qubit state held by each client.

We remark that as the state received by the server is $\frac{1}{\sqrt{2^h}} \sum_{l = 1}^{2^h} e^{i l\delta \sum_i^m \nabla_k \mathcal{L}_i } \vert l\rangle_{1}$ and the gradient aggregation has already been performed incrementally, the server could not extract the local gradient information held by individual clients. Moreover, the verification procedure could ensure that a malicious client could not simply perform measurement on the phase of the received qubits to extract the previously aggregated gradient information. To this end, the protocol relies on one trusted client node that can prepare the entangled state in equation (26) and send the final h-qubit state to the server. The efficiency of this incremental learning protocol might be improved by pre-assigning clients into multiple batches in which there is at least one trusted node.

We now discuss the quantum communication cost of the designed protocol. As discussed above, the h-qubit states are transmitted among all the m clients for each parameter to be updated. For p iterations of the process that are needed to yield a standard error ε on the phase, the communication complexity of this secure multiparty summation protocol is given by

$\begin{equation} \mathcal C_{SMS} = \left(m+1\right)\times d \times h \times p = \mathcal{O} \left(\frac{md}{\epsilon} \log\frac{m}{\epsilon}\right) \end{equation} \tag{ 31 }$

where the $\log(\frac{m}{\epsilon})$ term is associated with the error that comes from assigning $\nabla_k \mathcal{L}(\boldsymbol{\theta^t}) \in \{0, \delta, 2\delta, \ldots , 2\pi \}$ .

Alternatively, one may consider encoding all the d gradient information in a superposition state with $\lceil \log d \rceil$ qubits to reduce communication cost in terms of number of parameters d. However, as the server would need to update the d parameters separately, at least d samplings are required to query the encoded gradient information hence the total communication complexity in d would still be $\tilde{\mathcal{O}}(d)$ .

5. Discussions

To this end, we have presented four different quantum protocols to address the privacy concern in FL problem. The best quantum schemes involve a total communication complexity $\tilde{\mathcal{O}}(\frac{md}{\epsilon})$ , while the secret sharing based classical FL protocol has a communication cost $\mathcal{O}((m + m^2)d)$ . In the scenario of large client number m (which is the case for typical FL settings), the quantum protocols could have better performance in communication resource overhead than classical approach when $m \gg 1/\epsilon$ . Indeed, when the goal is to achieve a statistical variance $\epsilon^2 = \text{var}(\sum_{k = 1}^m w_k \nabla \mathcal{L}_k) \propto 1/m$ in the gradient aggregation for model parameter update, the quantum schemes proposed in this work (including the blind QBC-based protocol and multiparty quantum protocol) have a total communication complexity $\tilde{\mathcal{O}}(m^{3/2}d)$ that is superior to the classical counterpart.

We remark that the above protocols leveraging quantum communication can be integrated with common quantum cryptography techniques [10, 30, 43, 44] to be secure against external attacks. As an example, we consider using decoy state [43] to detect eavesdropping attacks: when a quantum state with n data qubits is sent to another party via a quantum channel during the QFL protocols, decoy states are randomly inserted and sent along with data qubits.

More specifically, when a n-qubit state is transmitted, we consider $n_d = \mathcal{O}(n)$ decoy qubits that are randomly drawn from $\{\vert 0\rangle,\vert 1\rangle,\vert +\rangle,\vert -\rangle \}$ by the sender. The receiver receives the data and decoy qubits from the quantum channel, as well as positions and encoding basis of decoy qubits from a separated classical channel. After measuring the decoy qubits in the instructed basis, the receiver transmits the measurement results to the sender, who will then calculate the error rate and detect the potential existence of external eavesdropper. In this simple case, for a given decoy state, the probability that the eavesdropper performs a measurement on it without being detected is simply $\frac{3}{4}$ and the probability drops exponentially when there are n_d uncorrelated decoy qubits. Advanced decoy-state quantum key distribution techniques [10] can be implemented to enhance the protocol's resilience against third-party attacks. In this scenario, while the protocol demonstrates the capability to detect eavesdropper attacks, it incurs an additional cost in communication complexity. Specifically, there is an extra classical and quantum communication cost of $\mathcal{O}(n_d)$ between each sender and receiver pair.

It is important to emphasize that the proposed QFL protocols do not depend on a variational quantum circuit for gradient generation. Instead, gradient information can be produced using a classical neural network, thereby reducing the quantum capability demands on both the server and clients. Furthermore, in contrast to numerous classical FL algorithms that may face a trade-off between privacy loss and utility loss [45], the quantum protocols presented in this study do not compromise privacy for diminished utility, such as reduced accuracy.

6. Conclusion

In conclusion, we design gradient-hidden protocols for secure FL to protect against gradient inversion attacks and safeguard clients' local information. The proposed algorithms involve quantum communication among a server and clients, and we analyze both privacy and communication costs. The secure inner product estimation protocol based on BQBC relies on transmitting a logarithmic number of qubits to reduce the information server could query. We devise an efficient redundant encoding method to improve privacy further. For the incremental learning protocols, we consider both phase encoding based on globally entangled state and secure multi-party summation of gradient information to prevent the server from learning individual gradients from clients. We further discuss the quantum and classical communication costs involved in each protocol.

Our present study suggests numerous potential avenues for future research. Firstly, while the proposed protocols primarily address adversaries in the form of a malicious or honest-but-curious server, there is a need to develop secure protocols tailored to scenarios involving a dishonest majority, encompassing malicious clients [46, 47]. Secondly, the protocols proposed herein can be extended and applied to other secure distributed quantum computing tasks, such as quantum e-voting protocols [48, 49]. Furthermore, our work would motivate subsequent efforts aimed at achieving quantum communication advantages while preserving privacy advantages over classical counterparts in practical distributed machine learning tasks [6, 50]. To this end, our work sheds light on designing efficient quantum communication-assisted distributed machine learning algorithms, studying quantum inherent privacy mechanisms, and paves the way for secure distributed quantum computing protocols.

Acknowledgments

The authors thank Shaltiel Eloul, Jamie Heredge and other colleagues at the Global Technology Applied Research Center of JPMorgan Chase for support and helpful discussions.

Data availability statement

No new data were created or analysed in this study.

Disclaimer

This paper was prepared for informational purposes by the Global Technology Applied Research center of JPMorgan Chase & Co. This paper is not a product of the Research Department of JPMorgan Chase & Co. or its affiliates. Neither JPMorgan Chase & Co. nor any of its affiliates makes any explicit or implied representation or warranty and none of them accept any liability in connection with this paper, including, without limitation, with respect to the completeness, accuracy, or reliability of the information contained herein and the potential legal, compliance, tax, or accounting effects thereof. This document is not intended as investment research or investment advice, or as a recommendation, offer, or solicitation for the purchase or sale of any security, financial instrument, financial product or service, or to be used in any way for evaluating the merits of participating in any transaction. Zhixin Song's contributions were made as part of his internship at Global Technology Applied Research in JPMorgan Chase.

Appendix A: Gradient-inversion attack in federated learning

In this section, we expand upon the argument presented in section 2.2, namely a major data leakage issue with the standard classical FL approach. This vulnerability arises from the susceptibility to a gradient inversion attack, wherein an honest-but-curious server can successfully reconstruct the original data from the received gradients [17, 18]. Here we analyze an example case by looking at the vulnerability of recovering averaged data information where the model considered for training is the fully connected neural network layer. To simplify the analysis, we consider a dense linear layer containing $\mathbf{x} = x_1 \ldots x_n$ as input and $y \in \mathcal{C}$ as output [51], where the dense layer is $o_j = \sum_{i = 1}^n w_{ij}x_i + b_j$ . Given a known y, x can be inverted successfully, and any additional hidden layers can be inverted by back-propagation.

Consider a typical classification architecture that uses softmax activation function, $p_k = \frac{e^{o_k}}{\sum_j e^{o_j}}$ to create the model label $y(\mathbf{w}, \mathbf{b}) = c_{\text{arg max}(p_k)}$ (here k is the index for label class), followed by the cross entropy to obtain the cost value,

$\begin{equation} \mathcal L\left(p, y\right) = -\sum_{k}^{|\mathcal{C}|} y_k \log p_k \end{equation} \tag{ A1 }$

where $|\mathcal{C}|$ is the number of output classes. The derivative of p_k with respect to each o_j is,

$\begin{equation} \frac{\partial p_k}{\partial o_j} = \begin{cases} & p_k\left(1 - p_j\right), \hspace{2mm} k = j \\ & -p_kp_j, \hspace{2mm} k\neq j \end{cases}. \end{equation} \tag{ A2 }$

Now we can calculate the derivative of the cost function with respect to the weights and biases via backpropagation,

$\begin{align} & \frac{\partial \mathcal L}{\partial w_{i, j = k}} = \left(p_j - y_j\right)x_i, \end{align} \tag{ A3 }$

$\begin{align} &\frac{\partial \mathcal L}{\partial b_{j = k}} = p_j - y_j. \end{align} \tag{ A4 }$

From this, it can be seen that the number of gradient equations shared with the server would be $n|\mathcal{C}| + |\mathcal{C}|$ while the number of unknowns is $n + |\mathcal{C}|$ . For example, from the above equations, it can be seen see that x_i can be found from any j, using,

$\begin{equation} \frac{\partial \mathcal L}{\partial w_{i,j = k}}/\frac{\partial \mathcal L}{\partial b_{j = k}} = x_i \end{equation} \tag{ A5 }$

In the above setting we saw that for batch size B = 1, the number of unknowns is less than the number of equations and thus the unknowns can be trivially recovered from the system of equations generated by the dense linear layer of the neural network.

It turns out there is a feasible attack even if we consider the mini-batch size training with B > 1. Here, the client trains with the inputs $\text{Samp} : = [(\mathbf{x}_\kappa, y_\kappa)_{\kappa \in \text{Samp}}]$ and only shares the averaged gradient information (over the data points with $B = |\text{Samp}|$ ) with the server,

$\begin{align} & \frac{\partial \mathcal L}{\partial w_{i, j = k}} = \frac{1}{B} \sum_{\kappa \in \text{Samp}}\left(p_{\kappa j} - y_{\kappa j}\right)x_{\kappa i}, \end{align} \tag{ A6 }$

$\begin{align} &\frac{\partial \mathcal L}{\partial b_{j = k}} = \frac{1}{B} \sum_{\kappa \in \text{Samp}}p_{\kappa j} - y_{\kappa j}. \end{align} \tag{ A7 }$

In this scenario, the number of equations shared is still $n|\mathcal{C}| + |\mathcal{C}|$ , whereas the number of unknowns is now $B(n + |\mathcal{C}|)$ . Thus the number of unknowns can now exceed the number of equations, resulting in no unique solution for the server when attempting to solve the system of equations. Even in the case that a unique solution exists, numerical optimization can be challenging. However, In cases where the softmax follows the cross-entropy, the authors in [18] demonstrate that an accurate direct solution can be achieved in many instances, even when $B \gg 1$ . This is attributed to the demixing property across the batch, facilitating the server's ease in retrieving the data points in Samp.

Appendix B: Protection mechanisms in classical federated learning

To safeguard clients' data from diverse adversaries, including gradient inversion attacks, classical FL protocols have implemented multiple protection mechanisms. In this context, we provide a concise review of these techniques.

Hmorphic encryption (HE) [22, 52] stands out as a widely employed encryption technique for privacy protection, enabling the aggregation of local gradient information directly on encrypted data without the need for decryption. However, the practical application of HE faces challenges due to the substantial computational and communication overhead it introduces, particularly for large models [53].

Another prevalent technique involves the application of differential privacy [23, 54–57], wherein Laplace noise or Gaussian noise is typically added to gradient information. While this method is straightforward and minimally impacts communication and computation costs, it may compromise privacy and result in diminished model utility.

Additionally, secret sharing techniques [31, 58] have been developed to distribute a secret among a group of participants, such as sharing local gradient information with the server using secret sharing among clients. Nevertheless, this approach necessitates extensive message exchange, incurring a communication overhead that may be impractical in many FL settings. A concrete example of such techniques is presented in section 3.1 of the main text.

Privacy-preserving quantum federated learning via gradient hiding

Article metrics

Submit

Author e-mails

Author affiliations

Author notes

ORCID iDs

Dates

Peer review information

Abstract

1. Introduction

2. Problem statement

2.1. Federated learning setup

2.2. Data leakage in classical FL

3. Protocol I: secure inner product estimation

3.1. Baseline: classical secret sharing assisted inner-product estimation

3.2. Model aggregation with blind quantum bipartite correlator algorithm

3.3. Redundant encoding

4. Protocol 2: incremental learning

4.1. Secure aggregation based on global entanglement

4.2. Secure multiparty gradient summation

5. Discussions

6. Conclusion

Acknowledgments

Data availability statement

Disclaimer

Appendix A: Gradient-inversion attack in federated learning

Appendix B: Protection mechanisms in classical federated learning

Privacy-preserving quantum federated learning via gradient hiding

Article metrics

Submit

Share this article

Author e-mails

Author affiliations

Author notes

ORCID iDs

Dates

Peer review information

Abstract

1. Introduction

2. Problem statement

2.1. Federated learning setup

2.2. Data leakage in classical FL

3. Protocol I: secure inner product estimation

3.1. Baseline: classical secret sharing assisted inner-product estimation

3.2. Model aggregation with blind quantum bipartite correlator algorithm

3.3. Redundant encoding

4. Protocol 2: incremental learning

4.1. Secure aggregation based on global entanglement

4.2. Secure multiparty gradient summation

5. Discussions

6. Conclusion

Acknowledgments

Data availability statement

Disclaimer

Appendix A: Gradient-inversion attack in federated learning

Appendix B: Protection mechanisms in classical federated learning