In this section, we introduce the SFL framework, encompassing both the client-side and server-side models, and we present the CSE-SFL algorithm designed by [4] to mitigate communication overhead. Accordingly, we split the full model $w$ as $w = [w_c; w_s]$, where $w_c$ denotes the client-side model and $w_s$ denotes the server-side model. We introduce $\tilde{w}_c = [w_c; w_a]$ as the client-side model including the auxiliary network, where $w_a$ indicates the model of the auxiliary network. We denote $f_i(w_{c,i}; \xi_i)$ as the output of the forward propagation of client $i$'s model, $w_{c,i}$, on its local random data sample, $\xi_i$; this output is transmitted to the server at specific intervals together with the true labels $y_i$ corresponding to the local random data sample. Note that the client does not share its sampled data with the server, only the true labels. Similarly, $g_{s,i}$ indicates the backward-propagation result of the server for client $i$, and $g_{a,i}$ corresponds to the backward-propagation result obtained by the auxiliary network. In more detail, client $i$ performs forward propagation up to the splitting layer and transmits the output of this layer, along with the true labels, to the server. The server then continues forward propagation through to the final layer and computes the loss function. Subsequently, the server performs backward propagation of the error and sends the gradients of its first layer back to the client.
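To make this exchange concrete, the following minimal sketch traces one forward/backward round for a two-layer linear model with squared loss. It is our own illustration, not the implementation of [4]; the architecture, shapes, and variable names are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical shapes; the framework does not fix an architecture.
d_in, d_split, d_out = 8, 4, 1
W_c = rng.normal(size=(d_split, d_in)) * 0.1   # client-side model w_c
W_s = rng.normal(size=(d_out, d_split)) * 0.1  # server-side model w_s
eta_c, eta_s = 0.1, 0.1                        # client/server learning rates

x = rng.normal(size=(d_in, 1))   # local random data sample xi_i
y = rng.normal(size=(d_out, 1))  # true label y_i (shared with the server)

# -- Client: forward propagation up to the splitting layer --
a = W_c @ x                      # split-layer output f_i(w_c; xi_i), sent with y

# -- Server: complete the forward pass, compute the loss, backpropagate --
y_hat = W_s @ a                              # prediction
loss = 0.5 * float(np.sum((y_hat - y) ** 2)) # squared loss
g_out = y_hat - y                            # dloss/dy_hat
g_back = W_s.T @ g_out                       # gradient of the server's first layer
W_s -= eta_s * (g_out @ a.T)                 # server-side update

# -- Client: finish backpropagation with the returned gradient --
W_c -= eta_c * (g_back @ x.T)                # chain rule through the split layer
print(f"loss before the update: {loss:.4f}")
```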
We consider $\bar{w}^t = [\bar{w}_c^t; \bar{w}_a^t]$ as the aggregated model at each global round $t$, where $\bar{w}_c^t = \frac{1}{n}\sum_{i \in S^t} w_{c,i}^t$ and $\bar{w}_a^t = \frac{1}{n}\sum_{i \in S^t} w_{a,i}^t$. Throughout this paper, $\mathcal{M} = \{1, \ldots, m\}$ identifies the set of clients, which is indexed by $i$. We employ two strategies for client participation. In the first strategy, all clients participate in the learning process. In the second strategy, the server randomly samples a subset $S^t$ of size $n$ from the clients with replacement, following a uniform distribution.
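As a small illustration of the second strategy (with hypothetical values of $m$ and $n$), the snippet below draws such a subset; because sampling is with replacement, the same client may appear more than once in a round.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 100, 10  # illustrative values: m clients, a subset of size n

# Uniform sampling with replacement: duplicates are possible.
S_t = rng.choice(m, size=n, replace=True)
print(sorted(S_t))
```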
We assume that the local datasets $\mathcal{D}_i$ are non-i.i.d. The derivatives of the local loss function $F_i$ of client $i$ in the SFL setting with respect to $w_c$ and $w_a$ are indicated by $\nabla_c F_i$ and $\nabla_a F_i$, respectively. As for the server-side model, the derivative of the loss function with respect to $w_s$ is $\nabla_s F_i$. The stochastic counterpart of each of the aforementioned gradients is distinguished by a tilde, e.g., $\tilde{\nabla}_c F_i(w_c; \xi_i)$, where $\xi_i$ is a random sample from client $i$'s dataset. Note that $\eta_c$ and $\eta_s$ are the learning rates of the client-side and server-side models, respectively.
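With this notation, the per-step stochastic gradient updates take the standard SGD form sketched below; this display follows from the definitions above and is only a sketch, so the exact update rules of [4] may differ:
\begin{align*}
w_{c,i}^{t,k+1} &= w_{c,i}^{t,k} - \eta_c\, \tilde{\nabla}_c F_i\big(w_{c,i}^{t,k}; \xi_i\big), &
w_{a,i}^{t,k+1} &= w_{a,i}^{t,k} - \eta_c\, \tilde{\nabla}_a F_i\big(w_{a,i}^{t,k}; \xi_i\big), &
w_s^{t+1} &= w_s^{t} - \eta_s\, \tilde{\nabla}_s F_i\big(w_s^{t}; \xi_i\big).
\end{align*}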
Client $i$ trains $\tilde{w}_{c,i}$ on its local dataset and renders the forward-propagation results, $f_i(w_{c,i}^{t,k}; \xi_i)$, to the auxiliary network at each local step $k$, and it receives $g_{a,i}$ in response. Note that $k \in \{0, \ldots, K-1\}$ indexes the local steps. Additionally, the client sends $f_i(w_{c,i}^{t,0}; \xi_i)$ to the server at each global round $t$ such that $t \bmod l = 0$, where $l$ is a parameter determining the frequency of this process. We have one server, which performs the model aggregation at each global round, completes the forward propagation of the clients, and updates the server model at specific global rounds. Algorithm 1 illustrates the procedure proposed by [4] in detail.
Algorithm 1 CSE-SFL [4]
- 1: At Server:
- 2: Initialize $w_c^0$, $w_a^0$, and $w_s^0$
- 3: for $t = 0, \ldots, T-1$ do
- 4: Sample a subset $S^t$ of $n$ clients out of $m$ clients
- 5: Receive $w_{c,i}^t$ and $w_{a,i}^t$ from the clients
- 6: Let $\bar{w}_c^t = \frac{1}{n}\sum_{i \in S^t} w_{c,i}^t$ and $\bar{w}_a^t = \frac{1}{n}\sum_{i \in S^t} w_{a,i}^t$
- 7: Broadcast $\bar{w}_c^t$ and $\bar{w}_a^t$ to the clients
- 8: if $t \bmod l = 0$ then
- 9: for each client $i \in S^t$ in sequence do
- 10: Client($i$)
- 11: Complete forward propagation with $f_i(w_{c,i}^{t,0}; \xi_i)$, $y_i$, and $w_s^t$
- 12: Compute $\hat{y}_i$, the prediction of $y_i$
- 13: Compute the loss function $F(\hat{y}_i, y_i)$
- 14: Complete backward propagation
- 15: Send $g_{s,i}$ to the client
- 16: Update server model: $w_s^{t+1} = w_s^t - \eta_s \tilde{\nabla}_s F_i(w_s^t; \xi_i)$
- 17: end for
- 18: end if
- 19: end for
- 20: Concatenate $\bar{w}_c^T$ and $w_s^T$
- 21: At Clients $i \in S^t$:
- 22: for all clients $i \in S^t$ in parallel at round $t$ do
- 23: $w_{c,i}^{t,0} = \bar{w}_c^t$, $w_{a,i}^{t,0} = \bar{w}_a^t$
- 24: if $t \bmod l = 0$ then
- 25: Compute $f_i(w_{c,i}^{t,0}; \xi_i)$
- 26: Send $f_i(w_{c,i}^{t,0}; \xi_i)$ and $y_i$ to the server
- 27: Receive $g_{s,i}$ from the server
- 28: Complete backward propagation with $g_{s,i}$
- 29: Client update: $w_{c,i}^{t,1} = w_{c,i}^{t,0} - \eta_c \tilde{\nabla}_c F_i(w_{c,i}^{t,0}; \xi_i)$
- 30: Auxiliary update: $w_{a,i}^{t,1} = w_{a,i}^{t,0} - \eta_c \tilde{\nabla}_a F_i(w_{a,i}^{t,0}; \xi_i)$
- 31: for local step $k = 1, \ldots, K-1$ do
- 32: Compute forward propagation with $w_{c,i}^{t,k}$ and $w_{a,i}^{t,k}$
- 33: Compute the local loss
- 34: Client update: $w_{c,i}^{t,k+1} = w_{c,i}^{t,k} - \eta_c \tilde{\nabla}_c F_i(w_{c,i}^{t,k}; \xi_i)$
- 35: Auxiliary update: $w_{a,i}^{t,k+1} = w_{a,i}^{t,k} - \eta_c \tilde{\nabla}_a F_i(w_{a,i}^{t,k}; \xi_i)$
- 36: end for
- 37: else
- 38: for local step $k = 0, \ldots, K-1$ do
- 39: Compute forward propagation with $w_{c,i}^{t,k}$ and $w_{a,i}^{t,k}$
- 40: Compute the local loss
- 41: Client update: $w_{c,i}^{t,k+1} = w_{c,i}^{t,k} - \eta_c \tilde{\nabla}_c F_i(w_{c,i}^{t,k}; \xi_i)$
- 42: Auxiliary update: $w_{a,i}^{t,k+1} = w_{a,i}^{t,k} - \eta_c \tilde{\nabla}_a F_i(w_{a,i}^{t,k}; \xi_i)$
- 43: end for
- 44: end if
- 45: Return $w_{c,i}^{t,K}$ and $w_{a,i}^{t,K}$ to the server
- 46: end for
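For readers who prefer running code, the following self-contained sketch simulates the round structure of Algorithm 1 end to end for a one-layer split of a linear model with squared loss. It is our own minimal rendering under assumed hyperparameters ($m$, $n$, $T$, $K$, $l$) and assumed update rules, not the implementation evaluated in [4].

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_split, d_out = 8, 4, 1        # hypothetical architecture
m, n, T, K, l = 20, 5, 12, 3, 4       # illustrative hyperparameters
eta_c, eta_s = 0.05, 0.05             # client/server learning rates

# Non-i.i.d. local data: every client holds its own random regression task.
data = [(rng.normal(size=(d_in, 16)), rng.normal(size=(d_out, 16)))
        for _ in range(m)]

w_c = rng.normal(size=(d_split, d_in)) * 0.1   # aggregated client-side model
w_a = rng.normal(size=(d_out, d_split)) * 0.1  # aggregated auxiliary network
w_s = rng.normal(size=(d_out, d_split)) * 0.1  # server-side model

def local_step(Wc, Wa, x, y):
    """One client-only SGD step driven by the auxiliary network's loss."""
    a = Wc @ x
    err = Wa @ a - y                           # auxiliary prediction error
    g_a = Wa.T @ err                           # backward result of the auxiliary net
    return Wc - eta_c * (g_a @ x.T), Wa - eta_c * (err @ a.T)

for t in range(T):
    S_t = rng.choice(m, size=n, replace=True)  # uniform, with replacement
    new_c, new_a = [], []
    for i in S_t:
        X, Y = data[i]
        j = rng.integers(X.shape[1])
        x, y = X[:, [j]], Y[:, [j]]            # one local sample xi_i per round
        Wc, Wa = w_c.copy(), w_a.copy()
        if t % l == 0:
            # Communication round: send f_i and y_i, receive the server gradient.
            a = Wc @ x
            err_s = w_s @ a - y                # server finishes forward pass + loss
            g_s = w_s.T @ err_s                # gradient of the server's first layer
            w_s = w_s - eta_s * (err_s @ a.T)  # server-side update
            Wa = Wa - eta_c * ((Wa @ a - y) @ a.T)  # auxiliary update
            Wc = Wc - eta_c * (g_s @ x.T)      # client update from the server signal
            steps = K - 1                      # remaining local-only steps
        else:
            steps = K                          # no communication this round
        for _ in range(steps):
            Wc, Wa = local_step(Wc, Wa, x, y)
        new_c.append(Wc)
        new_a.append(Wa)
    w_c = np.mean(new_c, axis=0)               # server aggregates client models
    w_a = np.mean(new_a, axis=0)               # ... and the auxiliary networks

print("final model w = [w_c; w_s], shapes:", w_c.shape, w_s.shape)
```

Aggregating the auxiliary network alongside the client-side model mirrors the aggregation step of Algorithm 1; only $\bar{w}_c$ and $w_s$ are concatenated into the final inference model.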