During the fine-tuning process, certain layers in a source-domain model extract features that are shared between the source and the target domain. These layers, which are already helpful for the target task, should be frozen, while the task-specific layers should be fine-tuned to help the model adapt to the target task. It is therefore crucial to choose the right layers for fine-tuning in order to achieve optimal model performance. However, because there are many source-domain models, each with dozens or even hundreds of layers, it is difficult to efficiently select the layers that need to be fine-tuned. To address this problem, we propose an adaptive multi-source domain layer selection strategy (AMLS) that automatically searches the vast space of model layers for an appropriate layer fine-tuning scheme based on the differences between the target and the source domains. With this strategy, we can effectively identify the layers that require fine-tuning, rather than indiscriminately fine-tuning all layers, thereby enhancing model performance on the target task.
We assume that the layers near the bottom of a source-domain model help extract features shared between the source domain and the target domain. Accordingly, we keep these layers frozen, while considering the layers closer to the top of the source-domain model (which are more task-specific) for fine-tuning. This is accomplished by combining the characteristics of the individual models and utilizing the particle swarm optimization (PSO) algorithm to search for the optimal number of fine-tuning layers for the multiple source-domain models. PSO is currently one of the most popular metaheuristic algorithms, as it is simple to implement and has strong global search capabilities [34]. When selecting fine-tuning layers for multiple source-domain models, we assume that there are $N$ such models, denoted as $M = \{M_1, M_2, \ldots, M_N\}$, where $M_j$ represents the $j$th source-domain model, $\theta_j^k$ represents the $k$th layer parameters of the model $M_j$, and $L_j$ represents the total number of layers in model $M_j$. The encoding form of the individuals (candidate solutions) in the problem of selecting fine-tuning layers for multiple source-domain models can be described as follows:

$$x_i = \left(x_i^1, x_i^2, \ldots, x_i^N\right), \quad x_i^j \in \{0, 1, \ldots, L_j\}, \qquad (1)$$
where $x_i$ represents the $i$th population member, as well as the layer fine-tuning scheme for the corresponding $N$ models. $x_i^j$ represents the number of fine-tuned layers in the $j$th source-domain model, with $0 \le x_i^j \le L_j$. Moreover, when $x_i^j = L_j$, it indicates that all layers of $M_j$ are fine-tuned. Conversely, when $x_i^j = 0$, it indicates that all layers of $M_j$ are frozen.
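To make the encoding of Eq. (1) concrete, the following minimal sketch (ours, for illustration only; the layer counts are made up) samples one individual for three hypothetical source-domain models:

```python
import random

# Hypothetical layer counts L_j for N = 3 source-domain models.
layer_counts = [50, 101, 34]

# One individual x_i: for each model M_j, the number of top layers to
# fine-tune, drawn uniformly from {0, 1, ..., L_j}.
x_i = [random.randint(0, L_j) for L_j in layer_counts]
print(x_i)  # e.g., [12, 87, 0] -> all layers of the third model stay frozen
```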
In order to evaluate the quality of each individual, we use the classification accuracy of the target task corresponding to individual $x_i$ as the fitness value, expressed as follows:

$$f(x_i) = \mathrm{Acc}\left(D_t;\, \tilde{M}_1, \tilde{M}_2, \ldots, \tilde{M}_N\right), \qquad (2)$$

where

$$\tilde{M}_j = \arg\min_{\theta_j \odot B_j} \mathcal{L}_{MC}\left(D_t;\, M_j, B_j\right), \qquad (3)$$

$$b_j^k = \begin{cases} 1, & L_j - x_i^j < k \le L_j, \\ 0, & \text{otherwise}, \end{cases} \qquad (4)$$

where $B_j = \left(b_j^1, b_j^2, \ldots, b_j^{L_j}\right)$ represents a mask matrix that corresponds one-to-one with the dimensions of model $M_j$ and is mainly used to freeze or fine-tune parameters; $\tilde{M}_j$ denotes the model obtained by fine-tuning the source-domain model $M_j$ using the mask matrix $B_j$; and $b_j^k$ represents the mask value corresponding to all parameters $\theta_j^k$ in the $k$th layer of the source-domain model $M_j$. When $b_j^k = 0$, it indicates that the parameters in this layer are frozen (i.e., not being fine-tuned), whereas $b_j^k = 1$ indicates that the parameters in this layer need to be fine-tuned to adapt to the target task. The value of $b_j^k$ is determined by the individual $x_i$. $\mathcal{L}_{MC}$ represents the multi-source domain collaborative loss function (MC-loss) proposed in this paper (as discussed in Section 3.2). $f(x_i)$ represents the classification accuracy on the target data $D_t$, which is obtained by fine-tuning the corresponding layer parameters based on individual $x_i$ in the framework of Figure 2. To reduce the computational cost, the number of fine-tuning epochs for each individual is manually set to a small constant $E$ (in this case, $E = 5$).
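In practice, the layer mask $B_j$ of Eq. (4) can be realized by toggling gradient flags rather than by materializing a matrix. The sketch below is a minimal PyTorch illustration under our own assumptions: it treats the model's ordered top-level children as the $L_j$ layers, which may not hold for every architecture, and `apply_layer_mask` is a hypothetical helper, not code from the paper.

```python
import torch.nn as nn

def apply_layer_mask(model: nn.Module, num_finetune: int) -> None:
    """Freeze all but the top `num_finetune` layers, mirroring Eq. (4):
    b_j^k = 1 (trainable) for the top x_i^j layers, b_j^k = 0 (frozen) otherwise."""
    layers = list(model.children())      # assumed ordered bottom -> top
    cut = len(layers) - num_finetune     # index of the first fine-tuned layer
    for k, layer in enumerate(layers):
        trainable = k >= cut             # boolean form of the mask value b_j^k
        for p in layer.parameters():
            p.requires_grad = trainable
```

Calling `apply_layer_mask(M_j, x_i[j])` for each source-domain model before the $E$-epoch fine-tuning step would then enforce the scheme encoded by individual $x_i$.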
In the PSO, each particle has its own position (candidate solution) $x_i$ and velocity $v_i$, which are continuously updated based on the guidance of the individual historical best and the global best of the swarm, respectively, by applying the following expressions:

$$v_{id}^{t+1} = \omega v_{id}^{t} + c_1 r_1 \left(p_{id}^{t} - x_{id}^{t}\right) + c_2 r_2 \left(p_{gd}^{t} - x_{id}^{t}\right), \qquad (5)$$

$$x_{id}^{t+1} = x_{id}^{t} + v_{id}^{t+1}, \qquad (6)$$

where $x_{id}^t$ and $v_{id}^t$ denote the position and velocity, respectively, of the $i$th individual in the $t$th generation of the population within the $d$th dimension; $p_{id}^t$ and $p_{gd}^t$ respectively represent the historical optimal position of the $i$th individual on the $d$th dimension and the global optimal position of the population; $r_1$ and $r_2$ are random numbers ranging from 0 to 1; $\omega$ denotes the inertia weight; and $c_1$ and $c_2$ represent the individual and social learning factors, respectively.
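A compact NumPy sketch of one PSO generation, following Eqs. (5) and (6), is shown below. This is our own illustration rather than the paper's code; the hyperparameter values and the velocity clamp `v_max` are assumptions, as the section does not fix them.

```python
import numpy as np

def pso_step(x, v, p_best, g_best, w=0.7, c1=1.5, c2=1.5, v_max=4.0):
    """One velocity/position update following Eqs. (5) and (6).

    x, v, p_best : (n, N) arrays of positions, velocities, personal bests.
    g_best       : (N,) array holding the global best position.
    """
    n, N = x.shape
    r1, r2 = np.random.rand(n, N), np.random.rand(n, N)
    v = w * v + c1 * r1 * (p_best - x) + c2 * r2 * (g_best - x)  # Eq. (5)
    v = np.clip(v, -v_max, v_max)   # common stabilizing clamp (our addition)
    x = x + v                       # Eq. (6)
    return x, v
```

Since each $x_i^j$ must remain an integer in $[0, L_j]$, a practical implementation would round and clip the updated positions before the next fitness evaluation.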
The AMLS is described in Algorithm 1. As can be seen from the presented steps, we first randomly generate a population with $n$ individuals and initialize the velocities $v_i$ and the positions $x_i$ (line 1). Each individual's position $x_i$ represents the layer fine-tuning scheme corresponding to the $N$ source-domain models. Next, the individual best solutions $p_i$ and the population best solution $p_g$ of the $n$ individuals are initialized (line 2). Then, the mask matrices $B_j$ corresponding to the $N$ source-domain models are calculated based on each individual's position information (lines 6-8), and the $N$ source-domain models $M_1, M_2, \ldots, M_N$ are fine-tuned for $E$ epochs using the MC-loss function. The fitness value of each individual is calculated using Eq. (2), and the individual optimal positions $p_i$ and the population optimal position $p_g$ are updated (lines 9-13). The current population optimal position $p_g$ is then assigned to the optimal layer fine-tuning scheme $x^*$ (line 17), and Eq. (5) and Eq. (6) are applied to update the velocities and positions of the individuals (lines 18 and 19). After $G$ iterations of updates, the optimal layer fine-tuning scheme $x^*$ for the $N$ source-domain models is obtained.
Algorithm 1: Adaptive multi-source domain layer selection strategy (AMLS).
Input: The source-domain models $M = \{M_1, M_2, \ldots, M_N\}$; target data $D_t$; population size $n$; the maximum iteration number $G$ of the population.
1: Randomly generate a population with $n$ individuals: $\{(x_1, v_1), (x_2, v_2), \ldots, (x_n, v_n)\}$.
2: Initialize the individual optimal solutions $p_i$ and the population optimal solution $p_g$.
3: for $t = 1$ to $G$ do
4:  for $i = 1$ to $n$ do
5:   for $j = 1$ to $N$ do
6:    Calculate the mask matrix $B_j$ according to $x_i^j$ using Eq. (4).
7:    Fine-tune model $M_j$ for $E$ epochs on the target data $D_t$ using Eq. (3).
8:   end for
9:   Calculate the fitness value $f(x_i)$ of the $i$th individual using Eq. (2).
10:  if $f(x_i)$ better than $f(p_i)$ then
11:   $p_i = x_i$.
12:   if $f(p_i)$ better than $f(p_g)$ then
13:    $p_g = p_i$.
14:   end if
15:  end if
16: end for
17: $x^* = p_g$.
18: Update all individual velocity $v$ values using Eq. (5).
19: Update all individual position $x$ values using Eq. (6).
20: end for
Output: the optimal layer fine-tuning scheme $x^*$.
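Putting the pieces together, the sketch below mirrors the control flow of Algorithm 1. It is a schematic re-implementation under stated assumptions, not the authors' code: `fitness_fn` is a hypothetical callback standing in for Eqs. (2)-(4), i.e., it should build the masks from an individual, fine-tune the $N$ models for $E$ epochs with the MC-loss, and return the target-task accuracy.

```python
import numpy as np

def amls(layer_counts, fitness_fn, n=10, G=20, w=0.7, c1=1.5, c2=1.5):
    """Schematic AMLS loop following Algorithm 1.

    layer_counts : list of L_j for the N source-domain models.
    fitness_fn   : callable mapping an integer individual x_i to its
                   target-task accuracy f(x_i) (encapsulates Eqs. (2)-(4)).
    """
    N, L = len(layer_counts), np.asarray(layer_counts)
    x = np.random.randint(0, L + 1, size=(n, N)).astype(float)   # line 1
    v = np.zeros((n, N))
    p_best = x.copy()                                            # line 2
    p_fit = np.array([fitness_fn(xi.astype(int)) for xi in x])
    g_idx = int(p_fit.argmax())
    g_best, g_fit = p_best[g_idx].copy(), p_fit[g_idx]
    for t in range(G):                                           # line 3
        for i in range(n):                                       # line 4
            fit = fitness_fn(x[i].astype(int))                   # lines 5-9
            if fit > p_fit[i]:                                   # lines 10-15
                p_fit[i], p_best[i] = fit, x[i].copy()
                if fit > g_fit:
                    g_fit, g_best = fit, x[i].copy()
        r1, r2 = np.random.rand(n, N), np.random.rand(n, N)
        v = w * v + c1 * r1 * (p_best - x) + c2 * r2 * (g_best - x)  # Eq. (5)
        x = np.clip(np.rint(x + v), 0, L)     # Eq. (6), kept integral in [0, L_j]
    return g_best.astype(int)                                    # x* (line 17)
```

The rounding and clipping in the position update reflect the integer search space of Eq. (1); other discretization schemes would also be consistent with the description in this section.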