1. Introduction
Vehicle re-identification (Re-ID) aims to determine whether vehicle images captured by non-overlapping cameras in a traffic surveillance scene belong to the same vehicle. Recently, vehicle re-identification methods based on supervised learning have made great progress [1,2,3,4]. However, supervised learning methods suffer from two main problems: 1) they depend heavily on complete labels, i.e., annotated training data from multiple non-overlapping cameras, and annotating such large-scale data is time-consuming and labor-intensive; 2) they perform well on the original task (source domain), but when deployed in a new environment (target domain), their performance drops significantly due to domain bias.
To overcome these problems, researchers have begun to study vehicle re-identification methods based on unsupervised domain adaptation [5,6,7,8,9,10], which attempt to transfer knowledge from a well-labeled source domain dataset to an unlabeled target domain dataset. Isola et al. [11] were among the first to use generated images to train the Re-ID model, preserving identity information from the well-labeled domain while learning the style of the unlabeled domain to improve model performance on the unlabeled target domain. Peng et al. [12] proposed a Progressive Adaptation Learning algorithm for vehicle re-identification, which uses a Generative Adversarial Network (GAN) to generate "pseudo-target samples" from the source domain and employs a dynamic sampling strategy during training to mitigate domain discrepancies. Zheng et al. [13] proposed a viewpoint-aware clustering algorithm that leverages a pre-trained vehicle orientation predictor to predict vehicle orientations and assign directional pseudo-labels; it first clusters vehicles with the same viewpoint and then clusters vehicles with different viewpoints, thereby enhancing the performance of the vehicle re-identification model. Wang et al. [14] proposed a progressive learning method named PLM for vehicle re-identification in unknown domains, which uses domain adaptation and a multi-scale attention network to smooth domain bias, trains a Re-ID model, and introduces a weighted label smoothing loss to improve performance. These methods preserve identity information from a well-labeled source domain while learning the style of the unlabeled target domain, but they share a common limitation: when learning an adaptive model on the target domain, they must revisit the source domain to generate new samples for subsequent fine-tuning. In practice, data ownership and privacy concerns often prevent data from being shared across domains, so the target-domain model cannot directly access the source-domain data and the adaptive performance of the model suffers greatly.
To this end, this paper proposes an unsupervised domain-adaptive vehicle re-identification method based on source-free knowledge transfer. Given a target-domain sample, a generator produces a transferred image of that sample; the two images are then fed to the target model and the source model, respectively, so that the difference between the image pair compensates for the knowledge difference between the domain models. By constraining the outputs of the two models to be similar, the generator is trained to transform target-domain data into the style of the source domain. These "source-like samples" can therefore replace the source-domain data during model adaptation on the target domain, and since their content comes from the target domain, they integrate more naturally into the adaptation process, helping to solve the problem that the target domain cannot access source-domain data. The method can be divided into two stages:
(1) In the first stage, a source-free knowledge transfer module is built. Using only the source-domain model and a target-domain model trained on unlabeled target-domain data as supervision, it trains a generator to produce "source-like samples" without accessing the source-domain data. These "source-like samples" match the style of the source domain and the content of the target domain.
(2) In the second stage, a progressive joint training strategy is adopted to gradually learn the adaptive model by feeding in different proportions of "source-like samples" and target-domain data. This process can be regarded as a form of data augmentation. Compared with applying the source-domain model directly to target-domain data, the "source-like samples", which carry source-domain knowledge, are more compatible with the model, and iterative training with them effectively reduces the domain gap, thereby improving the generalization performance of the model.
The contributions of this paper can be summarized as follows:
(1) We propose an unsupervised domain-adaptive vehicle re-identification method based on source-free knowledge transfer. It does not need to access the source-domain data; instead, it uses the domain-difference information hidden in the source-domain and target-domain models to constrain the generator to produce "source-like samples". These samples serve as a data augmentation method for the vehicle re-identification task to assist model training.
(2) We propose a progressive joint training strategy that combines "source-like samples" with the target domain. "Source-like samples" match the source-domain model in style and the target-domain data in content, acting as an intermediate hub between the source-domain model and the target-domain data. This mitigates domain differences and thus improves model performance.
2. Method
In this section, we give a detailed description of the proposed method. The schematic diagram of the method is shown in Figure 1. Given only a source model and a target model, the source-free knowledge transfer module constructed in this paper trains a generator $G$ to produce "source-like samples". These samples contain source-domain knowledge and are a better fit for the source model than raw target-domain data, so they can be used as data augmentation to assist in training the target model, which helps to improve model performance. A progressive joint training strategy is then adopted to train with these samples and the target-domain data; by controlling the proportion of "source-like samples" relative to the original target-domain samples, we avoid the performance degradation that a high proportion of noisy samples would cause. Because the proposed method does not need to access the source-domain data, it overcomes the limitation of existing unsupervised domain-adaptive methods that require such access, and avoids the security and transmission problems that access may entail.
2.1. Pretrained source and target models
Among existing unsupervised domain-adaptive methods [15], the usual practice is to first train a model $f(\cdot\,|\,\theta)$ on the source domain, where $\theta$ represents the parameters of the current model, and then transfer the model to the target domain for learning.
The method in this paper does not need to access the source data. Its source-domain model is expressed as:

$$f_s = f(\cdot\,|\,\theta_s), \quad (1)$$

and the target-domain model is expressed as:

$$f_t = f(\cdot\,|\,\theta_t). \quad (2)$$
The experimental steps are as follows:
First, access the source domain to train the source model $f_s$ as well as a learnable source-domain classifier $C_s$, whose output dimension $n_s$ represents the number of sample identities.
Second, optimize using the identity classification loss $\mathcal{L}_{id}$, composed of the cross-entropy loss function, and the triplet loss $\mathcal{L}_{tri}$ [16], as shown in equations (3) and (4):

$$\mathcal{L}_{id} = -\frac{1}{N}\sum_{i=1}^{N} \log p(y_i \mid x_i), \quad (3)$$

$$\mathcal{L}_{tri} = \frac{1}{N}\sum_{i=1}^{N} \max\!\big(0,\ \|f(x_i)-f(x_{i,p})\|_2 - \|f(x_i)-f(x_{i,n})\|_2 + m\big), \quad (4)$$
where $\|\cdot\|_2$ represents the $L_2$-norm; the subscripts $p$ and $n$ denote the positive sample and the negative sample of the $i$-th sample, respectively; and $m$ represents the distance margin of the triplet loss. The overall loss $\mathcal{L}_{src}$ is therefore calculated as:

$$\mathcal{L}_{src} = \mathcal{L}_{id} + \lambda \mathcal{L}_{tri}, \quad (5)$$

where $\lambda$ represents the weight balancing the two losses. After obtaining the source model, we learn a target model by loading the source-model parameters, clustering the target domain, and then predicting pseudo labels $\tilde{y}$. The overall loss $\mathcal{L}_{tgt}$ is therefore calculated as:

$$\mathcal{L}_{tgt} = -\frac{1}{N}\sum_{i=1}^{N} \log p(\tilde{y}_i \mid x_i) + \lambda \mathcal{L}_{tri}. \quad (6)$$
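As an illustration, the per-sample terms of equations (3)-(5) can be sketched in plain Python (a minimal sketch with illustrative function names; a real implementation would operate on batched tensors in a deep-learning framework):

```python
import math

def cross_entropy(logits, label):
    # Per-sample softmax cross-entropy, the building block of L_id (Eq. 3).
    m = max(logits)  # subtract max for numerical stability
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum - logits[label]

def l2_dist(a, b):
    # Euclidean (L2) distance between two feature vectors.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def triplet_loss(anchor, positive, negative, margin=0.3):
    # Hinge-style triplet loss (Eq. 4): push the anchor-positive distance
    # below the anchor-negative distance by at least `margin`.
    return max(0.0, l2_dist(anchor, positive) - l2_dist(anchor, negative) + margin)

def source_loss(logits, label, anchor, pos, neg, weight=1.0):
    # Overall source-domain objective (Eq. 5): L_src = L_id + lambda * L_tri.
    return cross_entropy(logits, label) + weight * triplet_loss(anchor, pos, neg)
```

Here `margin` plays the role of $m$ and `weight` the role of $\lambda$; the values shown are illustrative defaults, not the paper's settings.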
2.2. Source-free image generation module
Existing unsupervised domain-adaptive vehicle re-identification methods usually need to access the source-domain data and transfer the well-labeled source-domain data to the unlabeled target-domain style through style transfer [17,18] or generative adversarial networks [19,20], smoothing the domain bias so that source-domain models apply better to target-domain data. However, when data is subject to security and privacy restrictions, accessing the source-domain data is extremely difficult. To solve this problem, this paper constructs a source-free image generation module. Guided by the implicit domain information in the models, it forces the generator to produce "source-like samples" from the target-domain data in the style of the source domain, bridging the knowledge gap between the models.
As shown in Figure 2, the first row shows the target-domain images, and the second row shows the "source-like samples" obtained by passing the target images through the source-free image generation module. The key property of these samples is that their content matches the target-domain data while their style matches the source-domain model, so they can act as a bridge between the source and target domains. In the subsequent model optimization process, jointly training on these samples and the target-domain samples effectively improves the generalization performance of the vehicle re-identification model.
The source-free image generation module aims to train an image generator $G$ that, guided by the domain information implicit in the models, produces "source-like samples" in the style of the source domain. These samples replace the source-domain data in matching the source model. To describe the knowledge adapted in the "source-like samples", a channel-level relational consistency loss $\mathcal{L}_{crc}$ is introduced in addition to the traditional knowledge distillation loss $\mathcal{L}_{kd}$; it focuses on the relative channel relationships between the feature maps of the target domain and the "source-like samples". The total loss $\mathcal{L}_{gen}$ is shown in Equation (7):

$$\mathcal{L}_{gen} = \mathcal{L}_{kd} + \mathcal{L}_{crc}. \quad (7)$$
In the following subsections, the two losses are described in detail.
2.2.1. Knowledge distillation loss
In our proposed source-free image generation network, we use the composition of the source model and the generator to describe the knowledge adapted in the target model. This approach can be considered a special application of knowledge distillation: our aim is to distill the knowledge differences between the two domains into the generator. We therefore compose the knowledge distillation loss $\mathcal{L}_{kd}$ from the output $p_s(G(x_t))$ obtained by feeding "source-like samples" into the source model and the output $p_t(x_t)$ obtained by feeding target-domain samples into the target model:

$$\mathcal{L}_{kd} = D_{KL}\big(p_t(x_t)\,\|\,p_s(G(x_t))\big), \quad (8)$$

where $D_{KL}$ represents the Kullback–Leibler (KL) divergence.
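A minimal sketch of the distillation term in plain Python, where softmax outputs stand in for the two models' predicted distributions (function names are illustrative, not the authors' code):

```python
import math

def softmax(logits):
    # Convert raw logits into a probability distribution.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]

def kl_divergence(p, q, eps=1e-12):
    # KL(p || q) = sum_i p_i * log(p_i / q_i); eps guards against log(0).
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

def kd_loss(source_logits, target_logits):
    # Knowledge-distillation loss (Eq. 8): align the source model's output
    # on the generated "source-like sample" with the target model's output
    # on the original target sample.
    return kl_divergence(softmax(target_logits), softmax(source_logits))
```

When the two models produce identical outputs, the loss is zero; any discrepancy between the distributions yields a positive penalty that the generator is trained to reduce.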
2.2.2. Channel-level relational consistency loss
In unsupervised domain-adaptive tasks, it is usually assumed that there is a fixed classifier, so the global features obtained by passing the target domain through the target model should be similar to the global features obtained by passing "source-like samples" through the source model. To promote similar channel-level relationships between the feature maps $F^{s'}$ and $F^{t}$, a relational consistency loss is used as a constraint.
Previous knowledge distillation work is usually constrained by maintaining batch-level or pixel-level relationships [21,22]. However, neither approach suits the current task. First, a batch-level relationship cannot properly supervise the generation of each individual image, which damages the generated results. Second, the effectiveness of pixel-level relationships is greatly reduced after global pooling. In contrast, the channel-level relationship [23] is computed on a per-image basis and is not affected by global pooling. Therefore, the channel-level relationship is more suitable for computing $\mathcal{L}_{crc}$.
Given the feature map $F^{s'} \in \mathbb{R}^{C \times H \times W}$ of the "source-like samples" and the feature map $F^{t} \in \mathbb{R}^{C \times H \times W}$ of the target domain, we reshape them into feature vectors $V^{s'}$ and $V^{t}$, as shown in Equations (9) and (10):

$$V^{s'} = \mathrm{reshape}(F^{s'}) \in \mathbb{R}^{C \times HW}, \quad (9)$$

$$V^{t} = \mathrm{reshape}(F^{t}) \in \mathbb{R}^{C \times HW}, \quad (10)$$

where $C$, $H$, and $W$ represent the feature-map depth (number of channels), height, and width, respectively. Next, we compute their channel-level self-correlation, the Gram matrix, as shown in Equation (11):

$$M = V \cdot V^{\top}, \quad (11)$$

where $M \in \mathbb{R}^{C \times C}$. Like other similarity-preserving losses for knowledge distillation, we apply a row-wise $L_2$ norm, as shown in Equation (12):

$$\tilde{M}_{[i,:]} = M_{[i,:]} \,/\, \|M_{[i,:]}\|_2, \quad (12)$$

where $M_{[i,:]}$ represents row $i$ of the matrix. Finally, the channel-level relational consistency loss $\mathcal{L}_{crc}$ is the mean squared error (MSE) between the normalized Gram matrices, as shown in Equation (13):

$$\mathcal{L}_{crc} = \frac{1}{C^2}\,\big\|\tilde{M}^{s'} - \tilde{M}^{t}\big\|_F^2. \quad (13)$$
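Equations (9)-(13) can be sketched in plain Python, treating each reshaped feature map as a $C \times (H \cdot W)$ list of channel vectors (an illustrative sketch, not the authors' implementation):

```python
import math

def gram_matrix(feature_map):
    # feature_map: C rows, each a flattened H*W channel vector (Eqs. 9-11).
    # Returns the C x C channel self-correlation (Gram) matrix M = V V^T.
    return [[sum(a * b for a, b in zip(ci, cj)) for cj in feature_map]
            for ci in feature_map]

def row_normalize(matrix, eps=1e-12):
    # Row-wise L2 normalization (Eq. 12); eps avoids division by zero.
    out = []
    for row in matrix:
        norm = math.sqrt(sum(v * v for v in row)) + eps
        out.append([v / norm for v in row])
    return out

def crc_loss(fmap_src_like, fmap_target):
    # Channel-level relational consistency loss (Eq. 13): MSE between
    # the row-normalized Gram matrices of the two feature maps.
    m1 = row_normalize(gram_matrix(fmap_src_like))
    m2 = row_normalize(gram_matrix(fmap_target))
    c = len(m1)
    return sum((a - b) ** 2
               for r1, r2 in zip(m1, m2)
               for a, b in zip(r1, r2)) / (c * c)
```

Identical feature maps give a loss of exactly zero, while differing channel correlations produce a positive penalty, which is what drives the generator to preserve the target image's channel relationships.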
2.3. Progressive joint training strategy
In the previous section, the target-domain data was used to generate "source-like samples" through the source-free knowledge transfer module. Since these samples combine the style of the source domain with the vehicle content of the target domain, training with them mitigates the performance degradation caused by deploying the source-domain model on target-domain data and improves the generalization performance of the model. However, during model adaptation, issues such as image style and image quality may cause "source-like samples" to be treated as noise by the model. Therefore, a progressive joint training strategy is introduced to control the feeding ratio of "source-like samples" to target-domain data, which effectively prevents a one-time injection of "source-like samples" from introducing too much noise and degrading model performance. Moreover, as the proportion of "source-like samples" increases, the model's ability to adapt to the fused samples also grows, so that more discriminative features are learned on the target domain. The maximum ratio of "source-like samples" to target-domain data is 1:1.
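The text fixes only the 1:1 maximum, not the schedule itself; one plausible realization is a linear ramp that gradually raises the share of "source-like samples" per batch (function and parameter names are hypothetical, not taken from the paper):

```python
def source_like_ratio(epoch, total_epochs, max_ratio=1.0):
    # Hypothetical linear schedule: the ratio of "source-like samples" to
    # target-domain samples grows from 0 toward max_ratio (1:1 at most),
    # so noisy generated samples are introduced gradually.
    return min(max_ratio, max_ratio * epoch / max(1, total_epochs - 1))

def mixed_batch_sizes(batch_size, epoch, total_epochs):
    # Split one batch between target-domain and "source-like" samples
    # according to the current ratio r: n_src_like / n_target = r.
    r = source_like_ratio(epoch, total_epochs)
    n_src_like = int(round(batch_size * r / (1.0 + r)))
    return batch_size - n_src_like, n_src_like
```

Under this sketch, early epochs train almost entirely on target-domain data, and by the final epoch each batch is split evenly, matching the stated 1:1 ceiling.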
During training, the "source-like samples" and target-domain data are passed through the well-performing pre-trained vehicle re-identification source model to output high-dimensional features. Most previous methods choose K-Means to generate clusters, which must be initialized with cluster centroids; however, the number of categories in the target domain is unknown. Therefore, DBSCAN [24] is chosen as the clustering method. Specifically, instead of a fixed clustering radius, this paper adopts a dynamic clustering radius computed via K-Nearest Neighbors (KNN). After DBSCAN, to filter noise, the most reliable samples are selected for soft-label assignment according to the distance between the sample features and the cluster centroids. In the proposed method, samples satisfying $\|v_i - c_i\|_2 < d$ are used for the next iteration, where $v_i$ is the feature of the $i$-th image, $c_i$ is the feature of the centroid of the cluster to which $v_i$ belongs, and $d$ represents the metric radius for belonging to the same category.
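The reliable-sample filter described above can be sketched as follows (a simplified illustration with hypothetical names; `labels` are DBSCAN cluster assignments, with -1 marking noise points, as in common DBSCAN implementations):

```python
import math

def select_reliable(features, labels, centroids, radius):
    # Keep only samples whose feature lies within `radius` of its cluster
    # centroid (the ||v_i - c_i||_2 < d criterion); DBSCAN noise points
    # (label == -1) are discarded outright. Returns the kept indices.
    keep = []
    for i, (feat, lab) in enumerate(zip(features, labels)):
        if lab == -1:
            continue  # noise point: no cluster, no pseudo-label
        if math.dist(feat, centroids[lab]) < radius:
            keep.append(i)
    return keep
```

Only the surviving indices would receive soft pseudo-labels and enter the next training iteration, which keeps far-from-centroid outliers from polluting the pseudo-label set.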
Author Contributions
Conceptualization, Z.S. and D.L.; methodology, Z.S., D.L. and Z.C.; software, D.L. and Z.C.; validation, Z.S., D.L. and Z.C.; formal analysis, Z.S.; investigation, Z.S.; resources, Z.S.; data curation, D.L. and Z.C.; writing—original draft preparation, D.L. and Z.C.; writing—review and editing, Z.S.; visualization, W.Y.; supervision, Z.S. and W.Y.; project administration, Z.S. and W.Y.; funding acquisition, W.Y. All authors have read and agreed to the published version of the manuscript.