
Kernel Geometric Mean Metric Learning


A peer-reviewed article of this preprint also exists.

Submitted: 29 July 2023
Posted: 01 August 2023

Abstract
This paper proposes a kernel geometric mean metric learning (KGMML) algorithm. The basic idea is to obtain the closed-form solution of the geometric mean metric learning (GMML) algorithm in the high-dimensional feature space induced by a kernel function. The solution is then expressed in terms of kernel matrices by using the integral representation of the weighted geometric mean and the Woodbury matrix identity in this feature space. Experimental results on 15 datasets show that the proposed algorithm can effectively improve the accuracy of the GMML algorithm and other metric learning algorithms.
Subject: Computer Science and Mathematics - Artificial Intelligence and Machine Learning

1. Introduction

Analyzing the modeling process of machine learning algorithms, it is clear that constructing a learning algorithm requires a similarity measure between pairs of samples. The distance measure is one of the most commonly used ways to describe the similarity between samples, and various distance metrics have been proposed, such as the Euclidean distance and the Mahalanobis distance. However, these distance expressions are fixed, i.e., their parameters are not adjustable, which makes their effectiveness vary from problem to problem. It is therefore preferable to learn an effective distance metric from the training samples. From the definition of a distance metric, any binary function $d(x_i, x_j)$ defined on the feature space is a distance function, provided that the four conditions of symmetry, self-similarity, non-negativity, and the triangle inequality are satisfied simultaneously. Thus any binary function of the form
$$d_M(x_i, x_j) = (x_i - x_j)^{T} M (x_i - x_j),$$
is a distance function determined by a symmetric positive definite (SPD) matrix $M$, where $x_i, x_j$ are two samples from the training set $X$, and $M$ is usually called a metric matrix. The purpose of metric learning is to use training samples to learn a metric matrix $M$ such that the resulting distance function $d_M(x_i, x_j)$ improves the performance of the learning algorithm or satisfies some application requirement. Metric learning therefore has wide applications in many fields, such as pattern recognition [1,2], data mining [3,4,5], information security [6,7], bioinformatics [8,9], and medical diagnosis [10,11,12].
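As a minimal numerical illustration of Eq.(1), the following NumPy sketch evaluates the metric for an arbitrary SPD matrix (the matrix and vectors below are illustrative values, not from the paper):

```python
import numpy as np

def mahalanobis(xi, xj, M):
    """Distance of Eq.(1): d_M(xi, xj) = (xi - xj)^T M (xi - xj)."""
    diff = xi - xj
    return float(diff @ M @ diff)

# Arbitrary SPD metric matrix, for illustration only
rng = np.random.default_rng(0)
Z = rng.standard_normal((3, 3))
M = Z @ Z.T + 3 * np.eye(3)                      # SPD by construction
xi, xj = np.array([1.0, 0.0, 2.0]), np.array([0.5, 1.0, 1.5])
print(mahalanobis(xi, xj, M))                    # with M = np.eye(3) this reduces to the squared Euclidean distance
```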
Because of this wide application space, metric learning has received a great deal of attention and many excellent algorithms have been proposed. Xing et al. [13] first proposed a metric learning algorithm whose main idea is to learn a metric matrix so that the distance between similar pairs of samples is small and the distance between dissimilar pairs is large. The algorithm makes the distribution of similar pairs more compact in the new metric space, while the distribution of dissimilar pairs becomes more dispersed. This algorithm marked the real beginning of metric learning, and many subsequent works were inspired by it. Davis et al. [14] proposed an information-theoretic metric learning algorithm. Its basic idea is to assume the existence of a prior metric matrix $M_0$ and, while ensuring that the distance between similar pairs is below one threshold and the distance between dissimilar pairs is above another, to minimize the relative entropy between the multivariate Gaussian distributions corresponding to $M$ and $M_0$. Wang and Jin [15] proposed an information geometry metric learning algorithm. Its basic idea is to use the class labels of the training samples to construct an ideal kernel matrix reflecting the desired distances between samples, to construct an actual kernel matrix describing the realized distance relations from the sample feature vectors and the metric matrix $M$, and then to solve for $M$ by minimizing the discrepancy between these two kernel matrices. Weinberger and Saul [16] proposed a metric learning algorithm based on a maximum margin. The basic idea is to define a margin between each sample's similar neighbours and samples of other classes, and then to solve for the metric matrix by minimizing the distance between similar pairs while maximizing the defined margin.
The geometric mean metric learning (GMML) algorithm was proposed by Zadeh et al. [17] in 2016. Most metric learning algorithms minimize the distance between similar pairs of samples while maximizing the distance between dissimilar pairs, so the term corresponding to dissimilar pairs in the objective function usually carries a negative sign. The strategy of the geometric mean metric learning model is instead to measure the distance between dissimilar pairs with the inverse of the metric matrix $M$. The advantage of this approach is that the term corresponding to dissimilar pairs in the objective function becomes positive, which reduces the difficulty of solving the model. Its objective function is as follows:
$$\min_{M \succ 0} \; \sum_{(x_i, x_j) \in D^{+}} d_M(x_i, x_j) + \sum_{(x_i, x_j) \in D^{-}} d_{M^{-1}}(x_i, x_j),$$
where the similar pair set $D^{+}$ and the dissimilar pair set $D^{-}$ can be expressed as
$$D^{+} = \bigl\{(x_i, x_j) \mid x_i, x_j \text{ are in the same class}\bigr\}, \qquad D^{-} = \bigl\{(x_i, x_j) \mid x_i, x_j \text{ are in different classes}\bigr\}.$$
Although the GMML algorithm has several advantages, such as an unconstrained convex objective function, a closed-form and interpretable solution, and fast computation [18], it is a linear learning method and does not handle nonlinear problems well. Kernel methods are a key technology for addressing nonlinear problems, and a number of kernelized algorithms [19,20,21,22,23] have been proposed. The principle of the kernel method is that the original data are transformed from the input space into a higher-dimensional feature space by a mapping function, so that a linear model in the feature space corresponds to a nonlinear solution in the original data space. In this paper, a Gaussian kernel function is used to map the data into the feature space. The closed-form solution of the GMML algorithm is then expressed in terms of kernel matrices by using the integral representation of the weighted geometric mean and the Woodbury matrix identity, which yields the KGMML algorithm. Nonlinear problems can thus be handled effectively while the advantages of the GMML algorithm are retained. We define the objective function in the following form:
$$\min_{M_\Phi \succ 0} \; \sum_{(\Phi(x_i), \Phi(x_j)) \in D_\Phi^{+}} d_{M_\Phi}\bigl(\Phi(x_i), \Phi(x_j)\bigr) + \sum_{(\Phi(x_i), \Phi(x_j)) \in D_\Phi^{-}} d_{M_\Phi^{-1}}\bigl(\Phi(x_i), \Phi(x_j)\bigr),$$
where $\Phi : \mathbb{R}^{m} \to \mathcal{H}_\kappa$ is the feature mapping into the Hilbert space $\mathcal{H}_\kappa$, i.e., $x \in \mathbb{R}^{m} \mapsto \Phi(x) \in \mathcal{H}_\kappa$, $M_\Phi$ is the metric matrix in $\mathcal{H}_\kappa$, and the set of similar pairs $D_\Phi^{+}$ and the set of dissimilar pairs $D_\Phi^{-}$ are
$$D_\Phi^{+} = \bigl\{(\Phi(x_i), \Phi(x_j)) \mid x_i, x_j \text{ are in the same class}\bigr\}, \qquad D_\Phi^{-} = \bigl\{(\Phi(x_i), \Phi(x_j)) \mid x_i, x_j \text{ are in different classes}\bigr\}.$$
The main contribution of this paper is a kernel geometric mean metric learning algorithm that learns a nonlinear distance by means of a kernel function; the key idea is to construct kernel matrices based on the distance metric for the given training data. In addition, the accuracy of the proposed algorithm is shown to be superior to that of the GMML algorithm and other metric learning algorithms. The structure of this paper is organized as follows. In Sect. 2, lemmas on the weighted geometric mean and the Woodbury identity are stated. In Sect. 3, the optimization problem and its solution are discussed, followed by the extension to the weighted geometric mean. The setup of the experiments and the analysis of parameter sensitivity are presented in Sect. 4. Finally, our results are summarized in Sect. 5.

2. Preliminaries

In this section, three lemmas are given that will be used to simplify the objective function (4).
Lemma 1.
[24] For any $t \in (0,1)$ and $n \times n$ Hermitian positive definite matrices $A$ and $B$, the following equation holds
$$A \#_t B = \frac{2\sin(\pi t)}{\pi}\, A \int_{-1}^{1} (1-s)^{-t}(1+s)^{t-1} \bigl[(1-s)I + (1+s)B^{-1}A\bigr]^{-1}\, ds,$$
which is the integral representation of the weighted geometric mean $A \#_t B$, where $I$ is the identity matrix.
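Lemma 1 (and Lemma 3 below) can be checked numerically. The following sketch approximates the integral with Gauss-Jacobi quadrature, whose weight matches the factor $(1-s)^{-t}(1+s)^{t-1}$, and compares it with the closed form $A(B^{-1}A)^{-t}$ on random SPD matrices; this is a verification sketch only, not part of the proposed algorithm:

```python
import numpy as np
from scipy.special import roots_jacobi
from scipy.linalg import fractional_matrix_power, inv

rng = np.random.default_rng(0)
n, t = 4, 0.3
Z1 = rng.standard_normal((n, n)); A = Z1 @ Z1.T + n * np.eye(n)   # random SPD matrices
Z2 = rng.standard_normal((n, n)); B = Z2 @ Z2.T + n * np.eye(n)

# Closed form of the weighted geometric mean (Lemma 3): A #_t B = A (B^{-1} A)^{-t}
closed = np.real(A @ fractional_matrix_power(inv(B) @ A, -t))

# Lemma 1: Gauss-Jacobi nodes/weights absorb the singular factor (1-s)^(-t) (1+s)^(t-1)
nodes, weights = roots_jacobi(30, -t, t - 1)
integral = sum(w * inv((1 - s) * np.eye(n) + (1 + s) * inv(B) @ A)
               for s, w in zip(nodes, weights))
print(np.allclose(closed, 2 * np.sin(np.pi * t) / np.pi * A @ integral))   # expected: True
```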
Lemma 2.
[25] If $A$ is an $n \times n$ invertible matrix corrected by the term $UCV$, where $U$ is an $n \times k$ matrix, $C$ is an invertible $k \times k$ matrix, and $V$ is a $k \times n$ matrix, then the Woodbury identity is
$$(A + UCV)^{-1} = A^{-1} - A^{-1}U\bigl(C^{-1} + VA^{-1}U\bigr)^{-1}VA^{-1}.$$
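A short numerical check of the identity (random matrices, sketch only):

```python
import numpy as np
from numpy.linalg import inv

rng = np.random.default_rng(1)
n, k = 5, 2
Z = rng.standard_normal((n, n)); A = Z @ Z.T + np.eye(n)     # invertible n x n matrix
W = rng.standard_normal((k, k)); C = W @ W.T + np.eye(k)     # invertible k x k matrix
U, V = rng.standard_normal((n, k)), rng.standard_normal((k, n))

lhs = inv(A + U @ C @ V)
rhs = inv(A) - inv(A) @ U @ inv(inv(C) + V @ inv(A) @ U) @ V @ inv(A)
print(np.allclose(lhs, rhs))                                 # expected: True
```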
Lemma 3.
[24] For any $t \in (0,1)$ and $n \times n$ Hermitian positive definite matrices $A$ and $B$, the following equation holds
$$A \#_t B = A\bigl(B^{-1}A\bigr)^{-t}.$$

3. Main results

3.1. Optimization problem and its solution

In the following, the objective function (4) is simplified. In view of Eq.(1), one has
$$\min_{M_\Phi \succ 0} \sum_{(\Phi(x_i),\Phi(x_j)) \in D_\Phi^{+}} \bigl(\Phi(x_i)-\Phi(x_j)\bigr)^{T} M_\Phi \bigl(\Phi(x_i)-\Phi(x_j)\bigr) + \sum_{(\Phi(x_i),\Phi(x_j)) \in D_\Phi^{-}} \bigl(\Phi(x_i)-\Phi(x_j)\bigr)^{T} M_\Phi^{-1} \bigl(\Phi(x_i)-\Phi(x_j)\bigr).$$
Rewriting the Mahalanobis distance in terms of traces, Eq.(4) can be turned into the following optimization problem
$$\begin{aligned}
\min_{M_\Phi \succ 0} &\sum_{(\Phi(x_i),\Phi(x_j)) \in D_\Phi^{+}} \operatorname{tr}\!\Bigl(M_\Phi \bigl(\Phi(x_i)-\Phi(x_j)\bigr)\bigl(\Phi(x_i)-\Phi(x_j)\bigr)^{T}\Bigr) + \sum_{(\Phi(x_i),\Phi(x_j)) \in D_\Phi^{-}} \operatorname{tr}\!\Bigl(M_\Phi^{-1} \bigl(\Phi(x_i)-\Phi(x_j)\bigr)\bigl(\Phi(x_i)-\Phi(x_j)\bigr)^{T}\Bigr) \\
&= \min_{M_\Phi \succ 0} \operatorname{tr}\bigl(M_\Phi S_\Phi\bigr) + \operatorname{tr}\bigl(M_\Phi^{-1} D_\Phi\bigr) = \min_{M_\Phi \succ 0} h(M_\Phi),
\end{aligned}$$
where $S_\Phi$ and $D_\Phi$ are assumed to be SPD matrices, which is realistic in many situations [17]; they can be expressed as
$$S_\Phi = \sum_{(\Phi(x_i),\Phi(x_j)) \in D_\Phi^{+}} \bigl(\Phi(x_i)-\Phi(x_j)\bigr)\bigl(\Phi(x_i)-\Phi(x_j)\bigr)^{T}, \qquad D_\Phi = \sum_{(\Phi(x_i),\Phi(x_j)) \in D_\Phi^{-}} \bigl(\Phi(x_i)-\Phi(x_j)\bigr)\bigl(\Phi(x_i)-\Phi(x_j)\bigr)^{T}.$$
Differentiating $h(M_\Phi)$ with respect to $M_\Phi$ yields
$$\nabla h(M_\Phi) = S_\Phi - M_\Phi^{-1} D_\Phi M_\Phi^{-1}.$$
Setting $\nabla h(M_\Phi) = 0$ implies that
$$M_\Phi S_\Phi M_\Phi = D_\Phi,$$
which is a Riccati equation. Since $S_\Phi$ and $D_\Phi$ are positive definite matrices, the objective function (4) has a unique positive definite minimizer, which is the midpoint of the geodesic joining $S_\Phi^{-1}$ and $D_\Phi$ [26], that is
$$M_\Phi = S_\Phi^{-1} \#_{1/2}\, D_\Phi = S_\Phi^{-1/2}\bigl(S_\Phi^{1/2} D_\Phi S_\Phi^{1/2}\bigr)^{1/2} S_\Phi^{-1/2},$$
where $S_\Phi^{-1} \#_{1/2}\, D_\Phi$ denotes the geometric mean of $S_\Phi^{-1}$ and $D_\Phi$.
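For reference, a compact sketch of the corresponding linear GMML closed form in the input space is given below (labelled training data assumed; the small ridge term and all names are illustrative choices, not specifications from the paper):

```python
import numpy as np
from itertools import combinations
from scipy.linalg import sqrtm, inv

def gmml_metric(X, y, reg=1e-6):
    """Closed-form GMML metric M = S^{-1} #_{1/2} D = S^{-1/2} (S^{1/2} D S^{1/2})^{1/2} S^{-1/2}."""
    m = X.shape[1]
    S = reg * np.eye(m)          # similar-pair scatter (ridge keeps it safely SPD)
    D = reg * np.eye(m)          # dissimilar-pair scatter
    for i, j in combinations(range(len(X)), 2):
        d = (X[i] - X[j])[:, None]
        if y[i] == y[j]:
            S += d @ d.T
        else:
            D += d @ d.T
    S_half = np.real(sqrtm(S))
    S_half_inv = inv(S_half)
    return S_half_inv @ np.real(sqrtm(S_half @ D @ S_half)) @ S_half_inv
```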

3.2. Extension to weighted geometric mean

In order to incorporate a weighted geometric mean, it is necessary to consider how weights enter the objective function. If linear weights are assigned to $S_\Phi^{-1}$ and $D_\Phi$, the metric matrix $M_\Phi$ is merely rescaled by a constant factor, so it makes little sense to attach linear weights to the two components in Eq.(2). By employing nonlinear weights derived from the Riemannian geometry of the SPD manifold, however, the weights become a genuine trade-off between the two terms. Thus, the Riemannian distance $\delta_R$ is introduced, and minimizing $h(M_\Phi)$ is equivalent to solving the following optimization problem
$$\min_{M_\Phi \succ 0} \; \delta_R^{2}\bigl(M_\Phi, S_\Phi^{-1}\bigr) + \delta_R^{2}\bigl(M_\Phi, D_\Phi\bigr),$$
where the Riemannian distance between SPD matrices $X$ and $Y$ is $\delta_R(X, Y) = \bigl\|\log\bigl(Y^{-1/2} X Y^{-1/2}\bigr)\bigr\|_F$, and $\|\cdot\|_F$ is the Frobenius norm of a matrix. A parameter $t \in [0, 1]$ is introduced to trade off the two terms:
$$\min_{M_\Phi \succ 0} \; (1-t)\,\delta_R^{2}\bigl(M_\Phi, S_\Phi^{-1}\bigr) + t\,\delta_R^{2}\bigl(M_\Phi, D_\Phi\bigr) = \min_{M_\Phi \succ 0} h_t(M_\Phi).$$
Because $h_t(M_\Phi)$ is still geodesically convex (see [17]), Eq.(13) has a unique positive definite solution, that is
$$M_\Phi = S_\Phi^{-1} \#_t\, D_\Phi.$$
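Computationally, the weighted solution follows directly from Lemma 3. A small sketch, assuming $S$ and $D$ have been formed as in the GMML sketch above:

```python
import numpy as np
from scipy.linalg import fractional_matrix_power, inv

def weighted_geometric_mean(A, B, t):
    """A #_t B = A (B^{-1} A)^{-t} for SPD A, B (Lemma 3)."""
    return np.real(A @ fractional_matrix_power(inv(B) @ A, -t))

# Weighted GMML solution in the input space: M = S^{-1} #_t D
# M = weighted_geometric_mean(inv(S), D, t)
```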
For convenience in subsequent calculations, let $P_\Phi$ and $Q_\Phi$ denote the matrices whose columns are the difference vectors of the similar and dissimilar pairs, respectively,
$$P_\Phi = \bigl[\Phi(x_i) - \Phi(x_j)\bigr]_{(\Phi(x_i),\Phi(x_j)) \in D_\Phi^{+}}, \qquad Q_\Phi = \bigl[\Phi(x_i) - \Phi(x_j)\bigr]_{(\Phi(x_i),\Phi(x_j)) \in D_\Phi^{-}},$$
so that $S_\Phi = P_\Phi P_\Phi^{T}$ and $D_\Phi = Q_\Phi Q_\Phi^{T}$.
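In practice, $K_{PQ}$ never requires the explicit mapping $\Phi$. Writing the $a$-th column of $P_\Phi$ as $\Phi(x_{i_a})-\Phi(x_{j_a})$ and the $b$-th column of $Q_\Phi$ as $\Phi(x_{k_b})-\Phi(x_{l_b})$ (the pair indices $i_a, j_a, k_b, l_b$ are notation introduced here only), each entry expands into four evaluations of the kernel $\kappa$:
$$\bigl[K_{PQ}\bigr]_{ab} = \kappa(x_{i_a}, x_{k_b}) - \kappa(x_{i_a}, x_{l_b}) - \kappa(x_{j_a}, x_{k_b}) + \kappa(x_{j_a}, x_{l_b}).$$
The same expansion applies to the vectors $K_{iQ}$ and $K_{jQ}$ introduced before Theorem 2.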
Theorem 1.
The solution $M_\Phi = S_\Phi^{-1} \#_t\, D_\Phi$ of the objective function (4) can be rewritten as
$$M_\Phi = Q_\Phi \bigl(K_{PQ}^{T} K_{PQ}\bigr)^{t-1} Q_\Phi^{T},$$
where the kernel matrix $K_{PQ} = \langle P_\Phi, Q_\Phi \rangle = P_\Phi^{T} Q_\Phi$, and $\langle \cdot, \cdot \rangle$ denotes the inner product in $\mathcal{H}_\kappa$.
Proof. 
According to Lemma 1
$$\begin{aligned}
M_\Phi = S_\Phi^{-1} \#_t\, D_\Phi &= \frac{2\sin(\pi t)}{\pi}\, S_\Phi^{-1} \int_{-1}^{1} (1-s)^{-t}(1+s)^{t-1} \bigl[(1-s)I + (1+s)D_\Phi^{-1}S_\Phi^{-1}\bigr]^{-1} ds \\
&= \frac{2\sin(\pi t)}{\pi} \int_{-1}^{1} (1-s)^{-t}(1+s)^{t-1} \bigl[(1-s)S_\Phi + (1+s)D_\Phi^{-1}\bigr]^{-1} ds.
\end{aligned}$$
Substituting $S_\Phi = P_\Phi P_\Phi^{T}$ and $D_\Phi = Q_\Phi Q_\Phi^{T}$ into Eq.(15), it is clear that
$$M_\Phi = \frac{2\sin(\pi t)}{\pi} \int_{-1}^{1} (1-s)^{-t}(1+s)^{t-1} \bigl[(1-s)P_\Phi P_\Phi^{T} + (1+s)\bigl(Q_\Phi Q_\Phi^{T}\bigr)^{-1}\bigr]^{-1} ds.$$
For convenience in the following discussion, we introduce the notation
$$G := \bigl[(1-s)P_\Phi P_\Phi^{T} + (1+s)\bigl(Q_\Phi Q_\Phi^{T}\bigr)^{-1}\bigr]^{-1}.$$
From Lemma 2, it follows that
$$G = (A + UCV)^{-1} = A^{-1} - A^{-1}U\bigl(C^{-1} + VA^{-1}U\bigr)^{-1}VA^{-1},$$
where $A := (1+s)\bigl(Q_\Phi Q_\Phi^{T}\bigr)^{-1}$, $U := P_\Phi$, $C := (1-s)I$, and $V := P_\Phi^{T}$. Substituting this into Eq.(16), one has
$$\begin{aligned}
M_\Phi &= \frac{2\sin(\pi t)}{\pi} \int_{-1}^{1} (1-s)^{-t}(1+s)^{t-1} \Biggl[\frac{Q_\Phi Q_\Phi^{T}}{1+s} - \frac{Q_\Phi Q_\Phi^{T}}{1+s}\,P_\Phi\Biggl(\frac{I}{1-s} + P_\Phi^{T}\,\frac{Q_\Phi Q_\Phi^{T}}{1+s}\,P_\Phi\Biggr)^{-1} P_\Phi^{T}\,\frac{Q_\Phi Q_\Phi^{T}}{1+s}\Biggr] ds \\
&= \frac{2\sin(\pi t)}{\pi} \int_{-1}^{1} (1-s)^{-t}(1+s)^{t-1} \Biggl[\frac{Q_\Phi Q_\Phi^{T}}{1+s} - \frac{Q_\Phi}{1+s}\,K_{PQ}^{T}\Biggl(\frac{I}{1-s} + \frac{K_{PQ}K_{PQ}^{T}}{1+s}\Biggr)^{-1} K_{PQ}\,\frac{Q_\Phi^{T}}{1+s}\Biggr] ds \\
&= \frac{2\sin(\pi t)}{\pi} \int_{-1}^{1} (1-s)^{-t}(1+s)^{t-1}\, Q_\Phi \Biggl[\frac{I}{1+s} - \frac{K_{PQ}^{T}}{1+s}\Biggl(\frac{I}{1-s} + \frac{K_{PQ}K_{PQ}^{T}}{1+s}\Biggr)^{-1} \frac{K_{PQ}}{1+s}\Biggr] Q_\Phi^{T}\, ds,
\end{aligned}$$
where $K_{PQ} = P_\Phi^{T} Q_\Phi$, and we set
$$G' := \frac{I}{1+s} - \frac{K_{PQ}^{T}}{1+s}\Biggl(\frac{I}{1-s} + \frac{K_{PQ}K_{PQ}^{T}}{1+s}\Biggr)^{-1} \frac{K_{PQ}}{1+s}.$$
In view of Lemma 2
$$G' = A^{-1} - A^{-1}U\bigl(C^{-1} + VA^{-1}U\bigr)^{-1}VA^{-1} = (A + UCV)^{-1},$$
where $A := (1+s)I$, $U := K_{PQ}^{T}$, $C := (1-s)I$, and $V := K_{PQ}$. Thus,
$$\begin{aligned}
M_\Phi &= \frac{2\sin(\pi t)}{\pi} \int_{-1}^{1} (1-s)^{-t}(1+s)^{t-1}\, Q_\Phi \bigl[(1+s)I + (1-s)K_{PQ}^{T}K_{PQ}\bigr]^{-1} Q_\Phi^{T}\, ds \\
&= Q_\Phi \Biggl\{\frac{2\sin(\pi t)}{\pi} \int_{-1}^{1} (1-s)^{-t}(1+s)^{t-1} \bigl[(1+s)\bigl(K_{PQ}^{T}K_{PQ}\bigr)^{-1}K_{PQ}^{T}K_{PQ} + (1-s)K_{PQ}^{T}K_{PQ}\bigr]^{-1} ds\Biggr\} Q_\Phi^{T} \\
&= Q_\Phi \bigl(K_{PQ}^{T}K_{PQ}\bigr)^{-1} \Biggl\{\frac{2\sin(\pi t)}{\pi} \int_{-1}^{1} (1-s)^{-t}(1+s)^{t-1} \bigl[(1-s)I + (1+s)\bigl(K_{PQ}^{T}K_{PQ}\bigr)^{-1}\bigr]^{-1} ds\Biggr\} Q_\Phi^{T} \\
&= Q_\Phi \bigl(K_{PQ}^{T}K_{PQ}\bigr)^{-1} \bigl[I \,\#_t\, \bigl(K_{PQ}^{T}K_{PQ}\bigr)\bigr]\, Q_\Phi^{T}.
\end{aligned}$$
From Lemma 3, it follows that
$$M_\Phi = Q_\Phi \bigl(K_{PQ}^{T}K_{PQ}\bigr)^{-1} \bigl[\bigl(K_{PQ}^{T}K_{PQ}\bigr)^{-1}\bigr]^{-t} Q_\Phi^{T} = Q_\Phi \bigl(K_{PQ}^{T}K_{PQ}\bigr)^{t-1} Q_\Phi^{T}.$$
   □
The definition of the kernel matrix yields
$$K_{iQ} = \Phi(x_i)^{T} Q_\Phi, \qquad K_{jQ} = \Phi(x_j)^{T} Q_\Phi.$$
Theorem 2.
The distance $d_{M_\Phi}(x_i, x_j) = \bigl(\Phi(x_i) - \Phi(x_j)\bigr)^{T} M_\Phi \bigl(\Phi(x_i) - \Phi(x_j)\bigr)$ can be rewritten as
$$d_{M_\Phi}(x_i, x_j) = \bigl(K_{iQ} - K_{jQ}\bigr)\bigl(K_{PQ}^{T}K_{PQ}\bigr)^{t-1}\bigl(K_{iQ}^{T} - K_{jQ}^{T}\bigr).$$
Proof. 
According to Theorem 1
$$d_{M_\Phi}(x_i, x_j) = \bigl(\Phi(x_i) - \Phi(x_j)\bigr)^{T} Q_\Phi \bigl(K_{PQ}^{T}K_{PQ}\bigr)^{t-1} Q_\Phi^{T} \bigl(\Phi(x_i) - \Phi(x_j)\bigr) = \bigl(K_{iQ} - K_{jQ}\bigr)\bigl(K_{PQ}^{T}K_{PQ}\bigr)^{t-1}\bigl(K_{iQ}^{T} - K_{jQ}^{T}\bigr).$$
   □
Eq.(23) is the final result of the kernel geometric mean metric learning algorithm, which is summarized in Algorithm 1; a code sketch follows the listing.
Algorithm 1 Kernel Geometric Mean Metric Learning Algorithm
Input: Training set $S = \{(x_i, x_j) \mid x_i, x_j \in X\}$.
Parameters: $t \in [0, 1]$, the weight coefficient in Eq.(14);
$p$: the kernel parameter of the Gaussian kernel function.
Output: $d_{M_\Phi}$, the distance learned by KGMML.
Step 1. According to Eq.(5), construct $D_\Phi^{+}$ and $D_\Phi^{-}$.
Step 2. Compute the kernel matrices $K_{PQ}$, $K_{iQ}$, and $K_{jQ}$ according to Theorem 1 and Theorem 2.
Step 3. Compute $d_{M_\Phi}(x_i, x_j) = \bigl(K_{iQ} - K_{jQ}\bigr)\bigl(K_{PQ}^{T}K_{PQ}\bigr)^{t-1}\bigl(K_{iQ}^{T} - K_{jQ}^{T}\bigr)$.
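A minimal end-to-end sketch of Algorithm 1 in Python follows; the Gaussian kernel parameterization, the small ridge added before the fractional matrix power, and all function names are illustrative assumptions, not specifications from the paper:

```python
import numpy as np
from itertools import combinations
from scipy.linalg import fractional_matrix_power

def gaussian_kernel(X, Y, p=10.0):
    """Assumed Gaussian kernel with parameter p."""
    sq = np.sum(X**2, axis=1)[:, None] + np.sum(Y**2, axis=1)[None, :] - 2.0 * X @ Y.T
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * p**2))

def pair_difference_kernel(K, row_pairs, col_pairs):
    """(Phi(x_i)-Phi(x_j))^T (Phi(x_k)-Phi(x_l)) expanded into four kernel terms."""
    i = [a for a, _ in row_pairs]; j = [b for _, b in row_pairs]
    k = [c for c, _ in col_pairs]; l = [d for _, d in col_pairs]
    return K[np.ix_(i, k)] - K[np.ix_(i, l)] - K[np.ix_(j, k)] + K[np.ix_(j, l)]

def kgmml_fit(X, y, t=0.05, p=10.0, ridge=1e-8):
    """Return a distance function d(xi, xj) implementing Theorem 2 (sketch)."""
    pairs = list(combinations(range(len(X)), 2))
    pos = [(i, j) for i, j in pairs if y[i] == y[j]]          # D_Phi^+
    neg = [(i, j) for i, j in pairs if y[i] != y[j]]          # D_Phi^-
    K = gaussian_kernel(X, X, p)
    K_PQ = pair_difference_kernel(K, pos, neg)                # K_PQ = P_Phi^T Q_Phi
    core = np.real(fractional_matrix_power(
        K_PQ.T @ K_PQ + ridge * np.eye(len(neg)), t - 1.0))   # (K_PQ^T K_PQ)^(t-1)

    def distance(xi, xj):
        k_i = gaussian_kernel(xi[None, :], X, p).ravel()
        k_j = gaussian_kernel(xj[None, :], X, p).ravel()
        v = np.array([(k_i[a] - k_i[b]) - (k_j[a] - k_j[b]) for a, b in neg])  # K_iQ - K_jQ
        return float(v @ core @ v)
    return distance
```

The learned distance can then be plugged into, e.g., a nearest-neighbour classifier; note that the cost of this sketch grows with the number of dissimilar pairs, so pair subsampling may be needed on larger datasets (a practical remark, not part of the paper).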

4. Experiment

4.1. Experimental setup

To verify the effectiveness of Algorithm 1, experiments are conducted on 15 UCI [27] datasets, whose basic characteristics are shown in Table 1.
Table 1. Characteristics of experimental datasets.
No. Data sets # of features # of instances # of classes
1 Pima 8 768 2
2 Vehicle 18 846 4
3 German 24 1000 2
4 Segment 18 2310 7
5 Usps 256 9298 10
6 Mnist 784 4000 10
7 Glasses 9 214 6
8 DNA 180 3186 2
9 Heart-Disease 13 270 2
10 Lymphography 18 148 4
11 Liver-Disorders 6 345 2
12 Hayes-Roth 4 160 3
13 Ionosphere 34 351 2
14 Spambase 57 4601 2
15 Balance-Scale 4 625 3
Next, the proposed method is compared with several efficient, classical distance metric learning algorithms, which are briefly described in Table 2. For KGMML, the settings of t and p are given in detail in the next subsection.
Table 2. Brief description of the distance metric learning methods used in this paper.
Name Description
1 Euclidean The Euclidean distance metric [28].
2 DMLMJ Distance metric learning through maximization of the Jeffrey divergence [28].
3 LMNN Large margin nearest neighbor classification [16].
4 GB-LMNN Non-linear Transformations with Gradient Boosting [29].
5 GMML Geometric Mean Metric Learning [17].
6 Low-rank Low-rank geometric mean metric learning [30].
7 KGMML The kernelized version of GMML proposed in this paper.

4.2. Parameter Sensitivity Analysis

As can be seen from Algorithm 1, the values of t and p must be specified before use. To determine the impact of these two parameters on the KGMML algorithm, experiments are conducted on five of the 15 datasets. A two-step search with 5-fold cross-validation is used to choose the best t-value; its effect on accuracy can be seen in Figure 1 and Figure 2, and a schematic code sketch of the search is given after Figure 3. First, the optimal t is sought in the set $\{0.1, 0.3, 0.5, 0.7, 0.9\}$; the result is shown in Figure 1. Second, five t-values are tested on an interval with a step size of 0.02. As shown in Figure 2, the variation of accuracy over this interval is not significant, so the middle value t = 0.05 is chosen. When p = 10, the accuracy curve in Figure 3 has an inflection point at which the accuracy is higher, so the p-value is chosen as 10.
Figure 1. Vary t in the set {0.1, 0.3, 0.5, 0.7, 0.9}, with p fixed.
Figure 2. Vary t in the set {0.01, 0.03, 0.05, 0.07, 0.09}, with p fixed.
Figure 3. Vary p in the set {0.01, 1, 10, 100, 1000}, with t fixed.
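A schematic of the two-step parameter search described above, assuming 5-fold cross-validation wrapped around a 1-nearest-neighbour classifier and the kgmml_fit sketch given after Algorithm 1 (the paper does not state which classifier is used, so that choice is an assumption):

```python
import numpy as np

def cv_error(X, y, t, p, n_folds=5, seed=0):
    """Mean 5-fold error of a 1-NN classifier using the KGMML distance (assumed protocol)."""
    folds = np.array_split(np.random.default_rng(seed).permutation(len(X)), n_folds)
    errs = []
    for f in range(n_folds):
        te = folds[f]
        tr = np.concatenate([folds[g] for g in range(n_folds) if g != f])
        dist = kgmml_fit(X[tr], y[tr], t=t, p=p)          # sketch from Section 3
        pred = [y[tr][np.argmin([dist(X[i], X[j]) for j in tr])] for i in te]
        errs.append(np.mean(np.array(pred) != y[te]))
    return float(np.mean(errs))

# Step 1: coarse grid; Step 2: refine around the best value with step 0.02 (given data X, y)
# t_coarse = min([0.1, 0.3, 0.5, 0.7, 0.9], key=lambda t: cv_error(X, y, t, p=10.0))
# t_fine   = min(np.round(np.arange(t_coarse - 0.04, t_coarse + 0.05, 0.02), 2),
#                key=lambda t: cv_error(X, y, t, p=10.0))
```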

4.3. Experimental Results

The error rate of each compared algorithm on the 15 datasets is shown in Table 3, where the best result on each dataset is shown in boldface. Compared with the other six algorithms, KGMML achieves the best result on 8 of the datasets. All methods are implemented in MATLAB R2018b (64-bit), and the simulations are run on a laptop with an Intel Core i5 (2.5 GHz) processor.
Table 3. Error rate results on the UCI datasets; the best result for each dataset is shown in bold.
No. Data sets GMML DMLMJ LMNN GB-LMNN Euclidean Low-rank KGMML
1 Pima 27.66 30.18 33.82 37.14 27.27 29.58 25.17
2 Vehicle 22.09 25.75 46.53 41.24 33.53 41.32 21.21
3 German 27.41 24.79 30.50 29.32 31.53 26.96 24.40
4 Segment 4.13 3.64 5.19 4.55 6.93 5.59 3.22
5 Usps 3.72 2.88 33.11 10.60 10.55 4.08 2.82
6 Mnist 9.65 16.44 86.80 82.62 17.12 75.34 8.32
7 Glasses 36.96 33.08 30.20 23.30 30.23 33.64 32.47
8 DNA 23.65 21.75 22.32 22.84 27.63 26.24 23.05
9 Heart-Disease 20.82 19.73 31.52 18.51 33.33 22.52 18.97
10 Lymphography 56.88 73.62 70.76 60.21 75.13 57.57 53.75
11 Liver-Disorders 35.00 30.05 30.46 34.88 31.88 41.86 30.17
12 Hayes-Roth 37.69 16.84 31.33 31.35 16.67 39.55 19.52
13 Ionosphere 15.34 11.27 5.71 4.29 1.43 17.26 11.54
14 Spambase 19.33 18.27 38.60 15.80 16.09 11.43 11.20
15 Balance-Scale 12.84 8.62 12.80 15.20 14.40 12.31 9.19
In order to show the performance advantages of the KGMML algorithm, a score statistic is computed comparing KGMML with the other classical algorithms; a sketch of this scoring procedure is given after Table 4. The scoring process is as follows: let A be the result obtained with the KGMML algorithm on a given dataset, and let B be the result obtained with a competing algorithm on the same dataset. A and B are compared using a t-test at the 5% significance level. If A is significantly better than B, the result of KGMML on this dataset is considered to win, and the score on this dataset is recorded as "1/0/0". If A is significantly worse than B, the score is recorded as "0/0/1". If the difference between A and B is not statistically significant, they are considered tied and the score is recorded as "0/1/0". For example, a total of "5/0/0" against an algorithm would mean that KGMML outperforms that algorithm on five datasets. The per-dataset and overall score statistics of the KGMML algorithm are listed in Table 4.
Table 4. The score statistics of the comparison between the results obtained by the KGMML algorithm and other classical algorithms.
Datasets KGMML vs.
GMML DMLMJ LMNN GB-LMNN Euclidean Low-rank
Pima 1/0/0 1/0/0 1/0/0 1/0/0 1/0/0 1/0/0
Vehicle 0/1/0 1/0/0 1/0/0 1/0/0 1/0/0 1/0/0
German 1/0/0 0/1/0 1/0/0 1/0/0 1/0/0 1/0/0
Segment 1/0/0 0/1/0 1/0/0 1/0/0 1/0/0 1/0/0
Usps 1/0/0 0/1/0 1/0/0 1/0/0 1/0/0 1/0/0
Mnist 0/1/0 1/0/0 1/0/0 1/0/0 1/0/0 1/0/0
Glasses 1/0/0 0/1/0 0/0/1 0/0/1 0/1/0 1/0/0
DNA 0/1/0 0/0/1 0/0/1 0/1/0 1/0/0 1/0/0
Heart-Disease 1/0/0 0/1/0 1/0/0 0/1/0 1/0/0 1/0/0
Lymphography 1/0/0 1/0/0 1/0/0 1/0/0 1/0/0 1/0/0
Liver-Disorders 1/0/0 0/1/0 0/1/0 1/0/0 0/1/0 1/0/0
Hayes-Roth 1/0/0 0/0/1 1/0/0 1/0/0 0/0/1 1/0/0
Ionosphere 1/0/0 0/1/0 0/0/1 0/0/1 0/0/1 1/0/0
Spambase 1/0/0 1/0/0 1/0/0 1/0/0 1/0/0 0/1/0
Balance 1/0/0 0/1/0 1/0/0 1/0/0 1/0/0 1/0/0
Total 12/3/0 5/8/2 11/1/3 9/2/2 11/2/2 14/1/0
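A sketch of the win/tie/loss bookkeeping described above, assuming that per-run error rates are available for both methods and that a paired t-test at the 5% level is used (whether the test is paired is not stated in the paper):

```python
import numpy as np
from scipy.stats import ttest_rel

def score(errors_kgmml, errors_other, alpha=0.05):
    """Return '1/0/0' (win), '0/1/0' (tie) or '0/0/1' (loss) for KGMML; lower error is better."""
    _, pval = ttest_rel(errors_kgmml, errors_other)
    if pval >= alpha:                                   # difference not significant -> tie
        return "0/1/0"
    return "1/0/0" if np.mean(errors_kgmml) < np.mean(errors_other) else "0/0/1"
```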

5. Conclusions

Kernel geometric mean metric learning is proposed for learning a nonlinear distance metric through the introduction of a kernel function. Traditional metric learning approaches aim to learn a global linear metric, which is not well suited to nonlinear problems. The experimental results on the UCI datasets show that the algorithm can effectively improve the accuracy of the GMML algorithm and that nonlinear problems can be addressed by the proposed algorithm. In future work, we will try to alleviate the problem of inaccurate similarity pairs that exists in the kernel geometric mean metric learning algorithm. Partial-label metric learning algorithms have been proposed in recent years, and a partial-label algorithm based on kernel geometric mean metric learning will also be developed in the future.

Author Contributions

Conceptualization, J.H. and R.Z.; methodology, J.H.; software, Z.F. and T.Y.; validation, Z.F., T.Y. and Y.Z.; formal analysis, Z.F.; investigation, Y.Z.; resources, J.H.; data curation, R.Z.; writing—original draft preparation, Z.F.; writing—review and editing, Z.F.; visualization, Z.F.; supervision, R.Z.; project administration, J.H.; funding acquisition, Y.Z. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Natural Science Foundation of China (62102062), the Humanities and Social Science Research Project of the Ministry of Education (21YJCZH037), and the Natural Science Foundation of Liaoning Province (2020-MS-134, 2020-MZLH-29).

Data Availability Statement

The data used to support the findings of this study are available from the corresponding author upon request.

Acknowledgments

The authors would like to thank the School of Information and Communication Engineering, Dalian Minzu University for assistance with simulation verifications related to this work.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Lu J, Wang R, Mian A, et al. Distance metric learning for pattern recognition[J]. Pattern recognition, 2018, 75: 1-3. [CrossRef]
  2. Wei Z, Cui Y, Zhou X, et al. A research on metric learning in computer vision and pattern recognition[C]//2018 Tenth International Conference on Advanced Computational Intelligence (ICACI). IEEE, 2018: 254-259. [CrossRef]
  3. Yan Y, Xia J, Sun D, et al. Research on combination evaluation of operational stability of energy industry innovation ecosystem based on machine learning and data mining algorithms[J]. Energy Reports, 2022, 8: 4641-4648. [CrossRef]
  4. Wang F, Sun J. Survey on distance metric learning and dimensionality reduction in data mining[J]. Data mining and knowledge discovery, 2015, 29(2): 534-564. [CrossRef]
  5. Yan M, Zhang Y, Wang H. Tree-Based Metric Learning for Distance Computation in Data Mining[C]//Asia-Pacific Web Conference. Cham: Springer International Publishing, 2015: 377-388. [CrossRef]
  6. Mojisola F O, Misra S, Febisola C F, et al. An improved random bit-stuffing technique with a modified RSA algorithm for resisting attacks in information security (RBMRSA)[J]. Egyptian Informatics Journal, 2022, 23(2): 291-301. [CrossRef]
  7. Kraeva I, Yakhyaeva G. Application of the metric learning for security incident playbook recommendation[C]//2021 IEEE 22nd International Conference of Young Professionals in Electron Devices and Materials (EDM). IEEE, 2021: 475-479. [CrossRef]
  8. Bennett J, Pomaznoy M, Singhania A, et al. A metric for evaluating biological information in gene sets and its application to identify co-expressed gene clusters in PBMC[J]. PLoS Computational Biology, 2021, 17(10): e1009459. [CrossRef]
  9. Makrodimitris S, Reinders M J T, Van Ham R C H J. Metric learning on expression data for gene function prediction[J]. Bioinformatics, 2020, 36(4): 1182-1190. [CrossRef]
  10. Yuan T, Dong L, Liu B, et al. Deep Metric Learning by Exploring Confusing Triplet Embeddings for COVID-19 Medical Images Diagnosis[C]//Workshop on Healthcare AI and COVID-19. PMLR, 2022: 1-10.
  11. Jin Y, Lu H, Li Z, et al. A cross-modal deep metric learning model for disease diagnosis based on chest x-ray images[J]. Multimedia Tools and Applications, 2023: 1-22. [CrossRef]
  12. Xing Y, Meyer B J, Harandi M, et al. Multimorbidity Content-Based Medical Image Retrieval and Disease Recognition Using Multi-label Proxy Metric Learning[J]. IEEE Access, 2023. [CrossRef]
  13. Xing E, Jordan M, Russell S J, et al. Distance metric learning with application to clustering with side-information[J]. Advances in neural information processing systems, 2002, 15.
  14. Davis J V, Kulis B, Jain P, et al. Information-theoretic metric learning[C]//Proceedings of the 24th international conference on Machine learning. 2007: 209-216. [CrossRef]
  15. Wang S, Jin R. An information geometry approach for distance metric learning[C]//Artificial intelligence and statistics. PMLR, 2009: 591-598.
  16. Weinberger K Q, Saul L K. Distance metric learning for large margin nearest neighbor classification[J]. Journal of machine learning research, 2009, 10(2).
  17. Zadeh P, Hosseini R, Sra S. Geometric mean metric learning[C]//International conference on machine learning. PMLR, 2016: 2464-2471.
  18. Zhou Y, Gu H. Geometric mean metric learning for partial label data[J]. Neurocomputing, 2018, 275: 394-402.
  19. Mika S, Ratsch G, Weston J, et al. Fisher discriminant analysis with kernels[C]//Neural networks for signal processing IX: Proceedings of the 1999 IEEE signal processing society workshop (cat. no. 98th8468). Ieee, 1999: 41-48. [CrossRef]
  20. Li Z, Kruger U, Xie L, et al. Adaptive KPCA modeling of nonlinear systems[J]. IEEE Transactions on Signal Processing, 2015, 63(9): 2364-2376. [CrossRef]
  21. Lee J M, Qin S J, Lee I B. Fault detection of non-linear processes using kernel independent component analysis[J]. The Canadian Journal of Chemical Engineering, 2007, 85(4): 526-536. [CrossRef]
  22. Zhang L, Zhou W D, Jiao L C. Kernel clustering algorithm[J]. CHINESE JOURNAL OF COMPUTERS-CHINESE EDITION-, 2002, 25(6): 587-590.
  23. Choi H, Choi S. Kernel isomap[J]. Electronics letters, 2004, 40(25): 1612-1613. [CrossRef]
  24. Fasi M, Iannazzo B. Computing the weighted geometric mean of two large-scale matrices and its inverse times a vector[J]. SIAM Journal on Matrix Analysis and Applications, 2018, 39(1): 178-203. [CrossRef]
  25. Hager W W. Updating the inverse of a matrix[J]. SIAM review, 1989, 31(2): 221-239. [CrossRef]
  26. Bhatia R. Positive definite matrices[M]. Princeton university press, 2009.
  27. Asuncion A, Newman D. UCI machine learning repository[J]. 2007.
  28. Nguyen B, Morell C, De Baets B. Supervised distance metric learning through maximization of the Jeffrey divergence[J]. Pattern Recognition, 2017, 64: 215-225.
  29. Kedem D, Tyree S, Sha F, et al. Non-linear metric learning[J]. Advances in neural information processing systems, 2012, 25.
  30. Bhutani M, Jawanpuria P, Kasai H, et al. Low-rank geometric mean metric learning[J]. arXiv preprint arXiv:1806.05454, 2018.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
Copyright: This open access article is published under a Creative Commons CC BY 4.0 license, which permits free download, distribution, and reuse, provided that the author and preprint are cited in any reuse.