Preprint
Article

Convergence Rate of Regularized Regression Associated with Zonal Translation Networks


A peer-reviewed article of this preprint also exists.

This version is not peer-reviewed

Submitted: 04 August 2024
Posted: 05 August 2024

Abstract
Neural network regularized learning has garnered significant attention in recent years. We give a systematic investigation of the performance of regularized regression associated with zonal translation networks. We propose the concept of a Marcinkiewicz-Zygmund Inequality Setting (MZIS) for scattered nodes collected from the unit sphere. We show that, under the MZIS, the corresponding convolutional zonal translation network has the reproducing property. Based on these facts, we propose a kernel regularized regression learning framework and provide an upper bound estimate for the learning rate with the kernel approach. We also prove the density of the zonal translation network using spherical Fourier analysis, and we express the approximation error in terms of a K-functional.
Keywords: 
Subject: Computer Science and Mathematics - Artificial Intelligence and Machine Learning

MSC:  41A25

1. Introduction

It is known that convolutional neural networks provide various models and algorithms for processing data in many fields, such as computer vision (see e.g. [1]), natural language processing (see e.g. [2]), and sequence analysis in bioinformatics (see e.g. [3]). Regularized neural network learning has thus become an attractive research topic (see e.g. [4,5,6,7,8,9]). In the present paper, we give a theoretical analysis of the convergence rate of regularized regression associated with zonal translation networks on the unit sphere.
Let $X$ be a compact subset of the $d$-dimensional Euclidean space $\mathbb{R}^d$ with the usual norm $\|x\|=\big(\sum_{k=1}^{d}x_k^2\big)^{1/2}$ for $x=(x_1,x_2,\dots,x_d)\in\mathbb{R}^d$, and let $Y$ be a nonempty closed subset of $[-M,M]$ for a given $M>0$. The aim of the regression learning problem is to learn, from a hypothesis function space, the target function that describes the relationship between the input $x\in X$ and the output $y\in Y$. In most cases the target function is accessed through a set of observations $\mathbf z=\{z_i\}_{i=1}^m=\{(x_i,y_i)\}_{i=1}^m\in Z^m$ drawn independently and identically distributed (i.i.d.) according to a joint probability distribution (measure) $\rho(x,y)=\rho_X(x)\rho(y\mid x)$ on $Z=X\times Y$, where $\rho(y\mid x)$ ($x\in X$) is the conditional probability of $y$ given $x$ and $\rho_X(x)$ is the marginal probability on $x$, i.e., for every integrable function $\varphi:X\times Y\to\mathbb{R}$ there holds
$$\int_{X\times Y}\varphi(x,y)\,d\rho=\int_X\int_Y\varphi(x,y)\,d\rho(y\mid x)\,d\rho_X.$$
For a given normed space $(B,\|\cdot\|_B)$ consisting of real functions on $X$, we define the regularized learning framework with $B$ as
$$f_{\mathbf z,\lambda}:=\arg\min_{f\in B}\Big\{\mathcal E_{\mathbf z}(f)+\frac{\lambda}{2}\|f\|_B^2\Big\},\qquad(1)$$
where $\lambda>0$ is the regularization parameter and $\mathcal E_{\mathbf z}(f)$ is the empirical mean
$$\mathcal E_{\mathbf z}(f)=\frac{1}{m}\sum_{i=1}^m\big(y_i-f(x_i)\big)^2.$$
The optimal target function is the regression function
$$f_\rho(x)=\int_Y y\,d\rho(y\mid x)$$
satisfying
$$\mathcal E_\rho(f_\rho)=\inf_f\mathcal E_\rho(f),$$
where $\mathcal E_\rho(f)=\int_Z\big(y-f(x)\big)^2\,d\rho$ and the infimum is taken over all $\rho_X$-measurable functions $f$. Moreover, there holds the well-known identity (see e.g. [10])
$$\|f-f_\rho\|_{L^2(\rho_X)}^2=\mathcal E_\rho(f)-\mathcal E_\rho(f_\rho),\quad f\in L^2(\rho_X).\qquad(2)$$
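For completeness, (2) follows by writing $y-f(x)=(y-f_\rho(x))+(f_\rho(x)-f(x))$ and noting that the cross term vanishes, since $\int_Y(y-f_\rho(x))\,d\rho(y\mid x)=0$ for every $x\in X$:
$$\mathcal E_\rho(f)=\mathcal E_\rho(f_\rho)+2\int_X\big(f_\rho(x)-f(x)\big)\int_Y\big(y-f_\rho(x)\big)\,d\rho(y\mid x)\,d\rho_X+\int_X\big(f_\rho(x)-f(x)\big)^2\,d\rho_X=\mathcal E_\rho(f_\rho)+\|f-f_\rho\|_{L^2(\rho_X)}^2.$$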
There are many possible choices for the hypothesis space $B$ in (1). For example, C. P. An et al. choose the algebraic polynomial class as $B$ (see [11,12,13]). In [14], C. De Mol et al. chose a dictionary as $B$. Recently some papers have chosen a Sobolev space as the hypothesis space $B$ (see [15,16]). By the kernel method we mean, traditionally, replacing $B$ with a reproducing kernel Hilbert space (RKHS) $(\mathcal H_K,\langle\cdot,\cdot\rangle_K)$, i.e., a Hilbert space consisting of real functions defined on $X$ for which there is a Mercer kernel $K_x(y)=K(x,y)$ on $X\times X$ (that is, $K(x,y)$ is a continuous and symmetric function on $X\times X$ and, for any $n\ge1$ and any $\{x_1,x_2,\dots,x_n\}\subset X$, the Mercer matrices $(K(x_i,x_j))_{i,j=1,2,\dots,n}$ are positive semi-definite) such that
$$f(x)=\langle f,K_x\rangle_K,\quad x\in X,\ f\in\mathcal H_K,\qquad(3)$$
and there holds the embedding inequality
$$|f(x)|\le c\|f\|_K,\quad x\in X,\ f\in\mathcal H_K,\qquad(4)$$
where $c$ is a constant independent of $f$ and $x$. There are two basic results about the optimal solution $f_{\mathbf z,\lambda}$. The reproducing property (3) yields the representation
$$f_{\mathbf z,\lambda}(x)=\sum_{k=1}^m c_kK_{x_k}(x),\quad x\in X.\qquad(5)$$
The embedding inequality (4) yields the bound
$$|f_{\mathbf z,\lambda}(x)|\le\frac{2M}{\sqrt{\lambda}},\quad x\in X.\qquad(6)$$
Representation (5) is the theoretical basis for kernel regularized regression (see e.g. [17,18]). Inequality (6) is the key inequality for bounding the learning rate with the covering number method (see e.g. [19,20,21]). For other techniques of the kernel method one may consult [10,22,23,24].
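To make (1) and (5) concrete, the following minimal numerical sketch solves a kernel regularized least squares problem by plugging representation (5) into (1), which (when the kernel matrix is invertible) reduces the minimization to the linear system $(K+\frac{m\lambda}{2}I)c=y$. The Gaussian kernel and the synthetic data are illustrative assumptions made for this sketch only; the paper itself works with zonal kernels on the sphere.

```python
import numpy as np

# Minimal kernel regularized least squares sketch (illustrative only).
# The Gaussian kernel and the synthetic data below are assumptions made
# for demonstration, not the construction used in the paper.

def gaussian_kernel(X1, X2, width=0.5):
    """Mercer kernel K(x, y) = exp(-||x - y||^2 / (2 width^2))."""
    d2 = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-d2 / (2.0 * width ** 2))

rng = np.random.default_rng(0)
m = 50
X = rng.uniform(-1.0, 1.0, size=(m, 1))                      # inputs x_i
y = np.sin(np.pi * X[:, 0]) + 0.1 * rng.standard_normal(m)   # noisy outputs y_i

lam = 1e-2
K = gaussian_kernel(X, X)
# Representation (5): f_{z,lambda} = sum_k c_k K_{x_k}.  Plugging it into (1)
# and setting the gradient to zero gives (K + (m*lam/2) I) c = y.
c = np.linalg.solve(K + 0.5 * m * lam * np.eye(m), y)

X_test = np.linspace(-1, 1, 5).reshape(-1, 1)
f_test = gaussian_kernel(X_test, X) @ c                      # f_{z,lambda}(x)
print(np.round(f_test, 3))
```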
It is particularly important to mention here that translation networks have recently been used as the hypothesis space of regularized learning (see e.g. [25,26]). From the viewpoint of approximation theory, a simple single-layer translation network with $m$ neurons is a function space produced by translating a given function $\phi$, and it can be written as
$$\Delta_{\phi,\bar X}^X=\Big\{\sum_{i=1}^m c_iT_{x_i}(\phi,\cdot):\ c_i\in\mathbb R,\ x_i\in X,\ i=1,2,\dots,m\Big\},$$
where $\bar X=\{x_i\}_{i=1}^m\subset X$ is a given node set and, for a given $x\in X$, $T_x(\phi,y)$ is a translation operator associated with $X$. For example, when $X=\mathbb R^d$ or $X=[-\pi,\pi]^d$, we choose $T_x(\phi,y)$ to be the usual convolution translation $\phi(x-y)$ for $\phi$ defined on $\mathbb R^d$ or for a $2\pi$-periodic $\phi$ (see [27,28]). When $X=S^{d-1}=\{x\in\mathbb R^d:\|x\|=1\}$ is the unit sphere in $\mathbb R^d$, one can choose $T_x(\phi,y)$ to be the zonal translation $\phi(x\cdot y)$ for a given $\phi$ defined on the interval $[-1,1]$ (see [29]). In [30] we defined a translation operator $T_x(\phi,y)$ for $X=[-1,1]$.
It is easy to see that the approximation ability and the construction of a translation network depend upon the node set $\bar X=\{x_i\}_{i=1}^m$ (see e.g. [31,32,33]). On the other hand, according to the viewpoint of [34], the quadrature rule and the Marcinkiewicz-Zygmund (MZ) inequality associated with $\bar X$ also influence the construction of the translation network $\Delta_{\phi,\bar X}$. Let $\Omega\subset\mathbb R^d$ be a bounded closed set with measure $d\omega$ satisfying $\int_\Omega d\omega=V<+\infty$. We denote by $\mathcal P_n\subset L^2(\Omega)$ the linear space of polynomials on $\Omega$ of degree at most $n$, equipped with the $L^2$-inner product $\langle v,z\rangle=\int_\Omega vz\,d\omega$. The $m$-point quadrature rule (QR) is
$$\int_\Omega g\,d\omega\approx\sum_{j=1}^m w_jg(x_j),\qquad(7)$$
where $\bar X=\{x_j\}_{j=1}^m\subset\Omega$ and the weights $w_j$ are all positive for $j=1,2,\dots,m$. We say the QR (7) has polynomial exactness $n$ if
$$\int_\Omega g\,d\omega=\sum_{j=1}^m w_jg(x_j),\quad g\in\mathcal P_n.\qquad(8)$$
The Marcinkiewicz-Zygmund (MZ) inequality based on a set $\bar X=\{x_j\}_{j=1}^m\subset\Omega$ is
$$\Big(\sum_{j=1}^m w_j|g(x_j)|^2\Big)^{1/2}\sim\Big(\int_\Omega|g(\omega)|^2\,d\omega\Big)^{1/2},\quad g\in\mathcal P_n,\qquad(9)$$
where the weights $w_j$ in (9) need not be the same as the $w_j$ in (7) and (8).
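A deliberately simple illustration of (7)-(9): on the unit circle, $m$ equally spaced nodes with equal weights $2\pi/m$ integrate trigonometric polynomials of degree up to $m-1$ exactly, and the discrete and continuous $L^2$ norms coincide for polynomials of degree $n$ with $2n\le m-1$; this is the prototype of an MZ family. The sketch below checks both facts numerically (the circle, the equal weights and the random test polynomial are assumptions made for illustration; the paper works with scattered nodes on $S^{d-1}$).

```python
import numpy as np

# Illustration of (7)-(9) on the unit circle: m equally spaced nodes with
# equal weights 2*pi/m give a quadrature rule exact for trigonometric
# polynomials of degree <= m-1, and the discrete and continuous L^2 norms
# coincide for degree n with 2n <= m-1 (an MZ equality).

rng = np.random.default_rng(1)
m, n = 32, 10                       # m nodes, polynomial degree n (2n <= m-1)
theta = 2.0 * np.pi * np.arange(m) / m
w = np.full(m, 2.0 * np.pi / m)     # quadrature weights

a = rng.standard_normal(n + 1)      # random trig polynomial g of degree n
b = rng.standard_normal(n + 1)
def g(t):
    k = np.arange(n + 1)
    return a @ np.cos(np.outer(k, t)) + b @ np.sin(np.outer(k, t))

# Polynomial exactness (8): the exact integral of g over [0, 2*pi) is 2*pi*a[0].
quad = w @ g(theta)
print("quadrature error:", abs(quad - 2.0 * np.pi * a[0]))

# MZ equality (9): discrete and continuous L^2 norms of g agree.
discrete = w @ g(theta) ** 2
exact = np.pi * (a[1:] @ a[1:] + b[1:] @ b[1:]) + 2.0 * np.pi * a[0] ** 2
print("MZ gap:", abs(discrete - exact))
```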
According to the viewpoint of [34], the quadrature rule (QR) follows automatically from the Marcinkiewicz-Zygmund (MZ) inequality. H. N. Mhaskar et al. gave a method for passing from an MZ inequality to a polynomially exact QR in [35]. The MZ inequality (9) is therefore an important feature for describing the node set $\bar X$. For this reason, a node set $\bar X=\{x_j\}_{j=1}^m\subset\Omega$ that yields an MZ inequality is given a special name: a Marcinkiewicz-Zygmund Family (MZF) (see [34,36,37,38]). However, from these references we know that MZFs do not totally coincide with Lagrange interpolation nodes in the case $d>1$. Hyperinterpolation was then developed with the help of exact QRs (see [39,40,41,42,43]) and has been applied to approximation theory and regularized learning (see e.g. [11,12,13,44]). On the other hand, the problem of polynomially exact QRs has also been investigated in its own right (see e.g. [45,46]). The concept of a spherical $t$-design was first defined in [47] and has been investigated in many subsequent papers; one can see the classical references [48,49]. We say $\mathcal T_t=\{x_i\}_{i=1}^{|\mathcal T_t|}\subset S^{d-1}$ is a spherical $t$-design if
$$\frac{1}{|\mathcal T_t|}\sum_{i=1}^{|\mathcal T_t|}\pi(x_i)=\frac{1}{\omega_{d-1}}\int_{S^{d-1}}\pi(x)\,d\omega(x)\qquad(10)$$
for every spherical polynomial $\pi(x)$ of degree at most $t$, where $\omega_{d-1}$ is the surface area of $S^{d-1}$. Moreover, in many applications the polynomially exact QR and the MZF have been used as assumptions. For example, C. P. An et al. gave approximation orders for hyperinterpolation under the assumptions that (8), (10) and the MZ inequality (9) hold (see [50,51]). Also, in [25], Lin et al. investigated regularized regression associated with zonal translation networks by assuming that the node set $\bar X=\{x_j\}_{j=1}^m\subset S^{d-1}$ is a type of spherical $t$-design.
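For a concrete instance of (10), the six vertices of the regular octahedron $\{\pm e_1,\pm e_2,\pm e_3\}\subset S^2$ form a spherical $3$-design: the node average reproduces the sphere average of every polynomial of degree at most $3$, but not of degree $4$. The sketch below verifies this on monomials, using the classical closed form for spherical averages of monomials (this small check is our own illustration, not taken from the paper).

```python
import itertools
import math
import numpy as np

# Numerical check that the octahedron vertices {±e1, ±e2, ±e3} form a
# spherical 3-design on S^2 in the sense of (10): node averages match
# sphere averages for all monomials of degree <= 3 and fail at degree 4.

nodes = np.array([[ 1, 0, 0], [-1, 0, 0],
                  [ 0, 1, 0], [ 0,-1, 0],
                  [ 0, 0, 1], [ 0, 0,-1]], dtype=float)

def sphere_average(exponents):
    """Exact average of x^a * y^b * z^c over S^2 (classical closed form)."""
    if any(e % 2 == 1 for e in exponents):
        return 0.0
    num = math.gamma(1.5) * math.prod(math.gamma((e + 1) / 2) for e in exponents)
    return num / (math.gamma((sum(exponents) + 3) / 2) * math.pi ** 1.5)

for deg in range(1, 5):
    worst = 0.0
    for exps in itertools.product(range(deg + 1), repeat=3):
        if sum(exps) != deg:
            continue
        node_avg = np.mean(np.prod(nodes ** np.array(exps), axis=1))
        worst = max(worst, abs(node_avg - sphere_average(exps)))
    print(f"degree {deg}: max |node avg - sphere avg| = {worst:.4f}")
# Expected output: ~0 for degrees 1-3, a visible gap (2/15) at degree 4.
```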
Polynomially exact QRs are also a good tool in approximation theory. For example, we have previously used them to bound the norms of some Mercer matrices (see [52,53,54,55]). In particular, H. N. Mhaskar et al. used polynomially exact QRs to construct the first periodic translation operators (see [27]) and the zonal translation network operators (see [29]). Along this line, translation operators on the unit ball, on the Euclidean space $\mathbb R^d$ and on the interval $[-1,1]$ have been constructed (see [28,30,56]).
The above investigations encourage us to use (8), (10) and (9) as hypotheses for describing the approximation ability of the zonal translation networks $\Delta_{\phi,\bar X}^X$. To ensure that the single-layer translation network $\Delta_{\phi,\bar X}^X$ can approximate constant functions, $\Delta_{\phi,\bar X}^X$ is modified to
$$N_{\phi,\bar X}^X=\Big\{\sum_{i=1}^m c_iT_{x_i}(\phi,\cdot)+c_0:\ c_0,c_i\in\mathbb R,\ x_i\in X,\ i=1,2,\dots,m\Big\}.\qquad(11)$$
In the case $T_x(\phi,y)=\sigma(w\cdot x+b)$ with $w,x\in\mathbb R^d$ and $b\in\mathbb R$, R. D. Nowak et al. used (11) to design regularized learning frameworks (see [57]). An algorithm for constructing this kind of network was provided by S. B. Lin et al. [26] and applied to build regularized learning algorithms. In [5], $N_{\phi,\bar X}^X$ is used to construct deep neural network learning frameworks. Investigations of the same type are given in [58,59] and [60].
In the present paper, we design the translation network $N_{\phi,\bar X}^X$ by taking $X=S^{d-1}$, assuming that $\bar X=\Omega(m)=\{x_j\}_{j=1}^m\subset S^{d-1}$ satisfies (8) and (9), and choosing the zonal translation $T_x(\phi,y)=\phi(x\cdot y)$ with $\phi$ a given integrable function on $[-1,1]$. Under these assumptions we provide a learning framework with $N_{\phi,\Omega(m)}^{S^{d-1}}$ as the hypothesis space and give an error analysis.
The contributions of the present paper are twofold. First, absorbing the ideas of [34,36,37,38] and the successful experience of [11,25,27,29,50,51,61,62], we propose the concept of a Marcinkiewicz-Zygmund Inequality Setting (MZIS) for scattered nodes on the unit sphere; taking this as an assumption, we show the reproducing property of the convolutional zonal translation network associated with the scattered nodes $\Omega(m)$. Second, we investigate kernel regularized neural network learning by combining the classical kernel approach with methods of convex analysis; with this approach the convergence rate obtained is dimension independent and capacity independent. Since the translation networks are produced by zonal translations of convolutional kernels, we call them convolutional zonal translation networks.
The paper is organized as follows. In Section 2 we first show the density of the zonal translation class and then show the reproducing property of the translation network $\Delta_{\phi,\Omega(m)}^{S^{d-1}}$. In Section 3 we present the main results of the paper: a new regression learning framework and learning setting, the error decomposition used in the error analysis, and an estimate of the convergence rate. In Section 4 we give some lemmas that are used to prove the main results. The proofs of all theorems and propositions are given in Section 5.
Throughout the paper, we write $A=O(B)$ if there is a positive constant $C$, independent of $A$ and $B$, such that $A\le CB$. In particular, $A=O(1)$ means that $A$ is a bounded quantity. We write $A\sim B$ if both $A=O(B)$ and $B=O(A)$.

2. Some Properties of the Translation Network on the Unit Sphere

Let $\phi\in L_{W_\eta}^1=\{\phi:\ \|\phi\|_{1,W_\eta}=\int_{-1}^1|\phi(x)|W_\eta(x)\,dx<+\infty\}$, where $W_\eta(x)=(1-x^2)^{\eta-\frac12}$ and $\eta>-\frac12$. H. N. Mhaskar et al. constructed in [29] a sequence of approximation operators to show that the zonal translation class
$$\Delta_\phi^{S^{d-1}}=\mathrm{cl}\,\mathrm{span}\big(\{\phi(x\cdot y):\ y\in S^{d-1}\}\cup\{1\}\big)=\mathrm{cl}\Big\{\sum_{i=1}^m c_i\phi(x_i\cdot\,\cdot\,)+c_0:\ c_0,c_i\in\mathbb R,\ x_i\in S^{d-1},\ i=1,2,\dots,m;\ m=1,2,\dots\Big\}$$
is dense in $L^p(S^{d-1})$ $(1\le p\le+\infty)$ if $\widehat{a_l^\eta(\phi)}\neq0$ for all $l=0,1,2,\dots$, where
$$\widehat{a_l^\eta(\phi)}=c_\eta\int_{-1}^1\phi(x)\frac{C_l^\eta(x)}{C_l^\eta(1)}W_\eta(x)\,dx,\quad\eta=\frac{d-2}{2},\qquad(12)$$
and $C_n^\eta(x)=p_n^{(\eta-\frac12,\eta-\frac12)}(x)$ is the $n$-th generalized Legendre (Gegenbauer) polynomial, which satisfies the orthogonality relation
$$c_\eta\int_{-1}^1C_n^\eta(x)C_m^\eta(x)W_\eta(x)\,dx=h_n^\eta\delta_{n,m},$$
with $c_\eta=\frac{\Gamma(\eta+1)}{\Gamma(\frac12)\Gamma(\eta+\frac12)}$ and $h_n^\eta=\frac{\eta}{n+\eta}C_n^\eta(1)$; it is known (see (B.2.1), (B.2.2) and (B.5.1) of [63]) that $|C_n^\eta(x)|\le C_n^\eta(1)\sim n^{2\eta-1}$ for $x\in[-1,1]$. It follows that
$$\phi(t)=\sum_{l=0}^{+\infty}\widehat{a_l^\eta(\phi)}\,\frac{l+\eta}{\eta}C_l^\eta(t)=\sum_{l=0}^{+\infty}\widehat{a_l^\eta(\phi)}\,Z_l^\eta(t),\qquad(13)$$
where $Z_l^\eta(t)=\frac{l+\eta}{\eta}C_l^\eta(t)$ and $\eta=\frac{d-2}{2}$.
For a given real number $p\ge1$, we denote by $L^p(\rho_X)$ the class of $\rho_X$-measurable functions $f$ satisfying $\|f\|_{L^p(\rho_X)}=\big(\int_X|f(x)|^p\,d\rho_X\big)^{1/p}<+\infty$.
Let $\mathcal P_n^d$ denote the space of all homogeneous polynomials of degree $n$ in $d$ variables. We denote by $L^p(S^{d-1})$ the class of all measurable functions defined on $S^{d-1}$ with finite norm
$$\|f\|_{p,S^{d-1}}=\begin{cases}\Big(\int_{S^{d-1}}|f(x)|^p\,d\sigma(x)\Big)^{1/p},&1\le p<+\infty,\\ \max_{x\in S^{d-1}}|f(x)|,&p=+\infty,\end{cases}$$
and for $p=+\infty$ we take $L^\infty(S^{d-1})$ to be the space $C(S^{d-1})$ of continuous functions on $S^{d-1}$ with the uniform norm.
For a given integer $n\ge0$, the restriction to $S^{d-1}$ of a homogeneous harmonic polynomial of degree $n$ is called a spherical harmonic of degree $n$. If $Y\in\mathcal P_n^d$ is harmonic, then $Y(x)=\|x\|^nY(x')$ with $x'=\frac{x}{\|x\|}$, so that $Y$ is determined by its restriction to the unit sphere. Let $\mathcal H_n(S^{d-1})$ denote the space of spherical harmonics of degree $n$. Then
$$\dim\mathcal H_n(S^{d-1})=\binom{n+d-2}{n}+\binom{n+d-3}{n-1},\quad n=1,2,3,\dots.$$
Spherical harmonics of different degrees are orthogonal on the unit sphere. For further properties of spherical harmonics one can refer to [64].
For $n=0,1,2,\dots$, let $\{Y_l^n:\ 1\le l\le\dim\mathcal H_n(S^{d-1})\}$ be an orthonormal basis of $\mathcal H_n(S^{d-1})$. Then
$$\frac{1}{\omega_{d-1}}\int_{S^{d-1}}Y_l^n(\xi)Y_{l'}^{m}(\xi)\,d\sigma(\xi)=\delta_{l,l'}\delta_{m,n},$$
where $\omega_{d-1}$ denotes the surface area of $S^{d-1}$, $\omega_{d-1}=\frac{2\pi^{d/2}}{\Gamma(d/2)}$. Furthermore, by (1.2.8) in [63] we have
$$\sum_{j=1}^{\dim\mathcal H_n(S^{d-1})}Y_j^n(x)Y_j^n(y)=\frac{n+\eta}{\eta}C_n^\eta(x\cdot y),\quad x,y\in S^{d-1},$$
where $C_n^\eta(t)$ is the $n$-th generalized Legendre polynomial, the same as in (12). Combining (12) and (13) we have
$$\phi(x\cdot y)=\sum_{l=0}^{+\infty}\widehat{a_l^\eta(\phi)}\,\frac{l+\eta}{\eta}C_l^\eta(x\cdot y)=\sum_{l=0}^{+\infty}\widehat{a_l^\eta(\phi)}\sum_{k=1}^{\dim\mathcal H_l(S^{d-1})}Y_k^l(x)Y_k^l(y),\quad x,y\in S^{d-1}.\qquad(14)$$
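The coefficients in (12) can be evaluated numerically for a given profile $\phi$. The sketch below does this for $d=3$ (so $\eta=\frac12$, $C_l^{1/2}=P_l$ the Legendre polynomial, $W_{1/2}\equiv1$ and $c_{1/2}=\frac12$) and the illustrative choice $\phi(t)=e^t$, and checks that all computed coefficients are strictly positive, which is the admissibility condition appearing in Theorem 2.1 and Proposition 2.3 below. The choices of $\phi$ and $d$ are assumptions made for this sketch.

```python
import numpy as np

# Numerical evaluation of (12) for d = 3 (eta = 1/2):
#   a_l(phi) = (1/2) * int_{-1}^{1} phi(x) P_l(x) dx,
# with phi(t) = exp(t) as an illustrative zonal profile.

phi = np.exp
x, w = np.polynomial.legendre.leggauss(64)        # 64-point Gauss-Legendre rule

for l in range(8):
    P_l = np.polynomial.legendre.Legendre.basis(l)(x)
    a_l = 0.5 * np.sum(w * phi(x) * P_l)
    print(f"l = {l}:  a_l = {a_l:.6e}")
# All printed coefficients are positive (up to normalization they are the
# modified spherical Bessel values i_l(1)), so phi(t) = e^t is admissible.
```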

2.1. Density

We first recall a general criterion for density.
Proposition 2.1. (see Lemma 1 in Chapter 18 of [65]) For a subset V in a normed linear space E, the following two properties are equivalent:
(a) V is fundamental in E (that is, its linear span is dense in E).
(b) $V^\perp=\{0\}$ (that is, $0$ is the only element of $E^*$ that annihilates $V$).
Based on this proposition, we can show the density of $\Delta_\phi^{S^{d-1}}$.
Theorem 2.1. Let $\phi\in L_{W_\eta}^2$ satisfy $\widehat{a_l^\eta(\phi)}>0$ for all $l=0,1,2,\dots$. Then $\Delta_\phi^{S^{d-1}}$ is dense in $L^2(S^{d-1})$.
Proof. See the proof in Section 5.

2.2. Reproducing Property

We first restate a proposition.
Proposition 2.2. For any given $n\ge1$ there exist a finite subset $\Omega(n)\subset S^{d-1}$ and corresponding positive numbers $\{\mu_\omega^{(n)}:\ \omega\in\Omega(n)\}$ such that
$$\int_{S^{d-1}}f(x)\,d\sigma(x)=\sum_{\omega\in\Omega(n)}\mu_\omega^{(n)}f(\omega),\quad f\in\mathcal H_{3n}(S^{d-1}),\qquad(15)$$
and for some $1\le p<+\infty$
$$\int_{S^{d-1}}|f(x)|^p\,d\sigma(x)\sim\sum_{\omega\in\Omega(n)}\mu_\omega^{(n)}|f(\omega)|^p,\quad f\in\mathcal H_n(S^{d-1}).\qquad(16)$$
Moreover, for any $m\ge n$ and $p\ge1$ there exists a constant $c_{p,d}>0$ such that
$$\sum_{\omega\in\Omega(n)}\mu_\omega^{(n)}|f(\omega)|^p\le c_{p,d}\Big(\frac{m}{n}\Big)^{d-1}\int_{S^{d-1}}|f(x)|^p\,d\sigma(x),\quad f\in\mathcal H_m(S^{d-1}).\qquad(17)$$
Proof. (15)-(16) were proved by H. N. Mhaskar et al. in [61] and have since been extended to other domains (see e.g. [66,67]). Inequality (17) is proved in [68].
(15)-(16) show the existence of a set $\Omega(n)\subset S^{d-1}$ which admits the polynomially exact QR (8) and satisfies the MZ inequality (9). Inequality (17) often accompanies (15) and (16) and is needed when discussing approximation orders (see e.g. [62]).
Based on the above analysis, we propose the following definition.
Definition 2.1 (Marcinkiewicz-Zygmund inequality setting (MZIS)). We say a given finite node set $\Omega(n)\subset S^{d-1}$ forms a Marcinkiewicz-Zygmund inequality setting if (15)-(16) and (17) hold simultaneously.
Let $\phi\in L_{W_\eta}^2$. For a given finite set $\Omega(n)\subset S^{d-1}$ satisfying Definition 2.1, let the positive numbers $\{\mu_\omega^{(n)}:\ \omega\in\Omega(n)\}$ be as in Proposition 2.2. We define a zonal translation network by
$$H_{\Omega(n)}:=\mathrm{cl}\Big\{f(x)=\sum_{x_k\in\Omega(n)}c_k\mu_k^{(n)}T_x(\phi)(x_k)+c_0:\ c_k\in\mathbb R,\ k=0,1,2,\dots,|\Omega(n)|\Big\},$$
where $T_x(\phi)(y)=\phi(x\cdot y)$. Then it is easy to see that
$$H_{\Omega(n)}=H_\phi^{(n)}\oplus\mathbb R,$$
where for two linear spaces $A$ and $B$ with $A\cap B=\{0\}$ we define $A\oplus B=\{a+b:\ a\in A,\ b\in B\}$, and
$$H_\phi^{(n)}:=\mathrm{cl}\Big\{f(x)=\sum_{x_k\in\Omega(n)}c_k\mu_k^{(n)}T_x(\phi)(x_k):\ c_k\in\mathbb R,\ k=1,2,\dots,|\Omega(n)|\Big\}.$$
For $f(x)=\sum_{x_k\in\Omega(n)}c_k\mu_k^{(n)}T_x(\phi)(x_k)$ and $g(x)=\sum_{x_k\in\Omega(n)}d_k\mu_k^{(n)}T_x(\phi)(x_k)$ we define a bilinear form
$$\langle f,g\rangle_\phi=\sum_{x_k\in\Omega(n)}c_kd_k\mu_k^{(n)}$$
and
$$\|f\|_\phi=\Big(\sum_{x_k\in\Omega(n)}\mu_k^{(n)}|c_k|^2\Big)^{1/2}.$$
By (14) and Theorem 4 in Chapter 17 of [65], the matrix $\big(T_{x_i}(\phi,x_j)\big)_{i,j=1,2,\dots,|\Omega(n)|}$ is positive definite for a given $n$. It follows that the coefficient vector $c=\{\mu_k^{(n)}c_k\}_{x_k\in\Omega(n)}$ of $f$ is unique, so $(H_\phi^{(n)},\|\cdot\|_\phi)$ is a Hilbert space which is isometrically isomorphic to $l^2(\Omega(n))$, where
$$l^2(\Omega(n))=\Big\{c=\{\mu_k^{(n)}c_k\}_{x_k\in\Omega(n)}:\ \|c\|_{l^2(\Omega(n))}=\Big(\sum_{x_k\in\Omega(n)}\mu_k^{(n)}|c_k|^2\Big)^{1/2}<+\infty\Big\}.$$
Let $C([-1,1])$ be the set of all continuous functions defined on $[-1,1]$, equipped with the norm
$$\|\phi\|_{C([-1,1])}=\sup_{x\in[-1,1]}|\phi(x)|.$$
We then have the following proposition.
Proposition 2.3. Let $\phi\in C([-1,1])$ satisfy $\widehat{a_n^\eta(\phi)}>0$ for all $n\ge1$ and let $\Omega(n)\subset S^{d-1}$ be an MZIS. Then $(H_\phi^{(n)},\|\cdot\|_\phi)$ is a reproducing kernel Hilbert space associated with the kernel
$$K_x^*(\phi)(y)=K^*(\phi,x,y)=\sum_{x_k\in\Omega(n)}\mu_k^{(n)}T_{x_k}(\phi,x)T_{x_k}(\phi,y),\qquad(18)$$
i.e.,
$$f(x)=\langle f,K_x^*(\phi)(\cdot)\rangle_\phi,\quad x\in S^{d-1},\ f\in H_\phi^{(n)},\qquad(19)$$
and there is a constant $k^*>0$ such that
$$|f(x)|\le k^*\|f\|_\phi,\quad f\in H_\phi^{(n)},\ x\in S^{d-1}.\qquad(20)$$
Proof. See the proof in Section 5.
Corollary 2.1. Under the assumptions of Proposition 2.3, $H_{\Omega(n)}$ is a reproducing kernel Hilbert space associated with the inner product defined by
$$\langle f,g\rangle_{H_{\Omega(n)}}=\langle f_1,g_1\rangle_\phi+c_0d_0,\quad\text{where}\ f=f_1+c_0,\ g=g_1+d_0,\ f_1(x)=\sum_{x_k\in\Omega(n)}c_k\mu_k^{(n)}T_x(\phi)(x_k),\ g_1(x)=\sum_{x_k\in\Omega(n)}d_k\mu_k^{(n)}T_x(\phi)(x_k),$$
and the corresponding reproducing kernel $K_x(\phi)(y)$ is
$$K_x(\phi)(y)=K_x^*(\phi)(y)+1,\quad x,y\in S^{d-1}.$$
Furthermore, there is a constant $\kappa>0$ such that
$$|f(x)|\le\kappa\|f\|_{H_{\Omega(n)}},\quad f\in H_{\Omega(n)},\ x\in S^{d-1}.\qquad(21)$$
Proof. The results follow from Proposition 2.3, Lemma 4.2 and the fact that the real line $\mathbb R$ is a reproducing kernel Hilbert space whose reproducing kernel is the constant $1$ and whose inner product is the usual product of two real numbers.
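To see the kernel of Corollary 2.1 concretely, the sketch below assembles $K_x(\phi)(y)=\sum_{x_k\in\Omega(n)}\mu_k^{(n)}\phi(x_k\cdot x)\phi(x_k\cdot y)+1$ on $S^2$ and checks numerically that the resulting Gram matrices are positive semi-definite (which is also clear from the weighted factorization in (18)). The random nodes with equal weights $4\pi/|\Omega(n)|$ and the profile $\phi(t)=e^t$ are illustrative assumptions only; they are not claimed to form an MZIS in the sense of Definition 2.1.

```python
import numpy as np

# Sketch of the kernel in Corollary 2.1:
#   K(phi)(x, y) = sum_k mu_k * phi(x_k . x) * phi(x_k . y) + 1.
# Random nodes on S^2 with equal weights are used purely for illustration.

rng = np.random.default_rng(2)

def random_sphere_points(n):
    p = rng.standard_normal((n, 3))
    return p / np.linalg.norm(p, axis=1, keepdims=True)

phi = np.exp                       # zonal profile phi(t) = e^t (illustrative)
nodes = random_sphere_points(40)   # plays the role of Omega(n)
mu = np.full(len(nodes), 4.0 * np.pi / len(nodes))

def gram(points):
    """Gram matrix (K(phi)(x_i, x_j))_{i,j} for the given evaluation points."""
    T = phi(points @ nodes.T)      # T[i, k] = phi(x_k . x_i)
    return (T * mu) @ T.T + 1.0    # sum_k mu_k T[i,k] T[j,k] + 1

X = random_sphere_points(25)       # evaluation points
G = gram(X)
print("symmetric:", np.allclose(G, G.T))
print("min eigenvalue:", np.linalg.eigvalsh(G).min())   # >= 0 up to round-off
```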

3. Convergence Rate of the Kernel Regularized Regression

We now combine the zonal translation network with kernel regularized regression.

3.1. Learning Framework

For a set of observations $\mathbf z=\{(x_i,y_i)\}_{i=1}^m$ drawn i.i.d. according to a joint distribution $\rho(x,y)$ on $Z=S^{d-1}\times Y$, where $Y=[-M,M]$ with $M>0$ a given real number and $\rho(x,y)=\rho(y\mid x)\rho_{S^{d-1}}(x)$, we define the regularized framework
$$f_{\mathbf z,\lambda}^{\Omega(m)}:=\arg\min_{f\in H_{\Omega(m)}}\Big\{\mathcal E_{\mathbf z}(f)+\frac{\lambda}{2}\|f\|_{H_{\Omega(m)}}^2\Big\},\qquad(22)$$
where $\lambda=\lambda(m)$ is the regularization parameter and
$$\mathcal E_{\mathbf z}(f)=\frac{1}{m}\sum_{i=1}^m\big(y_i-f(x_i)\big)^2.$$
The general integral model corresponding to (22) is
$$f_{\rho,\lambda}^{\Omega(m)}:=\arg\min_{f\in H_{\Omega(m)}}\Big\{\mathcal E_\rho(f)+\frac{\lambda}{2}\|f\|_{H_{\Omega(m)}}^2\Big\},\qquad(23)$$
where
$$\mathcal E_\rho(f)=\int_Z\big(y-f(x)\big)^2\,d\rho.$$
To give a convergence analysis for (22) we need to bound the error
$$\|f_{\mathbf z,\lambda}^{\Omega(m)}-f_\rho\|_{L^2(\rho_{S^{d-1}})},$$
which is an approximation problem whose convergence rate depends upon the approximation ability of $H_{\Omega(m)}$. An error decomposition is given in Section 3.2.
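Since every $f\in H_{\Omega(m)}$ has the form $f(x)=\sum_k c_k\mu_k^{(m)}\phi(x_k\cdot x)+c_0$ with $\|f\|_{H_{\Omega(m)}}^2=\sum_k\mu_k^{(m)}c_k^2+c_0^2$, problem (22) is a finite-dimensional weighted ridge regression and can be solved by one linear system. The sketch below does this on synthetic data; the nodes, equal weights, profile $\phi$ and data are illustrative assumptions, chosen as in the previous sketch.

```python
import numpy as np

# Solving the regularized problem (22) directly in H_{Omega(m)}:
#   f(x) = sum_k c_k mu_k phi(x_k . x) + c_0,
#   ||f||^2 = sum_k mu_k c_k^2 + c_0^2,
# so (22) is a weighted ridge regression with a closed-form solution.

rng = np.random.default_rng(3)

def random_sphere_points(n):
    p = rng.standard_normal((n, 3))
    return p / np.linalg.norm(p, axis=1, keepdims=True)

phi = np.exp
nodes = random_sphere_points(30)                   # plays the role of Omega(m)
mu = np.full(len(nodes), 4.0 * np.pi / len(nodes))

m = 200
X = random_sphere_points(m)                        # sample inputs on S^2
y = X[:, 2] ** 2 + 0.05 * rng.standard_normal(m)   # noisy observations

lam = 1e-3
Phi = np.hstack([phi(X @ nodes.T) * mu, np.ones((m, 1))])    # design matrix
W = np.diag(np.append(mu, 1.0))                              # norm weights
# Gradient of (1/m)||y - Phi t||^2 + (lam/2) t^T W t set to zero:
theta = np.linalg.solve(Phi.T @ Phi + 0.5 * m * lam * W, Phi.T @ y)

f_train = Phi @ theta
print("training RMSE:", np.sqrt(np.mean((f_train - y) ** 2)))
```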

3.2. Error Decomposition

By (2) and the definition of $f_{\rho,\lambda}^{\Omega(m)}$ we have the following decomposition:
$$\begin{aligned}\|f_{\mathbf z,\lambda}^{\Omega(m)}-f_\rho\|_{L^2(\rho_{S^{d-1}})}&\le\|f_{\mathbf z,\lambda}^{\Omega(m)}-f_{\rho,\lambda}^{\Omega(m)}\|_{L^2(\rho_{S^{d-1}})}+\|f_{\rho,\lambda}^{\Omega(m)}-f_\rho\|_{L^2(\rho_{S^{d-1}})}\\&=\|f_{\mathbf z,\lambda}^{\Omega(m)}-f_{\rho,\lambda}^{\Omega(m)}\|_{L^2(\rho_{S^{d-1}})}+\Big(\mathcal E_\rho(f_{\rho,\lambda}^{\Omega(m)})-\mathcal E_\rho(f_\rho)\Big)^{1/2}\\&\le\|f_{\mathbf z,\lambda}^{\Omega(m)}-f_{\rho,\lambda}^{\Omega(m)}\|_{L^2(\rho_{S^{d-1}})}+\Big(\mathcal E_\rho(f_{\rho,\lambda}^{\Omega(m)})-\mathcal E_\rho(f_\rho)+\frac{\lambda}{2}\|f_{\rho,\lambda}^{\Omega(m)}\|_{H_{\Omega(m)}}^2\Big)^{1/2}\\&\le\|f_{\mathbf z,\lambda}^{\Omega(m)}-f_{\rho,\lambda}^{\Omega(m)}\|_{L^2(\rho_{S^{d-1}})}+D_{\Omega(m)}(f_\rho,\lambda)_{L^2(\rho_{S^{d-1}})},\end{aligned}\qquad(24)$$
where
$$D_{\Omega(m)}(f_\rho,\lambda)_{L^2(\rho_{S^{d-1}})}=\inf_{g\in H_{\Omega(m)}}\Big\{\|g-f_\rho\|_{L^2(\rho_{S^{d-1}})}+\sqrt{\tfrac{\lambda}{2}}\,\|g\|_{H_{\Omega(m)}}\Big\}.$$
In the last step we have used the fact that for $a>0$, $b>0$, $c>0$ and $0<p\le1$ there holds
$$(a+b+c)^p\le a^p+b^p+c^p.$$
The K-functional $D_{\Omega(m)}(f_\rho,\lambda)_{L^2(\rho_{S^{d-1}})}$ represents the approximation error, whose decay has been described in [69]. So the main quantity we need to estimate is the sample error
$$\|f_{\mathbf z,\lambda}^{\Omega(m)}-f_{\rho,\lambda}^{\Omega(m)}\|_{L^2(\rho_{S^{d-1}})}.$$

3.3. The Convergence Rate

To give a capacity-independent generalization error bound for algorithm (22), we need some concepts from convex analysis.
Gâteaux differentiability. Let $(H,\|\cdot\|_H)$ be a Hilbert space and let $F:H\to\mathbb R\cup\{\pm\infty\}$ be a real function. We say $F$ is Gâteaux differentiable at $f_0\in H$ if there is a $\xi\in H$ such that for any $g\in H$ there holds
$$\lim_{t\to0}\frac{F(f_0+tg)-F(f_0)}{t}=\langle g,\xi\rangle_H,$$
and we write $F_G'(f_0)=\xi$ or $\nabla_fF(f)|_{f=f_0}=\xi$. It is known that, for a differentiable convex function $F(f)$ on $H$, $f_0=\arg\min_{f\in H}F(f)$ if and only if $(\nabla_fF(f))|_{f=f_0}=0$ (see Proposition 17.4 in [70]).
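As a simple illustration of this definition (and a preview of (32) in Lemma 4.4), take $F(f)=\|f\|_H^2$. For any $f_0,g\in H$,
$$\frac{F(f_0+tg)-F(f_0)}{t}=\frac{2t\langle g,f_0\rangle_H+t^2\|g\|_H^2}{t}=2\langle g,f_0\rangle_H+t\|g\|_H^2\longrightarrow2\langle g,f_0\rangle_H\quad(t\to0),$$
so $\nabla_fF(f)|_{f=f_0}=2f_0$.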
In learning theory, to obtain an explicit learning rate one often assumes that the K-functional has a power-type convergence rate, i.e., that there exists $0<\beta\le1$ such that
$$D_{\Omega(m)}(f_\rho,\lambda)_{L^2(\rho_{S^{d-1}})}=O(\lambda^\beta),\quad\lambda\to0^+\ (m\to+\infty).\qquad(25)$$
Since $H_{\Omega(m)}$ is a reproducing kernel Hilbert space, the decay of $D_{\Omega(m)}(f_\rho,\lambda)_{L^2(\rho_{S^{d-1}})}$ can be discussed with the method of [69]. In particular, it is shown in Theorem 2.3 of [71] that one can choose an MZIS $\Omega(m)\subset S^{d-1}$, a suitable function $\phi$ and suitable real numbers $\alpha,\beta$ in such a way that
$$D_{\Omega(m)}\Big(f_\rho,\frac{1}{2^{\alpha m}}\Big)=O\Big(\frac{1}{2^{m\beta}}\Big)$$
provided $\partial^\beta f_\rho\in L^2(S^{d-1})$, where we say $\partial^\beta f_\rho\in L^2(S^{d-1})$ if there is a function $\varphi\in L^2(S^{d-1})$ such that
$$\varphi(x)\sim\sum_{l=0}^{+\infty}l^\beta Y_l(f_\rho,x),\quad Y_l(f_\rho,x)=\sum_{k=1}^{\dim\mathcal H_l(S^{d-1})}\hat f_{k,l}Y_k^l(x),\quad\hat f_{k,l}=\int_{S^{d-1}}f_\rho(u)Y_k^l(u)\,d\sigma(u),$$
which shows that (25) is reasonable.
Theorem 3.1. Let $\phi\in C([-1,1])$ satisfy $\widehat{a_l^\eta(\phi)}>0$ for all $l\ge0$ and let $\Omega(m)\subset S^{d-1}$ be an MZIS. Then for any $\delta\in(0,1)$, with confidence $1-\delta$, there holds
$$\|f_{\mathbf z,\lambda}^{\Omega(m)}-f_\rho\|_{L^2(\rho_{S^{d-1}})}\le\Big(\frac{4\kappa M}{\lambda\sqrt m}+\frac{D_{\Omega(m)}(f_\rho,\lambda)_{L^2(\rho_{S^{d-1}})}}{\lambda^{3/2}\sqrt m}\Big)\log\frac{4}{\delta}+D_{\Omega(m)}(f_\rho,\lambda)_{L^2(\rho_{S^{d-1}})}.\qquad(26)$$
Proof. See the proof in Section 5.
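As an illustration of how (26) and (25) combine to give an explicit rate (only the orders of the terms are used here): under assumption (25) with $\frac12\le\beta\le1$, the middle term in (26) is dominated by the first whenever $\lambda\le1$, and the choice $\lambda=m^{-\frac{1}{2(\beta+1)}}$ balances the first and last terms, giving, with confidence $1-\delta$,
$$\|f_{\mathbf z,\lambda}^{\Omega(m)}-f_\rho\|_{L^2(\rho_{S^{d-1}})}=O\Big(m^{-\frac{\beta}{2(\beta+1)}}\log\frac{4}{\delta}\Big).$$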

3.4. Conclusion and Discussion

We have proposed the concept of an MZIS for scattered node sets on the unit sphere, with which we showed that the associated convolutional zonal translation network is a reproducing kernel Hilbert space, and we established a convergence rate for the kernel regularized least squares regression model. We now give some further remarks on the advantages of the present paper.
(1) The zonal translation network we choose is a finite-dimensional reproducing kernel Hilbert space, so our discussion belongs to the scope of the kernel method; it is a combination and application of (neural) translation networks in learning theory.
(2) Compared with existing convergence rate estimates for neural network learning, our upper estimates are dimension independent (see the Theorem in [25], Theorem 3.1 in [72], Theorem 7 in [73], Theorem 1 in [26]).
(3) It is hoped that the node sets in (15)-(16) and (17) may be replaced by a set of uniformly distributed observations or a set of random samples (see [72,73]).
(4) The density derivation for the zonal translation network is qualitative: the density is characterized by the Fourier orthogonal coefficients. We think this method can be extended to other domains such as the unit ball, the Euclidean space $\mathbb R^d$ and $\mathbb R_+=[0,+\infty)$.
(5) It is hoped that, with the help of the MZIS concept, one may show the reproducing property for deep translation networks and thus investigate the performance of deep convolutional translation learning with the kernel method (see e.g. [6,7,59]).

4. Some Lemmas

To prove the main results, we need some lemmas.
Lemma 4.1. Let $(H,\|\cdot\|)$ be a Hilbert space, let $\xi$ be a random variable on $(Z,\rho)$ with values in $H$, and let $\{z_i\}_{i=1}^m$ be independent samples drawn according to $\rho$. Assume $\|\xi\|_H\le\tilde M<+\infty$ almost surely. Then, for any $0<\delta<1$, with confidence $1-\delta$, it holds that
$$\Big\|\frac{1}{m}\sum_{i=1}^m\xi(z_i)-E(\xi)\Big\|_H\le\frac{2\tilde M\log\frac{2}{\delta}}{\sqrt m}.\qquad(27)$$
Proof. See [74].
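A quick Monte Carlo sanity check of (27) for a scalar-valued (hence Hilbert-valued with $H=\mathbb R$) bounded variable; the uniform distribution and the parameters below are illustrative assumptions.

```python
import numpy as np

# Monte Carlo sanity check of the concentration bound (27) with H = R:
# xi uniform on [-1, 1], so |xi| <= M_tilde = 1 and E(xi) = 0.

rng = np.random.default_rng(4)
m, delta, M_tilde = 200, 0.05, 1.0
bound = 2.0 * M_tilde * np.log(2.0 / delta) / np.sqrt(m)

trials = 10_000
samples = rng.uniform(-1.0, 1.0, size=(trials, m))
deviations = np.abs(samples.mean(axis=1))          # | (1/m) sum xi - E xi |
coverage = np.mean(deviations <= bound)
print(f"bound = {bound:.3f}, empirical coverage = {coverage:.4f} (should be >= {1 - delta})")
```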
Lemma 4.2. Let $(H,\|\cdot\|_H)$ be a Hilbert space of functions on $X$ with reproducing kernel $K$. If $(E,\|\cdot\|_E)$ and $(F,\|\cdot\|_F)$ are closed subspaces of $H$ such that $E\perp F$ and $E\oplus F=H$, then $K=L+M$, where $L$ and $M$ are the reproducing kernels of $E$ and $F$, respectively. Moreover, for $h=e+f$ with $e\in E$, $f\in F$, we have
$$\|h\|_H=\big(\|e\|_E^2+\|f\|_F^2\big)^{1/2}.$$
Proof. See Corollary 1 in Chapter 31 of [65] or the theorem in Section 6 of Part I of [75].
Lemma 4.3. There hold the following equalities:
$$\nabla_f\mathcal E_{\mathbf z}(f)(\cdot)=-\frac{2}{m}\sum_{i=1}^m\big(y_i-f(x_i)\big)K_{x_i}(\phi)(\cdot),\quad f\in H_{\Omega(m)},\qquad(28)$$
and
$$\nabla_f\mathcal E_\rho(f)(\cdot)=-2\int_Z\big(y-f(x)\big)K_x(\phi)(\cdot)\,d\rho,\quad f\in H_{\Omega(m)}.\qquad(29)$$
Proof of (28). By the equality
$$a^2-b^2=2(a-b)b+(a-b)^2,\quad a\in\mathbb R,\ b\in\mathbb R,\qquad(30)$$
we have
$$\lim_{t\to0}\frac{\mathcal E_{\mathbf z}(f+tg)-\mathcal E_{\mathbf z}(f)}{t}=\lim_{t\to0}\frac{1}{m}\sum_{i=1}^m\frac{-2t\big(y_i-f(x_i)\big)g(x_i)+t^2g(x_i)^2}{t}=-\frac{2}{m}\sum_{i=1}^m\big(y_i-f(x_i)\big)g(x_i).$$
By $g(x)=\langle g,K_x(\phi)(\cdot)\rangle_{H_{\Omega(m)}}$ and the definition of the Gâteaux derivative, we obtain from the above equality that
$$\lim_{t\to0}\frac{\mathcal E_{\mathbf z}(f+tg)-\mathcal E_{\mathbf z}(f)}{t}=\Big\langle g,\ -\frac{2}{m}\sum_{i=1}^m\big(y_i-f(x_i)\big)K_{x_i}(\phi)(\cdot)\Big\rangle_{H_{\Omega(m)}}.$$
This gives (28). (29) can be shown in the same way.
Lemma 4.4. Let $(H,\|\cdot\|_H)$ be a Hilbert space consisting of real functions on $X$. Then
$$\|f\|_H^2-\|g\|_H^2=2\langle f-g,g\rangle_H+\|f-g\|_H^2,\quad f,g\in H,\qquad(31)$$
and
$$\nabla_f\big(\|f\|_H^2\big)(\cdot)=2f(\cdot),\quad f\in H.\qquad(32)$$
Proof. (31) follows by expanding $\|f\|_H^2=\|g+(f-g)\|_H^2=\|g\|_H^2+2\langle f-g,g\rangle_H+\|f-g\|_H^2$. (32) can be shown with (31).
Lemma 4.5. Problem (22) has a unique solution $f_{\mathbf z,\lambda}^{\Omega(m)}$ and problem (23) has a unique solution $f_{\rho,\lambda}^{\Omega(m)}$. Moreover:
(i) There holds the bound
$$\|f_{\rho,\lambda}^{\Omega(m)}\|_{C(S^{d-1})}\le\frac{2\kappa D_{\Omega(m)}(f_\rho,\lambda)_{L^2(\rho_{S^{d-1}})}}{\sqrt\lambda},\qquad(33)$$
where $\kappa$ is defined as in (21).
(ii) There holds the equality
$$\lambda f_{\mathbf z,\lambda}^{\Omega(m)}(\cdot)=\frac{2}{m}\sum_{i=1}^m\big(y_i-f_{\mathbf z,\lambda}^{\Omega(m)}(x_i)\big)K_{x_i}(\phi)(\cdot)\qquad(34)$$
and the equality
$$\lambda f_{\rho,\lambda}^{\Omega(m)}(\cdot)=2\int_Z\big(y-f_{\rho,\lambda}^{\Omega(m)}(x)\big)K_x(\phi)(\cdot)\,d\rho.\qquad(35)$$
Proof. Proof of (i). Since $\mathcal E_\rho(f_{\rho,\lambda}^{\Omega(m)})\ge\mathcal E_\rho(f_\rho)$, we have by (2) that
$$\frac{\lambda}{2}\|f_{\rho,\lambda}^{\Omega(m)}\|_{H_{\Omega(m)}}^2\le\mathcal E_\rho(f_{\rho,\lambda}^{\Omega(m)})-\mathcal E_\rho(f_\rho)+\frac{\lambda}{2}\|f_{\rho,\lambda}^{\Omega(m)}\|_{H_{\Omega(m)}}^2=\inf_{g\in H_{\Omega(m)}}\Big\{\mathcal E_\rho(g)-\mathcal E_\rho(f_\rho)+\frac{\lambda}{2}\|g\|_{H_{\Omega(m)}}^2\Big\}=\inf_{g\in H_{\Omega(m)}}\Big\{\|g-f_\rho\|_{L^2(\rho_{S^{d-1}})}^2+\frac{\lambda}{2}\|g\|_{H_{\Omega(m)}}^2\Big\}.\qquad(36)$$
By (36) and (21) we have (33).
Proof of (ii). By the definition of $f_{\mathbf z,\lambda}^{\Omega(m)}$ and (32) we have
$$0=\nabla_f\Big(\mathcal E_{\mathbf z}(f)+\frac{\lambda}{2}\|f\|_{H_{\Omega(m)}}^2\Big)\Big|_{f=f_{\mathbf z,\lambda}^{\Omega(m)}},$$
i.e.,
$$0=\nabla_f\mathcal E_{\mathbf z}(f)\big|_{f=f_{\mathbf z,\lambda}^{\Omega(m)}}+\lambda\nabla_f\Big(\frac{1}{2}\|f\|_{H_{\Omega(m)}}^2\Big)\Big|_{f=f_{\mathbf z,\lambda}^{\Omega(m)}}=-\frac{2}{m}\sum_{i=1}^m\big(y_i-f_{\mathbf z,\lambda}^{\Omega(m)}(x_i)\big)K_{x_i}(\phi)(\cdot)+\lambda f_{\mathbf z,\lambda}^{\Omega(m)}(\cdot).$$
Hence (34) holds. We can prove (35) in the same way.
Lemma 4.6. The solutions $f_{\mathbf z,\lambda}^{\Omega(m)}$ and $f_{\rho,\lambda}^{\Omega(m)}$ satisfy the inequality
$$\|f_{\mathbf z,\lambda}^{\Omega(m)}-f_{\rho,\lambda}^{\Omega(m)}\|_{H_{\Omega(m)}}\le\frac{2\mathcal A(\mathbf z)}{\lambda},\qquad(37)$$
where
$$\mathcal A(\mathbf z)=\Big\|\int_Z\big(y-f_{\rho,\lambda}^{\Omega(m)}(x)\big)K_x(\phi)(\cdot)\,d\rho-\frac{1}{m}\sum_{i=1}^m\big(y_i-f_{\rho,\lambda}^{\Omega(m)}(x_i)\big)K_{x_i}(\phi)(\cdot)\Big\|_{H_{\Omega(m)}}.$$
Proof. By (30) we have
$$a^2-b^2\ge2(a-b)b,\quad a\in\mathbb R,\ b\in\mathbb R.$$
It follows that
$$\mathcal E_{\mathbf z}(f_{\mathbf z,\lambda}^{\Omega(m)})-\mathcal E_{\mathbf z}(f_{\rho,\lambda}^{\Omega(m)})=\frac{1}{m}\sum_{i=1}^m\Big[\big(y_i-f_{\mathbf z,\lambda}^{\Omega(m)}(x_i)\big)^2-\big(y_i-f_{\rho,\lambda}^{\Omega(m)}(x_i)\big)^2\Big]\ge-\frac{2}{m}\sum_{i=1}^m\big(y_i-f_{\rho,\lambda}^{\Omega(m)}(x_i)\big)\Big(f_{\mathbf z,\lambda}^{\Omega(m)}(x_i)-f_{\rho,\lambda}^{\Omega(m)}(x_i)\Big)=\Big\langle f_{\mathbf z,\lambda}^{\Omega(m)}-f_{\rho,\lambda}^{\Omega(m)},\ -\frac{2}{m}\sum_{i=1}^m\big(y_i-f_{\rho,\lambda}^{\Omega(m)}(x_i)\big)K_{x_i}(\phi)(\cdot)\Big\rangle_{H_{\Omega(m)}},\qquad(38)$$
where we have used the reproducing property
$$f_{\mathbf z,\lambda}^{\Omega(m)}(x_i)-f_{\rho,\lambda}^{\Omega(m)}(x_i)=\big\langle f_{\mathbf z,\lambda}^{\Omega(m)}-f_{\rho,\lambda}^{\Omega(m)},\ K_{x_i}(\phi)(\cdot)\big\rangle_{H_{\Omega(m)}}.$$
By the definition of $f_{\mathbf z,\lambda}^{\Omega(m)}$ we have
$$0\ge\mathcal E_{\mathbf z}(f_{\mathbf z,\lambda}^{\Omega(m)})+\frac{\lambda}{2}\|f_{\mathbf z,\lambda}^{\Omega(m)}\|_{H_{\Omega(m)}}^2-\mathcal E_{\mathbf z}(f_{\rho,\lambda}^{\Omega(m)})-\frac{\lambda}{2}\|f_{\rho,\lambda}^{\Omega(m)}\|_{H_{\Omega(m)}}^2.$$
On the other hand, by the above inequality, (38) and (31) we have
$$0\ge\Big\langle f_{\mathbf z,\lambda}^{\Omega(m)}-f_{\rho,\lambda}^{\Omega(m)},\ -\frac{2}{m}\sum_{i=1}^m\big(y_i-f_{\rho,\lambda}^{\Omega(m)}(x_i)\big)K_{x_i}(\phi)(\cdot)\Big\rangle_{H_{\Omega(m)}}+\big\langle f_{\mathbf z,\lambda}^{\Omega(m)}-f_{\rho,\lambda}^{\Omega(m)},\ \lambda f_{\rho,\lambda}^{\Omega(m)}\big\rangle_{H_{\Omega(m)}}+\lambda\|f_{\mathbf z,\lambda}^{\Omega(m)}-f_{\rho,\lambda}^{\Omega(m)}\|_{H_{\Omega(m)}}^2.$$
Since (35) gives $\lambda f_{\rho,\lambda}^{\Omega(m)}(\cdot)=2\int_Z\big(y-f_{\rho,\lambda}^{\Omega(m)}(x)\big)K_x(\phi)(\cdot)\,d\rho$, we obtain
$$0\ge2\Big\langle f_{\mathbf z,\lambda}^{\Omega(m)}-f_{\rho,\lambda}^{\Omega(m)},\ \int_Z\big(y-f_{\rho,\lambda}^{\Omega(m)}(x)\big)K_x(\phi)(\cdot)\,d\rho-\frac{1}{m}\sum_{i=1}^m\big(y_i-f_{\rho,\lambda}^{\Omega(m)}(x_i)\big)K_{x_i}(\phi)(\cdot)\Big\rangle_{H_{\Omega(m)}}+\lambda\|f_{\mathbf z,\lambda}^{\Omega(m)}-f_{\rho,\lambda}^{\Omega(m)}\|_{H_{\Omega(m)}}^2.$$
By the Cauchy-Schwarz inequality we have
$$\lambda\|f_{\mathbf z,\lambda}^{\Omega(m)}-f_{\rho,\lambda}^{\Omega(m)}\|_{H_{\Omega(m)}}^2\le2\Big\langle f_{\rho,\lambda}^{\Omega(m)}-f_{\mathbf z,\lambda}^{\Omega(m)},\ \int_Z\big(y-f_{\rho,\lambda}^{\Omega(m)}(x)\big)K_x(\phi)(\cdot)\,d\rho-\frac{1}{m}\sum_{i=1}^m\big(y_i-f_{\rho,\lambda}^{\Omega(m)}(x_i)\big)K_{x_i}(\phi)(\cdot)\Big\rangle_{H_{\Omega(m)}}\le2\mathcal A(\mathbf z)\,\|f_{\mathbf z,\lambda}^{\Omega(m)}-f_{\rho,\lambda}^{\Omega(m)}\|_{H_{\Omega(m)}}.$$
We then obtain (37).
Lemma 4.7. For the solutions $f_{\mathbf z,\lambda}^{\Omega(m)}$ and $f_{\rho,\lambda}^{\Omega(m)}$ we have, with confidence $1-\delta$,
$$\|f_{\mathbf z,\lambda}^{\Omega(m)}-f_{\rho,\lambda}^{\Omega(m)}\|_{H_{\Omega(m)}}\le\Big(\frac{4\kappa M}{\lambda\sqrt m}+\frac{D_{\Omega(m)}(f_\rho,\lambda)_{L^2(\rho_{S^{d-1}})}}{\lambda^{3/2}\sqrt m}\Big)\log\frac{4}{\delta}.\qquad(39)$$
Proof. It is easy to see that
$$\mathcal A(\mathbf z)\le\Big\|\int_ZyK_x(\phi)(\cdot)\,d\rho-\frac{1}{m}\sum_{i=1}^my_iK_{x_i}(\phi)(\cdot)\Big\|_{H_{\Omega(m)}}+\Big\|\int_{S^{d-1}}f_{\rho,\lambda}^{\Omega(m)}(x)K_x(\phi)(\cdot)\,d\rho_{S^{d-1}}-\frac{1}{m}\sum_{i=1}^mf_{\rho,\lambda}^{\Omega(m)}(x_i)K_{x_i}(\phi)(\cdot)\Big\|_{H_{\Omega(m)}}=:\mathcal B(\mathbf z)+\mathcal C(\mathbf z).\qquad(40)$$
Take $\xi(x,y)(\cdot)=yK_x(\phi)(\cdot)$. Then
$$\|\xi(x,y)(\cdot)\|_{H_{\Omega(m)}}=|y|\sqrt{K_x(\phi)(x)}\le M\kappa.\qquad(41)$$
Combining (41) with (27) we have, with confidence $1-\delta$, that
$$\mathcal B(\mathbf z)\le\frac{2M\kappa}{\sqrt m}\log\frac{2}{\delta}.\qquad(42)$$
Also, take $\eta(x)(\cdot)=f_{\rho,\lambda}^{\Omega(m)}(x)K_x(\phi)(\cdot)$. Then
$$\|\eta(x)(\cdot)\|_{H_{\Omega(m)}}=|f_{\rho,\lambda}^{\Omega(m)}(x)|\sqrt{K_x(\phi)(x)}\le\frac{4\kappa D_{\Omega(m)}(f_\rho,\lambda)_{L^2(\rho_{S^{d-1}})}}{\sqrt\lambda}.\qquad(43)$$
Combining (43) with (27) we have, with confidence $1-\delta$, that
$$\mathcal C(\mathbf z)\le\frac{4\kappa D_{\Omega(m)}(f_\rho,\lambda)_{L^2(\rho_{S^{d-1}})}}{\sqrt\lambda\,\sqrt m}\log\frac{2}{\delta}.\qquad(44)$$
Collecting (44), (42), (40) and (37), we arrive at (39).

5. Proofs of Theorems and Propositions

Proof of Theorem 2.1. Suppose $\Delta_\phi^{S^{d-1}}$ is not dense in $L^2(S^{d-1})$, i.e.,
$$\mathrm{cl\,span}(\Delta_\phi^{S^{d-1}})\neq L^2(S^{d-1}).$$
Then by Proposition 2.1 we know that $\big(\mathrm{cl\,span}(\Delta_\phi^{S^{d-1}})\big)^\perp\neq\{0\}$, so there is a nonzero functional $F\in(L^2(S^{d-1}))^*$ such that
$$F(f)=0,\quad f\in\mathrm{cl\,span}(\Delta_\phi^{S^{d-1}}).$$
It follows that $F(\phi(x\cdot y))=0$ (as a function of $x$) for all $y\in S^{d-1}$. By the Riesz representation theorem, $F$ corresponds to a nonzero $h\in L^2(S^{d-1})$ in such a way that
$$F(f)=\int_{S^{d-1}}f(x)h(x)\,d\sigma(x),\quad f\in L^2(S^{d-1}).$$
Consequently, $\int_{S^{d-1}}\phi(x\cdot y)h(y)\,d\sigma(y)=0$ for all $x\in S^{d-1}$, which gives
$$\int_{S^{d-1}}\int_{S^{d-1}}\phi(x\cdot y)h(y)\,d\sigma(y)\,h(x)\,d\sigma(x)=0.\qquad(45)$$
Combining (45) with (14) we have
$$\sum_{l=0}^{+\infty}\widehat{a_l^\eta(\phi)}\sum_{k=1}^{\dim\mathcal H_l(S^{d-1})}\Big(\int_{S^{d-1}}h(y)Y_k^l(y)\,d\sigma(y)\Big)^2=0.$$
It follows that $\int_{S^{d-1}}h(y)Y_k^l(y)\,d\sigma(y)=0$ for all $l\ge0$ and all $k$. Therefore $h=0$, a contradiction.
Proof of Proposition 2.3. By the definition of $\langle\cdot,\cdot\rangle_\phi$ and the definition of the kernel $K_x^*(\phi,y)$ in (18), we have, for any $f(x)=\sum_{x_k\in\Omega(n)}c_k\mu_k^{(n)}T_x(\phi)(x_k)$, that
$$\langle f,K_x^*(\phi,\cdot)\rangle_\phi=\sum_{x_k\in\Omega(n)}c_k\mu_k^{(n)}T_x(\phi)(x_k)=f(x).$$
Hence (19) holds. We now show (20). In fact, by the Cauchy-Schwarz inequality we have
$$|f(x)|=\Big|\sum_{x_k\in\Omega(n)}c_k\mu_k^{(n)}T_x(\phi)(x_k)\Big|\le\|c\|_{l^2(\Omega(n))}\Big(\sum_{x_k\in\Omega(n)}\mu_k^{(n)}|T_x(\phi)(x_k)|^2\Big)^{1/2}=\|f\|_\phi\Big(\sum_{x_k\in\Omega(n)}\mu_k^{(n)}|\phi(x_k\cdot x)|^2\Big)^{1/2}.$$
On the other hand, by the localized kernel theory for Jacobi polynomials in [76] (see Lemma 2.5 in [76]), for a given $n$ there is an operator $V_n:L_{W_\eta}^1\to\Pi_{2n}[-1,1]$ (where $\Pi_n[-1,1]$ is the set of all algebraic polynomials of degree at most $n$ on $[-1,1]$ and $\eta=\frac{d-2}{2}$) such that $V_n(\phi)\in\Pi_{2n}[-1,1]$, $V_n(g)=g$ for any $g\in\Pi_n[-1,1]$, and there is a constant $c>0$ such that
$$\|V_n(\phi)\|_{C([-1,1])}\le c\|\phi\|_{C([-1,1])},\qquad\|V_n(\phi)-\phi\|_{C([-1,1])}\le cE_n(\phi)_{C([-1,1])},$$
where
$$E_n(\phi)_{C([-1,1])}=\inf_{p\in\Pi_n[-1,1]}\|\phi-p\|_{C([-1,1])}.$$
It follows by the Minkowski inequality and inequality (17) that
$$\begin{aligned}\Big(\sum_{x_k\in\Omega(n)}\mu_k^{(n)}|\phi(x_k\cdot x)|^2\Big)^{1/2}&\le\Big(\sum_{x_k\in\Omega(n)}\mu_k^{(n)}|(\phi-V_n(\phi))(x_k\cdot x)|^2\Big)^{1/2}+\Big(\sum_{x_k\in\Omega(n)}\mu_k^{(n)}|V_n(\phi)(x_k\cdot x)|^2\Big)^{1/2}\\&\le cE_n(\phi)_{C([-1,1])}\Big(\sum_{x_k\in\Omega(n)}\mu_k^{(n)}\Big)^{1/2}+O(1)\Big(\int_{S^{d-1}}|V_n(\phi)(x\cdot y)|^2\,d\sigma(y)\Big)^{1/2}\\&=O(1)E_n(\phi)_{C([-1,1])}+O(1)\Big(\int_{-1}^1|V_n(\phi)(u)|^2W_\eta(u)\,du\Big)^{1/2}=O(1).\end{aligned}$$
Thus (20) holds. Here we have also used the Funk-Hecke formula (see (1.2.11) in [63])
$$\int_{S^{d-1}}f(x\cdot y)Y_n(y)\,d\sigma(y)=\lambda_n(f)Y_n(x),\quad Y_n\in\mathcal H_n(S^{d-1}),$$
with
$$\lambda_n(f)=\omega_{d-1}\int_{-1}^1f(t)\frac{C_n^\eta(t)}{C_n^\eta(1)}W_\eta(t)\,dt,\quad\eta=\frac{d-2}{2}.$$
Proof of Theorem 3.1. By (39) and (21) we have
$$\|f_{\mathbf z,\lambda}^{\Omega(m)}-f_{\rho,\lambda}^{\Omega(m)}\|_{L^2(\rho_{S^{d-1}})}\le\|f_{\mathbf z,\lambda}^{\Omega(m)}-f_{\rho,\lambda}^{\Omega(m)}\|_{C(S^{d-1})}\le\kappa\|f_{\mathbf z,\lambda}^{\Omega(m)}-f_{\rho,\lambda}^{\Omega(m)}\|_{H_{\Omega(m)}}\le\Big(\frac{4\kappa M}{\lambda\sqrt m}+\frac{D_{\Omega(m)}(f_\rho,\lambda)_{L^2(\rho_{S^{d-1}})}}{\lambda^{3/2}\sqrt m}\Big)\log\frac{4}{\delta}.\qquad(46)$$
Substituting (46) into (24) we obtain (26).

Author Contributions

Conceptualization, Ran X.X.; methodology, Sheng B.H.; validation, Ran X.X.; formal analysis, Ran X.X.; resources, Sheng B.H. and Wang S.H.; writing—original draft preparation, Ran X.X. and Sheng B.H.; writing—review and editing, Sheng B.H. and Wang S.H.; supervision, Sheng B.H. All authors have read and agreed to the published version of the manuscript.

Funding

This work is supported by the National Natural Science Foundation of China under Grants No.61877039, the NSFC/RGC Joint Research Scheme (Project No. 12061160462 and N_CityU102/20) of China and Natural Science Foundation of Jiangxi Province of China (20232BAB201021).

Data Availability Statement

The original contributions presented in the study are included in the article, further inquiries can be directed to the corresponding author.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Krizhevsky, A.; Sutskever, I.; Hinton, G.E. ImageNet classification with deep convolutional neural networks. Communications of the ACM., 2017, 60, 84–90. [Google Scholar] [CrossRef]
  2. Wu, Y.; Schuster, M.; Chen, Z.; Le, Q.V.; Norouzi, M.; Macherey, W.; Cao, Y.; Gao, Q. Google’s neural machine translation system: Bridging the gap between human and machine translation. arXiv 2016, arXiv:1609.08144v2 [cs.CL]. [Google Scholar]
  3. Alipanahi, B.; Delong, A.; Weirauch, M.T.; Frey, B.J. Predicting the sequence specificities of DNA- and RNA-binding proteins by deep learning. Nature Biotechnology 2015, 33, 831–838. [Google Scholar] [CrossRef] [PubMed]
  4. Chui, C.K.; Lin, S.B.; Zhou, D. X. Construction of neural networks for realization of localized deep learning. arXiv 2018, arXiv:1803.03503 [cs.LG]. [Google Scholar] [CrossRef]
  5. Chui, C. K.; Lin, S.B.; Zhou, D.X. Deep neural networks for rotation-invariance approximation and learning. Anal. and Appl. 2019, 17, 737–772. [Google Scholar] [CrossRef]
  6. Fang, Z.Y.; Feng, H.; Huang, S.; Zhou, D.X. Theory of deep convolutional neural networks II: spherical analysis. Neural Networks. 2020, 131, 154–162. [Google Scholar] [CrossRef] [PubMed]
  7. Feng, H.; Huang, S.; Zhou, D.X. Generalization analysis of CNNs for classification on spheres. IEEE Transactions on Neural Networks and Learning Systems. [CrossRef] [PubMed]
  8. Zhou, D. X. Deep distributed convolutional neural networks: universality. Anal. Appl. 2018, (16), 895–919. [Google Scholar] [CrossRef]
  9. Zhou, D. X. Universality of deep convolutional neural networks. Appl. Comput. Harmon. Anal. 2020, 48, 787–794. [Google Scholar] [CrossRef]
  10. Cucker, F.; Smale, S. On the mathematical foundations of learning. Bull.Amer.Math.Soc. 2001, 39, 1–49. [Google Scholar] [CrossRef]
  11. An, C.P.; Chen, X.J.; Sloan, I.H.; Womersley, R.S. Regularized least squares approximations on the sphere using spherical designs. SIAM J. Numer. Anal. 2012, 50, 1513–1534. [Google Scholar] [CrossRef]
  12. An, C.P.; Wu, H.N. Lasso hyperinterpolation over general regions. SIAM J. Sci. Comput. 2021, 43, A3967–A3991. [Google Scholar] [CrossRef]
  13. An, C.P.; Ran, J.S. Hard thresholding hyperinterpolation over general regions. arXiv 2023, arXiv:2209.14634v2 [math.NA]. [Google Scholar]
  14. De Mol, C.; De Vito, E.; Rosasco, L. Elastic-net regularization in learning theory. J. Complexity, 2009, 25, 201–230. [Google Scholar] [CrossRef]
  15. Fischer, S.; Steinwart, I. Sobolev norm learning rates for regularized least-squares algorithms. J.Mach.Learn.Res. 2020, 21, 8464–8501. [Google Scholar]
  16. Lai, J.F.; Li, Z.F.; Huang, D.G.; Lin, Q. The optimality of kernel classifiers in Sobolev space. arXiv 2024, arXiv:2402.01148v1 [math.ST]. [Google Scholar]
  17. Sun, H.W.; Wu, Q. Least square regression with indefinite kernels and coefficient regularization. Appl. Comput. Harmon. Anal. 2011, 30, 96–109. [Google Scholar] [CrossRef]
  18. Wu, Q.; Zhou, D.X. Learning with sample dependent hypothesis spaces. Comput. Math. Appl. 2008, 56, 2896–2907. [Google Scholar] [CrossRef]
  19. Chen, H.; Wu, J.T.; Chen, D.R. Semi-supervised learning for regression based on the diffusion matrix (in Chinese). Sci Sin Math. 2014, 44, 399–408. [Google Scholar]
  20. Sun, X.J.; Sheng, B.H. The learning rate of kernel regularized regression associated with a correntropy-induced loss. Adv. in Math.(Beijing) 2024, 53, 633–652. [Google Scholar]
  21. Wu, Q.; Zhou, D.X. Analysis of support vector machine classification. J. Comput.Anal. Appl. 2006, 8, 99–119. [Google Scholar]
  22. Cucker, F.; Zhou, D. X. Learning Theory: An Approximation Theory Viewpoint. Cambridge University Press, New York, 2007.
  23. Sheng, B. H. Reproducing property of bounded linear operators and kernel regularized least square regressions. Int. J. Wavelets Multiresolution Inf. Process. 2024, 2450013. [Google Scholar] [CrossRef]
  24. Steinwart, I.; and Christmann, A. Support Vector Machines,Springer-Verlag, New York, 2008.
  25. Lin, S.B.; Wang, D.; Zhou, D.X. Sketching with spherical designs for noisy data fitting on spheres. SIAM J.Sci.Comput. 2024, 46, A313–A337. [Google Scholar] [CrossRef]
  26. Lin, S.B.; Zeng, J.S.; Zhang, X.Q. Constructive neural network learning. IEEE Trans. on Cybernetics. 2019, 49, 221–232. [Google Scholar] [CrossRef] [PubMed]
  27. Mhaskar, H.N.; Micchelli, C.A. Degree of approximation by neural and translation networks with single hidden layer. Adv. Appl. Math. 1995, 16, 151–183. [Google Scholar] [CrossRef]
  28. Sheng, B.H.; Zhou, S.P.; Li, H.T. On approximation by translation networks in Lp(Rk) spaces. Adv. in Math. (Beijing) 2007, 36, 29–38. [Google Scholar]
  29. Mhaskar, H. N.; Narcowich, F. J.; Ward, J. D. Approximation properties of zonal function networks using scattered data on the sphere. Adv. Comput. Math. 1999, 11, 121–137. [Google Scholar] [CrossRef]
  30. Sheng, B. H. On approximation by reproducing kernel spaces in weighted Lp-spaces. J. Syst. Sci. Complex. 2007, 20(4), 623–638. [Google Scholar] [CrossRef]
  31. Narcowich, F.J.; Ward, J.D.; Wendland, H. Sobolev error estimates and a Bernstein inequality for scattered data interpolation via radial basis functions. Constr.Approx. 2006, 24, 175–186. [Google Scholar] [CrossRef]
  32. Narcowich, F.J.; Ward, J.D. Scattered data interpolation on spheres: error estimates and locally supported basis functions. SIAM J.Math.Anal. 2002, 33, 1393–1410. [Google Scholar] [CrossRef]
  33. Narcowich, F.J.; Sun, X.P.; Ward, J.D.; Wendland, H. Direct and inverse Sobolev error estimates for scattered data interpolation via spherical basis functions. Found.Comput.Math. 2007, 7, 369–390. [Google Scholar] [CrossRef]
  34. Gröchenig, K. Sampling, Marcinkiewicz-Zygmund inequalities, approximation and quadrature rules. J.Approx.Theory 2020, 257, 105455. [Google Scholar] [CrossRef]
  35. Mhaskar, H.N.; Narcowich, F.J.; Sivakumar, N.; Ward, J.D. Approximation with interpolatory constraints. Proc.Amer.Math. Soc. 2001, 130, 1355–1364. [Google Scholar] [CrossRef]
  36. Marzo, J. Marcinkiewicz-Zygmund inequalities and interpolation by spherical harmonics. J. Funct. Anal. 2007, 250, 559–587. [Google Scholar] [CrossRef]
  37. Marzo, J.; Pridhnani, B. Sufficient conditions for sampling and interpolation on the sphere. Constr. Approx. 2014, 40, 241–257. [Google Scholar] [CrossRef]
  38. Wang, H.P. Marcinkiewicz-Zygmund inequalities and interpolation by spherical polynomials with respect to doubling weights. J.Math.Anal.Appl. 2015, 423, 1630–1649. [Google Scholar] [CrossRef]
  39. Gia, T.L.; Sloan, I.H. The uniform norm of hyperinterpolation on the unit sphere in an arbitrary number of dimensions. Constr. Approx. 2001, 17, 249–265. [Google Scholar] [CrossRef]
  40. Sloan, I.H. Polynomial interpolation and hyperinterpolation over general regions. J.Approx.Theory 1995, 83, 238–254. [Google Scholar] [CrossRef]
  41. Sloan, I.H.; Womersley, R.S. Constructive polynomial approximation on the sphere. J.Approx. Theory 2000, 103, 91–118. [Google Scholar] [CrossRef]
  42. Wang, H.P. Optimal lower estimates for the worst case cubature error and the approximation by hyperinterpolation operators in the Sobolev space sertting on the sphere. Int. J. Wavelets Multiresolution Inf. Process. 2009, 7, 813–823. [Google Scholar] [CrossRef]
  43. Wang, H.P.; Wang, K.; Wang, X.L. On the norm of the hyperinterpolation operator on the d-dimensional cube. Comput. Math. Appl. 2014, 68, 632–638. [Google Scholar]
  44. Sloan, I.H.; Womersley, R.S. Filtered hyperinterpolation: a constructive polynomial approximation on the sphere. Int.J.Geomath. 2012, 3, 95–117. [Google Scholar] [CrossRef]
  45. Bondarenko, A.; Radchenko, D.; Viazovska, M. Well-separated spherical designs. Constr. Approx. 2015, 41, 93–112. [Google Scholar] [CrossRef]
  46. Hesse, K.; Womersley, R.S. Numerical integration with polynomial exactness over a spherical cap. Adv. Comput. Math. 2012, 36, 451–483. [Google Scholar] [CrossRef]
  47. Delsarte, P.; Goethals, J.M.; Seidel, J.J. Spherical codes and designs. Geom.Dedicata. 1977, 6, 363–388. [Google Scholar] [CrossRef]
  48. An, C.P.; Chen, X.J.; Sloan, I.H.; Womersley, R.S. Well conditioned spherical designs for integration and interpolation on the two-sphere. SIAM J. Numer.Anal. 2010, 48, 2135–2157. [Google Scholar] [CrossRef]
  49. Chen, X.; Frommer, A.; Lang, B. Computational existence proof for spherical t-designs. Numer. Math. 2010, 117, 289–305. [Google Scholar] [CrossRef]
  50. An, C.P.; Wu, H.N. Bypassing the quadrature exactness assumption of hyperinterpolation on the sphere. J.Complexity. 2024, 80, 101789. [Google Scholar] [CrossRef]
  51. An, C.P.; Wu, H.N. On the quadrature exactness in hyperinterpolation. BIT Numer. Math. 2022, 62, 1899–1919. [Google Scholar] [CrossRef]
  52. Sheng, B.H. The covering numbers for some periodic reproducing kernel spaces. Acta Math. Scientia 2009, 29A(6), 1590–1600. [Google Scholar]
  53. Sheng, B.H. Estimate the norm of the Mercer kernel matrices with discrete orthogonal transforms. Acta Math. Hungar. 2009, 122, 339–355. [Google Scholar] [CrossRef]
  54. Sheng, B.H.; Wang, J.L.; Li, P. On the covering number of some Mercer kernel Hilbert spaces. J. Complexity 2008, 24, 241–258. [Google Scholar]
  55. Zhang, C.P.; Sheng, B.H.; Chen, Z.X. Applications of Bernstein-Durrmeyer operators in estimating the norm of Mercer kernel matrices. Anal. Theory Appl. 2008, 24, 74–86. [Google Scholar]
  56. Sun, X. J.; Sheng, B. H.; Liu, L.; Pan, X. L. On the density of translation networks defined on the unit ball. Math. Found. Comput. 2023. [Google Scholar] [CrossRef]
  57. Parhi, R.; Nowak, R.D. Banach space representer theorems for neural networks and ridge splines. J. Mach. Learn Res. 2021, 22, 1–40. [Google Scholar]
  58. Oono, K.; Suzuki, Y.J. Approximation and non-parameteric estimate of ResNet-type convolutional neural networks. arXiv 2023, arXiv:1903.10047v4 [stat.ML]. [Google Scholar]
  59. Shen, G.H.; Jiao, Y.L.; Lin, Y.Y.; Huang, J. Non-asymptotic excess risk bounds for classification with deep convolutional neural networks. arXiv 2021, arXiv:2105.00292 [cs.LG]. [Google Scholar]
  60. Mallat, S. Understanding deep convolutional networks. Phil. Trans. R. Soc. 2016, 374A, 20150203. [Google Scholar] [CrossRef]
  61. Mhaskar, H.N.; Narcowich, F.J.; Ward, J.D. Spherical Marcinkiewicz-Zygmund inequalities and positive quadrature. Math. Comput. 2001, 70, 1113–1130. Corrigendum: Math. Comput. 2001, 71, 453–454. [Google Scholar] [CrossRef]
  62. Wang, H.P.; Wang, K. Optimal recovery of Besov classes of generalized smoothness and Sobolev class on the sphere. J. Complexity 2016, 32, 40–52. [Google Scholar] [CrossRef]
  63. Dai, F.; Xu, Y. Approximation Theory and Harmonic Analysis on Spheres and Balls. Springer, New York, 2013.
  64. Müller, C. Spherical Harmonic, Springer-Verlag, Berlin, 1966.
  65. Cheney, W.; Light, W. A Course in Approximation Theory, China Machine Press, Beijing, 2004.
  66. Brown, G.; Dai, F. Approximation of smooth functions on compact two-point homogeneous spaces. J. Funct. Anal. 2005, 220, 401–422. [Google Scholar] [CrossRef]
  67. Dai, F.; Wang, H.P. Positive cubature formulas and Marcinkiewicz-Zygmund inequalities on spherical caps. Constr. Approx. 2010, 31, 1–36. [Google Scholar] [CrossRef]
  68. Dai, F. On generalized hyperinterpolation on the sphere. Proc. Amer. Math. Soc. 2006, 134, 2931–2941. [Google Scholar] [CrossRef]
  69. Sheng, B. H.; Wang, J. L. Moduli of smoothness, K-functionals and Jackson-type inequalities associated with kernel function approximation in learning theory. Anal. Appl. 2024. [Google Scholar] [CrossRef]
  70. Bauschke, H. H.; Combettes, P. L. Convex Analysis and Monotone Operator Theory in Hilbert Spaces, Springer, New York, 2010.
  71. Sheng, B.H.; Xiang, D.H. The convergence rate for a K-functional in learning theory. J. Inequal. Appl. 2010, 249507. [Google Scholar] [CrossRef]
  72. Lin, S.B.; Wang, Y.G.; Zhou, D.X. Distributed filtered hyperinterpolation for noisy data on the sphere. SIAM J. Numer. Anal. 2021, 59, 634–659. [Google Scholar] [CrossRef]
  73. Montúfar, G.; Wang, Y.G. Distributed learning via filtered hyperinterpolation on manifolds. Found.Comput.Math. 2022, 22, 1219–1271. [Google Scholar] [CrossRef]
  74. Smale, S.; Zhou, D.X. Learning theory estimates via integral operators and their applications. Constr. Approx. 2007, 26, 153–172. [Google Scholar] [CrossRef]
  75. Aronszajn, N. Theory of reproducing kernels. Trans. Amer. Math. Soc. 1950, 68, 337–404. [Google Scholar] [CrossRef]
  76. Kyriazis, G.; Petrushev, P.; Xu, Y. Jacobi decomposition of weighted Triebel-Lizorkin and Besov spaces. arXiv 2006, arXiv:math/0610624 [math.CA]. [Google Scholar] [CrossRef]