1. Introduction
It is known that convolutional neural networks provide various models and algorithms for processing data in many fields such as computer vision (see e.g. [1]), natural language processing (see e.g. [2]), and sequence analysis in bioinformatics (see e.g. [3]). Regularized neural network learning has thus become an attractive research topic (see e.g. [4,5,6,7,8,9]). In the present paper, we give a theoretical analysis of the convergence rate of regularized regression associated with a zonal translation network on the unit sphere.
Let $X$ be a compact subset of the $d$-dimensional Euclidean space $\mathbb{R}^d$ with the usual norm $\|x\|$ for $x \in \mathbb{R}^d$, and let $Y$ be a nonempty closed subset of $[-M, M]$ for a given $M > 0$. The aim of the regression learning problem is to learn, from a hypothesis function space, the target function which describes the relationship between the input $x \in X$ and the output $y \in Y$. In most cases, the target function is offered through a set of observations $\mathbf{z} = \{(x_i, y_i)\}_{i=1}^{m}$ which has been drawn independently and identically distributed (i.i.d.) according to a joint probability distribution (measure) $\rho$ on $Z = X \times Y$, where $\rho(y \mid x)$ is the conditional probability of $y$ for a given $x$ and $\rho_X$ is the marginal probability on $X$, i.e., for every integrable function $f(x, y)$ there holds
$$\int_{Z} f(x, y)\,d\rho = \int_{X} \int_{Y} f(x, y)\,d\rho(y \mid x)\,d\rho_X.$$
For a given normed space $(B, \|\cdot\|_B)$ consisting of real functions on $X$, we define the regularized learning framework with $B$ as
$$f_{\mathbf{z},\lambda} = \arg\min_{f \in B} \left\{ \mathcal{E}_{\mathbf{z}}(f) + \lambda \|f\|_B^2 \right\}, \qquad (1)$$
where $\lambda > 0$ is the regularization parameter and $\mathcal{E}_{\mathbf{z}}(f)$ is the empirical mean
$$\mathcal{E}_{\mathbf{z}}(f) = \frac{1}{m} \sum_{i=1}^{m} \left( f(x_i) - y_i \right)^2.$$
The optimal target function is the regression function
$$f_\rho(x) = \int_{Y} y\,d\rho(y \mid x), \qquad x \in X,$$
satisfying $\mathcal{E}(f_\rho) = \inf_f \mathcal{E}(f)$, where
$$\mathcal{E}(f) = \int_{Z} \left( f(x) - y \right)^2 d\rho$$
and the inf is taken over all $\rho_X$-measurable functions $f$. Moreover, there holds the famous equality (see e.g. [10])
$$\mathcal{E}(f) - \mathcal{E}(f_\rho) = \|f - f_\rho\|_{L^2_{\rho_X}}^2. \qquad (2)$$
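To make framework (1) concrete, the following minimal sketch (in Python, with synthetic data and a toy polynomial hypothesis space, none of which come from the paper) solves the regularized least squares problem when $B$ is finite-dimensional and $\|f\|_B$ is the coefficient norm:

```python
# Minimal sketch of the regularized least-squares framework (1):
# minimize (1/m) * sum_i (f(x_i) - y_i)^2 + lam * ||f||_B^2
# over a toy finite-dimensional hypothesis space B = span{1, x, ..., x^(K-1)}.
# All names (features, data) are illustrative, not from the paper.
import numpy as np

rng = np.random.default_rng(0)
m, K, lam = 200, 8, 1e-3

x = rng.uniform(-1.0, 1.0, m)
y = np.sin(np.pi * x) + 0.1 * rng.standard_normal(m)   # noisy samples of a target

Phi = np.stack([x**j for j in range(K)], axis=1)        # polynomial features

# With f = Phi @ c and ||f||_B^2 = ||c||^2, (1) reduces to ridge regression,
# whose normal equations are (Phi^T Phi / m + lam I) c = Phi^T y / m.
c = np.linalg.solve(Phi.T @ Phi / m + lam * np.eye(K), Phi.T @ y / m)

emp_err = np.mean((Phi @ c - y) ** 2)
print(f"empirical error = {emp_err:.4f}, penalty ||f||_B^2 = {c @ c:.4f}")
```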
The choices for the hypothesis space $B$ in (1) are rich. For example, C. P. An et al. chose the algebraic polynomial class as $B$ (see [11,12,13]). In [14], C. De Mol et al. chose a dictionary as $B$. Recently, some papers chose the Sobolev space as the hypothesis space $B$ (see [15,16]). By the kernel method we traditionally mean replacing $B$ with a reproducing kernel Hilbert space (RKHS) $(\mathcal{H}_K, \|\cdot\|_K)$, which is a Hilbert space consisting of real functions defined on $X$ for which there is a Mercer kernel $K(x, y)$ on $X \times X$ (i.e., $K(x, y)$ is a continuous and symmetric function on $X \times X$, and for any $n \geq 1$ and any $\{x_1, \dots, x_n\} \subset X$ the Mercer matrices $(K(x_i, x_j))_{i,j=1}^{n}$ are positive semi-definite) such that
$$f(x) = \langle f, K(x, \cdot) \rangle_K, \qquad f \in \mathcal{H}_K,\ x \in X, \qquad (3)$$
and there holds the embedding inequality
$$|f(x)| \leq c\,\|f\|_K, \qquad f \in \mathcal{H}_K,\ x \in X, \qquad (4)$$
where $c$ is a constant independent of $f$ and $x$. There are two results about the optimal solution $f_{\mathbf{z},\lambda}$. The reproducing property (3) yields the representation
$$f_{\mathbf{z},\lambda}(x) = \sum_{i=1}^{m} c_i K(x, x_i), \qquad c_i \in \mathbb{R}. \qquad (5)$$
The embedding inequality (4) yields the inequality
$$\|f_{\mathbf{z},\lambda}\|_\infty \leq c\,\|f_{\mathbf{z},\lambda}\|_K. \qquad (6)$$
Representation (5) is the theoretical basis for kernel regularized regression (see e.g. [17,18]). Inequality (6) is the key inequality for bounding the learning rate with the covering number method (see e.g. [19,20,21]). For other techniques of the kernel method one can consult [10,22,23,24].
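As an illustration of representation (5), the following sketch implements kernel regularized regression with a Gaussian kernel standing in for a generic Mercer kernel (a toy choice for illustration, not the paper's kernel); one standard form of the solution of (1) under (5) is the linear system $(G + \lambda m I)\mathbf{c} = \mathbf{y}$:

```python
# Sketch of kernel regularized regression based on representation (5):
# f_{z,lam}(x) = sum_i c_i K(x, x_i), with c solving (G + lam*m*I) c = y
# for the square loss, where G = (K(x_i, x_j)) is the Mercer matrix.
import numpy as np

rng = np.random.default_rng(1)
m, lam = 100, 1e-2

x = rng.uniform(-1.0, 1.0, m)
y = np.cos(2 * x) + 0.1 * rng.standard_normal(m)

def K(s, t, width=0.5):                      # illustrative Mercer kernel on X
    return np.exp(-(s[:, None] - t[None, :]) ** 2 / (2 * width**2))

G = K(x, x)                                  # Mercer matrix (positive semi-definite)
c = np.linalg.solve(G + lam * m * np.eye(m), y)

x_test = np.linspace(-1, 1, 5)
f_test = K(x_test, x) @ c                    # evaluate f_{z,lam} via (5)
print(np.round(f_test, 3), np.round(np.cos(2 * x_test), 3))
```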
It is particularly important to mention here that translation networks have recently been used as the hypothesis space of regularized learning (see e.g. [25,26]). From the viewpoint of approximation theory, a simple single-layer translation network with $m$ neurons is a function space produced by translating a given function $\phi$, which can be written as
$$\Delta_{\phi,\Lambda} = \left\{ \sum_{k=1}^{m} c_k\, T_{x_k}\phi : c_k \in \mathbb{R} \right\},$$
where $\Lambda = \{x_k\}_{k=1}^{m} \subset X$ is a given node set and, for a given $a \in X$, $T_a$ is a translation operator corresponding to $X$. For example, when $X = \mathbb{R}^d$ or $X = [-\pi, \pi]$, we choose $T_a$ as the usual convolution translation operator
$$T_a\phi(x) = \phi(x - a)$$
for a $\phi$ defined on $\mathbb{R}^d$ or a $2\pi$-periodic function $\phi$ (see [27,28]). When $X = \mathbb{S}^{d-1}$ is the unit sphere in $\mathbb{R}^d$, one can choose $T_a$ as the zonal translation operator
$$T_a\phi(x) = \phi(x \cdot a)$$
for a given $\phi$ defined on the interval $[-1, 1]$ (see [29]). In [30], we defined a translation operator of this kind for a further domain.
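The two translation operators just described can be compared numerically; in the sketch below (toy kernels $\phi$ and hypothetical points, chosen only for illustration) the periodic case translates the argument, while the zonal case evaluates $\phi$ at the inner product $x \cdot a$:

```python
# Illustrative comparison of the two translation operators:
# periodic convolution translation (T_a phi)(x) = phi(x - a), and
# zonal translation (T_a phi)(x) = phi(x . a) on the sphere S^2.
import numpy as np

# periodic translation of a 2*pi-periodic function
phi_per = lambda t: np.exp(np.cos(t))          # toy 2*pi-periodic kernel
x = np.linspace(-np.pi, np.pi, 7)
a = 0.5
print(phi_per(x - a))                           # (T_a phi)(x) = phi(x - a)

# zonal translation: phi is defined on [-1, 1] and evaluated at x . a
phi_zonal = lambda t: (1.0 + t) ** 2            # toy kernel on [-1, 1]
rng = np.random.default_rng(2)
x_sph = rng.standard_normal((5, 3))
x_sph /= np.linalg.norm(x_sph, axis=1, keepdims=True)   # points on S^2
a_sph = np.array([0.0, 0.0, 1.0])
print(phi_zonal(x_sph @ a_sph))                 # (T_a phi)(x) = phi(x . a)
```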
It is easy to see that the approximation ability and the construction of a translation network depend upon the node set $\Lambda$ (see e.g. [31,32,33]). On the other hand, according to the viewpoint of [34], the quadrature rule and the Marcinkiewicz-Zygmund (MZ) inequality associated with $\Lambda$ also influence the construction of the translation network $\Delta_{\phi,\Lambda}$. Let $X$ be a bounded closed set with a measure $\mu$ satisfying $\mu(X) < +\infty$. We denote by $\Pi_n(X)$ the linear space of polynomials on $X$ of degree at most $n$, equipped with the $L^2$-inner product
$$\langle f, g \rangle = \int_X f(x)\,g(x)\,d\mu(x).$$
The $m$-point quadrature rule (QR) is
$$\int_X f(x)\,d\mu(x) \approx \sum_{k=1}^{m} w_k f(x_k), \qquad (7)$$
where $\Lambda = \{x_k\}_{k=1}^{m} \subset X$ and the weights $w_k$ are all positive for $k = 1, \dots, m$. We say the QR (7) has polynomial exactness $n$ if
$$\int_X P(x)\,d\mu(x) = \sum_{k=1}^{m} w_k P(x_k), \qquad P \in \Pi_n(X). \qquad (8)$$
The Marcinkiewicz-Zygmund (MZ) inequality based on a set $\Lambda = \{x_k\}_{k=1}^{m}$ is
$$A\,\|P\|^2 \leq \sum_{k=1}^{m} \tau_k\,|P(x_k)|^2 \leq B\,\|P\|^2, \qquad P \in \Pi_n(X), \qquad (9)$$
where $0 < A \leq B < +\infty$ and the weights $\tau_k$ in (9) may not be the same as the $w_k$ in (7) and (8).
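For intuition, the following sketch checks (8) and (9) in the simplest setting, the circle ($d = 2$) with equally spaced nodes and equal weights, where $\Pi_n$ is the space of trigonometric polynomials of degree at most $n$: the quadrature is exact for degree $n < m$, and (9) even holds with $A = B = 1$ once $2n < m$. This is an illustration only, not the spherical setting studied in the paper.

```python
# Check of the quadrature rule (7)-(8) and the MZ relation (9) on the circle
# with m equally spaced nodes and equal positive weights 2*pi/m.
import numpy as np

n, m = 5, 16                                  # degree n; m >= 2n+1 nodes
theta = 2 * np.pi * np.arange(m) / m          # node set Lambda
w = np.full(m, 2 * np.pi / m)                 # equal positive weights

rng = np.random.default_rng(3)
a, b = rng.standard_normal(n + 1), rng.standard_normal(n + 1)
js = np.arange(n + 1)

def P(t):                                     # random P in Pi_n
    return a @ np.cos(np.outer(js, t)) + b @ np.sin(np.outer(js, t))

# exactness (8): the quadrature reproduces the integral of P over [0, 2*pi)
quad = w @ P(theta)
exact = 2 * np.pi * a[0]                      # only the constant term survives
print(abs(quad - exact))                      # ~ 1e-13

# MZ relation (9): sum_k w_k |P(theta_k)|^2 equals ||P||_2^2 here, since
# |P|^2 has degree 2n < m; general node sets only give A, B bounds.
lhs = w @ P(theta) ** 2
norm2 = 2 * np.pi * a[0] ** 2 + np.pi * (a[1:] @ a[1:] + b[1:] @ b[1:])
print(abs(lhs - norm2))                       # ~ 1e-13
```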
According to the viewpoint of [34], the quadrature rule (QR) follows automatically from the Marcinkiewicz-Zygmund (MZ) inequality. H. N. Mhaskar et al. gave a method of transition from the MZ inequality to a polynomially exact QR in [35]. So the MZ inequality (9) is an important feature for describing the node set $\Lambda$. For this reason, a node set $\Lambda$ which yields an MZ inequality is given the special terminology Marcinkiewicz-Zygmund Family (MZF) (see [34,36,37,38]). However, from this literature we know that the MZFs do not totally coincide with the Lagrange interpolation nodes in the case of the unit sphere. Hyperinterpolations were then developed with the help of exact QRs (see [39,40,41,42,43]) and are applied in approximation theory and regularized learning (see e.g. [11,12,13,44]). On the other hand, we find that the problem of polynomially exact QRs has been investigated in its own right (see e.g. [45,46]). The concept of spherical $t$-design was first defined in [47] and has been investigated by many papers subsequently; one can see the classical references [48,49]. We say $\Lambda = \{x_k\}_{k=1}^{m} \subset \mathbb{S}^{d-1}$ is a spherical $t$-design if
$$\frac{1}{m} \sum_{k=1}^{m} P(x_k) = \frac{1}{\omega_{d-1}} \int_{\mathbb{S}^{d-1}} P(x)\,d\omega(x), \qquad (10)$$
where $\omega_{d-1}$ is the volume (surface area) of $\mathbb{S}^{d-1}$ and $P$ is a spherical polynomial of degree at most $t$. Moreover, in many applications the polynomially exact QR and the MZFs have been used as assumptions. For example, C. P. An et al. gave the approximation order for the hyperinterpolation approximation under the assumptions that (8), (10) and the MZ inequality (9) hold (see [50,51]). Also, in [25], Lin et al. investigated regularized regression associated with a zonal translation network by assuming the node set $\Lambda$ is a type of spherical $t$-design.
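Definition (10) can be tested numerically on a classical example: the six vertices of the octahedron form a spherical 3-design on $\mathbb{S}^2$ but not a 4-design. The sketch below compares design averages of monomials with the known sphere averages (the example is standard; the code is only an illustration):

```python
# The 6 octahedron vertices are a spherical 3-design on S^2 (definition (10))
# but not a 4-design. Sphere averages of monomials serve as the reference:
# for S^2, the first coordinate is uniform on [-1,1], so E[x1^(2k)] = 1/(2k+1).
import numpy as np

octa = np.array([[ 1, 0, 0], [-1, 0, 0],
                 [ 0, 1, 0], [ 0,-1, 0],
                 [ 0, 0, 1], [ 0, 0,-1]], dtype=float)

# degree-2 monomial x1^2: sphere average 1/3, design average matches
print(np.mean(octa[:, 0] ** 2), 1 / 3)

# degree-3 monomial x1*x2*x3: both averages are 0 by symmetry
print(np.mean(np.prod(octa, axis=1)), 0.0)

# degree-4 monomial x1^4: sphere average is 1/5, but the design average
# is 1/3, so (10) fails at t = 4 -- the octahedron is exactly a 3-design
print(np.mean(octa[:, 0] ** 4), 1 / 5)
```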
A polynomially exact QR is also a good tool in approximation theory. For example, we have used it to bound the norms of some Mercer matrices (see [52,53,54,55]). In particular, H. N. Mhaskar et al. used a polynomially exact QR to construct the first periodic translation operators (see [27]) and the zonal translation network operators (see [29]). Along this line, the translation operators defined on the unit ball, on the Euclidean space $\mathbb{R}^d$ and on the interval $[-1, 1]$ were constructed (see [28,30,56]).
The above investigations encourage us to use (8), (10) and (9) as hypothetical conditions to describe the approximation ability of zonal translation networks $\Delta_{\phi,\Lambda}$. To ensure that the single-layer translation network $\Delta_{\phi,\Lambda}$ can approximate the constant functions, it is modified as
$$\Delta_{\phi,\Lambda} + \mathbb{R} = \left\{ \sum_{k=1}^{m} c_k\, T_{x_k}\phi + c_0 : c_0, c_1, \dots, c_m \in \mathbb{R} \right\}. \qquad (11)$$
For translation networks on Euclidean domains, R. D. Nowak et al. used (11) to design regularized learning frameworks (see [57]). An algorithm was provided by S. B. Lin et al. [26] for designing such networks and was applied to construct regularized learning algorithms. In [5], (11) is used to construct deep neural network learning frameworks. The same type of investigation is given in [58,59] and [60].
In the present paper, we shall design the translation network $\Delta_{\phi,\Lambda}$ by taking $X = \mathbb{S}^{d-1}$, assuming $\Lambda$ satisfies equalities (8) and (9), and choosing the zonal translation $T_a\phi(x) = \phi(x \cdot a)$ with $\phi$ being a given integrable function on $[-1, 1]$. Under these assumptions we shall provide a learning framework with $\Delta_{\phi,\Lambda} + \mathbb{R}$ as the hypothesis space and give an error analysis.
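The following sketch (toy kernel, random node set, synthetic data, all hypothetical) illustrates the resulting learning scheme: regularized least squares over the hypothesis space $\Delta_{\phi,\Lambda} + \mathbb{R}$ of (11), with zonal translation features $\phi(x \cdot x_k)$ on $\mathbb{S}^2$:

```python
# Sketch of the learning framework studied in this paper: regularized
# least squares over the modified zonal network (11) on S^2, i.e.
# f(x) = c_0 + sum_k c_k phi(x . x_k). Node set, kernel and data are toy
# choices; the paper's analysis assumes Lambda satisfies (8) and (9).
import numpy as np

rng = np.random.default_rng(4)

def sphere_points(n):
    p = rng.standard_normal((n, 3))
    return p / np.linalg.norm(p, axis=1, keepdims=True)

phi = lambda t: np.exp(t)                     # integrable kernel on [-1, 1]
nodes = sphere_points(20)                     # node set Lambda (illustrative)

m, lam = 300, 1e-3
X = sphere_points(m)
y = X[:, 2] ** 2 + 0.05 * rng.standard_normal(m)    # noisy target on S^2

A = np.hstack([np.ones((m, 1)), phi(X @ nodes.T)])  # [1, phi(x . x_k)]
c = np.linalg.solve(A.T @ A / m + lam * np.eye(A.shape[1]), A.T @ y / m)

X_test = sphere_points(5)
f_test = np.hstack([np.ones((5, 1)), phi(X_test @ nodes.T)]) @ c
print(np.round(f_test, 3), np.round(X_test[:, 2] ** 2, 3))
```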
The contributions of the present paper are twofold. First, absorbing the ideas of [34,36,37,38] and the successful experience of [11,25,27,29,50,51,61,62], we propose the concept of the Marcinkiewicz-Zygmund inequality setting (MZIS) for scattered nodes on the unit sphere; taking this as an assumption, we show the reproducing property of the convolutional zonal translation network associated with the scattered nodes $\Lambda$. Second, we investigate kernel regularized neural network learning by combining the classical kernel approach with the convex analysis method; with this method, the convergence rate obtained can be both dimension-independent and capacity-independent. Since the translation networks are produced by the zonal translations of convolutional kernels, we call them convolutional zonal translation networks.
The paper is organized as follows. In Section 2 we first show the density of the zonal translation class and then show the reproducing property of the translation network $\Delta_{\phi,\Lambda}$. In Section 3, we present the main results of the paper: a new regression learning framework and learning setting, the error decomposition for the error analysis, and an estimate of the convergence rate. In Section 4, we give some lemmas which are used to prove the main results. The proofs of all theorems and propositions are given in Section 5.
Throughout the paper, we write $A = O(B)$ if there is a positive constant $C$ independent of $A$ and $B$ such that $A \leq C B$. In particular, by $A = O(1)$ we mean that $A$ is a bounded quantity. We write $A \asymp B$ if both $A = O(B)$ and $B = O(A)$.
2. Some Properties of the Translation Network on the Unit Sphere
Let $\mathbb{S}^{d-1} = \{x \in \mathbb{R}^d : \|x\| = 1\}$ and let $\phi$ be an integrable function on $[-1, 1]$ with respect to the weight $(1 - t^2)^{\frac{d-3}{2}}$. Then H. N. Mhaskar et al. constructed in [29] a sequence of approximation operators to show that the zonal translation class $\Delta_\phi = \operatorname{span}\{\phi(x \cdot y) : y \in \mathbb{S}^{d-1}\}$ is dense in $C(\mathbb{S}^{d-1})$ if $\hat{\phi}(n) \neq 0$ for all $n \geq 0$, where
$$\hat{\phi}(n) = \int_{-1}^{1} \phi(t)\,P_n(t)\,(1 - t^2)^{\frac{d-3}{2}}\,dt \qquad (12)$$
and $P_n$ is the $n$-th generalized Legendre polynomial, which satisfies the orthogonality relation
$$\int_{-1}^{1} P_n(t)\,P_l(t)\,(1 - t^2)^{\frac{d-3}{2}}\,dt = \lambda_n\,\delta_{n,l}$$
with $P_n(1) = 1$, and it is known that (see (B.2.1), (B.2.2) and (B.5.1) of [63])
$$\lambda_n = \frac{\omega_{d-1}}{\omega_{d-2}\,N(n, d)}.$$
It follows that
$$\phi(t) \sim \sum_{n=0}^{\infty} \frac{\hat{\phi}(n)}{\lambda_n}\,P_n(t),$$
where $\omega_{d-1}$ denotes the surface area of $\mathbb{S}^{d-1}$ and $N(n, d) = \dim \mathcal{H}_n^d$ is defined below.
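The coefficients (12) are easy to compute numerically. The sketch below does so for $d = 3$, where the weight is $1$ and $P_n$ is the classical Legendre polynomial with $P_n(1) = 1$; the kernel $\phi(t) = e^t$ is a toy choice whose coefficients are all nonzero, so the density criterion applies:

```python
# Numerical computation of phi-hat(n) in (12) for d = 3, where the weight
# (1-t^2)^((d-3)/2) equals 1 and P_n is the classical Legendre polynomial.
# phi(t) = exp(t) is an illustrative kernel with all coefficients positive.
import numpy as np
from scipy.integrate import quad
from scipy.special import eval_legendre

phi = np.exp

for n in range(6):
    coef, _ = quad(lambda t: phi(t) * eval_legendre(n, t), -1.0, 1.0)
    print(n, coef)        # phi-hat(n) > 0 for every n here
```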
For a given real number $1 \leq p < +\infty$, we denote by $L^p_\mu(X)$ the class of $\mu$-measurable functions $f$ satisfying $\int_X |f(x)|^p\,d\mu(x) < +\infty$.
Let $\mathcal{P}_n^d$ denote the space of all homogeneous polynomials of degree $n$ in $d$ variables. We denote by $L^p(\mathbb{S}^{d-1})$, $1 \leq p < +\infty$, the class of all measurable functions defined on $\mathbb{S}^{d-1}$ with the finite norm
$$\|f\|_p = \left( \int_{\mathbb{S}^{d-1}} |f(x)|^p\,d\omega(x) \right)^{1/p},$$
and for $p = +\infty$ we assume that $L^\infty(\mathbb{S}^{d-1})$ is the space $C(\mathbb{S}^{d-1})$ of continuous functions on $\mathbb{S}^{d-1}$ with the uniform norm $\|f\|_\infty = \max_{x \in \mathbb{S}^{d-1}} |f(x)|$.
For a given integer $n \geq 0$, the restriction to $\mathbb{S}^{d-1}$ of a homogeneous harmonic polynomial of degree $n$ is called a spherical harmonic of degree $n$. If $Y$ is such a polynomial, then $Y(x) = \|x\|^n\,Y(x/\|x\|)$, so that $Y$ is determined by its restriction to the unit sphere. Let $\mathcal{H}_n^d$ denote the space of spherical harmonics of degree $n$. Then
$$N(n, d) := \dim \mathcal{H}_n^d = \binom{n + d - 1}{n} - \binom{n + d - 3}{n - 2}.$$
Spherical harmonics of different degrees are orthogonal on the unit sphere. For further properties of spherical harmonics one can refer to [64].
For $n \geq 0$, let $\{Y_{n,k} : k = 1, \dots, N(n, d)\}$ be an orthonormal basis of $\mathcal{H}_n^d$. Then
$$\int_{\mathbb{S}^{d-1}} Y_{n,k}(x)\,Y_{l,j}(x)\,d\omega(x) = \delta_{n,l}\,\delta_{k,j},$$
where $\omega$ denotes the surface measure on $\mathbb{S}^{d-1}$ and $\omega_{d-1} = \int_{\mathbb{S}^{d-1}} d\omega(x)$ is the surface area of $\mathbb{S}^{d-1}$. Furthermore, by (1.2.8) in [63] we have the addition formula
$$\sum_{k=1}^{N(n,d)} Y_{n,k}(x)\,Y_{n,k}(y) = \frac{N(n, d)}{\omega_{d-1}}\,P_n(x \cdot y), \qquad (13)$$
where $P_n$ is the $n$-th generalized Legendre polynomial, the same as in (12). Combining (12) and (13), we have
$$\phi(x \cdot y) \sim \omega_{d-2} \sum_{n=0}^{\infty} \hat{\phi}(n) \sum_{k=1}^{N(n,d)} Y_{n,k}(x)\,Y_{n,k}(y). \qquad (14)$$
2.1. Density
We first give a general criterion for density.
Proposition 2.1 (see Lemma 1 in Chapter 18 of [65]). For a subset $V$ of a normed linear space $E$, the following two properties are equivalent:
(a) $V$ is fundamental in $E$ (that is, its linear span is dense in $E$);
(b) if $\varphi \in E^*$ and $\varphi(v) = 0$ for all $v \in V$, then $\varphi = 0$ (that is, $0$ is the only element of $E^*$ that annihilates $V$).
Based on this proposition, we can show the density of the zonal translation class $\Delta_\phi$.
Theorem 2.1. Let $\phi$ satisfy $\hat{\phi}(n) \neq 0$ for all $n \geq 0$. Then $\Delta_\phi$ is dense in $C(\mathbb{S}^{d-1})$.
2.2. Reproducing Property
We first restate a proposition.
Proposition 2.2. For any given $n \geq 1$ there exist a finite subset $\Lambda = \{x_k\}_{k=1}^{m} \subset \mathbb{S}^{d-1}$ and corresponding positive numbers $\{\mu_k\}_{k=1}^{m}$ such that
$$\int_{\mathbb{S}^{d-1}} P(x)\,d\omega(x) = \sum_{k=1}^{m} \mu_k\,P(x_k), \qquad P \in \Pi_n(\mathbb{S}^{d-1}), \qquad (15)$$
and, for some constants $0 < A \leq B < +\infty$ independent of $n$,
$$A\,\|P\|_2^2 \leq \sum_{k=1}^{m} \mu_k\,|P(x_k)|^2 \leq B\,\|P\|_2^2, \qquad P \in \Pi_n(\mathbb{S}^{d-1}). \qquad (16)$$
Moreover, for any $n \geq 1$ and $1 \leq k \leq m$ there exists a constant $c > 0$ independent of $n$ and $k$ such that
$$c^{-1}\,n^{-(d-1)} \leq \mu_k \leq c\,n^{-(d-1)}. \qquad (17)$$
Proof. (15)-(16) were proved by H. N. Mhaskar et al. in [61] and have since been extended to other domains (see e.g. [66,67]). Inequality (17) is proved in [68].
(15)-(16) show the existence of a node set $\Lambda$ which admits the polynomially exact QR (8) and satisfies the MZ inequality (9). Inequality (17) often accompanies (15) and (16), and all three are needed in discussing the approximation order (see e.g. [62]).
Based on the above analysis, we propose the following definition.
Definition 2.1 (Marcinkiewicz-Zygmund inequality setting (MZIS)). We say a given finite node set $\Lambda = \{x_k\}_{k=1}^{m} \subset \mathbb{S}^{d-1}$ forms a Marcinkiewicz-Zygmund inequality setting if (15)-(16) and (17) hold simultaneously.
Let $C([-1, 1])$ denote the set of all continuous functions defined on $[-1, 1]$, and let $\phi \in C([-1, 1])$ satisfy $\hat{\phi}(n) > 0$ for all $n \geq 0$. For a given finite set $\Lambda = \{x_k\}_{k=1}^{m}$ satisfying Definition 2.1, let the positive numbers $\{\mu_k\}_{k=1}^{m}$ be defined as in Proposition 2.2. We define a zonal translation network by
$$\Delta_{\phi,\Lambda} = \left\{ f_{\mathbf{c}}(x) = \sum_{k=1}^{m} c_k\,\mu_k\,\phi(x \cdot x_k) : \mathbf{c} = (c_1, \dots, c_m)^{T} \in \mathbb{R}^m \right\},$$
where $x \in \mathbb{S}^{d-1}$. Then it is easy to see that $\Delta_{\phi,\Lambda}$ is a finite-dimensional linear space, where for
$$f_{\mathbf{c}}(x) = \sum_{k=1}^{m} c_k\,\mu_k\,\phi(x \cdot x_k) \quad \text{and} \quad f_{\mathbf{b}}(x) = \sum_{k=1}^{m} b_k\,\mu_k\,\phi(x \cdot x_k)$$
in $\Delta_{\phi,\Lambda}$ we define a bivariate operation as
$$\langle f_{\mathbf{c}}, f_{\mathbf{b}} \rangle_{\phi,\Lambda} = \sum_{k=1}^{m} \mu_k\,c_k\,b_k$$
and
$$\|f_{\mathbf{c}}\|_{\phi,\Lambda} = \sqrt{\langle f_{\mathbf{c}}, f_{\mathbf{c}} \rangle_{\phi,\Lambda}}.$$
By (14), we have by Theorem 4 in Chapter 17 of [65] that the matrix $(\phi(x_k \cdot x_j))_{k,j=1}^{m}$ is positive definite for a given $n$. It follows that the vector $\mathbf{c}$ representing $f_{\mathbf{c}} \in \Delta_{\phi,\Lambda}$ is unique, so the bivariate operation is a well-defined inner product. Then $(\Delta_{\phi,\Lambda}, \langle \cdot, \cdot \rangle_{\phi,\Lambda})$ is a Hilbert space which is isometrically isomorphic to $\mathbb{R}^m$ equipped with the weighted norm $\|\mathbf{c}\|_{\mu} = (\sum_{k=1}^{m} \mu_k c_k^2)^{1/2}$. Define the kernel
$$K_{\phi,\Lambda}(x, y) = \sum_{k=1}^{m} \mu_k\,\phi(x \cdot x_k)\,\phi(y \cdot x_k), \qquad x, y \in \mathbb{S}^{d-1}.$$
We then have the following proposition.
Proposition 2.3. Let $\phi \in C([-1, 1])$ satisfy $\hat{\phi}(n) > 0$ for all $n \geq 0$ and let $\Lambda$ be an MZIS. Then $(\Delta_{\phi,\Lambda}, \langle \cdot, \cdot \rangle_{\phi,\Lambda})$ is a reproducing kernel Hilbert space associated with the kernel $K_{\phi,\Lambda}$, i.e.,
$$f(x) = \langle f, K_{\phi,\Lambda}(x, \cdot) \rangle_{\phi,\Lambda}, \qquad f \in \Delta_{\phi,\Lambda},\ x \in \mathbb{S}^{d-1},$$
and there is a constant $c > 0$ such that
$$\|f\|_\infty \leq c\,\|f\|_{\phi,\Lambda}, \qquad f \in \Delta_{\phi,\Lambda}.$$
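Under the reconstruction above, $K_{\phi,\Lambda}$ is a finite sum of rank-one kernels with positive weights, so its kernel matrices are positive semi-definite; the following sketch (toy $\phi$, random nodes, weights chosen only for illustration) verifies this numerically:

```python
# Sketch of the network kernel K(x,y) = sum_k mu_k phi(x.x_k) phi(y.x_k):
# a finite sum of rank-one terms with positive weights, so every kernel
# matrix is positive semi-definite. All choices below are illustrative.
import numpy as np

rng = np.random.default_rng(6)

def sphere_points(n):
    p = rng.standard_normal((n, 3))
    return p / np.linalg.norm(p, axis=1, keepdims=True)

phi = lambda t: np.exp(t)
nodes = sphere_points(12)                     # Lambda = {x_k}
mu = np.full(12, 4 * np.pi / 12)              # positive weights (toy)

def K(X, Y):
    F_X = phi(X @ nodes.T)                    # features phi(x . x_k)
    F_Y = phi(Y @ nodes.T)
    return (F_X * mu) @ F_Y.T                 # sum_k mu_k phi(x.x_k) phi(y.x_k)

X = sphere_points(30)
print(np.linalg.eigvalsh(K(X, X)).min() >= -1e-10)   # True: PSD
```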
Corollary 2.1. Under the assumptions of Proposition 2.3, $\Delta_{\phi,\Lambda} + \mathbb{R}$ is a reproducing kernel Hilbert space associated with the inner product defined by
$$\langle f + a, g + b \rangle_{\phi,\Lambda,1} = \langle f, g \rangle_{\phi,\Lambda} + a\,b, \qquad f, g \in \Delta_{\phi,\Lambda},\ a, b \in \mathbb{R},$$
and the corresponding reproducing kernel $K^{1}_{\phi,\Lambda}$ is
$$K^{1}_{\phi,\Lambda}(x, y) = K_{\phi,\Lambda}(x, y) + 1, \qquad x, y \in \mathbb{S}^{d-1}.$$
Furthermore, there is a constant $c > 0$ such that
$$\|f\|_\infty \leq c\,\|f\|_{\phi,\Lambda,1}, \qquad f \in \Delta_{\phi,\Lambda} + \mathbb{R}.$$
Proof. The results can be obtained from Proposition 2.3, Lemma 4.2 and the fact that the real line $\mathbb{R}$ is a reproducing kernel Hilbert space whose reproducing kernel is $1$ and whose inner product is the usual product of two real numbers.
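Finally, the reproducing property of Proposition 2.3 can be checked numerically. With the inner product $\langle f_{\mathbf{c}}, f_{\mathbf{b}} \rangle_{\phi,\Lambda} = \sum_k \mu_k c_k b_k$ reconstructed above (an assumption of this presentation, with toy data), the coefficient vector of $K_{\phi,\Lambda}(x, \cdot)$ is $(\phi(x \cdot x_k))_k$, and the pairing returns $f(x)$:

```python
# Numerical check of the reproducing property in Proposition 2.3 under the
# reconstructed inner product <f_c, f_b> = sum_k mu_k c_k b_k (toy data).
# K(x, .) has coefficient vector (phi(x . x_k))_k, so
# <f, K(x, .)> = sum_k mu_k c_k phi(x . x_k) = f(x).
import numpy as np

rng = np.random.default_rng(7)

def sphere_points(n):
    p = rng.standard_normal((n, 3))
    return p / np.linalg.norm(p, axis=1, keepdims=True)

phi = lambda t: (1 + t) ** 3
nodes = sphere_points(10)                     # Lambda = {x_k}
mu = rng.uniform(0.5, 1.5, 10)                # positive weights (toy)
c = rng.standard_normal(10)                   # f = sum_k c_k mu_k phi(. x_k)

x = sphere_points(1)[0]
f_x = (c * mu) @ phi(nodes @ x)               # direct evaluation f(x)
b = phi(nodes @ x)                            # coefficients of K(x, .)
inner = (mu * c) @ b                          # <f, K(x, .)>_{phi,Lambda}
print(np.isclose(f_x, inner))                 # True
```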