1.1. Motivation
It seems the set of measurable functions with infinite or undefined expected values (def. 1), using the uniform measure ([17], pp. 32-37), may be a prevalent subset [11, 14] of the set of all measurable functions, meaning "almost every" measurable function has an infinite or undefined expected value. Furthermore, when A, measurable in the Carathéodory sense, has zero or infinite Lebesgue measure (or an undefined measure), there may be multiple, conflicting ways of defining a "natural" uniform measure on A.
Below, I attempt to formulate a question regarding an extension of the expected value (for the cases where it is undefined or infinite) which yields finite values instead (def. 3).
Note the question will be long because there are plenty of "meaningless" extensions of the expected value (e.g., if the expected value is infinite or undefined, we could simply replace it with zero).
We must therefore be more specific about what is meant by a "meaningful" extension, but first there are some preliminary definitions to clarify.
1.2. Preliminary Definitions
Definition 1 (Expected value w.r.t. the Uniform Probability Measure).
From an answer to a question on Cross Validated (a website for statistical questions) [10], let X denote a uniform random variable on a set A, measurable in the Carathéodory sense, and let the probability density function of X be given by the Radon–Nikodym derivative ([2], pp. 419-427) of the uniform probability measure U on A with respect to an underlying measure μ (e.g., the counting or Lebesgue measure). If 1_A denotes the indicator function on A, i.e. 1_A(x) = 1 when x ∈ A and 1_A(x) = 0 otherwise,
then the Radon–Nikodym derivative of the uniform probability measure must have the form dU/dμ = 1_A / μ(A). (Note dU/dμ is not the derivative of U in the sense of calculus; rather, μ(A) is the denominator of the probability density function derived from the uniform probability measure U.)
Therefore, by the law of the unconscious statistician, we should get
E[f(X)] = ∫_A f dU = (1/μ(A)) ∫_A f dμ,
such that the expected value is undefined when A does not admit a uniform probability distribution or f is not integrable w.r.t. the measure μ.
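For instance (a simple illustrative case of my own, not taken from [10]): if A = [0, 3], μ is the 1-d Lebesgue measure and f(x) = x², then the uniform density is 1_[0,3] / 3 and E[f(X)] = (1/3) ∫_0^3 x² dx = 3, which is finite and well defined.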
Definition 2 (Defining the pre-structure). Since there is a chance that a uniform random variable on A does not exist, or that f is not integrable w.r.t. the measure μ, using def. 1 we define a sequence of sets {F_r : r ∈ ℕ} where, if:
- (a)
- (b)
then we have:
where {F_r : r ∈ ℕ} is a pre-structure of A, since for every r ∈ ℕ the term F_r does not equal A but "converges" to A as r increases (see (a) & (b) of this definition).
Example 1. Suppose . One pre-structure of is since:
For every r ∈ ℕ, the r-th term is a finite set, meaning each term of the pre-structure has a discrete uniform distribution. Therefore, a uniform probability measure on each term exists.
For every r ∈ ℕ, each term is finite, meaning the associated measure is the counting measure. Furthermore, since, for all r ∈ ℕ, each term lies in A and its counting measure is positive and finite, criterion (3) of def. 2 is satisfied.
Example 2.
Suppose . Another pre-structure of is
where we note the following:
For every r ∈ ℕ, the r-th term is a finite set, meaning each term of the pre-structure has a discrete uniform distribution. Therefore, a uniform probability measure on each term exists.
For every r ∈ ℕ, each term is finite, meaning the associated measure is the counting measure; moreover (where φ is Euler's totient function [15], pp. 239-249), the counting measure of each term is given by a sum of totients and, if correct, is positive and finite for all r ∈ ℕ. Therefore, criterion (3) of def. 2 is satisfied.
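As a small computational check of the counting-measure claim (purely illustrative: it assumes A = ℚ ∩ [0, 1] and that the r-th term of this pre-structure is the set of fractions c/d in [0, 1] with denominator d ≤ r, which is my reading of the totient count above):

```python
import math
from fractions import Fraction

def phi(n):
    """Euler's totient function (naive implementation)."""
    return sum(1 for k in range(1, n + 1) if math.gcd(k, n) == 1)

def term(r):
    """Assumed r-th term of the pre-structure in example 2: all fractions
    c/d in [0, 1] with denominator d <= r (duplicates collapse, leaving
    exactly the reduced fractions counted by Euler's totient)."""
    return {Fraction(c, d) for d in range(1, r + 1) for c in range(d + 1)}

# The counting measure of each term is positive, finite, and equals
# 1 + phi(1) + ... + phi(r).
for r in range(1, 8):
    assert len(term(r)) == 1 + sum(phi(d) for d in range(1, r + 1))
print(len(term(7)))  # 19
```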
There are plenty of pre-structures of A. In fact, there may be countably infinitely many of these pre-structures.
Example 3.
We need additional examples, where the associated measure is not the counting measure. Perhaps one example is the following:
Note that a uniform random variable on A does not exist, but for every r ∈ ℕ the uniform density of each term of the pre-structure is well defined.
Furthermore, for every r ∈ ℕ, the associated measure is the 1-d Lebesgue measure and the measure of each term is positive and finite.
Definition 3 (Expected value of f on the Pre-Structure).
If {F_r : r ∈ ℕ} is a pre-structure of A (def. 2), then for f : A → ℝ, if
we then have that the expected value of f on the pre-structure can be described as follows:
Example 4.
Suppose f : A → ℝ is a function such that:
Using the pre-structure in example 1, we presume (and can prove), using def. 3, that the expected value of f is 1.
And using the pre-structure in example 2,
we presume (but must prove), using def. 3, that the expected value of f takes a different value.
This shows different pre-structures can give different expected values; therefore, we must choose a unique set of equivalent pre-structures (def. 8) which give the same and finite expected value.
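To make def. 3 concrete in the simplest case, here is a short Python sketch (my own illustration; the particular pre-structure and the function f(x) = x² are assumptions, chosen only because each term is finite, so the uniform measure on it is the normalized counting measure and the expected value on the pre-structure reduces to a limit of plain averages):

```python
import math
from fractions import Fraction

def term(r):
    """Hypothetical pre-structure term for A = Q ∩ [0, 1]: all multiples of
    1/r! in [0, 1].  Each term is finite, so its uniform probability measure
    is the normalized counting measure."""
    n = math.factorial(r)
    return [Fraction(c, n) for c in range(n + 1)]

def average_on_term(f, r):
    """Average of f over the r-th term; def. 3 takes the limit of these
    averages as r -> oo (when that limit exists)."""
    points = term(r)
    return sum(f(x) for x in points) / len(points)

# With f(x) = x^2 the averages tend to 1/3 as r grows.
for r in (2, 4, 6):
    print(r, float(average_on_term(lambda x: x * x, r)))
```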
Definition 4 (Uniform coverings of each term of the pre-structure). We define the uniform ε coverings of each term of the pre-structure (i.e., F_r) as a collection of pairwise disjoint sets that cover F_r for every r ∈ ℕ, such that the measure of each of the covering sets has the same value ε, where ε > 0 and the total sum of the measures of the coverings is minimized. In shorter notation, if
and set Ω is defined as:
then for every ε > 0, the set of uniform ε coverings is defined using ω ∈ Ω, where ω "enumerates" all possible uniform ε coverings of F_r for every r ∈ ℕ.
Example 5. Suppose
In order to calculate the uniform ε coverings of each term, note that:
and, since the associated measure is the counting measure, one example of a uniform ε covering is
Note the measure (in this case the counting measure) of each set in the uniform ε covering is 2, where we are "over-covering" by one element as we are minimizing the total sum of the measures of the coverings.
If , then
Also note, for counting measure , where and (i.e. ), we have that .
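Here is one plausible reading of def. 4 in the counting-measure case, sketched in Python (my interpretation, not the author's construction: split the sorted elements of a finite term into ⌈|F_r|/ε⌉ pairwise disjoint blocks of counting measure exactly ε, padding the last block with points outside the term when |F_r| is not a multiple of ε, i.e. the "over-covering" above):

```python
import math

def uniform_eps_covering(F_r, eps, pad_start):
    """One plausible uniform eps-covering of a finite set F_r under the
    counting measure: ceil(|F_r| / eps) pairwise disjoint blocks, each of
    counting measure eps.  If |F_r| is not a multiple of eps, the last block
    is padded with points not in F_r ("over-covering"); pad_start is a
    hypothetical starting value for that padding, chosen outside F_r."""
    xs = sorted(F_r)
    blocks = math.ceil(len(xs) / eps)      # minimize the number of blocks
    padding = blocks * eps - len(xs)       # extra points needed to fill up
    xs += [pad_start + i for i in range(padding)]
    return [set(xs[i * eps:(i + 1) * eps]) for i in range(blocks)]

# Covering a 3-element term with eps = 2 needs 2 blocks and one extra point.
print(uniform_eps_covering([0, 0.5, 1], eps=2, pad_start=2))
```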
Definition 5 (Sample of the uniform coverings of each term of the pre-structure). The sample of the uniform ε coverings of each term of the pre-structure is the set of points such that, for every r ∈ ℕ and every covering in def. 4, we take one point from each pairwise disjoint set in that uniform ε covering of F_r. In shorter notation, if
and set Ψ is defined as:
then for every ε > 0, the set of all samples of the set of uniform ε coverings is defined using ψ ∈ Ψ, where ψ "enumerates" all possible samples of the uniform ε coverings of F_r.
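Continuing the same sketch (still under my counting-measure reading of def. 4), a sample in the sense of def. 5 simply picks one point from each block of a covering:

```python
import random

def sample_of_covering(covering, seed=None):
    """A sample of a uniform eps-covering (def. 5): one point chosen from
    each pairwise disjoint block of the covering."""
    rng = random.Random(seed)
    return [rng.choice(sorted(block)) for block in covering]

# Two samples of the covering [{0, 0.5}, {1, 2}] from above, drawn with
# different seeds (they may or may not coincide).
covering = [{0, 0.5}, {1, 2}]
print(sample_of_covering(covering, seed=0))
print(sample_of_covering(covering, seed=1))
```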
Example 6. From example 5 where:
Then one sample of is:
and another sample of is:
Definition 6 (Entropy on the sample of uniform coverings of each term of the pre-structure). Since there are finitely many points in the sample of the uniform ε coverings of each term of the pre-structure (def. 5), we:
- (1) Arrange the x-values of the points in the sample of uniform ε coverings from least to greatest. This is defined as:
- (2) Take the multi-set of the absolute differences between all consecutive pairs of elements in (1). This is defined as:
- (3) Normalize (2) into a probability distribution. This is defined as:
- (4) Take the entropy of (3) (for further reading, see [12]). This is defined as:
where (4) is the entropy on the sample of uniform coverings of the r-th term of the pre-structure.
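The four steps of def. 6 are mechanical enough to sketch in Python (the base of the logarithm is not fixed in the text; base 2 is assumed here):

```python
import math

def entropy_of_sample(sample):
    """Entropy on a sample of the uniform coverings (def. 6):
    (1) sort the sampled points, (2) take the absolute differences of
    consecutive points, (3) normalize those differences into a probability
    distribution, (4) return the entropy of that distribution (base 2)."""
    xs = sorted(sample)
    gaps = [abs(b - a) for a, b in zip(xs, xs[1:])]
    total = sum(gaps)
    probs = [g / total for g in gaps if g > 0]
    return -sum(p * math.log2(p) for p in probs)

# An evenly spaced sample of 5 points has entropy log2(4) = 2.
print(entropy_of_sample([0, 0.25, 0.5, 0.75, 1]))
```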
Example 7. From example 6:
Then
which organizes the elements of the sample from least to greatest.
Since the differences have a finite, positive sum, we use this to normalize (2) into a probability distribution:
Hence we take the entropy of (3), or:
Definition 7 (Pre-Structure Converging Uniformly to A). For every ε (using def. 4, 5, and 6),
if set A is finite and for , we have , we then want:
and if set A is non-finite:
we say the pre-structure converges uniformly to A (or, in shorter notation):
(Note we wish to define a uniform convergence of a sequence of sets to A since the definition is analogous to a uniform measure.)
Theorem 1. Every pre-structure of A converges uniformly to A.
Example 8. I assume, using example 5, if
then . I need to prove this.
Definition 8 (Equivalent Pre-Structures).
The pre-structures and of A are equivalent if for all , where from def. 3, or such that:
Definition 9 (Equivalent Pre-Structures). The pre-structures and of A are equivalent if we have:
is the r-value (for every ) where is minimized
is the r-value (for every ) where is maximized
is the j-value (for every ) where is minimized and:
is the j-value (for every ) where is maximized such that:
means the pre-structures and are equivalent.
Example 9.
From example 3, if where , the Cantor set is and . Since, with either pre-structure, the associated measure is the 1-dimensional Lebesgue measure, (using equation 12) we get:
Definition 10 (Non-Equivalent Pre-Structures).
The pre-structures and of A are non-equivalent if there exists an , where from def. 3, or where:
Definition 11 (Non-Equivalent Pre-Structures). The pre-structures and of A are non-equivalent if we have:
is the r-value (for every ) where is minimized
is the r-value (for every ) where is maximized
is the j-value (for every ) where is minimized and:
is the j-value (for every ) where is maximized such that:
means the pre-structures and are non-equivalent.
Example 10.
From example 4, if , the pre-structures and are non-equivalent since for where:
we have (i.e. the expected value of f on ) and (i.e. the expected value of f on ), which means
hence, from def. 10, the pre-structures and are non-equivalent.
Example 11.
Suppose , where , and
is undefined (i.e. the expected value of f on ) and (i.e. the expected value of f on ). Since at least one of the pre-structures, i.e. , has a defined expected value and (i.e. undefined values do not equal 1), we can say that and are non-equivalent.
Definition 12 (Pre-Structures Converging Sublinearly, Linearly, or Superlinearly to A Compared to that of Another Sequence). Suppose pre-structures and are non-equivalent and converge uniformly to A; and suppose for every , where and :
- (a)
-
From def. 5 and 6, suppose we have:
- (b)
-
From def. 5 and 6, suppose we have:
-
If using equations 15 and 17 we have that:
we say converges uniformly to A at a superlinear rate to that of .
-
If using equations 15 and 17 we have either:
- (a)
-
- (b)
-
- (c)
-
- (d)
-
we then say converges uniformly to A at a linear rate to that of .
-
If using equations 15 and 17 we have that:
we say converges uniformly to A at a sublinear rate to that of .
Note 2. Since def. 12 is difficult to apply, we make assumptions (without proofs) for the examples below:
Example 12 (Example of pre-structure converging super-linearly to A compared to that of another pre-structure). From example 5:
we assume that converges uniformly to A, at a superlinear rate, compared to that of .
Example 13 (Obvious Example of pre-structure converging linearly to A compared to that of another pre-structure). Consider the following:
we assume that converges uniformly to A, at a linear rate, compared to that of , since using programming we assume:
Example 14 (Non-Obvious Example of pre-structure converging linearly to A compared to another pre-structure). If is the nearest integer function and is the floor function, consider the following:
(we choose this pre-structure since, if is the highest entropy (def. 6) that could be for every , we say has a higher entropy per element than that of if there exists a , such that for all , ).
Despite having a higher entropy per element, converges uniformly to A at a linear rate, compared to that of , since using programming we assume:
which should satisfy criterion (2a) in def. 12.
Theorem 2. If converges super-linearly to A compared to that of , then converges sub-linearly to A compared to that of .
Example 15 (Example of pre-structure converging sub-linearly to A compared to another pre-structure). In example 12, if we swap for where:
we assume that converges to A at a sublinear rate to that of .