Defining the Most Generalized, Natural Extension of the Expected Value on Measurable Functions

Bharath Krishnan

doi:10.20944/preprints202302.0367.v1

Submitted: 21 February 2023

Posted: 22 February 2023


Abstract

In this paper, we extend the expected value of a function w.r.t. the uniform probability measure on the Carathéodory extension to a larger class of functions, since the set of all functions with infinite or undefined expected values may form a prevalent subset of the set of all measurable functions. Before stating the main question of the paper, we outline some preliminary definitions. We then pose a precise main question, intended to admit a unique solution, and offer a partial solution to it. Along the way, we ask a series of questions meant to clarify the definitions and the proposed solution.

Keywords: Expected Value; Uniform Measure; Measure theory; Prevalence; Entropy; Sample; Linear; Superlinear; Choice Function; Bertrand's Paradox; Pseudo-random

Subject: Computer Science and Mathematics - Mathematics

1. Background

I am an undergraduate from Indiana University despite being the age of a grad student. I should have graduated by now, but my obsession with research prevents me from moving forward. There is a chance that I might have a learning disability since writing isn’t very easy for me.

As I’ve been in and out of college, I never got the chance to rigorously learn the subjects I’m researching. Most of what I learned was from Wikipedia, blogs and random research articles. I know little of what I read but I learn what I can from questions on math stack exchange.

What I truly want, however, is for someone to take my ideas and publish them.

I warn that the definitions may not be rigorous, so try to go easy on me. (I recommend using software such as Mathematica to understand these definitions.)

2. Preliminaries

Suppose A is a set measurable in the Carathéodory sense such that, for $n \in N$, $A \subseteq R^{n}$, and let $f : A \to R$ be a function.

2.1. Motivation

It seems the set of all functions with infinite or undefined expected values (def. 1), using the uniform measure [1], p.32-37, may be a prevalent subset [2,3] of the set of all measurable functions, meaning "almost every" measurable function has an infinite or undefined expected value. Furthermore, when the uniform measure of A, measurable in the Carathéodory sense, has zero or infinite volume (or undefined measure), there may be multiple, conflicting ways of defining a "natural" uniform measure on A.

Below I will attempt to define a question regarding an extension of the expected value (when it’s undefined or infinite) which allows for a finite value instead.

Note that the question is long because there are plenty of “meaningless” extensions of the expected value (e.g. if the expected value is infinite or undefined, we can just replace it with zero).

Therefore we must be specific about what is meant by a “meaningful” extension, but first there are some preliminary definitions we must clarify.

2.2. Preliminary Definitions

Definition 1

(Expected Value w.r.t. the Uniform Probability Measure). Following an answer to a question on Cross Validated [4], let $X \sim Uniform (A)$ denote a uniform random variable on the set $A \subseteq R^{n}$, and let $p_{X}$ denote its probability density function, obtained as the Radon-Nikodym derivative [5], p.419-427 of the uniform probability measure on sets in the σ-algebra of Carathéodory-measurable sets. If $I (x \in A)$ denotes the indicator function of $x \in A$:

$$I (x \in A) = \begin{cases} 1 & x \in A \\ 0 & x \notin A \end{cases}$$

then the Radon-Nikodym derivative of the uniform probability measure must have the form $I (x \in A) / U^{'} (A)$. (Note that $U^{'}$ is not the derivative of U in the sense of calculus; it denotes the underlying measure, e.g. the counting or Lebesgue measure of def. 2, which is distinct from the uniform probability measure U.)

Therefore, using the law of the unconscious statistician, we should get

$$\begin{aligned} E [f (X)] &= \int_{R^{n}} f (x) \cdot p_{X} (x) \, d x \\ &= \int_{R^{n}} f (x) \cdot \frac{I (x \in A)}{U^{'} (A)} \, d x \\ &= \frac{1}{U^{'} (A)} \int_{A} f (x) \, d x \\ &= E_{U^{'}} [f (X)] \end{aligned}$$

(1)

such that the expected value is undefined when A does not have a uniform probability distribution or f is not integrable w.r.t. the measure $U^{'}$.
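As a numerical illustration of def. 1, the following minimal sketch approximates $E_{U^{'}} [f (X)]$ for an interval $A = [a, b]$, assuming $U^{'}$ is the Lebesgue measure so that $U^{'} (A) = b - a$. The name `uniform_expected_value`, the midpoint rule, and the test function $f (x) = x^{2}$ are illustrative assumptions and not part of the definition.

```python
def uniform_expected_value(f, a, b, n=100_000):
    """Approximate (1 / (b - a)) * integral of f over [a, b] with the midpoint rule."""
    if not a < b:
        raise ValueError("need a < b so that U'(A) = b - a is positive and finite")
    h = (b - a) / n
    # Midpoint rule: evaluate f at the centre of each of the n sub-intervals.
    integral = sum(f(a + (i + 0.5) * h) for i in range(n)) * h
    return integral / (b - a)


# f(x) = x^2 on A = [0, 3]; analytically (1/3) * (3^3 / 3) = 3.
print(uniform_expected_value(lambda x: x * x, 0.0, 3.0))
```

When A has zero or infinite Lebesgue measure, the normalizing factor $1 / U^{'} (A)$ is not available, which is the situation the pre-structures of def. 2 are meant to address.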

Definition 2

(Defining the Pre-Structure ${\{F_{r}\}}_{r \in N}$). Since there is a chance that $X \sim Uniform (A)$ does not exist or that f is not integrable w.r.t. $U^{'}$, using def. 1 we define a sequence of sets ${\{F_{r}\}}_{r \in N}$ where:

(1): $⋃_{r = 1}^{\infty} F_{r} = A$
(2): For all $r \in N$, $X_{r} \sim Uniform (F_{r})$ exists. (If A is countably infinite, then for every $r \in N$, $F_{r}$ must be a finite set, since $X_{r}$ has the discrete uniform distribution on $F_{r}$; otherwise, if A is uncountable, then $X_{r}$ is distributed according to the normalized Lebesgue measure on $F_{r}$ or some other uniform measure on $F_{r}$ [6], where for every $r \in N$ the Lebesgue measure or that uniform measure of $F_{r}$ exists and is finite [1], p.32-37.)
(3): For all $r \in N$, $U^{'} (F_{r})$ is positive and finite, where $U^{'}$ is intrinsic. (For countably infinite A, $U^{'}$ is the counting measure, and $U^{'} (F_{r})$ is positive and finite since $F_{r}$ is finite. For uncountable A, $U^{'}$ is the Lebesgue measure or the Radon-Nikodym derivative of some uniform measure [6], where in either case the measure of $F_{r}$ is positive and finite.)

Then ${\{F_{r}\}}_{r \in N}$ is a pre-structure of A, since no single term $F_{r}$ need equal A, but the sequence “approaches" A as r increases.

Definition 3

(Expected Value of $F_{r}$). If ${\{F_{r}\}}_{r \in N}$ is a pre-structure of A (def. 2), then for $r \in N$, if

$$E_{U^{'}} [f (X_{r})] = \frac{1}{U^{'} (F_{r})} \int_{F_{r}} f \, d x$$

(2)

we then have that the expected value of the pre-structure could be described as $E_{U^{'}} [f (X_{r})] \to E_{U^{'}}^{★} [f]$ (def. 1) where:

$$\begin{matrix} \forall (ϵ > 0) \exists (N \in N) \forall (r \in N) (r \geq N \Rightarrow | E_{U^{'}} [f (X_{r})] - E_{U^{'}}^{★} [f] | < ϵ) \Rightarrow \\ \forall (ϵ > 0) \exists (N \in N) \forall (r \in N) (r \geq N \Rightarrow | \frac{1}{U^{'} (F_{r})} \int_{F_{r}} f \, d x - E_{U^{'}}^{★} [f] | < ϵ) \end{matrix}$$

(3)
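To make def. 2 and def. 3 concrete, here is a minimal sketch for $A = Z$ with the counting measure as $U^{'}$ and the pre-structure $F_{r} = \{m \in Z : - r \leq m \leq r\}$. The function $f (m) = m^{2} / (1 + m^{2})$ and the helper names are hypothetical choices made only for illustration; the averages over $F_{r}$ approach 1, which would be the value $E_{U^{'}}^{★} [f]$ for this particular pre-structure.

```python
def prestructure_mean(f, F_r):
    """Average of f over a finite set F_r, i.e. (1 / U'(F_r)) * sum of f with U' the counting measure."""
    points = list(F_r)
    return sum(f(m) for m in points) / len(points)


def F(r):
    """r-th term of the pre-structure of Z: the integers from -r to r."""
    return range(-r, r + 1)


def f(m):
    """Illustrative bounded function on Z (an assumption, not from the paper)."""
    return m * m / (1 + m * m)


for r in (1, 10, 100, 1000):
    print(r, prestructure_mean(f, F(r)))
```

The limit depends on the chosen pre-structure, which is why the later definitions compare different pre-structures of the same set.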

Definition 4

(Uniform ε coverings of $F_{r}$). For every $r \in N$, we define the uniform ε coverings of $F_{r}$ as a collection of pairwise disjoint sets that covers $F_{r}$, such that the measure $U^{'}$ of each of the sets has the same value $ε \in range (U^{'})$, where $ε > inf (range (U^{'}))$, and the total sum of $U^{'}$ over the sets of the covering is minimized. As a shortcut, if

The element $t \in N$
The set $T \supset N$ is arbitrary and uncountable.

and the set Ω is defined as:

$$Ω = \begin{cases} \{1, \cdots, t\} & \text{if there are } t \text{ ways of writing uniform ε coverings of } F_{r} \\ N & \text{if there are countably infinitely many ways of writing uniform ε coverings of } F_{r} \\ T & \text{if there are uncountably many ways of writing uniform ε coverings of } F_{r} \end{cases}$$

(4)

then for every $ω \in Ω$, the set of uniform ε coverings is denoted $U (ε, F_{r}, ω)$, where ω “enumerates" all possible uniform ε coverings of $F_{r}$ for every $r \in N$.

Definition 5

(Sample of the uniform ε coverings of $F_{r}$). For every $ε \in range (U^{'})$ and $r \in N$, the sample is the set of points obtained by taking a point from each pairwise disjoint set in the uniform ε coverings of $F_{r}$ (def. 4). As a shortcut, if

The element $k \in N$
The set $K \supset N$ is arbitrary and uncountable.

and the set $Ψ_{ω}$ is defined as:

$$Ψ_{ω} = \begin{cases} \{1, \cdots, k\} & \text{if there are } k \text{ ways of writing the sample of uniform ε coverings of } F_{r} \\ N & \text{if there are countably infinitely many ways of writing the sample of uniform ε coverings of } F_{r} \\ K & \text{if there are uncountably many ways of writing the sample of uniform ε coverings of } F_{r} \end{cases}$$

(5)

then for every $ψ \in Ψ_{ω}$, the set of all samples of the set of uniform ε coverings is denoted $S (U (ε, F_{r}, ω), ψ)$, where $Ψ_{ω}$ “enumerates" all possible samples of $U (ε, F_{r}, ω)$.

Definition 6

(Entropy on the sample of uniform coverings of $F_{r}$). Since there are finitely many points in the sample of the uniform ε coverings of $F_{r}$ (def. 5), we:

(1): Arrange the x-value of the points in the sample of uniform ε coverings from least to greatest. This is defined as:

$Ord (S (U (ϵ, F_{r}, ω), ψ))$
(2): Take the multi-set of the absolute differences between all consecutive pairs of elements in (1). This is defined as:

$\nabla Ord (S (U (ϵ, F_{r}, ω), ψ))$
(3): Normalize (2) into a probability distribution, where for multi-set X, we have $| X |$ as the cardinality of all elements in the multi-set, including repeated ones. This is defined as:

$P (S (U (ϵ, F_{r}, ω), ψ)) = \{y / |\nabla Ord (S (U (ϵ, F_{r}, ω), ψ))| : y \in \nabla Ord (S (U (ϵ, F_{r}, ω), ψ))\}$
(4): Take the entropy of (3), (for further reading, see [7], p.61-95). This is defined as:

$E (S (U (ϵ, F_{r}, ω), ψ)) = - \sum_{x \in P (S (U (ϵ, F_{r}, ω), ψ))} x {log}_{2} x$

where (4) is the entropy on the sample of uniform coverings of $F_{r}$.
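Steps (1) to (4) of def. 6 can be sketched for a finite sample of real numbers as follows. Two assumptions are made here and should be read as such: the sample is one-dimensional (only the "x-value" is used), and the gaps are normalized by their sum so that they add up to one, whereas the formula in step (3) as written divides by the number of gaps. The function name `sample_entropy` is hypothetical.

```python
import math


def sample_entropy(points):
    """Entropy (base 2) of the normalized gaps between consecutive sorted sample points."""
    ordered = sorted(points)                                    # step (1): order the sample
    gaps = [abs(b - a) for a, b in zip(ordered, ordered[1:])]   # step (2): consecutive differences
    total = sum(gaps)
    if total == 0:
        return 0.0  # fewer than two distinct points: no gap distribution to measure
    probs = [g / total for g in gaps]                           # step (3): ASSUMPTION, divide by the sum
    return -sum(p * math.log2(p) for p in probs if p > 0)       # step (4): Shannon entropy


evenly_spaced = [0, 1, 2, 3, 4]
clustered = [0, 0.1, 0.2, 0.3, 4]
print(sample_entropy(evenly_spaced), sample_entropy(clustered))
```

Under this normalization an evenly spaced sample maximizes the entropy for a given number of points, which matches the intended reading of a sample that is spread "uniformly".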

Definition 7

(Sequence of sets converging uniformly to A). For every $r \in N$ (using def. 4, 5, and 6), if set A is finite and

$$\lim_{ε \to inf (range (U^{'}))} \sup_{r \in N} \sup_{ω \in Ω} \sup_{ψ \in Ψ_{ω}} E (S (U (ε, F_{r}, ω), ψ)) \geq E (F_{r})$$

or if set A is non-finite and

$$\lim_{ε \to inf (range (U^{'}))} \sup_{r \in N} \sup_{ω \in Ω} \sup_{ψ \in Ψ_{ω}} E (S (U (ε, F_{r}, ω), ψ)) = + \infty$$

we then say the pre-structure ${\{F_{r}\}}_{r \in N}$ converges uniformly to A, or in shorter notation:

$$F_{r} \overset{r \in N}{⇉} A$$

(6)

(Note we wish to define a uniform convergence of a sequence of sets to A since the definition is analogous to a uniform measure.)

Definition 8

(Equivalent Sequences of Sets). The pre-structures ${\{F_{r}\}}_{r \in N}$ and ${\{F_{j}^{'}\}}_{j \in N}$ of A are equivalent if, from def. 3 (with $E_{U^{'}} [f (X_{r})] \to E_{U^{'}}^{★} [f]$ and $E_{U^{'}} [f (X_{j}^{'})] \to E_{U^{'}}^{★ ★} [f]$):

$$\forall (f \in R^{A}) (E_{U^{'}}^{★} [f] = E_{U^{'}}^{★ ★} [f])$$

Definition 9

(Non-Equivalent Sequences of Sets). The pre-structures ${\{F_{r}\}}_{r \in N}$ and ${\{F_{j}^{'}\}}_{j \in N}$ of A are non-equivalent if, from def. 3 (with $E_{U^{'}} [f (X_{r})] \to E_{U^{'}}^{★} [f]$ and $E_{U^{'}} [f (X_{j}^{'})] \to E_{U^{'}}^{★ ★} [f]$):

$$\exists (f \in R^{A}) (E_{U^{'}}^{★} [f] \neq E_{U^{'}}^{★ ★} [f])$$
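A small worked example may help to see why non-equivalent pre-structures exist. The sketch below compares two pre-structures of $Z$ (both have union $Z$ and finite terms): the symmetric intervals $F_{r} = \{- r, \ldots, r\}$ and a hypothetical $F_{j}^{'}$ that adds all even integers up to $j^{2}$ in absolute value. For f the indicator of the even integers, the averages approach 1/2 along the first and 1 along the second. Both the second pre-structure and the function names are assumptions chosen for illustration, and no claim is made that either satisfies def. 7.

```python
def mean_over(f, points):
    """Average of f over a finite set of integers (counting measure as U')."""
    points = list(points)
    return sum(f(m) for m in points) / len(points)


def F(r):
    """Symmetric pre-structure of Z: the integers from -r to r."""
    return range(-r, r + 1)


def F_prime(j):
    """Hypothetical second pre-structure: integers in [-j, j] plus even integers in [-j^2, j^2]."""
    evens = {m for m in range(-j * j, j * j + 1) if m % 2 == 0}
    return set(range(-j, j + 1)) | evens


def is_even(m):
    """Indicator function of the even integers."""
    return 1.0 if m % 2 == 0 else 0.0


for n in (10, 50, 200):
    print(n, mean_over(is_even, F(n)), mean_over(is_even, F_prime(n)))
```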

Definition 10

(Sequence converging Sublinearly, Linearly, or Superlinearly to A compared to that of another Sequence). Suppose pre-structures ${\{F_{r}\}}_{r \in N}$ and ${\{F_{j}^{'}\}}_{j \in N}$ are non-equivalent and converge uniformly to A. Also, suppose that for every $ε \in range (U^{'})$, where $ε > inf (range (U^{'}))$, and every $r \in N$:

(a): We take the cardinality of the sample of the uniform ε coverings of $F_{r}$ (def. 5) divided by the smallest cardinality of the sample of the uniform ε coverings of $F_{j}^{'}$ (def. 5), where the entropy on the sample of uniform coverings on $F_{j}^{'}$ (def. 6) is larger than the entropy on the sample of uniform coverings on $F_{r}$ (def. 6). In other words, if:

$\begin{matrix} \overline{S (U (ε, F_{r}, ω), ψ)} = \\ inf \{| S (U (ε, F_{j}^{'}, ω^{'}), ψ^{'}) | : j \in N, ω^{'} \in Ω, ψ^{'} \in Ψ_{ω}, E (S (U (ε, F_{j}^{'}, ω^{'}), ψ^{'})) \geq E (S (U (ε, F_{r}, ω), ψ))\} \end{matrix}$

(7)

then the ratio at the beginning of the paragraph is defined (using 7) as

$\overline{α} (ε, r, ω, ψ) = |S (U (ε, F_{r}, ω), ψ)| / \overline{|S (U (ε, F_{r}, ω), ψ)|}$

(8)
(b): We take the cardinality of the sample of the uniform ε coverings of $F_{r}$ (def. 5) divided by the largest cardinality of the sample of the uniform ε coverings of $F_{j}^{'}$ (def. 5), where the entropy on the sample of uniform coverings on $F_{j}^{'}$ (def. 6) is smaller than the entropy on the sample of uniform coverings on $F_{r}$ (def. 6). In other words, if:

$\begin{matrix} \underline{S (U (ε, F_{r}, ω), ψ)} = \\ sup \{| S (U (ε, F_{j}^{'}, ω^{'}), ψ^{'}) | : j \in N, ω^{'} \in Ω, ψ^{'} \in Ψ_{ω}, E (S (U (ε, F_{j}^{'}, ω^{'}), ψ^{'})) \leq E (S (U (ε, F_{r}, ω), ψ))\} \end{matrix}$

(9)

then the ratio at the beginning of the paragraph is defined (using 9) as

$\underline{α} (ε, r, ω, ψ) = |S (U (ε, F_{r}, ω), ψ)| / \underline{|S (U (ε, F_{r}, ω), ψ)|}$

(10)

(1): If using equations 8 and 10 we have that:

$\lim_{ε \to inf (range (U^{'}))} \sup_{r \in N} \sup_{ω \in Ω} \sup_{ψ \in Ψ_{ω}} \overline{α} (ε, r, ω, ψ) = \lim_{ε \to inf (range (U^{'}))} \sup_{r \in N} \sup_{ω \in Ω} \sup_{ψ \in Ψ_{ω}} \underline{α} (ε, r, ω, ψ) = 0$

we say ${\{F_{r}\}}_{r \in N}$ converges uniformly to A at a superlinear rate to that of ${\{F_{j}^{'}\}}_{j \in N}$.
(2): If using equations 8 and 10 we have that:

$0 < \lim_{ε \to inf (range (U^{'}))} \sup_{r \in N} \sup_{ω \in Ω} \sup_{ψ \in Ψ_{ω}} \overline{α} (ε, r, ω, ψ) = \lim_{ε \to inf (range (U^{'}))} \sup_{r \in N} \sup_{ω \in Ω} \sup_{ψ \in Ψ_{ω}} \underline{α} (ε, r, ω, ψ) < + \infty$

we say ${\{F_{r}\}}_{r \in N}$ converges uniformly to A at a linear rate to that of ${\{F_{j}^{'}\}}_{j \in N}$.
(3): If using equations 8 and 10 we have that:

$\lim_{ε \to inf (range (U^{'}))} \sup_{r \in N} \sup_{ω \in Ω} \sup_{ψ \in Ψ_{ω}} \overline{α} (ε, r, ω, ψ) = \lim_{ε \to inf (range (U^{'}))} \sup_{r \in N} \sup_{ω \in Ω} \sup_{ψ \in Ψ_{ω}} \underline{α} (ε, r, ω, ψ) = + \infty$

we say ${\{F_{r}\}}_{r \in N}$ converges uniformly to A at a sublinear rate to that of ${\{F_{j}^{'}\}}_{j \in N}$.

I assume $\overline{α}$ and $\underline{α}$ are always equal but I’m not sure how to prove this.

2.3. Question on Preliminary Definitions

(1): Are there “simpler" alternatives to any of the preliminary definitions? (Keep this in mind as we continue reading forward.)

3. Main Question

Does there exist a unique extension (or a method that constructively defines a unique extension) of the expected value of f when the value is finite (def. 1), using the Radon-Nikodym derivative [5], p.419-427 of the uniform probability measure [1], p.32-37 on sets measurable in the Carathéodory sense, such that we replace f with infinite or undefined expected values with f defined on a pre-structure where:

(1): The expected value of f on each term of the pre-structure is finite
(2): The pre-structure converges uniformly to A (def. 7)
(3): The pre-structure converges uniformly to A at a linear or superlinear rate to that of other non-equivalent pre-structures of A (def. 9 & 10) which satisfy (1) and (2).
(4): The generalized expected value [1] of f on a pre-structure satisfying (1), (2), and (3) (while the pre-structure converges uniformly to A) is finite (i.e. def. 3)
(5): A choice function is defined which chooses a pre-structure satisfying (1), (2), (3), and (4) such the generalized expected value [1] is unique and finite for the largest subset of $R^{A}$ .
(6): Out of all the choice functions which satisfy (1), (2), (3), (4) and (5), we choose the choice function with the “simplest form" (meaning that, for a general pre-structure of A, when each choice function is fully expanded, we take the choice function with the fewest variables/numbers (excluding those with quantifiers) for which the variables are added and exponentiated by infinitesimal amounts and multiplied by the difference of one and an infinitesimal amount).

How do we answer this question?

4. Informal Attempt to Answer Main Question

4.1. Choice Function

Suppose $S^{'} (A)$ is the set of all pre-structures of A which satisfy criteria (1) and (2) of the question in sec. 3, where the expected value of the pre-structures, as they converge uniformly to A (def. 7), is finite. The pre-structure ${\{F_{r}^{″}\}}_{r \in N} \in S^{'} (A)$ should be a sequence of sets that satisfies criteria (1), (2), (3), (4), and (5) of the question, such that ${\{F_{j}^{'}\}}_{j \in N} \in S^{″} (A)$ represents all other pre-structures of A that are non-equivalent to ${\{F_{r}^{″}\}}_{r \in N}$. (Note I’m unable to find a choice function which satisfies criterion (6).)

Note that, from (a) and (b) in def. 10, we represent the “smallest cardinality of the sample of the uniform ε covering of $F_{j}^{'}$ (def. 5), where the entropy on the sample of uniform coverings on $F_{j}^{'}$ (def. 6) is larger than the entropy on the sample of uniform coverings on $F_{r}^{″}$ (def. 6)" as follows:

$$\begin{matrix} \overline{S (U (ε, F_{r}^{″}, ω), ψ)} = \\ inf \{| S (U (ε, F_{j}^{'}, ω^{'}), ψ^{'}) | : j \in N, ω^{'} \in Ω, ψ^{'} \in Ψ_{ω}, E (S (U (ε, F_{j}^{'}, ω^{'}), ψ^{'})) \geq E (S (U (ε, F_{r}^{″}, ω), ψ))\} \end{matrix}$$

(11)

and we represent the “largest cardinality of the sample of the uniform ε covering of $F_{j}^{'}$ (def. 5), where the entropy on the sample of uniform coverings on $F_{j}^{'}$ (def. 6) is smaller than the entropy on the sample of uniform coverings on $F_{r}^{″}$ (def. 6)" as follows:

$$\begin{matrix} \underline{S (U (ε, F_{r}^{″}, ω), ψ)} = \\ sup \{| S (U (ε, F_{j}^{'}, ω^{'}), ψ^{'}) | : j \in N, ω^{'} \in Ω, ψ^{'} \in Ψ_{ω}, E (S (U (ε, F_{j}^{'}, ω^{'}), ψ^{'})) \leq E (S (U (ε, F_{r}^{″}, ω), ψ))\} \end{matrix}$$

(12)

Therefore, using def. 5 and equations 11 and 12, if we take:

$$\sup_{ω \in Ω} \sup_{ψ \in Ψ_{ω}} \overline{S (U (ε, F_{r}^{″}, ω), ψ)} = \overline{S^{'}} (ε, F_{r}^{″}) = \overline{S^{'}}$$

(13)

$$\sup_{ω \in Ω} \sup_{ψ \in Ψ_{ω}} \underline{S (U (ε, F_{r}^{″}, ω), ψ)} = \underline{S^{'}} (ε, F_{r}^{″}) = \underline{S^{'}}$$

(14)

$$\sup_{ω \in Ω} \sup_{ψ \in Ψ_{ω}} S (U (ε, F_{r}^{″}, ω), ψ) = S^{'} (ε, F_{r}^{″}) = S^{'}$$

(15)

and we take:

\begin{matrix} S (r) = & (sup (F_{r + 1}^{″}) - sup (F_{r}^{″})) (inf (F_{r}^{″}) - inf (F_{r + 1}^{″})) | (inf (F_{r}^{″}) - inf (F_{r + 1}^{″})) (sup (F_{r + 1}^{″}) - sup (F_{r}^{″}) - 1) | \end{matrix}

(16)

such that

\begin{matrix} T (r) = & (sup (F_{r + 1}^{″}) inf (F_{r}^{″}) - sup (F_{r}^{″}) inf (F_{r + 1}^{″})) ((inf (F_{r}^{″}) - inf (F_{r + 1}^{″})) - (sup (F_{r + 1}^{″}) - sup (F_{r}^{″})) - 1) \\ (inf (F_{r}^{″}) - inf (F_{r + 1}^{″})) (sup (F_{r + 1}^{″}) - sup (F_{r}^{″})) \end{matrix}

(17)

then, using equations 13, 14, 15, 16 and 17, with $[\cdot]$ denoting the nearest integer function and $| | \cdot | |$ denoting the absolute value function, we want:

\begin{matrix} K (ε, F_{r}^{″}) = | | 1 - S (r) | | (| | \frac{| S^{'} | (1 + [\frac{| S^{'} | (\underset{̲}{| S^{'} |} + 2 | S^{'} |)}{(\underset{̲}{| S^{'} |} + | S^{'} |) (\underset{̲}{| S^{'} |} + | S^{'} | + \bar{| S^{'} |})}]) (1 + [\underset{̲}{| S^{'} |} / | S^{'} |])}{(1 + [| S^{'} | / \bar{| S^{'} |}]) (1 + [\underset{̲}{| S^{'} |} / \bar{| S^{'} |}])} - | S^{'} | | | + | S^{'} |) - T (r) \end{matrix}

(18)

such that, using equations 15 and 18, if set $S^{″} (A) \subseteq S^{'} (A)$, the choice function is the following set (when the set contains only one element):

\begin{matrix} C (A) = \\ sup { & S^{″} (A) : S^{″} (A) \subseteq S^{'} (A), \exists (M \in N) \forall (ε \in range (U^{'})) \exists (j \in N) \forall (r \in N) \forall (\{F_{r}^{″}\} \in S^{″} (A)) \\ (inf (range (U^{'})) \leq ε \leq M, r \geq j \Rightarrow K (ε, F_{r}^{″}) \leq S^{'} (ε, F_{r}^{″}))} \end{matrix}

(19)

4.2. Questions on Choice Function

Suppose for $k \in N$, $C^{k} (A)$ represents the k-th iteration of the choice function of A, e.g. $C^{3} (A) = C (C (C (A)))$, where the infinite iteration of $C (A)$ (if it exists) is $\lim_{k \to \infty} C^{k} (A) = C^{\infty} (A)$. Therefore, when taking the following:

$$C^{'} (A) = \begin{cases} C (A) & \text{if } C (A) \text{ contains one element} \\ C^{j} (A) & \text{if } j \in N \text{, such that for all } k \geq j \text{, } C^{k} (A) \text{ contains one element} \\ C^{\infty} (A) & \text{if it exists, and } C^{\infty} (A) \text{ contains one element} \end{cases}$$

(1) What unique pre-structure would $C^{'} (A)$ contain (if it exists) for:

(a): $Z$ where if ${\{F_{r}\}}_{r \in N} \in C^{'} (Z)$ , we want ${\{F_{r}\}}_{r \in N} = {\{\{m \in Z : - r \leq m \leq r\}\}}_{r \in N}$
(b): $Q$ where if ${\{F_{r}\}}_{r \in N} \in C^{'} (Q)$ , we want ${\{F_{r}\}}_{r \in N} = {\{\{s / t! : s \in Z, t \in N, - r t! \leq s \leq r t!\}\}}_{r \in N}$
(c): $R$ where we’re not sure what ${\{F_{r}\}}_{r \in N} \in C^{'} (R)$ would be in this case. What would ${\{F_{r}\}}_{r \in N}$ be if it’s unique?
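The candidate pre-structure for $Q$ in (b) can be enumerated exactly for small r. The sketch below assumes that the index t is tied to r (that is, $t = r$, so $F_{r} = \{s / r! : - r \cdot r! \leq s \leq r \cdot r!\}$); this reading is an assumption, since the formula as written lets t range over all of $N$, which would make each $F_{r}$ infinite. The name `F_Q` is hypothetical.

```python
from fractions import Fraction
from math import factorial


def F_Q(r):
    """r-th term of the candidate pre-structure of Q, ASSUMING t = r: multiples of 1/r! in [-r, r]."""
    d = factorial(r)
    return [Fraction(s, d) for s in range(-r * d, r * d + 1)]


for r in (1, 2, 3, 4):
    terms = F_Q(r)
    print(r, len(terms), min(terms), max(terms))
```

Under this reading each term is finite and contained in the next one, because $r!$ divides $(r + 1)!$, and every rational eventually appears in some $F_{r}$.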

(2) There are a total of 113 variables in the choice function C (excluding quantifiers). Is there a choice function with fewer variables that answers criteria (1), (2), (3), (4) & (5) of the question in sec. 3 for a "larger" subset of $R^{A}$? (This might be impossible to answer since such a solution cannot be shown with prevalence or shyness [2,3].) Therefore, we need a more precise version of “size", with some examples (as shown in this answer [8]) being the following:

(a): Fractal Dimension notions
(b): Kolmogorov Entropy
(c): Baire Category and Porosity

4.3. Generalized Expected Values

Using the choice function in section 4.1, if the image of A under f is $f [A] := \{f (x) : x \in A\}$, we take $\{F_{r}^{″}\} \in C (f [A])$ and then take the pre-image under f of $F_{r}^{″}$ (which is denoted $f^{- 1} [F_{r}^{″}]$) where:

$$f^{- 1} [F_{r}^{″}] \overset{r \in N}{⇉} A$$

However, the expected value of $f^{- 1} [F_{r}^{″}]$ (def. 3) may be infinite (e.g. unbounded f). Hence, for every $r \in N$, we take ${\{{\{F_{t_{r}}^{‴}\}}_{t_{r} \in N}\}}_{r \in N} \in C (f^{- 1} [F_{r}^{″}])$ where:

$$\forall (r \in N) (F_{t_{r}}^{‴} \overset{t_{r} \in N}{⇉} F_{r}^{″})$$

Thus, if there exists a unique and finite $\ddot{E} [f]$ where:

$$\begin{matrix} \forall (ϵ > 0) \exists (N \in N) \forall (r \in N) \forall (t_{r} \in N) \forall ({\{{\{F_{t_{r}}^{‴}\}}_{t_{r} \in N}\}}_{r \in N} \in C (f^{- 1} [F_{r}^{″}])) \\ (r \geq N, t_{r} \geq N \Rightarrow | \frac{1}{U^{'} (F_{t_{r}}^{‴})} \int_{F_{t_{r}}^{‴}} f \, d x - \ddot{E} [f] | < ϵ) \end{matrix}$$

(20)

then $\ddot{E} [f]$ is the generalized expected value w.r.t. the choice function C, which answers criteria (1), (2), (3), (4) (and perhaps (5)) of the question in sec. 3; however, there is still a chance that equation 20 fails to give a unique $\ddot{E} [f]$. Hence, if $k \in N$, we take the k-th iteration of the choice function C in 19, such that there exists a $j \in N$ where, for all $k \geq j$, the new expected value $E^{†} [f]$ (or the generalized expected value w.r.t. finitely iterated C) is unique and finite.

Hence, if the k-th iteration of C is represented as $C^{[k]}$ (where e.g. $C^{[3]} (f [A]) = C (C (C (f [A])))$), we want a unique $E^{†} [f]$ where:

$$\begin{matrix} \forall (ϵ > 0) \exists (N \in N) \forall (r \in N) \forall (t_{r} \in N) \exists (j \in N) \forall (k \in N) (k \geq j \Rightarrow \\ \forall ({\{{\{F_{t_{r}}^{‴}\}}_{t_{r} \in N}\}}_{r \in N} \in C^{[k]} (f^{- 1} [F_{r}^{″}])) (r \geq N, t_{r} \geq N \Rightarrow | \frac{1}{U^{'} (F_{t_{r}}^{‴})} \int_{F_{t_{r}}^{‴}} f \, d x - E^{†} [f] | < ϵ)) \end{matrix}$$

(21)

If this still does not give a unique and finite expected value, we then take the most generalized expected value w.r.t. an infinitely iterated C, i.e. $E^{‡} [f]$, where if the infinite iteration of C is stated as $\lim_{k \to \infty} C^{[k]} (f [A]) = C^{\infty} (f [A])$, we then take:

$$\begin{matrix} \forall (ϵ > 0) \exists (N \in N) \forall (r \in N) \forall (t_{r} \in N) \\ \forall ({\{{\{F_{t_{r}}^{‴}\}}_{t_{r} \in N}\}}_{r \in N} \in C^{\infty} (f^{- 1} [F_{r}^{″}])) (r \geq N, t_{r} \geq N \Rightarrow | \frac{1}{U^{'} (F_{t_{r}}^{‴})} \int_{F_{t_{r}}^{‴}} f \, d x - E^{‡} [f] | < ϵ) \end{matrix}$$

(22)

However, the averages $\ddot{E} [f]$, $E^{†} [f]$, and $E^{‡} [f]$ in equations 20, 21, and 22 (respectively) should only be attempted for functions whose expected value is infinite or undefined, or for worst-case functions: poorly behaved $f : A \to R$ (where for $n \in N$, $A \subseteq R^{n}$) defined on infinitely many points covering an infinite expanse of space. For example:

(1): For a worst-case f defined on countably infinite A (e.g. countably infinite "pseudo-random points" non-uniformly scattered across the real plane), one might typically use $\ddot{E} [f]$ (20) (since countable sets may need just one iteration of C for the generalized expected value to be unique); otherwise, one may use $E^{†} [f]$ (21) for finite iterations of C.
(2): For a worst-case f defined on uncountable A, we may have to use $E^{‡} [f]$ (22) as the function is so difficult to analyze. We can imagine this function as an uncountable number of "pseudo-random" points non-uniformly generated on a subset of the real plane (see 5.1 for a visualization.)

Note that no matter how generalized and “meaningful" the extension of an expected value is, there will always be an f where the expected value does not exist.

4.4. Questions Regarding The Answer

(1) Using prevalence and shyness [2,3], can we say that the set of f where $\ddot{E} [f]$ (20), $E^{†} [f]$ (21), or $E^{‡} [f]$ (22) is unique and finite forms either a prevalent subset or a neither prevalent nor shy subset of $R^{A}$? (If the subset is prevalent, this implies the generalized expected values are unique and finite for a “large" subset of $R^{A}$; however, if the subset is neither prevalent nor shy, we need a more precise definition of “size" which takes “an exact probability that the expected values are unique & finite".) Some examples of such definitions (as shown in this answer [8]) are:

(a): Fractal Dimension notions
(b): Kolmogorov Entropy
(c): Baire Category and Porosity

(2) Can any of $\ddot{E} [f]$ (20), $E^{†} [f]$ (21), or $E^{‡} [f]$ (22) (when A is the set of all Liouville numbers and $f = {id}_{A}$) give a finite value on the Liouville numbers? What would the value be?

(3) Similar to how def. 11 approximates the expected value in def. 1, how do we approximate $\ddot{E} [f]$ (20), $E^{†} [f]$ (21), and $E^{‡} [f]$ (22)?

(4) Can we use programming to estimate $\ddot{E} [f]$ (20), $E^{†} [f]$ (21), and $E^{‡} [f]$ (22) (if a unique and finite result exists)?

4.5. Application

(1) In Quanta magazine [9], Wood writes on Feynman path integrals: “No known mathematical procedure can meaningfully average[2] an infinite number of objects covering an infinite expanse of space in general. The path integral is more of a physics philosophy than an exact mathematical recipe." Despite Wood’s statement, mathematicians Bottazzi E. and Eskew M. [10] found a constructive solution to the statement using integrals defined on filters over families of finite sets; however, the solution was not unique, as one has to choose a value in a partially ordered ring of infinite and infinitesimal elements. In addition, although there were ways of preventing the use of the axiom of choice (within their integral), the axiom was still required for certain cases.

(a): Perhaps, if Bottazzi’s and Eskew’s filter integral [10] is not enough to solve Wood’s statement, could we replace the path integral with expected values $\ddot{E} [f]$ (20), $E^{†} [f]$ (21), and $E^{‡} [f]$ (22)? (See, again, sec. 5.1 for a visualization.)

(2) As stated in sec. 2.1, “when the uniform measure of A, measurable in the Carathéodory sense, has zero or infinite volume (or undefined measure), there may be multiple, conflicting ways of defining a "natural" uniform measure on A." This is an example of Bertrand’s Paradox, which shows that "the principle of indifference (that allows equal probability among all possible outcomes when no other information is given) may not produce definite, well-defined results for probabilities if applied uncritically when the domain of possibilities (i.e. sample space) is infinite" [11].

Using sec. 4.2, perhaps if we take:

$$C^{'} (A) = \begin{cases} C (A) & \text{if } C (A) \text{ contains one element} \\ C^{j} (A) & \text{if } j \in N \text{, such that for all } k \geq j \text{, } C^{k} (A) \text{ contains one element} \\ C^{\infty} (A) & \text{if it exists, and } C^{\infty} (A) \text{ contains one element} \end{cases}$$

then for ${\{F_{r}\}}_{r \in N} \in C^{'} (A)$ and $S \subseteq A$, if we have the following:

$$\forall (ϵ > 0) \exists (N \in N) \forall (r \in N) (r \geq N \Rightarrow | \frac{U^{'} (S \cap F_{r})}{U^{'} (F_{r})} - U (S) | < ϵ)$$

(23)

then $U (S)$ might serve as a solution to Bertrand’s Paradox (unless there is a simpler solution to the main question in sec. 3).
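As a rough illustration of equation 23, the sketch below computes the ratio $U^{'} (S \cap F_{r}) / U^{'} (F_{r})$ for $A = Z$ with the counting measure. It assumes the pre-structure $F_{r} = \{- r, \ldots, r\}$ (the one hoped for in sec. 4.2, not one actually derived from $C^{'}$) and takes S to be the multiples of 3; the helper name `relative_measure` is hypothetical.

```python
def relative_measure(S_contains, F_r):
    """Ratio U'(S ∩ F_r) / U'(F_r) with U' the counting measure on a finite set F_r."""
    points = list(F_r)
    return sum(1 for m in points if S_contains(m)) / len(points)


def is_multiple_of_three(m):
    return m % 3 == 0


for r in (10, 100, 1000, 3000):
    print(r, relative_measure(is_multiple_of_three, range(-r, r + 1)))
```

For this choice the ratios settle near 1/3, the familiar natural density; a different pre-structure of $Z$ could give a different value, which is the ambiguity the choice function is meant to remove.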

(a) How do we apply $U (S)$ (or a simpler solution) to the usual example which demonstrates Bertrand’s Paradox, as follows: for an equilateral triangle inscribed in a circle, suppose a chord of the circle is chosen at random; what is the probability that the chord is longer than a side of the triangle? [12] According to Bertrand’s Paradox, there are three arguments which correctly use the principle of indifference yet give different solutions to this problem [12]:

(i): The “random endpoints" method: Choose two random points on the circumference of the circle and draw the chord joining them. To calculate the probability in question imagine the triangle rotated so its vertex coincides with one of the chord endpoints. Observe that if the other chord endpoint lies on the arc between the endpoints of the triangle side opposite the first point, the chord is longer than a side of the triangle. The length of the arc is one-third of the circumference of the circle, therefore the probability that a random chord is longer than a side of the inscribed triangle is $1 / 3$ .
(ii): The "random radial point" method: Choose a radius of the circle, choose a point on the radius, and construct the chord through this point and perpendicular to the radius. To calculate the probability in question imagine the triangle rotated so a side is perpendicular to the radius. The chord is longer than a side of the triangle if the chosen point is nearer the center of the circle than the point where the side of the triangle intersects the radius. The side of the triangle bisects the radius, therefore the probability a random chord is longer than a side of the inscribed triangle is $1 / 2$ .
(iii): The "random midpoint" method: Choose a point anywhere within the circle and construct a chord with the chosen point as its midpoint. The chord is longer than a side of the inscribed triangle if the chosen point falls within a concentric circle of radius $1 / 2$ the radius of the larger circle. The area of the smaller circle is one-fourth the area of the larger circle, therefore the probability a random chord is longer than a side of the inscribed triangle is $1 / 4$ .
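The three arguments can be checked with a short Monte Carlo sketch on the unit circle, where the inscribed equilateral triangle has side $\sqrt{3}$. The sampling functions, the sample size, and the seed are illustrative assumptions; each function returns the length of one random chord under the corresponding method.

```python
import math
import random

SIDE = math.sqrt(3.0)  # side of an equilateral triangle inscribed in the unit circle


def random_endpoints(rng):
    """Method (i): two uniform points on the circumference."""
    t1 = rng.uniform(0.0, 2.0 * math.pi)
    t2 = rng.uniform(0.0, 2.0 * math.pi)
    return 2.0 * abs(math.sin((t1 - t2) / 2.0))


def random_radial_point(rng):
    """Method (ii): uniform distance d along a radius; chord perpendicular to it."""
    d = rng.random()
    return 2.0 * math.sqrt(1.0 - d * d)


def random_midpoint(rng):
    """Method (iii): midpoint uniform in the disc, so its squared distance to the centre is uniform."""
    u = rng.random()  # u = d^2
    return 2.0 * math.sqrt(1.0 - u)


def estimate(method, n=200_000, seed=0):
    """Fraction of sampled chords longer than the triangle side."""
    rng = random.Random(seed)
    return sum(method(rng) > SIDE for _ in range(n)) / n


for method in (random_endpoints, random_radial_point, random_midpoint):
    print(method.__name__, estimate(method))
```

The three estimates should land close to 1/3, 1/2 and 1/4 respectively, reproducing the disagreement between the methods.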

5. Glossary

5.1. Example of Case (2) of Worst Case Functions

(If the explanation below is difficult to understand, see the accompanying visualization [13]. After changing a slider, wait a couple of seconds for the graph to load.)

We wish to create a function that appears to be “pseudo-randomly" distributed but has infinitely many points that are non-uniform (i.e. do not have complete spatial randomness [14]) in a sub-space of $R^{2}$, where the expected value or integral of the function w.r.t. the uniform probability measure [1], p.32-37 is non-obvious (i.e. not the center of the space the function covers nor the area of that space).

Suppose for real numbers $x_{1}, x_{2}, y_{1}$ and $y_{2}$, we generate an uncountable number of "nearly pseudo-random" points that are non-uniform in the subspace $[x_{1}, x_{2}] \times [y_{1}, y_{2}] \subseteq R^{2}$.

We define the function as $f : [x_{1}, x_{2}] \to [y_{1}, y_{2}]$.

Now suppose $b \in \{2, 3, \cdots, 10\}$, where the base-b expansions of real numbers in the interval $[x_{1}, x_{2}]$ have infinite decimals that approach x from the right side, so that when $x_{1} = x_{2}$ we get $f (x_{1}) = f (x_{2})$.

Furthermore, for $N \cup \{0\} = N_{0}$, if $r \in N_{0}$ and ${digit}_{b} : R \times Z \to \{0, 1, \cdots, b - 1\}$ is a function where ${digit}_{b} (x, r)$ takes the digit in the $b^{r}$-th decimal fraction of the base-b expansion of x (e.g. ${digit}_{10} (1.789, 2) = 8$), then ${\{{g_{r}}^{'}\}}_{r \in N_{0}}$ is a sequence of functions such that ${g_{r}}^{'} : N_{0} \to N_{0}$ is defined to be:

$$g_{r}^{'} (x) = [\frac{10}{b} \sin (r x) + \frac{10}{b}]$$

(24)

Then, for some large $k \in N$ and $x_{1}, x_{2} \in R$, the intermediate function (before $f$), $f_{1} : [x_{1}, x_{2}] \to R$, is defined to be

$$\begin{aligned} f_{1} (x) &= \left| \left( \sum_{r = 0}^{\infty} \frac{g_{r + 1}^{'} \left( \sum_{p = r}^{r + k} \operatorname{digit}_{b} (x, p) \right)}{b^{r}} \right) - 10 \right| \\ &= \left| \left( \sum_{r = 0}^{\infty} \frac{\left[ \frac{10}{b} \sin \left( (r + 1) \sum_{p = r}^{r + k} \operatorname{digit}_{b} (x, p) \right) + \frac{10}{b} \right]}{b^{r}} \right) - 10 \right| \end{aligned} \tag{25}$$

Here, the points in $f_{1}$ are "almost pseudo-randomly" and non-uniformly distributed on $[x_{1}, x_{2}] \times [0, 10]$. What we did was convert every digit of the base-$b$ expansion of $x$ to a pseudo-random integer between 0 and $(10 \cdot 10^{s}) / b$ (endpoints included), with the values not equally likely. Furthermore, we make the function truly "appear pseudo-random" by adding the $b^{r}$-th decimal fraction to the next $k$ decimal fractions. However, we also want to control the end-points of $[0, 10^{s + 1}]$, so that if $y_{1}, y_{2} \in R$, we convert $[x_{1}, x_{2}] \times [0, 10]$ to $[x_{1}, x_{2}] \times [y_{1}, y_{2}]$ by manipulating equation 25 to get:

$$\begin{aligned} f (x) &= y_{2} - \frac{y_{2} - y_{1}}{10} f_{1} (x) \\ &= y_{2} - \left( \frac{y_{2} - y_{1}}{10} \right) \left| \left( \sum_{r = 0}^{\infty} \frac{\left[ \frac{10}{b} \sin \left( (r + 1) \sum_{p = r}^{r + k} \operatorname{digit}_{b} (x, p) \right) + \frac{10}{b} \right]}{b^{r}} \right) - 10 \right| \end{aligned} \tag{26}$$

The larger $k$ is, the more pseudo-random the distribution of the points of $f$ in the space $[x_{1}, x_{2}] \times [y_{1}, y_{2}]$ becomes; but unlike most such distributions of points, the set of points of $f$ is uncountable.
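The construction in equations 24 to 26 can be sketched numerically. The code below is only an illustration under stated assumptions: the square brackets in equation 24 are read as rounding to the nearest integer, the infinite sum over $r$ is truncated to a finite number of terms, and $x$ is held as an exact fraction so that its base-$b$ digits are well defined. The names `digit_b`, `g_prime`, `f1`, `f` and the parameter `terms` are chosen here for illustration and are not taken from the visualization [13].

```python
import math
from fractions import Fraction


def digit_b(x, r, b=10):
    """Digit of |x| in the b**(-r) place of its base-b expansion (r = 0 is the units digit)."""
    scaled = abs(Fraction(x)) * b**r
    return math.floor(scaled) % b


def g_prime(r, n, b):
    """Equation 24, reading [.] as rounding to the nearest integer (an assumption)."""
    value = (10 / b) * math.sin(r * n) + 10 / b
    return math.floor(value + 0.5)


def f1(x, b=3, k=5, terms=30):
    """Equation 25 with the sum over r truncated to `terms` terms (an assumption)."""
    x = Fraction(x)
    total = 0.0
    for r in range(terms):
        # Sum of the digits in places r, r + 1, ..., r + k.
        window = sum(digit_b(x, p, b) for p in range(r, r + k + 1))
        total += g_prime(r + 1, window, b) / b**r
    return abs(total - 10)


def f(x, y1=0.0, y2=1.0, b=3, k=5, terms=30):
    """Equation 26: affine rescaling of f1 from [0, 10] onto [y1, y2]."""
    return y2 - (y2 - y1) / 10 * f1(x, b, k, terms)


print(digit_b(Fraction("1.789"), 2, 10))
print(f(Fraction(1, 7)))
```

Under the rounding assumption, each term of the sum is a non-negative integer of size at most about $20 / b$, so for $b \in \{2, \ldots, 10\}$ the truncated value of $f_{1}$ stays within $[0, 10]$ and $f$ stays within $[y_{1}, y_{2}]$. A small $k$ is used here only to keep the example fast.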

5.2. Question Regarding Case (2) of The Worst Case Function

Let us be more specific. Suppose that for the function $f$ in equation 26 of sec. 5.1, we have:

$b = 3$
$[x_{1}, x_{2}] \times [y_{1}, y_{2}] = [0, 1] \times [0, 1]$
$k = 100$

(one could try simpler parameters). What is the expected value using either $E^{\dagger} [f]$ (21) or $E^{\ddagger} [f]$ (22), if the answer is finite and unique?

What about for $f$ in general (i.e. in terms of $b$, $x_{1}$, $x_{2}$, $y_{1}$, $y_{2}$ and $k$)?

(Note that if $x_{1}, y_{1} \to - \infty$ and $x_{2}, y_{2} \to \infty$, then the function is an explicit example of the function that Wood [3] describes in Quanta Magazine.)

5.3. Approximating the Expected Value

Definition 11 (Approximating the Expected Value). In practice, the computation of this expected value may be complicated if the set $A$ is complicated. If analytic integration does not give a closed-form solution, then a general and relatively simple way to compute the expected value (up to high accuracy) is with importance sampling. To do this, we produce values $X_{1}, X_{2}, \ldots, X_{M} \sim \text{IID } g$ for some density function $g$ with support $A \subseteq \operatorname{support} (g) \subseteq R^{n}$ (hopefully with support fairly close to $A$) and we use the estimator:

$$\hat{\mu}_{M} \equiv \frac{\sum_{i = 1}^{M} I (X_{i} \in A) \cdot f (X_{i}) / g (X_{i})}{\sum_{i = 1}^{M} I (X_{i} \in A) / g (X_{i})} \tag{27}$$

From the law of large numbers, we can establish that $E [f (X)] = \lim_{M \to \infty} \hat{\mu}_{M}$, so if we take $M$ to be large then we should get a reasonably good computation of the expected value of interest.

Note that importance sampling requires three things:

(1): We need to be able to decide whether a point $x$ is in the set $A$ or not
(2): We need to be able to generate points from a density $g$ whose support covers $A$ but is not too much bigger than $A$
(3): We have to be able to compute $f (x)$ and $g (x)$ for each point $x \in A$
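The estimator in equation 27 and the three requirements above can be sketched as follows. This is a minimal illustration, not a definitive implementation: the set $A$ (the unit disk in $R^{2}$), the integrand $f (x) = x_{1}^{2} + x_{2}^{2}$ and the proposal density $g$ (a standard bivariate normal) are all hypothetical choices made here, as are the function names. For this choice the mean of $f$ w.r.t. the uniform probability measure on $A$ is $1 / 2$, which gives something to compare against.

```python
import math
import random


def importance_sampling_mean(f, in_A, sample_g, density_g, M, seed=0):
    """Self-normalized importance sampling estimator of equation 27."""
    rng = random.Random(seed)
    numerator = 0.0
    denominator = 0.0
    for _ in range(M):
        x = sample_g(rng)
        if in_A(x):  # requirement (1): membership test for A
            weight = 1.0 / density_g(x)  # requirement (3): evaluate g at x
            numerator += weight * f(x)
            denominator += weight
    if denominator == 0.0:
        raise ValueError("no samples fell inside A; increase M or change g")
    return numerator / denominator


# Hypothetical example: A is the unit disk, g is a standard bivariate normal density.
def in_unit_disk(x):
    return x[0] ** 2 + x[1] ** 2 <= 1.0


def sample_normal(rng):
    # requirement (2): draw from a density whose support covers A
    return (rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0))


def normal_density(x):
    return math.exp(-(x[0] ** 2 + x[1] ** 2) / 2.0) / (2.0 * math.pi)


def squared_radius(x):
    return x[0] ** 2 + x[1] ** 2


estimate = importance_sampling_mean(
    squared_radius, in_unit_disk, sample_normal, normal_density, M=100_000
)
print(estimate)
```

A non-uniform proposal is used on purpose: with a uniform $g$ the weights $1 / g (X_{i})$ would cancel and the estimator would reduce to a plain sample mean over the points that land in $A$.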

References

1. T., L.; E., R. The maximum entropy of a metric space. https://arxiv.org/pdf/1908.11184.pdf.
2. Ott, W.; Yorke, J.A. Prevalence. Bulletin of the American Mathematical Society 2005, 42, 263–290. https://www.ams.org/journals/bull/2005-42-03/S0273-0979-05-01060-8/S0273-0979-05-01060-8.pdf.
3. Hunt, B.R. Prevalence: a translation-invariant “almost every” on infinite-dimensional spaces, 1992. https://arxiv.org/abs/math/9210220.
4. B. (https://stats.stackexchange.com/users/173082/ben). In statistics how does one find the mean of a function w.r.t the uniform probability measure? Cross Validated. https://stats.stackexchange.com/q/602939 (version: 2023-01-24).
5. B., P. 3 ed.; John Wiley & Sons: New York, 1995; pp. 419–427. https://www.colorado.edu/amath/sites/default/files/attached-files/billingsley.pdf.
6. M.M. (https://mathoverflow.net/users/46214/mark mcclure). Integral over the Cantor set Hausdorff dimension. MathOverflow. https://mathoverflow.net/q/235609 (version: 2016-04-07).
7. M., G. 2 ed.; Springer New York: New York, 2011; pp. 61–95. https://ee.stanford.edu/~gray/it.pdf.
8. D.L.R. (https://math.stackexchange.com/users/13130/dave-l renfro). Proof that neither “almost none” nor “almost all” functions which are Lebesgue measurable are non-integrable. Mathematics Stack Exchange. https://math.stackexchange.com/q/4623168 (version: 2023-01-21).
9. C., W. Mathematicians Prove 2D Version of Quantum Gravity Really Works. Quanta Magazine. https://www.quantamagazine.org/mathematicians-prove-2d-version-of-quantum-gravity-really-works-20210617.
10. E., B.; M., E. Integration with Filters. https://arxiv.org/pdf/2004.09103.pdf.
11. Shackel, N. Bertrand’s Paradox and the Principle of Indifference. Philosophy of Science 2007, 74, 150–175. https://orca.cardiff.ac.uk/id/eprint/3803/1/Shackel%20Bertrand’s%20paradox%205.pdf.
12. Drory, A. Failure and Uses of Jaynes’ Principle of Transformation Groups. Foundations of Physics 2015, 45, 439–460. https://arxiv.org/pdf/1503.09072.pdf.
13. B., K. Visualization of Uncountable Number of Psuedo-random Points Generate on Subset of the Real Plane, 2023. https://www.wolframcloud.com/obj/4e78f594-1578-402a-a163-ebb16319ada2.
14. Maimon O., R.L. 2 ed.; Springer New York: New York, 2010; pp. 851–852.

[1]	The result of algebraic manipulation on the expected value in def. 3 that is unique and finite for the largest subset of $R^{A}$ .
[2]	Meaningful Average—The measure inside the average is canonical when the measure is normalized as a uniform probability measure [1], p. 32-37
[3]	Wood wrote on Feynman Path Integrals: “No known mathematical procedure can meaningfully average [2] an infinite number of objects covering an infinite expanse of space in general. The path integral is more of a physics philosophy than an exact mathematical recipe."

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

Copyright: This open access article is published under a Creative Commons CC BY 4.0 license, which permits free download, distribution, and reuse, provided that the author and preprint are cited in any reuse.
