1. Background
I am an undergraduate from Indiana University despite being the age of a grad student. I should have graduated by now, but my obsession with research prevents me from moving forward. There is a chance that I might have a learning disability since writing isn’t very easy for me.
As I’ve been in and out of college, I never got the chance to rigorously learn the subjects I’m researching. Most of what I learned was from Wikipedia, blogs and random research articles. I know little of what I read but learn what I can from asking questions on math stack exchange.
What I truly want, however, is for someone to take my ideas and publish them.
I warn that the definitions may not be rigorous, so try to go easy on me. (I recommend using a programming language such as Mathematica, Python, JavaScript, or Matlab to understand Section 3 and Section 4.)
2. Preliminaries
Suppose A is a set measurable in the Carathéodory sense [7], such that for $n \in \mathbb{N}$ we have $A \subseteq \mathbb{R}^{n}$ and a function $f : A \to \mathbb{R}$.
2.1. Motivation
It seems the set of measurable functions with infinite or undefined expected values (Definition 1), using the uniform measure (pp. 32-37 [18]), may be a prevalent subset [11, 15] of the set of all measurable functions, meaning "almost every" measurable function has infinite or undefined expected values. Furthermore, when the Lebesgue measure of A, measurable in the Carathéodory sense, has zero or infinite volume (or undefined measure), there may be multiple, conflicting ways of defining a "natural" uniform measure on A.
Below, I attempt to pose a question about an extension of the expected value that yields finite values when the usual expected value is undefined or infinite.
The question is long because there are plenty of “meaningless” extensions of the expected value (e.g. if the expected value is infinite or undefined, we can simply replace it with zero).
Therefore, we must be more specific about what is meant by a “meaningful” extension, but first there are some preliminary definitions to clarify.
2.2. Preliminary Definitions
Definition 1 (Expected value w.r.t. the Uniform Probability Measure).
From an answer to a question on Cross Validated (a website for statistical questions) [10], let denote a uniform random variable on set and denote the probability density function from the Radon-Nikodym derivative (pp. 419-427 [2]) of the uniform probability measure on A measurable in the Carathéodory sense. If denotes the indicator function on : then the Radon-Nikodym derivative of the uniform probability measure must have the form . (Note is not the derivative of U in the sense of calculus but rather the denominator of the probability density function derived from the uniform probability measure defined as U.)
Therefore, using the law of the unconscious statistician, we should get
such that the expected value is undefined when A does not have a uniform probability distribution or f is not integrable w.r.t. the measure.
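As a reading aid, the standard form of the quantities described in Definition 1 can be sketched as follows. The symbols $\mu$ (a reference measure on A with $0 < \mu(A) < \infty$), $p_U$, and $\mathbb{1}_A$ are notation assumed here for illustration only and may differ from the notation of the original equations.

```latex
p_U(x) = \frac{\mathbb{1}_A(x)}{\mu(A)}, \qquad
\mathbb{E}[f(U)] = \int f(x)\, p_U(x)\, d\mu(x) = \frac{1}{\mu(A)} \int_A f(x)\, d\mu(x)
```

Under this reading, the expected value fails to be defined when $\mu(A)$ is zero or infinite (so that no uniform probability distribution exists on A) or when f is not integrable over A with respect to $\mu$.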
Definition 2 (Defining the pre-structure). Since there’s a chance that does not exist or f is not integrable w.r.t. , using Definition 1 we define a sequence of sets where if:
- (a)
- (b)
then we have:
where is a pre-structure of A, since for every the sequence does not equal A, but "converges" to A as r increases (see (a) & (b) of this definition).
Example 1.
Suppose . One pre-structure of is since:
For every , set is finite, meaning each term of the pre-structure has a discrete uniform distribution. Therefore, exists.
For every , is finite; meaning is the counting measure. Furthermore, since and for all , is positive and finite, criteria (3) of Definition 2 is satisfied.
Example 2.
Suppose . Another pre-structure of is
where an union is added, since without the union
Note that:
For every , set is finite, meaning each term of the pre-structure has a discrete uniform distribution. Therefore, exists.
For every , is finite; meaning is the counting measure, since (when is Euler’s totient function (pp. 239-249 [16])) we have , and if correct, is positive and finite for all . Therefore, criteria (3) of Definition 2 is satisfied.
There are plenty of pre-structures of . In fact, there may be countably infinitely many of these pre-structures.
Example 3.
We need additional examples, where is not the counting measure. Perhaps one example of (where A is the Liouville numbers [6]) is:
Note we can show
However, we must also show for every , there is a uniform measure on . We assume this uniform measure is the normalized h-Hausdorff measure where h is the (exact) dimension function of A [14].
If the h-Hausdorff measure is positive and finite for every , then must be the h-Hausdorff measure which, again, is positive and finite. Therefore or Equation (2.2.2) is a pre-structure.
Definition 3 (Expected value of Pre-Structure).
If is a pre-structure of A (Definition 2), then for , if
we then have that the expected value of the pre-structure could be described as (Definition 1) where:
Example 4.
Suppose where such that:
Using the pre-structure in Example 1 or , we presume (and prove) using Definition 3 is 1.
And using the pre-structure in Example 2 or
we presume (but must prove) , using Definition 3 is .
This shows different pre-structures give different expected values; therefore, we must choose a unique set of equivalent pre-structures (Definition 8) which gives the same, finite expected value.
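The claim that different pre-structures can give different expected values can be checked numerically. The following is a minimal sketch under stated assumptions: the set (rationals in [0, 1]), the test function `f`, and the two sequences `factorial_grid` and `farey_terms` are hypothetical choices made for illustration and are not the ones used in Examples 1, 2 and 4.

```python
from fractions import Fraction
from math import factorial


def f(x: Fraction) -> int:
    """Hypothetical test function: 1 if x has an odd denominator in lowest terms, else 0."""
    return 1 if x.denominator % 2 == 1 else 0


def factorial_grid(r: int) -> set:
    """Assumed pre-structure term: all multiples of 1/r! lying in [0, 1]."""
    n = factorial(r)
    return {Fraction(c, n) for c in range(n + 1)}


def farey_terms(r: int) -> set:
    """Assumed pre-structure term: all fractions c/d in [0, 1] with denominator d <= r."""
    return {Fraction(c, d) for d in range(1, r + 1) for c in range(d + 1)}


def average(points: set) -> float:
    """Expected value of f under the discrete uniform distribution on a finite set."""
    return sum(f(x) for x in points) / len(points)


# Compare the two averages term by term as r grows.
for r in range(2, 7):
    print(r, round(average(factorial_grid(r)), 4), round(average(farey_terms(r)), 4))
```

Both sequences are increasing families of finite sets whose union is the rationals in [0, 1], yet for the values printed the first average falls toward 0 while the second stays above 1/2, which is the kind of disagreement the text describes.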
Definition 4 (Uniform coverings of each term of the pre-structure). We define the uniform ε coverings of each term of the pre-structure (i.e., ) as a group of pair-wise disjoint sets that cover for every , such the measure of each of the sets that cover have the same value of , where and the total sum of of the coverings is minimized. In shorter notation, if
and set Ω is defined as:
then for every , the set of uniform ε coverings is defined using
where ω “enumerates" all possible uniform ε coverings of for every .
Definition 5 (Sample of the uniform coverings of each term of the pre-structure). The sample of uniform ε coverings of each term of the pre-structure or is the set of points, such for every and , we take a point from each pair-wise disjoint set in the uniform ε coverings of (Definition 4). In shorter notation, if
and set is defined as:
then for every , the set of all samples of the set of uniform ε coverings is defined using , where ψ “enumerates" all possible samples of .
Definition 6 (Entropy on the sample of uniform coverings of each term of the pre-structure). Since there are finitely many points in the sample of the uniform ε coverings of each term of the pre-structure (Definition 5), we:
(1) Arrange the x-value of the points in the sample of uniform ε coverings from least to greatest. This is defined as:
(2) Take the multi-set of the absolute differences between all consecutive pairs of elements in (1). This is defined as:
(3) Normalize (2) into a probability distribution, where for multi-set X, we have as the cardinality of all elements in the multi-set, including repeated ones. This is defined as:
(4) Take the entropy of (3) (for further reading, see [12]). This is defined as:
where (4) is the entropy on the sample of uniform coverings of .
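The four steps of Definition 6 can be sketched in Python as below. This is only an illustration under assumptions not fixed by the text: the sample is taken to be a list of real numbers, the normalization in step (3) divides each gap by the sum of all gaps, and the entropy in step (4) is Shannon entropy in base 2. The function name `sample_entropy` is hypothetical.

```python
import math


def sample_entropy(points):
    """Entropy of the normalized gaps between consecutive sorted sample points."""
    ordered = sorted(points)                                # step (1): least to greatest
    gaps = [b - a for a, b in zip(ordered, ordered[1:])]    # step (2): consecutive differences
    total = sum(gaps)
    if total == 0:
        return 0.0  # fewer than two distinct points, so there is no gap distribution
    probs = [g / total for g in gaps]                       # step (3): assumed normalization
    return -sum(p * math.log2(p) for p in probs if p > 0)   # step (4): assumed base-2 entropy


print(sample_entropy([0, 1, 2, 3, 4]))  # evenly spaced sample
print(sample_entropy([0, 1, 3]))        # unevenly spaced sample
```

Under these assumptions, evenly spaced samples maximize the entropy for a given number of points, which is consistent with using the entropy as a measure of how uniformly a sample covers a term of the pre-structure.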
Definition 7 (Pre-Structure Converging Uniformly to A).
For every (using Definitions 4–6) if set A is finite:
and if set A is non-finite:
we say the pre-structure converges uniformly to A (or in shorter notation):
(Note we wish to define a uniform convergence of a sequence of sets to A since the definition is analogous to a uniform measure.)
Definition 8 (Equivalent Pre-Structures).
The pre-structures and of A are equivalent if, from Definition 3, where and :
Definition 9 (Non-Equivalent Pre-Structures).
The pre-structures and of A are non-equivalent if, from Definition 3, where and :
Definition 10 (Pre-Structures converging Sublinearly, Linearly, or Superlinearly to A compared to that of another Sequence). Suppose pre-structures and are non-equivalent and converge uniformly to A; and suppose for every , where and :
- (a)
From Definition 5 and 6, suppose we have:
then (using 2.2.9) we have
- (b)
From Definitions 5 and 6, suppose we have:
then (using 2.2.11) we get
If using Equations (2.2.10) and (2.2.12) we have that:
then we say converges uniformly to A at a superlinear rate to that of.
If using Equations (2.2.10) and (2.2.12) we have that:
then we say converges uniformly to A at a linear rate to that of .
If using Equations (2.2.10) and (2.2.12) we have that:
we say converges uniformly to A at a sublinear rate to that of .
I assume and are always equal but I’m not sure how to prove this.
2.3. Question on Preliminary Definitions
3. Main Question
Does there exist a unique extension (or a method that constructively defines a unique extension) of the expected value of f when the value’s finite, using the uniform probability measure (pp. 32-37 [18]) on sets measurable in the Carathéodory sense, such that we replace f with infinite or undefined expected values with f defined on a chosen pre-structure which depends on A where:
(1) The expected value of f on each term of the pre-structure is finite.
(2) The pre-structure converges uniformly to A.
(3) The pre-structure converges uniformly to A at a linear or superlinear rate to that of other non-equivalent pre-structures of A which satisfy (1) and (2).
(4) The generalized expected value of f on a pre-structure (i.e. an extension of Definition 3 to answer the full question) has a unique and finite value, such that the pre-structure satisfies (1), (2), and (3).
(5) A choice function is defined which chooses a pre-structure from A where the following satisfies (1), (2), (3), and (4) for the largest possible subset of .
If there is more than one choice function that satisfies (1), (2), (3), (4) and (5), we choose the choice function with the “simplest form”, meaning for a general pre-structure of A, when each choice function is fully expanded, we take the choice function with the fewest variables/numbers (excluding those with quantifiers).
4. Informal Attempt to Answer Main Question
(I advise using a programming language such as Mathematica, Python, JavaScript, or Matlab to understand the definitions of the answer below.)
4.1. Generalized Expected Values
If the image of A under f is , such that from Definitions 2 and 7 we take the pre-structure of where:
and take the pre-image under f of (defined as ) such that:
However, note the expected value of (Definition 3) may be infinite (e.g. unbounded f). Hence, for every , we take where:
Thus, the generalized expected value or is:
and (similar to Definitions 2 & 3) if
we describe the process of the generalized expected value as .
4.2. Choice Function
Suppose is the set of all pre-structures of A which satisfy criteria (1) and (2) of the main question where the generalized expected value of the pre-structures, as they converge uniformly to A, is unique and finite such that the pre-structure should be a sequence of sets that satisfies criteria (1), (2), (3) and (4) of the main question where (using the end of Section 4.1):
and pre-structure is an element of such that (using the end of Section 4.1):
but is not an element of the set of equivalent pre-structures of (i.e. Definition 8).
Further note from (a), with Equation (2.2.9) in Definition 10, if we take:
and from (b), with Equation (2.2.11) in Definition 10, we take:
Then, using Definition 5 with Equations (4.2.3) and (4.2.4), if:
where, using absolute value function , we have:
such that
and, using Equations (4.2.5)–(4.2.9) with the nearest integer function , we want:
such that, using Equation (4.2.10), if set and is the power-set, then set is the largest element of:
w.r.t. inclusion, such that the choice function is if the following contains just one element.
Otherwise, for , suppose we say represents the k-th iteration of the choice function of A, e.g. , where the infinite iteration of (if it exists) is . Therefore, when taking the following:
we say is the choice function and the expected value, using Definition 4.2.1, is .
4.3. Questions on Choice Function
4.4. Increasing Chances of an Unique and Finite Expected Value
In case , in Equation (4.2.12), does not exist; if there exists a unique and finite (see Section 4.1) where:
Then is the generalized expected value w.r.t. choice function C, which answers criteria (1), (2), (3), (4), (perhaps (5)) of the question in Section 3; however, there is still a chance that Equation (4.4.1) fails to give a unique . Hence, if , we take the k-th iteration of the choice function C in Equation (4.2.11), such that there exists a , where for all , if is unique and finite then the following is the generalized expected value w.r.t. finitely iterated C.
In other words, if the k-th iteration of C is represented as (where e.g. ), we want a unique and finite where:
If this still does not give a unique and finite expected value, we then take the most generalized expected value w.r.t. an infinitely iterated C where, if the infinite iteration of C is stated as , we then want a unique where:
However, in such cases, should only be used for functions where the expected value is infinite or undefined, or for worst-case functions: badly behaved f (with A and f as in Section 2) defined on infinitely many points covering an infinite expanse of space. For example:
(1) For a worst-case f defined on countably infinite A (e.g. countably infinite "pseudo-random points" non-uniformly scattered across the real plane), one may need just one iteration of C (since most functions on countable sets need just one iteration of C for to be unique); otherwise, one may use Equation (4.4.2) for finite iterations of C.
(2) For a worst-case f defined on uncountable A, we might have to use Equation (4.4.3), as averaging such a function might be nearly impossible. We can imagine this function as an uncountable number of "pseudo-random" points non-uniformly generated on a subset of the real plane (see Section 5.1 for a visualization).
Note, however, that no matter how generalized and “meaningful" the extension of an expected value is, there will always be an f where the expected value does not exist.
4.5. Questions Regarding the Answer
4.6. Applications
- In Quanta Magazine [3], Wood writes on Feynman path integrals: “No known mathematical procedure can meaningfully average an infinite number of objects covering an infinite expanse of space in general. The path integral is more of a physics philosophy than an exact mathematical recipe.” (Here a meaningful average is one that answers the main question in Section 3.) Despite Wood’s statement, mathematicians Bottazzi E. and Eskew M. [5] found a constructive solution to the statement using integrals defined on filters over families of finite sets; however, the solution was not unique, as one has to choose a value in a partially ordered ring of infinite and infinitesimal elements.
- (a) Perhaps, if Bottazzi’s and Eskew’s filter integral [5] is not enough to solve Wood’s statement, could we replace the path integral with expected values from Equations (4.4.1)–(4.4.3) respectively (or a complete solution to Section 3)? (See, again, Section 5.1 for a visualization of Wood’s statement.)
- As stated in Section 2.1, “when the Lebesgue measure of A, measurable in the Carathéodory sense, has zero or infinite volume (or undefined measure), there may be multiple, conflicting ways of defining a "natural" uniform measure on A.” This is an example of Bertrand’s Paradox, which shows that “the principle of indifference (that allows equal probability among all possible outcomes when no other information is given) may not produce definite, well-defined results for probabilities if applied uncritically, when the domain of possibilities is infinite” [17].
Using Section 4.1, perhaps if we take (from Equation (4.2.12)):
then for , if we want and we get the following:
Then might serve as a solution to Bertrand’s Paradox (unless there’s a better and which completely solves the main question in Section 3).
Now consider the following:
- (a) How do we apply (or a better solution) to the usual example which demonstrates Bertrand’s Paradox as follows: for an equilateral triangle (inscribed in a circle), suppose a chord of the circle is chosen at random. What is the probability that the chord is longer than a side of the triangle? [4] According to Bertrand’s Paradox, there are three arguments which correctly use the principle of indifference yet give different solutions to this problem [4]:
(1) The "random endpoints" method: Choose two random points on the circumference of the circle and draw the chord joining them. To calculate the probability in question, imagine the triangle rotated so its vertex coincides with one of the chord endpoints. Observe that if the other chord endpoint lies on the arc between the endpoints of the triangle side opposite the first point, the chord is longer than a side of the triangle. The length of the arc is one-third of the circumference of the circle, therefore the probability that a random chord is longer than a side of the inscribed triangle is 1/3.
(2) The "random radial point" method: Choose a radius of the circle, choose a point on the radius, and construct the chord through this point and perpendicular to the radius. To calculate the probability in question, imagine the triangle rotated so a side is perpendicular to the radius. The chord is longer than a side of the triangle if the chosen point is nearer the center of the circle than the point where the side of the triangle intersects the radius. The side of the triangle bisects the radius, therefore the probability a random chord is longer than a side of the inscribed triangle is 1/2.
(3) The "random midpoint" method: Choose a point anywhere within the circle and construct a chord with the chosen point as its midpoint. The chord is longer than a side of the inscribed triangle if the chosen point falls within a concentric circle of radius 1/2 the radius of the larger circle. The area of the smaller circle is one-fourth the area of the larger circle, therefore the probability a random chord is longer than a side of the inscribed triangle is 1/4.
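The three answers above can be reproduced with a short Monte Carlo simulation. This is a minimal sketch on the unit circle (so the inscribed triangle has side length √3); the function name `bertrand_estimates`, the trial count, and the seed are choices made here for illustration.

```python
import math
import random


def bertrand_estimates(trials: int = 100_000, seed: int = 0):
    """Estimate P(chord > triangle side) on the unit circle for the three methods."""
    rng = random.Random(seed)
    side = math.sqrt(3.0)  # side of the equilateral triangle inscribed in the unit circle
    hits = [0, 0, 0]
    for _ in range(trials):
        # Random endpoints: two independent uniform angles on the circumference.
        a = rng.uniform(0.0, 2.0 * math.pi)
        b = rng.uniform(0.0, 2.0 * math.pi)
        hits[0] += 2.0 * abs(math.sin((a - b) / 2.0)) > side
        # Random radial point: the chord's distance from the centre is uniform on [0, 1].
        d = rng.random()
        hits[1] += 2.0 * math.sqrt(1.0 - d * d) > side
        # Random midpoint: uniform in the disc, so the squared distance is uniform on [0, 1].
        d2 = rng.random()
        hits[2] += 2.0 * math.sqrt(1.0 - d2) > side
    return [h / trials for h in hits]


print(bertrand_estimates())  # estimates for the three methods, in the order listed above
```

Each method is a different uniform measure on the same set of chords, so the simulation illustrates the point of the paradox: the estimate depends on which "natural" uniform measure is sampled.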
5. Glossary
5.1. Example of Case (2) of Worst Case Functions
We wish to create a function that appears to be “pseudo-randomly” distributed but has infinitely many points that are non-uniform (i.e. does not have complete spatial randomness [13]) in the sub-space of , where the expected value or integral of the function w.r.t. the uniform probability measure (pp. 32-37 [18]) is non-obvious (i.e. not the center of the space the function covers nor the area of that space).
Suppose for real numbers and , we generate an uncountable number of "nearly pseudo-random" points that are non-uniform in the subspace .
We therefore define the function as .
Now suppose where the base-b expansion of real numbers, in interval , have infinite decimals that approach x from the right side so when we get .
Furthermore, for , if and is a function where takes the digit in the -th decimal fraction of the base-b expansion of x (e.g. ), then is a sequence of functions such that is defined to be:
then for some large and , the intermediate function (before f) or is defined to be
where the points in are "almost pseudo-randomly" and non-uniformly distributed on . What we did was convert every digit of the base-b expansion of x to a pseudo-random number that is non-equally likely to be an integer, including and in-between, 0 and . Furthermore, we also make the function appear truly “pseudo-random” by adding the -th decimal fraction with the next k decimal fractions; however, we want to control the end-points of such that if , we convert to by manipulating Equation (5.1.2) to get:
such that the larger k is, the more pseudo-random the distribution of points in f in the space , but unlike most distributions of such points, f is uncountable.
Let us give a specific example. Suppose for the function in Equation (5.1.3) of Section 5.1, we have:
(one can try simpler parameters); what is the expected value using either Equation (4.4.2) or (4.4.3) (or a more complete solution to Section 3) if the answer is finite and unique?
What about for f in general (i.e. in terms of b, , , , and k)?
(Note if and , then the function is an explicit example of the function that Wood describes in Quanta Magazine. Wood wrote on Feynman path integrals: “No known mathematical procedure can meaningfully average an infinite number of objects covering an infinite expanse of space in general.”)
Approximating the Expected Value
Definition 11 (Approximating the Expected Value).
In practice, the computation of this expected value may be complicated if the set A is complicated. If analytic integration does not give a closed-form solution, then a general and relatively simple way to compute the expected value (up to high accuracy) is with importance sampling. To do this, we produce M values from some density function g with support covering A (hopefully with support fairly close to A) and we use the estimator:
From the law of large numbers, we can establish that the estimator converges to the expected value as M grows, so if we take M to be large then we should get a reasonably good computation of the expected value of interest.
Note importance sampling requires three things:
(1) We need to know whether a point x is in set A or not.
(2) We need to be able to generate points from a density g that is on a support that covers A but is not too much bigger than A.
(3) We have to be able to compute f and g for each point.
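The three requirements above map directly onto the arguments of a small importance-sampling routine. The sketch below assumes a self-normalized form of the estimator (a weighted average with weights 1/g(x) on points that fall in A), since the exact estimator is not reproduced here; the function `importance_sampling_mean`, its parameter names, and the example set A = [0, 1] with f(x) = x² and a Normal(0.5, 1) proposal are all hypothetical.

```python
import math
import random


def importance_sampling_mean(f, in_A, sample_g, density_g, M=200_000, seed=0):
    """Self-normalized importance-sampling estimate of the uniform mean of f over A."""
    rng = random.Random(seed)
    num = 0.0
    den = 0.0
    for _ in range(M):
        x = sample_g(rng)             # requirement (2): draw a point from g
        if in_A(x):                   # requirement (1): membership test for A
            w = 1.0 / density_g(x)    # requirement (3): evaluate g (and f below)
            num += w * f(x)
            den += w
    if den == 0.0:
        raise ValueError("no sampled point landed in A")
    return num / den


def normal_pdf(x, mu=0.5, sigma=1.0):
    """Density of the assumed Normal(mu, sigma) proposal."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))


# Hypothetical example: A = [0, 1], f(x) = x^2, whose exact uniform mean is 1/3.
estimate = importance_sampling_mean(
    f=lambda x: x * x,
    in_A=lambda x: 0.0 <= x <= 1.0,
    sample_g=lambda rng: rng.gauss(0.5, 1.0),
    density_g=normal_pdf,
)
print(estimate)
```

The self-normalized form is used here because it does not require knowing the measure of A in advance; a proposal g whose support is much larger than A wastes most samples, which is why requirement (2) asks for a support not too much bigger than A.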
References
- [1] Krishnan B. Visualization of uncountable number of pseudo-random points generated on subset of the real plane, 2023. Available online: https://www.wolframcloud.com/obj/4e78f594-1578-402a-a163-ebb16319ada2.
- [2] Patrick Billingsley. Probability and Measure. John Wiley & Sons, New York, 3rd edition, 1995. Available online: https://www.colorado.edu/amath/sites/default/files/attached-files/billingsley.pdf.
- [3] Wood C. Mathematicians prove 2D version of quantum gravity really works. Quanta Magazine. Available online: https://www.quantamagazine.org/mathematicians-prove-2d-version-of-quantum-gravity-really-works-20210617.
- [4] Alon Drory. Failure and uses of Jaynes’ principle of transformation groups. Foundations of Physics, 45(4):439–460, Feb 2015. Available online: https://arxiv.org/pdf/1503.09072.pdf.
- [5] Bottazzi E. and Eskew M. Integration with filters. Available online: https://arxiv.org/pdf/2004.09103.pdf.
- [6] Adam Grabowski and Artur Kornilowicz. Introduction to Liouville numbers. Formalized Mathematics, 25, 01 2017.
- [7] Michael Greinecker (https://mathoverflow.net/users/35357/michael greinecker). Demystifying the Carathéodory approach to measurability. MathOverflow. Available online: https://mathoverflow.net/q/34007 (accessed on 31 July 2010).
- [8] Mark McClure (https://mathoverflow.net/users/46214/mark mcclure). Integral over the Cantor set Hausdorff dimension. MathOverflow. Available online: https://mathoverflow.net/q/235609 (version: 2016-04-07; accessed on 07 April 2016).
- [9] Dave L. Renfro (https://math.stackexchange.com/users/13130/dave-l renfro). Proof that neither “almost none” nor “almost all” functions which are Lebesgue measurable are non-integrable. Mathematics Stack Exchange. Available online: https://math.stackexchange.com/q/4623168 (accessed on 21 January 2023).
- [10] Ben (https://stats.stackexchange.com/users/173082/ben). In statistics how does one find the mean of a function w.r.t the uniform probability measure? Cross Validated. Available online: https://stats.stackexchange.com/q/602939 (version: 2023-01-24; accessed on 24 January 2023).
- [11] Brian R. Hunt. Prevalence: a translation-invariant “almost every” on infinite-dimensional spaces. 1992. Available online: https://arxiv.org/abs/math/9210220.
- [12] Gray M. Entropy and Information Theory. Springer New York, New York, 2nd edition, 2011. Available online: https://ee.stanford.edu/~gray/it.pdf.
- [13] Rokach L. and Maimon O. Springer New York, New York, 2nd edition, 2010.
- [14] L. Olsen. The exact Hausdorff dimension functions of some Cantor sets. Nonlinearity, 16(3):963, Mar 2003.
- [15] William Ott and James A. Yorke. Prevalence. Bulletin of the American Mathematical Society, 42(3):263–290, 2005. Available online: https://www.ams.org/journals/bull/2005-42-03/S0273-0979-05-01060-8/S0273-0979-05-01060-8.pdf.
- [16] Kenneth H. Rosen. Elementary Number Theory and Its Applications (6th ed.). Addison-Wesley, 1993.
- [17] Nicholas Shackel. Bertrand’s paradox and the principle of indifference. Philosophy of Science, 74(2):150–175, 2007. Available online: https://orca.cardiff.ac.uk/id/eprint/3803/1/Shackel%20Bertrand’s%20paradox%205.pdf.
- [18] Leinster T. and Roff E. The maximum entropy of a metric space. Available online: https://arxiv.org/pdf/1908.11184.pdf.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).