3.1.2. Fuzzy Random Variable, Fuzzy Stochastic Process, and Integral Overview
This section provides some necessary background for fuzzy processes. The material mainly comes from the fuzzy set theory of Zadeh [23]. Here we will define a number of measures that are used in this paper. The Lebesgue measure is widely used to measure lengths, areas, volumes, and higher-dimensional extents of sets. It is defined on the Borel σ-algebra of subsets of Euclidean space. For example, the Lebesgue measure of an interval [a, b] in one dimension is simply the length of the interval, given by b – a. Counting measure - in combinatorics and discrete mathematics, the counting measure is a fundamental measure that assigns to a set the number of its elements. It is defined on any set and is particularly useful for finite sets. For instance, the counting measure of a finite set A is equal to the cardinality (number of elements) of A. Probability measure - in probability theory, probability measures assign probabilities to events in a sample space. A probability measure is a measure defined on a σ-algebra of subsets of the sample space that satisfies non-negativity, assigns probability 0 to the empty set, and is countably additive. For example, the uniform distribution on the interval [0, 1] is a probability measure that assigns equal probability to sub-intervals of [0, 1] of equal length. Hausdorff measure - in geometric measure theory, the Hausdorff measure is used to quantify the "size" or "extent" of fractal sets. It is defined on metric spaces and captures the concept of dimension. For instance, the Hausdorff measure of a fractal set like the Cantor set or the Sierpinski triangle measures the "length" or "area" of the set in a fractional-dimensional sense.
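To make the first three measures concrete, the following minimal Python sketch evaluates each on a simple input (the Hausdorff measure is omitted, since it requires a limiting covering construction); all helper names are ours, introduced purely for illustration.

```python
# Minimal sketch of the Lebesgue, counting, and probability measures on
# simple concrete inputs (illustrative helper names, not from [23]).

def lebesgue_measure_interval(a: float, b: float) -> float:
    """Lebesgue measure of [a, b] is its length b - a."""
    return b - a

def counting_measure(s: set) -> int:
    """Counting measure of a finite set is its cardinality."""
    return len(s)

def uniform_probability(c: float, d: float) -> float:
    """Probability of the sub-interval [c, d] of [0, 1] under the uniform law."""
    assert 0.0 <= c <= d <= 1.0
    return d - c

print(lebesgue_measure_interval(2.0, 5.0))   # 3.0
print(counting_measure({"a", "b", "c"}))     # 3
print(uniform_probability(0.25, 0.75))       # 0.5
```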
Let K(R) be the family of all nonempty, compact, and convex subsets of R. The Hausdorff metric dH is defined by the following:
dH(A, B) = max{sup_{a∈A} inf_{b∈B} |a – b|, sup_{b∈B} inf_{a∈A} |a – b|}.
A metric space is said to be separable if there exists a countable dense subset. If A, B, C ∈ K(R) and λ ∈ R, the Hausdorff metric satisfies dH(A + C, B + C) = dH(A, B), dH(λA, λB) = |λ| dH(A, B), and dH(A, B) ≤ dH(A, C) + dH(C, B).
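Since every element of K(R) can be identified with a compact interval [a, b], the Hausdorff metric reduces to max(|a1 – a2|, |b1 – b2|) in that case; the following sketch (our own illustrative code) checks the translation-invariance and triangle-inequality properties numerically.

```python
# Hausdorff metric d_H restricted to K(R), where each element is a compact
# interval (a, b); then d_H reduces to max(|a1 - a2|, |b1 - b2|).

def hausdorff_interval(A, B):
    a1, b1 = A
    a2, b2 = B
    return max(abs(a1 - a2), abs(b1 - b2))

A, B, C = (0.0, 1.0), (0.5, 2.0), (1.0, 1.5)

# Translation invariance: d_H(A + C, B + C) = d_H(A, B)
shift = lambda X, Y: (X[0] + Y[0], X[1] + Y[1])
print(hausdorff_interval(shift(A, C), shift(B, C)) == hausdorff_interval(A, B))  # True

# Triangle inequality: d_H(A, B) <= d_H(A, C) + d_H(C, B)
print(hausdorff_interval(A, B) <= hausdorff_interval(A, C) + hausdorff_interval(C, B))  # True
```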
Let (Ω, A, P) be a probability space. The mapping F: Ω → K(R) is called A-measurable if it satisfies {ω ∈ Ω : F(ω) ∩ C ≠ ∅} ∈ A for every closed set C ⊂ R. This means that, for every closed set C ⊂ R, the set of elements ω of Ω for which F(ω) intersects C is an event in A. A closed set is one that contains all of its limit points. A multifunction F ∈ M is said to be Lp-integrably bounded, for p ≥ 1, iff (if and only if) there exists h ∈ Lp(Ω, A, P; R+) such that the norm ||·|| (which induces a metric on K(R), known as the norm metric) satisfies ||F|| ≤ h P-a.e. (almost everywhere - the property holds for all elements except a negligible subset according to the probability measure P), where R+ = [0, ∞) and ||F|| = dH(F, {0}) = sup{|f| : f ∈ F}. In other words, the norm of F is the largest absolute value attained by its elements. For example, imagine we have a function F that takes some inputs and gives us sets (groups of things) as outputs. Now, this function F is special in two ways:
- i)
it doesn't make sets that are too big or too wild.
- ii)
we can measure how big these sets are.
The first point means that if we look at the sets produced by this function, they're not too wild. They don't stretch out forever or become enormous. The second point is about measuring: we can measure the size of these sets by giving each a number that says how big it is. Now, this function F can't be too unpredictable; it has to follow certain rules. If we raise the sizes of its sets to the p-th power and integrate, the result should be a finite number (integrable). This is where the "Lp" comes in. "p" is just a number, and when it's 1 or bigger, it means things are behaving well.
To make sure these sets aren't too big, there's another function (let's call it "h") that keeps an eye on them. It acts like a supervisor: this supervisor function h is itself p-integrable, and it makes sure our function's sets don't get too big. So, in simple terms, an Lp-integrably bounded multifunction is like a machine that produces sets. These sets are controlled and supervised to make sure they're not too big, and we can measure their sizes using numbers.
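As a rough numerical illustration of Lp-integrable boundedness, the sketch below assumes a hypothetical multifunction F(ω) = [−|Z(ω)|, |Z(ω)|] with Z standard normal, so that ||F(ω)|| = |Z(ω)| serves as its own supervisor h, and estimates E[||F||^p] by Monte Carlo:

```python
# Monte Carlo sketch of L^p-integrable boundedness for the hypothetical
# multifunction F(omega) = [-|Z(omega)|, |Z(omega)|] with Z ~ N(0, 1), so
# ||F(omega)|| = d_H(F(omega), {0}) = |Z(omega)|; here p = 2.
import random

random.seed(0)
p = 2
n = 100_000
samples = [abs(random.gauss(0.0, 1.0)) for _ in range(n)]  # ||F(omega)|| = |Z|

estimate = sum(s ** p for s in samples) / n  # approximates E[||F||^p]
print(f"E[||F||^{p}] ~ {estimate:.3f}  (theoretical value: 1.0)")
```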
A fuzzy set u ⊂ R is described by its membership function u: R → [0, 1], where u(x) is the degree of membership of x in the fuzzy set u. Let us denote by F(R) the family of fuzzy sets u: R → [0, 1] such that [u]α ∈ K(R) for every α ∈ [0, 1], where [u]α = {x ∈ R : u(x) ≥ α} is the α-level set (α-cut) of u. Define d∞: F(R) × F(R) → [0, ∞), with domain F(R) × F(R) and codomain the non-negative real numbers, by d∞(u, v) = sup dH([u]α, [v]α), where the supremum is taken over α ∈ [0, 1]. We will now show that d∞ is a metric on F(R) and that (F(R), d∞) is a complete metric space. A metric space consists of a set equipped with a metric, a function that measures the distance between elements of the set; in this case, F(R) is the set and d∞ is the metric we are defining. The metric d∞(u, v) is the supremum, or least upper bound, of the Hausdorff distances between the α-level sets [u]α and [v]α as α ranges over the interval [0, 1]. The Hausdorff distance measures the "closeness" of two sets by considering the furthest distance from each set to the other: it is the maximum distance of a set to the nearest point of the other set. Here the level sets [u]α and [v]α are compared and their Hausdorff distance is computed, and the supremum is then taken over all values of α in [0, 1]. With d∞ so defined, (F(R), d∞) is a complete metric space. Completeness means that every Cauchy sequence in F(R) converges to a limit within F(R); in other words, any sequence of fuzzy sets whose elements get arbitrarily close to each other eventually has a limit in the same space. So, we have defined the metric d∞ on the family F(R) and established F(R) equipped with d∞ as a complete metric space; the metric measures the Hausdorff distance between the α-level sets of two fuzzy sets over the parameter α in the interval [0, 1]. For every u, v, w, z ∈ F(R), λ ∈ R, we have the following properties:
additive property d∞(u+w, v+w) = d∞(u,v)
subadditivity d∞(u+v, w+z) ≤ d∞(u,w) + d∞(v,z)
triangle inequality d∞(u,v) ≤ d∞(u,w) + d∞(w,v)
scalar multiplication property d∞(λu, λv) = |λ| d∞(u,v)
non-negativity d∞(u, v) ≥ 0 for all u, v in F(R)
identity of indiscernibles d∞(u, v) = 0 if and only if u = v
symmetry d∞(u, v) = d∞(v, u) for all u, v in F(R)
consistency d∞(u, v) = d∞(v, u) = 0 implies u = v.
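The following sketch approximates d∞ for triangular fuzzy numbers, whose α-cuts have the closed form used below; discretising α over a grid approximates the supremum, and the function names are ours:

```python
# Approximating d_infinity between two triangular fuzzy numbers via their
# alpha-cuts, using the interval form of the Hausdorff metric from above.

def alpha_cut(tri, alpha):
    """Alpha-cut of a triangular fuzzy number (l, m, r)."""
    l, m, r = tri
    return (l + alpha * (m - l), r - alpha * (r - m))

def d_infinity(u, v, grid=1001):
    best = 0.0
    for i in range(grid):
        a = i / (grid - 1)
        (a1, b1), (a2, b2) = alpha_cut(u, a), alpha_cut(v, a)
        best = max(best, max(abs(a1 - a2), abs(b1 - b2)))  # d_H of two intervals
    return best

u, v = (0.0, 1.0, 2.0), (0.5, 1.0, 3.0)
print(d_infinity(u, v))                      # 1.0: supremum over alpha
print(d_infinity(u, u))                      # 0.0: identity of indiscernibles
print(d_infinity(u, v) == d_infinity(v, u))  # True: symmetry
```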
Expanding the work of [16], particularly in items 6 and 7 below, we will now define a fuzzy random variable as a function that assigns fuzzy sets to elements of a probability space. We will introduce the concept of measurability with respect to different metrics, such as d∞ and ds, and describe the properties of and relationships between these metrics and the notion of a fuzzy random variable. We will break it down step by step:
- 1.
Let (Ω, A, P) be a probability space, where Ω represents the sample space, A is a σ-algebra of subsets of Ω, and P is a probability measure defined on A. Given a set Ω, a σ-algebra of subsets of Ω, denoted as Σ, is a collection of subsets of Ω that satisfies certain properties. It is a fundamental concept in measure theory and serves as a foundation for defining measurable sets and functions. To formally define a σ-algebra, we require it to satisfy the following properties:
The empty set (∅) is in Σ: ∅ ∈ Σ.
Σ is closed under complementation: If A is in Σ, then its complement, denoted as Aᶜ, is also in Σ.
Σ is closed under countable unions: If A₁, A₂, A₃... are subsets of Ω and each Aᵢ is in Σ, then the union of all the Aᵢ, denoted as ⋃ Aᵢ, is also in Σ.
In other words, a σ-algebra is a collection of subsets of Ω that includes the empty set, is closed under taking of complements, and is closed under countable unions.
The purpose of defining a σ-algebra is to identify a set of subsets of Ω that can be considered "measurable" in a certain sense. The elements of Σ are often referred to as measurable sets, and they form a structure that allows us to define measures (such as probability measures) on these sets.
By specifying a σ-algebra on Ω, we establish a framework for defining and working with measurable functions, integrating functions, and constructing measures. The σ-algebra provides a systematic way to determine which subsets of Ω are considered "measurable" within the given context, allowing us to analyse and reason about these subsets in a rigorous mathematical manner. In summary, a σ-algebra of subsets of Ω is a collection of subsets that contains the empty set, is closed under complementation, and is closed under countable unions; it provides the foundation for defining measurable sets and functions and for constructing measures on them.
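As a finite illustration (where countable unions reduce to finite ones), this sketch checks the σ-algebra axioms for a small family of subsets of Ω = {a, b}; it is a toy verification, not part of the formal development:

```python
# Verify the sigma-algebra axioms for small families of subsets of a finite
# Omega (for finite Omega, countable unions reduce to finite unions).
from itertools import combinations

omega = frozenset({"a", "b"})
sigma = {frozenset(), frozenset({"a"}), frozenset({"b"}), omega}

def is_sigma_algebra(family, omega):
    if frozenset() not in family:
        return False
    for A in family:
        if omega - A not in family:          # closed under complement
            return False
    for A, B in combinations(family, 2):     # closed under (finite) unions
        if A | B not in family:
            return False
    return True

print(is_sigma_algebra(sigma, omega))                            # True
print(is_sigma_algebra({frozenset(), frozenset({"a"})}, omega))  # False: misses {b}
```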
- 2.
A fuzzy random variable is defined as a function X: Ω → F(R), where F(R) represents the set of fuzzy sets over the real numbers. This means that X assigns each element ω in Ω a fuzzy set X(ω) in F(R).
- 3.
For every α ∈ [0, 1], the mapping [X]α: Ω → K(R) (where K(R), as above, denotes the family of nonempty compact convex subsets of R) should be A-measurable. This implies that the pre-images of measurable sets in K(R) under [X]α should be measurable sets in A.
- 4.
Consider a metric ρ on the set F(R) and let Bρ be the σ-algebra generated by the topology induced by ρ. A fuzzy random variable X can be seen as a measurable mapping between the measurable spaces (Ω, A) and (F(R), Bρ), and we refer to X as being A|Bρ-measurable.
- 5.
The metric ds(u, v) is defined as follows:
ds(u, v) := inf_{λ∈Φ} max{sup_{t∈[0,1]} |λ(t) – t|, sup_{t∈[0,1]} dH(Xu(t), Xv(λ(t)))}, where Φ denotes the set of strictly increasing continuous functions λ: [0,1] → [0,1] satisfying λ(0) = 0 and λ(1) = 1. Xu and Xv are càdlàg (right-continuous with left limits) representations of the fuzzy sets u and v in F(R), respectively, and dH represents the Hausdorff distance.
- 6.
The space (F(R), d∞) is complete (meaning all Cauchy sequences converge) but non-separable (it does not contain a countable dense subset), while the space (F(R), ds) is a Polish metric space (a complete and separable metric space). The reason for the difference lies in the respective definitions of the two metrics. The metric d∞, defined as the supremum of the Hausdorff distances between the α-level sets [u]α and [v]α, is a robust way to measure the "closeness" of fuzzy sets. It can be shown that the space (F(R), d∞) is complete, meaning all Cauchy sequences converge. However, d∞ does not lend itself to finding a countable dense subset within (F(R), d∞); this makes (F(R), d∞) non-separable, even though it is complete. On the other hand, the metric ds is defined differently and results in a metric space that is both complete and separable: it is designed in such a way that a countable dense subset of (F(R), ds) exists. Because (F(R), ds) is complete (all Cauchy sequences converge) and separable (contains a countable dense subset), it meets the criteria to be classified as a Polish metric space.
- 7.
For a mapping X: Ω → F(R) on the probability space (Ω, A, P), X is a fuzzy random variable iff X is A|Bds-measurable; if X is A|Bd∞-measurable, then it is a fuzzy random variable, but the converse is not necessarily true. For X to be considered a fuzzy random variable, it must meet the condition of being A|Bds-measurable, meaning that X can be measured using the ds metric. The ds metric is a specific way to compare how close or similar two fuzzy sets are. If X is also A|Bd∞-measurable, it indicates that X can be analysed using the d∞ metric. These two metrics, ds and d∞, both assess the closeness of fuzzy sets, but they do so differently: ds looks at the behaviour of the fuzzy sets through the continuous reparametrisations λ and their values at different points, whereas d∞ considers the worst-case scenario by looking at the maximum difference between the fuzzy sets at any point. It is important to note that A|Bd∞-measurability is the more demanding condition: every A|Bd∞-measurable mapping is A|Bds-measurable, but just because X is A|Bds-measurable doesn't automatically mean it is A|Bd∞-measurable. Therefore, A|Bds-measurability is the defining criterion for a fuzzy random variable. In summary, the given explanations define a fuzzy random variable as a function that assigns fuzzy sets to elements of a probability space, introduce the concept of measurability with respect to the metrics ds and d∞, and describe the properties of and relationships between these metrics and the notion of a fuzzy random variable.
Next, we expand the work of [16] on fuzzy random variables and fuzzy stochastic processes. A fuzzy random variable X: Ω → F(R) is said to be Lp-integrably bounded for p ≥ 1 if the level set [X]α belongs to the space Lp(Ω, A, P; K(R)) for every α ∈ [0,1]. Here, the random variables X, Y ∈ Lp(Ω, A, P; F(R)) are considered identical if P(d∞(X, Y) = 0) = 1. This means that for X and Y to be considered identical, the probability that their distance d∞ equals zero should be 1. d∞ measures the "distance" or dissimilarity between the two fuzzy random variables X and Y; when the distance d∞ between X and Y is zero, X and Y are indistinguishable from each other in terms of their fuzzy characteristics. For a fuzzy random variable X: Ω → F(R) and p ≥ 1, the following conditions are equivalent: a) X ∈ Lp(Ω, A, P; F(R)); b) the level set [X]0 belongs to Lp(Ω, A, P; K(R)); c) the norm of [X]0, denoted as ||[X]0||, belongs to Lp(Ω, A, P; R+), where R+ represents the non-negative real numbers. These three conditions (a, b, and c) are equivalent: if any one of them is true, the other two are also true, and vice versa. This demonstrates the relationships between the integrability of a fuzzy random variable, the integrability of its 0-level set, and the integrability of the norm of that level set in the respective spaces. The notation I := [0, T] signifies the closed interval from 0 to T inclusive, and (Ω, A, P) represents a complete probability space equipped with a filtration {At}t∈I. A filtration is a family of sub-σ-algebras {At}t∈I defined on a probability space; in this case, the filtration is defined for each t in the interval I = [0, T]. A complete probability space is one in which every subset of a null set (a set with measure zero) is also measurable and has measure zero. In other words, not only are the sample space, the σ-algebra, and the probability measure defined, but in addition, if a set of outcomes is so unlikely that its probability measure is zero, then any subset of that unlikely set is also considered measurable. Being measurable means it is part of the σ-algebra, so a probability can be assigned to it. The requirement that subsets of null sets be measurable ensures that there are no hidden or unmeasurable sets that could potentially lead to issues in probability theory; it essentially fills in any gaps or irregularities in the probability space, making it more robust.
The filtration is an increasing and right-continuous family of sub-σ-algebras of A, which contains all P-null sets. The filtration {At}t∈I is said to be increasing, which means that as t increases, the sub-σ-algebra At becomes larger or contains more sets: if s ≤ t, then As is a subset of At. This property ensures that the information or knowledge available at time s is also available at any later time t ≥ s. The filtration {At}t∈I is also right-continuous, which means that for any t in the interval I, the sub-σ-algebra At is equal to the intersection of all sub-σ-algebras As with s > t. In other words, no new information arrives immediately after time t: the information available just after t, at any time t + ε for small ε > 0, coincides in the limit with the information already available at t. The filtration {At}t∈I is required to contain all P-null sets. A P-null set is a subset of the sample space Ω with measure zero; by including all P-null sets in each sub-σ-algebra At, the filtration captures all negligible information about the underlying probability space.
A fuzzy stochastic process X is said to be d∞-continuous if almost all its trajectories, denoted as X(·, ω), are d∞-continuous functions. Here, the trajectories refer to the mappings X(·, ω) that associate each time point t in the interval I with a fuzzy set X(t, ω) in F(R). The d∞-continuity of these trajectories means that they exhibit continuity with respect to the d∞ metric, which measures the dissimilarity between fuzzy sets. A fuzzy stochastic process X is considered measurable if the level-set mapping [X]α: I × Ω → K(R) is a measurable multifunction for all α in the interval [0, 1]. Here, [X]α represents the α-level set of X, and K(R) denotes the family of nonempty compact convex subsets of the real numbers. A measurable multifunction [X]α: I × Ω → K(R) maps the time domain I and the sample space Ω to compact convex sets of values in R in a measurable way. For example, let X(t) be a fuzzy stochastic process that represents the price of a stock at time t. The parameter α can represent the level of confidence that we have in the price prediction; for example, α = 0.95 could represent a 95% confidence level. The α-level set of the fuzzy price at time t is [X(t)]α = {x ∈ R : X(t)(x) ≥ α}, the set of prices whose membership degree in the fuzzy set X(t) is at least α. The parameter α quantifies the minimum membership value required for a price to be counted as part of the level set [X(t)]α, so [X(t)]α contains all prices that are compatible with the prediction X(t) at the confidence level α. We can then show that the multifunction [X(t)]α: I × Ω → K(R) is measurable, which means that the fuzzy stochastic process X(t) is measurable. By stating that the multifunction [X(t)]α: I × Ω → K(R) is measurable, we are asserting that the mapping from the Cartesian product of the index set I and the sample space Ω to the family of sets K(R) is a measurable multifunction. This means that the pre-images of measurable sets in K(R) under [X(t)]α are measurable sets in I × Ω. The pre-images, in this context, refer to the subsets of the domain I × Ω that get mapped to a specific set in the target space K(R) under the mapping [X(t)]α: I × Ω → K(R). To put it simply, if we have a set A in K(R), the pre-image of A under [X(t)]α is the set of all points (i, ω) in the domain I × Ω such that [X(t)]α(i, ω) belongs to A.
In other words, the pre-image of a set A in K(R) consists of all the elements (i, ω) in I × Ω that get mapped to A under the mapping [X(t)]α. By saying that the pre-images of measurable sets in K(R) under [X(t)]α are measurable sets in I × Ω, we mean that if we consider a measurable set A in K(R), the pre-image of A under [X(t)]α is a measurable set in I × Ω. This implies that the behaviour of the fuzzy stochastic process X(t) with respect to such pre-images is well-behaved and can be characterised by measurable sets in the domain I × Ω. For example, suppose we have a fuzzy stochastic process X(t) that represents the temperature at a particular location over time, and the parameter α represents the confidence level in our temperature predictions. The domain of the process is the Cartesian product of the time index set I and the sample space Ω. Let's assume I = {1, 2, 3} represents three time points and Ω = {A, B} represents two possible outcomes (e.g., sunny or cloudy days). The target space K(R) here consists of compact convex sets of real numbers, i.e., temperature intervals. For a specific time point t, X(t) generates a fuzzy set that represents the temperature, with the level set taken at the confidence level α. As the measurable set in K(R), let's consider a set A that represents a specific range of temperatures. Now, let's define the pre-image of A under [X(t)]α: it is the set of all (i, ω) in I × Ω such that [X(t)]α(i, ω) belongs to A. In other words, it consists of all the time points and sample outcomes for which the fuzzy set generated by X(t) at that particular time and outcome falls within the range represented by A. For example, let's assume A represents temperatures between 20 and 30 degrees Celsius. If [X(t)]α(2, B) represents a fuzzy set with membership values indicating temperatures around 25 degrees Celsius, then (2, B) would be in the pre-image of A, because the temperature falls within the range represented by A. In this case, the pre-image of the measurable set A under [X(t)]α would be a subset of I × Ω containing all the time points and sample outcomes where the fuzzy sets generated by X(t) fall within the temperature range represented by A.
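The temperature example can be made executable. The sketch below hard-codes hypothetical α-cuts as temperature intervals for each (i, ω) in I × Ω, including an interval around 25 °C at (2, B), and computes the pre-image of the range [20, 30]:

```python
# Pre-image computation for the finite temperature example: I = {1, 2, 3},
# Omega = {"A", "B"}, and hypothetical alpha-cuts given as temperature
# intervals (low, high). The pre-image of "temperatures within [20, 30]"
# collects every (i, omega) whose interval lies inside that range.

I = [1, 2, 3]
Omega = ["A", "B"]

cuts = {
    (1, "A"): (18.0, 22.0), (1, "B"): (24.0, 26.0),
    (2, "A"): (31.0, 35.0), (2, "B"): (23.0, 27.0),
    (3, "A"): (20.0, 29.0), (3, "B"): (15.0, 19.0),
}

def preimage(target_low, target_high):
    """All (i, omega) whose interval falls within [target_low, target_high]."""
    return [(i, w) for i in I for w in Omega
            if target_low <= cuts[(i, w)][0] and cuts[(i, w)][1] <= target_high]

print(preimage(20.0, 30.0))  # [(1, 'B'), (2, 'B'), (3, 'A')]
```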
We propose that the condition that [X]⍺ be measurable for all α in the interval [0, 1] ensures that the fuzzy stochastic process X can be integrated with respect to time and the sample space.
B(I) is the Borel σ-algebra on the interval I, generated by the open subsets of I. To determine measurability, [X]α: I × Ω → K(R) should be B(I) ⊗ A-measurable. Here, B(I) ⊗ A represents the product σ-algebra, which is generated by the Borel σ-algebra B(I) and the σ-algebra A. This requirement ensures that the level set [X]α is measurable with respect to the joint σ-algebra generated by B(I) and A. In this context, K(R), as before, denotes the family of nonempty compact convex subsets of the real numbers, the level sets of fuzzy sets; fuzzy sets assign membership degrees or degrees of truth to elements, allowing for a more flexible representation of uncertainty or partial truth. B(I) refers to the Borel σ-algebra, which is the σ-algebra generated by the open sets in the interval I. The Borel σ-algebra consists of all subsets of I that can be formed by taking countable unions, countable intersections, and complements of open sets. B(I) ⊗ A represents the product σ-algebra, which is the smallest σ-algebra containing the sets of the form B × A, where B is a Borel set from B(I) and A is a set from the σ-algebra A. The requirement that [X]α be B(I) ⊗ A-measurable ensures that the level set [X]α is measurable with respect to the joint σ-algebra generated by B(I) and A. This joint σ-algebra allows us to analyse and reason about the measurability properties of the fuzzy stochastic process X across both the time domain (Borel sets in I) and the probability space (sets in A). Suppose we have two sets: B(I), which represents the Borel σ-algebra generated by the open sets in the interval I, and A, which represents a σ-algebra defined on a sample space Ω. We want to generate a σ-algebra that captures the joint measurability of sets from B(I) and A. The product σ-algebra B(I) ⊗ A is defined as the smallest σ-algebra containing the sets of the form B × A, where B is a Borel set from B(I) and A is a set from the σ-algebra A. Let's consider a specific example to illustrate this concept. Assume that I is the interval [0, 1] and A is the power set of Ω = {a, b}, which contains all possible subsets of Ω. B(I): B(I) is the Borel σ-algebra on I, generated by the open sets in I. It includes sets such as open intervals, closed intervals, half-open intervals, and countable unions or intersections of such intervals. A: A is the power set of Ω, which includes all subsets of Ω. In this case, A = {∅, {a}, {b}, {a, b}}. To construct the product σ-algebra B(I) ⊗ A, we consider all possible combinations of Borel sets from B(I) and sets from A. For example, let's take B = (0.2, 0.8) (an open interval in I) and A = {a} (a singleton set in A). The set B × A is the Cartesian product of B and A, which results in the set {(x, a) | 0.2 < x < 0.8}. Similarly, we can consider other combinations of Borel sets from B(I) and sets from A. The product σ-algebra B(I) ⊗ A is then defined as the smallest σ-algebra that includes all such sets B × A for all possible choices of B from B(I) and A from A. In this example, the product σ-algebra B(I) ⊗ A consists of the sets generated by the rectangles B × A, where B represents Borel sets from B(I) and A represents sets from A. It captures the joint measurability of sets from B(I) and A, allowing us to reason about sets that involve both the real number domain and the sample space. Imagine we want to analyse rainfall data for different cities over time. We have a set of cities, say {London, Los Angeles, Paris}, and we want to consider different time intervals for our analysis. Our set of cities is like the interval I in our previous discussion.
The time intervals could represent events or outcomes (such as days with rainfall data) in our probability space, denoted by the σ-algebra A. Now, we want to understand the measurability of rainfall data across both the cities and time intervals. This is where the product σ-algebra B(I) ⊗ A comes into play. For each city, we can create Borel sets, which are sets formed from open intervals. In this context, these could represent different subsets of time periods for each city. For example, we might have the Borel set for London's rainy days, the Borel set for Los Angeles' rainy days, and so on. σ-Algebra A - this represents our probability space, which includes various outcomes or events in our analysis. In this case, each outcome or event is a different time interval (e.g., a specific date or range of dates). Now, the product σ-algebra B(I) ⊗ A is the smallest σ-algebra that contains all possible combinations of sets formed by pairing a Borel set from B(I) with a set from σ-algebra A.
In our example, it means we're looking at measurable sets that are related to both specific cities (Borel sets) and specific time intervals (σ-algebra A). It allows us to answer questions like: "How does rainfall in London relate to specific time intervals, and how does it compare to rainfall in Los Angeles during those times?" So, B(I) ⊗ A helps us explore the measurability of events that occur at the intersection of these two domains: cities and time intervals.
A process X is said to be nonanticipating if, for every α ∈ [0, 1], the multifunction [X]α is measurable with respect to the σ-algebra N. The σ-algebra N is defined as follows: N := {A ∈ B(I) ⊗ A : At ∈ At for every t ∈ I}, where the time-t section At := {ω ∈ Ω : (t, ω) ∈ A} is the set of outcomes ω such that the pair (t, ω) belongs to the set A. In other words, N consists of the subsets of I × Ω whose section at each time t is an event measurable with respect to the filtration σ-algebra At.
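A discrete sketch of these time-t sections, with a finite grid standing in for I and a hypothetical filtration, is given below; the membership test of each section in the corresponding filtration σ-algebra mirrors the definition of N:

```python
# Time-t sections A_t used in the definition of N, on a finite time grid and
# finite Omega, checked against a hypothetical filtration {A_t}.

times = [0, 1, 2]
Omega = frozenset({"w1", "w2"})

# Hypothetical filtration: trivial at t=0, full information from t=1 onward.
filtration = {
    0: {frozenset(), Omega},
    1: {frozenset(), frozenset({"w1"}), frozenset({"w2"}), Omega},
    2: {frozenset(), frozenset({"w1"}), frozenset({"w2"}), Omega},
}

A = {(0, "w1"), (0, "w2"), (1, "w1"), (2, "w2")}  # a subset of I x Omega

def section(A, t):
    """A_t = {omega : (t, omega) in A}."""
    return frozenset(w for (s, w) in A if s == t)

in_N = all(section(A, t) in filtration[t] for t in times)
print([sorted(section(A, t)) for t in times])  # [['w1', 'w2'], ['w1'], ['w2']]
print(in_N)  # True: every section A_t is A_t-measurable
```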
Before proceeding further, we need to state a few theorems.
Definition 1
Fubini's theorem [24], also known as the Fubini–Tonelli theorem, is a fundamental result in measure theory and integration theory. It provides a condition under which the iterated integrals of a measurable function over a product space can be computed by performing separate integrations. Formally, let (Ω₁, A₁, μ₁) and (Ω₂, A₂, μ₂) be two measure spaces, where Ω₁ and Ω₂ are sets, A₁ and A₂ are σ-algebras of subsets of Ω₁ and Ω₂, respectively, and μ₁ and μ₂ are measures defined on A₁ and A₂, respectively. Consider the product measure space (Ω, A, μ) defined as the product of the two measure spaces, where Ω = Ω₁ × Ω₂, A = A₁ ⊗ A₂ (the product σ-algebra), and μ is the product measure. Fubini's theorem states that if f: Ω → [0, +∞] is a non-negative measurable function, then the iterated integrals ∫Ω₁∫Ω₂ f(ω₁, ω₂) dμ₂(ω₂) dμ₁(ω₁) and ∫Ω₂∫Ω₁ f(ω₁, ω₂) dμ₁(ω₁) dμ₂(ω₂) are both well-defined and equal, provided at least one of them is finite. In other words, under appropriate conditions, the order of integration can be interchanged without affecting the result.
Definition 2
The Aumann integral, also known as the integral with respect to a fuzzy measure, is a generalization of traditional integration theory that applies to fuzzy sets. It was introduced by Robert J. Aumann, a mathematician and economist, to extend the concept of integration to accommodate uncertainty and partial truth [25]. In traditional integration theory, the integral of a function measures the "area under the curve" or the accumulation of a quantity over a given range. However, when dealing with fuzzy sets, where membership degrees represent degrees of truth or uncertainty, the concept of area or accumulation needs to be generalised. The Aumann integral addresses this by considering the integration of a fuzzy set with respect to a fuzzy measure. A fuzzy measure assigns a degree of "importance" or "size" to subsets of a set, similar to how a traditional measure assigns a value to subsets. The Aumann integral captures the concept of "accumulation" of fuzzy sets over a range of values. Formally, let X be a fuzzy set defined on a set Ω, and let μ be a fuzzy measure defined on the power set of Ω. The Aumann integral of X with respect to μ, denoted as ∫ X dμ, is a generalized notion of integration that takes into account the degrees of truth or membership of X. The Aumann integral is typically defined in terms of a level-wise integral. For each α in the interval [0, 1], the Aumann integral ∫ X dμ is equal to the integral of the α-cut of X (i.e., the subset of elements with membership degree at least α) with respect to the α-cut of the fuzzy measure μ. The Aumann integral allows for the incorporation of fuzzy sets and fuzzy measures into integration theory, enabling the quantification of accumulation and aggregation of fuzzy information. For example, let's consider a product space where X is the set of real numbers ℝ, and L is the lattice of closed intervals [a, b] on the real line, ordered by inclusion. Suppose we have a function f: ℝ × L → ℝ defined as f(x, [a, b]) = x². Here, x represents a point in the real line, and [a, b] represents a closed interval. To compute the Aumann integral of f, we divide the lattice of closed intervals into levels based on the length of the intervals, such as intervals of length 1, intervals of length 1/2, intervals of length 1/4, and so on. We then integrate the function f(x, [a, b]) = x² over each level, considering the measure on the real line. Finally, we combine these integrals according to lattice operations to obtain the Aumann integral. The Aumann integral of the function f(x, [a, b]) = x² over the product space X × L, where X is the set of real numbers and L is the lattice of the closed intervals on the real line, is given by:
∫_{X×L} f d(μ × λ) = ∫_L (∫_a^b x² dμ(x)) dλ([a, b]), where μ is the Lebesgue measure on the real line and λ is the counting measure on the lattice L. Since the counting measure gives a weight of 1 to each interval, the integral over L is simply the sum of the values of the inner integrals of f(x, [a, b]) over all intervals [a, b] ∈ L. Therefore, the Aumann integral becomes: Σ_{[a,b]∈L} ∫_a^b x² dx. The sum over all intervals [a, b] can be replaced by an integral over the real line, since the intervals are contiguous and non-overlapping. Evaluating the inner integral, we get: ∫_a^b x² dx = (b³ – a³)/3. Evaluating the outer integral, that is, summing (b³ – a³)/3 over all intervals [a, b] ∈ L, we get an infinite value. The reason the Aumann integral is infinite lies in the nature of the counting measure and the unbounded function. For any given x, there are infinitely many intervals [a, b] in L that contain x. Since the function f(x, [a, b]) = x² depends only on x and is unbounded for large values of x, summing its value over infinitely many intervals will inevitably lead to an infinite result. Therefore, the Aumann integral of the function f(x, [a, b]) = x² over the product space X × L is infinite. To obtain a finite Aumann integral, we could restrict the lattice L to a subset of intervals with finite measure or specific properties.
Following the work of [16], we will now use Fubini's theorem and the Aumann integral to build the fuzzy integral and discuss its properties. A fuzzy stochastic process X is called Lp-integrably bounded (p ≥ 1) if there exists a real-valued stochastic process h ∈ Lp(I × Ω, N; R+), where I is the interval and Ω is the probability space, such that ||[X(t, ω)]0|| ≤ h(t, ω) for almost all (t, ω) in I × Ω. The notation ||[X(t, ω)]0|| represents the norm of the 0-level set [X(t, ω)]0, that is, dH([X(t, ω)]0, {0}). Lp(I × Ω, N; F(R)) denotes the set of nonanticipating and Lp-integrably bounded fuzzy stochastic processes. Nonanticipating means that the process X does not rely on future information, but only on the current and past information. By employing the Fubini theorem, the fuzzy integral of a fuzzy stochastic process X, denoted as ∫0T X(s, ω) ds, is defined. Here, T is a fixed time point, and the integral is taken over the interval from 0 to T. The integral is defined for ω in the set Ω \ NX, where NX is a subset of Ω belonging to the σ-algebra A with P(NX) = 0, meaning it has measure zero. The fuzzy integral can be defined level-wise: for each α in the interval [0, 1] and for each ω in the set Ω \ NX, the Aumann integral ∫0T [X(s, ω)]α ds belongs to the family K(R). Therefore, the fuzzy random variable ∫0T X(s, ω) ds belongs to the set F(R) for every ω in the set Ω \ NX.
Now we define the fuzzy stochastic Lebesgue–Aumann integral of X ∈ L1(I × Ω, N; F(R)) by LX(t, ω) := ∫0t X(s, ω) ds, t ∈ I, understood level-wise as above.
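For interval-valued (and hence, level-wise, fuzzy-valued) integrands, the Lebesgue–Aumann integral is simply the interval of the integrals of the endpoint functions. The sketch below assumes a hypothetical triangular-fuzzy-valued trajectory and approximates [LX(t)]α with Riemann sums:

```python
# Level-wise Lebesgue-Aumann integral of an interval-valued trajectory:
# if [X(s)]^alpha = [l(s, alpha), u(s, alpha)], then
# [L_X(t)]^alpha = [ integral of l, integral of u ] over [0, t].
# Hypothetical fuzzy value at time s: the triangular number (s, s+1, s+3).

def alpha_cut(s, alpha):
    l, m, r = s, s + 1.0, s + 3.0
    return (l + alpha * (m - l), r - alpha * (r - m))

def aumann_integral(t, alpha, n=10_000):
    """Riemann-sum approximation of [L_X(t)]^alpha, integrating endpoints."""
    ds = t / n
    lo = sum(alpha_cut(i * ds, alpha)[0] for i in range(n)) * ds
    hi = sum(alpha_cut(i * ds, alpha)[1] for i in range(n)) * ds
    return (lo, hi)

# [L_X(1)]^1 should be the single point integral of (s+1) ds = 1.5.
print(aumann_integral(1.0, alpha=1.0))   # ~ (1.5, 1.5)
print(aumann_integral(1.0, alpha=0.0))   # ~ (0.5, 3.5): the widest 0-level set
```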
We will now discuss properties of this integral LX. First, we introduce some more terminology; the material comes from Kolmogorov [26] and Zadeh [23]. The d∞ metric, also known as the supremum or uniform metric, is a mathematical measure of dissimilarity or distance between fuzzy sets. It quantifies the maximum discrepancy between the membership degrees of corresponding elements in two fuzzy sets. Formally, let A and B be two fuzzy sets defined on a common universe of discourse. The d∞ metric between A and B, denoted as d∞(A, B), is defined as the supremum of the absolute differences in membership degrees, d∞(A, B) = sup |μA(x) – μB(x)|, where μA(x) and μB(x) represent the membership degrees of the element x in the fuzzy sets A and B, respectively, and the supremum is taken over all elements x in the universe of discourse. The d∞ metric provides a measure of the largest difference in membership degrees between corresponding elements of two fuzzy sets. It captures the maximum discrepancy or dissimilarity between the two sets, indicating how much they differ in terms of their membership degrees. Now we will define the L1 norm. L1 refers to a mathematical space or norm in the context of functions and integrals. Specifically, L1 represents a function space of integrable functions with a finite integral norm. Formally, L1 is a function space defined over a given measure space (Ω, A, μ). It consists of all measurable functions f: Ω → ℝ (or f: Ω → ℂ) such that the integral of the absolute value of f, denoted as ∫|f| dμ, is finite.
In simpler terms, L1 consists of functions whose absolute value is integrable over the measure space, meaning that the area under the curve of the absolute value function is finite. The norm associated with the L1 space, denoted as ‖f‖₁, is defined as the integral of the absolute value of f: ‖f‖₁ = ∫|f| dμ. The L1 norm measures the size or magnitude of the function f in terms of its integral, capturing how spread out or concentrated the function is. L1 represents a function space of integrable functions, where the absolute value of the function is integrable. The associated norm, ‖f‖₁, measures the size of the function based on its integral. In the context of stochastic processes, the L1 space refers to a function space that consists of processes that are integrable with respect to a given measure. Formally, let (Ω, A, P) be a probability space and let X be a stochastic process defined on that space. The L1 space, denoted as L1(Ω, A, P), consists of all stochastic processes X that satisfy the condition of integrability. This means that the expected value or integral of the absolute value of X is finite: L1(Ω, A, P) = {X: Ω → ℝ (or ℂ) | E[|X|] < ∞}. Here, E[·] denotes the expected value or the integral with respect to the probability measure P. The absolute value |X| ensures that both positive and negative values contribute to the integrability condition. In simpler terms, a stochastic process X belongs to the L1 space if the expected value of its absolute value is finite. This indicates that the process is well-behaved in terms of its average behaviour and does not exhibit extreme fluctuations or infinite values. Now we define the Lp space. The Lp space, also known as a p-norm space, is a function space that consists of functions with finite p-norms. Formally, the Lp space, denoted as Lp(Ω, A, μ), is defined over a given measure space (Ω, A, μ). It consists of all measurable functions f: Ω → ℝ (or ℂ) such that the p-th power of the absolute value of f, denoted as |f|^p, is integrable with respect to the measure μ: Lp(Ω, A, μ) = {f: Ω → ℝ (or ℂ) | ∫|f|^p dμ < ∞}. Here, p is a positive real number greater than or equal to 1. The p-norm of a function f, denoted as ‖f‖_p, is defined as the p-th root of the integral of |f|^p: ‖f‖_p = (∫|f|^p dμ)^(1/p). The Lp space is a function space that captures functions whose p-th power is integrable, indicating that the function is well-behaved in terms of its magnitude and spread. The norm associated with the Lp space, the p-norm, measures the size or magnitude of the function f with respect to its integral, emphasizing the contribution of higher values due to the exponent p. The Lp spaces have various properties, including completeness, which means that every Cauchy sequence of functions converges to a limit within the space. Different values of p lead to different Lp spaces. For instance, when p = 1, we have the L1 space, and when p = 2, we have the L2 space. In summary, the Lp space consists of functions with finite p-norms, where the p-th power of the absolute value is integrable. It provides a framework for studying well-behaved functions in terms of their magnitudes and spread, with the p-norm quantifying their sizes. In the context of stochastic processes, the Lp space refers to a function space that consists of processes that are p-integrable with respect to a given measure. Formally, let (Ω, A, P) be a probability space and let X be a stochastic process defined on that space.
The Lp space of fuzzy stochastic processes, denoted as Lp(I × Ω, N; F(R)), consists of all nonanticipating processes X that satisfy the condition of p-integrability. This means that the p-th power of the size of X, measured by the norm of its 0-level set, is integrable with respect to the product measure (the Lebesgue measure on I times the probability measure P): Lp(I × Ω, N; F(R)) = {X: I × Ω → F(R) | ∫I ∫Ω ||[X(t, ω)]0||^p dP(ω) dt < ∞}. Here, p is a positive real number greater than or equal to 1, and the integral is taken with respect to the product measure defined on the σ-algebra N within B(I) ⊗ A. The p-integrability condition ensures that the p-th power of the size of the process X is well-behaved and has a finite integral.
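A quick numerical sketch of the p-norm for a concrete scalar function (our own illustrative choice, f(x) = e^(−x)) shows how the norm is computed in practice:

```python
# Numerical p-norm ||f||_p = (integral |f|^p dmu)^(1/p) for f(x) = exp(-x)
# on [0, 10] (approximating [0, infinity)) with Lebesgue measure mu.
import math

def p_norm(f, a, b, p, n=100_000):
    dx = (b - a) / n
    total = sum(abs(f(a + (i + 0.5) * dx)) ** p for i in range(n)) * dx
    return total ** (1.0 / p)

f = lambda x: math.exp(-x)
print(p_norm(f, 0.0, 10.0, p=1))  # ~1.0   (exact: 1)
print(p_norm(f, 0.0, 10.0, p=2))  # ~0.707 (exact: 1/sqrt(2))
```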
Having defined integrability conditions, we can now expand the work of [16] by exploring the Lebesgue–Aumann integral. If X ∈ Lp(I × Ω, N; F(R)) for p ≥ 1, then LX(·, ·) ∈ Lp(I × Ω, N; F(R)). This property states that if the fuzzy stochastic process X belongs to the Lp space, which consists of processes that are p-integrable, then the fuzzy stochastic Lebesgue–Aumann integral LX, defined over the same probability space, also belongs to the Lp space; in other words, the integrability of X carries over to the integrability of LX. Let X ∈ L1(I × Ω, N; F(R)); then {LX(t)}t∈I is d∞-continuous. This property states that if the fuzzy stochastic process X belongs to the L1 space, which consists of processes that are integrable, then the trajectories of the fuzzy stochastic Lebesgue–Aumann integral {LX(t)}t∈I are d∞-continuous; that is, LX(t) is a d∞-continuous function of time t for each fixed outcome ω. This implies that the fuzzy integral exhibits continuity with respect to the d∞ metric, capturing the dependence and smoothness of the integral over time. We will now state an inequality that provides a bound on the maximum difference between the fuzzy sets LX(u) and LY(u) over the interval [0, t] in terms of the cumulative difference between the fuzzy sets X(s) and Y(s) over the same interval. The inequality captures the relationship between the supremum and the integral of the d∞ metric almost everywhere (a.e.): the supremum of the d∞ metric between LX(u) and LY(u) for u in the interval [0, t] is bounded by the scaled integral of the d∞ metric between X(s) and Y(s) for s in the interval [0, t], with the scaling factor depending on the value of p:
sup_{u∈[0,t]} d∞^p(LX(u), LY(u)) ≤ t^(p–1) ∫0t d∞^p(X(s), Y(s)) ds, P-a.e.
The L.H.S. represents the supremum (maximum) of the d∞ metric between LX(u) and LY(u), where t is a fixed value and u ranges from 0 to t. Here, d∞^p(a, b) denotes the p-th power of the d∞ metric between two fuzzy sets a and b. d∞^p(LX(u), LY(u)) is that quantity for the fuzzy sets LX(u) and LY(u) at each u; it measures the maximum difference in membership degrees between corresponding elements of the fuzzy sets at a given u. The R.H.S. represents an integral involving the d∞ metric between the fuzzy sets X(s) and Y(s), where s ranges from 0 to t. The integral is scaled by t^(p–1), where p is a positive real number. It is the integral of the p-th power of the d∞ metric between X(s) and Y(s) at each s, and it measures the cumulative difference in membership degrees between corresponding elements of the fuzzy sets over the interval [0, t]. The factor t^(p–1) scales the integral by an amount that depends on the value of p; the value of p determines how the integral contributes to the overall inequality. The inequality holds almost everywhere (a.e.), meaning it holds for all ω except possibly on a set of measure zero.
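The scalar mechanism behind this bound is the Hölder (for p = 2, Cauchy–Schwarz) estimate (∫0t φ(s) ds)^p ≤ t^(p–1) ∫0t φ(s)^p ds, applied pathwise to φ(s) = d∞(X(s), Y(s)). The sketch below checks this scalar inequality numerically for an arbitrary stand-in profile φ:

```python
# Scalar check of ( integral_0^t phi ds )^p <= t^(p-1) * integral_0^t phi^p ds
# (a consequence of Hoelder's inequality) for a stand-in distance profile.
import math

def check(phi, t, p, n=100_000):
    ds = t / n
    lhs = (sum(phi(i * ds) for i in range(n)) * ds) ** p
    rhs = t ** (p - 1) * sum(phi(i * ds) ** p for i in range(n)) * ds
    return lhs, rhs

phi = lambda s: math.sin(s) ** 2 + 0.1      # hypothetical d_infinity profile
lhs, rhs = check(phi, t=2.0, p=2)
print(lhs <= rhs, lhs, rhs)                 # True: lhs is dominated by rhs
```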
We will now define the fuzzy stochastic Itô integral by using the embedding ⟨·⟩: R → F(R), which maps a real number r to the crisp fuzzy set ⟨r⟩, where W is a Wiener process. Let X ∈ L2(I × Ω, N; R); then {⟨∫0t X(s) dW(s)⟩}, where t ∈ I, represents a fuzzy stochastic process defined over the interval I. For each fixed t, ⟨∫0t X(s) dW(s)⟩ is a fuzzy set, and the process captures the evolution of these fuzzy sets as t varies. Here, dW(s) represents the differential of a Wiener process or Brownian motion. The fuzzy stochastic process so obtained belongs to the L2 space over the product space I × Ω: the fuzzy sets generated by the integral have finite second moments or variances and take values in the space of fuzzy sets over the real numbers, denoted as F(R). The next proposition states that if X belongs to the family of square-integrable processes, then the integral appearing in the fuzzy stochastic differential equation also belongs to the family of square-integrable processes. Formally, let X ∈ L2(I × Ω, N; R); then X belongs to the L2 space, which consists of processes that have finite second moments or variances. The process X takes real values, and the fuzzy process {⟨∫0t X(s) dW(s)⟩}t∈I is d∞-continuous: as t varies within the interval I, the fuzzy sets exhibit continuity with respect to the d∞ metric. As discussed earlier, the d∞ metric, also known as the supremum or uniform metric, measures the maximum difference in membership degrees between corresponding elements of fuzzy sets. Thus, the statement asserts that the fuzzy sets ⟨∫0t X(s) dW(s)⟩, obtained by integrating X with respect to the Wiener process, vary continuously with respect to the d∞ metric as t varies. This continuity property is significant because it indicates that the fuzzy stochastic process has a smooth and well-behaved evolution with respect to the d∞ metric; the variations in the fuzzy sets, induced by the integral, are gradual and predictable as t changes within the interval I. In summary, if the stochastic process X belongs to the L2 space, then the fuzzy stochastic process {⟨∫0t X(s) dW(s)⟩}t∈I, obtained by integrating X with respect to a Wiener process, exhibits continuity with respect to the d∞ metric, implying a smooth and predictable evolution of the fuzzy sets as t varies.
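A Monte Carlo sketch of the crisp integral inside the embedding: we simulate I(t) = ∫0t X(s) dW(s) for the deterministic integrand X(s) = s with a left-point (Itô) Riemann sum, and check that Var[I(T)] = ∫0T s² ds = T³/3, as the Itô isometry predicts. The discretisation parameters are our own choices:

```python
# Simulating the stochastic integral I(T) = integral_0^T s dW(s) by a
# left-point (Ito) Riemann sum and checking Var[I(T)] = T^3/3.
import random

random.seed(1)
T, n, paths = 1.0, 200, 5_000
dt = T / n

totals = []
for _ in range(paths):
    I = 0.0
    for i in range(n):
        dW = random.gauss(0.0, dt ** 0.5)
        I += (i * dt) * dW          # left-point evaluation of X(s) = s
    totals.append(I)

var = sum(x * x for x in totals) / paths
print(f"Var[I(T)] ~ {var:.4f}  (Ito isometry predicts T^3/3 = {T**3/3:.4f})")
```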
3.1.3. Fuzzy Stochastic Differential Equation when H > ½
We will now look at the special case of H > ½ and use the Liouville form of fBm [22] [27]. This is a type of fractional Gaussian process defined by a covariance function that exhibits power-law scaling. For standard fBm, the covariance function is given by R(t, s) = ½(t^(2H) + s^(2H) – |t – s|^(2H)), where 0 < H < 1 is the Hurst exponent. In this case we require H > ½, so that the kernel in the Liouville representation below is non-singular and the integrals used later in this section converge. If H < ½, the kernel becomes singular at s = t, and a fractional Brownian motion with H < ½ is often referred to as "rough" or "pathologically irregular". When H > ½, Liouville fBm does not have mean-reversion and exhibits strong long-range dependence, which means that distant points in the process are highly correlated; when H < ½, the process does not possess the regularity properties required for the integration methods used here. In the Liouville form, fBm is expressed as a stochastic integral involving a deterministic kernel and a standard Brownian motion. Up to a normalising constant, it has the form [22]:
BH(t) = ∫0t (t – s)^(H – 1/2) dW(s),
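The Liouville representation can be simulated directly from its definition by a Riemann sum over Brownian increments (normalising constant omitted); the grid sizes below are illustrative choices:

```python
# Simulating Liouville fBm B_H(t) = integral_0^t (t - s)^(H - 1/2) dW(s)
# on a grid via a Riemann sum over Brownian increments (constants omitted).
import random

random.seed(2)
H, T, n = 0.7, 1.0, 1_000          # H > 1/2, so the kernel is non-singular
dt = T / n
dW = [random.gauss(0.0, dt ** 0.5) for _ in range(n)]

def B_H(k):
    """Approximate B_H at t = k * dt."""
    t = k * dt
    return sum((t - i * dt) ** (H - 0.5) * dW[i] for i in range(k))

path = [B_H(k) for k in range(0, n + 1, 100)]
print([round(x, 3) for x in path])  # one sample path of Liouville fBm
```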
The FSDE itself is, in the notation of [16],
X(t) = X0 + ∫0t f(s, X(s)) ds + ⟨∫0t g(s, X(s)) dBH(s)⟩, t ∈ I,    (10)
where the FSDE is given by equation (10), X(t) is the solution of the equation at time t, X0 is the initial condition of X at time 0, f(s, X(s)) is a function that describes the drift of X, g(s, X(s)) is a function that describes the diffusion of X, and BH is the Liouville fBm that drives the equation. The integral term in the equation represents the stochastic integral of g(s, X(s)) with respect to BH(s), which is a type of integral defined in terms of the Itô integral and the covariance structure of the fBm. The angle brackets around the integral indicate the embedding of this crisp stochastic integral into F(R). The corresponding approximation equation to (10) is given by Xε(t), where Xε(t) is an approximation to X(t) obtained by discretising the integral term using a numerical scheme. The integral term in the approximation equation is approximated using a discrete version of the stochastic integral, which is defined in terms of the Riemann sum and the covariance structure of the discretised fBm, denoted by BHε(s) [27].
This section discusses a class of stochastic differential equations driven by a Liouville fBm, a type of fractional Gaussian process with a power-law covariance structure. Building on [16], we will now make a few assumptions in order to prove that (11) has a unique strong solution. The first assumption (A1) is a Lipschitz condition: for every u, v ∈ F(R) and every t ∈ I, there exists L > 0 such that max{d∞(f(t, ω, u), f(t, ω, v)), |g(t, ω, u) – g(t, ω, v)|} ≤ L d∞(u, v). This assumption lets us estimate upper bounds using the constant L. The second assumption (A2) is a linear growth condition: suppose there is C > 0 such that max{d∞(f(t, ω, u), ⟨0⟩), |g(t, ω, u)|} ≤ C(1 + d∞(u, ⟨0⟩)), where ⟨0⟩ denotes the crisp zero fuzzy set. The last assumption (A3) is that the mappings f: I × Ω × F(R) → F(R) and g: I × Ω × F(R) → R are measurable with respect to (w.r.t.) the appropriate product σ-algebras; formally, f is N ⊗ Bds|Bds-measurable and g is N ⊗ Bds|B(R)-measurable, respectively. The product σ-algebra is a way to combine two σ-algebras (sets of events) from different sources. In this work, N represents the nonanticipating σ-algebra associated with the underlying probability space, which captures the randomness or uncertainty inherent in the system, while Bds denotes the Borel σ-algebra generated by the topology of the metric ds on F(R); it contains all the events that can be described in terms of the fuzzy-set-valued argument. The symbol ⊗ represents the product operation between σ-algebras, which creates a larger σ-algebra that combines the information from both sources. The product σ-algebra N ⊗ Bds captures the combined information from the underlying probability space (represented by N) and the fuzzy-set argument (represented by Bds). It allows us to consider events or random variables that depend both on the inherent randomness of the system and on the fuzzy state.
This construction is important in the theory of stochastic processes and stochastic differential equations because it provides a way to define and work with stochastic integrals, which are integrals with respect to Brownian motion or other stochastic processes. The measurability condition (A3) in the context provided ensures that the drift and diffusion functions are well-defined and can be integrated with respect to the Brownian motion process, taking into account both the underlying probability space and the Brownian motion itself.
Next, we will consider a bound on the distance between two stochastic integrals driven by the Wiener process W(s). Let X(s) and Y(s) be two stochastic processes, and let d∞²(·, ·) denote the squared d∞ distance. Then, by the Burkholder–Gundy inequality [28]:
E[sup_{u∈[0,t]} d∞²(⟨∫0u X(s) dW(s)⟩, ⟨∫0u Y(s) dW(s)⟩)] ≤ 4 E ∫0t |X(s) – Y(s)|² ds.    (12)
Here, ⟨·⟩ again denotes the embedding of the crisp Itô integral into F(R), and ∫0u X(s) dW(s) represents the stochastic integral of X(s) with respect to the Wiener process up to time u. The inequality states that the distance between the stochastic integrals of X(s) and Y(s) with respect to W(s) is bounded by a multiple of the integral of their squared distance over time. The constant 4 in the inequality is a universal constant that arises from the properties of the Wiener process (it is the constant of Doob's maximal inequality for p = 2). (12) is a special case of the Burkholder–Davis–Gundy (BDG) inequality, which we will explore further down the line and on which we will rely in other proofs. We will be using (12) and the assumptions above to show that (11) has a unique strong solution. We need to establish two points: 1. Existence of a strong solution: there exists a process X(t) that solves the FSDE a.s. for all t ≥ 0. 2. Uniqueness of the strong solution: the solution X(t) is unique a.s. Using equation (5) and an integration-by-parts technique, we can re-write (11):
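The sketch below checks the special case (12) by Monte Carlo in the crisp setting, taking X(s) = s and Y(s) = 0, so that the bound reads E[sup_u I(u)²] ≤ 4 ∫0T s² ds = 4T³/3; all simulation parameters are our own choices:

```python
# Monte Carlo check of the special case (12) with X(s) = s and Y(s) = 0:
# E[ sup_u ( integral_0^u s dW(s) )^2 ] <= 4 * T^3 / 3 (Doob's inequality).
import random

random.seed(3)
T, n, paths = 1.0, 200, 5_000
dt = T / n

sup_sq = 0.0
for _ in range(paths):
    I, peak = 0.0, 0.0
    for i in range(n):
        I += (i * dt) * random.gauss(0.0, dt ** 0.5)   # increment of the integral
        peak = max(peak, I * I)
    sup_sq += peak
sup_sq /= paths

bound = 4.0 * T ** 3 / 3.0
print(f"E[sup I^2] ~ {sup_sq:.3f} <= {bound:.3f}")  # the bound holds with room
```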
The proof begins by introducing the Picard iterations [24], denoted Xε^n(t), n = 0, 1, 2, …, which are obtained by iteratively applying the integral equation, almost everywhere (a.e.). These iterations aim to find an approximation for the solution of the equation. To approximate X(t), the Picard iteration scheme defines a sequence of iterates Xε^n(t) that should converge to X(t) as n goes to infinity. Starting with Xε^0(t) = X0, we calculate Xε^1(t) by plugging X0 into the right-hand side; then Xε^2(t) is calculated using Xε^1(t), and so on. The key idea is that as n increases, the sequence Xε^n(t) should get closer and closer to the true solution X(t), provided some conditions on f, g, and the initial data are satisfied to ensure convergence. The parameters ε and a are introduced to control the convergence behaviour, while φ is some approximating kernel. The Picard iterations provide a recursive way to obtain successively better approximations to the original equation's solution by iteratively updating an integral-equation approximation. Define the sequence Jn(t) := E[sup_{u∈[0,t]} d∞²(Xε^(n+1)(u), Xε^n(u))]; the sequence measures the distance between two consecutive iterations. When we take n = 1, by the assumption that there is C > 0 such that max{d∞(f(t, ω, u), ⟨0⟩), |g(t, ω, u)|} ≤ C(1 + d∞(u, ⟨0⟩)), and from Theorem 4.1 in Llendeto (2007), we know that:
Therefore, using the elementary inequality (a + b + c)² ≤ 3(a² + b² + c²), we have a constant 3 in:
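To see the Picard scheme at work, the sketch below applies it to the deterministic analogue x(t) = x0 + ∫0t x(s) ds, whose exact solution is x0·e^t, and prints the sup-error of successive iterates (the plateau at the end reflects the Riemann-sum discretisation, not the iteration):

```python
# Picard iteration on x(t) = x0 + integral_0^t x(s) ds (true solution x0*e^t):
# the iterates X^n converge to the solution as n grows.
import math

T, n_steps, x0 = 1.0, 1_000, 1.0
dt = T / n_steps
grid = [i * dt for i in range(n_steps + 1)]

x = [x0] * (n_steps + 1)                   # X^0(t) = x0
for n in range(8):                         # successive Picard updates
    integral, new = 0.0, [x0]
    for i in range(n_steps):
        integral += x[i] * dt              # integral_0^t X^n(s) ds
        new.append(x0 + integral)
    x = new
    err = max(abs(x[i] - x0 * math.exp(grid[i])) for i in range(n_steps + 1))
    print(f"iteration {n + 1}: sup-error vs x0*e^t = {err:.2e}")
```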
Using proposition (9), proposition (12), the linearity of the expectation operator, and the fact that our metric is d∞², so that constants can be pulled out as squares, we can restate the bound. Using assumption (A2), max{d∞(f(t, ω, u), ⟨0⟩), |g(t, ω, u)|} ≤ C(1 + d∞(u, ⟨0⟩)) for C > 0, and properties of the expectation operator, we substitute this into the following integral expression to obtain the following. Also remember that d∞²(X(s), ⟨0⟩) measures the squared maximum distance between the fuzzy set X(s) at a specific time s and the zero fuzzy set ⟨0⟩, in which all membership is concentrated at zero, and utilises the metric inner product. For a different version of this proof see [16]. This works only for H > ½, which was our assumption earlier and which is why the argument applies only in that case; for H < ½ we will be using different integration methods. Next, we will utilise the following results. The dH distance, or Hausdorff distance, measures the maximum discrepancy between two sets by considering the minimum distance from each point in one set to the other set; in this sense the dH distance accounts for the smallest possible distance between the two sets. On the other hand, the d∞ distance, or supremum distance, measures the maximum difference between the values of two functions or sets: it calculates the maximum absolute difference between corresponding points of the two functions or sets. Since the dH distance considers the minimum distances, it takes into account the smallest possible difference between the two sets; therefore, the d∞ distance, which measures the maximum difference, will always be greater than or equal to the dH distance. Using the above and by the metric inner product:
We will now use the famous Hölder inequality [19], ∫|fg| dμ ≤ (∫|f|^p dμ)^(1/p) (∫|g|^q dμ)^(1/q) with 1/p + 1/q = 1, and re-arrange the terms r with s.
Since the integration variables are independent, we can use Fubini's theorem [24] to rearrange the integration variables. This is the rule for swapping independent integration variables: if the function being integrated is continuous over a rectangular region and the integral is of the form ∬_R f(x, y) dA, then the order of integration can be swapped, resulting in ∫∫ f(x, y) dx dy or ∫∫ f(x, y) dy dx. For example, consider a double integral ∬_R f(x, y) dA, where R is the region defined by 0 ≤ x ≤ 1 and 0 ≤ y ≤ 2. If we integrate with respect to x first, we have ∫0² (∫0¹ f(x, y) dx) dy; if we integrate with respect to y first, we have ∫0¹ (∫0² f(x, y) dy) dx. If the integrand is continuous over the rectangular region R, we can swap the order of integration without affecting the result. It is important to note that not all integrals allow for swapping the order of integration. The conditions for applying Fubini's theorem include continuity of the integrand, boundedness of the region of integration, and absolute integrability. If these conditions are not met, swapping the order of integration may lead to incorrect results.
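A numerical check of the swap for a hypothetical continuous integrand f(x, y) = x·y on R = [0, 1] × [0, 2] (midpoint rule; both orders should agree, with exact value 1):

```python
# Fubini check: both orders of integration of f(x, y) = x * y over
# R = [0, 1] x [0, 2] give the same value (exact: 1).

def inner_dx(y, n=1_000):
    return sum(((i + 0.5) / n) * y for i in range(n)) / n              # int_0^1 x*y dx

def inner_dy(x, n=1_000):
    return sum(x * (2.0 * (i + 0.5) / n) for i in range(n)) * 2.0 / n  # int_0^2 x*y dy

n = 1_000
dx_first = sum(inner_dx(2.0 * (j + 0.5) / n) for j in range(n)) * 2.0 / n
dy_first = sum(inner_dy((j + 0.5) / n) for j in range(n)) / n
print(round(dx_first, 4), round(dy_first, 4))  # both ~ 1.0
```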
We will now expand the work of [16]. First we will use Fubini to re-arrange r with s in this integral; then we will apply the Itô isometry, or the Itô integral rule. It is a key tool in stochastic calculus and is used to calculate the expected value of stochastic integrals involving Brownian motion or other forms of stochastic processes. The identity essentially says that the square of the difference between two points in a Brownian motion path, raised to the power of (a–1), can be expressed as the double stochastic integral of (a–2) powers of the differences between the same points and the endpoints of the integration range. Intuitively, this identity can be understood as follows: the Brownian motion process is continuous and has infinite variation, meaning that the difference between two points in a path can take on any value in a certain range with positive probability. The Itô isometry [29] allows us to calculate the expected value of the square of these differences, which is a key step in deriving stochastic integrals and differential equations. In the rearrangement of terms given above, the Itô isometry is used to express the term (s – r + ε)^(a–1) as a double stochastic integral, which allows for the interchange of the order of integration and ultimately results in the desired form of the expression. Using this identity with t = r and s = s, and substituting into the integral, we can now interchange the order of integration between s and r. There are three integrals in the intermediate step because the stochastic integral involving the Brownian motion process requires two integrals: one with respect to time and one with respect to the Brownian motion itself. There are two stochastic integrals involving Brownian motion: one over the range [u, s] and one over the range [r, t]. This requires a total of three integrals: one over the range [u, t] with respect to time, and two with respect to Brownian motion over the ranges [u, s] and [r, t], respectively. When the order of integration is interchanged between s and r, the resulting expression involves a double stochastic integral over the ranges [u, s] and [r, t], respectively. This requires two integrals with respect to Brownian motion and one integral with respect to time, which is consistent with the three integrals in the original expression.
The identity introduced with t = s and s = v allows for the simplification of the expression by expressing the term (s − r + ε)^(a−1) as a double stochastic integral over the ranges [s, u] and [r, v], respectively. This ultimately leads to the desired form of the expression with only two integrals involving Brownian motion and one integral with respect to time.
Using the Itô isometry [29], i.e., that (dW)² = dt, and, for the upper bound, simply multiplying the expectation by 4 [24], we can simplify the stochastic integral further:
Because the integral is linear, we can take the constant term outside the integral. Now, using the Cauchy–Schwarz inequality [24]:
we can evaluate the last expression:
We will now substitute this result into J_1(t), simplify, and take the exponential out of the bracket.
We will explain what this all means and why we need it. The first term J_1(t) in the estimate is an upper bound on the error between the true solution X(t) and the numerical solution X_ε(t) obtained at the nth time step. It represents the contribution to the error due to the approximation made at the current time step. The inequality for J_1(t) follows from the error estimate for the numerical method used to compute X_ε(t) at the current time step. This estimate indicates that the error in X_ε(t) at the current time step is proportional to the time step size Δt and to a constant factor that depends on the parameters of the numerical method and on the regularity of the solution X(t). Similarly, we can obtain the same result for J_{n+1}(t), which in the estimate is an upper bound on the error between the numerical solution X_ε(t) obtained at the (n+1)th time step and the numerical solution X_ε(t) obtained at the nth time step. It represents the contribution to the error due to the propagation of the error from the current time step to the next. The inequality for the next time step, which we will now prove, is:
It follows from a stability estimate for the numerical method used to compute X_ε(t) at the (n+1)th time step. This estimate indicates that the error in X_ε(t) at the (n+1)th time step is proportional to the time step size Δt, to a constant factor that depends on the parameters of the numerical method and on the regularity of the solution X(t), and to the norm of the difference between the numerical solutions obtained at the nth and (n−1)th time steps. Heuristically, this makes sense because, as the time steps are reduced, the numerical method captures the behaviour of the true solution X(t) at each time step more accurately.
Second, consider the constant factor that depends on the parameters of the numerical method and on the regularity of the solution X(t). This factor represents the accuracy of the numerical method itself and is determined by the choice of scheme, its order of accuracy, and other method-specific parameters. The regularity of the solution X(t) also plays a role, since more regular solutions can be approximated more accurately than less regular ones. Finally, consider the norm of the difference between the numerical solutions obtained at the nth and (n−1)th time steps. This term accounts for the propagation of the error from the previous time step to the current one: a large difference between the numerical solutions at the two time steps indicates that the error has been propagated and has accumulated over time, leading to a larger error in the numerical solution at the current step, and the norm of this difference quantifies the magnitude of the propagated error. Therefore, the estimate for the error at the (n+1)th time step takes into account the time step size, the accuracy of the numerical method, and the propagation of the error from the previous time step. In summary, the term J_{n+1}(t) accounts for the propagation of the error from the current time step to the next and provides an upper bound on the total error in the numerical solution X_ε(t) at the (n+1)th time step. By summing the contributions from all time steps, we obtain an estimate for the total error in the numerical solution X_ε(t) over the entire time interval [0, T].
This is our proof of the above. Consider the numerical solution obtained at the nth time step and the numerical solution obtained at the (n−1)th time step. Then the error between these two solutions can be written as:
Using the difference equation for the numerical method, we can write the update in terms of the true solution at the (n+1)th step and the time step size. Then the error in the numerical solution X_ε(t) at the (n+1)th time step can be written as:
Taking the norm ||.|| of both sides and using the triangle inequality property of norms in vector spaces, ||u + v|| ≤ ||u|| + ||v||, we obtain:
A function f(x) is said to be Lipschitz continuous if there exists a constant K such that |f(x_1) − f(x_2)| ≤ K|x_1 − x_2| for all x_1 and x_2 in the function's domain [24]. Intuitively, this means that the function's rate of change is bounded by a constant factor K, which is independent of the size of the interval over which the function is measured. Using the Lipschitz continuity of the function f in the differential equation, we can write:
Where L_2 is the Lipschitz constant. It is the smallest possible value of K that satisfies the Lipschitz condition. In other words, it is the smallest possible upper bound on the rate of change of the function over its entire domain, and it provides a measure of how rapidly the function changes. We will use it in the following context: if the right-hand side of a differential equation is Lipschitz continuous with a constant K, then numerical methods that use a time step Δt will converge to the true solution at a rate proportional to Δt/K. Using the assumption on the distance between the initial condition u_0 and the true solution X(t), we can write:
Therefore, we can write the following and integrate both sides over the time interval:
Taking the expectation on both sides and using the fact that the expectation operator is linear, we obtain:
Use the fact that the square of the norm of the difference between two solutions is bounded by 3 times the sum of the squares of their individual norms, ||u − v||² ≤ 3(||u||² + ||v||²) [24], a consequence of the Cauchy–Schwarz inequality. This is true because of the dot product and the norm. The dot product satisfies u · v = ||u|| ||v|| cos(b), where b is the angle between u and v. Using this expression, we can write the square of the norm of the difference between u and v as ||u − v||² = (u − v) · (u − v) = ||u||² − 2(u · v) + ||v||² = ||u||² + ||v||² − 2||u|| ||v|| cos(b), where we have used the distributive property of the dot product. Note that cos(b) is bounded between −1 and 1, so −2||u|| ||v|| ≤ −2||u|| ||v|| cos(b) ≤ 2||u|| ||v||. Applying this bound to the expression for ||u − v||², we obtain ||u − v||² ≤ ||u||² + ||v||² + 2||u|| ||v|| = (||u|| + ||v||)². Taking the square root, and using that ||u − v|| is non-negative, gives the triangle inequality ||u − v|| ≤ ||u|| + ||v||. Finally, since (||u|| + ||v||)² ≤ 2(||u||² + ||v||²) ≤ 3(||u||² + ||v||²) for all real numbers ||u|| and ||v||, we get ||u − v||² ≤ (||u|| + ||v||)² ≤ 3(||u||² + ||v||²). A numerical sanity check of this chain of bounds follows; we then continue with the original proof.
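Purely as an illustration, the sketch below tests the chain ||u − v||² ≤ (||u|| + ||v||)² ≤ 3(||u||² + ||v||²) on random vectors; the dimension and sample count are arbitrary:

```python
# Numerical sanity check of ||u - v||^2 <= (||u|| + ||v||)^2 <= 3(||u||^2 + ||v||^2)
# on randomly drawn vectors (hypothetical test data, not from the paper).
import numpy as np

rng = np.random.default_rng(1)
for _ in range(10_000):
    u = rng.normal(size=5)
    v = rng.normal(size=5)
    lhs = np.linalg.norm(u - v) ** 2
    mid = (np.linalg.norm(u) + np.linalg.norm(v)) ** 2
    rhs = 3 * (np.linalg.norm(u) ** 2 + np.linalg.norm(v) ** 2)
    assert lhs <= mid + 1e-12 and mid <= rhs + 1e-12  # both bounds hold
print("inequality held on all samples")
```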
Therefore, we have derived the expression for the j_{n+1}(t) term in the error estimate:
This inequality holds because the function j_n(s) is defined as the supremum distance between values of the process X over the interval from n to n+1, as measured by the metric function. Therefore, the integral of j_n(s) over the interval from t_0 to s gives an upper bound on the supremum distance between values of the process X over the same interval, so the right-hand side of the inequality bounds the function j_{n+1}(t) from above.
We will now apply the Chebyshev inequality to the metric function [24]. The Chebyshev inequality is a probabilistic inequality that relates the probability of a random variable deviating from its mean to its variance. Specifically, it states that for any random variable X with finite mean μ and finite variance σ², the probability that X deviates from its mean by more than k standard deviations is bounded by 1/k²: P(|X − μ| > kσ) ≤ 1/k². We will apply this inequality to the metric function to obtain a probabilistic bound on the supremum distance between values of the process X over the interval from n to n+1.
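Before the proof, a quick Monte Carlo illustration of the Chebyshev bound; the exponential distribution used here is an arbitrary test case, not a distribution appearing in the paper:

```python
# Monte Carlo illustration of Chebyshev's inequality P(|X - mu| > k*sigma) <= 1/k^2,
# using a hypothetical exponential sample (mean 1, variance 1).
import numpy as np

rng = np.random.default_rng(2)
x = rng.exponential(scale=1.0, size=1_000_000)
mu, sigma = x.mean(), x.std()
for k in (2, 3, 4):
    empirical = np.mean(np.abs(x - mu) > k * sigma)   # empirical tail probability
    print(f"k={k}: P(|X-mu| > k*sigma) = {empirical:.5f} <= 1/k^2 = {1/k**2:.5f}")
```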
Proof. Define Y to be the random variable such that:
Then we have:
Since the metric function is a distance measure, it is non-negative, and therefore Y is also non-negative. We can choose k = 2^n to obtain P(Y > 2^n μ) ≤ Var(Y)/(2^n μ)², and since Y is non-negative, we have E(Y) = μ ≥ 0, therefore (2^n μ)² ≥ 0. This means that the denominator of the fraction is non-negative. We can multiply and invert the inequality to obtain:
We will now use the Borel–Cantelli lemma [30]. Let {E_n} be a sequence of events in the sample space Ω. If the sum of the probabilities of the events in the sequence is finite, then the probability that infinitely many of the events occur is zero, i.e.:
If the events {E_n} are independent, then:
Conversely, if the events are independent and the sum of their probabilities diverges, then the probability that infinitely many of them occur is 1. To understand this, consider the first case, where the sum of the probabilities is finite. We can define a new sequence of events {F_n} as follows:
That is, F_n is the union of all the events in the sequence {E_k} starting from index n. Intuitively, F_n represents the event that at least one of the events E_k with k ≥ n occurs. Since the sum of the probabilities of the events in the sequence is finite, we have:
This implies that P(F_n) tends to zero as n grows, since by countable subadditivity P(F_n) is bounded by the tail sum of the probabilities of the events E_k with k ≥ n, and the tail of a convergent series vanishes. Therefore, we have:
The last step follows from the fact that the probabilities of the events E_n are non-negative, and therefore the sum can be rearranged. Now, consider the event E_s = lim sup E_n = ∩_{n≥1} F_n. Intuitively, E_s represents the event that infinitely many of the events E_n occur: it is the set of all outcomes that belong to infinitely many of the events E_n, and it can be written as a countable intersection of the events {F_n}. Since P(F_n) tends to zero and the sets F_n are decreasing, we conclude that Pr(lim sup F_n) = lim Pr(F_n) = 0. Therefore, we have shown that Pr(E_s) = Pr(lim sup F_n) = 0, which completes the proof of the first case. The second case we do not need here, because J_n(T) is convergent. Having shown the convergence, we will now apply the Borel–Cantelli lemma. Let {f_n(x)} be a sequence of measurable functions on [0, 1] with |f_n(x)| < ∞ for a.e. x ∈ [0, 1]. Then there exists a sequence {c_n} of positive real numbers such that f_n(x)/c_n → 0 for a.e. x ∈ [0, 1]. Suppose there is a sequence of events such that m(E_n) ≤ 2^(−n). Then, from the lemma, it follows that P(|f(x)| > N/n) > 2^(−n).
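Before returning to the main argument, a minimal simulation of the first case; the independent events with P(E_n) = 2^(−n) are a hypothetical example chosen to make the sum of probabilities finite:

```python
# Simulation of the first Borel-Cantelli lemma: with P(E_n) = 2^(-n) the sum
# of probabilities is finite, so almost every sample path sees only finitely
# many of the events occur. Independence is assumed for simplicity.
import numpy as np

rng = np.random.default_rng(3)
n_events, n_paths = 30, 100_000
p = 0.5 ** np.arange(1, n_events + 1)             # P(E_n) = 2^(-n), summable
occurred = rng.random((n_paths, n_events)) < p    # independent events per path
counts = occurred.sum(axis=1)                     # number of E_n occurring per path
print("max events on any path:", counts.max())    # small, never unbounded
print("mean events per path:", counts.mean())     # close to sum(p) ~ 1
```

Going back to our proof, and using the Borel–Cantelli lemma: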
This means that the probability of the event that the supremum of the d_2 distance is greater than the threshold for infinitely many values of n is zero. Following from this, we can say that:
We will now formally prove this important result. The value 1/√2 comes from the fact that we are interested in the probability of the event that the supremum of the d_2 distance is greater than (1/√2)^n for infinitely many values of n. The value 1/√2 is chosen because it is a convenient threshold that ensures the convergence of the series in the statement of the problem. In particular, the convergence of the series implies that the sequence converges almost surely in the d_2 metric, which is a stronger condition than convergence almost surely in the Euclidean metric, since the latter is a special case of the d_2 metric. The choice of the threshold (1/√2)^n is motivated by the fact that it ensures that the rate of growth is bounded by √2 times the logarithm of n. The logarithm comes from the asymptotic behaviour of the harmonic series, which is related to the rate of growth of the sum of the variances of the random variables in the sequence. The sum of the variances of the random variables is equal to n, since they are independent standard normal random variables; therefore, the sum of the variances of the random variables in the sequence up to time n is equal to n.
We can use the fact that the relevant term is bounded by 1 for all n, since u belongs to the unit ball in R^d. Using this result, we can rewrite the bound, where the last inequality follows from the fact that the sum of the variances of the random variables in the sequence up to time n is bounded by n. We also use a well-known result, originally proved by Euler, that the harmonic series, i.e., the sum of the reciprocals of the positive integers, is asymptotically bounded by the natural logarithm of the number of terms: 1 + 1/2 + 1/3 + … + 1/n ≈ ln(n) + γ as n → ∞, where γ is the Euler–Mascheroni constant. In this context, the sum involving √n grows slowly compared with the harmonic series and is bounded by ln(n). The divergence of the comparison series follows from the fact that the integral of 1/√x from 1 to ∞ is infinite, together with the comparison test for series; heuristically, the numerator grows faster than the denominator, so the sum of the series is infinite. This is why we choose the threshold in the statement of the Borel–Cantelli lemma: it ensures that the rate of growth is bounded. A numerical check of the harmonic-series asymptotic is sketched below.
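The following short check compares the partial sums of the harmonic series with ln(n) + γ; the values of n are arbitrary:

```python
# Numerical check of the Euler asymptotic 1 + 1/2 + ... + 1/n ~ ln(n) + gamma.
import numpy as np

gamma = 0.5772156649015329  # Euler-Mascheroni constant
for n in (10, 1_000, 100_000):
    H_n = np.sum(1.0 / np.arange(1, n + 1))   # partial sum of the harmonic series
    print(f"n={n}: H_n = {H_n:.6f},  ln(n) + gamma = {np.log(n) + gamma:.6f}")
```

Now, recall that the sequence in question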
represents independent standard normal variables, so the distance between successive terms can be expressed through them. Now we use the fact that these are independent random variables with mean 0 and variance 1. Therefore, the variance of the difference is equal to the sum of their variances, which is 2, and the covariance term is zero. Using this, we can write that:
Taking the square root, we have:
Therefore, by the Borel–Cantelli lemma, we can conclude that the probability of the event that the supremum of the d_2 distance is greater than the threshold for infinitely many values of n is zero. So we have proven that:
We will now complete the proof of the existence of a strong and unique solution. We know that as n goes to infinity the approximations converge, so the distance between them tends to zero, as d(a, a) = a − a = 0. In other words, they become the same. Let us take the limit as some continuous fuzzy stochastic process. Then, by the above, the required bound holds. Hence, as n goes to infinity:
This shows the existence of a strong solution.
Now, we will show the uniqueness of the solution. For this we need Gronwall's lemma and inequality [31]. Let y(t), f(t), g(t) be non-negative functions on a closed interval [a, b], where a < b. If f(t) satisfies the inequality f(t) ≤ A + ∫_a^t g(s) f(s) ds for all t ∈ [a, b], then it follows that:
f(t) ≤ A·exp(∫_a^t g(s) ds) for all t ∈ [a, b]. A special case of Gronwall's lemma arises when g(t) is identically zero on the interval [a, b], in which case the bound simplifies to f(t) ≤ A, which provides an upper bound on f(t). Gronwall's inequality provides an upper bound on the function f(t) in terms of the constant A and the exponential of the integral of g(s) over the interval [a, t]; the inequality guarantees that the growth of f(t) is controlled by the exponential function. The general form of the inequality is as follows: let f(t) be a non-negative, continuous function on a closed interval [a, b], where a < b. If f(t) satisfies the inequality f(t) ≤ A + ∫_a^t g(s) f(s) ds, where A is a constant and g is a non-negative, continuous function, then it follows that f(t) ≤ A·exp(∫_a^t g(s) ds). A small numerical illustration of this bound is sketched below.
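As an aside, the sketch below integrates the extremal case f′ = g·f, f(0) = A, and compares it against the Gronwall bound; the values A = 1 and g ≡ 2 are arbitrary illustrative choices:

```python
# Minimal numerical illustration of Gronwall's bound: if f(t) <= A + int_0^t g*f ds,
# then f(t) <= A * exp(int_0^t g ds). We integrate the extremal case f' = g*f
# with hypothetical constants A = 1 and g = 2.
import numpy as np

A, g, T, n = 1.0, 2.0, 1.0, 1000
dt = T / n
f = np.empty(n + 1)
f[0] = A
for k in range(n):
    f[k + 1] = f[k] + dt * g * f[k]        # explicit Euler for f' = g*f
t = np.linspace(0.0, T, n + 1)
bound = A * np.exp(g * t)                   # Gronwall's exponential bound
print("max of f - bound:", float((f - bound).max()))  # <= 0: f stays below the bound
```

Going back to our proof of uniqueness. Consider that: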
, where we already have an estimate, or upper bound, which is:
In the context of the j(t) function, we apply Gronwall's inequality with the appropriate choices of A and g. Using these, we can state that j(t) = 0 for all t within the given interval. The bound shows that the function j(t) can grow at most exponentially in time, with the rate of growth determined by the constant in the exponent; however, the growth of j(t) is limited by its initial value j(0). If j(t) = 0 for all t in I, this means that the distance between the two solutions of the stochastic differential equation is zero for all times in the interval I. But since j(t) is non-negative, this implies that j(t) = 0 for all t ≥ t_0, where t_0 is the endpoint of the interval I. Therefore, the two solutions of the equation coincide for all times t ≥ t_0, which means they become arbitrarily close to each other as time goes to infinity. The proof of uniqueness is complete.
We will next prove that X_ε(t) converges to the solution X(t) in L² as ε → 0 for t ∈ [0, T], which is fundamental if we want to apply Itô's lemma later on. However, first we need to prove that for every ε > 0 and 0 < a < ½,
We will need this inequality in that proof. We will employ the finite-increments formula. Let f(x) be a differentiable function defined on an interval [a, b] and let δ be a small increment. The formula states that f(x + δ) − f(x) = f′(x + θδ)δ, where f′ represents the first derivative of f evaluated at an intermediate point between x and x + δ, and θ is a value between 0 and 1. In other words, the change in the function's value over the interval [x, x + δ] equals the derivative of the function, evaluated at some intermediate point of the interval, times the increment δ. In the context of this proof, we will use the finite-increments formula on the function f(x) = x^(a−1) to obtain (x + ε)^(a−1) − x^(a−1) = (a − 1)(x + θε)^(a−2) ε. By applying the formula to this function, we can derive relationships between the increments in the function's values and the derivative of the function evaluated at intermediate points. These relationships are then used to establish the inequality for the integral in the proof. In summary, the finite-increments formula allows us to approximate the change in a function over small increments and provides a useful tool for bounding the behaviour of functions. A quick numerical check of this identity is sketched below.
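The snippet below finds the intermediate value θ ∈ (0, 1) that makes the identity exact for f(x) = x^(a−1); the values a = 0.3, x = 1.0, and ε = 0.01 are arbitrary illustrative choices:

```python
# Numerical check of the finite-increments (mean value) formula for f(x) = x^(a-1):
# there exists theta in (0, 1) with
#   (x + eps)^(a-1) - x^(a-1) = (a-1) * (x + theta*eps)^(a-2) * eps.
# a, x, eps below are hypothetical illustrative values.
from scipy.optimize import brentq

a, x, eps = 0.3, 1.0, 0.01
lhs = (x + eps) ** (a - 1) - x ** (a - 1)
residual = lambda theta: (a - 1) * (x + theta * eps) ** (a - 2) * eps - lhs
theta = brentq(residual, 0.0, 1.0)   # root in (0, 1), guaranteed by the formula
print(f"theta = {theta:.6f}, residual = {residual(theta):.2e}")
```

Here are the steps of the proof. (i) Apply the finite-increments formula to obtain an upper bound for the absolute value |(x + ε)^(a−1) − x^(a−1)|.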
Using this inequality, we will split the integral over the time interval [s, t] into two parts, one from s to s + ε and the second from s + ε to t.
Notice that the integrand can be written in the following form:
Just to clarify, (r − s) ≤ ε comes from the fact that we are integrating over the interval [s, t] and splitting it into two parts, [s, s + ε] and [s + ε, t]. Since r is in the interval [s, s + ε], we have (r − s) ≤ ε; and since r is in the interval [s + ε, t], we have (r − s) ≥ ε. Since we are interested in bounding the integrand over the entire interval [s, t], we can take the maximum of the upper bounds for (r − s) over both parts, which gives the required bound. We can now use these bounds to rewrite the integrals:
We will now use this result to prove the convergence of the solution and show how to apply it for the purposes of this paper. We want to prove that X_ε(t) converges to X(t). We will utilise equation (12) and the Euler–Maruyama numerical method for that. As a reminder, equation (12) is:
Euler–Maruyama is a modification of the standard Euler method for solving ordinary differential equations (ODEs) to solve stochastic differential equations (SDEs). The solution of an SDE is itself a random process, and therefore cannot in general be computed exactly in closed form. Euler–Maruyama approximates a solution of an SDE as a piecewise linear function: at each step, the approximation is updated by taking a small step in time and adding a random perturbation. Specifically, for an equation of the form dX(t) = f(t, X(t))dt + g(t, X(t))dB_H(t), choose a time step size Δt and define the time points t_k = kΔt for k = 0, 1, 2, ..., n. Initialise the approximation by setting X_0 = X(t_0). For k = 1, 2, ..., n, compute the approximation at time t_k as
X(t_k) = X(t_{k−1}) + f(t_{k−1}, X(t_{k−1}))Δt + g(t_{k−1}, X(t_{k−1}))(B_H(t_k) − B_H(t_{k−1})),
where B_H(t_k) − B_H(t_{k−1}) is the increment of the fractional Brownian motion over the time interval [t_{k−1}, t_k]. A minimal implementation sketch is given below.
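The following is a minimal Euler–Maruyama sketch using standard Brownian increments; for the fBm-driven equation of this paper, the increments B_H(t_k) − B_H(t_{k−1}) would be substituted for dB. The coefficients f and g below are hypothetical placeholders, not the coefficients of the paper:

```python
# Minimal Euler-Maruyama sketch for dX = f(t, X) dt + g(t, X) dB(t).
# Standard Brownian increments are used here; fBm increments would replace
# dB for the equation studied in the paper. f, g, x0 are illustrative only.
import numpy as np

def euler_maruyama(f, g, x0, T, n, rng):
    dt = T / n
    x = np.empty(n + 1)
    x[0] = x0
    t = 0.0
    for k in range(n):
        dB = rng.normal(0.0, np.sqrt(dt))   # Brownian increment over [t, t + dt]
        x[k + 1] = x[k] + f(t, x[k]) * dt + g(t, x[k]) * dB
        t += dt
    return x

rng = np.random.default_rng(4)
path = euler_maruyama(f=lambda t, x: -x, g=lambda t, x: 0.5,
                      x0=1.0, T=1.0, n=1000, rng=rng)
print(path[-1])  # one sample of the approximate solution at time T
```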
Now, to obtain a bound on the distance between X(t) and X_ε(t), we use the triangle inequality and the fact that the distance between the two drift functions f(X) and f(X_ε) can be bounded by the integral of the distance between them over time. Specifically, we have:
To get this result, we applied the triangle inequality to the distance between X(u) and X_ε(u) to bound the first term.
Similarly, we can bound the second term; substituting these bounds, we obtain (25). Then we will apply the Burkholder–Davis–Gundy (BDG) inequality in our proof. BDG is a tool used to bound the moments of stochastic integrals involving martingales; it provides a way to control the growth of the moments of stochastic integrals. In general form, the inequality states that:
, where M_t is a continuous square-integrable martingale, p > 1 is a real positive number, and M*_t is the maximum of the absolute value of the martingale up to time t. In other words, the BDG inequality relates the expected supremum of the pth power of a continuous square-integrable martingale M_t to the pth moment of its maximum value M*_t. To apply BDG, we first need to express the stochastic integrals in terms of the increments of the Brownian motion (so that they are martingales). In particular, we use the equation (5) representation of the fractional Brownian motion:
where the kernel is the one appearing in (5) and B(t) is a standard Brownian motion.
Using this representation, we can express the stochastic integral involving the difference between X and X_ε as:
We can similarly express the other integrals involving X and X_ε in terms of the increments of the Brownian motion. Now that we have our representations, we can apply the BDG inequality to bound the supremum of the stochastic integrals involving Brownian motion increments.
Recall the special case of BDG that we used earlier:
We will now substitute (5), with the kernel and notation introduced above, into the expression and use the BDG inequality in our proof. We need to replace the fBm with standard Brownian motion. We will use the following properties of the fBm: the increments B_H(t) − B_H(s) of the fractional Brownian motion are stationary with mean zero and variance |t − s|^(2H); likewise, the increments B_H(t) − B_H(0) are stationary with mean zero and variance |t|^(2H). A numerical check of this variance property is sketched below.
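The sketch below simulates fBm exactly on a grid via a Cholesky factor of its covariance and checks the stated increment variance; the Hurst parameter H = 0.3 and the grid are arbitrary illustrative choices:

```python
# Exact (Cholesky-based) simulation of fBm on a grid, used to check the
# property Var(B_H(t) - B_H(s)) = |t - s|^(2H). H and the grid are hypothetical.
import numpy as np

H, n, n_paths = 0.3, 50, 20_000
t = np.linspace(0.02, 1.0, n)
# fBm covariance: Cov(B_H(t), B_H(s)) = 0.5 * (t^2H + s^2H - |t - s|^2H)
cov = 0.5 * (t[:, None] ** (2 * H) + t[None, :] ** (2 * H)
             - np.abs(t[:, None] - t[None, :]) ** (2 * H))
L = np.linalg.cholesky(cov + 1e-12 * np.eye(n))   # small jitter for stability
rng = np.random.default_rng(5)
paths = rng.normal(size=(n_paths, n)) @ L.T       # rows are fBm sample paths

i, j = 10, 40
emp = np.var(paths[:, j] - paths[:, i])
print(f"empirical {emp:.4f} vs |t - s|^(2H) = {abs(t[j] - t[i]) ** (2 * H):.4f}")
```

Using these properties, we can write: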
We will now apply Doob's inequality for Brownian motion to the 2nd, 3rd, 4th, and 5th terms, which states that E[sup_{s≤t} |B(s)|²] ≤ 4E[|B(t)|²]; Hölder's inequality to the 3rd and 4th terms, for f ∈ L^P and q ∈ L^Q with 1/P + 1/Q = 1; and the Itô isometry, E[(∫ f dB)²] = E[∫ f² ds], to the 2nd, 3rd, 4th, and last terms. A simulation of the Doob bound used here is sketched below.
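Purely as an illustration, the following Monte Carlo sketch checks the L² form of Doob's maximal inequality for Brownian motion on [0, 1]; the horizon and discretisation are arbitrary:

```python
# Monte Carlo illustration of Doob's L^2 maximal inequality for Brownian
# motion: E[ sup_{s<=t} B(s)^2 ] <= 4 * E[ B(t)^2 ], the form used above.
import numpy as np

rng = np.random.default_rng(6)
T, n_steps, n_paths = 1.0, 1000, 20_000
dt = T / n_steps
B = np.cumsum(rng.normal(0.0, np.sqrt(dt), size=(n_paths, n_steps)), axis=1)
lhs = (np.abs(B).max(axis=1) ** 2).mean()   # E[ sup_s B(s)^2 ]
rhs = 4 * (B[:, -1] ** 2).mean()            # 4 * E[ B(T)^2 ] ~ 4T
print(f"E[sup B^2] ~ {lhs:.4f} <= 4 E[B(T)^2] ~ {rhs:.4f}")
```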
Using (17), (18), (23) and assumptions A1-A3:
We will now show the convergence of the solution of a fuzzy stochastic differential equation driven by fBm. The first part of (26) provides a bound on the supremum of the distance between the solution X(u) and a perturbed version of the solution X_ε(u) over the interval [0, t]. Here the left-hand side denotes the supremum distance between the two processes, L is a Lipschitz constant, T is the final time, ε is a small perturbation parameter, and a is the Hurst parameter of the fBm. The second part of (26) provides a bound on the second moment of the perturbed solution X_ε(s); here C is a constant and the remaining term is the initial value of the perturbed process. We will apply Gronwall's lemma to the inequality to show that the supremum of the distance between the two processes converges to zero as ε approaches zero, which implies the convergence of the solution X(u) to a limit process as the perturbation parameter ε goes to zero. The equation provides an estimate for the convergence rate of the solution of the fuzzy stochastic differential equation driven by fBm. To apply Gronwall's lemma to the inequality, we first define the relevant quantities and rewrite the inequality as:
We can now apply Gronwall's lemma, which states that if f(t) and g(t) are non-negative continuous functions on [0, T] and there exists a non-negative constant K such that f(t) ≤ K + ∫_0^t g(s) f(s) ds, then f(t) ≤ K·exp(∫_0^t g(s) ds). Applying this lemma, we have:
Then, by the lemma, we obtain:
We can use the fact that the space of continuous functions equipped with the supremum norm is complete, i.e., all Cauchy sequences converge, to conclude that the solution X(u) converges to a limit process as the perturbation parameter ε goes to zero. As ε approaches zero, the second and third terms in the exponent tend to zero faster than the first term, which dominates for large t. Therefore, the exponent approaches zero as ε approaches zero, which implies that the supremum of the distance between X(u) and X_ε(u) also approaches zero. This completes the proof.