To apply methods of non-asymptotic expectation and optimal allocation to stable Pareto distributed outcomes, we must first examine the history of three facets: fat-tailed distributions, mathematical expectation, and optimal allocation. We begin by reviewing the history of fat-tailed distributions in economic time series, leading to the proposal of the Generalized Hyperbolic Distribution. We then trace the history of mathematical expectation and of the optimal allocation methods that we will apply to stable Pareto distributed outcomes.
1.1. Fat-Tailed Distributions in Economic Time Series – History
While explicit mentions of fat-tailed distributions in the context of financial prices were scarce prior to the references cited below, observations and discussions of irregular market fluctuations and the occurrence of extreme events appear throughout financial history.
Financial crises and market panics throughout history provide insight into market dynamics and the presence of extreme events. Documented episodes such as the 17th-century Tulip Mania and the 18th-century South Sea Bubble offer glimpses of the irregular, non-normal behavior exhibited by financial prices.
The field of speculative trading and risk management, with roots in ancient civilizations, involved assessing trading risks and managing uncertainty in commodity prices and exchange rates. While the concept of fat-tailed distributions was not formally articulated in these early periods, experience and observation of market irregularities and extreme price movements contributed to a gradual understanding of the complexities and non-standard behaviors present in financial markets and asset prices.
Exploring historical accounts of financial crises and the evolution of trading practices in ancient and medieval civilizations provides valuable insight into the early recognition of irregular market behaviors and the consideration of extreme events in financial transactions and risk management.
An early allusion to heavy-tailed variability in an economic context is found in the work of economist Alfred Marshall. His contributions to the theory of supply and demand and the pricing of goods included discussions of variability in market prices and the occurrence of economic outliers. While Marshall's work focused on microeconomic principles, his insights into the nature of market fluctuations provided a preliminary sense that heavy-tailed distributions might be present in economic phenomena.
The field of actuarial science and insurance, predating modern finance theory, involved the analysis of risk and of extreme events, laying foundations for understanding tail risk and for accounting for rare but significant events in risk management.
While these earlier references did not explicitly focus on fat-tailed distributions in finance, they contributed to the understanding of variability and risk in economic and financial systems and provided insights that led to modern theories of heavy-tailed distributions and their applications in finance.
Marshall's 1890 “Principles of Economics” [1] laid foundations for modern microeconomic theory and analysis. While focused on general economic principles, it included discussions of market dynamics, supply and demand, and the factors behind price fluctuations.
Later, in his 1919 work “Industry and Trade” [2], Marshall examined the dynamics of industrial organization and the functioning of markets, providing insights into price variability and the factors influencing economic fluctuations.
Marshall's works provided insights into economic principles and market dynamics, laying the groundwork for the development of modern economic theory. While not explicitly discussing fat-tailed distributions in finance, they contributed to an understanding of market variability and economic analysis, providing context for ideas of risk and uncertainty in economic systems.
Louis Bachelier’s 1900 thesis “Théorie de la spéculation” [3] delved into the mathematical modeling of stock prices and introduced the random walk, playing a significant role in the development of the efficient market hypothesis and in the understanding of financial price movements. While not focused on fat-tailed distributions, Bachelier's work is considered an early reference that laid the groundwork for understanding stochastic processes in financial markets.
The French mathematician Paul Lévy [4,5] made significant contributions to probability theory and stochastic processes, including those related to finance. In particular, Lévy's research on the addition of random variables and on general stochastic processes greatly influenced the development of modern probability theory and the understanding of stochastic dynamics across fields. His contributions to the theory of Brownian motion and his introduction of Lévy processes provided groundwork for understanding asset price dynamics and modeling market fluctuations. Through foundational work on Brownian motion, random processes, and stable distributions, Lévy helped establish the mathematical tools and probabilistic frameworks later applied to quantitative finance and modeling.
While explicit mentions of fat-tailed distributions in finance were still rare at this time, the contributions of Bachelier and Lévy provided important insights into the probabilistic nature of finance and the processes governing asset price movements.
In Feller [6], we begin to find discussion of heavy-tailed distributions and their applications across fields. Although it does not explicitly mention stable Pareto distributions in a financial context, Feller's work contributed to the understanding of heavy-tailed distributions.
Additionally, the notion of fat-tailed distributions in economics and finance was discussed by prominent statisticians and economists. Notably, Vilfredo Pareto [7] introduced the Pareto distribution around the turn of the twentieth century; it is closely related to the stable Pareto distribution, though it was not directly applied to price data at the time.
The concept of stable Pareto distributions in a financial context was notably introduced by the mathematician Benoit Mandelbrot in his influential work on fractals and financial modeling, in which he extensively discussed the presence of heavy-tailed distributions and their implications for financial modeling.
In particular, Mandelbrot [8] discussed stable Pareto distributions as a potential model for certain financial data. This seminal paper laid the foundation for his research on the fractal nature of financial markets and the application of non-Gaussian models to describe market dynamics.
Additionally, Mandelbrot [9] expanded on these ideas, delving into the concept of stable distributions and their significance for understanding market fluctuations and risk dynamics.
The differences between the Pareto distribution, originally described by Vilfredo Pareto, and the stable Pareto distribution referred to by Benoit Mandelbrot in his 1963 work lie in their specific characteristics and properties.
The Pareto distribution, introduced by Vilfredo Pareto around the turn of the twentieth century, is a power-law probability distribution originally used to describe the distribution of wealth and income in society. It exhibits a heavy tail, indicating that a small number of instances take values far larger than the majority. It follows a specific mathematical form reflecting the “Pareto principle” or “80-20 rule,” which suggests that roughly 80% of effects come from 20% of causes.
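In its conventional form (standard notation, not drawn from the source), the Pareto law can be written as

\[
\Pr(X > x) = \left(\frac{x_{\min}}{x}\right)^{\alpha}, \qquad x \ge x_{\min} > 0,\ \alpha > 0,
\]

where \(x_{\min}\) is the minimum (scale) value and \(\alpha\) the tail exponent; \(\alpha = \log 5 / \log 4 \approx 1.16\) corresponds to the familiar 80-20 division.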
The stable Pareto distribution referred to by Benoit Mandelbrot represents an extension of the traditional Pareto distribution. Mandelbrot's 1963 work marks a pivotal point in the application of stable distributions, including the stable Pareto, to financial data and the modeling of market behavior. The stable Pareto incorporates additional parameters that allow greater flexibility in modeling extreme events and heavy-tailed phenomena in fields such as finance and economics, and it retains its shape and scale properties across scales, making it suitable for analyzing complex systems with long-term dependencies and extreme fluctuations.
While both distributions share heavy-tailed characteristics and implications for analyzing extreme and rare events, the stable Pareto distribution, as conceptualized by Mandelbrot, provides a more nuanced and adaptable framework for modeling complex phenomena, especially in finance and other dynamic systems where traditional distributions typically fall short in capturing the intricacies of real-world data.
1.1.1. Symmetry in the stable Pareto
Regarding skewness, the stable Pareto distribution is generally considered symmetric, meaning it does not exhibit skewness. However, modifications or specific applications of the stable Pareto distribution may allow for asymmetry, or variations in skewness, to better accommodate the properties of the data being modeled. While the traditional stable Pareto distribution is symmetric, its extensions or adaptations can incorporate asymmetry where required to capture the underlying characteristics of the data more accurately. Often, in modeling financial price data or trading results, we are dealing with data that is approximately lognormal, and hence skewed.
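As a minimal illustration of this point (our own sketch, assuming SciPy's levy_stable implementation of the α-stable family; the parameter values are arbitrary), the skewness parameter β separates the symmetric case from the skewed variants:

```python
# Minimal sketch: the alpha-stable family in SciPy exposes a skewness
# parameter beta; beta = 0 gives the symmetric case discussed above, while
# beta != 0 gives the skewed extensions. Parameter values are illustrative.
import numpy as np
from scipy.stats import levy_stable

alpha = 1.7  # tail index, 0 < alpha <= 2 (alpha = 2 recovers the normal)
cases = {"symmetric (beta = 0)": 0.0, "right-skewed (beta = 0.9)": 0.9}

for label, beta in cases.items():
    sample = levy_stable.rvs(alpha, beta, size=100_000, random_state=0)
    q01, q50, q99 = np.quantile(sample, [0.01, 0.50, 0.99])  # tails vs. center
    print(f"{label}: 1% {q01:.2f}, median {q50:.2f}, 99% {q99:.2f}")
```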
1.1.2. The Stable Pareto Distribution and the Meaning of "Stable"
The stable Pareto distribution derives its name in part from the property of stability. In the context of this distribution, the term "stable" refers to the stability of the distribution's shape and scale parameters across different scales or time periods. More specifically, it implies that the statistical properties and characteristics of the distribution remain consistent and do not significantly change over time or as the scale of the data varies.
This stability concept signifies that the stable Pareto distribution retains its fundamental shape and scaling properties, even when subjected to data transformations, aggregation, or changes in data range. This feature is essential for capturing the heavy-tailed behavior and extreme value characteristics often observed in complex systems like financial markets. Here, the distribution parameters need to remain relatively constant to accurately represent the underlying data dynamics.
While "stable" does not mean the parameters are entirely fixed or immune to any changes, it does suggest the distribution exhibits a degree of robustness and resilience to variations in data scaling or time. This enables the stable Pareto distribution to reliably model rare events, extreme fluctuations, and long-term dependencies. It provides insights into tail event behaviors and their impacts on overall system dynamics.
1.1.3. Relationship to Stationarity
The "stable" terminology does not directly imply the stable Pareto distribution is stationary in the statistical sense. Rather, it primarily refers to the stability of the distribution's shape and scale parameters. This stability property relates to the distribution's resilience to transformations and scaling, allowing it to retain its fundamental shape and scaling properties under various conditions.
While not inherently associated with stationarity in the way some time series models are, the stable Pareto does exhibit stable shape and scaling characteristics. This provides a reliable framework for modeling heavy-tailed data and extremes in domains including finance, risk management, and statistical analysis. However, for time-varying data or financial time series, additional assessments may be required to evaluate stationarity and to determine appropriate modeling approaches for capturing data dynamics over time.
1.1.4. Accounting for Potential Non-Stationarity
The stable Pareto distribution itself is often considered stationary. However, in applications like financial markets or economic data, the data generating process may deviate from strict stationarity assumptions. Factors such as market shifts or underlying process changes could lead to variations in the stable Pareto parameters over time, making the distribution appear non-stationary in certain contexts.
In such cases, appropriate statistical tools and time series analysis techniques are essential to accurately assess data stationarity or non-stationarity. These tools can identify any time-dependent patterns, trends, or structural breaks that may affect the stability of the distribution. They enable informed decisions regarding suitable modeling and analysis approaches for the data characteristics and context. While generally associated with stationarity, the specific data properties and analysis objectives should determine the appropriate stable Pareto modeling framework.
Testing for non-stationarity in probability distributions, including the stable Pareto distribution, typically involves analyzing the statistical properties of the data to identify any time-dependent patterns, trends, or structural breaks that indicate a departure from stationarity. Several statistical tests and methods can be employed to assess non-stationarity in probability distributions depending on the nature of the data and the specific characteristics of interest. Some common approaches include unit root tests (including the ADF test; for an exposition of results see [33]), structural break tests, time series decomposition, and cointegration analysis. (See Appendix A for a description of these.)
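As a minimal sketch of the first of these approaches (our own illustration, assuming the statsmodels library; the synthetic series below is not data from this study):

```python
# Minimal sketch of an Augmented Dickey-Fuller (ADF) unit-root test, one of the
# checks listed above. The synthetic fat-tailed series is illustrative only.
import numpy as np
from statsmodels.tsa.stattools import adfuller

rng = np.random.default_rng(0)
returns = rng.standard_t(df=3, size=1_000)  # fat-tailed synthetic "returns"

stat, p_value, *_rest = adfuller(returns, autolag="AIC")
print(f"ADF statistic = {stat:.3f}, p-value = {p_value:.4f}")
# A small p-value rejects the unit-root null, i.e., no evidence of that
# particular form of non-stationarity; structural breaks and the other issues
# above require the separate tests named in the text.
```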
The jury is still out as to whether economic data are stationary. Numerous papers have been written on the matter, and there are presently conflicting claims regarding stationarity in economic time series. For our purposes, we shall assume stationarity, and we note that in its absence the same techniques may be employed after adjusting for non-stationarity.
1.1.5. Emergence of other fat-tailed distribution modeling techniques and the Generalized Hyperbolic Distribution (GHD)
With time, many other fat-tailed modeling techniques found their way into quantitative departments. The generalized hyperbolic distribution (GHD) was introduced by [43] in examining aeolian processes.
The GHD is a flexible continuous probability law defined as a normal variance-mean mixture in which the mixing distribution takes the generalized inverse Gaussian form. Its probability density function can be expressed in terms of modified Bessel functions of the second kind, commonly denoted BesselK functions in the literature. The specific functional form of the GHD density involves these BesselK functions along with model parameters governing tail behavior, skewness, and kurtosis. While the distribution function lacks a simple closed-form expression, the density can be evaluated reliably through direct numerical computation of the BesselK components for given parameter values. The mathematical tractability afforded by the ability to compute the GHD density and distribution function underpins its widespread use in applications spanning economics, finance, and the natural sciences.
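For reference, in one common parameterization (conventional notation rather than that of any cited source), the GHD density is

\[
f(x;\lambda,\alpha,\beta,\delta,\mu)
= \frac{(\gamma/\delta)^{\lambda}}{\sqrt{2\pi}\,K_{\lambda}(\delta\gamma)}
\; e^{\beta(x-\mu)}\,
\frac{K_{\lambda-\frac{1}{2}}\!\left(\alpha\sqrt{\delta^{2}+(x-\mu)^{2}}\right)}
{\left(\sqrt{\delta^{2}+(x-\mu)^{2}}/\alpha\right)^{\frac{1}{2}-\lambda}},
\qquad \gamma=\sqrt{\alpha^{2}-\beta^{2}},
\]

where \(K_{\nu}\) is the modified Bessel function of the second kind (BesselK), \(\alpha\) governs tail heaviness, \(\beta\) skewness, \(\delta\) scale, \(\mu\) location, and \(\lambda\) indexes the subclasses discussed below.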
Salient features of the GHD include closure under affine transformations and infinite divisibility. The latter property follows from its construction as a normal variance-mean mixture based on the generalized inverse Gaussian law. As elucidated in [44], infinite divisibility bears important connections to Lévy processes, whereby the distribution of a Lévy process at any point in time is infinitely divisible. While many canonical infinitely divisible families (e.g., Poisson, Brownian motion) exhibit convolution closure, whereby the process distribution remains within the same family at all times, Podgórski et al. showed that the GHD does not universally possess this convolution-closure attribute.
Owing to its considerable generality, the GHD is the overarching class for various pivotal distributions, including the Student's t, Laplace, hyperbolic, normal-inverse Gaussian, and variance-gamma distributions. Its semi-heavy tails, unlike the lighter tails of the normal distribution, enable modeling of far-field behavior. Consequently, the GHD finds widespread application in economics, particularly in financial market modeling and risk management, where its tail properties suit the modeling of asset returns and risk metrics. By 1995, the GHD appeared in a financial-markets application in [34], where the hyperbolic subclass of the GHD was applied to fit German financial data.
This work was later extended by Prause in 1999 [35], who applied GHDs to model financial data on German stocks and American indices. Since then, the GHD has been widely used in finance and risk management to model a wide range of financial and economic data, including stock prices, exchange rates, and commodity prices. See [36], [37], [38], [39], [40], [41], [46], [42] for applications of the GHD to economic and share price data.
The GHD has been shown to provide a more realistic description of asset returns than other classical distributional models, and it has been used to estimate the risk of financial assets and to construct efficient portfolios in energy and stock markets.
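As a rough illustration of such usage (our own sketch, not the methodology of any cited study; SciPy's genhyperbolic and a synthetic return series are assumed), a maximum-likelihood fit and a simple tail-risk readout might look like the following:

```python
# Rough illustration: fit SciPy's generalized hyperbolic distribution to a
# synthetic return series by maximum likelihood, then read a tail-risk measure
# from the fitted law. Not the procedure of any study cited in the text.
import numpy as np
from scipy.stats import genhyperbolic

rng = np.random.default_rng(1)
returns = 0.01 * rng.standard_t(df=4, size=2_000)  # synthetic daily "returns"

# scipy.stats.genhyperbolic uses shape parameters p (lambda), a, b with |b| < a.
p, a, b, loc, scale = genhyperbolic.fit(returns)
fitted = genhyperbolic(p, a, b, loc=loc, scale=scale)

var_99 = -fitted.ppf(0.01)  # 99% value-at-risk implied by the fitted GHD
print(f"lambda = {p:.3f}, a = {a:.3f}, b = {b:.3f}, 99% VaR ~ {var_99:.4f}")
```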
The generalized hyperbolic distribution (GHD) has emerged as an indispensable modeling tool in modern econometrics and mathematical finance due to its advantageous mathematical properties and empirical performance. Salient features underscoring its suitability for analyzing economic and asset price data include:
Substantial flexibility afforded by multiple shape parameters, enabling the GHD to accurately fit myriad empirically observed non-normal behaviors in financial and economic data sets. Both leptokurtic and platykurtic distributions can be readily captured.
Mathematical tractability, with the probability density function and characteristic function expressible in closed analytical form in terms of Bessel functions. This facilitates rigorous mathematical analysis and inference.
Theoretical connections to fundamental economic concepts such as utility maximization. Various special cases also share close relationships with other pivotal distributions like the variance-gamma distribution.
Empirical studies across disparate samples and time horizons consistently demonstrate superior goodness-of-fit compared to normal and stable models when applied to asset returns, market indices, and other economic variables.
Ability to more accurately model tail risks and extreme events compared to normal models, enabling robust quantification of value-at-risk, expected shortfall, and other vital risk metrics.
Despite the lack of a simple analytical formula, the distribution function can be reliably evaluated through straightforward numerical methods, aiding practical implementation.
In summary, the mathematical tractability, theoretical foundations, empirical performance, and remarkable flexibility of the GHD render it exceptionally well-suited to modeling the non-normal behaviors ubiquitous in economic and financial data sets. Its rigor and applicability have made it a standard apparatus in modern econometrics.
1.2. Mathematical Expectation – History
The genesis of mathematical expectation lies in the mid-17th century study of the "problem of points," which concerns the equitable division of stakes between players forced to conclude a game prematurely. This longstanding conundrum was posed to eminent French mathematician Blaise Pascal in 1654 by compatriot Chevalier de Méré, an amateur mathematician. Galvanized by the challenge, Pascal commenced a correspondence with eminent jurist Pierre de Fermat, and the two deduced identical solutions rooted in the fundamental tenet that a future gain's value must be proportional to its probability.
Pascal's 1654 work [10], compiled amidst his collaboration with Fermat, explores sundry mathematical concepts, including foundational probability theory and the calculation of expectation for games of chance. Through rigorous analysis of gambling scenarios, Pascal delved into the mathematical quantification of expected values, establishing core principles that seeded probability theory's growth and diverse applications. The "Pascal-Fermat correspondence" offers insights into the germination of mathematical expectation and its integral role in comprehending the outcomes of stochastic phenomena.
Contemporaneously, in the mid-17th century, the Dutch mathematician, astronomer, and physicist Christiaan Huygens significantly advanced the field of probability and the study of mathematical expectation. Building upon the contributions of Pascal and Fermat, Huygens' 1657 treatise [11] incisively examined numerous facets of probability theory, with particular emphasis on games of chance and the calculation of fair expectations in gambling contexts. He conceived an innovative methodology for determining expected value, underscoring the importance of elucidating average outcomes and the long-term behavior of random processes.
Huygens' mathematical expectation framework facilitated subsequent scholars in expanding on his ideas to construct a comprehensive system for disentangling stochastic phenomena, quantifying uncertainty, and illuminating the nature of randomness across diverse scientific realms.
Essentially, the calculation of mathematical expectation, or "expectation," remained unchanged for centuries until [12]. It was Vince who decided to look at a different means of calculating it, one that reflects an individual or group of individuals in an "existential" contest of finite length. Consider a young trader placed on an institution's trading desk and given a relatively short time span to "prove himself" or be let go, the resources used by a sports team in an "elimination-style" playoff contest, or the resources of a nation-state involved in an existential war.
Such endeavors require a redefinition of what "expectation" means, i.e., what the party "expects" to have happen. It can readily be defined as that outcome where half the outcomes are better or the same and half are worse or the same at the end of the specified time period or number of trials.
Thus, the answer under this definition of "expectation" is no longer the classic, centuries-old probability-weighted mean outcome, but instead the median of the sorted cumulative outcomes over the specified number of trials.
For example, consider the prospect of a game where one might win 1 unit with probability .9 and lose 10 units with probability .1. The classical expectation is a loss of 0.1 units per trial.
Contrast this with an existential contest that is only 1 trial long. Here, what one "expects" to happen is to win 1 unit. In fact, it can be shown that if one were to play and quit this game after 6 trials, one would "expect" to make 6 units (that is, half the outcomes of data so distributed would show better or equal results, and half would show worse or equal results). After 7 trials with these parameters, however, the situation turns against the bettor.
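A minimal sketch of this calculation (our own enumeration, offered for illustration; it is not the algorithm of [12]) reproduces the figures quoted above:

```python
# Minimal sketch (our own enumeration, not the algorithm of [12]): the
# non-asymptotic "expectation" described above is the outcome at the middle of
# the probability-sorted cumulative results after n trials of the game paying
# +1 with probability 0.9 and -10 with probability 0.1.
from math import comb

WIN, LOSS, P_WIN = 1.0, -10.0, 0.9

def median_cumulative_outcome(n_trials: int) -> float:
    """Outcome with half the probability mass at or below and half at or above."""
    outcomes = []
    for k in range(n_trials + 1):  # k = number of losing trials
        prob = comb(n_trials, k) * (1 - P_WIN) ** k * P_WIN ** (n_trials - k)
        total = (n_trials - k) * WIN + k * LOSS
        outcomes.append((total, prob))
    outcomes.sort()                # sort cumulative outcomes from worst to best
    cum = 0.0
    for total, prob in outcomes:
        cum += prob
        if cum >= 0.5:
            return total
    return outcomes[-1][0]

for n in (1, 6, 7):
    print(n, median_cumulative_outcome(n))
# Prints 1 -> 1.0, 6 -> 6.0, 7 -> -4.0: the "expectation" turns against the
# bettor at 7 trials, as stated in the text.
```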
Importantly, as demonstrated in [12], this non-asymptotic expectation converges to the asymptotic, centuries-old "classic" expectation in the limit as the number of trials approaches infinity. Thus, the "classic" expectation can be viewed as the asymptotic manifestation of Vince's more general non-asymptotic expression.
This redefined "expectation" then links to optimal resource allocation in existential contests. The non-asymptotic expectation provides a modern conceptualization of "expectation" tailored to modeling human behavior in high-stakes, finite-time scenarios and, in the limit as the number of trials grows ever larger, converges on the classical definition of mathematical expectation.
1.3. Optimal Allocations – History
The concept of geometric mean maximization originates with Daniel Bernoulli, who made the first known reference to it in 1738. Prior to that time, there is no known mention in any language of even generalized optimal reinvestment strategies by merchants or traders in any part of the world. Evidently, no one formally codified the concept; if anyone contemplated it, they did not record it.
Bernoulli's 1738 paper [13] was originally published in Latin. A German translation appeared in 1896, and the work was referenced by John Maynard Keynes [14]. In 1936, John Burr Williams [15], in a paper pertaining to trading in cotton, posited that one should bet on a representative price: if profits and losses are reinvested, the method of calculating this price is to select the geometric mean of all possible prices.
Bernoulli's 1738 paper was finally translated into English and published in Econometrica in 1954. When game theory emerged in the 1950s, these concepts were being widely examined by economists, mathematicians, and other academics. Against this fertile backdrop, John L. Kelly Jr. [16] demonstrated that to achieve maximum growth of wealth, a gambler should maximize the expected value of the logarithm of his capital. This is optimal because the logarithm is additive in repeated bets and satisfies the law of large numbers. In his paper, Kelly showed how Claude Shannon’s information theory [17] could be used to determine growth-optimal bet sizing for an informed gambler.
Maximizing the expected value of the logarithm of wealth in this way is known as the Kelly criterion. Whether Kelly knew it or not, the intellectual antecedents of his paper trace to Daniel Bernoulli; in all fairness, Bernoulli was likely not the originator either. Kelly's paper presented the idea as the solution to a technological problem that did not exist in Bernoulli's day.
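As a brief worked illustration (a textbook special case, not Kelly's general channel formulation): for a bet that returns \(b\) per unit wagered with probability \(p\) and loses the stake with probability \(q = 1 - p\), the expected log-growth from staking a fraction \(f\) of capital is

\[
g(f) = p\,\ln(1 + bf) + q\,\ln(1 - f),
\]

which is maximized at the Kelly fraction

\[
f^{*} = \frac{bp - q}{b} = p - \frac{q}{b}.
\]

With even odds (\(b = 1\)) and \(p = 0.6\), for instance, \(f^{*} = 0.2\): risk 20% of capital per bet.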
Kelly's paper makes no reference to applying the method to financial markets. The gambling community embraced the concepts, but applying them to various capital-markets applications necessitated formulaic alterations, specifically scaling for the absolute value of the worst-case potential outcome, which becomes particularly important given fat-tailed probabilities. In fairness, neither Kelly nor Shannon were professional market traders, and the work presented did not claim applicability to finance. In actual practice, however, scaling to worst-case outcomes is anything but trivial. The necessary scaling was provided in [18].
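One common formulation of that scaling can be sketched as follows (a hedged illustration in the spirit of that approach; the brute-force search and the names used are ours, not taken from [18]):

```python
# Hedged sketch of growth-optimal sizing scaled by the absolute value of the
# worst-case outcome, in the spirit of the scaling discussed in the text.
# The brute-force search and names below are illustrative, not from the source.
import numpy as np

def geometric_mean_hpr(f: float, outcomes: np.ndarray) -> float:
    """Geometric mean holding-period return when a fraction f of capital is
    risked, each outcome being scaled by the absolute worst loss in the sample."""
    worst = abs(outcomes.min())
    hprs = 1.0 + f * (outcomes / worst)
    if np.any(hprs <= 0.0):
        return 0.0  # a fraction this large risks ruin
    return float(np.exp(np.mean(np.log(hprs))))

def optimal_fraction(outcomes: np.ndarray, grid: int = 10_000) -> float:
    """Brute-force search over f in [0, 1) for the maximum geometric growth."""
    fs = np.linspace(0.0, 1.0, grid, endpoint=False)
    return float(fs[int(np.argmax([geometric_mean_hpr(f, outcomes) for f in fs]))])

# Example: win +2 or lose -1 with equal probability; the growth-optimal
# fraction is 0.25, matching the binary Kelly formula f* = (bp - q)/b above.
sample = np.array([2.0, -1.0])
print(round(optimal_fraction(sample), 3))
```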
In the decades following [16], many researchers expanded on geometric growth optimization strategies in capital-markets contexts, notably Bellman and Kalaba [19], Breiman [20], Latane [21,22], Tuttle [22], Thorp [23,24], and others. Thorp, a colleague of Claude Shannon, developed a winning strategy for blackjack using the Kelly criterion and presented closed-form formulas for determining the Kelly fraction [23].
The idea of geometric mean maximization was also widely critiqued. Samuelson [26,27], Goldman [28], Merton [29], and others argued against universally accepting it as the investor's criterion. Samuelson [26] highlighted that the Kelly fraction is optimal only asymptotically, as the number of trials approaches infinity; for any finite number of trials, he argued, it would be sub-optimal.
The formulation that yields the growth-optimal allocation to each component in a portfolio of multiple components for a finite number of trials is provided in [12]. This approach incorporates the net present value of amounts wagered and any cash flows over the finite timespan, acknowledging that real-world outcomes typically do not manifest instantly.
Growth-optimality, however, is not always the desired criterion. The same formulaic framework can be used to discern other "optimal" allocations. In Vince & Zhu [30], we find two fractional allocations, each less than the growth-optimal allocation, that maximize various catalogued return-to-risk ratios. This is further expanded upon in de Prado, Vince, and Zhu [31].
Finally, [32] presents the case for using the formula for geometric growth maximization as a "devil's advocate" in contexts where the outcome in one period is a function of the resources available from previous periods and one wishes to diminish geometric growth. Examples include certain biological and epidemiological applications and "runaway" cost functions such as national debts.