Heavy-Tailed Probability Distributions: Some Examples of Their Appearance

Lev B. Klebanov; Yulia V. Kuvaeva; Svetlozar T. Rachev

doi:10.20944/preprints202305.1198.v1

Submitted: 16 May 2023

Posted: 17 May 2023


Abstract

We give two examples of the appearance of heavy-tailed distributions in applications to the social sciences. Among these distributions are the laws of Pareto and Lotka, as well as some new ones. The examples are illustrated by suitable toy models.

Keywords: heavy-tailed distributions; Pareto law; Lotka law; Zipf law; probability generating function.

Subject: Computer Science and Mathematics - Probability and Statistics

1. History of the problems

Distributions with heavy (power-like) tails have been used in the social sciences for more than one hundred years.

Relevant studies include the following:

1. Distribution of large fortunes (Pareto, 1896; see [7]). The density is $p(x) = \frac{\alpha}{x_0}\left(\frac{x_0}{x}\right)^{\alpha+1}$ for $x \ge x_0$, $\alpha > 0$.
2. Scientific productivity. Let $n(x)$ be the number of scientists who have published exactly $x$ papers, $x = 1, 2, \dots$. Lotka (1926; see [5]) showed that $n(x) = n_1/x^a$, where $n_1 > 0$ and $a \le 2$ (in many cases $a$ is close to 2).
3. Lotka's law also holds approximately for the number of citations of a scientist's papers.
4. For a given literary text, arrange all its words in descending order of their frequency of occurrence. Comparing the frequency $x$ of a word with its position $r$ in this sequence (its rank) gives $x = B/r$, where $B$ is a constant (see [8]).
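The formulas in items 1 and 2 are easy to evaluate. The short Python sketch below is only an illustration (all function names are ours). It computes the Pareto tail $\mathbb{P}\{X > x\} = (x_0/x)^{\alpha}$, obtained by integrating the density in item 1, and the Lotka counts $n(x) = n_1/x^a$ of item 2. For $a = 2$, the share of authors with a single paper approaches $1/\zeta(2) = 6/\pi^2 \approx 0.61$ as the largest admissible number of papers grows.

```python
import math


def pareto_density(x: float, alpha: float, x0: float) -> float:
    """Law 1: p(x) = (alpha / x0) * (x0 / x) ** (alpha + 1) for x >= x0, else 0."""
    return alpha / x0 * (x0 / x) ** (alpha + 1) if x >= x0 else 0.0


def pareto_tail(x: float, alpha: float, x0: float) -> float:
    """P{X > x} = (x0 / x) ** alpha, i.e. the integral of pareto_density over (x, inf)."""
    return (x0 / x) ** alpha if x >= x0 else 1.0


def lotka_counts(n1: float, a: float, x_max: int) -> list[float]:
    """Law 2: n(x) = n1 / x ** a for x = 1, ..., x_max."""
    return [n1 / x ** a for x in range(1, x_max + 1)]


def single_paper_share(a: float, x_max: int) -> float:
    """Share of authors with exactly one paper among authors with 1..x_max papers."""
    counts = lotka_counts(1.0, a, x_max)
    return counts[0] / sum(counts)


if __name__ == "__main__":
    print(pareto_tail(10.0, alpha=1.5, x0=1.0))  # tail probability at 10 * x0
    print(single_paper_share(2.0, 10_000), 6 / math.pi ** 2)
```

The value of $n_1$ cancels in the share, so only the exponent $a$ matters there.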

Why do these patterns emerge? Presumably, Laws 1–3 reflect certain individual human abilities, while Law 4 reflects memory or other functions of the human brain.

We do not consider the fourth law in this paper, because Zipf explained it on the basis of the principle of least effort. Although there are no rigorous results establishing a mechanism related to this principle in the human brain, economizing on memory seems natural. For Laws 1–3, however, the principle of least effort does not appear to be related to the essence of the issues under consideration. We therefore focus on Laws 1 and 3, more precisely on their qualitative explanation.

At first glance, everything looks quite simple. The population of a country is heterogeneous: some people are more capable in business (Law 1) or in scientific work (Laws 2 and 3), and others are less capable of, or incapable of, such activities.

But how large are the differences in ability, and are all differences in 'success' determined by ability?

Is there an effect of chance? Let us first focus on the first law and try to build a model that explains why it occurs.

However, the distributions of income and capital depend on many factors that cannot all be accounted for. We are interested not in the whole mechanism of accumulation and distribution of capital, but only in the roles of human talent and chance in this process, and in how essential these roles are. We therefore use a toy model in which all people have identical abilities. If the role of chance is small, the model will show little variation between investors; if, on the other hand, we see large differences between investors, this indicates a significant role of chance.

As noted, we want to give examples of the possible occurrence of distributions with power tails in connection with classical empirical facts. A presentation of modern results on the use of such distributions is beyond the scope of this paper. The reader interested in modern applications of heavy-tailed distributions to financial problems is referred to [4] and the literature cited there.

2. A toy model for the distribution of capital

Let us consider the first toy model of the distribution of capital leading to the Pareto law.

Suppose for simplicity that there is only one business. All possible investors are equal in their talents and initial capital. Consider the case in which each investor invests one unit of capital in the business. After one unit of time, the business outcome is $X_1$, where $X_1$ is a random variable. Suppose the investor leaves this whole sum in the business and the market conditions remain the same during the following time interval. Then the outcome after the second time interval is $X_1 \cdot X_2$, where $X_1$ and $X_2$ are independent identically distributed (i.i.d.) random variables. In the same way, the outcome after the $n$-th time interval is $\prod_{j=1}^{n} X_j$, where $X_1, X_2, \dots, X_n$ are i.i.d. random variables. Let us suppose that the market conditions change radically at a random moment $\nu_p$, so that investing in the business is no longer profitable. Therefore, the final outcome is $\prod_{j=1}^{\nu_p} X_j$. We are interested in the behavior of the outcome for large values of $\nu_p$. More precisely, we suppose that

1. $X = \{X_1, X_2, \dots, X_n, \dots\}$ is a sequence of i.i.d. positive random variables, and $a = \mathbb{E}\log X_1$;
2. $\nu = \{\nu_p,\ p \in \Delta \subset (0, 1)\}$ is a family of positive integer-valued random variables independent of the sequence $X$, with $\mathbb{E}\nu_p = 1/p$.

Generally, no information on the $\nu$-family is available. We shall consider a few cases, starting with a simple one:

3. $\mathbb{P}\{\nu_p = k\} = p(1-p)^{k-1}$, $k = 1, 2, \dots$, i.e., $\nu_p$ has a geometric distribution.

Define $Z_p = \prod_{j=1}^{\nu_p} X_j^{p}$.

Theorem 2.1.

Suppose that conditions 1–3 hold and $a \neq 0$. Then
$$\lim_{p \to 0} \mathbb{P}\{Z_p < x\} = 1 - x^{-1/a}, \quad x \ge 1, \quad \text{if } a > 0,$$
and
$$\lim_{p \to 0} \mathbb{P}\{Z_p < x\} = x^{1/|a|}, \quad 0 < x \le 1, \quad \text{if } a < 0.$$
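Theorem 2.1 is easy to probe by simulation. The Python sketch below is our own illustration, not the argument of [2]. It assumes lognormal returns, $\log X_j \sim N(a, \sigma^2)$, so that given $\nu_p$ the sum $\sum_{j \le \nu_p} \log X_j$ is exactly $N(\nu_p a, \nu_p \sigma^2)$, and it draws $\nu_p$ from the geometric law of condition 3 by inversion. For $a < 0$ the function `limit_cdf` uses $x^{1/|a|}$ on $(0, 1]$, which is what the limit $\log Z_p \to aE$ (with $E$ standard exponential) yields. Function names and parameter values are hypothetical.

```python
import math
import random


def sample_log_z(p: float, a: float, sigma: float, rng: random.Random) -> float:
    """One draw of log Z_p = p * sum_{j=1}^{nu_p} log X_j under conditions 1-3.

    Assumption (ours): log X_j ~ Normal(a, sigma**2), so the sum of nu_p
    i.i.d. logarithms is exactly Normal(nu_p * a, nu_p * sigma**2).
    """
    u = 1.0 - rng.random()                      # uniform on (0, 1]
    nu = 1 + int(math.log(u) / math.log1p(-p))  # geometric: P(nu = k) = p (1-p)^(k-1)
    return p * rng.gauss(nu * a, sigma * math.sqrt(nu))


def empirical_cdf(x: float, p: float, a: float, sigma: float = 1.0,
                  n_samples: int = 20_000, seed: int = 1) -> float:
    """Monte Carlo estimate of P{Z_p < x}."""
    rng = random.Random(seed)
    t = math.log(x)
    hits = sum(sample_log_z(p, a, sigma, rng) < t for _ in range(n_samples))
    return hits / n_samples


def limit_cdf(x: float, a: float) -> float:
    """Limit law of Theorem 2.1: Pareto for a > 0, its counterpart on (0, 1] for a < 0."""
    if a > 0:
        return 1.0 - x ** (-1.0 / a) if x >= 1.0 else 0.0
    return x ** (1.0 / abs(a)) if x <= 1.0 else 1.0


if __name__ == "__main__":
    for a, x in [(0.5, 2.0), (0.5, 5.0), (-0.5, 0.5)]:
        print(a, x, empirical_cdf(x, p=0.01, a=a), limit_cdf(x, a))
```

With `p = 0.01` the empirical and limiting values should agree to roughly two decimal places; smaller `p` gives closer agreement at the cost of longer runs.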

In the case $a > 0$ (a profitable business) we obtain the Pareto distribution, which Pareto proposed on the basis of an empirical study (see [7]). For the proof of Theorem 2.1, see [2]. In [2], a corresponding result is also obtained for $a = 0$. In this case, $Z_p$ must be replaced by $Z_p' = \prod_{j=1}^{\nu_p} X_j^{\sqrt{p}}$; if the second logarithmic moment of $X_1$ exists, $Z_p'$ converges in distribution to a mixture of the distributions given in Theorem 2.1. It is well known that the Pareto distribution has heavy tails. This implies that a large part of the capital belongs to a relatively small number of people. We thus see that the Pareto distribution appears in a very natural way, as the limit distribution of a product of a random number $\nu_p$ of random variables $X_j$. In condition 3, the random variables $\nu_p$, $p \in (0, 1)$, have a geometric distribution. What happens for other ('natural') distributions? Below we consider two additional cases:

4. $\nu_p$ has the probability generating function
$$P(z, p, m) = \frac{p^{1/m} z}{\left(1 - (1-p) z^{m}\right)^{1/m}}, \quad p \in (0, 1), \ m \in \mathbb{N}.$$
5. $\nu_p$ has the probability generating function
$$P(z, n) = \frac{1}{T_n(1/z)},$$
where $T_n(u)$ is the Chebyshev polynomial of the first kind of degree $n = 1/\sqrt{p}$ (so $p = 1/n^2$ with $n \in \mathbb{N}$). Here, too, $\mathbb{E}\nu_p = 1/p$.
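Both generating functions in conditions 4 and 5 give $\mathbb{E}\nu_p = 1/p$, as condition 2 requires. The Python sketch below (hypothetical helper names) checks this numerically. For condition 4, the binomial series of $(1-(1-p)z^m)^{-1/m}$ gives $\mathbb{P}\{\nu_p = 1 + mk\} = p^{1/m}\frac{(1/m)_k}{k!}(1-p)^k$, $k = 0, 1, \dots$, so $\nu_p - 1$ is $m$ times a negative binomial variable. For condition 5, we write $1/T_n(1/z) = z^n/Q(z)$ with the polynomial $Q(z) = z^n T_n(1/z)$ and expand $1/Q$ by power-series division; for $n = 2$ this gives $z^2/(2 - z^2)$, i.e. $\nu_p/2$ is geometric with parameter $1/2$.

```python
def nu_pmf_condition4(p: float, m: int, k_max: int) -> list[float]:
    """P{nu_p = v} for v = 0..1+m*k_max under condition 4 (truncated binomial series)."""
    r = 1.0 / m
    pmf = [0.0] * (2 + m * k_max)
    term = p ** r                                  # k = 0 term: P{nu_p = 1}
    for k in range(k_max + 1):
        pmf[1 + m * k] = term
        term *= (r + k) / (k + 1) * (1.0 - p)      # ratio of consecutive series terms
    return pmf


def chebyshev_t(n: int) -> list[int]:
    """Integer coefficients c_0..c_n of T_n(u) = sum_i c_i u**i."""
    prev, cur = [1], [0, 1]                        # T_0 and T_1
    if n == 0:
        return prev
    for _ in range(n - 1):
        nxt = [0] + [2 * c for c in cur]           # 2u * T_k
        for i, c in enumerate(prev):
            nxt[i] -= c                            # minus T_{k-1}
        prev, cur = cur, nxt
    return cur


def nu_pmf_condition5(n: int, length: int) -> list[float]:
    """P{nu_p = v} for v = 0..n+length-1 under condition 5, p = 1/n**2."""
    c = chebyshev_t(n)
    q = [c[n - j] for j in range(n + 1)]           # Q(z) = z**n T_n(1/z) = sum_j q_j z**j
    inv = [0.0] * length                           # power series of 1/Q(z)
    for k in range(length):
        s = 1.0 if k == 0 else 0.0
        for j in range(1, min(k, n) + 1):
            s -= q[j] * inv[k - j]
        inv[k] = s / q[0]
    return [0.0] * n + inv                         # multiply by z**n


def mean(pmf: list[float]) -> float:
    return sum(v * prob for v, prob in enumerate(pmf))


if __name__ == "__main__":
    pmf4 = nu_pmf_condition4(p=0.05, m=2, k_max=3000)
    print(sum(pmf4), mean(pmf4))                   # compare with 1 and 1/p = 20
    pmf5 = nu_pmf_condition5(n=5, length=3000)
    print(sum(pmf5), mean(pmf5))                   # compare with 1 and n**2 = 25
```

Both series are truncated, so the sums equal 1 and the means equal $1/p$ only up to a negligible tail; the truncation lengths in the example are chosen generously for the parameters shown.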

Let us consider case 4. The following result holds.

Theorem 2.2.

Suppose that conditions 1, 2, and 4 hold and $a \neq 0$. Then
$$\lim_{p \to 0} \mathbb{P}\{Z_p < x\} = \int_{1}^{x} \frac{du}{b^{1/m}\,\Gamma(1/m)\, u^{1+1/b} \log^{1-1/m} u}, \quad x \ge 1,$$
where $b > 0$ is a parameter.

Proof.

Consider $\log Z_p = p \sum_{j=1}^{\nu_p} Y_j$, where $Y_j = \log X_j$. From the result of [6] it follows that, as $p \to 0$, the limit distribution of $\log Z_p$ has the density
$$\frac{e^{-u/b}}{u^{1-1/m}\, b^{1/m}\, \Gamma(1/m)}, \quad u > 0.$$
It now suffices to pass from this limit density of $\log Z_p$ to the limit distribution of $Z_p$ itself. □
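The gamma limit in the proof can likewise be checked by simulation. The Python sketch below rests on our own assumptions: $\log X_j \sim N(a, \sigma^2)$; $\nu_p = 1 + mN$, where $N$ is negative binomial with parameters $1/m$ and $p$ (this matches the generating function of condition 4); and $N$ is drawn as a Poisson variable whose mean is gamma distributed with shape $1/m$ and scale $(1-p)/p$. For $m = 2$, the gamma law with shape $1/2$ and scale $b$ has the closed-form distribution function $\operatorname{erf}(\sqrt{u/b})$. The value $b = am$ used below is our own heuristic identification ($p\nu_p$ tends to a gamma law with shape $1/m$ and scale $m$, and $p\sum_{j \le \nu_p} Y_j \approx a\,p\nu_p$); it is not stated in the text. Function names are hypothetical.

```python
import math
import random


def sample_nu_condition4(p: float, m: int, rng: random.Random) -> int:
    """nu_p = 1 + m*N with N ~ NegBin(1/m, p), drawn as Poisson(Gamma(1/m, (1-p)/p))."""
    lam = rng.gammavariate(1.0 / m, (1.0 - p) / p)   # shape 1/m, scale (1-p)/p
    # Poisson(lam): count unit-rate exponential arrivals in [0, lam].
    n, t = 0, rng.expovariate(1.0)
    while t <= lam:
        n += 1
        t += rng.expovariate(1.0)
    return 1 + m * n


def empirical_cdf_log_z(u: float, p: float, m: int, a: float, sigma: float = 1.0,
                        n_samples: int = 10_000, seed: int = 2) -> float:
    """Monte Carlo estimate of P{log Z_p < u}; assumes log X_j ~ Normal(a, sigma**2)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_samples):
        nu = sample_nu_condition4(p, m, rng)
        log_z = p * rng.gauss(nu * a, sigma * math.sqrt(nu))  # exact sum of nu normals
        hits += log_z < u
    return hits / n_samples


def gamma_half_cdf(u: float, b: float) -> float:
    """CDF of the gamma law with shape 1/2 and scale b (the m = 2 limit density)."""
    return math.erf(math.sqrt(u / b)) if u > 0 else 0.0


if __name__ == "__main__":
    a, m = 0.5, 2
    b = a * m                                        # our heuristic identification of b
    for u in (0.25, 0.5, 1.0, 2.0):
        print(u, empirical_cdf_log_z(u, p=0.01, m=m, a=a), gamma_half_cdf(u, b))
```

Since $\mathbb{P}\{Z_p < x\} = \mathbb{P}\{\log Z_p < \log x\}$, agreement here is the same as agreement with the limit in Theorem 2.2 at $x = e^u$.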

Theorem 2.3.

Suppose that conditions 1, 2, and 5 hold, $a = 0$, and the second logarithmic moment of $X_1$ exists. Then
$$\lim_{p \to 0} \mathbb{P}\{Z_p' < x\} = \frac{2}{\pi}\arctan\left(x^{b}\right), \quad x > 0,$$
where $b > 0$ is a parameter.

Proof.

As in the proof of the previous theorem, we pass from $Z_p'$ to its logarithm, apply the corresponding result from [3], and then return to the limit distribution of the original random variables. □
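A similar check applies to Theorem 2.3. In the Python sketch below, the assumptions are ours: $\log X_j \sim N(0, \sigma^2)$, $n = 10$ (so $p = 0.01$), and $\nu_p$ is drawn by inverting the distribution function obtained from the power series of $1/T_n(1/z)$. Since $T_n(\cosh y) = \cosh(ny)$, the characteristic function of $\log Z_p'$ tends to $1/\cosh(\sigma\theta)$, that of the hyperbolic secant law; matching it with the limit in Theorem 2.3 suggests $b = \pi/(2\sigma)$. This identification of $b$ is our heuristic and is not stated in the text. Function names are hypothetical.

```python
import bisect
import itertools
import math
import random


def chebyshev_t(n: int) -> list[int]:
    """Integer coefficients c_0..c_n of T_n(u) = sum_i c_i u**i."""
    prev, cur = [1], [0, 1]                        # T_0 and T_1
    if n == 0:
        return prev
    for _ in range(n - 1):
        nxt = [0] + [2 * c for c in cur]           # 2u * T_k
        for i, c in enumerate(prev):
            nxt[i] -= c                            # minus T_{k-1}
        prev, cur = cur, nxt
    return cur


def nu_pmf_condition5(n: int, length: int) -> list[float]:
    """P{nu_p = v}, v = 0..n+length-1, from the PGF 1/T_n(1/z) = z**n / Q(z)."""
    c = chebyshev_t(n)
    q = [c[n - j] for j in range(n + 1)]           # Q(z) = z**n T_n(1/z) = sum_j q_j z**j
    inv = [0.0] * length
    for k in range(length):                        # power-series division 1 / Q(z)
        s = 1.0 if k == 0 else 0.0
        for j in range(1, min(k, n) + 1):
            s -= q[j] * inv[k - j]
        inv[k] = s / q[0]
    return [0.0] * n + inv


def empirical_cdf_z_prime(x: float, n: int = 10, sigma: float = 1.0,
                          n_samples: int = 10_000, seed: int = 3) -> float:
    """Monte Carlo estimate of P{Z'_p < x} with p = 1/n**2 and a = 0.

    Assumption (ours): log X_j ~ Normal(0, sigma**2), so given nu the sum
    sqrt(p) * sum_{j<=nu} log X_j is exactly Normal(0, p * nu * sigma**2).
    """
    p = 1.0 / n ** 2
    cdf = list(itertools.accumulate(nu_pmf_condition5(n, 40 * n * n)))
    rng = random.Random(seed)
    t = math.log(x)
    hits = 0
    for _ in range(n_samples):
        nu = min(bisect.bisect_right(cdf, rng.random()), len(cdf) - 1)  # inverse CDF
        hits += rng.gauss(0.0, sigma * math.sqrt(p * nu)) < t
    return hits / n_samples


def limit_cdf_thm23(x: float, sigma: float = 1.0) -> float:
    """(2/pi) * arctan(x**b) with our heuristic choice b = pi / (2 sigma)."""
    b = math.pi / (2.0 * sigma)
    return 2.0 / math.pi * math.atan(x ** b)


if __name__ == "__main__":
    for x in (0.5, 1.0, math.e, 5.0):
        print(x, empirical_cdf_z_prime(x), limit_cdf_thm23(x))
```

A normal limit would give about 0.84 at $x = e$, whereas the arctan law gives about 0.87, so the comparison at that point distinguishes the two.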

None of the three models constructed above takes into account any abilities of the people investing in the business, yet all of them lead to heavy-tailed distributions. The investors differ only in when an unfavorable event (the moment $\nu_p$) occurs for them. One might object that this moment is the same for the whole business, i.e., the business becomes insolvent for all investors at once. However, the investors entered the business at different times, so the period for which the investment was made differs from investor to investor. We thus see that the outcome depends very strongly on timing and chance. We do not deny that the investor's talent is significant, but it would be very difficult to separate this component from random factors.

3. Distribution of the number of citations

A similar situation occurs when studying the distribution of the number of citations of scientific publications. Let us make some assumptions.

Assumption 1.

All scientists under consideration are equal in their scientific and literary abilities.

Assumption 2.

The citations of a paper occur independently.

Assumption 3. The probability that an article will be cited again depends on the number of its previous citations and increases with that number. More precisely, the probability that an article having $k - 1$ citations ($k \ge 1$) will receive no further citations is
$$p_k = \frac{1}{ak + b}, \tag{3.1}$$
where $a > 0$ and $a + b > 1$.

Let $Y$ be a random variable describing the number of citations during the period under consideration. Assumption 1 implies that $Y$ has the same distribution for all papers, because the authors are assumed to have the same scientific abilities.

Because the citations occur independently, the probability that a paper is cited exactly $n$ times is
$$\mathbb{P}\{Y = n\} = p_n \prod_{k=1}^{n-1}(1 - p_k) = \frac{\left(\frac{a+b-1}{a}\right)_{n-1}}{(an + b)\left(\frac{a+b}{a}\right)_{n-1}},$$
where $(c)_n = c(c+1)\cdots(c+n-1)$ is the Pochhammer symbol.

It is not difficult to calculate the tail probability:
$$\mathbb{P}\{Y \ge m\} = \frac{\left(\frac{a+b-1}{a}\right)_{m-1}}{\left(\frac{a+b}{a}\right)_{m-1}} \underset{m \to \infty}{\sim} \frac{\Gamma\left(\frac{a+b}{a}\right)}{\Gamma\left(\frac{a+b-1}{a}\right)}\,\frac{1}{m^{1/a}}. \tag{3.2}$$

The distribution of the number of citations.

Relation (3.2) shows that the distribution of the number of citations has a heavy tail with index $1/a$, where the parameter $a$ measures the influence of previous citations: a larger value of $a$ corresponds to a heavier tail. In any case, the presence of such a tail implies that the citation counts of essentially identical scientists can differ greatly, which leads to a significant stratification of the scientific community through random circumstances that have nothing to do with research ability. Thus, the number of citations appears meaningless as an indicator of scientific value.
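Formula (3.2) and the spread it implies can be examined numerically. The Python sketch below (hypothetical helper names) uses the stopping probabilities $p_k = 1/(ak + b)$, the form consistent with the Pochhammer expressions above and with (3.2). It compares the exact tail with the asymptotic right-hand side of (3.2) and simulates citation counts of identical papers. For $a = 1/2$, $b = 1$ the product telescopes to $\mathbb{P}\{Y \ge m\} = 2/(m(m+1))$, so the ratio to the asymptotic value $2/m^2$ is $m/(m+1)$.

```python
import math
import random


def tail_exact(m: int, a: float, b: float) -> float:
    """P{Y >= m} = prod_{k=1}^{m-1} (1 - p_k) with p_k = 1 / (a k + b)."""
    prod = 1.0
    for k in range(1, m):
        prod *= 1.0 - 1.0 / (a * k + b)
    return prod


def tail_asymptotic(m: int, a: float, b: float) -> float:
    """Right-hand side of (3.2); requires a > 0 and a + b > 1."""
    log_const = math.lgamma((a + b) / a) - math.lgamma((a + b - 1) / a)
    return math.exp(log_const) * m ** (-1.0 / a)


def simulate_counts(n_papers: int, a: float, b: float, seed: int = 4) -> list[int]:
    """Draw Y for identical papers: at step k the process stops with probability p_k."""
    rng = random.Random(seed)
    counts = []
    for _ in range(n_papers):
        k = 1
        while rng.random() >= 1.0 / (a * k + b):   # continue with probability 1 - p_k
            k += 1
        counts.append(k)
    return counts


if __name__ == "__main__":
    a, b = 0.5, 1.0
    for m in (10, 100, 1000):
        print(m, tail_exact(m, a, b), tail_asymptotic(m, a, b))
    ys = sorted(simulate_counts(100_000, a, b), reverse=True)
    top = ys[: len(ys) // 100]
    print("median:", ys[len(ys) // 2], "max:", ys[0], "top 1% share:", sum(top) / sum(ys))
```

All simulated papers follow the same law, so any spread between the median and the maximum is due to chance alone.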

Comments on Assumption 3. At first glance, relation (3.1) may not seem very natural. However, asymptotically it appears to be essentially the only form that leads to a heavy-tailed distribution. We consider this in more detail below, but without complete proofs (obtaining general mathematical results is not an aim of this paper).

Let $Y$ be the (random) number of citations of a paper, and suppose that the distribution of $Y$ has a power tail, i.e.,
$$\mathbb{P}\{Y \ge n\} = \frac{C}{n^{\alpha}}\left(1 + \varkappa(n)\right), \tag{3.3}$$
where $\varkappa(n) \to 0$ as $n \to \infty$ and $\varkappa$ behaves 'regularly' in a suitable sense. The symbol $C$ denotes constants, possibly different ones. From (3.3) it follows that

$$\mathbb{P}\{Y = n\} = \mathbb{P}\{Y \ge n\} - \mathbb{P}\{Y \ge n+1\} = \frac{C}{n^{\alpha}}\left(1 - \left(\frac{n}{n+1}\right)^{\alpha}\right) + \frac{C\,\varkappa(n)}{n^{\alpha}}\left(1 - \frac{\varkappa(n+1)}{\varkappa(n)}\left(\frac{n}{n+1}\right)^{\alpha}\right). \tag{3.4}$$

Suppose that $\varkappa(n+1)/\varkappa(n)$ is bounded from above. Then (3.4) implies that
$$\mathbb{P}\{Y = n\} = \frac{\alpha C}{n^{\alpha+1}}\left(1 + \varkappa_1(n)\right), \tag{3.5}$$
where $\varkappa_1(n)$ has the same properties as $\varkappa(n)$.

If
$$\mathbb{P}\{Y = n\} = p_n \prod_{k=1}^{n-1}(1 - p_k), \tag{3.6}$$
where $p_k$ is the probability that citations terminate, then
$$\mathbb{P}\{Y \ge n\} = \prod_{k=1}^{n-1}(1 - p_k).$$

Under some restrictions on the behavior of $p_k$ as $k \to \infty$, we have
$$\prod_{k=1}^{n-1}(1 - p_k) \sim \exp\left\{-\sum_{k=1}^{n-1} p_k\right\}.$$
Here the symbol $\sim$ denotes asymptotic equivalence as $n \to \infty$. Therefore, (3.3) requires
$$\exp\left\{-\sum_{k=1}^{n-1} p_k\right\} \sim \frac{C}{n^{\alpha}}, \quad n \to \infty.$$

Taking logarithms of both sides of the last relation yields
$$\sum_{k=1}^{n-1} p_k \sim \alpha \log(n-1),$$
and, taking differences of consecutive partial sums (heuristically),
$$p_n \approx \alpha \log n - \alpha \log(n-1) \sim \frac{\alpha}{n}.$$

Assumption 3 leads to exactly this asymptotic behavior, with $\alpha = 1/a$ as in (3.2). Moreover, the additional parameter $b$ allows one to refine the asymptotics by fixing not only the tail index $\alpha$ but also the corresponding constant $C$ in (3.3).

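There remains the question of which distributions can be represented in the form (3.6). Suppose that $Y$ is a random variable taking positive integer values such that $\mathbb{P}\{Y = n\} > 0$ for every $n \in \mathbb{N}$. Then there exist probabilities $p_n$ for which (3.6) holds. Indeed, write $\kappa_n = \mathbb{P}\{Y = n\}$ and
$$p_n = \frac{\kappa_n}{1 - \sum_{k=1}^{n-1} \kappa_k}.$$
Since $1 - \sum_{k=1}^{n-1}\kappa_k = \mathbb{P}\{Y \ge n\}$, the product $\prod_{k=1}^{n-1}(1 - p_k)$ telescopes to $\mathbb{P}\{Y \ge n\}$, and $p_n\,\mathbb{P}\{Y \ge n\} = \kappa_n$. Hence (3.6) holds.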

Note that $p_n$ is the hazard (failure) rate of the distribution of $Y$.
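The construction of $p_n$ from $\kappa_n$ and the heuristic $p_n \sim \alpha/n$ can be checked directly. The Python sketch below (hypothetical names) takes the discrete power law $\mathbb{P}\{Y \ge n\} = n^{-\alpha}$, computes its hazard rates, and rebuilds the probabilities via (3.6). For $\alpha = 1$ the hazard rate is exactly $p_n = 1/(n+1)$, i.e. of the form (3.1) with $a = b = 1$.

```python
def power_law_pmf(alpha: float, n_max: int) -> list[float]:
    """kappa_n = n**-alpha - (n+1)**-alpha for n = 1..n_max, so P{Y >= n} = n**-alpha."""
    return [n ** -alpha - (n + 1) ** -alpha for n in range(1, n_max + 1)]


def hazards_from_pmf(pmf: list[float]) -> list[float]:
    """p_n = kappa_n / (1 - sum_{k<n} kappa_k); pmf[0] holds P{Y = 1}."""
    hazards, tail = [], 1.0
    for kappa in pmf:
        hazards.append(kappa / tail)   # tail = P{Y >= n}
        tail -= kappa
    return hazards


def pmf_from_hazards(hazards: list[float]) -> list[float]:
    """Representation (3.6): P{Y = n} = p_n * prod_{k<n} (1 - p_k)."""
    pmf, survive = [], 1.0
    for h in hazards:
        pmf.append(h * survive)
        survive *= 1.0 - h
    return pmf


if __name__ == "__main__":
    alpha = 1.5
    pmf = power_law_pmf(alpha, 1000)
    hz = hazards_from_pmf(pmf)
    print(max(abs(x - y) for x, y in zip(pmf, pmf_from_hazards(hz))))  # round-trip error
    print([n * hz[n - 1] for n in (10, 100, 1000)], "->", alpha)        # n * p_n -> alpha
```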

It follows from the considerations above that, under mild restrictions, the distribution of a positive integer-valued random variable with a power tail has a representation (3.6) with $p_k$ asymptotically equivalent to that in (3.1). This mechanism for the appearance of heavy-tailed distributions on the positive integers thus turns out to be quite universal and can probably be applied to some classes of applied problems.

We now make some remarks on the distribution of the impact factor.

Let us consider the possibility of using the impact factor of a journal as an indicator of the scientific significance of a paper published in it. The impact factor of a journal is calculated as the ratio of the number of citations of papers published over a certain period to the number of these papers. The rationale for using such an average is that, by the law of large numbers, the influence of chance should even out. However, as we shall show, this is not the case.

There is a rather large literature asserting that the impact factor of a scientific journal is a meaningful measure. Observed data show that the distribution of the impact factor is skewed and has a heavy tail. However, these features have not been analyzed theoretically; the literature only comments on the advisability of replacing the arithmetic mean by some other statistic in data analysis. A typical work of this kind is [1]. Its author does note the similarity of some data to the Pareto distribution, but does not analyze the reasons for it mathematically; moreover, the distribution is treated only in its 'naive' form rather than through a mathematically rigorous definition. Below we try to clarify why heavy tails appear in the impact factor distribution.

We assume that the number of papers submitted to the journal has a Poisson distribution with parameter $\lambda$, i.e., probability generating function $e^{\lambda(z-1)}$. For simplicity, let us also assume that the number of citations of each submitted paper has a Sibuya distribution, with probability generating function $1 - (1-z)^p$. Then the total number of citations of all papers has a probability generating function that is the superposition of the generating functions of the Sibuya and Poisson laws:
$$P(z) = e^{-\lambda(1-z)^p}$$
for fixed $\lambda > 0$ and $p \in (0, 1)$ (here $p$ is the Sibuya parameter, not the $p$ of Section 2). This distribution has a heavy tail with index $p$. Since $p < 1$, its mean is infinite, and the law of large numbers is inapplicable in this situation. Moreover, in this case the impact factor increases with the number of publications without any increase in their scientific significance. The observed increase (over time) in the impact factors of leading journals confirms this circumstance.
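The failure of the law of large numbers can be seen in a small simulation. The Python sketch below relies on our own assumptions and hypothetical names: the number of papers $N$ is Poisson with mean $\lambda$ (journals with no papers are redrawn); each paper's citation count is Sibuya with parameter $p$ (called `gamma` in the code to avoid a clash with Section 2), generated as a geometric variable on $\{1, 2, \dots\}$ whose success probability is Beta$(p, 1-p)$ distributed, a mixture whose generating function is $1 - (1-z)^p$; the 'impact factor' is the total number of citations divided by $N$. Standard stable-law scaling suggests that the typical impact factor grows roughly like $N^{1/p - 1}$, i.e. linearly in $N$ for $p = 1/2$.

```python
import math
import random
import statistics


def sibuya(gamma: float, rng: random.Random) -> int:
    """Sibuya(gamma) draw: geometric on {1, 2, ...} with success prob Q ~ Beta(gamma, 1 - gamma)."""
    q = min(max(rng.betavariate(gamma, 1.0 - gamma), 1e-300), 1.0)
    if q == 1.0:
        return 1                                  # success on the first trial
    u = 1.0 - rng.random()                        # uniform on (0, 1]
    return 1 + math.floor(math.log(u) / math.log1p(-q))


def poisson(lam: float, rng: random.Random) -> int:
    """Poisson(lam) as the number of unit-rate exponential arrivals in [0, lam]."""
    n, t = 0, rng.expovariate(1.0)
    while t <= lam:
        n += 1
        t += rng.expovariate(1.0)
    return n


def impact_factor(lam: float, gamma: float, rng: random.Random) -> float:
    """Total citations / number of papers for one simulated journal (N >= 1 enforced)."""
    n = 0
    while n == 0:
        n = poisson(lam, rng)
    return sum(sibuya(gamma, rng) for _ in range(n)) / n


def median_impact_factor(lam: float, gamma: float = 0.5, reps: int = 201, seed: int = 5) -> float:
    """Median impact factor over `reps` independent simulated journals."""
    rng = random.Random(seed)
    return statistics.median(impact_factor(lam, gamma, rng) for _ in range(reps))


if __name__ == "__main__":
    for lam in (10.0, 100.0, 1000.0):
        print(lam, median_impact_factor(lam))
```

With these settings the median impact factor of the larger journals comes out far above that of the smaller ones, although every simulated paper has the same citation law.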

We conclude that the distribution of the impact factor again has a heavy tail, so the impact factor cannot be used as an indicator of scientific significance.

4. Conclusions

I. It is shown that distributions with heavy tails can arise in some manifestations of social inequality (the distribution of capital, the number of citations, the impact factor) for purely random reasons. In this case, the spread in the magnitude of inequality is significant.
II. The circumstance specified in I makes it impossible to use such indices as the number of citations and/or the impact factor of a journal as an indicator of the scientific significance (scientific quality) of a published work.
III. No separate proof of the existence of heavy tails for the distributions under consideration is needed: their presence follows from the papers by Lotka, Pareto, and Zipf mentioned above, published many years ago, and has withstood the test of time.

References

[1] Blanford C.F. (2016). Impact factors, citation distributions and journal stratification. Journal of Materials Science, 51, 10319–10322.
[2] Klebanov L.B., Melamed J.A., Rachev S.T. (1987). On the products of a random number of random variables in connection with a problem from mathematical economics. In: Stability Problems for Stochastic Models, Lecture Notes in Mathematics, 1412, 103–109.
[3] Klebanov L.B., Kakosyan A.V., Rachev S.T., Temnov G. (2012). On a class of distributions stable under random summations. Journal of Applied Probability, 49, 303–318.
[4] Lindquist W.B., Rachev S.T., Hu Y., Shirvani A. (2022). Advanced REIT Portfolio Optimization: Innovative Tools for Risk Management. Springer.
[5] Lotka A.J. (1926). The frequency distribution of scientific productivity. Journal of the Washington Academy of Sciences, 16(12), 317–324.
[6] Melamed J.A. (1989). Limit theorems in the set-up of summation of a random number of independent and identically distributed random variables. In: Stability Problems for Stochastic Models, Lecture Notes in Mathematics, 1412, 194–228.
[7] Pareto V. (1964). Cours d'Économie Politique: Nouvelle édition par G.-H. Bousquet et G. Busino. Librairie Droz, Geneva, pp. 299–345.
[8] Zipf G.K. (1949). Human Behavior and the Principle of Least Effort. Addison–Wesley, Cambridge.


© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

Copyright: This open access article is published under a Creative Commons CC BY 4.0 license, which permits free download, distribution, and reuse, provided that the author and preprint are cited in any reuse.
