Generalized Pareto Distribution of Firm Sizes: Evidence from China

This version is not peer-reviewed

Submitted: 26 April 2024

Posted: 28 April 2024

Abstract
It has been empirically observed that the upper tail of the firm size distribution follows either the Pareto distribution or the Zipf distribution, with both patterns being explained by Gibrat’s law (Gabaix, 1999). This article analyzed firm revenue data from China, spanning from 2005 to 2013, to examine the whole range of the firm size distribution. Our empirical analysis revealed that firm revenue data over these years is well fitted by a three-parameter generalized Pareto distribution, with the fitted parameters indicating a dichotomy: The size distribution of large-sized firms, namely the upper tail, is asymptotically characterized by a Pareto distribution or a Zipf distribution, whereas the size distribution of smaller and medium-sized firms is approximated by an exponential distribution. This finding suggests that Gibrat’s law should be extended to account for the emergence of a generalized Pareto distribution.
Subject: Business, Economics and Management  -   Economics

1. Introduction

It has been empirically observed that the upper tail of the firm size distribution follows either the Pareto distribution or the Zipf distribution (Axtell, 2001; Gabaix, 1999, 2009, 2016). Both laws state that the fraction of firms with a size greater than $x$ is inversely proportional to a power of $x$; that is,
$$P(X > x) = \left(\frac{x}{x_m}\right)^{-a}, \tag{1}$$
where $a \ge 1$ and $x \ge x_m$.
Equation (1) represents the Pareto distribution. In particular, when $a = 1$, it corresponds to the Zipf distribution, which has been proposed to characterize the upper tail of the size distribution of firms and cities (Axtell, 2001; Gabaix, 1999, 2009, 2016; Malevergne et al., 2013). In contrast, the Pareto distribution with $a > 1$ is commonly used to describe the upper tail of the distribution of income and wealth (Nirei and Souma, 2007; Gabaix, 2009; Benhabib et al., 2011; Aoki and Nirei, 2017; Jones and Kim, 2018). To account for the origin of the Pareto distribution (1), Gabaix (1999), applying Gibrat’s law, proposed the following random growth model:
$$dX_t = X_t\left(\mu\,dt + \sigma\,dZ_t\right), \tag{2}$$
where $X_t$ denotes the size of a firm (or city) at time $t$, $\mu$ and $\sigma$ are two parameters, and $dZ_t \sim N(0, dt)$ denotes the increment of a standard Brownian motion that is independent of $X_t$.
The stochastic differential equation (2) is a mathematical representation of Gibrat’s law (Gibrat, 1931), which posits that the growth rate of a firm’s (or city’s) size, $r = \frac{dX_t/dt}{X_t}$, is independent of its size $X_t$. Equation (2) has been widely used to understand the inequality in the upper tail of the distribution (Gabaix, 2009; Benhabib et al., 2011; Aoki and Nirei, 2017; Jones and Kim, 2018). However, the validity of the Pareto distribution (1) has recently been questioned with regard to the estimation of its threshold parameter (Jenkins, 2017), a difficulty that may bias estimates of inequality (Charpentier and Flachaire, 2022). The generalized Pareto distribution (GPD) has been shown to provide more reliable results (Jenkins, 2017; Charpentier and Flachaire, 2022), which motivates applying the GPD rather than the Pareto distribution.
Empirical observations have suggested that Gibrat’s law (2) does not hold exactly but only asymptotically for large-sized firms (Almus, 2000; Becchetti and Trovato, 2002; Daunfeldt and Elert, 2013). Based on this observation, Tao (2024) proposed an extension of equation (2) by taking smaller-sized firms into account, as follows:
$$dX_t = \left(1 + \eta X_t\right)\left(\mu\,dt + \sigma\,dZ_t\right), \tag{3}$$
where $\eta$ is a parameter.
Equation (3) asymptotically approaches the functional form of equation (2) as $X_t \to \infty$, because $1 + \eta X_t \approx \eta X_t$ for large $X_t$. Tao (2024) has demonstrated that if firm size $x$ evolves according to equation (3), the resulting size distribution conforms to the GPD. Since equation (3) extends equation (2) by incorporating smaller-sized firms, the GPD is anticipated to encompass the entire range of the firm size distribution. In this paper, we employ firm revenue data from China, ranging from 2005 to 2013, to examine the whole range of the firm size distribution. For this purpose, our dataset includes a large sample of both large-sized firms and small and medium-sized firms.

2. The Model

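Tao (2024) has shown that, if the dynamics of firm size $x$ obey the stochastic differential equation (3), the resulting density distribution $f(x,t)$ satisfies a Kolmogorov forward equation as follows:
$$\frac{\partial f(x,t)}{\partial t} = -\mu \frac{\partial}{\partial x}\left[(1+\eta x)\, f(x,t)\right] + \frac{1}{2}\frac{\partial^2}{\partial x^2}\left[\sigma^2 (1+\eta x)^2 f(x,t)\right]. \tag{4}$$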
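The solution of equation (4) can be written as (Tao, 2024):
$$f(x) = \frac{\eta + \frac{1}{\theta}}{1+\eta x_0}\left(\frac{1+\eta x}{1+\eta x_0}\right)^{-\frac{1}{\theta\eta} - 2}, \qquad x \ge x_0, \tag{5}$$
where $\theta = -\frac{\sigma^2}{2\mu}$ with $\mu < 0$.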
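For readers who want the intermediate steps, the following is a sketch of how a density of the form (5) can be recovered from (4). It assumes that the distribution is stationary ($\partial f / \partial t = 0$) and that the probability flux vanishes at the lower boundary $x_0$; both assumptions are stated here only for illustration, and the full argument is in Tao (2024). The symbol $\theta = -\sigma^2/(2\mu) > 0$ is as defined above.

```latex
\begin{aligned}
0 &= -\mu \frac{d}{dx}\Big[(1+\eta x) f\Big]
     + \frac{\sigma^2}{2}\frac{d^2}{dx^2}\Big[(1+\eta x)^2 f\Big] \\[4pt]
\Rightarrow\quad
\frac{d}{dx}\Big[(1+\eta x)^2 f\Big]
  &= \frac{2\mu}{\sigma^2}\,(1+\eta x)\, f
   = -\frac{1}{\theta}\,\frac{(1+\eta x)^2 f}{1+\eta x}
   \qquad\text{(one integration, zero flux)} \\[4pt]
\Rightarrow\quad
(1+\eta x)^2 f &\propto (1+\eta x)^{-\frac{1}{\theta\eta}} \\[4pt]
\Rightarrow\quad
f(x) &\propto (1+\eta x)^{-\frac{1}{\theta\eta}-2}, \qquad x \ge x_0 . \\[4pt]
\int_{x_0}^{\infty} (1+\eta x)^{-\frac{1}{\theta\eta}-2}\,dx
  &= \frac{\theta}{1+\theta\eta}\,(1+\eta x_0)^{-\frac{1}{\theta\eta}-1}
\end{aligned}
```

Dividing the power law by this normalizing integral gives the prefactor $\left(\eta + 1/\theta\right)/(1+\eta x_0)$ and the ratio form shown in equation (5).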
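Here, we use $F_X(x) = \int_{x}^{\infty} f(z)\,dz = 1 - \int_{x_0}^{x} f(z)\,dz$ to denote the cumulative distribution counted from the top, that is, the fraction of firms with size greater than $x$. Thus, by equation (5), one has (Tao, 2024):
$$F_X(x) = \left(\frac{1+\eta x}{1+\eta x_0}\right)^{-\frac{1}{\theta\eta} - 1}, \tag{6}$$
where $x \ge x_0$.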
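The relation between the density (5) and the cumulative form (6) can be checked numerically. The sketch below is illustrative only: it reads $F_X(x)$ as the fraction of firms with size above $x$, so that $-dF_X/dx$ should equal $f(x)$; the function names are hypothetical, and the parameter values are taken from the 2005 row of Table 2.

```python
import math


def gpd_density(x, eta, x0, theta):
    """Density f(x) of equation (5), defined for x >= x0."""
    ratio = (1.0 + eta * x) / (1.0 + eta * x0)
    prefactor = (eta + 1.0 / theta) / (1.0 + eta * x0)
    return prefactor * ratio ** (-1.0 / (theta * eta) - 2.0)


def gpd_survival(x, eta, x0, theta):
    """F_X(x) of equation (6), read as the fraction of firms with size above x."""
    ratio = (1.0 + eta * x) / (1.0 + eta * x0)
    return ratio ** (-1.0 / (theta * eta) - 1.0)


# Illustrative parameters: 2005 row of Table 2.
eta, x0, theta = 7.34e-5, 4.07e3, 1.10e5

# Central finite difference of F_X at one point; its negative should match f.
x, h = 5.0e4, 1.0
slope = (gpd_survival(x + h, eta, x0, theta)
         - gpd_survival(x - h, eta, x0, theta)) / (2.0 * h)

print(gpd_survival(x0, eta, x0, theta))        # value at the lower bound x0
print(-slope, gpd_density(x, eta, x0, theta))  # the two numbers should agree
```

The finite-difference step `h` is an arbitrary choice; any step that is small relative to $1/\eta$ should give a similar agreement.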
Equation (6) represents a generalized Pareto distribution (GPD) with three parameters. To see this, one can rewrite it in the standard form of the GPD:
$$F_X(x) = \left(1 + \frac{A\,(x - x_0)}{B}\right)^{-\frac{1}{A}}, \tag{7}$$
where $A = \frac{\theta\eta}{1+\theta\eta}$ and $B = \frac{\theta(1+\eta x_0)}{1+\theta\eta}$.
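As a quick consistency check, the mapping from $(\eta, x_0, \theta)$ to the shape $A$ and scale $B$ can be coded directly and the two forms compared. This is a minimal sketch with hypothetical function names, using the 2005 row of Table 2 as example input.

```python
import math


def to_standard_form(eta, x0, theta):
    """Map (eta, x0, theta) of equation (6) to shape A and scale B of equation (7)."""
    A = theta * eta / (1.0 + theta * eta)
    B = theta * (1.0 + eta * x0) / (1.0 + theta * eta)
    return A, B


def survival_eq6(x, eta, x0, theta):
    """Right-hand side of equation (6)."""
    return ((1.0 + eta * x) / (1.0 + eta * x0)) ** (-1.0 / (theta * eta) - 1.0)


def survival_eq7(x, A, B, x0):
    """Right-hand side of equation (7), the standard GPD form."""
    return (1.0 + A * (x - x0) / B) ** (-1.0 / A)


# Illustrative parameters: 2005 row of Table 2.
eta, x0, theta = 7.34e-5, 4.07e3, 1.10e5
A, B = to_standard_form(eta, x0, theta)

# The two expressions should coincide at every size x >= x0.
for x in (x0, 1.0e4, 1.0e5, 1.0e6):
    print(x, survival_eq6(x, eta, x0, theta), survival_eq7(x, A, B, x0))
```

Note that $1/A = 1 + 1/(\theta\eta)$, so the reciprocal of the shape parameter is the exponent that governs the right tail of (6).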
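It is easy to check that, when $\eta > 0$, the GPD (6) has an asymptotic Pareto tail on the right side of the distribution. In particular, when $\eta$ is sufficiently close to 0, the GPD (6) can be decomposed into a two-class pattern as follows (Tao, 2024):
$$F_X(x) \approx \begin{cases} \exp\left(-\dfrac{x - x_0}{\theta}\right), & x_0 \le x < x_m \approx \dfrac{1}{\eta}, \\[8pt] \left(\dfrac{x}{x_m}\right)^{-\frac{1}{\theta\eta} - 1}, & x > x_m \approx \dfrac{1}{\eta}, \end{cases} \tag{8}$$
where $x_m = (1+\eta x_0)/\eta$.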
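The two branches of the decomposition can be compared with the exact expression (6) numerically. The sketch below works with logarithms of $F_X$ and makes two assumptions that are not in the text: the lower branch is evaluated with hypothetical parameters for which $\theta\eta = 10^{-3}$ (reading "close to 0" as $\eta$ small relative to $1/\theta$), and the upper branch uses the 2005 row of Table 2 at a size far above $x_m$. Function names are illustrative.

```python
import math


def log_survival_exact(x, eta, x0, theta):
    """log F_X(x) from equation (6)."""
    exponent = -1.0 / (theta * eta) - 1.0
    return exponent * math.log((1.0 + eta * x) / (1.0 + eta * x0))


def log_survival_exponential(x, x0, theta):
    """Lower branch of the decomposition: log of exp(-(x - x0) / theta)."""
    return -(x - x0) / theta


def log_survival_pareto(x, eta, x0, theta):
    """Upper branch of the decomposition: log of (x / x_m)^(-1/(theta*eta) - 1)."""
    x_m = (1.0 + eta * x0) / eta
    return (-1.0 / (theta * eta) - 1.0) * math.log(x / x_m)


# Lower branch: HYPOTHETICAL parameters with theta * eta = 1e-3 (not fitted values).
eta_s, x0_s, theta_s = 1.0e-8, 4.0e3, 1.0e5
x_small = 1.0e5  # eta_s * x_small = 1e-3, far below x_m
low_exact = log_survival_exact(x_small, eta_s, x0_s, theta_s)
low_approx = log_survival_exponential(x_small, x0_s, theta_s)

# Upper branch: 2005 row of Table 2, evaluated far above x_m (about 1.8e4).
eta, x0, theta = 7.34e-5, 4.07e3, 1.10e5
x_large = 1.0e7
up_exact = log_survival_exact(x_large, eta, x0, theta)
up_approx = log_survival_pareto(x_large, eta, x0, theta)

print(low_exact, low_approx)
print(up_exact, up_approx)
```

Working in logarithms avoids floating-point underflow in the far tail and makes the leading-order expansion behind each branch easy to inspect for any other parameter set.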

3. Data Analysis

We employ firm revenue data from China, which spans from 2005 to 2013, to fit the GPD (6). However, due to the lack of data for the year 2010, this year has been excluded from our analysis. The raw data were collected from the Chinese Industrial Enterprises Database (CIED), which includes a large sample consisting of both state-owned and non-state-owned firms. In existing literature, firm size is typically measured by either the number of employees (Gabaix, 2009) or revenue (Chen et al., 2023). In this paper, we choose to use the firm’s revenue as the index for measuring its size.
To mitigate the potential impact of outliers within the dataset, we have chosen to exclude firm samples with negative or zero revenue, as these may indicate instances of bankruptcy. This implies that we are not considering the birth and death processes of firms. In fact, when these processes are taken into account, the resulting distribution of firm sizes follows a generalized double-Pareto distribution (Tao, 2024). The descriptive statistics of the raw data are presented in Table 1, displaying the number of observations, minimum, maximum, average, and median of each year’s data. To fit the GPD (6), we organized the firm revenue data into cumulative percentages at various revenue quantiles.
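The data preparation described above (dropping non-positive revenues, then tabulating cumulative shares at a set of revenue levels) might look like the following minimal sketch. Everything in it is illustrative: the revenues are synthetic draws from equation (6) with the 2005 parameters of Table 2, not CIED records; the thresholds are arbitrary; and the share is computed as the fraction of firms above each threshold (one minus that value gives the fraction at or below it).

```python
import bisect
import random


def clean_revenues(revenues):
    """Drop non-positive revenues, as described in the text, and sort ascending."""
    return sorted(r for r in revenues if r > 0)


def share_above(sorted_revenues, thresholds):
    """Fraction of firms whose revenue exceeds each threshold."""
    n = len(sorted_revenues)
    return [(n - bisect.bisect_right(sorted_revenues, t)) / n for t in thresholds]


# Synthetic data for illustration only (NOT the CIED data).
eta, x0, theta = 7.34e-5, 4.07e3, 1.10e5
rng = random.Random(0)


def draw():
    """Inverse-transform draw from equation (6)."""
    u = 1.0 - rng.random()  # uniform on (0, 1]
    ratio = u ** (-1.0 / (1.0 / (theta * eta) + 1.0))
    return (ratio * (1.0 + eta * x0) - 1.0) / eta


raw = [draw() for _ in range(20000)] + [0.0, -5.0]  # two invalid records to drop

revenues = clean_revenues(raw)
thresholds = [x0, 1.0e4, 1.0e5, 1.0e6]
shares = share_above(revenues, thresholds)

for t, s in zip(thresholds, shares):
    print(t, s)
```

The resulting pairs of threshold and share are the kind of points to which a curve such as (6) can then be fitted by nonlinear least squares.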
Figure 1a shows the fit of the GPD (6) to the data in China from 2005 to 2013. One can observe that the agreement between the GPD (6) and the data is very good. In Table 2, we report the three parameters of the GPD (6) and the $R^2$ for each year, estimated by nonlinear least squares fitting in MATLAB, with all $R^2$ values exceeding 0.999. According to Table 2, we find that $\eta$ is on the order of $10^{-5}$. This implies that $\eta$ is sufficiently close to 0; therefore, by equation (8), the firm size distribution can be decomposed into a two-class pattern, where the size distribution of large-sized firms is asymptotically characterized by the Pareto distribution, while the size distribution of smaller and medium-sized firms is approximated by an exponential distribution. Figure 1b displays the two-class pattern.
The large-sized firms correspond to the upper tail in equation (8), where the Pareto exponent is $a = 1 + 1/(\theta\eta)$. We have listed the estimated values of $a$ in Table 2; they range from 1.12 to 1.39 across years, roughly agreeing with the Zipf distribution ($a = 1$). However, smaller and medium-sized firms follow an exponential distribution rather than a Pareto distribution. This suggests that, unlike large-sized firms, smaller and medium-sized firms face fiercer competition and hence generate significantly less economic profit. Such an outcome aligns with the conditions of competitive market theory, where the exponential distribution is indicative of a spontaneous order in the firm size distribution, as described by Tao (2016).
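The exponent formula can be applied directly to the fitted parameters. The short sketch below recomputes $a = 1 + 1/(\theta\eta)$ from the $\eta$ and $\theta$ values in Table 2 and prints it next to the reported value; the dictionary layout and function name are illustrative choices, and small differences are expected because the table entries are rounded.

```python
# (eta, theta, reported a) for each year, copied from Table 2.
table2 = {
    2005: (7.34e-5, 1.10e5, 1.12),
    2006: (5.89e-5, 1.22e5, 1.14),
    2007: (4.41e-5, 1.14e5, 1.20),
    2008: (4.47e-5, 1.40e5, 1.16),
    2009: (3.27e-5, 1.56e5, 1.20),
    2011: (1.43e-5, 1.79e5, 1.39),
    2012: (1.47e-5, 2.17e5, 1.31),
    2013: (1.58e-5, 2.94e5, 1.22),
}


def pareto_exponent(eta, theta):
    """Tail exponent a = 1 + 1 / (theta * eta) of the upper branch."""
    return 1.0 + 1.0 / (theta * eta)


for year, (eta, theta, a_reported) in table2.items():
    a = pareto_exponent(eta, theta)
    print(year, round(a, 3), a_reported)
```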

4. Conclusion

Existing research has shown that the upper tail of the firm size distribution follows either the Pareto distribution or the Zipf distribution, with both patterns being explained by Gibrat’s Law in the form of equation (2). Using firm revenue data from China (2005-2013), our empirical analysis suggests that the firm size distribution is adequately represented by a three-parameter generalized Pareto distribution, which is explained by an extension of Gibrat’s Law presented by equation (3). The fitted parameters reveal a dichotomy: The size distribution of large-sized firms, namely the upper tail, is asymptotically characterized by a Pareto distribution or a Zipf distribution, and the size distribution of smaller and medium-sized firms is approximated by an exponential distribution. Our findings suggest extending the version of Gibrat’s law given in equation (2) to the form in equation (3) to accommodate the emergence of a generalized Pareto distribution.

Author Contributions

Yong Tao designed research; Ruoxi Liu organized raw data for the research; Ruoxi Liu analyzed data; Yong Tao wrote the paper.

Data Availability Statement

This study analyzed publicly available datasets from the Chinese Industrial Enterprises Database.

Acknowledgement

This work was supported by the Social Science Planning Project of Chongqing (Grant No. 2019PY40) and the Research Project on Education and Teaching Reform at Southwest University (Grant No. 2021JY045).

Author Information

The authors declare no competing interests. Correspondence and requests for materials should be addressed to Yong Tao (taoyingyong@swu.edu.cn) or Ruoxi Liu (liuruoxi64@163.com).

References

  1. Almus, M. Testing “Gibrat’s Law” for Young Firms – Empirical Results for West Germany. Small Business Economics 2000, 15, 1–12.
  2. Aoki, S.; Nirei, M. Zipf’s Law, Pareto’s Law, and the Evolution of Top Incomes in the United States. American Economic Journal: Macroeconomics 2017, 9, 36–71.
  3. Axtell, R. Zipf Distribution of U.S. Firm Sizes. Science 2001, 293, 1818–1820.
  4. Becchetti, L.; Trovato, G. The Determinants of Growth for Small and Medium Sized Firms. The Role of the Availability of External Finance. Small Business Economics 2002, 19, 291–306.
  5. Benhabib, J.; Bisin, A.; Zhu, S. The Distribution of Wealth and Fiscal Policy in Economies With Finitely Lived Agents. Econometrica 2011, 79, 123–157.
  6. Charpentier, A.; Flachaire, E. Pareto models for top incomes and wealth. Journal of Economic Inequality 2022, 20, 1–25.
  7. Chen, Y.; Hsu, W.; Peng, S. Innovation, firm size distribution, and gains from trade. Theoretical Economics 2023, 18, 341–380.
  8. Daunfeldt, S.; Elert, N. When is Gibrat’s law a law? Small Business Economics 2013, 41, 133–147.
  9. Dragulescu, A.; Yakovenko, V.M. Exponential and power-law probability distributions of wealth and income in the United Kingdom and the United States. Physica A 2001, 299, 213–221.
  10. Gabaix, X. Zipf’s Law for Cities: An Explanation. Quarterly Journal of Economics 1999, 114, 739–767.
  11. Gabaix, X. Power Laws in Economics and Finance. Annual Review of Economics 2009, 1, 255–294.
  12. Gabaix, X. Power Laws in Economics: An Introduction. Journal of Economic Perspectives 2016, 30, 185–206.
  13. Gibrat, R. Les inégalités économiques; Librairie du Recueil Sirey: Paris, 1931.
  14. Jenkins, S.P. Pareto Models, Top Incomes and Recent Trends in UK. Economica 2017, 84, 261–289.
  15. Jones, C.I.; Kim, J. A Schumpeterian Model of Top Income Inequality. Journal of Political Economy 2018, 126, 1785–1826.
  16. Malevergne, Y.; Saichev, A.; Sornette, D. Zipf’s law and maximum sustainable growth. Journal of Economic Dynamics and Control 2013, 37, 1195–1212.
  17. Nirei, M.; Souma, W. A Two Factor Model of Income Distribution Dynamics. Review of Income and Wealth 2007, 53, 440–459.
  18. Tao, Y. Spontaneous economic order. Journal of Evolutionary Economics 2016, 26, 467–500.
  19. Tao, Y. Generalized Pareto Distribution and Income Inequality: An extension of Gibrat’s law. AIMS Mathematics 2024, Forthcoming.
Figure 1. a. Fitting results of generalized Pareto distribution (6). b. Fitting results of the two-class distribution (8).
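Table 1. Descriptive statistics of data.

| Year | Observations | Minimum | Maximum | Average | Median |
|------|--------------|---------|---------|---------|--------|
| 2005 | 269553 | 10 | $1.49 \times 10^{8}$ | 92094 | 19270 |
| 2006 | 299704 | 10 | $1.88 \times 10^{8}$ | 104827 | 21735 |
| 2007 | 335018 | 10 | $1.96 \times 10^{8}$ | 119572 | 25580 |
| 2008 | 368529 | 10 | $2.28 \times 10^{8}$ | 121711 | 26143 |
| 2009 | 335535 | 10 | $2.1 \times 10^{8}$ | 139400 | 32344 |
| 2011 | 302591 | 150 | $2.6 \times 10^{8}$ | 271600 | 77667 |
| 2012 | 323960 | 10 | $4.13 \times 10^{8}$ | 277935 | 79831 |
| 2013 | 344831 | 145 | $4.77 \times 10^{8}$ | 292601 | 84302 |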
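Table 2. Fitting parameters.

| Year | $\eta$ | $x_0$ | $\theta$ | $a$ | $R^2$ |
|------|--------|-------|----------|-----|-------|
| 2005 | $7.34 \times 10^{-5}$ | $4.07 \times 10^{3}$ | $1.10 \times 10^{5}$ | 1.12 | 0.999 |
| 2006 | $5.89 \times 10^{-5}$ | $4.07 \times 10^{3}$ | $1.22 \times 10^{5}$ | 1.14 | 0.999 |
| 2007 | $4.41 \times 10^{-5}$ | $4.34 \times 10^{3}$ | $1.14 \times 10^{5}$ | 1.20 | 0.999 |
| 2008 | $4.47 \times 10^{-5}$ | $4.26 \times 10^{3}$ | $1.40 \times 10^{5}$ | 1.16 | 0.999 |
| 2009 | $3.27 \times 10^{-5}$ | $4.52 \times 10^{3}$ | $1.56 \times 10^{5}$ | 1.20 | 0.999 |
| 2011 | $1.43 \times 10^{-5}$ | $1.90 \times 10^{4}$ | $1.79 \times 10^{5}$ | 1.39 | 0.999 |
| 2012 | $1.47 \times 10^{-5}$ | $1.84 \times 10^{4}$ | $2.17 \times 10^{5}$ | 1.31 | 0.999 |
| 2013 | $1.58 \times 10^{-5}$ | $1.93 \times 10^{4}$ | $2.94 \times 10^{5}$ | 1.22 | 0.999 |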
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
Copyright: This open access article is published under a Creative Commons CC BY 4.0 license, which permits free download, distribution, and reuse, provided that the author and preprint are cited in any reuse.
Preprints.org is a free preprint server supported by MDPI in Basel, Switzerland.
