It is well known that when analyzing whether there are fixed-effects in panel data, only the values $x_{ijt}$ of each independent variable $j$ ($j = 1, \ldots, K$) in each $i$th ($i = 1, \ldots, N$) series in the time period $t$ ($t = 1, \ldots, T$) are usually given, without knowing the distribution functions.
Therefore, two main tasks must be carried out. First, we need to determine what types of distributions the series-level functions, the panel-level function, the time-effects, and the individual fixed-effects satisfy (e.g., polynomial, linear, or constant). Second, we need to estimate their parameters according to their distribution types, so as to realize the fixed-effects prediction. Of course, if there are no fixed-effects in the original panel data, such speculation and estimation should be rejected; otherwise, the model is invalid.
2.2.1. Curve Fitting
The idea of curve fitting is to find a mathematical model that fits a series of data points, possibly subject to constraints. It is assumed that users have theoretical reasons for picking a function of a certain form. Curve fitting then finds the specific coefficients (parameters) that make the function match the data as closely as possible. In detail, raw data and a function with unknown coefficients are given, and the main target is to find values of the coefficients such that the function matches the raw data as well as possible. The best values of the coefficients are the ones that minimize the value of chi-square, which is defined as
$$\chi^2 = \sum_i \left( \frac{y_i - f(x_i; \hat{a})}{\sigma_i} \right)^2, \tag{10}$$
where $f(x_i; \hat{a})$ is the fitted value (model value) for a given point $x_i$ on the function $f$ with the estimated parameter vector $\hat{a}$, $y_i$ is the measured data value for $x_i$, and $\sigma_i$ is an estimate of the standard deviation for $y_i$. In other words, for raw data points $(x_i, y_i)$, the parameter vector $\hat{a}$ that minimizes chi-square for the given function $f$ is sought. To obtain the best parameters $\hat{a}$, nonlinear least squares is used.
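As an illustration, the following sketch performs this weighted nonlinear least-squares (chi-square) minimization with SciPy's curve_fit; the exponential model, the noise level, and the per-point standard deviations are illustrative assumptions rather than part of the method described here.

import numpy as np
from scipy.optimize import curve_fit

def model(x, a, b, c):
    # Hypothetical model f(x; a, b, c); the user is assumed to pick this form.
    return a * np.exp(-b * x) + c

x = np.linspace(0.0, 4.0, 50)
rng = np.random.default_rng(0)
sigma = np.full_like(x, 0.2)                    # estimated standard deviation of each point
y = model(x, 2.5, 1.3, 0.5) + rng.normal(0.0, 0.2, x.size)  # measured data values

# curve_fit minimizes sum(((y - f(x; a)) / sigma)**2), i.e., chi-square.
a_hat, _ = curve_fit(model, x, y, p0=[2.0, 1.0, 0.0], sigma=sigma, absolute_sigma=True)
chi_square = np.sum(((y - model(x, *a_hat)) / sigma) ** 2)
print("estimated parameters:", a_hat, "chi-square:", chi_square)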
For the nonlinear built-in data fitting functions and for user-defined functions, the operation must be iterative. Igor tries various values for the unknown coefficients; for each try, it computes chi-square, searching for the coefficient values that yield the minimum value of chi-square.
Given some initial guess $a^{(0)}$ of the parameter values, hold $a^{(0)}$ constant and form the first-order Taylor approximation
$$f(x_i; a) \approx f(x_i; a^{(0)}) + \nabla_a f(x_i; a^{(0)})\,(a - a^{(0)}), \tag{11}$$
so that the nonlinear least-squares problem becomes a linear least-squares problem [28].
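The linearization step can be sketched as a plain Gauss-Newton loop, assuming the same illustrative exponential model as above; this is only a schematic of the iteration, not the exact routine of any particular fitting software.

import numpy as np

def model(x, a):
    return a[0] * np.exp(-a[1] * x) + a[2]

def jacobian(x, a):
    # Partial derivatives of the model with respect to each parameter.
    return np.column_stack([
        np.exp(-a[1] * x),
        -a[0] * x * np.exp(-a[1] * x),
        np.ones_like(x),
    ])

def gauss_newton(x, y, a0, n_iter=20):
    a = np.asarray(a0, dtype=float)
    for _ in range(n_iter):
        r = y - model(x, a)                        # residuals at the current guess
        J = jacobian(x, a)                         # f(x; a + d) ≈ f(x; a) + J d
        d, *_ = np.linalg.lstsq(J, r, rcond=None)  # linear least-squares step for d
        a = a + d
    return a

x = np.linspace(0.0, 4.0, 50)
rng = np.random.default_rng(1)
y = 2.5 * np.exp(-1.3 * x) + 0.5 + rng.normal(0.0, 0.05, x.size)
print(gauss_newton(x, y, a0=[2.0, 1.0, 0.0]))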
2.2.2. Detailed Steps
(1) Curve fitting in series data
A panel data set includes a group of time series. For each time series, an industry expert can give a function describing the relationship between the independent variables (including the time period) and the dependent variable according to prior knowledge. First, the distribution type of the function, marked as $f_i(\cdot; a_i)$, can be guessed based on the theoretical knowledge held by the expert, where $f_i$ denotes the function for the $i$th series and $a_i$ its parameter vector. For example, a threshold effect may be related to the sigmoid function, and U-shaped curves usually correspond to quadratic functions. Second, the initial parameters can be guessed by the expert as $a_i^{(0)}$. Third, given the function $f_i$ and the initial parameters $a_i^{(0)}$, the true parameters can be estimated by curve fitting with nonlinear least squares following Equations (10) and (11). According to Equation (4), the ideal result is that the fitted function coincides with the true underlying function. However, the true function is usually unknown. Thus, the smaller the residual (i.e., Equation (10)), the better. If the residual is too big (e.g., greater than a given threshold), $f_i$ and $a_i^{(0)}$ should be adjusted to meet the requirement, or the fitting should be rejected if it is still not satisfied after many adjustments.
In particular, $f_i$ can be either linear or nonlinear; allowing nonlinear functions is what extends classical fixed-effects analysis through curve fitting.
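A minimal sketch of this step for a single series is given below, assuming the expert picks a sigmoid form and an initial guess; the data, the threshold value, and all names are illustrative.

import numpy as np
from scipy.optimize import curve_fit

def sigmoid(t, L, k, t0):
    # Expert-chosen form for a threshold-like effect over the time period t.
    return L / (1.0 + np.exp(-k * (t - t0)))

t = np.arange(1, 21, dtype=float)               # time periods of one series
rng = np.random.default_rng(2)
y = sigmoid(t, 3.0, 0.8, 10.0) + rng.normal(0.0, 0.1, t.size)

a0 = [2.5, 1.0, 9.0]                            # expert's initial guess of the parameters
a_hat, _ = curve_fit(sigmoid, t, y, p0=a0)
residual = np.sum((y - sigmoid(t, *a_hat)) ** 2)

THRESHOLD = 1.0                                 # illustrative acceptance threshold
if residual > THRESHOLD:
    print("adjust the function or initial guess, or reject the fit; residual =", residual)
else:
    print("fit accepted; parameters =", a_hat)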
Similarly, the time series data are mixed (pooled) and the overall panel data distribution characteristics (marked as $F$, with parameter vector $A$) are estimated. The ideal result, as shown in Equation (4), is that the fitted panel-level function coincides with the true panel distribution. The smaller the residual (i.e., Equation (10)), the better. If the residual is too big (e.g., greater than a given threshold), $F$ and its initial parameters should be adjusted to meet the requirement, or the fitting should be rejected if it is still not satisfied after many adjustments.
Therefore, if both the series-level fit $f_i$ and the panel-level fit $F$ are acceptable, the heterogeneity (i.e., the fixed-effect) of the $i$th time series in the panel data is characterized by the difference between $f_i$ and $F$; otherwise, the fixed-effect is not acceptable.
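The acceptance logic can be sketched as follows, assuming a quadratic model family, a synthetic three-series panel, and illustrative thresholds; it only mirrors the decision rule described above, not a full implementation.

import numpy as np
from scipy.optimize import curve_fit

def quad(t, a, b, c):
    # Illustrative U-shaped model family shared by the series-level and panel-level fits.
    return a * t ** 2 + b * t + c

def acceptable_fit(t, y, p0, threshold):
    p_hat, _ = curve_fit(quad, t, y, p0=p0)
    residual = np.sum((y - quad(t, *p_hat)) ** 2)
    return residual <= threshold, p_hat

t = np.arange(1, 16, dtype=float)
rng = np.random.default_rng(3)
panel = [quad(t, 0.05, -1.0, c) + rng.normal(0.0, 0.1, t.size) for c in (2.0, 4.0, 6.0)]

series_ok = [acceptable_fit(t, y, p0=[0.0, 0.0, 0.0], threshold=1.0)[0] for y in panel]
pooled_ok, _ = acceptable_fit(np.tile(t, len(panel)), np.concatenate(panel),
                              p0=[0.0, 0.0, 0.0], threshold=200.0)

print("fixed-effects acceptable:", all(series_ok) and pooled_ok)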
(2) Time-effects and individual fixed-effects estimation
If the independent variables and the time periods are independent of each other when there are fixed-effects in the panel data, separate $f_i$ and $F$ with respect to time and the other independent variables into two parts, namely,
$$f_i(X, t) = g_i(X) + h_i(t), \tag{15}$$
where $g_i$ fits the effects of the independent variables other than time and $h_i$ fits the effect of time on the dependent variable in each series. Similarly,
$$F(X, t) = G(X) + H(t), \tag{16}$$
where $G$ fits the effects of the independent variables other than time and $H$ fits the effect of time on the dependent variable in the global panel data.
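Assuming, for illustration, a linear non-time part and a linear time part, the additive separation can be fitted directly, as in the following sketch; the variable names and the model forms are assumptions, not the paper's specification.

import numpy as np
from scipy.optimize import curve_fit

def separable_model(xt, beta, gamma, c):
    # xt stacks one independent variable x and the time period t.
    x, t = xt
    g = beta * x               # effect of the independent variables other than time
    h = gamma * t + c          # effect of time on the dependent variable
    return g + h

t = np.arange(1, 31, dtype=float)
rng = np.random.default_rng(4)
x = rng.normal(0.0, 1.0, t.size)                # independent of t by construction
y = 1.5 * x + 0.3 * t + 2.0 + rng.normal(0.0, 0.1, t.size)

p_hat, _ = curve_fit(separable_model, (x, t), y, p0=[1.0, 0.0, 0.0])
print("beta, gamma, c =", p_hat)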
Combining Equations (15) and (16) and comparing them with Equation (7), we can see that the closer $g_i$, $h_i$, $G$, and $H$ are to the corresponding components of Equation (7), the better. Thus, following Equation (7), the time-effect of the $i$th series in the panel data can be initialized as $h_i - H$, and the corresponding individual fixed-effect can be initialized as $g_i - G$.
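A minimal sketch of this initialization is shown below; the fitted components are stand-ins for the results of the curve fits above, and the difference-based initialization follows the reading of Equation (7) given here.

import numpy as np

t_grid = np.arange(1, 31, dtype=float)
x_grid = np.linspace(-2.0, 2.0, 30)

# Stand-ins for the fitted components (in practice these come from the curve fits above).
g_i = lambda x: 1.8 * x             # series-level non-time component
h_i = lambda t: 0.5 * t + 3.0       # series-level time component
G = lambda x: 1.5 * x               # panel-level non-time component
H = lambda t: 0.3 * t + 2.0         # panel-level time component

time_effect_init = h_i(t_grid) - H(t_grid)           # initial time-effect of series i
individual_effect_init = g_i(x_grid) - G(x_grid)     # initial individual fixed-effect of series i
print(time_effect_init[:3], individual_effect_init[:3])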
However, either the time-effect or the individual fixed-effect may be offset by being estimated together with the other independent variables. In order to correct this bias, we use the principle of least squares to correct $h_i - H$ and $g_i - G$ based on a loss function over the function similarity. For the time component, the loss function is defined over the similarity between the series-level and panel-level functions instead of over the raw data as in Equation (10); the function with the best parameter vector for the corrected time component can then be obtained. Similarly, for the non-time component, a loss function over function similarity replaces Equation (10), and the function with the best parameter vector for the corrected non-time component can be obtained. Along this line, the time-effect can be corrected with the refitted time component, and the corresponding individual fixed-effect can be corrected with the refitted non-time component.
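Under one possible reading of this function-similarity loss, the time part of the series fit can be re-estimated on a grid of function values rather than on the raw data points, as sketched below; the component forms, the grid, and the parameterization are illustrative assumptions.

import numpy as np
from scipy.optimize import least_squares

t_grid = np.arange(1, 31, dtype=float)
x_grid = np.linspace(-2.0, 2.0, 30)
X, T = np.meshgrid(x_grid, t_grid)              # evaluate the functions on an (x, t) grid

f_i = lambda x, t: 1.8 * x + 0.5 * t + 3.0      # full series-level fit (illustrative)
G = lambda x: 1.5 * x                           # panel-level non-time component
H = lambda t: 0.3 * t + 2.0                     # panel-level time component

def function_similarity_residual(params):
    gamma, c = params
    h_i_corrected = gamma * T + c
    # Compare the series function with G(x) + corrected h_i(t) over the grid,
    # instead of comparing fitted values with raw data points.
    return (f_i(X, T) - (G(X) + h_i_corrected)).ravel()

sol = least_squares(function_similarity_residual, x0=[0.0, 0.0])
gamma_hat, c_hat = sol.x
corrected_time_effect = (gamma_hat * t_grid + c_hat) - H(t_grid)
print(corrected_time_effect[:3])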
If the fitted time component $h_i$ of the $i$th series is accepted but differs from the panel-level time component $H$, there is an absolute heterogeneity affected by time $t$ (i.e., an absolute time-effect) in the $i$th series. Additionally, if $h_i$ is not accepted, then $H$ is most likely not accepted either, because series stochasticity affected by $t$ most likely leads to panel stochasticity affected by $t$; in this case, the series data are randomly distributed and the significance of the heterogeneity affected by $t$ (i.e., the time-effect) cannot be explored. If $h_i = H$, the $i$th series is distributed, with respect to $t$, in the same way as the whole panel data; in other words, there is no heterogeneity affected by $t$ (i.e., no time-effect) between the $i$th series and the panel data when $h_i = H$. The resulting case-by-case quantity represents the time-effect of the $i$th series data in the panel data.
If the fitted non-time component $g_i$ of the $i$th series is accepted but differs from the panel-level component $G$, there is an absolute heterogeneity (i.e., an absolute individual fixed-effect) affected by $X$ in the $i$th series. Additionally, if $g_i$ is not accepted, then $G$ is most likely not accepted either, because series stochasticity affected by $X$ most likely leads to panel stochasticity affected by $X$; in this case, the series data are randomly distributed and the significance of the heterogeneity affected by $X$ (i.e., the individual fixed-effect) cannot be explored. If $g_i = G$, the $i$th series is distributed, with respect to $X$, in the same way as the whole panel data; in other words, there is no heterogeneity affected by $X$ (i.e., no individual fixed-effect) between the $i$th series and the panel data when $g_i = G$. The resulting case-by-case quantity represents the individual fixed-effect of the $i$th series data in the panel data.
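The case analysis for both the time-effect and the individual fixed-effect can be summarized by a small decision helper, as sketched below; the acceptance flags, the tolerance, and the component values are illustrative assumptions about how the fitted components would be compared.

import numpy as np

def classify_effect(series_component, panel_component, series_accepted, panel_accepted, tol=1e-2):
    if not series_accepted:
        # Series stochasticity most likely implies panel stochasticity as well.
        return "random: significance of the effect cannot be explored"
    if not panel_accepted or np.max(np.abs(series_component - panel_component)) > tol:
        return "absolute effect (heterogeneity between the series and the panel)"
    return "no effect (series distributed the same as the panel)"

t_grid = np.arange(1, 31, dtype=float)
h_i, H = 0.5 * t_grid + 3.0, 0.3 * t_grid + 2.0   # illustrative time components
g_i, G = 1.8 * t_grid, 1.8 * t_grid               # illustrative non-time components

print("time-effect:", classify_effect(h_i, H, True, True))
print("individual fixed-effect:", classify_effect(g_i, G, True, True))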