3.1. Supply and Demand Models
It is assumed that the demand for housing in region $i$ at time $t$ is $D_i(t)$ and that the supply is $S_i(t)$. These are multivariate functions driven by a variety of dynamic factors. We first define the demand function, $D_i(t)$, and the supply function, $S_i(t)$, which are given by Equations 1 and 2.
In this model, the demand-side variables include population growth rate, income level, employment rate, migration rate, and cost of living. In contrast, the supply-side variables include land availability, construction cost, policy environment, and permit approval time. Modelling both sets of variables jointly in the supply and demand functions gives a more complete picture of the state of the market.
The limitations of simple linear models in capturing the interdependencies between supply and demand are well documented. To address this, we incorporate interaction terms, defined as nonlinear interactions between demand-side and supply-side variables. These terms describe how demand and supply influence each other, such as the impact of population growth on housing supply within a given policy context. The interaction term is defined in Equation 3.
where each interaction coefficient reflects the intensity of the interaction between a given demand-side variable and a given supply-side variable. To illustrate, the interaction of construction cost with income level may show that regions with higher incomes are better able to bear higher construction costs and, consequently, show greater supply-side sensitivity to income growth. Incorporating these interactions into the demand and supply functions gives the integrated model in Equations 4 and 5.
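The interaction term described above can be sketched in a few lines. This is a minimal illustration that assumes a bilinear form (a coefficient multiplied by the product of one demand-side and one supply-side variable, summed over all pairs) added to a linear demand function. The names `interaction_term`, `demand_with_interaction`, `alpha`, and `gamma`, as well as all numeric values, are hypothetical and not taken from the paper.

```python
def interaction_term(demand_vars, supply_vars, gamma):
    """Sum gamma[j][k] * demand_vars[j] * supply_vars[k] over all pairs.

    Assumed bilinear form; gamma[j][k] plays the role of the
    interaction coefficient between one demand-side variable and
    one supply-side variable.
    """
    return sum(
        gamma[j][k] * x * z
        for j, x in enumerate(demand_vars)
        for k, z in enumerate(supply_vars)
    )


def demand_with_interaction(demand_vars, supply_vars, alpha, gamma, intercept=0.0):
    """Linear demand in the demand-side variables plus the interaction term."""
    linear = intercept + sum(a * x for a, x in zip(alpha, demand_vars))
    return linear + interaction_term(demand_vars, supply_vars, gamma)


# Hypothetical inputs: two demand-side and two supply-side variables.
income_level, population_growth = 2.0, 0.5
construction_cost, land_availability = 3.0, 1.0

# Only income x construction cost and growth x land interact here.
gamma = [[0.1, 0.0],
         [0.0, 0.2]]

value = interaction_term(
    [income_level, population_growth],
    [construction_cost, land_availability],
    gamma,
)
```

Setting an entry of `gamma` to zero switches off the corresponding pair, which is one way to keep the number of estimated interaction coefficients manageable.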
Using the demand and supply functions, we define the regional supply-demand deficit measure in Equation 6.
To analyse and predict the dynamics of the supply-demand imbalance, we minimise the squared supply-demand imbalance to obtain the optimal values of the three groups of model parameters. The objective function is given by Equation 7.
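A minimal sketch of what the deficit measure and the least-squares objective express, assuming the deficit is demand minus supply and the objective sums its square over all regions and time periods. The symbols $\Delta_i(t)$, $J$, $\theta$, $N$, and $T$ are assumed notation introduced for illustration only.

```latex
% Assumed notation: \Delta_i(t) is the deficit of region i at time t,
% \theta collects all model parameters, N is the number of regions,
% T is the number of time periods.
\Delta_i(t) = D_i(t) - S_i(t),
\qquad
\min_{\theta} \; J(\theta) = \sum_{i=1}^{N} \sum_{t=1}^{T} \Delta_i(t)^{2}
```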
Solving this problem yields the parameter combination that minimises the imbalance between supply and demand. In practice, certain variables in the supply and demand model may be constrained by external factors. For instance, land availability and construction costs frequently depend on local government policies and market conditions. These constraints are introduced into the model, and the Lagrange multiplier method is used to solve the resulting constrained nonlinear optimisation problem. The constraints are expressed in Equation 8.
Introducing three Lagrange multipliers, one per constraint, allows us to define the Lagrangian objective function in Equation 9.
Solving this Lagrangian yields the optimal supply-demand equilibrium subject to the constraints.
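For orientation, a generic Lagrangian of this kind can be written as follows. This sketch assumes equality constraints of the form $g_m(\theta) = 0$ and uses the assumed symbols $J(\theta)$ for the squared-imbalance objective, $\theta$ for the model parameters, and $\lambda_m$ for the three multipliers. Inequality constraints, such as an upper bound on land availability, would need the Karush-Kuhn-Tucker conditions instead.

```latex
% Assumed generic form: J(\theta) is the squared-imbalance objective,
% g_m(\theta) = 0 are the constraints, \lambda_m the multipliers (m = 1, 2, 3).
\mathcal{L}(\theta, \lambda) = J(\theta) + \sum_{m=1}^{3} \lambda_m \, g_m(\theta),
\qquad
\frac{\partial \mathcal{L}}{\partial \theta} = 0,
\quad
\frac{\partial \mathcal{L}}{\partial \lambda_m} = g_m(\theta) = 0
```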
3.2. Cluster Analysis
The first step in the classification process is cluster analysis, which groups regions with similar housing-market characteristics. Suppose that the supply-demand characteristics of region $i$ at time $t$ are represented by a pair of values, one denoting the demand-side feature and the other denoting the supply-side feature. To make this multidimensional data easier to interpret, the k-means clustering algorithm is used to divide the $N$ regions into $K$ clusters, with the objective of minimising the intra-class variance, as given in Equation 10.
where $\mu_k$ denotes the centroid of class $k$, while $r_{ik}$ represents the indicator function. If region $i$ belongs to class $k$, then $r_{ik}$ is assigned a value of 1; otherwise, it is assigned a value of 0. By iteratively reassigning regions and updating the centroid positions, regions with similar supply and demand dynamics are grouped into a single class.
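The iterative procedure can be sketched in plain Python. This is an illustrative implementation of standard k-means (Lloyd's algorithm) on (demand-side feature, supply-side feature) pairs, not the authors' code. The farthest-first initialisation, the function names, and the toy data are all assumptions chosen to keep the example deterministic.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two feature tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def farthest_first_init(points, k):
    """Assumed initialisation: start from the first point, then repeatedly
    add the point farthest from the centroids chosen so far."""
    centroids = [points[0]]
    while len(centroids) < k:
        nxt = max(points, key=lambda p: min(squared_distance(p, c) for c in centroids))
        centroids.append(nxt)
    return centroids


def kmeans(points, k, iterations=100):
    """Alternate assignment and centroid-update steps until labels stop changing."""
    centroids = farthest_first_init(points, k)
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each region joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == c]
            if members:
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
        if new_labels == labels:
            break
        labels = new_labels
    return labels, centroids


def within_cluster_ss(points, labels, centroids):
    """Intra-class variance objective: sum of squared distances to own centroid."""
    return sum(squared_distance(p, centroids[lab]) for p, lab in zip(points, labels))


# Hypothetical (demand-side feature, supply-side feature) pairs for six regions.
regions = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (5.0, 5.2), (5.1, 4.9), (4.8, 5.0)]
labels, centroids = kmeans(regions, k=2)
wcss = within_cluster_ss(regions, labels, centroids)
```

The entries of `labels` play the role of the indicator function: region `i` belongs to class `labels[i]` and to no other. k-means only reaches a local minimum of the objective, so in practice several initialisations are usually compared.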
This classification not only helps identify the supply and demand characteristics of specific regions, but also allows different models or strategies to be applied to different categories of regions in the subsequent predictive analysis. To predict future imbalances between supply and demand, we employ machine learning algorithms built on the regional clustering. The supply-demand deficit measure is defined in Equation 11.
where $D_i(t)$ represents the housing demand of the $i$-th region at time $t$, while $S_i(t)$ denotes the housing supply of the region. The objective is to forecast the deficit measure at a future point in time based on historical data. To capture the complex nonlinear relationships between supply and demand variables, machine learning methods such as gradient boosting machines (GBMs) are employed.
The input features of the machine learning model comprise the demand-side feature and the supply-side feature, together with the cluster assignment. From the training set, the model learns the historical trends associated with supply-demand imbalances. The predictive model is given in Equation 12.
where $f$ represents the prediction function of the machine learning model and $c_i$ denotes the cluster assignment of the $i$-th region. The cluster assignments capture the distinct supply and demand dynamics of different categories of regions. Consequently, regions of the same class share similar prediction models.
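To make the prediction step concrete, the following is a small hand-rolled gradient boosting sketch for squared loss, using depth-one regression trees (stumps) fitted to residuals. It is an illustration under stated assumptions, not the authors' model: the feature layout (demand-side feature, supply-side feature, cluster label), the toy data, and all function names are hypothetical, and a production setup would normally rely on an established GBM library.

```python
def fit_stump(rows, residuals):
    """Find the single-feature threshold split that minimises squared error."""
    best = None
    for f in range(len(rows[0])):
        for threshold in sorted({r[f] for r in rows}):
            left = [res for r, res in zip(rows, residuals) if r[f] <= threshold]
            right = [res for r, res in zip(rows, residuals) if r[f] > threshold]
            if not left or not right:
                continue
            lm, rm = sum(left) / len(left), sum(right) / len(right)
            sse = sum((v - lm) ** 2 for v in left) + sum((v - rm) ** 2 for v in right)
            if best is None or sse < best[0]:
                best = (sse, f, threshold, lm, rm)
    return best[1:] if best else None


def predict_stump(stump, row):
    f, threshold, left_mean, right_mean = stump
    return left_mean if row[f] <= threshold else right_mean


def fit_gbm(rows, targets, n_rounds=50, learning_rate=0.1):
    """Start from the mean target, then repeatedly fit a stump to the residuals
    (the negative gradient of squared loss) and add a shrunken copy of it."""
    base = sum(targets) / len(targets)
    preds = [base] * len(rows)
    stumps = []
    for _ in range(n_rounds):
        residuals = [y - p for y, p in zip(targets, preds)]
        stump = fit_stump(rows, residuals)
        if stump is None:
            break
        stumps.append(stump)
        preds = [p + learning_rate * predict_stump(stump, r) for p, r in zip(preds, rows)]
    return base, stumps, learning_rate


def predict_gbm(model, row):
    base, stumps, learning_rate = model
    return base + learning_rate * sum(predict_stump(s, row) for s in stumps)


# Hypothetical training rows: (demand-side feature, supply-side feature, cluster label).
rows = [(1.0, 0.8, 0), (1.2, 0.9, 0), (0.9, 1.0, 0),
        (5.0, 3.0, 1), (5.1, 3.2, 1), (4.8, 2.9, 1)]
# Hypothetical observed deficits (demand minus supply) for the next period.
deficits = [0.2, 0.3, -0.1, 2.0, 1.9, 1.9]

model = fit_gbm(rows, deficits)
mean_deficit = sum(deficits) / len(deficits)
mse_baseline = sum((y - mean_deficit) ** 2 for y in deficits) / len(deficits)
mse_model = sum((y - predict_gbm(model, r)) ** 2 for r, y in zip(rows, deficits)) / len(deficits)
```

Passing the cluster label as an ordinary feature lets one model share information across classes. Fitting a separate model per cluster is an alternative design that matches the idea of class-specific prediction models more literally, at the cost of less data per model.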