1. Introduction
1.1. Urban Form and CE
Carbon emissions (CE) from fossil fuels (e.g., paraffin, gas, coal, and natural gas) have driven global climate change (Du and Li, 2019; Qian et al., 2022), which results in more frequent natural disasters (Shi et al., 2022) and causes societal crises such as insecurity in potable water (Huang and Tao, 2020) and energy (Ryu et al., 2014). China, as one of the main emitters (Liu, Li and Ji, 2021), generates ~10 billion tons of CE annually, roughly one third of the global total (Joint Research Centre (European Commission) et al., 2023). In response, China has committed to achieving the “3060” goal with CE reduction measures across many sectors (CSC, 2021; He, Liu and Wang, 2022). Notably, the residential sector is the second largest emitter, accounting for 23% of the Total Final Consumption (TFC) of fossil fuels (Fan et al., 2013; Yuan, Wang and Zuo, 2013). Given rapid urbanization, the large population of urban dwellers and the corresponding lifecycle energy consumption of residential buildings play a crucial role (Park and Heo, 2007; Baiocchi, Minx and Hubacek, 2010; Cao et al., 2020).
Consequently, for China to transition successfully to a low-carbon economy, neighborhood-level CE reduction measures are essential (Cheng et al., 2022). The neighborhood is the basic spatial unit in China; it encompasses urban dwellers, their travel, and industrial production, and is thus a microcosm of the urbanization process (Zhang, Song and Yang, 2021). Therefore, urban form at the street block level reflects a city’s efficiency in allocating and using energy resources (Wang et al., 2019).
Along this line, this study hypothesizes that neighborhood-level urban form influences CE both directly and indirectly through multi-dimensional variables such as land use and building density. Fully understanding the interlinkages between the two can inform more sustainable urban development and help achieve the reduction goal (Zheng et al., 2023).
Understanding how urban form affects CE requires the capability to model greenhouse gas concentrations accurately, as well as a comprehensive dataset that captures the factors influencing CE at the individual and regional levels (Kumar et al., 2023). However, it has long been challenging to model complex urban environmental phenomena, which are highly variable in time and space (Jordan and Mitchell, 2015; Helm et al., 2020). Specifically, this study aims to address the following three gaps.
1.2. Knowledge Gap
First, the data sources for CE models are limited. Traditionally, predicting residential CE relies on multifaceted GIS data, namely energy consumption data together with socioeconomic and demographic datasets (e.g., the census, the household economics survey), to build a regression model. However, detailed energy consumption data are not available in many cities, and do not exist at all for some small cities owing to insufficient funding for CE data collection. Fine-grained population data do not exist everywhere either (Cai et al., 2021), and other data related to energy consumption are often available only at the city scale rather than at the mesoscale (Du, Liu and Li, 2024). A further challenge is that socio-economic data are usually on a different time scale from the energy consumption data. Therefore, conventional CE prediction models are not immediately applicable to a new region or a different period (Zheng et al., 2022).
Second, accuracy is often limited, given the increased complexity of urban form variables. Oftentimes, multiple sources are used to generate the multifaceted independent variables (e.g., land use, residential density, travel mode choice, traffic). However, the built environment and the corresponding residential activities are perpetually evolving, so the datasets for some variables will not be up to date (Ou et al., 2013, 2019; Fang, Wang and Li, 2015; Shu et al., 2018; Wang et al., 2019; Shi et al., 2020; Qiu et al., 2023). Building a timely model at the urban scale is therefore desirable but difficult. By contrast, street view imagery (SVI) data, which are frequently updated and open source (Qiu et al., 2022; Dong et al., 2023, 2023; Su, Li and Qiu, 2023), can describe changes in the built environment on at least a yearly basis. In addition, scholars use complex data sources hoping to cover more of the social situations related to carbon emissions, but this does not guarantee better model accuracy and can even be counterproductive (Bolón-Canedo and Remeseiro, 2020; Kabir and Garg, 2023).
Third, traditional models are generally built on satellite imagery and GIS data and ignore street-level information, which is better suited to modeling the neighborhood-level activities that consume fossil energy. For example, satellite imagery cannot fully describe urban form at a fine granularity because of sight obstructions such as tree canopies and limited view angles. Taking transportation CE (Xia et al., 2020) as an example, driving trajectory data are often used to estimate traffic flows and the corresponding CE. However, satellite images lack traffic information for many residential blocks because tree canopies obstruct the view. SVI, in contrast, can be used to infer traffic information for neighborhoods and is therefore promising for improving the accuracy of CE modeling.
1.3. Hypothesis and Research Design
The built environment consists of various factors that influence residential CE (Shen et al., 2022), ranging from urban greening (Vaccari et al., 2013; Shen et al., 2022; Dong et al., 2023), density (Liu et al., 2019), and building height and building quality (Tranchard, 2017) to public infrastructure (e.g., roads, bus stops) (Zhang et al., 2020). Notably, most of these factors can be extracted from SVIs. For example, the green view index is a proxy for greenery (Lu et al., 2023), which is important to carbon sequestration (Dwyer et al., 2000; Nowak and Crane, 2002; Birge et al., 2019), while the building view index is a proxy for building density and building height (Carrasco-Hernandez, Smedley and Webb, 2015; Gong et al., 2018), which significantly affect CE (Resch et al., 2016). Adequate public infrastructure and convenient transportation (e.g., roads, streetlights, bus stops) may suggest a more walkable and bikeable neighborhood whose residents have a higher tendency toward active travel (Li and Joh, 2017; Dong et al., 2023), resulting in lower CE (Zhang et al., 2020). A more developed economy with adequate infrastructure also relates to better-maintained buildings whose dwellers exhibit a stronger awareness of, and sense of obligation toward, low-carbon measures. Streetscape elements such as walls and fences can also indicate building quality: a more complex façade composition suggests a higher-quality building that is more likely to have HVAC installed and whose residents have higher incomes and tend to consume more energy. In other words, streetscape features extracted from SVIs can convey abundant information on dweller behavior, which can outweigh the impact of geometry itself when modeling energy use (Quan et al., 2016).
The micro-scale built environment described by SVI is also related to other indicators of residential behavior, including walkability (Ha et al., 2023), bikeability (Ito and Biljecki, 2021; Qiu and Chang, 2021; Song et al., 2023), running (Dong et al., 2023), and public transit ridership (Su et al., 2022), and therefore to mode choice (Koo et al., 2023; Wu, Yao and Wang, 2023) and active living (Sallis et al., 2006; Steinmetz-Wood et al., 2019). Moreover, SVI can be used to infer urban form attributes such as street canyons and density (Middel et al., 2019; Qiu et al., 2021) that explain local climate zones (Cao et al., 2022; Ignatius et al., 2022; X. Xu, Qiu, Li, Huang, et al., 2022), an effective indicator for modeling neighborhood microclimate, outdoor comfort, and urban heat island effects (Stewart and Oke, 2009, 2012; C. Xu et al., 2022), which ultimately influence energy usage and CE.
In terms of the feasibility of SVI as a data source, Google provides publicly available API access to frequently updated SVIs, while Baidu and Tencent are the dominant suppliers in China. SVIs have become a common substitute for time-consuming and costly field audits (Rundle et al., 2011; Griew et al., 2013; Kelly et al., 2013; Queralt et al., 2021), and are easy to deploy at the urban scale (Salesses, Schechtner and Hidalgo, 2013; Dubey et al., 2016). However, despite SVI’s large potential, its effectiveness has rarely been tested empirically. To fill this gap, this paper proposes an image-based framework that directly predicts residential CE from the micro-level streetscape features extracted from an SVI dataset.
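As an illustration of how such view indices can be derived, the sketch below computes the share of pixels per class in one semantically segmented street view image. It assumes the commonly used pixel-ratio definition of a view index; the function name `view_indices`, the class labels, and the toy mask are hypothetical and are not taken from this study's pipeline.

```python
from collections import Counter


def view_indices(label_map):
    """Return the share of pixels per class in one segmented street view image.

    label_map is a 2D list of class labels, one per pixel (assumed format).
    """
    counts = Counter(label for row in label_map for label in row)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}


# Toy 2 x 4 segmentation mask (hypothetical); a real mask would come from a
# semantic segmentation model applied to an SVI panorama.
mask = [
    ["sky", "sky", "tree", "building"],
    ["road", "road", "tree", "building"],
]

indices = view_indices(mask)
# Assumption: greenery is the sum of the "tree" and "grass" classes.
green_view_index = indices.get("tree", 0.0) + indices.get("grass", 0.0)
building_view_index = indices.get("building", 0.0)
```

Averaging such per-image ratios over all images in a spatial unit would give one feature vector per unit, which is one plausible way to feed streetscape features into a CE model.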
2. Literature Review
2.1. Conventional Urban Energy Models
Conventional urban CE models can be classified into three families based on methodology: 1) models that directly measure CO2 concentration from remotely sensed satellite data, for example, from the TanSat satellite (Hong et al., 2022); 2) models that aggregate sectoral emission data collected from sensors monitoring variable spatial grids ranging from a city to a household, among which “one square kilometer” is the most common resolution (Gregg and Andres, 2008); 3) models that relate global CE data to human societal indicators in smaller spatial units (Huang et al., 2022).
The first approach mainly translates observed spectral data into the distribution of carbon dioxide, thereby obtaining carbon flux information at the global or regional scale. It has become a key source for observing global and regional CO2 distribution (Crisp, 2010; Yoshida et al., 2011). Publicly accessible satellite datasets include Europe’s SCIAMACHY, the USA’s OCO-2 and OCO-3, Japan’s GOSAT and GOSAT-2, and China’s TanSat (Hong et al., 2022). Recent studies have showcased the capability to map and estimate regional CO2 emissions (Hakkarainen, Ialongo and Tamminen, 2016) as well as facility-scale CH4 fluxes in urban and complex areas (Thompson et al., 2016; Frankenberg and Berry, 2018). The merits and disadvantages of this method are equally evident: it offers frequent updates and global coverage of atmospheric CO2 levels, but it yields only CO2 data and depends on advances in satellite technology.
The second approach collects carbon data from sensors (Christen, 2014; Feng et al., 2016) or from simulated energy consumption and CE (Pao and Tsai, 2011), including fuel consumption conversions based on prior sensor data (Shao et al., 2016). It often determines the total CE of a given region from fossil energy consumption information disaggregated by sector, which is particularly prevalent in China. For example, China’s National Greenhouse Gas Inventory is created by experts from various fields together with the National Development and Reform Commission. They developed the “Provincial Greenhouse Gas Inventory Compilation Guidelines (PGGICG)” in 2011, covering sectors including waste disposal, land-use change, forestry, agriculture, production processes, and industrial and energy activities. In the US, Gurney et al. (2019) quantified CE from all fossil fuel consumption by sector with a bottom-up method, in which hourly emissions from citywide industrial/electricity facilities, road segments and individual buildings were measured. Notably, various datasets, such as building energy simulations, electricity production data, traffic insights, and local pollution reports, were merged to build the dataset. City sub-regions can also be modeled. For example, Wu, Guo and Peng (2003) measured the energy use intensity (EUI) of each building type using the building energy efficiency monitoring platform in Shanghai. Zhang, Pu and Zhu (2013) incorporated a traffic allocation model, the User Equilibrium (UE), with a gasoline consumption function to mimic traffic conditions. Although the versatility of these methods suits major cities in the more developed world, they are not immediately applicable to small and medium-sized cities in many developing countries, where no similar data sources exist.
The third approach disaggregates global CE data to a finer resolution using indicators that describe the built environment and industrial activities. This is possible because surface fluxes of atmospheric CO2 align strongly with bottom-up inventories (Schuh et al., 2013; Ogle et al., 2015) and with urban activity indicators such as land use (Jain, Meiyappan and Richardson, 2013; Chuai and Feng, 2019) and road length (Song et al., 2021). On the one hand, nighttime light (NTL) imagery has been found to reflect human activities correlated with energy consumption. The brightness of NTL pixels therefore correlates significantly with CE, enabling prediction across spatial and temporal scales. On the other hand, various urban layers, such as transportation networks (Ehsani, Ahmadi and Fadai, 2016; Sun et al., 2017), buildings (Boehme, Berger and Massier, 2015; Peng, 2016; Ahmad et al., 2018), and households (Pachauri, 2004; Druckman and Jackson, 2008), have been related to CE prediction (Kaya, 1989). Other explanatory factors include population (Ribeiro, Rybski and Kropp, 2019) and living standards (Baiocchi, Minx and Hubacek, 2010). This approach is particularly useful for the ex-ante assessment of alternative urban scenarios to support decisions such as urban retrofits aimed at low-carbon goals (Gately and Hutyra, 2017; Zhang, Song and Yang, 2021).
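The disaggregation idea behind the third approach can be sketched in its simplest form as a proportional allocation: a known regional CE total is distributed to grid cells in proportion to each cell's NTL brightness. This is a minimal sketch under that proportionality assumption; the cited studies typically fit regression models instead, and the function name, cell identifiers, and values below are hypothetical.

```python
def disaggregate_by_ntl(total_ce, ntl_brightness):
    """Allocate a regional CE total to grid cells in proportion to NTL brightness.

    ntl_brightness maps a cell identifier to its summed pixel brightness.
    Assumption: CE scales linearly with brightness within the region.
    """
    total_light = sum(ntl_brightness.values())
    if total_light <= 0:
        raise ValueError("total NTL brightness must be positive")
    return {
        cell: total_ce * brightness / total_light
        for cell, brightness in ntl_brightness.items()
    }


# Hypothetical region with three grid cells and a total of 1000 t of CE.
brightness = {"cell_a": 10.0, "cell_b": 30.0, "cell_c": 60.0}
allocation = disaggregate_by_ntl(1000.0, brightness)
```

The allocation preserves the regional total by construction, which is the property that makes top-down disaggregation consistent with the coarser inventory it starts from.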
2.2. Street View Image and AI to Model Urban Forms
Multifaceted natural, socio-economic, and human behavioral forces make neighborhood-level residential CE prediction challenging (Berkhout, Hertin and Jordan, 2002). Fortunately, with the rapid improvement of AI and the application of multi-source big data in urban studies, many of the urban form characteristics used to model CE have become more accessible to researchers (Li et al., 2022). Some studies focus on the complex relationships between total urban CE and a region’s industrial/economic development level or urban sprawl trend (Du et al., 2018; Wen and Shao, 2019). Others consider regularities in historical data (Zhou et al., 2021), that is, the cyclical trends in CE. For example, Wilson and Dowlatabadi (2007) studied the influence of household members’ environmental perceptions and energy consumption behavior on household CE. More recently, Jiang et al. (2019) modeled household travel patterns from neighborhoods’ urban forms to evaluate CE. An increasing number of models have started to address the interplay between people’s energy use habits and the environment they live in.
Meanwhile, SVI data are publicly available and frequently updated, capturing ground-level panoramic street scenes (Seiferling et al., 2017). SVI is an ideal dataset for comprehensively describing urban environmental variability. For example, it has been used to model buildings (Gurney et al., 2012), including building height (Yan and Huang, 2022), streetscape features (Wang, Liu and Gou, 2022), green and water systems (Jiang, Jiang and Shi, 2020), land use classification (Jain, Meiyappan and Richardson, 2013; Tian, Han and Xu, 2021; Fang et al., 2022), openness (Xia, Yabuki and Fukuda, 2021), road networks (Zhang et al., 2023), mobile monitoring (Sun et al., 2017) and POIs (Gao, Janowicz and Couclelis, 2017; Huang et al., 2022; Song et al., 2022; X. Xu, Qiu, Li, Liu, et al., 2022). However, although SVI is a new dataset receiving considerable attention in urban studies, only a few studies have attempted to parse SVIs to reflect the state of urban CE. For example, Yu et al. (2022) used SVIs as one of the datasets to model household travel CE in Jinan, China. However, SVIs represent only the road and the road-building relationship (i.e., the urban canyon) in their model. To fill this gap, this study sets out to assess the effectiveness of using SVI data to represent the urban forms related to residents’ energy use behaviors in order to predict residential CE.
4. Results and Discussions
4.1. Spatial and Temporal Distribution of Residential CE in Street Microenvironment
In general, high CE values occur in densely populated areas such as the city center. The CE of residents in different microenvironments shows significant spatial heterogeneity. For example, the unit CE of the suburban areas around Beijing is the lowest, with the CE in July ranging from 106 to 211 t/km²/month, while the unit CE is higher closer to the city center, where the density of residents is high and the CE in July ranges from 950 to 1,056 t/km²/month. In the eastern urban districts of Beijing, such as Chaoyang and Dongcheng, the overall summer CE in residential areas is higher than in the western urban districts, such as Changping and Haidian. This is probably because the eastern urban area is an older urban area with more resident activity and a higher population density, resulting in more CE.
The CE in Beijing’s residential areas therefore shows clear spatial heterogeneity. Meanwhile, the density of residents and the frequency of their activities are directly reflected in the street view, because residents’ activities largely shape what street view images capture. For example, a place with a higher population density generally has more resident activity, more residential buildings and a higher building density, which manifests as less greenery and more bounding walls. In addition, a place with more resident activity and a larger population has more vehicles in its street view images. Therefore, street view images can be used to predict residents’ CE and to reflect its spatial heterogeneity.
4.2. Co-linearity Check for the Independent Variables
We plotted a heatmap of pairwise correlation coefficients (Figure 6) between the streetscape visual features to examine potential co-linearity issues. Highly correlated variables are further discussed with reference to their Importance Features (IF) scores and to theory to decide whether they should be removed to reduce redundancy. For example, our results show that “earth” and “road” are highly correlated, prompting concerns about potential redundancy. However, in this case both “earth” and “road” are important because they indicate different aspects affecting residential energy use: “road” indicates travel modes and the mobility/accessibility related to travel frequency, whereas “earth” can affect the permeability of the land surface and the micro-climate. Therefore, both were kept.
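A pairwise co-linearity screen of this kind can be sketched as follows. The code computes Pearson correlation coefficients between every pair of feature columns and flags pairs whose absolute correlation exceeds a threshold. The threshold of 0.8, the helper names, and the toy feature values are assumptions for illustration only; they are not the values or the procedure used to produce Figure 6.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


def flag_collinear(features, threshold=0.8):
    """Return (name_a, name_b, r) for pairs with |r| >= threshold."""
    flagged = []
    for a, b in combinations(sorted(features), 2):
        r = pearson(features[a], features[b])
        if abs(r) >= threshold:
            flagged.append((a, b, r))
    return flagged


# Hypothetical per-location view ratios for three streetscape features.
features = {
    "earth": [0.10, 0.20, 0.30, 0.40, 0.50],
    "road": [0.12, 0.19, 0.33, 0.41, 0.48],
    "sky": [0.30, 0.10, 0.40, 0.20, 0.35],
}
pairs = flag_collinear(features)
```

Flagged pairs are candidates for review rather than automatic removal; a flagged pair can still be kept when theory suggests the two variables capture different mechanisms.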
4.3. The Roles of Micro-Level Built Environment Visual Features
The impact factor (IF) and feature importance (FI) analyses reveal a large divergence in which visual features are the more important determinants in predicting CE. On the one hand, the IF ranking based on linear regression coefficients indicates that bridge, streetlight, van, signboard, ashcan, chair, minibike, grass, earth, and railing are the most impactful (Figure 7). On the other hand, the FI analysis highlights that different visual elements are more effective when tree-based ML models are used (Figure 8). The top 10 features by FI are earth, sidewalk, tree, sky, road, building, fence, wall, chair, and grass. Given that OLS performs significantly worse (Table 2), the relationship between visual features and CE is more likely to be non-linear, and the FI analysis is therefore more reliable.
The Pearson correlation analysis shows that built environment features such as sidewalk, road, fence, building, and wall are strongly correlated with residential CE (Table 3). One possible explanation is that a high ratio of these elements indicates a high density of residents in the area, which results in a high frequency of activities that emit carbon and other greenhouse gases. For instance, a high ratio of buildings in SVIs might indicate frequent use of air conditioners in the buildings near the streets. This phenomenon is even more pronounced in this study because the CE data used in our model were collected in July, when the monthly average temperature in Beijing was 29℃ and air conditioners and other household appliances were widely used. A high density of residents also leads to more frequent vehicle use. Therefore, the road factor also indirectly determines the CE of an area: the higher the traffic volume, the larger the CE at the site.
In addition, better infrastructure (e.g., walls, fences) exists in higher-density residential areas. Elements such as walls, fences and buildings might reduce wind speed and slow the diffusion of carbon-containing gases, thus keeping the carbon content in these streets higher than in open streets with few obstacles. Meanwhile, Choi et al. (2016) found that block-scale UFP (ultrafine particle) concentrations are closely connected with the surface turbulence and built environment of buildings in urban areas. CE are likewise airborne and may be related in a similar way to the structures along the streets.
Natural features such as trees and grass are inversely related to residential CE, since plants absorb carbon dioxide through photosynthesis and thus reduce the carbon concentration in the street. In streets near parks and other green areas, where the microclimate is moderated by trees and other natural elements, the carbon concentration is relatively low.
4.4. Model Visualization and Model Application Scenarios
To better visualize the CE prediction results, ArcGIS was used to illustrate the difference between actual and predicted residential CE values within each 1 km urban grid (Figure 9). The actual CE values range between 177 and 748 t/km²/month; the estimated CE is therefore visualized on the same scale to make the two directly comparable.
Figure 10 depicts a relatively reliable prediction of CE values, as overall there are no marked divergences between the predicted and actual values. However, certain deviations were observed across the Beijing urban area. Notably, for a significant portion of the city the predicted CE is lower than the actual recorded value. This trend reverses at the urban fringes, where our model consistently predicts higher emissions than observed. This variance may indicate underlying complexities in urban-peripheral dynamics that the current model does not fully capture. These findings highlight where the model could be refined, especially at the city’s outskirts.
Figure 9 indicates that prediction accuracy is higher when the ground truth value falls within a certain range (350-550 t/km²/month). When the actual CE is very low or very high, the accuracy of the predicted values is lower. The range of actual CE is 177.72-748.10 t/km²/month, while the predicted range is 210.77-627.19 t/km²/month.
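The dependence of accuracy on the ground truth range can be examined by computing the mean absolute error (MAE) separately for each range of actual CE. The sketch below does this with bin edges at 350 and 550 t/km²/month, following the range mentioned above; the outer edges, the function names, and the five actual/predicted pairs are hypothetical toy values, not results from this study.

```python
def mae(actual, predicted):
    """Mean absolute error over paired actual and predicted values."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mae_by_range(actual, predicted, edges):
    """MAE per bin of actual values; bin i covers [edges[i], edges[i + 1])."""
    errors = {}
    for a, p in zip(actual, predicted):
        for low, high in zip(edges, edges[1:]):
            if low <= a < high:
                errors.setdefault((low, high), []).append(abs(a - p))
                break
    return {bounds: sum(v) / len(v) for bounds, v in errors.items()}


# Toy values in t/km2/month (hypothetical, for illustration only).
actual = [180.0, 300.0, 400.0, 500.0, 700.0]
predicted = [230.0, 330.0, 410.0, 490.0, 620.0]

overall_mae = mae(actual, predicted)
binned_mae = mae_by_range(actual, predicted, edges=[150, 350, 550, 750])
```

In this toy case the middle bin has the smallest error and the extremes the largest, and the predictions span a narrower range than the actual values, which mirrors the compression toward the mean described above.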
Given the spatial heterogeneity of the prediction residuals, we selected six urban areas of 16 square kilometers each to investigate the divergence between the actual and predicted data. These six areas are distributed across various parts of Beijing (Figure 10). Among them, the MAEs in Figure 10a,b,d are smaller, indicating better prediction accuracy. A comparison of Figure 10c,e shows considerable gaps in the prediction of extremely high and extremely low values, where accuracy is poorer. Figure 10f is close to the average level, but a gap remains when predicting higher CE values.
To the best of our knowledge, there are currently only a few mesoscale residential CE models (Jiang et al., 2022). As a cross-reference validation, we selected three similar studies that also focus on household residential and travel CE to compare with our CE model. Compared with these prior studies, our model achieves a similar level of accuracy with just one publicly available type of input data, while the others normally use more than five types of data inputs (Table 4).
In sum, this study not only proposes a model that can better predict residents’ carbon emissions at a small scale but also, more importantly, verifies the possibility of using street view imagery, a simple data source, to predict residents’ carbon emissions across a wide geographical region. A more timely and finer-grained carbon emission prediction model can potentially be established for cities where data availability is limited, especially those in developing countries.
5. Conclusions and Limitations
5.1. Effects of Micro-Level Streetscape Attributes
Our SVI-based prediction model is a novel tool for predicting residential CE in meso-scale urban areas from publicly available street view images (i.e., Google/Baidu Maps). The model tracks temporal and spatial changes well. Because it can be used for both data visualization and prediction, it can provide useful CE data for policy makers and urban planners working on public environmental protection. The model reflects the influence of specific features on residential CE, so it could allow urban designers to run simulation experiments on specific environmental factors. For researchers, our approach presents a new perspective on prediction and extends the application of machine learning across disciplines. In addition, the visual output of the model makes it possible for ordinary citizens to participate in public decision-making and in choosing where to live.
This study makes the following contributions. First, it shows that, compared with other street view elements, sidewalk, road, fence, building façade and wall are highly correlated with residential CE. Second, a model trained on data from one district in Beijing can estimate the residential CE of another district in Beijing relatively precisely. The training results from this model could be applied not only to suburban areas but also to urban areas, showing potential for generalization. The transferability of the model can provide a reference for further research on regional CE in the long term. Third, studying the connection between CE and streetscape elements can support the creation of urban environments under the concept of low-carbon design, giving the goals of sustainable development and carbon neutrality a foothold from which they can be promoted and optimized on a large scale.
5.2. Limitations
In the previous discussion, we compared the SVI features and the differences in CE of specific areas in Chaoyang, including prosperous areas with high population density, CBD areas, suburbs, and industrial areas, to discuss the model’s transferability across urban scenarios. Although the experimental area selected in this study has the richest and most diverse urban forms in Beijing, the model’s transferability to other cities remains to be verified, especially for cities whose energy consumption composition and residents’ living habits differ greatly from those of Beijing. In addition, the model is affected by the characteristics and uncertainties of the input data when estimating residential CE. During training and data screening, the quality and representativeness of the data directly affect model performance. In this study, 22 street feature variables were finally adopted after screening and comparison, but these do not necessarily represent all relevant variables at the street level. Meanwhile, the street scenes we study focus mainly on urban arterial roads, and data for pedestrian blocks are lacking. For pedestrian or commercial blocks obscured by other elements (such as canopies and billboards), there may be certain errors. Finally, since the residential CE data grid is 1 km, all street observation points within the same grid cell share the same CE value, so the obtained residential CE value cannot fully represent where the CE occur. Ideally, each street observation point should have a corresponding accurate residential CE value.