1. Introduction
In recent years, global energy demand has surged, driven by a combination of factors, including a growing global population and rising living standards [1]. While these advancements have undoubtedly enhanced the quality of life, they have also substantially increased carbon emissions. This rise in emissions is a significant contributor to climate change, resulting in severe consequences such as extreme weather events, altered weather patterns, and global warming [2]. As the global population continues to grow, energy demand is expected to escalate further, making it challenging to balance energy requirements with environmental sustainability. Enhancing energy efficiency is therefore a widely discussed topic in the energy field. The focus is on finding ways to use less energy while maintaining or improving the quality of the services provided. This concept applies to various sectors, including buildings, transportation, and industry, where systems are designed with energy efficiency in mind [3]. Improving energy efficiency in these sectors can also yield significant economic benefits by reducing the expenses associated with energy consumption.
Buildings are responsible for approximately 40% of the world’s energy consumption and around 36% of total carbon dioxide emissions [4]. Within building systems, the heating, ventilation, and air conditioning (HVAC) system is the largest consumer, accounting for 40% of the energy usage [5]. Notably, the HVAC system plays a vital role in regulating indoor temperature [6, 7], ensuring thermal comfort [8], and enhancing indoor air quality (IAQ) [9]. Buildings are designed to provide occupants with a pleasant and comfortable environment. However, the number and presence of occupants directly affect energy consumption, and various studies have explored the connection between occupants’ behavior and energy consumption. In [10], a scientific approach was adopted to quantify occupants’ behavior consistently. This approach encompassed factors such as occupants’ presence, movement, and interactions with the energy systems installed in buildings. The objective was to integrate these behavioral aspects into building performance simulation programs for a comprehensive analysis. A comprehensive review [11] highlighted that a lack of adaptation to occupancy variations and insufficient consideration of irregular or partial occupancy are primary sources of inefficiency in building systems. Prior research has demonstrated significant reductions in building energy usage by aligning HVAC systems with actual occupancy patterns [12, 13]. For example, the simulation results in [14] indicate that incorporating occupancy information can achieve energy savings ranging from 11% to 34% across different climatic regions while maintaining occupant comfort levels.
A review article [15] on occupancy prediction research distinguishes two categories: "occupancy detection/estimation" [16, 17] and "occupancy forecast." To understand these categories, it is essential to consider the concept of the prediction window. In the context of occupancy prediction, "occupancy detection" pertains to predicting the occupancy for the current time step, providing real-time information about the current occupancy state. On the other hand, "occupancy forecast" involves predicting the occupancy for a future time step, enabling insights into future occupancy patterns. Research that specifically forecasts occupancy for future time windows remains limited and is often conflated with detection methods. As a result, the significance of occupancy forecasting may be overlooked or confused with real-time occupancy detection.
Predicting future occupancy holds promise for improving building operations and energy efficiency [18]. Nevertheless, accurately predicting occupancy is complex due to the stochastic nature of occupant presence and the inherent variability in individual behavior [19, 20]. Ongoing advancements in sensor technologies, data analytics, and prediction algorithms offer promising avenues for enhancing the accuracy and reliability of future occupancy predictions. In the past decade, there have been substantial advancements in forecast algorithms, leading to significant improvements in the accuracy of occupancy predictions. These forecast methods can be categorized into four main groups [15]: conventional statistical approaches (e.g., Markov chain-based [21] and recursive models [22]), unsupervised machine learning approaches (e.g., k-means, k-nearest neighbor techniques, and support vector clustering [23, 24]), supervised machine learning approaches (e.g., gradient boosting [25], support vector regression [26], decision tree [27], random forest [28], and deep neural networks [29]), and hybrid approaches. Each group employs different techniques and methodologies to forecast occupancy patterns and behaviors.
Occupancy prediction has attracted significant research interest. However, a common trend in this field is that most researchers rely on a single source to collect input data [15]. While this approach may seem expedient and convenient, it carries inherent risks. If a sensor fails, the collected data may miss valuable information, resulting in incomplete and potentially unreliable predictions. Moreover, every data source has its limitations. For example, PIR occupancy sensors are widely used in lighting controls, but they have limitations in detecting stationary occupants [30]. CO2-based approaches have constraints such as low sensitivity to occupant mobility and slow response to drastic occupancy changes [31]. WiFi-based monitoring systems may also face challenges such as connection problems, limited battery life of connected devices, and connections too poor for accurate occupancy detection in large-scale buildings with many occupants [32]. These factors can significantly affect the performance of the data collection approaches. Therefore, a better solution is to use multi-source data collection. With this approach, researchers can reduce the risk of missing valuable information if one source fails. This strategy is especially critical in occupancy prediction settings where the collected information carries significant weight. Using multiple sources helps make predictions more accurate, reliable, and comprehensive. However, the use of multi-source data for occupancy prediction in buildings remains underexplored.
To address the above problems, we introduce an Occupancy Prediction Transformer network (OPTnet) for building occupancy prediction. It is designed to robustly predict occupancy presence across diverse rooms and time horizons. We fuse multi-sensor data and feed it into a Transformer model to obtain the future occupancy presence in multiple zones. We provide an experimental analysis and comparison across existing occupancy prediction methods and diverse time horizons. The main contributions of this paper are as follows:
We introduce OPTnet, a Transformer-based multi-sensor building occupancy prediction network to learn an effective fused representation.
To predict occupancy accurately, we process two weeks of real operating sensor data from a multi-zone office building, including building occupancy, indoor environmental conditions, and HVAC operations.
Through experimental analysis and comparison, we found that the OPTnet method outperformed existing algorithms (e.g., Decision Tree (DT), Long Short-Term Memory networks (LSTM), Multi-Layer Perceptron (MLP)).
Considering both short- and long-term occupancy prediction applications, we provide a comprehensive analysis and comparison of diverse time horizons to highlight the importance of choosing a suitable time horizon.
2. Methods
This research aims to predict occupancy using a dataset from our previous work [33]. The dataset was collected in an office building in Hebei Province, China, from August 9 to August 21, 2021, spanning two weeks. Further details regarding the dataset can be found in Section 3.
The research framework in this study consists of four key steps. First, sensor data and corresponding occupant information were collected from the office building, serving as the ground truth for occupancy. Second, data normalization techniques were applied to standardize all features to a single scale or range. Third, occupancy prediction was performed using the OPTnet, and its performance was compared with various machine learning algorithms, including DT, LSTM, and MLP. Lastly, performance evaluation metrics were employed to assess the accuracy and effectiveness of the prediction algorithms. The methodology for each step is explained in the following subsections and illustrated in Figure 1.
2.1. Data collection
The dataset employed in this study was constructed from 829,440 data points derived from the system’s operational data from August 9 to August 21, 2021. To ensure the dataset’s relevance, it was filtered to include only instances when the HVAC system was active, specifically weekdays (Monday to Friday) from 9:00 am to 7:00 pm. Consequently, the dataset comprises data collected at a resolution of 1 minute during the operational hours of the HVAC system. It spans two weeks, August 9 to 13 (week 1) and August 16 to 20 (week 2), amounting to 10 days. Each day, 600 data samples are recorded (10 hours × 60 minutes), each comprising 54 numerical values (6 zones × 9 features). Therefore, the complete dataset encompasses 324,000 numerical values (10 × 600 × 54).
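The size bookkeeping above can be checked with a few lines of arithmetic. This is a minimal sketch; the variable names are illustrative and not taken from the original data pipeline.

```python
# Bookkeeping for the filtered dataset size described in the text.
# All variable names are illustrative assumptions.
MINUTES_PER_HOUR = 60
hvac_hours_per_day = 19 - 9   # HVAC active from 9:00 to 19:00
weekdays = 10                 # August 9-13 and August 16-20, 2021
zones = 6                     # zones with available data
features_per_zone = 9

samples_per_day = hvac_hours_per_day * MINUTES_PER_HOUR   # 1-minute resolution
values_per_sample = zones * features_per_zone
total_values = weekdays * samples_per_day * values_per_sample

print(samples_per_day, values_per_sample, total_values)
```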
2.2. Data preparation
The dataset utilized in the system consists of raw data with numerical values spanning diverse ranges. For instance, the indoor temperature in each zone fluctuates between 22 and 32 °C, the number of occupants varies from 0 to 10, and the control signal for the FCUs (Fan Coil Units) ranges from 0 to 3.
Such disparate distributions in the raw data introduce complexity during the training process. To address this challenge, data normalization techniques are employed. Normalization mitigates the risk of gradient explosion in deep learning, accelerates convergence, stabilizes training, and enhances the model’s overall performance [34]. Prior to inputting the data into the algorithms, all raw data undergo min-max normalization, shown here for the number of occupants:
$$o'_t = \frac{o_t - o_{\min}}{o_{\max} - o_{\min}}$$
where $o_t$ denotes the true number of occupants at time $t$, $o'_t$ is its normalized value, and $o_{\min}$ and $o_{\max}$ respectively denote the minimum and maximum number of occupants.
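Furthermore, when prediction algorithms generate the predicted occupancy, the output value is converted to the corresponding number of occupants through the following transformation:
$$\hat{o}_t = \hat{o}'_t \,(o_{\max} - o_{\min}) + o_{\min}$$
where the predicted value of the algorithms is denoted as $\hat{o}'_t$, the predicted number of occupants is represented by $\hat{o}_t$, and $o_{\min}$ and $o_{\max}$ are the minimum and maximum number of occupants.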
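The normalization and its inverse can be sketched as min-max scaling, assuming that is the intended transform (the text defines only a minimum and a maximum). The function names and the sample values are hypothetical.

```python
def min_max_normalize(values, lo, hi):
    """Scale raw values into [0, 1] given a known minimum and maximum."""
    span = hi - lo
    if span == 0:
        raise ValueError("minimum and maximum must differ")
    return [(v - lo) / span for v in values]


def denormalize(scaled, lo, hi):
    """Map values in [0, 1] back to the original range."""
    return [s * (hi - lo) + lo for s in scaled]


# Hypothetical occupant counts; the text gives a range of 0 to 10 occupants.
occupants = [0, 3, 10, 7]
scaled = min_max_normalize(occupants, 0, 10)
restored = denormalize(scaled, 0, 10)
print(scaled)
print(restored)
```

If a whole-number count is needed after denormalization, rounding the restored value is one option; whether the original pipeline does so is not stated.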
2.3. Algorithms
This research paper presents OPTnet, which is designed explicitly for building occupancy prediction. Furthermore, a comprehensive comparative analysis is conducted to assess the performance of this algorithm in comparison to established machine learning techniques, including DT, LSTM, and MLP. The subsequent sections of the paper delve into detailed explanations and insights into these algorithms’ underlying principles and operational mechanisms.
2.3.1. Decision Tree
DT models categorize and generalize datasets into predefined classes in data analysis and machine learning. The primary objective of a decision tree is to construct a classification model that can predict the value of a target attribute (response) based on multiple input attributes (predictors). Each internal (non-leaf) node within the decision tree corresponds to one of the predictors, and the number of branches emerging from a categorical internal node equals the number of possible values of the associated predictor. The leaf nodes represent specific values of the response variable and are reached by traversing the path from the root node, which is the starting point of the tree, to the final leaf (possible answers) [35].
2.3.2. Long Short-Term Memory networks
The LSTM network is a specialized variant of recurrent neural networks (RNNs) developed in 1997 [36]. Designed to address the challenges posed by vanishing and exploding gradients in standard RNNs, LSTM networks are trained with the Back-propagation Through Time (BPTT) algorithm and excel in tasks involving long-term dependencies. In contrast to conventional neuron-based architectures, LSTM networks feature memory blocks consisting of memory cell units capable of retaining state values over extended periods. Moreover, these memory blocks incorporate three distinct gate units responsible for learning how to preserve, utilize, or discard states as needed. The memory blocks are connected through layers [37].
2.3.3. Multi-Layer Perceptron
The MLP is a feed-forward artificial neural network (ANN) that draws inspiration from the functioning of the human brain [38]. This network comprises at least three layers of neurons, specifically the input, hidden, and output layers. By employing activation functions in every layer except the input layer, the MLP can capture non-linear relationships between predictor variables and labels. In this study, the rectified linear unit (ReLU) activation function is implemented in the hidden layers, as it is commonly recommended for developing neural networks [39]. Additionally, linear and sigmoid functions are adopted in the output layers for regression and classification models, respectively. During training, backpropagation, a supervised learning technique, is used to determine the weights and bias values of each neuron.
2.3.4. Occupancy Prediction Transformer network
With the rapid development of ChatGPT and visual foundation models, the Transformer has become a state-of-the-art (SOTA) deep learning method. The Transformer is a neural network well suited to sequence-to-sequence (seq2seq) tasks and to learning rich representations of sequential data. Like earlier RNN-based seq2seq models, the Transformer follows the encoder-decoder architecture to learn aggregated hidden-layer features. Unlike RNNs, the Transformer does not process data in sequential order but processes the sequential input data in parallel. In particular, encoders and decoders, composed of multiple self-attention layers, are stacked to extract multi-layer features. Multi-head attention mechanisms are applied to learn the correlation between tokens.
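To make the attention step concrete, the following is a minimal sketch of single-head scaled dot-product self-attention in plain Python. It is a simplification for illustration only: the learned query, key, and value projections are replaced by the identity, and it is not the OPTnet implementation. The function names and the toy tokens are assumptions.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of scores."""
    peak = max(row)
    exps = [math.exp(x - peak) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens):
    """Single-head scaled dot-product self-attention.

    tokens: list of T vectors of equal length d.
    Assumption: queries, keys, and values all equal the input tokens
    (identity projections), which a trained model would not use.
    """
    d = len(tokens[0])
    outputs, weights = [], []
    for query in tokens:
        # Similarity of this token to every token, scaled by sqrt(d).
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(d)
            for key in tokens
        ]
        w = softmax(scores)
        weights.append(w)
        # Weighted average of the value vectors.
        outputs.append(
            [sum(wj * value[i] for wj, value in zip(w, tokens)) for i in range(d)]
        )
    return outputs, weights


# Three toy 2-dimensional tokens (hypothetical values).
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
outputs, weights = self_attention(tokens)
print(weights[0])
```

Each row of `weights` sums to one and every token attends to all others in a single step, which is what lets the computation run in parallel over the sequence.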
In this paper, we develop the OPTnet as our occupancy prediction model. We formulate occupancy prediction as a sequence prediction problem: the historical occupancy and environmental factors are the OPTnet’s inputs, and the future occupancy information (presence or number) is the OPTnet’s output. The structure of the OPTnet is shown in Figure 2.
2.4. Performance Evaluation
We define two evaluation indicators for occupancy prediction as follows:
$$\mathrm{MSE} = \frac{1}{N}\sum_{t=1}^{N}\left(o_t - \hat{o}_t\right)^2, \qquad \mathrm{Accuracy} = \frac{1}{N}\sum_{t=1}^{N}\mathbb{1}\left(\hat{o}_t = o_t\right)$$
where $o_t$ is the true number of occupants at time $t$, $\hat{o}_t$ represents the predicted number of occupants, and $N$ is the time length. The Mean Squared Error (MSE) indicates the difference between the predicted value and the ground truth, while Accuracy indicates the hit rate of occupancy prediction. The performance is better when the MSE is smaller and the Accuracy is larger.
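The two indicators can be sketched in a few lines, reading Accuracy as the fraction of time steps where the prediction equals the ground truth (one interpretation of "hit rate"). The function names and the sample labels are hypothetical.

```python
def mean_squared_error(y_true, y_pred):
    """Average squared difference between ground truth and prediction."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def accuracy(y_true, y_pred):
    """Fraction of time steps where the prediction matches the ground truth."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    hits = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return hits / len(y_true)


# Hypothetical presence labels (1 = occupied, 0 = empty) over four minutes.
y_true = [1, 0, 1, 1]
y_pred = [1, 0, 0, 1]
print(mean_squared_error(y_true, y_pred))  # 0.25
print(accuracy(y_true, y_pred))            # 0.75
```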
3. Experiments
3.1. Experimental environment
Our experimental system in Hebei, China, represents a multi-zone office environment. It comprises seven distinct working zones, a refrigeration station, and an activity room. The system includes an air source heat pump (HP) for cooling, variable frequency water pumps for circulation, fan coil units (FCUs) for indoor HVAC control, and cameras for video capture to monitor the environment.
The building is divided into nine regions: an activity room, a refrigeration station, and seven working zones. Zones 4 to 6 are virtual partitions of one larger zone, following the specifications of the IoT system deployment. Figure 3 provides an overview of the entire office building system, showing the arrangement and function of each zone. Note that data from zone 3, the financial office, is not publicly accessible. Therefore, our experiments focus on the remaining zones (1, 2, and 4-7).
Each zone has an indoor temperature sensor, a device management panel for controlling the FCU, a video recording camera, and a Jetson Nano for video processing and zone occupancy estimation. Table 1 lists the relevant sensors and actuators. Please refer to our previous work [33] for more detailed information on the experimental system and IoT architectures.
3.2. Experimental data
Given the substantial influence of office occupants on the energy performance of an office, the experimental data for the occupancy prediction model is based on the routines and behaviors of these occupants. These data are categorized into four groups: calendar, occupancy, indoor environment, and HVAC control.
The calendar information: We collected the sensor data from 9:00 to 19:00 during the five working days (from Monday to Friday) and weekends (Saturday and Sunday).
The occupancy information: We captured videos from our cameras. Then, we analyzed and estimated occupancy presence (1 or 0) in each room using advanced artificial intelligence technologies. The time resolution is 1 min. The duty ratios of occupancy in the zones are shown in Table 2. The duty ratios vary across zones, indicating that the practical dataset covers diverse occupancy patterns.
The indoor environment information: We collected the indoor temperature and relative humidity data, directly affecting the occupants’ thermal comfort. The temperature and relative humidity data can be used to predict future occupancy.
The HVAC control information: Our HVAC system employs FCUs for control. The control signals (FCU temperature feedback, FCU control mode, FCU on/off feedback, and FCU fan feedback) are considered for occupancy prediction.
3.3. Experimental parameters
A recommendation in [40] emphasizes the importance and effectiveness of occupancy prediction models for occupancy-based HVAC control systems.
In our experiments, we chose the historical multi-sensor data (including occupancy presence and HVAC control: FCU temp feedback, FCU control mode, FCU on-off feedback, FCU fan feedback, room temperature 1 and 2, room relative humidity 1 and 2) as the methods’ inputs. We chose occupancy presence (1 or 0) as the methods’ outputs. We compare widely used occupancy prediction methods (DT, LSTM, MLP) with OPTnet. To compare LSTM and OPTnet fairly, we fixed the hyperparameters:
The LSTM and OPTnet models are trained with the Adam optimizer for 20 epochs.
The learning rate is .
The batch size is 4.
The LSTM and OPTnet each have 6 layers.
The number of fully-connected layers is 5.
The dropout of the last layer is 0.5.
The loss function is MSE.
In addition, to cover both short- and long-term prediction applications, we used 30 minutes of historical multi-sensor data to predict occupancy presence at diverse time horizons. In other words, we compared the occupancy prediction performance at 1-min, 2-min, 5-min, 10-min, 20-min, and 30-min horizons.
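The pairing of a 30-minute history with a target at a given horizon can be sketched as a sliding window over the 1-minute samples. This is an illustrative sketch, not the original preprocessing code; the function name, the row layout (occupancy presence in column 0), and the synthetic day are assumptions.

```python
def make_windows(series, history=30, horizon=1):
    """Build (input window, target) pairs from per-minute feature rows.

    series: list of feature rows; row[0] is assumed to be occupancy presence.
    history: number of past minutes fed to the model.
    horizon: minutes between the last input sample and the target.
    """
    samples = []
    last_start = len(series) - history - horizon
    for start in range(last_start + 1):
        window = series[start:start + history]
        target = series[start + history + horizon - 1][0]
        samples.append((window, target))
    return samples


# Synthetic day: 600 one-minute rows of [presence, temperature].
day = [[i % 2, 25.0] for i in range(600)]
for h in (1, 2, 5, 10, 20, 30):
    print(h, len(make_windows(day, history=30, horizon=h)))
```

Longer horizons leave fewer usable samples per day. Building windows separately for each day, so that no window spans the overnight gap, is a design choice assumed here rather than one stated in the text.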
4. Results and Discussion
The DT, LSTM, MLP, and OPTnet algorithms were implemented on a dataset obtained from a multi-zone office building. The performance of these algorithms at different time horizons was evaluated by measuring their accuracy and MSE values for each zone. The accuracy and the corresponding MSE are presented in Table 3 and Table 4, respectively.
4.1. OPTnet vs. (LSTM, MLP, DT)
In Table 3, the OPTnet shows high accuracy values for both weeks in Zones 1 and 4. Across various time horizons, the OPTnet consistently outperforms the other machine learning algorithms in these zones. In Zone 2, the accuracy of the DT algorithm is consistently perfect (the accuracy is 1) for week 1 but lower than that of the other machine learning algorithms for week 2. In contrast, the OPTnet shows high and consistent accuracy values for week 1 across different time horizons, and even higher accuracy values for week 2, surpassing the performance of the other machine learning algorithms.
In Zone 5, the OPTnet achieves high accuracy values for both weeks, particularly for shorter time horizons such as 1 minute and 2 minutes. However, as the time horizon increases, the accuracy of the OPTnet decreases. In Zone 6, the accuracy values of the MLP algorithm are consistently higher than those of the other machine learning algorithms for different time horizons in both weeks. Lastly, in Zone 7, the OPTnet demonstrates higher accuracy values than the other machine learning algorithms for week 1, while the MLP algorithm outperforms the other algorithms for week 2.
Analysis of the accuracy and MSE values for each zone shows that the OPTnet mostly outperforms the other machine learning algorithms across the diverse duty ratios in Table 2. Here are some reasons why the OPTnet outperforms these methods:
Occupancy patterns in buildings can exhibit long-range dependencies, where the presence or absence of occupants in one area can impact the occupancy in other areas. The self-attention mechanism in the OPTnet allows it to capture such long-range dependencies effectively. In contrast, DT, LSTM, and MLP struggle to model these dependencies explicitly.
Occupancy patterns often have temporal dynamics, where the presence or absence of occupants at one time influences the future occupancy. The OPTnet, with its self-attention mechanism, can capture these temporal dynamics by attending to relevant past occupancy information at each time step. On the other hand, DT typically considers each time step independently, LSTM focuses on short-term dependencies, and MLP lacks inherent mechanisms for capturing temporal dynamics.
OPTnet can use parallel computation, making it highly scalable and efficient, especially when dealing with large datasets. This scalability allows the Transformer model to handle complex occupancy prediction tasks efficiently. In comparison, DT, LSTM, and MLP may be less scalable and computationally efficient, particularly when dealing with longer sequences or large datasets.
OPTnet has shown robustness to noisy data due to its ability to attend to relevant information and suppress noise during the attention mechanism. This robustness can benefit occupancy prediction tasks, where the data may contain missing or noisy observations. DT, LSTM, and MLP are more sensitive to noisy data and require additional preprocessing or regularization techniques to handle such scenarios.
The superior performance of the OPTnet highlights its effectiveness in accurately predicting occupancy patterns within different building zones. This outcome underscores the significance of utilizing the OPTnet as a reliable and robust approach for occupancy prediction in diverse environments. The improved performance of the OPTnet signifies its potential to enhance the efficiency and effectiveness of various applications that rely on accurate occupancy forecasts, such as HVAC control systems, energy optimization strategies, and overall building management.
4.2. Time horizons vs. performance
We noticed a clear pattern after analyzing the performance of the OPTnet and other machine learning algorithms across different time horizons. As the time horizon gets longer, the accuracy of each algorithm tends to decrease while the MSE value tends to increase. This finding highlights the importance of selecting an appropriate time horizon based on the specific application requirements. Short time horizons are beneficial for applications needing near-immediate occupancy predictions or real-time monitoring, since they allow short-term occupancy changes to be captured and responded to with higher accuracy. On the other hand, longer time horizons are better for applications that focus on long-term occupancy forecasting and trend analysis. Even though the accuracy may be slightly lower, having a broader view of occupancy patterns and trends over a more extended period is valuable for tasks like energy planning, resource allocation, and managing occupancy in the long run.
There are two exceptions to the trend that accuracy decreases as the horizon increases. In Zones 4 and 6, we found that the accuracy increases. As shown in Table 2, the duty ratios of occupancy in Zones 4 and 6 are very high (0.919, 0.816, 0.898, 0.898). With increasing horizons (1, 2, 5, 10, 20, 30 mins), the weight of occupancy presence in the historical occupancy data becomes larger while the weight of occupancy absence becomes smaller. Thus, the OPTnet and the machine learning algorithms become more conservative to achieve high accuracy, even though we use balanced class weights to train our models. As the time horizon grows, conservative models converge to the class with the highest duty ratio and predict occupancy presence. This is a limitation of our proposed framework.
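The effect of a high duty ratio on accuracy can be illustrated with a constant predictor. The sketch below uses synthetic labels whose duty ratio matches one of the quoted values (0.919); it does not use the experimental data. Balanced accuracy is shown only as one possible diagnostic and is not a metric reported in this paper; the function names are hypothetical.

```python
def accuracy(y_true, y_pred):
    """Fraction of samples where the prediction matches the label."""
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall over the classes present in y_true."""
    recalls = []
    for cls in (0, 1):
        idx = [i for i, t in enumerate(y_true) if t == cls]
        if idx:
            recalls.append(sum(1 for i in idx if y_pred[i] == cls) / len(idx))
    return sum(recalls) / len(recalls)


# Synthetic labels with a duty ratio of 0.919 (919 occupied minutes out of 1000).
labels = [1] * 919 + [0] * 81
always_present = [1] * len(labels)   # a model that always predicts presence

print(accuracy(labels, always_present))           # 0.919
print(balanced_accuracy(labels, always_present))  # 0.5
```

A model that collapses to "always present" scores an accuracy equal to the duty ratio while learning nothing about absences, which is consistent with the rising accuracy observed at long horizons in highly occupied zones.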
5. Conclusion
Buildings are responsible for a significant portion of global energy consumption, accounting for around 40% of the total. Additionally, they contribute approximately 36% of total carbon dioxide emissions. The occupancy of buildings plays a crucial role in achieving Occupant-Centric Control for zero emissions and decarbonization. This paper introduced the OPTnet for occupancy prediction to address these challenges. The framework integrates data from multiple sensors, including building occupancy, indoor environmental conditions, and HVAC operations, into an OPTnet to forecast future occupancy presence in multiple zones. Experimental analysis and comparisons were conducted among various occupancy prediction methods, including OPTnet, DT, LSTM, and MLP, across different time horizons (1, 2, 5, 10, 20, and 30 minutes). Our OPTnet method mostly performs better than the existing methods when applied to our practical two-week dataset. This improved performance highlights its potential to enhance HVAC control systems and energy optimization strategies. By accurately predicting occupancy patterns, the OPTnet-based approach can contribute to more efficient building management and ultimately help reduce energy consumption and environmental impact.