The difficulty of NILM in Quebec can be ascribed to load specifications that create a disaggregation scenario with multiple complexities [36]. NILM complexity is mainly rooted in the number of appliances, the differences in their power levels, and the frequency of their operating-state changes, all of which are present in this case. In addition to common loads, Quebec dwellings are mostly equipped with electric space and water heating systems due to particular geographical conditions. Each house is equipped with several Electric Baseboard Heaters (EBHs) (8 to 12 units) with high switching frequencies. These loads can have power levels similar not only to each other (almost identical for the same products) but also to other energy-intensive appliances such as Electric Water Heaters (EWHs), washing machines, and dryers. Additionally, owing to their high power usage and lengthy operation time, EBHs can distort the low-power trajectories of a wide range of devices such as fridges, freezers, lighting, entertainment equipment, and even kettles and microwaves. This set includes more than half of the appliances (three out of five) targeted by load disaggregation in the basic research, as mentioned previously. EWHs aggravate this situation with their large demand, short duration, and regular presence. These loads share similar issues with EBHs, such as hindering the detection of other loads’ operation [22]. EWHs are also a dominant load in other geographical locations. For instance, in New Zealand households, water heating presents the highest demand share, at 27%, and is thus considered a crucial element of demand-side management strategies [37]. The aggregate load profile under such conditions yields a monitoring scenario rarely encountered by current disaggregation algorithms. The main reason for this neglect is that the methods focus on public databases that hardly exemplify cases like Quebec. In [38], the authors investigate the impact of EBHs on aggregate power consumption by adding the overall demand of only four baseboards to a daily load profile from the ECO dataset. For further analysis of the Quebec context, smart meter data from ten residences with 15-minute sampling intervals is utilized. For these houses, aggregate and circuit-level power consumption at a 1-minute sampling rate is also available for a more detailed investigation, although such measurements are not available under real-world conditions.
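The gap between the 1-minute measurements and the 15-minute smart meter readings can be illustrated with a minimal sketch. It assumes that a 15-minute reading is the mean of the fifteen 1-minute samples it covers; the text does not state how the meters aggregate, and the function name `downsample_mean` and the sample trace are hypothetical.

```python
from statistics import mean


def downsample_mean(power_w, factor=15):
    """Average consecutive blocks of `factor` samples.

    Assumption: a coarse reading is the mean of the fine samples it covers.
    A trailing partial block is dropped.
    """
    n_blocks = len(power_w) // factor
    return [mean(power_w[i * factor:(i + 1) * factor]) for i in range(n_blocks)]


# Hypothetical 1-minute trace (W): 15 minutes at a 100 W base load, then a
# 3000 W heater switched on for only 5 of the following 15 minutes.
one_minute = [100.0] * 15 + [3100.0] * 5 + [100.0] * 10
fifteen_minute = downsample_mean(one_minute)
print(fifteen_minute)
```

In this toy trace the 3 kW event appears as a 1.1 kW reading after averaging, which is one way to see why short, frequent switching is hard to recover from 15-minute data.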
3.2. Quebec comparative data statistics
The careful observation of power consumption patterns in the sample houses can be elaborated by a comparative analysis based on public databases. For this purpose, UK-DALE and ECO, which are among the ten most cited datasets, are used. The former has been broadly utilized in recent DL-based studies, and the latter provides challenging cases for a NILM task [5, 15]. In addition, these popular datasets hold measurements over an extended period of time, which can reveal power consumption patterns affected by weather conditions and customers’ activities. This feature, which is the reason for excluding REDD, is essential for a sensible comparison with the Quebec data, since the behavior of the latter varies widely with seasonal changes. Insufficient measurement is also the reason for omitting House 3 of UK-DALE from the analysis. The statistical exploration is extended with data from a load monitoring study funded by the CleverGuard (CG) project¹ in Switzerland [39]. This confidential information was obtained from a set of customers and recently shared with the authors as part of collaborative research work. Although the CG data is private, it is referred to as public for simplification. For the Quebec case, the statistics are explored for both aggregate and group-level data comprising domestic and TH loads.
Figure 6 depicts the distribution of the public and Quebec data for the corresponding houses; a 15-minute sampling rate is used to keep the data size manageable. The labels in this figure combine the dataset name and house number; for example, QH2 stands for Quebec House 2. In general, the level of power demand is not comparable between the two cases in any region of the distribution, even when only the Quebec domestic load, which can share similar appliances with the public databases, is considered. With regard to the interquartile and whisker ranges, which represent half of the data and all non-outlier data, respectively, it can be deduced that EBHs, EWHs, and case-specific devices remarkably change the spread of demand in Quebec houses.
With reference to the outlier ranges in the public data, this region most likely contains samples related to energy-intensive loads, given their power levels and operation schedules, except for ECO Houses 4 and 5 (ECOH4 and ECOH5) and CG House 4 (CGH4). Therefore, it can be stated that in most cases major appliances operate in distinguishable regions of public load profiles, since outliers are data points that differ significantly from the rest of the samples. Such a circumstance facilitates identifying these types of devices, such as washing machines, dishwashers, and kettles, which are targeted loads in the NILM literature. In contrast, the outlier range in the Quebec data distinguishes no appliances, either targeted or non-targeted.
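The box-plot quantities discussed above (interquartile range, whiskers, and outliers) can be computed as in the following minimal sketch. It assumes the common Tukey convention with fences at 1.5 times the interquartile range; the text does not state which convention the figure uses, and `box_stats` and the sample values are hypothetical.

```python
from statistics import quantiles


def box_stats(power_w, k=1.5):
    """Quartiles, whisker ends, and outliers of a power series.

    Assumption: Tukey fences at k * IQR beyond the quartiles (k = 1.5).
    """
    q1, median, q3 = quantiles(power_w, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence, high_fence = q1 - k * iqr, q3 + k * iqr
    inside = [p for p in power_w if low_fence <= p <= high_fence]
    outliers = [p for p in power_w if p < low_fence or p > high_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "whisker_low": min(inside),    # lowest sample within the fences
        "whisker_high": max(inside),   # highest sample within the fences
        "outliers": outliers,
    }


# Hypothetical house: a narrow base load plus one short high-power event (W).
sample = [80, 90, 100, 110, 120, 130, 140, 150, 2000]
stats = box_stats(sample)
print(stats)
```

Here the single 2000 W sample falls outside the upper fence, mirroring the situation in which a major appliance occupies the outlier region of a public load profile.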
Figure 7 shows the frequency histograms of the public and Quebec data. For the former, a 1-minute sampling period has been used to better approximate the active power of the appliances’ ON state and to provide insights into probable groups of targeted appliances. It can be observed that a substantial portion of samples in the public data has a power demand of less than 500 W. Such similarity could be challenging for NILM if it represented the demands of several targeted devices. However, the only major operation in this power band relates to the fridge. This can be confirmed by investigating these databases at the appliance level and using narrower power intervals in the analysis. For example, close to 50% of samples carry a load of less than 200 W in around 70% of cases. Besides, a minor fraction of samples lies above 1 kW, which interestingly contains the power values of other targeted appliances. In particular, the washing machine, dishwasher, and kettle operate in this range according to the appliance-level information of the associated datasets. Given that these loads exhibit an operation schedule, such specificity can assist with their identification. Indeed, in all cases, a cluster with power values over 1300 W can be identified, which stands apart from 90% of all the data. On the other hand, none of the above distinctive patterns is manifested by the frequency histogram of the Quebec data, specifically regarding the appliance target space. In this case, only 50% of samples cover a wide range of up to 3 kW, split into several groups with significant frequencies.
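The band shares quoted above (for example, the fraction of samples below 200 W or above 1 kW) can be obtained with a simple count per power band. The sketch below is illustrative only; the band edges are taken from the thresholds mentioned in the text, while `band_shares` and the sample values are hypothetical.

```python
from bisect import bisect_right


def band_shares(power_w, edges=(200, 500, 1000)):
    """Share of samples per power band.

    Bands are [min, e0), [e0, e1), ..., [e_last, max]; `edges` must be sorted.
    """
    counts = [0] * (len(edges) + 1)
    for p in power_w:
        counts[bisect_right(edges, p)] += 1
    return [c / len(power_w) for c in counts]


# Hypothetical 1-minute aggregate readings (W) for one public-data house.
sample = [60, 90, 120, 150, 180, 350, 450, 700, 1500, 2200]
shares = band_shares(sample)
print(shares)  # shares for <200 W, 200-500 W, 500-1000 W, >=1000 W
```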
The power consumption pattern is another characteristic that can provide sensible insights into the data. Exploring this property can help improve demand-side management strategies by clarifying how customers use their electrical appliances, especially with respect to activity cycles and climate conditions [40]. Although pattern recognition should be an essential service of any load monitoring system in terms of energy-saving awareness, popular databases are inadequate for this practice due to limited data length and quality. This can be observed in Figure 8, where it is challenging to determine a common period from which to draw inferences about behavioral differences among end-users. Notwithstanding three years of data acquisition, the UK-DALE database suffers seriously from missing data, and the available data is scattered across dissimilar time periods, except for House 1. However, a continuous pattern has been extracted by combining the relevant houses over six months. House 3 does not offer sufficient readings even for an individual analysis. The ECO dataset is subject to the same problem to a lesser degree since, except for Houses 3 and 6 with notable missing data, power values are available for roughly the entire measurement period. Nonetheless, it can be noticed that the diurnal behavior per month is similar within each case study. Slightly higher variations over time can be detected in the ECO data.
Since the selected period partially covers both warm and cold seasons, this observation indicates a weaker correlation with environmental factors and a stronger relationship with calendar ones. Such a pattern tendency can be attributed to both the type of in-use appliances, especially heating/cooling systems, and the climate conditions. These are the same reasons for which the Quebec data exhibits a significantly different usage pattern, as shown in Figure 9 (a). Weather and calendar components strongly influence the power consumption behavior in Quebec houses. A major share of this impact can be attributed to the notable load of EBHs, as shown in Figure 9 (b) [41]. It should be noted that a total of eight houses have been used for the second study due to the lack of measurement data at the main circuit for two houses.
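A per-month diurnal pattern of the kind compared above can be sketched by averaging readings over each (month, hour-of-day) pair. This is only an illustration under that assumption, not necessarily the procedure behind the figures; `monthly_diurnal_profile` and the synthetic readings are hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from statistics import mean


def monthly_diurnal_profile(timestamps, power_w):
    """Mean power for every (month, hour-of-day) pair present in the data."""
    buckets = defaultdict(list)
    for ts, p in zip(timestamps, power_w):
        buckets[(ts.month, ts.hour)].append(p)
    return {key: mean(values) for key, values in buckets.items()}


# Synthetic 15-minute readings (W): two days in January with a high base load
# and two days in July with a low one, each with a morning peak at 07:00.
timestamps, power = [], []
for start, base in ((datetime(2021, 1, 10), 2500.0), (datetime(2021, 7, 10), 600.0)):
    for step in range(2 * 96):  # 96 readings per day
        ts = start + timedelta(minutes=15 * step)
        timestamps.append(ts)
        power.append(base + (1500.0 if ts.hour == 7 else 0.0))

profile = monthly_diurnal_profile(timestamps, power)
print(profile[(1, 7)], profile[(7, 7)], profile[(7, 3)])
```

Comparing the January and July entries for the same hour gives a simple view of how strongly the daily shape depends on the season.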
Exploring power demand time series in terms of their systematic and unsystematic components is another useful analysis for data characterization. A seasonality study can be employed for this purpose. Figure 10 exemplifies this exercise for two cases. From the public databases, ECO is considered because of its demand pattern, and House 4 is selected because of its power distribution, which signifies the presence of energy-demanding loads, especially seasonal ones. The examination is carried out using the multiplicative model, since it is a better choice for time-varying behavior and avoids the difficulty of interpreting negative values. Furthermore, this model proves to perform better in capturing peaks through its seasonal component. With regard to the residual component, it can be noticed that the public case contains a large amount of unpredictable/noisy information, with fluctuations almost similar to those of the main signal. This shows that a notable amount of data is not consistent with the rest. The trend and seasonal components show a general upward slope and a clear periodic pattern, respectively, but they are relatively weak compared with the residual. Their joint contribution, i.e., the systematic information, can be estimated by multiplying these two components. It is observed that the systematic information contributes inadequately to explaining the usage. On the other hand, the Quebec data is characterized by valuable systematic information, particularly in winter, where the model is able to describe the demand well. In addition, TH has a great influence on the seasonality of the aggregate load. This impact, along with the level of overall systematic information, supports a seasonality-based NILM approach to disaggregating the EBH load. Besides, the results demonstrate that a classical decomposition is not an efficient choice for the Quebec case, since the seasonality of the data changes strongly within the year. It should be noted that the seasonality of the other public instances is weaker than that of the selected one, specifically for the UK-DALE and CG data.
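For reference, the multiplicative model writes the demand as the product of trend, seasonal, and residual components. The sketch below implements the textbook classical decomposition (centered moving-average trend, one fixed seasonal cycle), which is an assumption here and not necessarily the procedure used for Figure 10; the function names and the synthetic series are hypothetical.

```python
from statistics import mean


def centered_moving_average(y, period):
    """Trend estimate; a 2 x m average (half-weighted ends) for even periods."""
    half = period // 2
    trend = [None] * len(y)
    for t in range(half, len(y) - half):
        window = y[t - half:t + half + 1]
        total = sum(window)
        if period % 2 == 0:
            total -= 0.5 * (window[0] + window[-1])
        trend[t] = total / period
    return trend


def multiplicative_decompose(y, period):
    """Classical decomposition y = trend * seasonal * residual (y > 0)."""
    trend = centered_moving_average(y, period)
    ratios = [[] for _ in range(period)]
    for t, (obs, tr) in enumerate(zip(y, trend)):
        if tr is not None:
            ratios[t % period].append(obs / tr)
    raw = [mean(r) for r in ratios]
    scale = mean(raw)  # normalise so the seasonal factors average to 1
    cycle = [s / scale for s in raw]
    seasonal = [cycle[t % period] for t in range(len(y))]
    residual = [None if tr is None else obs / (tr * s)
                for obs, tr, s in zip(y, trend, seasonal)]
    return trend, seasonal, residual


# Synthetic demand: a constant 1000 W level times a repeating 4-step pattern.
pattern = [0.5, 1.0, 1.5, 1.0]
demand = [1000.0 * pattern[t % 4] for t in range(20)]
trend, seasonal, residual = multiplicative_decompose(demand, period=4)
# The systematic part is the product of the trend and seasonal components.
systematic = [None if tr is None else tr * s for tr, s in zip(trend, seasonal)]
```

For this purely periodic series the residual equals 1 wherever the trend is defined, i.e., the systematic part explains the demand completely. Because the seasonal cycle is fixed over the whole series, this classical form cannot follow seasonality that changes within the year.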
In order to discover the real-world relationships among data instances and suggest generalizable hypotheses, a correlation analysis can be conducted. Figure 11 presents the results of this investigation for the houses from all datasets, except for UK-DALE due to the dissimilarity of the available data periods. It can be seen that the correlation between household usage in the public data is not even moderate, evidencing a notable difference in the households’ tendencies toward operating electrical appliances. Similar behavior can be detected in the Quebec domestic load. However, a moderate correlation can be noticed across the overall load, which is rooted in the medium to high correlation between TH demands, as was the case for seasonality.
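A pairwise correlation between household load series can be sketched as follows. The Pearson coefficient is assumed here because the text does not name the coefficient used; `pearson`, `correlation_matrix`, and the house series are hypothetical, and the series are assumed to be aligned on a common time period.

```python
from math import sqrt
from statistics import mean


def pearson(x, y):
    """Pearson correlation coefficient of two equally long series."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def correlation_matrix(series_by_house):
    """Correlation for every ordered pair of houses, keyed by (house, house)."""
    names = list(series_by_house)
    return {(a, b): pearson(series_by_house[a], series_by_house[b])
            for a in names for b in names}


# Hypothetical time-aligned demand series (W) for three houses.
houses = {
    "H1": [500.0, 900.0, 1500.0, 2600.0],
    "H2": [600.0, 1000.0, 1600.0, 2700.0],   # same shape as H1, shifted
    "H3": [2600.0, 1500.0, 900.0, 500.0],    # H1 reversed in time
}
corr = correlation_matrix(houses)
print(round(corr[("H1", "H2")], 3), round(corr[("H1", "H3")], 3))
```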
The above analysis has aimed to reveal statistics that have not been adequately considered in related research. Indeed, it is evident that the Quebec data contains a far greater number of events than the public data, owing to the larger number of appliances with a high switching frequency, i.e., EBHs. In every aspect examined, the statistical analysis demonstrates massive differences between the public and Quebec data. From a practical standpoint, a load monitoring practice should approach this case with a different set of targeted appliances. The group of interest must certainly contain EBHs due to their share of demand, their impact on its characteristics, and their potential applications for HEMSs. It should also include EWHs due to their major usage and regular presence, which make them responsible for the most rapid rises in household demand throughout the year. Besides, the statistical study indicates that disaggregating the appliances favored in the literature, especially the fridge, kettle, and microwave, from the Quebec data is a burdensome task. This exercise becomes more difficult given that actual readings have a sampling rate of 15 minutes.
Figure 12 shows daily household load profiles recorded by a smart meter in warm and cold seasons. Complex operational curves, seasonal variations, and continuous changes at lower demands are the challenging features of these profiles. In summer, the EWH represents the most notable demand, contributing solely or partially to almost all load peaks. As a result, it gives the total usage a similar pattern in both shape and magnitude. In winter, the EBHs show their remarkable influence by transforming the domestic load, as shown in this figure. Another important property is the significant level of unknown demand, defined as the difference between the main circuit and the total loads, i.e., the domestic and aggregate usages in Figure 10 (a) and (b), respectively. This underlying issue can seriously impact the performance of a NILM practice. Accordingly, it can be stated that NILM faces a completely different case in Quebec dwellings.
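The unknown demand mentioned above can be computed per sample as the main-circuit reading minus the sum of the sub-metered loads. The sketch below assumes time-aligned series; `unknown_demand`, the circuit names, and all values are hypothetical.

```python
def unknown_demand(main_w, circuits_w):
    """Main-circuit reading minus the sum of sub-metered circuits, per sample."""
    totals = [sum(sample) for sample in zip(*circuits_w.values())]
    return [m - t for m, t in zip(main_w, totals)]


# Hypothetical aligned readings (W) for three time steps.
main = [1200.0, 4300.0, 3900.0]
circuits = {
    "EBH": [0.0, 2500.0, 2500.0],
    "EWH": [0.0, 0.0, 1000.0],
    "domestic": [900.0, 1300.0, 200.0],
}
unknown = unknown_demand(main, circuits)
unknown_share = sum(unknown) / sum(main)  # fraction of energy left unexplained
print(unknown, round(unknown_share, 3))
```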