From home energy management systems to energy communities: methods and data

Ruano, Antonio; Ruano, Maria da Graça

doi:10.1038/s41597-024-03184-5


Data Descriptor
Open access
Published: 06 April 2024


Scientific Data volume 11, Article number: 346 (2024)


Abstract

This paper introduces the HEMStoEC database, which contains data recorded over more than three years in the course of two research projects, NILMforIHEM and HEMS2IEA. To be manageable, the dataset is divided into months, from January 2020 until February 2023. It consists of: (a) electric consumption data for four houses in a neighbourhood situated in the south of Portugal, (b) weather data for that location, (c) photovoltaic and battery data, (d) inside climate data, and (e) operation of several electric devices in one of the four houses. Raw data, sampled at 1 second and 1 minute, are available from the different sensing devices, as well as synchronous data with a common sampling interval of 5 minutes. Gaps existing within the data, as well as periods where interpolation was used, are documented for each month of data.


Background & Summary

Over the last two decades, global electricity consumption has been growing at a reported average yearly rate of 3.1%. Buildings, and in particular the residential sector, are one of the largest consumer sectors. Efficiently managing the flow of electricity in a house is important not only for the owner’s electricity bill, but also for global consumption and for the electrical grids. In fact, traditional grids find it difficult to cope with this increasing demand, a difficulty exacerbated by the extensive integration of variable energy resources, such as renewable energy systems.

The present dataset is the result of two projects, NILMforIHEM and HEMS2IEA. The aims of the first project were to improve the performance of existing non-intrusive load monitoring algorithms and the efficiency of energy systems in homes. The second project, using the results of the first, aimed to propose new energy management techniques for local energy communities managed by an aggregator. It was considered that the aggregator would interface with each residential management system and with the electricity grid, allowing electricity to be managed in accordance with different community contracts. The dataset enables the research community to investigate several different topics related to the efficient use of energy in households and communities. A brief review of these topics follows.

Home energy management systems

The goal of a Home Energy Management System (HEMS) is to manage the flow of electricity in the house efficiently, so that the electricity bill is reduced or eliminated while the comfort of the occupants is maintained. Despite the large interest of the research community, the complexity and diversity of the systems, together with the use of suboptimal control strategies, mean that energy consumption is still higher than necessary and that users are unable to achieve full comfort in their homes. Excellent reviews detailing HEMS developments in recent years are available; please consult the reviews of Beaudin and Zareipour¹, Leitão and co-workers², Mahapatra and Nayyar³ or Gomes et al.⁴. According to this last reference, HEMS can be broadly divided into four classes: traditional techniques; model predictive control, also known as model-based predictive control (MBPC); heuristics and metaheuristics; and other techniques. The first class comprises methods based on traditional optimization techniques, typically using commercial solvers. Perhaps the most important sub-class within traditional methods is Mixed-Integer Linear Programming (MILP), which refers to optimization problems where the objective function is linear and subject to linear constraints, and where the decision variables are a mix of continuous and discrete variables. Examples of household energy management based on MILP are the works of:

a) Lu et al.⁵, where the proposed HEMS is compared with other energy management systems through case studies, showing that the proposed model reduces energy costs in both summer and winter;
b) Baek et al.⁶, where results with and without demand response are compared, demonstrating that the strategy with demand response is superior;
c) Lyu et al.⁷, where the proposed methodology reduces household costs by 53% and the Peak-to-Average Ratio (PAR) by around 70%.

Model-based predictive control is an advanced control technique based on a receding horizon principle, aimed at determining the best sequence of actions while meeting the requirements. The application of MBPC in HEMS has increased significantly in recent years. For instance, in Mirakhorli et al.⁸ a HEMS for a residential building with a Photovoltaic (PV) system, Electric Storage System (ESS), thermal and electric loads, and Electric Vehicles (EV) is proposed. The MBPC problem considered a prediction horizon of four hours for every five minutes. Rao and co-workers⁹ propose a HEMS for a smart home focusing on the energy balance between the three phases to control both active and reactive power. Several case studies are considered, assuming a prediction horizon of twenty-four hours, a control horizon of twenty-four hours, and a simulation horizon of forty-eight hours. A comprehensive mixed-integer quadratic-programming MPC scheme, based on the thermal building model and the building energy management system, is employed by Killian and co-workers¹⁰.
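The receding horizon principle can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from the cited works: the first-order room model, its coefficients, the cost weights, and the function names are all hypothetical, and the optimization is a brute-force enumeration of on/off sequences rather than a MILP or quadratic-programming solver. At every step the controller optimizes over the whole horizon but applies only the first action.

```python
from itertools import product


def simulate(temp_c, actions, outdoor_c=10.0, leak=0.1, heat_gain_c=1.0):
    """Roll a toy first-order room model forward; return the temperatures."""
    temps = []
    for u in actions:
        temp_c = temp_c + leak * (outdoor_c - temp_c) + heat_gain_c * u
        temps.append(temp_c)
    return temps


def plan_cost(temp_c, actions, setpoint_c, price, discomfort_weight):
    """Energy cost plus a penalty for every degree below the setpoint."""
    temps = simulate(temp_c, actions)
    energy = price * sum(actions)
    discomfort = sum(max(0.0, setpoint_c - t) for t in temps)
    return energy + discomfort_weight * discomfort


def receding_horizon(temp_c, steps, horizon=4, setpoint_c=20.0,
                     price=1.0, discomfort_weight=10.0):
    """Optimise over the horizon at each step, apply only the first action."""
    applied = []
    for _ in range(steps):
        # Enumerate every on/off heater sequence over the prediction horizon.
        best = min(
            product((0, 1), repeat=horizon),
            key=lambda seq: plan_cost(temp_c, seq, setpoint_c,
                                      price, discomfort_weight),
        )
        applied.append(best[0])
        # Move the plant one step with the applied action, then re-plan.
        temp_c = simulate(temp_c, best[:1])[0]
    return applied


print(receding_horizon(18.0, steps=6))
```

Enumeration grows as 2^horizon, which is why practical MBPC formulations rely on dedicated solvers; the sketch only shows the plan, apply, re-plan loop.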

Heating, ventilation and air conditioning systems

It is recognized that nearly 40% of the energy consumed in buildings (see Pérez-Lombard and co-workers¹¹) is due to the operation of Heating, Ventilation and Air Conditioning (HVAC) systems. For this reason, special care should be devoted to this specific equipment. MBPC is perhaps the most frequently proposed technique for HVAC control, since it offers an enormous potential for energy savings. Typically, what is sought is the minimization of the energy spent, or of the electricity bill incurred, in the HVAC operation, while simultaneously maintaining the room(s) under thermal comfort. Thermal comfort can be assessed in different ways, the most used being temperature regulation. In some cases, the relative humidity is also maintained within user-defined bounds. In recent years, the Predicted Mean Vote (PMV) has been increasingly used. The PMV index is based on human thermal sensation, which is strongly related to the energy balance of the body when the human body is considered in a heat balance situation, i.e., the heat produced by metabolism equals the net loss of heat. The classical way of obtaining the PMV index was presented by Fanger¹² and depends on six variables: metabolic rate, clothing insulation, air temperature, relative humidity, air velocity, and mean radiant temperature.

For HVAC control, MBPC can be applied in several different ways. Donaisky and co-workers¹³ minimized the PMV index, generating a nonlinear PMV model having a Wiener structure. Ma et al.¹⁴ employ a simple thermal mass model to minimize a cost function employing economic costs. Castilla et al.¹⁵ minimize the PMV index, using a PMMPC model. In Chen’s work¹⁶ the energy is (indirectly) minimized, using constraints on the thermal sensation scale, where the use of the PMV index is compared with an Actual Mean Vote index. A simple thermal model is used in this approach. In Huang et al.¹⁷ a neural network is used to optimize a start-stop strategy for temperature-regulated control. Li et al.¹⁸ minimize the energy spent and violations of bounds on air temperature, using a state-space formulation for the prediction of these variables.

Non-intrusive load monitoring

Energy monitoring is a key point of a HEMS; it can be done by installing measuring devices at every load of interest or by using Non-Intrusive Load Monitoring (NILM) methods, which disaggregate the overall usage from a measurement of the load at the utility service entry. Research, however, is still needed in this field, especially on simple algorithms that require neither special-purpose hardware nor high-sampling-rate power data.

Excellent reviews on NILM algorithms can be found in the works of Georgios Angelis et al.¹⁹ and Ruano and co-workers²⁰,²¹.

The main stages in a NILM application are²¹:

a) Data collection: electrical data, including current, voltage, and power data, are obtained from smart meters, acquisition boards or by using specific hardware;
b) Event detection: an event is any change in the steady state of an appliance over time. An event implies variations in power and current, which can be detected in the electrical data previously collected by means of thresholds;
c) Feature extraction: appliances provide load signature information or features that can be used to distinguish one appliance from another;
d) Load identification: using the features previously identified, a classification procedure takes place to determine which appliances are operating at a specified time or period, and/or their states.

Regarding step (a), the most important point to consider is the sampling interval applied to the electrical signals. Sampling rates can broadly be classified into very low (slower than one sample per minute), low (between one sample per minute and one sample per second), medium (sampling frequency between one and fifty/sixty Hz), high (from fifty/sixty Hz to two kHz), very high (between two and forty kHz) and extremely high (greater than forty kHz). Another point to take into consideration is the hardware used to acquire the data. Commercial devices typically only achieve very low and low frequencies; higher sampling frequencies need specialized hardware. Related to that are the data storage and processing requirements, which obviously increase with the sampling frequency employed.
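The classification above can be written as a small lookup, sketched here in Python. The function name, the `mains_hz` parameter (fifty or sixty Hz, depending on the grid) and the treatment of values that fall exactly on a boundary are assumptions of this sketch; the text does not say to which class a boundary value belongs.

```python
def sampling_class(freq_hz, mains_hz=50.0):
    """Map a sampling frequency in Hz to the classes listed in the text.

    Boundary values are assigned to the lower class, which is an assumption.
    """
    if freq_hz <= 0:
        raise ValueError("sampling frequency must be positive")
    if freq_hz < 1.0 / 60.0:      # slower than one sample per minute
        return "very low"
    if freq_hz <= 1.0:            # one per minute up to one per second
        return "low"
    if freq_hz <= mains_hz:       # up to the mains frequency
        return "medium"
    if freq_hz <= 2_000.0:        # up to two kHz
        return "high"
    if freq_hz <= 40_000.0:       # up to forty kHz
        return "very high"
    return "extremely high"


# One sample per second, one per minute, and one every five minutes.
for f in (1.0, 1.0 / 60.0, 1.0 / 300.0):
    print(f, sampling_class(f))
```

Under this reading, data sampled every second or every minute fall in the low class, and data sampled every five minutes fall in the very low class.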

Focusing now on step (b), according to the work of Anderson et al.²², event detectors typically use three different approaches: expert heuristics, probabilistic models and matched filters. The first consists of the creation of a set of rules for each appliance. Initial NILM works used this approach. Probabilistic models provide a probability, which is used to make a decision about the occurrence of events. A particularly well-known case is the Generalized Likelihood Ratio (GLR) method (please see Anderson’s work²²). Finally, matched filters are characterized by extracting the signal waveforms and correlating them with known patterns.
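As a minimal illustration of rule-based event detection, the Python sketch below flags an event whenever the aggregate power changes by more than a threshold between consecutive samples. The function name, the 30 W threshold and the sample values are hypothetical; this is neither the GLR method nor any detector from the cited works.

```python
def detect_events(power_w, threshold_w=30.0):
    """Return (sample index, power step) for every step above the threshold."""
    events = []
    for i in range(1, len(power_w)):
        delta = power_w[i] - power_w[i - 1]
        if abs(delta) >= threshold_w:
            events.append((i, delta))
    return events


# Hypothetical aggregate power: an appliance switches on, then off.
aggregate_w = [100, 102, 1100, 1098, 101]
print(detect_events(aggregate_w))
```

A positive step suggests an appliance switching on and a negative step one switching off; real detectors usually add filtering so that noise and slow ramps do not trigger false events.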

The features that can be used to identify an appliance are obviously related to the sampling rate employed. For very low and low frequencies, active, apparent and reactive powers are often used, together with Root-Mean-Square values of the current or voltage. Medium-rate acquisition allows the use of transient features of the electrical signals. High sampling rates allow the use of spectral features such as harmonics (see the work of Meehan et al.²³), the Discrete Wavelet Transform (Chang and co-workers²⁴), and so on. Very-high-rate data provide much more detail about each appliance’s waveform, either from the higher harmonics or from the shape of the raw current and voltage waveforms themselves. Two-dimensional voltage-current (V-I) trajectories were used in the investigation of Hassan and co-workers²⁵.

Using the features described above, computed from the aggregate load, the objective in step (d) is to identify the appliances that are operating at a given time. This can be formulated as an optimization or a classification problem. Four appliance types are usually considered:

Type I—On/off devices: most appliances in households, such as bulbs and toasters;
Type II—Finite-State-Machines (FSM): the appliances in this category present states, typically in a periodical fashion. Examples are washer/dryers, refrigerators, and so on;
Type III—Continuously Varying Devices: the power of these appliances varies over time, but not in a periodic fashion. Examples are dimmers and tools.
Type IV—Permanent Consumer Devices: these are devices with constant power but that operate 24 h, such as alarms and external power supplies.

Thus, for type II appliances, identification means determining not only which appliances are active, but also their states.

A very large number of techniques have been proposed for this step. They can be very broadly classified as optimization methods and machine learning (supervised and unsupervised) techniques. Optimization approaches use different methods to perform a combinatorial search. Examples are hybrid programming (the work of Kong et al.²⁶), genetic algorithms (the paper by Egarter, Sobe & Elmenreich²⁷) and others. Supervised techniques use offline training to build a database of information used to design the classifier(s). This is the most widely employed class of methods; the works of Chang et al.²⁴, Kelly & Knottenbelt²⁸ and Wu and Wang²⁹ belong to this class. Unsupervised methods do not require any training prior to classification, which is an important advantage. Feature clustering, followed by the labelling of each cluster with meaningful appliance names, has been applied by Yang and co-workers³⁰. The most recent unsupervised techniques applied to NILM belong to a family of methods that assume that the electrical signal is the output of a stochastic system, maintaining a representation of the whole system state instead of dealing with individual events. Examples are Hidden Markov Models (HMM) and variants (please see the works of Cutsem et al.³¹ and Kong et al.³²).
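The combinatorial search behind the optimization approaches can be sketched in Python for the simplest case of on/off (Type I) appliances. The appliance names and rated powers are hypothetical, and the exhaustive enumeration stands in for the hybrid programming or genetic algorithms of the cited works, which exist precisely because enumeration does not scale.

```python
from itertools import product


def disaggregate(aggregate_w, rated_w):
    """Find the on/off combination whose summed rated power best matches
    the measured aggregate power (exhaustive search over 2^n combinations)."""
    names = list(rated_w)
    best_states, best_err = None, float("inf")
    for states in product((False, True), repeat=len(names)):
        total = sum(rated_w[n] for n, on in zip(names, states) if on)
        err = abs(aggregate_w - total)
        if err < best_err:
            best_states, best_err = states, err
    return dict(zip(names, best_states))


# Hypothetical rated powers in watts.
ratings = {"kettle": 2000.0, "fridge": 150.0, "tv": 100.0}
print(disaggregate(2150.0, ratings))
```

With appliances of similar rated power the solution becomes ambiguous, which is one reason why additional features and appliance state models are used in practice.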

Forecasting

Another important point for HEMS is the ability to forecast the values of variables that are important for energy management. Several forecasts are necessary, such as the home load demand (either global or appliance-based), the electricity produced by renewable energy sources (if available), weather variables, occupancy, and inside climate. The better the quality of the estimation, the better the electricity management that can be achieved.

Forecasting techniques can be viewed from several points of view, such as: (a) the time-scales involved; (b) the exogenous variables used in the model; and (c) the methods applied. Regarding the first, time-scales can vary from horizons of a few seconds or minutes (intra-hour or very short forecasts, for control and adjustment actions), through a few hours (intra-day or short/medium, for energy resource planning and scheduling, as well as for the electricity market), to a few days ahead (intra-week or long, for unit commitment and maintenance schedules). The choice of employing exogenous variables and, if so, which variables are used depends essentially on the model application. Finally, looking at the methods, in the general case they can be broadly divided into statistical and machine learning methods (obviously, forecasting of specific variables may employ other classes of methods). Statistical models are typically linear models such as persistence forecasts, Auto-Regressive (AR), Auto-Regressive Moving-Average (ARMA), and Auto-Regressive Integrated Moving Average (ARIMA) models. Machine learning methods are the most used nowadays and typically comprise several different shallow and deep neural networks, whether isolated or fusing different models.
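The two simplest statistical models mentioned above can be sketched in a few lines of Python. The function names and the toy series are hypothetical, and the AR(1) model is fitted by least squares without an intercept, which is a simplifying assumption; a real application would normally remove the mean or use a dedicated time-series library.

```python
def persistence_forecast(series, steps):
    """Persistence: every future value equals the last observed value."""
    return [series[-1]] * steps


def fit_ar1(series):
    """Least-squares estimate of phi in x[t] = phi * x[t-1] (no intercept)."""
    num = sum(cur * prev for prev, cur in zip(series, series[1:]))
    den = sum(prev * prev for prev in series[:-1])
    return num / den


def ar1_forecast(series, steps):
    """Iterate the fitted AR(1) model forward for the requested steps."""
    phi = fit_ar1(series)
    forecasts, last = [], series[-1]
    for _ in range(steps):
        last = phi * last
        forecasts.append(last)
    return forecasts


toy_series = [1.0, 2.0, 4.0, 8.0]   # hypothetical values
print(persistence_forecast(toy_series, 2))
print(ar1_forecast(toy_series, 2))
```

Persistence is often used as the baseline against which more elaborate forecasting models are compared.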

Regarding PV power forecasting, several reviews exist on the topic. The interested reader can consult, for instance, the works of Alcañiz et al.³³ or Pandžić and Capuder³⁴, and the references within. Forecasting PV power also requires forecasting atmospheric variables, such as solar irradiation (please see the work of El-Amarty et al.³⁵), air temperature (Tran et al.³⁶), and possibly others. As examples, Yang and co-workers³⁷ proposed a hybrid scheme involving classification, training, and forecasting stages. This scheme is used for one-day-ahead hourly forecasting of PV output. Fonseca and co-workers³⁸ compare the suitability of a non-parametric distribution and three parametric distributions in characterizing prediction intervals for photovoltaic energy forecasts with high levels of confidence. Mei et al.³⁹ propose an LSTM-Quantile Regression Averaging-based nonparametric probabilistic forecasting model for PV output power.

Household load demand forecasting is an active area of research as, on the one hand, it makes the occupants aware of the energy consumption of their own house and, consequently, allows them to take measures to reduce this consumption and the energy bill and, on the other hand, it enables a more efficient operation of the HEMS. In recent years, computational intelligence techniques have largely replaced physical-based methods, as the former do not require knowledge of the building geometry and physical phenomena to deduce an accurate prediction model. Several reviews exist on this topic, such as those of Foucquier⁴⁰, Wei et al.⁴¹, Ahmad et al.⁴² and Wen et al.⁴³. As in the case of PV forecasting, different exogenous variables can be applied to the prediction models, such as atmospheric air temperature, number of occupants, and encodings of the type of day (weekday, weekend or holiday), to name but a few. Different computational methods can also be applied. For instance, Mynhoff et al.⁴⁴ compared different prediction models: Artificial Neural Networks-Nonlinear Auto-Regressive (ANN-NAR), HMMs, Support Vector Machines (SVM), MultiLayer Perceptrons (MLP) and Deep Belief Networks (DBN) for one-step daily and weekly forecasts. Yildiz and co-workers⁴⁵ compared the forecasting performance of ANNs, SVMs and Least-Squares SVMs, with different data resolutions and forecasting horizons, with several models, each applied to a different load profile, obtained by clustering the load profiles.

Forecasts can also be applied to energy markets. In recent years, in many countries, electricity has been bought and sold in energy markets (please see Yildiz and co-workers⁴⁶). Accurate forecasts of the electricity demand and price are therefore needed by the participants in the energy markets. In particular, the one-day-ahead hourly forecast, considered a short-term forecast, has received increasing attention from the research community. Comprehensive reviews on load and price forecasting are available in Suganthi & Samuel⁴⁷ and Weron’s work⁴⁸, respectively.

Finally, according to Zhang, He & Yang⁴⁹, existing load and generation forecasting algorithms can be classified into two classes: point forecasts and probabilistic forecasts. The former provides single estimates for the future values of the corresponding variable, which are not capable of properly quantifying the uncertainty attached to the variable under consideration. The latter algorithms are increasingly attracting the attention of the research community due to their enhanced capacity to capture future uncertainty, describing it in three ways: prediction intervals, quantiles, and probability density functions (PDF) (please see Bracale and co-workers⁵⁰).
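The difference between a point forecast and a prediction interval can be sketched in Python. The approach shown, shifting a point forecast by empirical quantiles of past forecast errors, is only one simple way to build an interval and is not claimed to be the method of the cited works; the function names, the nearest-rank quantile rule and the error values are assumptions of this sketch.

```python
def nearest_rank(sorted_vals, pct):
    """Nearest-rank percentile: smallest value with at least pct% of the
    data at or below it (pct is an integer between 1 and 100)."""
    rank = max(1, (pct * len(sorted_vals) + 99) // 100)  # integer ceiling
    return sorted_vals[rank - 1]


def prediction_interval(point_forecast, past_errors, lower_pct=10, upper_pct=90):
    """Shift the point forecast by empirical quantiles of past errors,
    where each error is defined as actual value minus forecast value."""
    errs = sorted(past_errors)
    return (point_forecast + nearest_rank(errs, lower_pct),
            point_forecast + nearest_rank(errs, upper_pct))


# Hypothetical past forecast errors in watts.
errors_w = [-3, -2, -1, 0, 0, 1, 1, 2, 3, 4]
print(prediction_interval(500.0, errors_w))
```

The point forecast alone (500 W here) says nothing about how far the actual value may fall from it; the interval carries that information at a nominal coverage, here 80%.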

Communities of energy

Better and more efficient solutions, not only from each householder’s point of view but also from the community consumption perspective, can be obtained by extending the tools described above to groups of households that share the energy they produce or store, in the form of communities of energy. In this context, the local HEMS can be hierarchically controlled by an aggregator, which supervises not only the management of energy in each local prosumer (producer/consumer), but also the flow of energy between the members of the community as a whole, as well as the exchanges between the community and the grid.

It is within this context that this dataset is introduced. It spans more than three years of data, covering different types of variables of high importance to the field of electrical energy and thermal comfort of either isolated or community-based households. More specifically, for a single prosumer, it allows one to:

a) Test and validate different control strategies for home energy management systems, as done by us in⁵¹,⁵². The first reference compares MBPC control implemented with the Branch-and-Bound technique for HVAC control with the house proprietary system. The second reference employs a MILP method in an MBPC framework, controlling not only the inverter, but also appropriately scheduling loads. Both approaches achieve important savings in the electricity bill.
b) Design forecasting energy consumption models, as discussed in⁵³,⁵⁴,⁵⁵. The first reference employs a Multi-Objective Genetic Algorithm (MOGA)⁵⁶ design framework available in our lab, which performs feature selection, topology determination and parameter estimation, to forecast load demand forty-eight steps ahead, with a time-step of fifteen minutes. The second one extends the previous approach to an ensemble of MOGA-designed models. The third one proposes a hybrid forecasting mechanism to use with⁵².
c) Design forecasting PV energy generation models⁵⁷. The approach described above is applied to PV power generation, with great success.
d) Move from deterministic forecasting to probabilistic forecasting, for both load demand and PV power generation⁵⁸.
e) Test and validate different non-intrusive load monitoring (NILM) algorithms, as performed in⁵⁹,⁶⁰. The first reference applies ApproxHull⁶¹, a data selection tool available in our lab, to deep learning models. The second one uses ApproxHull and MOGA to design shallow models for appliance operation detection and energy estimation.
f) Design forecasting thermal comfort models, as well as test and validate control strategies for Heating, Ventilation and Air Conditioning (HVAC) systems, as in⁶². In essence, the HVAC is controlled so that it guarantees PMV thermal comfort within user-predefined schedules while minimizing the energy consumed, making use of forecasting models of solar radiation, atmospheric air temperature and relative humidity, inside air temperature, relative humidity and mean radiant temperature, as well as room occupancy.

Additionally, for a community of four houses, it allows one to:
g) Test and validate different control strategies for the community energy management system, which can be found in⁶³, where the MILP-MBPC strategy described above is extended to a community of houses. Different ways to share the produced and stored energy are compared.
h) Design day-ahead net load point and probabilistic forecasting to work with energy markets, as in⁶⁴;
i) Test and validate transfer learning strategies for NILM, as discussed in D’Incecco’s work⁶⁵.

All the above topics are important, on their own, for future research. What is perhaps most important, and should be stressed, is that significant improvements in the general field of energy efficiency in buildings and energy communities require joint research on all these topics, to which others can obviously be added. This is an added value of this dataset in comparison with existing ones: it includes all the data needed to address all the topics considered, which is not the case for existing datasets.

As the households employed in this research are typical Mediterranean detached family houses, the data available in this dataset can be used as representative of that segment of buildings and of that climate. By this we mean that methods and techniques applied to the nine classes of problems identified above, using this dataset, can be expected to produce similar results for other households or communities in regions with a similar climatic type.

As both raw data, typically sampled at one second or at one minute (please see below), and curated data, synchronized with a five-minute sampling interval, are available, different sampling intervals can be used for the different methods. The dataset can be found at⁶⁶.
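For users who want a sampling interval other than those provided, raw samples can be aggregated onto a coarser grid. The Python sketch below averages samples inside fixed windows; it is an assumption made for illustration and is not the procedure used to produce the curated five-minute data, which the text above does not detail. The function name, the (timestamp in seconds, value) tuples and the sample values are hypothetical.

```python
from collections import defaultdict


def resample_mean(samples, interval_s=300):
    """Average (timestamp_s, value) samples inside fixed windows.

    Each window is keyed by its start time in seconds; windows with no
    samples are simply absent from the result, so gaps remain visible.
    """
    bins = defaultdict(list)
    for t, value in samples:
        bins[(t // interval_s) * interval_s].append(value)
    return {start: sum(vals) / len(vals) for start, vals in sorted(bins.items())}


# Hypothetical raw samples: (seconds since start, power in watts).
raw = [(0, 100.0), (60, 200.0), (299, 300.0), (300, 50.0), (450, 150.0)]
print(resample_mean(raw))
```

Leaving empty windows out, rather than filling them, keeps missing periods distinguishable from measured ones.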

Methods

Data was collected from four residential houses situated in Gambelas, Faro, in the south of Portugal. All four are detached houses, with two floors and a garden, where families live. Two of the houses have triphasic meters, while the others are monophasic. The former will be denoted as TH1 and TH2, while the latter are denoted MH1 and MH2. TH1 has a PV system and an energy storage system, MH1 has a photovoltaic system, and the others do not have any renewable energy source.

TH1 was used in the NILMforIHEM project, which started in 2019. For this reason, and because it was used for objectives a) to f) above, it has much more data, covering a much longer period of time. This house and the three additional houses were employed in the HEMS2IEA project, which started in 2021. Only electric consumption data was recorded for these three houses. Recorded data for the four houses spans from November 2021 until July 2022. After this date, as one of the houses underwent major works, data was reduced to three houses.

TH1 has twenty different spaces (including garden, halls, and so on). The floor plans are shown in Fig. 1.

A photovoltaic system was installed, composed of 20 Sharp NU-AK panels⁶⁷, each with a maximum power of 300 W, i.e., 6 kW in total (please see Figs. 2 and 3). The inverter is a Kostal Plenticore Plus (Fig. 4) converter (KI)⁶⁸, which also controls a BYD Battery Box (Fig. 5) HV H11.5 (with a storage capacity of 11.5 kWh)⁶⁹.

The house electric panel consists of sixteen monophasic circuit breakers, plus a triphasic one. Several electric variables are measured in every circuit breaker, providing approximate ground truth for the NILM identification. Circutor Wibeees (WB)⁷⁰ are used as the measurement devices. They are plug-and-play wireless devices and use Hall Effect technology for the measurement. Because of that, calibrations are required for correct measurements. Voltage, current, frequency, active, reactive and apparent power, power factor, and active, inductive reactive and capacitive reactive energy are measured every second for every monophasic circuit breaker, and the same set for each phase of the triphasic one, together with totalized values. In total, 198 variables are sampled by the WBs every second.
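Because active, reactive and apparent power and the power factor are all recorded, they can be cross-checked against each other. The Python sketch below uses the standard relations for sinusoidal conditions, S = sqrt(P² + Q²) and PF = P / S; it is a plausibility check suggested here, not a procedure described by the authors, and the function names and tolerance are hypothetical. With significant harmonic distortion, the measured apparent power can legitimately exceed this value.

```python
import math


def apparent_power(p_w, q_var):
    """Apparent power (VA) from active (W) and reactive (var) power."""
    return math.hypot(p_w, q_var)


def power_factor(p_w, q_var):
    """Power factor P / S; undefined (NaN) when no power is flowing."""
    s_va = apparent_power(p_w, q_var)
    return p_w / s_va if s_va > 0 else float("nan")


def is_consistent(p_w, q_var, s_measured_va, rel_tol=0.02):
    """True when the measured apparent power is within rel_tol of the
    value implied by P and Q (sinusoidal assumption)."""
    return math.isclose(apparent_power(p_w, q_var), s_measured_va,
                        rel_tol=rel_tol)


print(apparent_power(300.0, 400.0), power_factor(300.0, 400.0))
```

A check of this kind can help flag samples where a sensor needs recalibration, although the acceptable tolerance depends on the device.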

Total consumption data is supplied by a Carlo Gavazzi EM340 three-phase energy meter⁷¹. This meter is a class X certified device, and electrical measurements are read over a two-wire Modbus RTU connection. The EM340 supplies 37 different electric variables, sampled at 1 Hz.

Measurements of the energy produced by the PV, stored in the battery and injected into the grid are obtained either from the inverter (KI) or from a Kostal smart energy meter (KEM)⁷². Home electrical consumption variables are also available in the inverter. In total, 78 variables are obtained by KEM and KI, at a sampling interval of one minute (Fig. 9).

For on/off control, Smart Plugs Self-Powered Wireless Sensors⁷³ are used (Fig. 7). They are also used to enable sockets belonging to the same circuit breaker (CB) to be measured individually. They are read and controlled directly using an internal web service. The number of smart plugs (SPs) changed with time; each plug provides six variables every second. In a similar way to the SPs, the Air Conditioner in Room B14 in Fig. 1 can be measured and actuated.

A weather station (please see Mestre et al.⁷⁴) measures the air temperature, relative humidity and global solar radiation at one-second intervals (Fig. 6).

Self-Powered Wireless Sensors (please see Ruano et al.⁷⁵) are used for measuring room climate data, such as air temperature and relative humidity, status (open/closed) of doors and windows, wall temperature, light and room movement (Fig. 8). They are Ultra-Low-Power devices and communicate via the ISM radio band, working at 2.4 GHz or 868 MHz.

Data transmission from/to the measurement devices is available through Gateways and a Technical Network. A technical IP-cabled and a wireless network have been created using a network router, separating the home network from the technical network.

Finally, an IoT platform was created to interact with the data acquisition system. For more information on the acquisition system and the IoT platform, please see Ruano et al.⁷⁶.

In the three additional houses, only electric consumption is measured. For this reason, in TH2, a Carlo Gavazzi EM340 meter was installed. In MH1 and MH2, Carlo Gavazzi EM112 (one-phase) meters were installed, providing a subset of variables acquired by the EM340.

Data Records

The data records are available in Zenodo⁶⁶. The datasets are divided into months, starting in January 2020 and ending in February 2023, therefore spanning more than three years. They are MATLAB data files in the ‘v7’ format, which can be loaded using the usual MATLAB ‘load’ command. Notice that this format enables the data to be read directly from other languages, such as Python, using the function loadmat in scipy.io.
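A minimal Python sketch of reading one monthly file with loadmat follows. The file name and the variable name `power_w` are hypothetical placeholders, since the actual names inside the dataset files are not given here; so that the sketch runs without the dataset, it first writes a small stand-in file with savemat.

```python
import os
import tempfile

import numpy as np
from scipy.io import loadmat, savemat


def load_month(path):
    """Load one MATLAB file and drop loadmat's bookkeeping entries
    (__header__, __version__, __globals__)."""
    raw = loadmat(path, squeeze_me=True)
    return {name: value for name, value in raw.items()
            if not name.startswith("__")}


# Stand-in file so the sketch runs without the real dataset.
# File name and variable name are hypothetical.
demo_path = os.path.join(tempfile.mkdtemp(), "demo_month.mat")
savemat(demo_path, {"power_w": np.array([120.0, 135.5, 98.2])})

month = load_month(demo_path)
for name, value in month.items():
    print(name, type(value).__name__, getattr(value, "shape", None))
```

Listing the variable names and shapes first is a convenient way to discover how each monthly file is organised before writing any analysis code. Note that loadmat does not read the newer HDF5-based ‘v7.3’ format; that is not an issue for ‘v7’ files.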

The sensing devices are categorized in eight categories, and within each category, there might be different appliances.

The variables measured by the Wibeees are shown in Table 2.

Table 1 Categories and Devices classifications.
