This page provides hour-level EULR data, reshaped as described in the June 2022 presentation to the RTF. As described in the presentation, this data version is intended to streamline certain data tasks (especially the estimation of end-use load shapes) and to make the data immediately accessible to analysts working without specialized data management software (e.g., basic Excel skills should suffice).
It also provides EULR data products developed during the 2023 REEDR calibration. For background on the calibration, see the December 2023 presentation to the RTF. For background specific to the data products linked below, see the January 2024 presentation.
Relation to the NEEA public database
The data linked below is based on NEEA’s EULR v7 data release, which includes meter data collected from the beginning of the project (Q3, 2018) through Q1, 2022. NEEA periodically releases database updates that extend the date range and correct any data issues discovered since the previous release.
Users who download this data thereby agree not to share, sell, or otherwise divulge this information without express prior permission from NEEA.
To promote the streamlining and accessibility objectives described above, the RTF CAT processed the data in a way that removes or reduces some rich aspects of the full database, in addition to the primary data reshaping. The main examples are:
- Limit to kW and OAT. Processed data focuses on real power and outdoor air temperature. In addition to these variables, the full database includes site-level interval measurements for apparent power, indoor air- and vapor line temperature, indoor humidity, and total harmonic demand, plus weather station records for wind speed and direction, dew point temperature, and atmospheric pressure, as available.
- Hour-level aggregation. The full public database provides 15-minute interval data for all site-level measurements (and hour-level data for weather station variables). The processed data averages the 15-minute power values to hour-level values.
- Reshaped by end-use. The full database is organized by UTC calendar day (each flat file contains metered data for all circuits, at all sites, for a single UTC day). The reshaped data is rearranged so that each metered circuit type has a single flat file that contains all kW values for all applicable circuits and timestamps. The reshaped files are arranged so that each row is indexed by a timestamp and each column refers to a single metered circuit of the given type.
- Frozen data version. The linked data will not necessarily be updated to remain current with NEEA data releases. The data file names include a suffix (currently “_v7”) that indicates the corresponding version of the NEEA database.
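The "Reshaped by end-use" layout can be illustrated with a small sketch. It is not the CAT's code and does not use the NEEA schema: the record fields (timestamp, circuit ID, circuit type, kW), the circuit names, and the sample values are all hypothetical. It only shows the idea of turning long records into one wide table per circuit type, with one row per timestamp and one column per circuit.

```python
from collections import defaultdict

# Hypothetical long-format records: (timestamp, circuit_id, circuit_type, kW).
records = [
    ("2021-04-07 18:00", "site1_wh", "water_heater", 0.357),
    ("2021-04-07 18:00", "site2_wh", "water_heater", 1.200),
    ("2021-04-07 19:00", "site1_wh", "water_heater", 0.410),
    ("2021-04-07 18:00", "site1_dryer", "dryer", 2.050),
]


def reshape_by_circuit_type(records):
    """Return {circuit_type: (column_names, rows)} in wide format.

    Each row is [timestamp, kW for column 1, kW for column 2, ...].
    A timestamp/circuit combination with no record is filled with "NA".
    """
    # circuit_type -> {(timestamp, circuit_id): kW}
    cells = defaultdict(dict)
    for ts, circuit, ctype, kw in records:
        cells[ctype][(ts, circuit)] = kw

    tables = {}
    for ctype, values in cells.items():
        # ISO-style timestamp strings sort chronologically.
        timestamps = sorted({ts for ts, _ in values})
        circuits = sorted({c for _, c in values})
        rows = [
            [ts] + [values.get((ts, c), "NA") for c in circuits]
            for ts in timestamps
        ]
        tables[ctype] = (circuits, rows)
    return tables


tables = reshape_by_circuit_type(records)
columns, rows = tables["water_heater"]
```

In this sketch each entry of `tables` corresponds to one reshaped flat file, and `columns` lists the circuits that would appear as column headers.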
The latest public version of the full database can always be accessed through the NEEA data request form. NEEA is working to build database capabilities that will allow users to specify and download reshaped data files directly from the database in user-specified forms. In the future, that resource is expected to take the place of the data linked below.
Reshaped data files
The reshaped power and weather data files are here:
Information about circuit types and site characteristics can be found in the following data sets, which are also available directly from the NEEA database site:
In addition, the RTF CAT created the following workbooks to simplify some anticipated analysis tasks:
- Load shape template This workbook demonstrates an approach to using the reshaped data linked above to develop load shapes in the format used by ProCost.
- Heat pump circuit data This workbook provides metadata to help identify heat-pump-related circuits (especially outdoor units versus indoor units) and back-up fuel type.
In addition to the major data differences listed above, the following data-shaping details may be important to some analyses:
- For each metered circuit and each metered hour, hour-level kW is calculated as the simple average of all 15-minute-level observations that are available for that circuit in that hour. Two notable details are:
  - Each date-time stamp refers to the beginning of an aggregation hour. For example, if the power entry for a given circuit on date-time “04/07/2021 18:00” is 0.357, this means that 0.357 is the average of that circuit’s power entries for time intervals 18:00-18:14, 18:15-18:29, 18:30-18:44, and 18:45-18:59.
  - Power data in the NEEA database is highly complete, so the overwhelming majority of metered hours have 4 fifteen-minute observations, each of which represents 15 one-minute observations. Hours that are not metered at a given site or circuit, or that occur entirely within a (rare) meter failure period, have no 15-minute observations and are assigned a power value of “NA”. Partial hours, which have some power data but fewer than 4 complete 15-minute observations, are recorded as the simple average of whatever power observations are available. There is no warning flag to indicate these cases, but they are extremely rare and unlikely to noticeably affect most analyses.
- All timestamps refer to local clock-time. Each row in the reshaped data is indexed by a single (local) timestamp. Two notable consequences of this are:
  - Most rows include power values for sites in both the Pacific and Mountain time zones (time zones are provided in the top row of each reshaped data file). This means that the power values in a given row do not refer to simultaneous power usage, because Mountain-zone local time differs from Pacific by one hour.
  - Daylight saving time clock changes result in a duplicated hour each fall and a skipped hour each spring. These occur in the middle of a weekend night and have very little impact on most analyses.
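The hour-level averaging and local-time conventions described above can be sketched as follows. This is an illustration under stated assumptions, not the code used to build the files: the function names and sample readings are hypothetical, and the time-zone helper simply applies the one-hour Mountain/Pacific difference, ignoring the brief period around each daylight saving transition when the two zones are not exactly one hour apart.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def hourly_mean_kw(observations):
    """Average 15-minute kW readings into hours stamped at the hour's start.

    `observations` is a list of (datetime, kW) pairs for one circuit.
    Hours with fewer than four readings are averaged over whatever is
    present. Hours with no readings do not appear in the result (they
    would be "NA" in the reshaped files).
    """
    buckets = defaultdict(list)
    for ts, kw in observations:
        hour_start = ts.replace(minute=0, second=0, microsecond=0)
        buckets[hour_start].append(kw)
    return {hour: sum(v) / len(v) for hour, v in sorted(buckets.items())}


def mountain_to_pacific(local_ts):
    """Express a Mountain local timestamp on the Pacific clock.

    Assumes a fixed one-hour difference between the two zones.
    """
    return local_ts - timedelta(hours=1)


# Hypothetical readings for one circuit.
obs = [
    (datetime(2021, 4, 7, 18, 0), 0.25),
    (datetime(2021, 4, 7, 18, 15), 0.50),
    (datetime(2021, 4, 7, 18, 30), 0.25),
    (datetime(2021, 4, 7, 18, 45), 0.50),
    (datetime(2021, 4, 7, 19, 0), 1.00),   # partial hour: two readings only
    (datetime(2021, 4, 7, 19, 15), 2.00),
]
hourly = hourly_mean_kw(obs)
```

Here the 18:00 entry averages four readings and the 19:00 entry averages only the two that exist, mirroring the unflagged partial-hour treatment. An analysis that needs simultaneous usage across zones could apply a shift such as `mountain_to_pacific` to the Mountain-zone columns before comparing rows.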
The code used to generate the reshaped data files can be accessed through the link below. In addition to the reshaping itself, this code also includes on-the-fly data investigations that guided the reshaping process, as well as segments used to generate graphic and summary output included in the June 2022 presentation to the RTF. It is provided here for completeness and transparency.
Data products related to REEDR calibration
RTF CAT developed additional EULR data products to support the REEDR calibration effort.
These include a series of reshaped data sets with circuit-level data on all HVAC and water heating circuits, plus grouped data for "Other indoor" and "Other outdoor" loads (indoor vs. outdoor designation based on circuit label). Reshaped formats are: year-month-day (365) aggregation, year-month-hour-of-day (12x24) aggregation, and month-day-hour (31x24) slices for select calendar months.
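As a rough illustration of the year-month-hour-of-day (12x24) format, the sketch below averages hourly kW by year, month, and hour of day. It is not the CAT's code: the function name and sample values are hypothetical, and it assumes the aggregate is a mean of available hours with missing ("NA") hours skipped, which this page does not confirm.

```python
from collections import defaultdict
from datetime import datetime


def month_hour_profile(hourly):
    """Average hourly kW by (year, month, hour of day).

    `hourly` maps hour-start datetimes to kW, with None standing in for
    missing ("NA") hours. Missing hours are skipped, not treated as zero.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for ts, kw in hourly.items():
        if kw is None:
            continue
        key = (ts.year, ts.month, ts.hour)
        sums[key] += kw
        counts[key] += 1
    return {key: sums[key] / counts[key] for key in sorted(sums)}


# Hypothetical hourly values for one circuit.
hourly = {
    datetime(2021, 1, 1, 18): 1.0,
    datetime(2021, 1, 2, 18): 2.0,
    datetime(2021, 1, 3, 18): None,   # missing hour, excluded from the mean
    datetime(2021, 2, 1, 18): 4.0,
}
profile = month_hour_profile(hourly)
```

A full year of data would yield up to 12 x 24 keys per year, one for each month and hour-of-day combination.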
The CAT also evaluated likely or possible HVAC loads that were not indicated on circuit labels and therefore not flagged as HVAC in the EULR database. These are provided in the year-month-day and year-month-hour-of-day aggregated formats.
To evaluate unlabeled HVAC load, the CAT examined multiple data views which may be of interest to analysts seeking to understand site-level EULR data. The following folders include separate graphics for each site:
- Day-level metered kWh by end-use group and by circuit versus outside air temperature
- Day-level unlabeled kWh (estimated) versus outside air temperature and by date
- Hour-level metered kWh by end-use group and by circuit versus hour of week
Site-specific observations related to unlabeled HVAC loads or other apparent data anomalies are recorded in the file HEMS Sites Unlabeled HVAC.xlsx.
Questions about the reshaped data files or the analysis should be addressed to: