<h2>Dataset Description</h2> This dataset contains anonymized job-level records from the Kestrel high-performance computing (HPC) system. Each record represents a Slurm batch job and includes scheduling metadata, resource requests, resource utilization, energy consumption estimates, and computed efficiency metrics. Personally identifiable fields (user, account, job name, submit line, working directory, submit script, and job type) have been replaced with cryptographic hashes. <h3>Developed by</h3> National Laboratory of the Rockies (NLR), <a href="https://ror.org/036266993">ROR: https://ror.org/036266993</a> <h3>Contributed by</h3> HPC Operations and Data Analytics teams at NLR. <h3>Dataset short description</h3> Anonymized Slurm job records from the NLR Kestrel HPC system, including job scheduling, resource allocation, energy estimates, and efficiency metrics. <h3>Over what timeframe was the data collected or generated? Does this timeframe align with when the underlying phenomena or events occurred?</h3> The sample data covers jobs submitted between <strong>2023-08 and 2025-12</strong>, with timestamps in Mountain Time (UTC-7, or UTC-6 during daylight saving time). The data reflects real-time job scheduling events as they occurred on the Kestrel system, so the collection timeframe aligns directly with the underlying phenomena. <h3>What resources were used?</h3><h4>Facilities:</h4><ul><li><strong>Kestrel HPC System</strong>, National Laboratory of the Rockies (NLR), <a href="https://ror.org/036266993">ROR: https://ror.org/036266993</a></li></ul><h4>Funding:</h4> U.S. Department of Energy, Office of Energy Efficiency and Renewable Energy (EERE). <h4>Other Supporting Entities:</h4> N/A <h2>Sharing/Access Information</h2><h3>Reuse restrictions placed on the data:</h3> The dataset has been anonymized by hashing sensitive fields (user, account, job name, submit line, working directory, submit script, and job type). Reuse is subject to the license specified in this datacard.
Users should not attempt to re-identify individuals from hashed fields. <h3>Provide DOIs and BibTeX citations to publications that cite or use the data.</h3> N/A <h3>Provide DOIs, citations, or links to other publicly accessible locations of the data.</h3> N/A <h3>Provide DOIs, citations, or links and descriptions of relationships to ancillary data sets.</h3> This dataset is derived from the Kestrel schema of the NLR HPC job database. <h2>Data & File Overview</h2><h3>List all files contained in the dataset.</h3> Format: <strong>File</strong> | <strong>Description</strong>
esif.hpc.kestrel.job-anon.zip | Zipped Hive-partitioned Apache Parquet dataset containing anonymized job records from the Kestrel Slurm scheduler. Each row is a parent job record with scheduling metadata, resource requests/usage, energy estimates, and computed efficiency metrics.
datacard.md | This datacard file describing the dataset.
<h3>Describe the relationship(s) between files.</h3> The ZIP file is the primary data file; the datacard provides documentation. In the source database, each job record may have associated job_step records (not included here) that contain finer-grained resource usage data per step. <h3>Describe any additional related data collected that was not included in the current data package.</h3> The source database contains additional tables that are not included in this extract, including job_step (per-step resource usage with TRESUsage fields). The raw Slurm slurm_data JSONB fields have also been excluded. <h3>Are there multiple versions of this dataset?</h3> N/A <h2>Methodological Information</h2><h3>How was the data for each instance obtained or generated?</h3> Each instance is a parent job record collected from the Slurm workload manager on the Kestrel HPC system via the sacct command. The data represents real job submissions, scheduling decisions, and resource consumption.
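As an illustration of why the export preserves UTC offsets, a sacct record exported with an ISO-8601 timestamp format carrying the offset can be parsed and used for interval arithmetic directly. This is a sketch only: the field order and values below are illustrative, not the production export format.

```python
from datetime import datetime

# Illustrative pipe-delimited record (sacct --parsable2 style) exported with
# SLURM_TIME_FORMAT="%Y-%m-%dT%H:%M:%S%z"; field order here is an assumption.
record = "123456|standard|COMPLETED|2024-03-01T08:15:00-0700|2024-03-01T08:20:00-0700"
job_id, partition, state, submit, start = record.split("|")

# %z retains the UTC offset, so the subtraction stays correct even when the
# submit and start timestamps fall on opposite sides of a DST transition.
fmt = "%Y-%m-%dT%H:%M:%S%z"
queue_wait = (datetime.strptime(start, fmt) - datetime.strptime(submit, fmt)).total_seconds()
```

Here queue_wait comes out as 300.0 seconds, matching the queue_wait derivation (start_time − submit_time) described in this datacard.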
Calculated fields (efficiency metrics, energy estimates, shared job information) are derived from the raw Slurm data through database functions and triggers. <h3>For each instrument, facility, or source used to generate and collect the data, what mechanisms or procedures were used?</h3> Data is collected by periodically running the Slurm sacct command with the timestamp format SLURM_TIME_FORMAT="%Y-%m-%dT%H:%M:%S%z" to ensure correct timezone offsets. The output is loaded into a PostgreSQL database via the load_slurm function. Calculated columns are updated by database triggers (set_job_calc) and batch functions (upd_calc_cols, upd_sharednodes). <h3>To create the final dataset, was any preprocessing/cleaning/labeling of raw data done?</h3> Yes. The following preprocessing was applied: <ol><li><strong>Anonymization</strong>: The fields name, user, account, submit_line, work_dir, submit_script, and job_type were replaced with truncated cryptographic hashes (7-character hex strings) to prevent re-identification.</li><li><strong>Column derivation</strong>: Several columns are calculated from raw Slurm fields, including queue_wait (start_time − submit_time), cpu_eff (TotalCPU / CPUTime), max_mem_eff, min_mem_eff, avg_mem_eff, and energy estimates.</li><li><strong>State simplification</strong>: A state_simple column maps detailed Slurm states (e.g., "CANCELLED by 132357") to simplified labels (e.g., "CANCELLED").</li><li><strong>Boolean tagging</strong>: python_job and reframe_job boolean flags were derived (methodology not specified in schema; both are false in this sample).</li><li><strong>Temporal decomposition</strong>: year, month, day, day_of_week, hour, and minute columns were extracted from submit_time.</li></ol><h3>Is the software that was used to preprocess/clean/label the data available?</h3> The data is loaded and processed using PostgreSQL functions. These are internal to the NLR HPC operations database and are not publicly released at this time. 
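Although the production pipeline runs inside PostgreSQL and is not released, the anonymization and derivation steps described above can be sketched in Python. The hash algorithm is an assumption (the datacard specifies only a truncated 7-character hex hash), and these helper names are illustrative, not the database functions.

```python
import hashlib
import re

def anon_hash(value: str) -> str:
    """Truncated cryptographic hash, as applied to user/account/name fields.
    SHA-256 is an assumption; any cryptographic hash truncated to 7 hex
    characters fits the published schema."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()[:7]

def simplify_state(state: str) -> str:
    """Map detailed Slurm states to state_simple labels,
    e.g. 'CANCELLED by 132357' -> 'CANCELLED'."""
    return re.sub(r"\s+by\s+\d+$", "", state)

def cpu_eff(total_cpu_secs: float, cpu_time_secs: float) -> float:
    """cpu_eff = TotalCPU / CPUTime, where CPUTime is elapsed time
    multiplied by allocated CPUs."""
    return total_cpu_secs / cpu_time_secs if cpu_time_secs else 0.0
```

For example, a job that accumulated 1800 CPU-seconds against 3600 allocated CPU-seconds has a cpu_eff of 0.5.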
<h3>Describe any standards and calibration information, if appropriate.</h3> Timestamps are exported from Slurm with timezone offsets (Mountain Time, UTC-6 or UTC-7 depending on daylight saving). The timestamptz PostgreSQL datatype is used to store correct offsets. Energy consumption values (consumed_energy_joules, consumed_energy_raw_joules) are reported by Slurm from node-level power monitoring. TDP-estimated energy values are calculated from hardware specifications rather than direct measurement. <h3>Describe the environmental and experimental conditions relevant to the dataset.</h3> The Kestrel system is located at the NLR campus. Standard compute nodes have 104 cores and 256 GB of memory; bigmem nodes have 2000 GB of memory. GPU nodes (partition gpu-h100) are equipped with NVIDIA H100 GPUs. Jobs in this sample span the short, standard, debug, and gpu-h100 partitions. <h3>Describe any quality-assurance procedures performed on the data.</h3> The data have been cleaned and validated through the standard data processes used to support Kestrel operations. While these preprocessing and quality-control steps are integral to the dataset, the underlying software and pipelines are not publicly available. <h2>Data-Specific Information</h2><h3>What data does each instance within the dataset consist of?</h3> Each instance (row) represents a single parent Slurm job on the Kestrel system. The data includes raw Slurm scheduling fields (timestamps, resource requests, resource usage, state), anonymized identifiers, and derived/calculated efficiency and energy metrics.
<h3>Number of variables:</h3> 50 <h3>Number of cases/rows:</h3> Approximately 11,000,000 <h3>Variable descriptions:</h3> Format: <strong>Variable Name</strong> | <strong>Description</strong> | <strong>Unit</strong> | <strong>Value Labels</strong> | <strong>Slurm sacct Field</strong>
id | Unique primary key (full job ID string) | N/A | | JobID
job_id | Numeric job ID in Slurm | N/A | | JobIDRaw
array_pos | Array index if job array, else null | N/A | | ArrayTaskID
array_range | Slurm array notation for array jobs | N/A | | ArrayTaskString
name_hash | Anonymized hash of the job name | N/A | 7-char hex | JobName
user_hash | Anonymized hash of the submitting user | N/A | 7-char hex | User
account_hash | Anonymized hash of the allocation account | N/A | 7-char hex | Account
submit_line_hash | Anonymized hash of the submit command line | N/A | 7-char hex | SubmitLine
work_dir_hash | Anonymized hash of the working directory | N/A | 7-char hex | WorkDir
submit_script_hash | Anonymized hash of the submit script | N/A | 7-char hex (null if not captured) | <em>(not a standard sacct field)</em>
job_type_hash | Anonymized hash of the job type | N/A | 7-char hex (null if not captured) | <em>(not a standard sacct field)</em>
python_job | Whether the job is a Python job | N/A | true / false | <em>(derived)</em>
reframe_job | Whether the job is a ReFrame job | N/A | true / false | <em>(derived)</em>
partition | HPC queue/partition requested | N/A | e.g., short, standard, debug, gpu-h100 | Partition
state | Full Slurm job state string | N/A | e.g., COMPLETED, FAILED, PENDING, RUNNING, CANCELLED by {uid} | State
state_simple | Simplified job state | N/A | COMPLETED, FAILED, PENDING, RUNNING, CANCELLED | <em>(derived from State)</em>
submit_time | Timestamp when the job was submitted | timestamptz | | Submit
start_time | Timestamp when the job started (null if PENDING) | timestamptz | | Start
end_time | Timestamp when the job ended (null if PENDING/RUNNING) | timestamptz | | End
nodes_req | Number of nodes requested | count | | ReqNodes
processors_req | Number of CPUs requested | count | | ReqCPUS
memory_req | Memory requested | string (e.g., "2366M", "85G") | | ReqMem
wallclock_r