# Automotive Diesel Engine Dataset Including Faults

## Introduction

The dataset was generated using a mean-value engine simulation model (see [1]) of a two-stage charged Diesel engine with high-pressure exhaust gas recirculation (HP-EGR). It contains a wide range of standardized powertrain signals, as well as multiple fault implementations relevant for research in areas such as:

- anomaly detection
- predictive maintenance
- time series modeling
- machine learning for system health assessment

Data was simulated under different speed profiles, i.e. drive cycles:

- Public emission drive cycles (e.g. WLTC)
- Real-world driven drive cycles

Thus, the data covers a wide range of operating points. All signals were recorded at 10 Hz.

There are a total of 15 different drive cycles, each available in healthy condition. The cycles `Cycle_RDE_Eifel` and `Cycle_RDE_Eifel2` additionally contain 3 fault types, each with 3 severity levels. In total, this leads to 1572243 samples (368565 + 9 × 133742):

- 368565 healthy samples (15 drive cycles)
- 133742 samples (`Cycle_RDE_Eifel` and `Cycle_RDE_Eifel2` combined) per combination of fault and its severity

## Summary

- Sampling rate: 10 Hz
- Number of signals: 18
- Drive cycles: 15
- Cycles including faults: `Cycle_RDE_Eifel`, `Cycle_RDE_Eifel2`
- Fault types: 3 (with 3 severities each)

## General information

Detailed information on all signals, faults and drive cycles is provided in:

- `metadata.json`: signals (name, description, original unit) and fault types (name, severity)
- `cycles.json`: drive cycles (name, type, duration)

## Standardization

All signals were standardized using a z-score normalization performed per signal:

$z = \frac{x - \mu}{\sigma}$

For each signal, the mean ($\mu$) and standard deviation ($\sigma$) were computed from healthy data only. The resulting standardization parameters were then applied to all fault conditions and drive cycles.
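The healthy-only fitting step can be illustrated with a minimal sketch in plain Python on toy data. It is not the code used to build the dataset: the label `"healthy"`, the helper names, and the choice of the population standard deviation are assumptions made for illustration only.

```python
from statistics import fmean, pstdev


def fit_healthy_stats(values, fault_labels, healthy_label="healthy"):
    """Return (mu, sigma) computed from the healthy samples only.

    `healthy_label` is a hypothetical label; check the actual values of the
    `fault_type` column before reusing this on the real data.
    """
    healthy = [v for v, f in zip(values, fault_labels) if f == healthy_label]
    # Assumption: population standard deviation (the source does not say
    # whether the population or the sample estimator was used).
    return fmean(healthy), pstdev(healthy)


def standardize(values, mu, sigma):
    """Apply z = (x - mu) / sigma to every sample, healthy or faulty."""
    return [(v - mu) / sigma for v in values]


# Toy signal: three healthy samples followed by two faulty ones.
values = [1.0, 2.0, 3.0, 10.0, 12.0]
fault_labels = ["healthy", "healthy", "healthy", "fault_a", "fault_a"]

mu, sigma = fit_healthy_stats(values, fault_labels)
z = standardize(values, mu, sigma)
```

Because the parameters come from healthy data alone, the healthy samples end up with zero mean and unit variance, while faulty samples keep their offset relative to the healthy baseline instead of being re-centered.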
Exception: `EGR_position_desired`, the desired position of the HP-EGR, was not normalized, as the original percentage scale is often meaningful for diagnostic and control applications.

## Metadata

`metadata.json` contains two sections:

- signals: all powertrain signals
- fault_types: all implemented faults and their severities (applicable to `Cycle_RDE_Eifel` and `Cycle_RDE_Eifel2`)

## Data Structure

All signals are located in `processed/main_df_standardized.parquet`. This parquet file contains:

- all standardized signal columns
- drive cycle name
- fault type
- cycle boundary flag

The flag column is set to `1` at the first time step of each drive cycle and `NaN` otherwise.

```text
   feature_1  feature_2  ...  drive_cycle  fault_type  flag
0  .          .               .            .           .
1  .          .               .            .           .
2  .          .               .            .           .
```

The first 18 columns correspond to the signals defined in `metadata.json`, followed by the columns `drive_cycle`, `fault_type` and `flag`.

## Loading the Dataset

To load the parquet file in Python as a dataframe, do the following:

```python
import pandas as pd

path_to_standardized_data = './processed/main_df_standardized.parquet'
main_df_standardized = pd.read_parquet(path_to_standardized_data)
```

Data for a specific drive cycle, fault and feature can be obtained as follows:

```python
feature = "InPrs"
cycle = "Cycle_RDE_Eifel"
fault = "EGVClog_10"

# Select the rows of one drive cycle and one fault, then keep a single signal
subset = main_df_standardized[
    (main_df_standardized["drive_cycle"] == cycle)
    & (main_df_standardized["fault_type"] == fault)
][feature]
# Restart the index at 0 for the selected segment
subset = subset.reset_index(drop=True)
```

## References

\[1\] Blanco-Rodriguez, D., Vagnoni, G., Aktas, S., Schaub, J. (2016). Model-based Tool for the Efficient Calibration of Modern Diesel Powertrains. MTZ worldwide, 77, 54-59. DOI: 10.1007/s38313-016-0103-5.