A detailed description of the RODEM Jet Datasets is provided at arXiv:2408.11616.

Jet types

There are five different types of datasets:

1. Light jets: simulated via QCD dijet events (QCD.tar.gz)
2. Jets from W bosons: simulated via WZ production (WZ.tar.gz)
3. Jets from top quarks: simulated via ttbar production (ttbar.tar.gz)
4. Semi-visible jets: simulated via dark-sector quarks (SIMP.tar.gz)
5. Resonant Higgs boson production: simulated via type-II two-Higgs-doublet models (2HDM.tar.gz)

The tar.gz archives contain files in the HDF5 format, compressed using 7z. For types 1 to 4, validation and training splits of 5% of the total event count are provided. The remaining events are split into (decompressed) chunks no larger than 8 GB.

For the 2HDM models, two production modes (via g-g fusion and b-bbar annihilation) and two decay modes (h --> jj and t --> tb) are simulated. In addition, various heavy-Higgs and light-Higgs mass combinations were produced.

Dataset content

All HDF5 files contain four dataset objects:

- jet1_obs: observables for the leading jet
- jet1_cnsts: constituent array for the leading jet
- jet2_obs: observables for the subleading jet
- jet2_cnsts: constituent array for the subleading jet

The latter two are not present in the WZ files.

The observable dataset objects contain one row per event with 11 entries (in this order): pT, eta, phi, mass, tau1, tau2, tau3, d12, d23, ECF2, ECF3 (for details on the calculation, see arXiv:2408.11616).

The constituent dataset objects contain 100 rows per event with seven entries each. The 100 rows represent (up to) 100 jet constituents; if the jet has fewer, the remaining rows are zero-padded. The seven entries per row are (in this order): pT, eta, phi, mass, charge, D0, DZ (for details, see arXiv:2408.11616).
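As a minimal sketch of the array layout described above, the following uses small synthetic NumPy arrays (not real RODEM data) in place of the HDF5 datasets. The names OBS_COLUMNS, CNST_COLUMNS, get_column, and count_constituents are illustrative and not part of the dataset or of any library. Treating pT == 0 as the marker for a padded row is an assumption based on the zero-padding described above.

```python
import numpy as np

# Column order as listed in the dataset description.
OBS_COLUMNS = ("pT", "eta", "phi", "mass", "tau1", "tau2", "tau3",
               "d12", "d23", "ECF2", "ECF3")
CNST_COLUMNS = ("pT", "eta", "phi", "mass", "charge", "D0", "DZ")


def get_column(array, columns, name):
    """Select one named feature along the last axis (hypothetical helper)."""
    return array[..., columns.index(name)]


def count_constituents(cnsts):
    """Count non-padded constituents per jet.

    Assumption: zero-padded rows are identified by pT == 0.
    """
    return np.count_nonzero(get_column(cnsts, CNST_COLUMNS, "pT"), axis=-1)


# Synthetic stand-in for an observables dataset: one row of 11 entries per event.
toy_obs = np.zeros((2, len(OBS_COLUMNS)))
toy_obs[:, OBS_COLUMNS.index("pT")] = [500.0, 750.0]

# Synthetic stand-in for a constituent dataset: 100 rows of 7 entries per event.
toy_cnsts = np.zeros((2, 100, len(CNST_COLUMNS)))
toy_cnsts[0, :3, 0] = [200.0, 180.0, 120.0]              # jet with 3 constituents
toy_cnsts[1, :5, 0] = [300.0, 200.0, 150.0, 60.0, 40.0]  # jet with 5 constituents

jet_pt = get_column(toy_obs, OBS_COLUMNS, "pT")
n_cnsts = count_constituents(toy_cnsts)
print(jet_pt)
print(n_cnsts)
```

With real files, arrays read from jet1_obs and jet1_cnsts could be passed to the same helpers in place of the synthetic ones.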
Usage Example

The following snippet loads 100,000 jets and their constituents from one of the QCD input files, then plots the distributions of the jet transverse momenta and of the number of constituents:

    import h5py
    import numpy as np
    import matplotlib.pyplot as plt

    # The input HDF5 file containing the QCD jets.
    input_qcd = "h5files/QCDjj_pT_450_1200_train01.h5"

    # The number of jets to load.
    n_jets = 100_000


    def load_jets(ifile: str, n_jets: int):
        """Load jets and constituents from an HDF5 file."""
        with h5py.File(ifile, "r") as f:
            cnsts = f["objects/jets/jet1_cnsts"][:n_jets]
            jets = f["objects/jets/jet1_obs"][:n_jets]
        # Mask all entries of zero-padded constituents (rows with pT == 0).
        zeros = np.repeat(cnsts[:, :, 0] == 0, cnsts.shape[2])
        zeros = zeros.reshape(-1, cnsts.shape[1], cnsts.shape[2])
        cnsts = np.ma.masked_where(zeros, cnsts)
        return jets, cnsts


    qcd_jets, qcd_constituents = load_jets(input_qcd, n_jets=n_jets)

    # Plot the transverse momentum of the jets.
    plt.hist(qcd_jets[:, 0], label="QCD jets", bins=30)
    plt.xlabel(r"$p_{\mathrm{T}}$ [GeV]")
    plt.ylabel("Number of jets")
    plt.show()

    # Plot the number of (unmasked) constituents in the jets.
    plt.hist(qcd_constituents.count(axis=1)[:, 0], label="QCD jets", bins=100, range=(0.5, 100.5))
    plt.xlabel("Number of constituents")
    plt.ylabel("Number of jets")
    plt.show()

Citing this work

Please cite the work as follows:

K. Zoch, J. A. Raine, D. Sengupta, and T. Golling. RODEM Jet Datasets. Available on Zenodo: 10.5281/zenodo.12793616. Aug. 2024. arXiv: 2408.11616 [hep-ph].

Bibtex entry:

    @misc{Zoch:2024eyp,
        author = "Zoch, Knut and Raine, John Andrew and Sengupta, Debajyoti and Golling, Tobias",
        title = "{RODEM Jet Datasets}",
        eprint = "2408.11616",
        archivePrefix = "arXiv",
        primaryClass = "hep-ph",
        month = "8",
        year = "2024",
        note = "Available on Zenodo: \href{https://doi.org/10.5281/zenodo.12793616}{10.5281/zenodo.12793616}."
    }