Thunderstorms are weather systems that can trigger severe natural disasters and are characterized by sudden onset, short duration, and significant damage. Accurate thunderstorm forecasting has long been challenging. Data-driven artificial intelligence (AI) technologies offer new solutions, yet AI-driven thunderstorm forecasting still lacks high-quality training datasets. Leveraging lightning data from the China Meteorological Administration’s Advanced Direction and Time-of-Arrival Detecting (ADTD) network and three-dimensional Very Low Frequency/Low Frequency (VLF/LF) lightning location data from the Institute of Electrical Engineering, Chinese Academy of Sciences, we constructed an AI training dataset for thunderstorms over China (AITDTS) through four sequential procedures: rigorous data quality control, multi-source integration, thunderstorm-prone area labeling, and feature extraction. The AITDTS encompasses 85,071 thunderstorm events and 3,973,171 corresponding gridded samples at 10-min temporal resolution and 1 km × 1 km spatial resolution across China during 2016–2023. Each sample includes location labels, 38 radar-derived physical parameters at 10-min temporal resolution, and 62 environmental parameters at hourly temporal resolution. We further quantified predictor information gain for thunderstorm forecasting: radar echo top/base heights, composite reflectivity, vertically integrated liquid water content, and reflectivity at 10 km showed high information gain, whereas atmospheric instability, dynamic uplifting, moisture conditions, and vertical wind shear at 1 km exhibited moderate information gain. The AITDTS can be directly applied to training and evaluating AI-driven forecasting models, offering critical data for thunderstorm nowcasting.
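The abstract does not state how predictor information gain was estimated. As a minimal sketch of the general idea, the Python below computes the information gain of one continuous predictor with respect to a binary thunderstorm label, as the reduction in label entropy after conditioning on the binned predictor. The equal-width binning, the function names `entropy` and `information_gain`, and the `n_bins` parameter are illustrative assumptions, not the authors' procedure.

```python
import numpy as np


def entropy(labels):
    """Shannon entropy (in bits) of a discrete label array."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())


def information_gain(feature, labels, n_bins=10):
    """H(labels) - H(labels | binned feature).

    Assumption: the predictor is discretized into equal-width bins;
    the source does not specify a discretization scheme.
    """
    feature = np.asarray(feature, dtype=float)
    labels = np.asarray(labels)
    edges = np.linspace(feature.min(), feature.max(), n_bins + 1)
    # Use interior edges only, so bin ids fall in 0 .. n_bins - 1.
    bins = np.digitize(feature, edges[1:-1])
    conditional = 0.0
    for b in np.unique(bins):
        mask = bins == b
        # Weight each bin's label entropy by the fraction of samples in it.
        conditional += mask.mean() * entropy(labels[mask])
    return entropy(labels) - conditional


# Toy data (hypothetical, not from AITDTS): one predictor that separates
# the two classes perfectly and one that carries no information.
labels = np.array([0, 0, 0, 0, 1, 1, 1, 1])
informative = np.array([0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0])
uninformative = np.array([0.0, 1.0, 0.0, 1.0, 0.0, 1.0, 0.0, 1.0])

print(information_gain(informative, labels))
print(information_gain(uninformative, labels))
```

With balanced binary labels the gain is bounded by 1 bit, so under these assumptions a predictor reported as having high information gain would score closer to that bound than one with moderate gain.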