False monsoon onsets involve an early-season wet spell followed by a prolonged dry spell, often causing agricultural losses when sowing is initiated during premature rains and farmers are unprepared for the ensuing dry conditions. Despite its importance for risk reduction for hundreds of millions of farmers in the tropics, the predictability of these pre-monsoonal wet-to-dry events remains largely unexplored. Here, we benchmark six state-of-the-art artificial intelligence weather prediction (AIWP) models (AIFS, FuXi, FuXi-S2S, GraphCast, GenCast, NeuralGCM) and a numerical weather prediction (NWP) model (IFS) against novel, decision-relevant historical reference forecasts to assess their ability to predict false monsoon onsets at lead times of up to 30 days. We find that both AIWP and NWP models exhibit positive predictive skill in the core monsoon zone of India, with ensemble-based probabilistic models retaining positive predictive value relative to these reference forecasts across all lead times. Deterministic skill varies strongly across regions, with good predictability at short lead times (0-10 days) and a decline in skill at longer lead times (11-30 days). We further evaluated the models on well-documented canonical false onset events from the literature and found that skillful forecasts are associated with the ability to reproduce the large-scale circulation evolution characteristic of false onsets, in particular the progression from a transient monsoon-like state to a subsequent circulation collapse that produces a dry spell. We use agriculturally relevant thresholds to define monsoon onset, wet spells, and dry spells. To enable a meaningful assessment of model skill, the reference forecast is constructed from 124 years of gridded rain-gauge observations and quantifies the baseline probability of false monsoon onsets within a decision-relevant framework.
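The role of the climatological reference forecast can be sketched numerically: a baseline event probability estimated from a long observational record serves as the reference in a Brier skill score, so that positive skill means the model adds value over simply forecasting the historical base rate. All data and function names below are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def brier_score(probs, outcomes):
    """Mean squared difference between forecast probabilities and binary outcomes."""
    probs, outcomes = np.asarray(probs, float), np.asarray(outcomes, float)
    return np.mean((probs - outcomes) ** 2)

def brier_skill_score(model_probs, outcomes, ref_prob):
    """BSS > 0 means the model beats the constant reference (climatological) forecast."""
    bs_model = brier_score(model_probs, outcomes)
    bs_ref = brier_score(np.full(len(outcomes), ref_prob), outcomes)
    return 1.0 - bs_model / bs_ref

# Hypothetical toy record: 1 = a false onset occurred that year, 0 = it did not
obs_events = np.array([0, 1, 0, 0, 1, 0, 0, 0, 1, 0])
ref_prob = obs_events.mean()  # baseline probability from the historical record
model_probs = np.array([0.1, 0.7, 0.2, 0.1, 0.6, 0.1, 0.2, 0.1, 0.8, 0.1])
print(brier_skill_score(model_probs, obs_events, ref_prob))
```

In practice the base rate would be estimated per grid cell and calendar window from the 124-year observational record rather than from a single pooled series, but the skill-score arithmetic is the same.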
We first calibrate model-specific wet- and dry-spell thresholds for the event definition using quantile mapping within a leave-one-year-out cross-validation framework, rather than applying bias correction directly to the rainfall fields. Forecast performance is evaluated with deterministic and probabilistic metrics, including the probability of detection, false alarm ratio, critical success index, and Brier score. Reliability diagrams show systematic overconfidence at higher forecast probabilities, indicating the need for additional calibration and post-processing. Together, this framework establishes a decision-relevant benchmark and evaluates current AI-based and physics-based forecast systems for sub-seasonal early warning of false onsets involving dry spells.
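The calibration and verification steps can be illustrated in a minimal sketch: an observed wet-spell threshold is mapped into each model's climate by matching empirical quantiles, the mapping is refit with each year held out in turn, and binary forecast events are then scored with the standard contingency-table metrics. All names and toy data here are assumptions for illustration, not the study's code.

```python
import numpy as np

def mapped_threshold(obs_clim, model_clim, obs_threshold):
    """Quantile-map an observed threshold into model space: find the
    threshold's empirical quantile in observations, then take the model
    climatology's value at that same quantile."""
    q = np.mean(np.asarray(obs_clim) <= obs_threshold)
    return np.quantile(model_clim, q)

def loyo_thresholds(obs_by_year, model_by_year, obs_threshold):
    """Leave-one-year-out cross-validation: for each year, calibrate the
    threshold using only the other years' data."""
    years = sorted(obs_by_year)
    out = {}
    for y in years:
        obs_pool = np.concatenate([obs_by_year[k] for k in years if k != y])
        mod_pool = np.concatenate([model_by_year[k] for k in years if k != y])
        out[y] = mapped_threshold(obs_pool, mod_pool, obs_threshold)
    return out

def contingency_scores(forecast_event, observed_event):
    """Probability of detection, false alarm ratio, and critical success
    index from binary forecast/observed event series."""
    f = np.asarray(forecast_event, bool)
    o = np.asarray(observed_event, bool)
    hits = np.sum(f & o)
    misses = np.sum(~f & o)
    false_alarms = np.sum(f & ~o)
    pod = hits / (hits + misses)
    far = false_alarms / (hits + false_alarms)
    csi = hits / (hits + misses + false_alarms)
    return pod, far, csi
```

For example, if a model is uniformly wet-biased, the quantile-mapped threshold sits correspondingly above the observed one, so the event definition stays consistent between observed and modeled rainfall without correcting the rainfall fields themselves.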