Flexible behavior requires constructing structured internal models from experience to support goal-directed action. Although environmental transition statistics accumulate gradually, how they are used to construct a compact goal-directed graph remains unclear. We introduce the Sparse Cognitive Graph (SCG), a reinforcement-learning framework that separates gradual transition learning from the sparse directed graph that governs valuation and action selection. In the SCG, transition statistics accumulate in a dense predictive representation, while nonlinear selection determines which transitions are expressed as graph edges. Consequently, gradual strengthening can trigger discrete reorganization of graph topology and abrupt behavioral shifts. Across human reward and transition revaluation tasks, the SCG explains bimodal and trimodal behavioral regimes as emergent consequences of distinct graph configurations. Across human and mouse two-step tasks, dynamic graph reconfiguration captures canonical reward-by-transition interactions without requiring mixtures of control systems. In mice, transitions preceding reward strengthened more rapidly, biasing graph topology toward reward-directed paths. Temporally precise optogenetic dopamine stimulation produced behavioral effects consistent with the accelerated graph-edge formation predicted by the SCG. The model further generates a testable prediction: graph topology determines the geometry of low-dimensional population activity. Directed acyclic graphs yield activity concentrated at graph entry and goal states, whereas cyclic graphs produce periodic, grid-like structure. Together, these findings identify reward-dependent graph reorganization as a computational principle that reconciles stable predictive learning with efficient goal-directed control.
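The core mechanism in the abstract, a dense predictive representation that accumulates gradually while a nonlinear selection step decides which transitions are expressed as sparse graph edges, can be illustrated with a minimal sketch. This is not the authors' implementation; the exponential-averaging update rule, the learning rate, and the fixed threshold used for edge selection are all assumptions chosen for illustration.

```python
import numpy as np

n_states = 4
T = np.zeros((n_states, n_states))   # dense transition statistics (gradual learning)
alpha = 0.1                          # assumed learning rate for incremental accumulation
theta = 0.5                          # assumed threshold implementing nonlinear edge selection

def observe(s, s_next):
    """Gradually update dense transition statistics toward observed outcomes."""
    T[s] += alpha * (np.eye(n_states)[s_next] - T[s])

def sparse_graph():
    """Nonlinear selection: only sufficiently strong transitions become graph edges."""
    return T > theta

# Repeated experience of the transition 0 -> 1 strengthens T[0, 1] continuously,
# but the edge (0, 1) appears abruptly once the threshold is crossed --
# gradual strengthening triggering a discrete change in graph topology.
edge_present = []
for _ in range(12):
    observe(0, 1)
    edge_present.append(bool(sparse_graph()[0, 1]))

print(edge_present)  # starts False, switches to True once T[0, 1] exceeds theta
```

Under these assumptions, T[0, 1] follows 1 - (1 - alpha)^n after n observations, so the edge flips on discretely even though learning is smooth, which is the dissociation between gradual statistical learning and abrupt topological reorganization that the abstract describes.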