Abstract

While goal-driven artificial neural networks (ANNs) have successfully modeled important aspects of the primate ventral stream, their efficacy for the dorsal stream remains unclear. Here, we investigated how computational objectives and architectural constraints influence neural alignment to MSTd, a dorsal area that is selective for complex optic flow patterns and is linked to self-motion perception. We systematically evaluated how well 54 ANNs and Non-negative Matrix Factorization (NNMF) align with key neurophysiological optic flow tuning properties of MSTd. We optimized these models on either a supervised self-motion estimation task (accuracy-optimized) or an unsupervised input reconstruction task (autoencoding), using both raw optic flow and model MT-encoded signals as inputs. Interestingly, accuracy on the self-motion task does not predict neural alignment. Instead, model performance bifurcates based on both objective and input encoding: autoencoders utilizing MT-like input signals consistently achieve superior correspondence with MSTd tuning preferences. Explicitly enforcing sparsity or non-negativity does not improve alignment; rather, these constraints often degrade the match to biological data. Furthermore, we demonstrate that neural alignment remains largely unaffected even when the pressure to generate an efficient code with few units is eased, suggesting that dimensionality reduction may not be a primary driver of MSTd-like tuning. Taken together, our results indicate that the tuning properties of MSTd are better explained by an unsupervised reconstruction-based objective than by supervised task optimization, which points to a fundamental difference in the computational principles that govern the dorsal and ventral streams.

Significance Statement

Goal-driven neural networks have revolutionized our understanding of the ventral visual stream, yet their effectiveness in modeling the dorsal stream remains less clear.
We systematically evaluated 54 neural network models to identify the computational principles that drive neural-like optic flow tuning in dorsal stream area MSTd. Surprisingly, we find that accuracy-optimized models fail to replicate biological tuning. Instead, models that reconstruct motion inputs from a biologically plausible MT-like representation achieve the highest consistency with MSTd neurons. These findings suggest that the organizational principles of MSTd may be better explained by an unsupervised, reconstruction-based objective than by one focused on the accuracy of self-motion estimation, pointing to a fundamental difference in the computational objectives of the two visual streams.