A key limitation of existing feature fusion methods is their static nature: they learn dataset-specific integration policies during training that remain fixed at inference. This restricts adaptability to new domains and input characteristics, often causing brittle generalization, hierarchical information decay, and inefficient relational modeling. We propose Meta-Learned Dynamic Hierarchical Fusion (MDHF), which reframes feature fusion as context-conditioned policy generation. MDHF uses a bi-level meta-learning objective to learn a transferable prior over fusion strategies, enabling sample-specific dynamic integration via a single gradient-free forward pass. This is realized through three co-designed modules: (1) Meta-Learned Dynamic Attention for input-adaptive feature weighting, (2) Bidirectional Hierarchical Integration to prevent information decay, and (3) Adaptively Sparse Graph Fusion for efficient O(n log n) relational reasoning. Extensive evaluation across five benchmarks—including standard (CIFAR-10, ImageNet-200), fine-grained (Stanford Cars, Oxford-IIIT Pets), and robustness (Caltech-101) tasks—shows that MDHF consistently outperforms 15 baselines. It achieves fine-grained gains (e.g., +1.3% on Stanford Cars, +1.1% on Oxford-IIIT Pets over DGFT), uses 60–68% fewer parameters than comparable transformer-based models, and delivers 36% faster inference than FC-Former. Under strong adversarial attack (AutoAttack, ε = 8/255), it retains 85% of clean accuracy. Boundary condition analysis further quantifies its data efficiency, cross-domain generalization, and predictable limits under extreme domain shifts and texture-only tasks. These results establish MDHF as a principled, efficient paradigm for context-aware visual recognition, balancing adaptive generalization with inference-time efficiency for dynamic, multi-domain applications.
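The core idea of context-conditioned policy generation can be illustrated with a minimal sketch: a meta-learned policy head maps a pooled per-sample context vector to fusion weights over hierarchy levels, which are then applied in a single gradient-free forward pass. The function names, the mean-pooled context, and the linear policy head `W_ctx` are illustrative assumptions, not the paper's actual architecture.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over the last axis."""
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def dynamic_fusion(features, W_ctx):
    """Sample-specific fusion via a context-conditioned policy (sketch).

    features: list of M level-wise feature arrays, each of shape (batch, d).
    W_ctx:    (d, M) meta-learned policy head -- hypothetical stand-in for
              the Meta-Learned Dynamic Attention module.
    """
    stacked = np.stack(features, axis=1)         # (batch, M, d)
    context = stacked.mean(axis=1)               # (batch, d) pooled context
    weights = softmax(context @ W_ctx)           # (batch, M) per-sample weights
    fused = (weights[:, :, None] * stacked).sum(axis=1)  # (batch, d)
    return fused, weights

# Usage: three hierarchy levels, batch of 4, feature dim 16.
rng = np.random.default_rng(0)
feats = [rng.standard_normal((4, 16)) for _ in range(3)]
W = rng.standard_normal((16, 3))
fused, w = dynamic_fusion(feats, W)
print(fused.shape)  # fused representation, one vector per sample
```

Because the weights are produced by a forward pass of the (meta-trained) policy head rather than by test-time optimization, each new sample gets its own integration policy at inference with no gradient steps, which is the property the abstract claims.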