Edge intelligence deploys artificial intelligence models on edge nodes close to data sources, delivering real-time inference support for resource-constrained devices. To realize this vision, inference offloading differs from conventional computation offloading by tailoring offloading strategies to the intrinsic characteristics of AI inference tasks. However, existing research in this field generally lacks fine-grained model partitioning capabilities and long-term resource adaptability, failing to optimize resource utilization and sustain stable performance in mobile environments. To address these issues, we propose an adaptive inference acceleration framework that dynamically partitions inference models into hierarchical subtasks and offloads them to heterogeneous edge servers. We formulate a joint optimization problem over task partitioning, offloading, and resource allocation, which takes queue stability as a constraint and aims to minimize the long-term average task completion time. To achieve the optimal trade-off between latency and stability without predicting future states, we adopt Lyapunov optimization to decompose the long-term stochastic optimization into deterministic subproblems solvable slot by slot. For these per-slot subproblems, we design a Q-network Mixing (QMIX)-based multi-agent reinforcement learning method that enables collaborative strategy selection across edge servers. Simulation results show that, compared with baseline algorithms including the greedy, genetic, and MAD2RL methods, the proposed framework substantially reduces task completion time while preserving inference accuracy and queue stability.
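To make the Lyapunov decomposition concrete, the sketch below shows the standard drift-plus-penalty pattern on a deliberately simplified single-queue, two-action system. The queue dynamics Q(t+1) = max(Q(t) - b(t), 0) + a(t), the trade-off parameter V, and the specific costs and service rates here are illustrative assumptions, not the paper's actual multi-queue, multi-server formulation.

```python
# Hedged sketch of Lyapunov drift-plus-penalty control for a single queue.
# All numbers (costs, service rates, arrivals, V) are hypothetical.

def queue_update(q, arrival, service):
    """Standard queue dynamics: Q(t+1) = max(Q(t) - b(t), 0) + a(t)."""
    return max(q - service, 0.0) + arrival

def drift_plus_penalty_action(q, actions, cost, service, arrival, V):
    """Per-slot deterministic subproblem after Lyapunov decomposition:
    pick the action x minimizing V * cost[x] + Q(t) * (a(t) - b[x])."""
    return min(actions, key=lambda x: V * cost[x] + q * (arrival - service[x]))

# Toy example: "local" is cheap but slow; "edge" is costlier but drains faster.
actions = ["local", "edge"]
cost = {"local": 1.0, "edge": 4.0}     # hypothetical completion-time penalty
service = {"local": 1.0, "edge": 3.0}  # hypothetical tasks drained per slot

q = 0.0
chosen = []
for arrival in [2.0, 2.0, 2.0]:        # constant arrivals for illustration
    x = drift_plus_penalty_action(q, actions, cost, service, arrival, V=1.0)
    chosen.append(x)
    q = queue_update(q, arrival, service[x])

print(chosen)  # ['local', 'edge', 'edge']
print(q)       # 2.0
```

Note how the controller starts with the cheap local action, then switches to the faster edge server once backlog builds: the Q(t) term weights queue stability against the V-scaled latency penalty, which is exactly the trade-off the framework tunes without predicting future states.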