Growing environmental awareness and the increasing emphasis on sustainability have heightened the focus on circular economy models, particularly remanufacturing, which extends product life cycles through systematic refurbishment. Remanufacturing operations encompass three key stages: (1) remanufacturing, (2) stock management, and (3) assembly. This study optimises these interconnected processes while explicitly considering timing-related uncertainties in the remanufacturing–assembly system that impact operational performance. A remanufacturing production control model is developed to coordinate the disassembly of end-of-life products, remanufacturing of components, and hybrid assembly of final products. These finished products may integrate both remanufactured and new components, requiring effective material flow management to maintain operational efficiency and quality standards. The objective of this study is to support decision-makers in maintaining efficient production flows, meeting customer demand, and mitigating uncertainties that may arise throughout the remanufacturing–assembly workflow. To achieve this, key performance metrics such as throughput, machine-related disruptions, and material release are continuously monitored to guide system optimisation. A reinforcement learning (RL)-based approach is proposed to optimise material flow, service-level management, and on-time delivery of remanufactured products within a hybrid assembly environment. The assembly process operates under three distinct operational modes, allowing the RL agent to adapt its decision-making strategy dynamically based on real-time system conditions. Additionally, the agent explicitly accounts for machine failures in its decision framework to proactively manage disruptions and maintain a stable material flow.
Simulation results demonstrate that the decentralised RL-based production control system outperforms conventional heuristics, achieving an average cumulative reward improvement of approximately 4% compared to the best-performing combination of Constant Work-In-Progress (CONWIP) and the One-Disassembly method, and 33% compared to the combination of CONWIP and the Simple Disassembly method. The findings suggest that RL is suitable for addressing typical timing variability and disruption effects in remanufacturing systems, including stochastic return timing, condition-dependent processing-time variability, and machine failures.
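For readers unfamiliar with the CONWIP baseline named above, the following is a minimal sketch of the generic release rule only: new jobs enter the line while the work-in-progress count is below a fixed cap. The function name `conwip_release`, its parameters, and the job labels are hypothetical and chosen for illustration; the sketch does not reproduce the study's simulation, its disassembly methods, or its RL agent.

```python
from collections import deque


def conwip_release(backlog, wip_count, wip_cap):
    """Release jobs from the backlog while WIP stays below the cap.

    backlog   -- deque of waiting jobs (hypothetical job labels)
    wip_count -- number of jobs currently in the line
    wip_cap   -- fixed CONWIP limit on jobs in the line
    Returns the list of jobs released in this step; the backlog is
    modified in place.
    """
    released = []
    # Generic CONWIP rule: admit a job only if doing so keeps WIP <= cap.
    while backlog and wip_count + len(released) < wip_cap:
        released.append(backlog.popleft())
    return released


# Illustrative use: two jobs in the line, cap of three, three jobs waiting.
waiting = deque(["core_A", "core_B", "core_C"])
admitted = conwip_release(waiting, wip_count=2, wip_cap=3)
```

In this illustration only one job is admitted, because releasing a second would exceed the cap; the remaining jobs wait until a finished job leaves the line.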