Summary

In this study, we investigate the problem of wellbore trajectory optimization under subsurface constraints, including the avoidance of pre-existing wells. Traditional optimization techniques, such as metaheuristic and gradient-based methods, have limited capability in directly handling collision avoidance and typically require iterative recalculations for each new design, resulting in increased computational effort. Reinforcement learning (RL) provides an alternative approach by enabling adaptive policy learning through interaction with the environment. We propose a deep recurrent Q-network (DRQN)-based framework in which engineering constraints are incorporated into the reward function to support multisegment trajectory planning. The framework accounts for factors such as collision avoidance, dogleg severity (DLS) limits, and target-entry parameters, enabling multiobjective design without explicitly defined optimization equations or repeated replanning. Convolutional and long short-term memory (LSTM) layers are used to support learning across trajectory segments. In offline training, the DRQN model demonstrated more stable convergence than the standard deep Q-network (DQN), which showed inconsistent performance under the same conditions. In online testing scenarios, DQN failed to converge and was unable to generate valid trajectories, whereas DRQN successfully produced feasible solutions. Compared with the particle swarm optimization (PSO) method, DRQN yielded similar trajectory results but required substantially less computational time. These results suggest that the proposed DRQN-based framework is applicable to adaptive and constraint-aware trajectory planning tasks in well construction design.
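To make the constraint-to-reward mapping concrete, the sketch below shows one way a per-segment reward could encode the constraints named above (anti-collision distance, DLS limit, target-entry accuracy). All function names, thresholds, and penalty weights here are illustrative assumptions for exposition, not the values used in the study.

```python
def segment_reward(dls, min_offset_dist, at_target, entry_error,
                   dls_limit=3.0, safe_dist=50.0):
    """Hedged sketch of a constraint-aware reward for one trajectory segment.

    dls             : dogleg severity of the segment (deg per 30 m)
    min_offset_dist : closest approach to any existing well (m)
    at_target       : True if the segment reaches the target window
    entry_error     : deviation from the desired target-entry angle (deg)
    All thresholds and weights are hypothetical placeholders.
    """
    reward = -1.0                       # small step cost discourages overly long paths
    if dls > dls_limit:                 # dogleg severity constraint
        reward -= 10.0 * (dls - dls_limit)
    if min_offset_dist < safe_dist:     # anti-collision constraint
        reward -= 100.0                 # large penalty near existing wells
    if at_target:                       # terminal bonus, shaped by entry accuracy
        reward += 100.0 - abs(entry_error)
    return reward
```

Under such a shaping scheme, a policy trained segment by segment can trade off path length, curvature, and collision risk without an explicit closed-form objective, which is the role the reward function plays in the proposed framework.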