The complex terrain and diverse management practices in tea-producing regions have resulted in highly fragmented tea plantation plots, which complicates precision cultivation, yield estimation, and ecological management. Although remote sensing has been increasingly applied to tea plantation mapping, most studies have focused on identifying tea-growing areas as a whole, while research on the fine-grained classification and extraction of individual tea plantation plots within the same spatiotemporal range remains limited. To address this gap, this study proposes DVIT-UNet, a model for ultra-high-resolution unmanned aerial vehicle (UAV) imagery that integrates a Vision Transformer (ViT) and dilated convolution modules within a UNet framework to capture both global semantic dependencies and multi-scale local contextual information. This design specifically targets blurred parcel boundaries, high intra-class heterogeneity, and the spectral similarity between tea plantations and surrounding vegetation. In comparative experiments against seven established deep learning models, DVIT-UNet achieved the best performance, with a mean Intersection over Union (mIoU) of 90.48%, an F1 score of 94.99%, a mean recall (producer's accuracy, PA) of 94.39%, a mean precision (user's accuracy, UA) of 95.60%, and a Matthews correlation coefficient (MCC) of 91.13%. Despite its moderate parameter count, the model accurately delineated small, fragmented tea plots and robustly suppressed false positives in complex backgrounds. These results demonstrate the capability of DVIT-UNet for fine-grained classification and precise extraction of tea plantation plots from high-resolution UAV imagery, providing a reliable technical foundation for precision tea-plantation management and ecological monitoring.
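The evaluation metrics listed above can be illustrated with a minimal sketch that computes them from a pixel-level confusion matrix. It assumes binary evaluation (tea vs. background), with class-mean scores averaged over both classes and F1 taken as the harmonic mean of mean precision and mean recall. This averaging scheme is an assumption, although it is consistent with the reported figures: the harmonic mean of 95.60% and 94.39% rounds to 94.99%. The function name `segmentation_metrics` is hypothetical and is not the study's published evaluation code.

```python
import math


def segmentation_metrics(tp, fp, fn, tn):
    """Binary (tea vs. background) pixel metrics from confusion-matrix counts.

    Hypothetical helper. Assumes class-mean metrics average the tea and
    background classes, and F1 is the harmonic mean of mean precision and
    mean recall.
    """
    # For the background class, "positive" means background, so its
    # TP/FP/FN are the tea class's TN/FN/FP respectively.
    per_class_counts = {
        "tea": (tp, fp, fn),
        "background": (tn, fn, fp),
    }
    iou, precision, recall = [], [], []
    for c_tp, c_fp, c_fn in per_class_counts.values():
        iou.append(c_tp / (c_tp + c_fp + c_fn))
        precision.append(c_tp / (c_tp + c_fp))  # user's accuracy (UA)
        recall.append(c_tp / (c_tp + c_fn))     # producer's accuracy (PA)

    m_iou = sum(iou) / len(iou)
    m_precision = sum(precision) / len(precision)
    m_recall = sum(recall) / len(recall)
    f1 = 2 * m_precision * m_recall / (m_precision + m_recall)

    # Matthews correlation coefficient for the binary case.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0

    return {
        "mIoU": m_iou,
        "precision": m_precision,
        "recall": m_recall,
        "F1": f1,
        "MCC": mcc,
    }


# Hypothetical pixel counts, for illustration only.
print(segmentation_metrics(tp=30, fp=10, fn=20, tn=40))
```

Averaging over both classes rewards a model for correctly labelling background pixels as well as tea pixels. MCC, in contrast, uses all four confusion-matrix cells in a single score, which makes it less sensitive to class imbalance between small tea plots and large background areas.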