This project develops a 2D virtual garment try-on system: given a person image and a garment image, a multi-stage deep learning pipeline generates a photo-realistic composite of the person wearing the garment, with no physical trial. The system is trained on the VITON-Zalando dataset and follows a CP-VTON+ inspired architecture with two stages:

- Geometric Matching Module (GMM): a ResNet-18 based network predicts Thin Plate Spline (TPS) control-point parameters that warp the garment onto the target body, optimized with an appearance loss and a grid smoothness loss.
- Try-On Module (TOM): a UNet-based conditional GAN that synthesizes the final image by blending the warped garment with a rendered person via a predicted composition mask, trained with adversarial, L1, and VGG perceptual losses against a PatchGAN discriminator.

Pose and body awareness come from agnostic-v3.2 clothing-agnostic representations and OpenPose keypoints encoded as 18-channel heatmaps, concatenated into a 22-channel person input. Training runs 20 epochs for the GMM and 50 epochs for the TOM on a Kaggle T4 GPU (~10 hours), and output quality is evaluated with the Fréchet Inception Distance (FID).
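The 18-channel OpenPose heatmap encoding can be sketched as follows. The exact encoding is not spelled out in this description; a common choice in CP-VTON-style pipelines, assumed here, is to draw a small patch around each detected joint, one channel per joint:

```python
import numpy as np

def pose_heatmaps(keypoints, height, width, radius=4):
    """Encode 18 OpenPose keypoints as an (18, H, W) heatmap stack.

    keypoints: list of 18 (x, y) pairs; (-1, -1) marks an undetected joint.
    Each visible joint is drawn as a small square patch of ones, giving
    one channel per joint (a sketch; radius and patch shape are assumptions).
    """
    maps = np.zeros((18, height, width), dtype=np.float32)
    for i, (x, y) in enumerate(keypoints):
        if x < 0 or y < 0:
            continue  # joint not detected by OpenPose
        x0, x1 = max(0, int(x) - radius), min(width, int(x) + radius + 1)
        y0, y1 = max(0, int(y) - radius), min(height, int(y) + radius + 1)
        maps[i, y0:y1, x0:x1] = 1.0
    return maps
```

Stacking these 18 channels with the remaining agnostic-representation channels yields the 22-channel input mentioned above.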
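The GMM's grid smoothness loss can be illustrated with a minimal sketch. The idea, following CP-VTON+, is to penalize abrupt changes in the spacing of the TPS sampling grid so the warp does not fold or crumple the garment; the second-order-difference form below is an assumption, written in NumPy rather than the PyTorch used for training:

```python
import numpy as np

def grid_smoothness_loss(grid):
    """Second-order smoothness penalty on a TPS sampling grid.

    grid: (H, W, 2) array of sampling coordinates. First differences give
    the spacing between neighboring grid points; penalizing their change
    (second differences) discourages folded or crumpled warps.
    """
    dx = grid[:, 1:, :] - grid[:, :-1, :]   # horizontal spacing
    dy = grid[1:, :, :] - grid[:-1, :, :]   # vertical spacing
    ddx = dx[:, 1:, :] - dx[:, :-1, :]      # change in horizontal spacing
    ddy = dy[1:, :, :] - dy[:-1, :, :]      # change in vertical spacing
    return float(np.abs(ddx).mean() + np.abs(ddy).mean())
```

An undeformed identity grid scores zero, so the loss only activates when the predicted TPS warp deviates from locally uniform spacing.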
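The TOM's mask-based composition step can be sketched in a few lines. The function below is an illustrative NumPy version (the actual module is a PyTorch UNet): the network predicts a rendered person, and a composition mask then selects the warped garment where the mask is high and the rendered person elsewhere:

```python
import numpy as np

def compose_tryon(rendered, warped_cloth, mask):
    """Blend TOM outputs into the final try-on image.

    rendered:     (H, W, 3) coarse person image from the UNet
    warped_cloth: (H, W, 3) garment warped by the GMM
    mask:         (H, W, 1) composition mask in [0, 1], also predicted
    Pixels keep the warped garment where mask is 1, the rendered person
    where mask is 0, and blend smoothly in between.
    """
    return mask * warped_cloth + (1.0 - mask) * rendered
```

The L1 and VGG perceptual losses are applied to this composed image against the ground-truth photo, while the PatchGAN discriminator supplies the adversarial term.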
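The FID metric used for evaluation compares Gaussians fitted to Inception features of real and generated images: FID = ||mu1 - mu2||^2 + Tr(S1 + S2 - 2 (S1 S2)^(1/2)). A minimal sketch of the formula itself (feature extraction with an Inception network is omitted), using an eigendecomposition-based matrix square root since the covariances are symmetric PSD:

```python
import numpy as np

def _sqrtm_psd(mat):
    """Matrix square root of a symmetric PSD matrix via eigendecomposition."""
    vals, vecs = np.linalg.eigh(mat)
    vals = np.clip(vals, 0.0, None)  # guard against tiny negative eigenvalues
    return (vecs * np.sqrt(vals)) @ vecs.T

def fid(mu1, sigma1, mu2, sigma2):
    """Fréchet Inception Distance between two feature Gaussians.

    Uses the identity Tr((S1 S2)^(1/2)) = Tr((S1^(1/2) S2 S1^(1/2))^(1/2))
    so that every square root is taken of a symmetric PSD matrix.
    """
    diff = mu1 - mu2
    s1_half = _sqrtm_psd(sigma1)
    covmean = _sqrtm_psd(s1_half @ sigma2 @ s1_half)
    return float(diff @ diff + np.trace(sigma1 + sigma2 - 2.0 * covmean))
```

Identical distributions give FID 0, and lower scores indicate generated images whose feature statistics are closer to the real data.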