MIRROR (Metrics for Identifying Repackaged and Reused Android Apps) is a publicly available dataset designed to support empirical, reproducible research on Android application repackaging. The dataset comprises 2,775 original Android applications and 4,351 corresponding repackaged variants, collected from the AndroZoo repository. Each application has been analyzed using a static bytecode analysis pipeline to extract twelve structural code-level metrics, covering dimensional, complexity, and object-oriented characteristics. All repackaged applications are explicitly linked to their original counterparts using validated SHA-256 identifiers, enabling precise original–repackaged pairing, per-original variant grouping, and metric-delta analysis.

MIRROR facilitates research on Android repackaging detection, structural similarity and divergence analysis, software quality assessment, and benchmarking of adversarial code modification techniques. The dataset is intended for use by researchers and practitioners studying mobile security, software forensics, and large-scale Android ecosystem analysis.

If you use this dataset, please cite the associated paper describing the MIRROR dataset.

@inproceedings{MIRROR,
  author={Sebastian Siedler and Karim Elish},
  booktitle={3rd ACM International Conference on AI Foundation Models and Software Engineering (FORGE)},
  title={MIRROR: A Dataset of Structural Metrics for Repackaged Android Apps},
  year={2026}
}

App Access Information:

The apps referenced in this dataset are real-world Android applications. To adhere to research ethics guidelines and comply with data sharing and redistribution policies, we do not directly distribute APK files. Instead, researchers are required to obtain the applications through the official AndroZoo repository.

AndroZoo: https://androzoo.uni.lu/

Usage Instructions:

1. Request access to the AndroZoo repository by following the instructions provided on the official website.
2. Use the cryptographic hash values (e.g., SHA-256) included in our dataset to retrieve the corresponding application samples from the repository.
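
The retrieval step can be sketched in Python as follows. This is a minimal illustration, not part of the dataset's tooling. It assumes the download endpoint and the `apikey` / `sha256` query parameters described in AndroZoo's access documentation, which should be checked against the official website before use. The helper names (`normalize_sha256`, `build_download_url`, `download_apk`) are hypothetical, and upper-casing the hash is a choice made here rather than a documented requirement.

```python
import re
import shutil
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed AndroZoo download endpoint; verify against the official site.
ANDROZOO_DOWNLOAD_URL = "https://androzoo.uni.lu/api/download"

# A SHA-256 digest is exactly 64 hexadecimal characters.
_SHA256_RE = re.compile(r"[0-9a-fA-F]{64}")


def normalize_sha256(value: str) -> str:
    """Return the digest in upper case, or raise ValueError if malformed."""
    candidate = value.strip()
    if not _SHA256_RE.fullmatch(candidate):
        raise ValueError(f"not a SHA-256 hex digest: {value!r}")
    return candidate.upper()


def build_download_url(sha256: str, api_key: str) -> str:
    """Build the request URL for one APK (parameter names are assumptions)."""
    query = urlencode({"apikey": api_key, "sha256": normalize_sha256(sha256)})
    return f"{ANDROZOO_DOWNLOAD_URL}?{query}"


def download_apk(sha256: str, api_key: str, dest_path: str) -> None:
    """Stream one APK to dest_path; requires a valid AndroZoo API key."""
    with urlopen(build_download_url(sha256, api_key)) as response:
        with open(dest_path, "wb") as out:
            shutil.copyfileobj(response, out)
```

Validating each hash before issuing a request keeps malformed identifiers from consuming API quota. For example, `download_apk(sha, key, f"{sha}.apk")` would be called once per SHA-256 value read from the dataset, with `key` being the API key issued after access is granted.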