Detection of short tandem repeats in the cattle genome: a comparison of bioinformatic tools

2026 · 0 citations · Journal Article · Gold Open Access

Authors

Amanda J. Chamberlain · La Trobe University

Abstract

Short tandem repeats (STRs) are repetitive DNA sequences with 1–6 nucleotide repeat units, exhibiting high polymorphism due to varying repeat counts. STRs are more variable than SNPs and can cause genetic disorders. With population-scale cattle whole-genome sequencing data now available, whole-genome STR identification has attracted new interest, but challenges remain due to the lack of standardized methods, the limitations of sequencing data, and the diversity of STR-calling tools. Using sequences from five Holstein cattle (two parent–offspring trios with a shared sire), this study compared six STR-calling tools: HipSTR, GangSTR, and ExpansionHunter for short-read data, and Straglr, RepeatHMM, and LongTR for Oxford Nanopore (ONT) long-read data. This is the first cattle study to evaluate short- and long-read STR callers using both data types from the same animals. In short-read data, ExpansionHunter identified the highest number of polymorphic STRs (pSTRs; 327,690), followed by HipSTR (205,900) and GangSTR (110,680), with 93,023 loci detected by all three tools. In long-read data, LongTR detected 470,250 pSTRs, RepeatHMM 224,185, and Straglr 90,275, with only 33,253 loci shared among them. Mendelian consistency of STR genotypes in the trio offspring was high (> 0.8) for all short-read tools, with HipSTR and GangSTR highest at 0.98. LongTR was the only long-read tool with high consistency (0.88). Short-read tools also agreed with one another on STR genotypes more closely than the long-read tools did. However, long-read tools had a clear advantage in detecting large STRs. In terms of computational efficiency, HipSTR and GangSTR (short-read) and LongTR (long-read) required less memory and shorter runtimes than the other tools. Tool selection is critical for accurate whole-genome STR identification in cattle.
For short-read data, HipSTR showed relatively high Mendelian consistency and concordance compared to the other tools, while ExpansionHunter was able to detect longer STRs but with lower Mendelian consistency. For long-read data, LongTR demonstrated higher consistency and computational efficiency relative to the other tools. Based on these results, HipSTR and LongTR are suggested as preferred options for short-read and ONT long-read datasets, respectively, in cattle STR analysis. These recommendations are based on the metrics observed in this study, and confirmatory analyses across additional breeds, larger sample sizes, and validated truth sets are encouraged.
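Mendelian consistency, the main accuracy metric reported above, can be illustrated with a minimal Python sketch. It assumes each genotype is a pair of allele lengths (e.g. repeat counts) at a locus, treats an offspring as consistent when one of its alleles matches a sire allele and the other matches a dam allele exactly, and counts only loci genotyped in all three animals. The function names and this exact-match rule are illustrative assumptions; the study's own definition and filters may differ, for example in how uncalled loci or near-matching long-read allele lengths are handled.

```python
from typing import Optional, Sequence, Tuple

# A diploid STR genotype: two allele lengths (e.g. repeat counts), or None if not called.
Genotype = Optional[Tuple[int, int]]


def is_mendelian_consistent(child: Tuple[int, int],
                            sire: Tuple[int, int],
                            dam: Tuple[int, int]) -> bool:
    """True if one child allele can come from the sire and the other from the dam.

    Assumption: alleles must match exactly; no length tolerance is applied.
    """
    a, b = child
    return (a in sire and b in dam) or (b in sire and a in dam)


def mendelian_consistency(trio_calls: Sequence[Tuple[Genotype, Genotype, Genotype]]) -> float:
    """Fraction of loci, among those called in all three animals, that are consistent.

    trio_calls holds one (child, sire, dam) tuple per STR locus.
    Loci with any missing genotype are skipped; returns NaN if none remain.
    """
    informative = [(c, s, d) for c, s, d in trio_calls
                   if c is not None and s is not None and d is not None]
    if not informative:
        return float("nan")
    consistent = sum(is_mendelian_consistent(c, s, d) for c, s, d in informative)
    return consistent / len(informative)


# Hypothetical toy calls for one offspring at three loci.
calls = [
    ((12, 14), (12, 12), (14, 15)),  # consistent: 12 from sire, 14 from dam
    ((10, 10), (10, 11), (9, 11)),   # inconsistent: dam carries no 10 allele
    ((8, 9), None, (8, 9)),          # skipped: sire not genotyped
]
print(mendelian_consistency(calls))  # 0.5
```

With two trios sharing a sire, as in this study, the rate could be computed separately for each offspring. Requiring exact length matches keeps the sketch simple but would penalize callers whose allele-length estimates are slightly noisy.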

Topics & Keywords

Genetic diversity and population structure · Genomics and Phylogenetic Studies · Identification and Quantification in Food

UN Sustainable Development Goals

Zero hunger

Publication Details

Published in: BMC Genomics

DOI: 10.1186/s12864-026-12753-4

Field-Weighted Citation Impact: 0.00