Barley (Hordeum vulgare L.) is a globally important cereal crop with high genetic diversity and adaptability to abiotic stresses, making it a key resource for developing climate-resilient varieties (Dawson et al. 2015; Raj et al. 2023). Although high-throughput sequencing has generated vast genomic variation data in barley (Jayakodi et al. 2024), the accessibility and utility of these data for researchers and breeders remain constrained by limitations of existing databases, such as restricted scale, narrow data types and a lack of integrated analytical tools. To bridge this gap, we introduce BarleyGVDB (https://www.barleygvdb.cn/), a large-scale, all-in-one variome database that supports population genetics and molecular breeding research in barley (Figure 1a).
We integrated genomic data from 2677 publicly available barley accessions (comprising 761 wild accessions, 769 landraces, 958 cultivars and 189 Tibetan hull-less barley accessions) and 207 accessions sequenced by our group (Table S1). Detailed information for all accessions is provided in the Sample module. Variant calling followed the Genome Analysis Toolkit (GATK) best practices workflow: reads were aligned to the Morex v3 reference genome (Mascher et al. 2021) using BWA v0.7.17-r1188, processed with SAMtools v1.17 and sambamba v0.6.6, and variants were called with GATK v4.2.0.0.
All identified variants are organised in the VARinfo module by genotyping technology, with a dedicated section for chip-based data (Figure S1). Interactive visualisation tools display the genomic distribution and classification of variants. The VARsearch module enables queries by genes or genomic regions with customisable filters for variant type, functional annotation, genotype and accession subsets, including an error-rate filter to enhance reliability. Results are exportable in multiple formats (e.g., Excel, CSV and VCF) to accommodate diverse downstream analyses (Figure S2).
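A region query of the kind VARsearch offers can be illustrated with a minimal Python sketch. It is not BarleyGVDB's implementation: the `Variant` record, the `parse_vcf` and `query_region` helpers, the chromosome label `chr1H` and the demo records are all hypothetical, and only the first five standard VCF columns (CHROM, POS, ID, REF, ALT) are read.

```python
from typing import Iterable, Iterator, List, NamedTuple, Optional


class Variant(NamedTuple):
    chrom: str
    pos: int  # 1-based position, as in VCF
    ref: str
    alt: str

    @property
    def kind(self) -> str:
        # Single-base REF and ALT is called a SNP; everything else is
        # lumped together as INDEL in this simplified sketch.
        return "SNP" if len(self.ref) == 1 and len(self.alt) == 1 else "INDEL"


def parse_vcf(lines: Iterable[str]) -> Iterator[Variant]:
    """Yield one Variant per ALT allele, skipping header and blank lines."""
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        chrom, pos, _id, ref, alts = line.rstrip("\n").split("\t")[:5]
        for alt in alts.split(","):
            yield Variant(chrom, int(pos), ref, alt)


def query_region(
    variants: Iterable[Variant],
    chrom: str,
    start: int,
    end: int,
    kind: Optional[str] = None,
) -> List[Variant]:
    """Return variants inside chrom:start-end (inclusive), optionally of one kind."""
    return [
        v
        for v in variants
        if v.chrom == chrom
        and start <= v.pos <= end
        and (kind is None or v.kind == kind)
    ]


# Hypothetical demo records, not taken from the database.
demo_vcf = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT",
    "chr1H\t100\t.\tA\tG",
    "chr1H\t250\t.\tAT\tA",
    "chr1H\t900\t.\tC\tT",
    "chr2H\t120\t.\tG\tC",
]
snp_hits = query_region(parse_vcf(demo_vcf), "chr1H", 50, 500, kind="SNP")
all_hits = query_region(parse_vcf(demo_vcf), "chr1H", 50, 500)
```

Filters for functional annotation, genotype or accession subset would need the INFO and sample columns, which this sketch leaves out.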
BarleyGVDB offers dedicated modules for genetic analysis and interactive visualisation. The GeneHap module resolves haplotypes of target genes and generates multi-dimensional outputs, including haplotype tables, gene structure models, linkage disequilibrium (LD) heat maps and haplotype networks. The principal component analysis (PCA) module performs dimensionality reduction on genome-wide variant data, enabling interactive exploration through scatter plots and exporting principal component scores (Figure S3). The Tree module constructs evolutionary trees to visualise genealogical relationships among accessions. The Structure module supports user-defined group numbers (K values), customisable visualisation parameters and export in multiple image formats for flexible presentation. Furthermore, the platform incorporates essential bioinformatics tools, including BLAST for sequence alignment, JBrowse for genome navigation and Gene Ontology (GO) enrichment analysis. It also integrates population genetics analyses such as PCA, phylogenetic reconstruction and population structure inference. Users can upload custom variant files (e.g., VCF) for personalised analyses. This integrated framework offers multidimensional support for the functional interpretation of genomic variation.
Leveraging this resource, we selected genome-wide, diverse variants from a panel of wild barley, landraces, cultivars and Tibetan hull-less barley to develop the first liquid-phase 40 K SNP array for barley (40 529 SNPs; Figure S4). Functional annotation classified the SNPs into major genomic categories, with 57.4% in intergenic regions, 6.05% in UTRs and 16.69% in upstream gene regions (Figure 1b). To evaluate its utility for genetic diversity studies, we compared nucleotide diversity (π) estimates from the 40 K SNP array with a resequencing dataset of 155 barley accessions. The π distributions across subpopulations were closely aligned with those from resequencing data (Figure 1c).
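For readers unfamiliar with π, the following Python sketch shows the standard definition: the average proportion of differences between two sequences drawn at random from the sample. The text does not state which software or window size was used for the comparison, so the function names `site_pi` and `window_pi`, the allele counts and the window length are illustrative assumptions only.

```python
from typing import Sequence


def site_pi(allele_counts: Sequence[int]) -> float:
    """Per-site nucleotide diversity from allele counts at one site.

    Equals the fraction of chromosome pairs that carry different alleles:
    1 - sum(c * (c - 1)) / (n * (n - 1)), where n is the number of sampled
    chromosomes and c the count of each allele.
    """
    n = sum(allele_counts)
    if n < 2:
        return 0.0  # diversity is undefined for fewer than two chromosomes
    same_pairs = sum(c * (c - 1) for c in allele_counts)
    return 1.0 - same_pairs / (n * (n - 1))


def window_pi(sites: Sequence[Sequence[int]], window_length: int) -> float:
    """Mean pi per base pair over a window; unlisted sites are treated as invariant."""
    return sum(site_pi(counts) for counts in sites) / window_length


# Hypothetical example: two variable sites in a 10 bp window, 4 chromosomes.
example_sites = [[3, 1], [2, 2]]
example_pi = window_pi(example_sites, window_length=10)
```

Because an array genotypes only preselected SNPs, absolute π values from array and resequencing data are not expected to be on the same scale; it is the distributions across subpopulations that are compared.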
Furthermore, PCA, phylogenetic tree and admixture analyses based on the array yielded population structures highly congruent with those derived from whole-genome resequencing, particularly in distinguishing major groups such as wild barley, landraces and cultivars (Figure S5). Selection scans identified a strong selection signal around HvF2KP, which encodes a fructose-6-phosphate 2-kinase involved in sugar metabolism (Chen et al. 2023). This suggests that HvF2KP may have been targeted during barley domestication, potentially contributing to the improvement of traits such as grain starch content and size. HvF2KP displayed major haplotypes in both wild and cultivated barley, with a minor wild haplotype becoming dominant under domestication selection (Figure 1d). Domestication selection has also been reported to shape traits such as heavy metal homeostasis in crops (Maccaferri et al. 2019). HIPP06 encodes a metal-binding protein involved in metal homeostasis and detoxification (Deng et al. 2022). Our results based on the 40 K array showed that one minor HvHIPP06 haplotype in wild accessions increased in frequency in cultivated barley, suggesting a role in metal regulation (Figure 1d). These results highlight the SNP array's value in analysing population structure and domestication signatures.
We further genotyped 216 Tibetan hull-less barley accessions using the 40 K SNP array and performed a genome-wide association study (GWAS) for yield-related traits. This analysis identified 132 and 187 loci associated with spike number and spike length, respectively (Figure 1e), including known spike-related genes such as HvPRF5, HvYAO and HvBB (Disch et al. 2006; Liu et al. 2015; Li et al. 2010). To explore haplotype-assisted breeding potential, we analysed allele combinations of spike-related genes included on the array. Among 11 spike number-associated genes, 31 homozygous haplotypes were identified, with six found in more than five accessions.
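The counting step described above (keeping accessions that are homozygous at every gene, then tallying each allele combination) can be sketched in Python as follows. This is a simplified illustration, not the procedure used for the array data: it assumes one marker per gene, genotypes written as `A/A`-style strings, and hypothetical helper names, accession labels and calls.

```python
from collections import Counter
from typing import Dict, List, Optional


def homozygous_haplotype(genotypes: List[str]) -> Optional[str]:
    """Join the alleles of an accession into one haplotype string.

    Returns None if any call is heterozygous, missing ('./.') or malformed.
    """
    alleles = []
    for gt in genotypes:
        first, _, second = gt.replace("|", "/").partition("/")
        if first != second or first in ("", "."):
            return None
        alleles.append(first)
    return "-".join(alleles)


def common_haplotypes(
    calls: Dict[str, List[str]], min_accessions: int
) -> Dict[str, int]:
    """Count homozygous haplotypes and keep those seen in MORE than min_accessions."""
    haplotypes = (homozygous_haplotype(gts) for gts in calls.values())
    counts = Counter(h for h in haplotypes if h is not None)
    return {h: n for h, n in counts.items() if n > min_accessions}


# Hypothetical calls at two markers for five accessions.
demo_calls = {
    "acc1": ["A/A", "G/G"],
    "acc2": ["A/A", "G/G"],
    "acc3": ["A/T", "G/G"],  # heterozygous at the first marker, excluded
    "acc4": ["T/T", "C/C"],
    "acc5": ["./.", "G/G"],  # missing call, excluded
}
frequent = common_haplotypes(demo_calls, min_accessions=1)
```

With the threshold in the text, `min_accessions` would be 5.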
Phenotypic evaluation revealed that haplotypes hap2 and hap3 had the highest spike numbers, demonstrating their breeding potential (Figure 1f). All SNP, probe and GWAS association data are integrated into BarleyGVDB.
In summary, we developed BarleyGVDB, which to our knowledge represents the most comprehensive barley genomic variation database to date. It significantly expands upon existing resources by integrating diverse data types, including a unique 40 K SNP array, and providing an integrated analysis workflow from variant query to population genetics, thereby complementing rather than replacing existing databases. The platform enables variant search, haplotype analysis and population genetics studies. Based on this resource, we designed and validated a 40 K SNP array for precise genotyping. BarleyGVDB will be continuously updated with multi-omics data and advanced prediction models to provide a closed-loop platform that supports global barley research from discovery to breeding.
L.C. and X.N. supervised the project. T.L., Z.C. and H.Z. analysed resequencing data. T.L. built the database. K.W. and B.L. conducted SNP array analysis. T.L. and X.N. wrote the manuscript. Z.Z., W.S. and X.N. revised the manuscript.
We thank all data providers and are grateful to the High-Performance Computing platform of Northwest A&F University for providing computational resources. This research was supported by the National Natural Science Foundation of China (Grant No. U22A20453), the Open Project of Shaanxi Laboratory for Agriculture in Arid Areas (2024ZY-JCYJ-02-20) and the Natural Science Foundation of Jiangxi Province (20232BAB205012).
The authors declare no conflicts of interest.
All data and tools are available via BarleyGVDB at https://www.barleygvdb.cn/.
Data S1: pbi70580-sup-0001-Supinfo.pdf.