Abstract

Understanding orthologous and homeologous relationships among genes reveals how evolution and artificial selection have shaped the genomes of related plant species to adapt to different niches and produce diverse phenotypes. Grasses are ecologically and economically important plants, and many genes are conserved within syntenic blocks across species. Here, we describe the Pan-Grass Syntenic Gene Set (PGSGS), a curated dataset of orthologous and homeologous relationships among 746 743 protein-coding genes from 17 grass genomes anchored to sorghum, including major crops, orphan crops, and wild grasses. From this analysis, 344 230 genes (46%) were identified as syntelogs, with 27 567 sorghum genes linked to at least one syntelog in other species. Of these, 11 624 syntelogs form a conserved core present in at least 15 species. PGSGS-core syntelogs are enriched for regulatory and cellular functions, form densely interconnected GO networks, and exhibit reduced nucleotide diversity and elevated Tajima’s D relative to nonsyntenic genes in ~400 sorghum accessions. In contrast, PACMAD-specific syntelogs (n = 2266) are a diverse set of clade-specific retained genes. This dataset provides a foundation for understanding genome evolution after polyploidy and encourages the integration of functional genetic and genomic information across grasses, fulfilling the original promise of the “grasses as a single genetic system.”
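The core definition above (a syntelog present in at least 15 species) can be illustrated with a minimal sketch. It is not the authors' pipeline: the input format (pairs of a sorghum anchor gene and a species in which a syntelog was found), the function name `core_syntelogs`, and the choice to count only non-reference species toward the threshold are all assumptions made for illustration.

```python
from collections import defaultdict


def core_syntelogs(pairs, min_species=15):
    """Return anchor genes with syntelogs in at least `min_species` species.

    `pairs` is a hypothetical input: an iterable of
    (sorghum_gene_id, species_name) tuples, one per detected syntelog.
    Whether the reference species itself counts toward the threshold
    is an assumption; here only the listed species are counted.
    """
    species_by_gene = defaultdict(set)
    for gene, species in pairs:
        # A set ignores repeated hits (e.g. homeologs) in the same species.
        species_by_gene[gene].add(species)
    return {
        gene
        for gene, species in species_by_gene.items()
        if len(species) >= min_species
    }


# Toy data with made-up identifiers, using a lowered threshold.
toy_pairs = [
    ("geneA", "maize"), ("geneA", "rice"), ("geneA", "maize"),
    ("geneB", "rice"),
]
toy_core = core_syntelogs(toy_pairs, min_species=2)
```

In this toy run, `geneA` has syntelogs in two distinct species and is retained, while `geneB` has a syntelog in only one and is excluded. Collecting species in a set means that multiple homeologous copies within one polyploid genome count as a single species.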