Methods for joint genetic prediction of multiple ordinal categorical and continuous traits

2026 · 0 citations · Journal Article · Hybrid Open Access

Authors

Daniela Lourenco · University of Georgia

Abstract

The landscape of animal and plant breeding is rapidly evolving. Companies now invest millions in collecting extensive data on health, disease, welfare, and fertility, which are key fitness traits relevant for sustainable production. Likewise, genomic-based predictions or GWAS-based searches for causal variants in categorically scored psychiatric disorders and other diseases are key areas in clinical genetics. The categorical nature (yes/no, scores, etc.) of these traits violates the normality assumptions of standard linear models; threshold or threshold-linear models represent conceptually attractive alternatives. In Bayesian settings, these models can be fitted either via Gibbs sampling or by maximizing the posterior density (maximum a posteriori, MAP). While Gibbs sampling is flexible and can handle multivariate models with several categorical and continuous traits, its computational and memory demands limit its applicability to large-scale genetic and genomic predictions. MAP methods are computationally efficient and suitable for large-scale datasets. However, existing MAP methods are restricted to a single categorical trait alongside multiple continuous ones, making them insufficient for the new generation of data. The lack of proper methods for fitting high-dimensional threshold-linear models for quantitative genetics has persisted for over two decades. Failing to model categorical traits appropriately can lead to reduced genetic gain and deterioration of the traits over time. This study develops a theoretically sound and computationally efficient framework for jointly analyzing multiple categorical and Gaussian traits, specifically multi-trait threshold-linear models, with or without pedigree and genomic information. The proposed MAP multiple-trait threshold-linear model methods are based on Newton-Raphson and Expectation-Maximization schemes.
Using a simulated dataset and treating Gibbs sampling as the benchmark, we found that the breeding values estimated by our methods agree nearly perfectly with those obtained from Gibbs sampling, while greatly reducing computational cost and allowing scalability to very large datasets. As expected, the Newton-Raphson iteration outperformed the Expectation-Maximization algorithm in terms of computational time. Our results demonstrate that routine genetic evaluations incorporating multiple categorical traits are now feasible using the presented methodology. Closing this decades-old gap enables accurate, large-scale genetic and genomic predictions of key categorical fitness traits under multi-trait models, unlocking the full potential of genomic selection for complex traits in animal and plant populations and genomic-based predictions in clinical genetics.
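
To make the idea of a MAP threshold model fitted by Newton-Raphson concrete, the following is a minimal sketch under strong simplifying assumptions. It is not the authors' multi-trait method. It handles one binary trait only, with the liability threshold fixed at 0 and the residual variance fixed at 1, unrelated group effects (no pedigree or genomic relationships), and a known group variance. All function names, the toy data, and the value of var_u are hypothetical and chosen purely for illustration.

```python
import math

def norm_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def norm_cdf(x):
    return 0.5 * math.erfc(-x / math.sqrt(2.0))

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def grad_and_neg_hessian(theta, y, group, var_u):
    """Gradient and negative Hessian of the log posterior.

    theta = (mu, u_1, ..., u_q); liability = mu + u_group + e, e ~ N(0, 1);
    the observed score is 1 when the liability exceeds 0; u_j ~ N(0, var_u).
    """
    p = len(theta)
    grad = [0.0] * p
    neg_hess = [[0.0] * p for _ in range(p)]
    for yi, gi in zip(y, group):
        j = gi + 1                      # position of this record's group effect
        eta = theta[0] + theta[j]       # linear predictor on the liability scale
        z = 1.0 if yi == 1 else -1.0
        g = z * norm_pdf(eta) / norm_cdf(z * eta)   # d log-likelihood / d eta
        w = g * (eta + g)                           # minus the second derivative
        for a in (0, j):
            grad[a] += g
            for b in (0, j):
                neg_hess[a][b] += w
    for j in range(1, p):               # Gaussian prior on the group effects
        grad[j] -= theta[j] / var_u
        neg_hess[j][j] += 1.0 / var_u
    return grad, neg_hess

def map_threshold(y, group, n_groups, var_u, max_iter=50, tol=1e-10):
    """Newton-Raphson iterations towards the posterior mode, starting from zero."""
    theta = [0.0] * (n_groups + 1)
    for _ in range(max_iter):
        grad, neg_hess = grad_and_neg_hessian(theta, y, group, var_u)
        delta = solve(neg_hess, grad)   # Newton step: (-H)^{-1} * gradient
        theta = [t + d for t, d in zip(theta, delta)]
        if max(abs(d) for d in delta) < tol:
            break
    return theta

# Toy data (hypothetical): 13 binary records from 3 unrelated groups.
y = [1, 1, 1, 0, 0, 0, 1, 0, 1, 0, 1, 0, 1]
group = [0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2, 2]
theta = map_threshold(y, group, n_groups=3, var_u=0.25)
print("mu:", theta[0], "group effects:", theta[1:])
```

In this toy setting the group with the larger share of positive scores receives the larger estimated effect, and the gradient at the returned solution is numerically zero. Extending the sketch to several correlated categorical and continuous traits, with relationship matrices, is the problem the paper addresses and is outside the scope of this example.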

Topics & Keywords

Genetic and phenotypic traits in livestock · Genetic Mapping and Diversity in Plants and Animals · Genetic Associations and Epidemiology

UN Sustainable Development Goals

Responsible consumption and production

Publication Details

Published in: Genetics

DOI: 10.1093/genetics/iyag086

Field-Weighted Citation Impact: 0.00