Phylogenetic reconstruction and species distribution data for C. reinhardtii VIA1 (Cre07.g338350)

2026 · 0 citations · Dataset · Green Open Access

Authors

Pamela Vetrano

Kelsey Krall

Abstract

Study Context & Key Findings

This dataset contains supplementary phylogenetic data supporting the analysis of the VIA1 protein in C. reinhardtii (Cre07.g338350). The data were generated to validate and extend the GreenCut assignment of VIA1 by reconstructing its phylogeny and assessing its distribution across eukaryotic and prokaryotic lineages. The analyses supported by these files show that VIA1 is highly conserved across most photosynthetic eukaryotes, including species bearing primary and secondary plastids, although it has frequently been lost in dinoflagellates. The presence of homologs in cyanobacteria, together with the eukaryotic distribution, supports a plastid-associated origin for VIA1 dating back to the primary cyanobacterial endosymbiosis in the Viridiplantae.

Methodology

To reconstruct the VIA1 phylogeny, the C. reinhardtii ortholog (Cre07.g338350) was used as a query to search a series of predicted proteomes representing eukaryotic and prokaryotic diversity with Diamond BLASTP v2.1.11 (ultra-sensitive mode, E < 10⁻⁵). These datasets comprised predicted chlorophyte proteomes (n = 13), plant proteomes (n = 37), the EukProt v3 TCS dataset, and selected prokaryotic reference proteomes (n = 5,143) from UniProt. The resulting hits were extracted and aligned with MAFFT v7.520, and the alignments were trimmed with a gap-threshold of 50% using trimAl v1.4. Initial phylogenies were inferred using IQ-Tree v2.3.6 and the LG4M substitution model, with support derived from Shimodaira-Hasegawa approximate likelihood ratio tests (SH-aLRT, n = 1,000). Orthologs were selected phylogenetically, extracted, and re-aligned with MAFFT using the L-INS-i algorithm. This alignment was trimmed with a gap-threshold of 10% and used to build a hidden Markov model (HMM) with HMMER v3.1b2. To increase search sensitivity, the proteomes were re-searched using the VIA1 HMM (E < 10⁻⁵), and the resulting hits were curated phylogenetically.
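The first-pass search described above can be sketched as a list of command lines. This is only an illustration under stated assumptions: the dataset description does not give the authors' actual commands, so all file names, the `iqtree2` binary name, the use of `mafft --auto` for the first alignment, and the mapping of a "gap-threshold of 50%" to `trimal -gt 0.5` are hypothetical choices. The sketch only builds the commands; the manual steps (extracting hits, selecting orthologs from the tree) are marked in comments rather than automated.

```python
import shlex
import subprocess
from typing import List, Optional, Tuple

Step = Tuple[List[str], Optional[str]]  # (argv, file that receives stdout or None)

EVALUE = "1e-5"  # E-value cutoff stated in the methodology


def first_pass_commands(query: str, dmnd_db: str, proteomes: str, prefix: str) -> List[Step]:
    """Build (not run) the commands for the initial search and HMM construction.

    All paths are hypothetical. `dmnd_db` is assumed to have been created
    beforehand with `diamond makedb`.
    """
    hits_faa = f"{prefix}.hits.faa"            # hit sequences, extracted separately
    orthologs_faa = f"{prefix}.orthologs.faa"  # orthologs picked by hand from the tree
    return [
        # 1. Ultra-sensitive protein search, tabular output.
        (["diamond", "blastp", "--query", query, "--db", dmnd_db,
          "--out", f"{prefix}.hits.tsv", "--outfmt", "6",
          "--ultra-sensitive", "--evalue", EVALUE], None),
        # 2. Align the extracted hits (algorithm not stated in the source; --auto assumed).
        (["mafft", "--auto", hits_faa], f"{prefix}.hits.aln"),
        # 3. Trim with a 50% gap threshold (assumed to correspond to -gt 0.5).
        (["trimal", "-in", f"{prefix}.hits.aln",
          "-out", f"{prefix}.hits.trim.aln", "-gt", "0.5"], None),
        # 4. Initial tree: LG4M model, SH-aLRT with 1,000 replicates.
        (["iqtree2", "-s", f"{prefix}.hits.trim.aln",
          "-m", "LG4M", "-alrt", "1000"], None),
        # 5. Re-align the selected orthologs with L-INS-i.
        (["mafft", "--localpair", "--maxiterate", "1000", orthologs_faa],
         f"{prefix}.orthologs.aln"),
        # 6. Trim with a 10% gap threshold (assumed to correspond to -gt 0.1).
        (["trimal", "-in", f"{prefix}.orthologs.aln",
          "-out", f"{prefix}.orthologs.trim.aln", "-gt", "0.1"], None),
        # 7. Build the profile HMM and re-search the proteomes with it.
        (["hmmbuild", f"{prefix}.hmm", f"{prefix}.orthologs.trim.aln"], None),
        (["hmmsearch", "-E", EVALUE, "--tblout", f"{prefix}.hmm_hits.tbl",
          f"{prefix}.hmm", proteomes], None),
    ]


def run(step: Step) -> None:
    """Execute one step, redirecting stdout to a file when a path is given."""
    argv, stdout_path = step
    if stdout_path is None:
        subprocess.run(argv, check=True)
    else:
        with open(stdout_path, "w") as handle:
            subprocess.run(argv, check=True, stdout=handle)


if __name__ == "__main__":
    # Print the plan instead of running it, so the sketch works without the tools installed.
    for argv, out in first_pass_commands("via1.faa", "proteomes.dmnd", "proteomes.faa", "via1"):
        print(shlex.join(argv) + (f" > {out}" if out else ""))
```

Keeping each command as an argument list rather than a shell string avoids quoting problems, and MAFFT's output is captured by redirecting stdout because it does not take an output-file argument.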
To generate the final phylogeny, the VIA1 homologs were aligned with MAFFT L-INS-i and the alignment was trimmed with a gap-threshold of 50%. The phylogeny was inferred using IQ-Tree under the LG+C60+F+I+R8 substitution model, which was selected using ModelFinder, and was visualized using FigTree v1.4 and iTOL v6. To account for missing gene models, the genomes of taxa lacking VIA1 were searched using MiniProt v0.18; where homologs were found, they were predicted and curated phylogenetically. This item includes the protein sequences, alignments, phylogenies, and datasets used to perform this analysis.
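To make the 50% gap-threshold trimming concrete, here is a minimal pure-Python sketch of column filtering on a toy alignment. It is not trimAl's implementation: it assumes that a column is kept when at most half of its sequences carry a gap, and the function name, sequence names, and toy data are all hypothetical.

```python
from typing import Dict


def trim_gappy_columns(alignment: Dict[str, str],
                       max_gap_fraction: float = 0.5,
                       gap_chars: str = "-") -> Dict[str, str]:
    """Drop alignment columns whose gap fraction exceeds `max_gap_fraction`.

    Assumption: "gap-threshold of 50%" means a column survives when no more
    than 50% of the sequences have a gap at that position.
    """
    if not alignment:
        return {}
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("all aligned sequences must have the same length")
    n_seqs = len(alignment)
    n_cols = lengths.pop()

    kept_columns = []
    for col in range(n_cols):
        gaps = sum(1 for seq in alignment.values() if seq[col] in gap_chars)
        if gaps / n_seqs <= max_gap_fraction:
            kept_columns.append(col)

    # Rebuild every sequence from the surviving columns only.
    return {name: "".join(seq[i] for i in kept_columns)
            for name, seq in alignment.items()}


# Toy alignment: the third column is 75% gaps and is removed;
# the second column is exactly 50% gaps and is kept.
toy = {"seq_a": "MK-L", "seq_b": "M--L", "seq_c": "MR-L", "seq_d": "M-QL"}
trimmed = trim_gappy_columns(toy, max_gap_fraction=0.5)
```

With this toy input, `trimmed` maps `seq_a` to `MKL` and `seq_b` to `M-L`: gaps inside retained columns stay in place, since trimming removes whole columns rather than individual characters.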

Publication Details

Published in: Figshare

DOI: 10.6084/m9.figshare.31442650.v1
