In photosynthetic eukaryotes of the green lineage, expression of the chloroplast genome is regulated mainly at the post-transcriptional level by nucleus-encoded RNA-binding proteins termed organelle trans-acting factors. Most of those identified to date belong to two families of α-solenoid proteins, the pentatricopeptide repeat (PPR) and octotricopeptide repeat (OPR) families. They interact with specific sequences on their target mRNAs through a domain composed of repeated motifs, and thereby enable the maturation, splicing, editing, stabilization and translation activation of these mRNAs. To identify new organelle trans-acting factors, we developed three approaches for annotating α-solenoid proteins targeted to the chloroplast or the mitochondrion: one to identify distant homologs of existing organelle trans-acting factor families, and two others (decision tree and random forest classifiers) to identify new organelle trans-acting factor families. The combined approaches efficiently retrieve previously annotated organelle trans-acting factors in two model organisms. They identified 1067 OPR proteins and 4983 PPR proteins in 43 proteomes of Archaeplastida. Our analysis also identified chimeric proteins composed of both OPR and PPR domains. Finally, our results identified 3300 other α-solenoid candidates, which are likely to act as new regulators of organelle gene expression. In particular, we identified new candidates in species in which the regulatory mechanisms of plastid gene expression are still understudied, such as the glaucophyte Cyanophora paradoxa and the red alga Porphyridium purpureum. Our study contributes to the extensive description of organelle trans-acting factors by providing valuable new tools to decipher their repertoire and new candidates for experimental characterization across the entire eukaryotic tree of life.