Abstract

Many insects manipulate plants by injecting effector proteins. In one extreme example of this molecular “hijacking”, Hormaphis cornu aphids inject bicycle proteins into Hamamelis virginiana (witch hazel), contributing to the development of novel organs called galls. Bicycle proteins share no amino acid sequence similarity with proteins of known function. Here, we report the crystal structures of two divergent bicycle proteins. Both proteins contain saposin-like folds: one has multiple disulfide bonds and exhibits a helix swap; the other has no disulfide bonds and possesses two tandem domains. To explore the structural evolution of bicycle proteins, we predicted bicycle protein structures with AlphaFold2 (AF2). While AF2 did not recover the two experimental structures using existing databases, it succeeded after we provided multiple sequence alignments (MSAs) containing protein sequences encoded in new genome sequences from closely related aphid species. Using this customized approach at scale, we generated 2400 high-confidence predictions for bicycle proteins from seven aphid species. This dataset revealed that bicycle proteins without cysteines are outliers in fold space and appear to have evolved from ancestral proteins with disulfide-bonded saposin-like folds. All bicycle proteins contain predicted saposin-like folds, yet they display a vast diversity of structural and physicochemical properties. Although this diversity thwarts prediction of conserved functions encoded in structure, it suggests that bicycle proteins have evolved to target diverse plant processes and/or to evade plant immune surveillance.

Significance statement

Parasites introduce specialized “effector” proteins into hosts, both to suppress host immunity and to release nutrients. The molecular functions and structures of most effector proteins are unknown. Effector proteins often evolve rapidly and share no similarity with proteins of known function.
Here, we demonstrate that machine learning algorithms can accurately predict the structures of aphid “bicycle” effector proteins when supplemented with data from closely related species. We exploit this finding to generate predictions of 2400 bicycle protein structures. These proteins share a common motif, yet exhibit diverse structures that form distinct structural clusters. Despite this clustering in structure space, the proteins occupy physicochemical space nearly uniformly, suggesting that they encode a large diversity of molecular functions.