We present evidence that a well defined subset of intron positions shows a non-random distribution in ancient genes. We analyze a database of ancient conserved regions drawn from GenBank 101 to retest two predictions of the theory that the first genes were constructed by exon shuffling. These predictions are that there should be an excess of symmetric exons (and sets of exons) flanked by introns of the same phase (positions within the codon) and that intron positions in ancient proteins should correlate with the boundaries of compact protein modules. Both these predictions are supported by the data, with considerable statistical force (P values < 0.0001). Intron positions correlate to modules of diameters around 21, 27, and 33 Å, and this correlation is due to phase zero introns. We suggest that 30-40% of present day intron positions in ancient genes correspond to phase zero introns originally present in the progenote, while almost all of the remaining intron positions correspond to introns added, or moved, appearing equally in all three intron phases. This proposal provides a resolution for many of the arguments of the introns-early/introns-late debate.

exons, or sets of exons, tend to begin and to end in the same phase, to be multiples of three bases. This argument was shown to hold at about the P = 0.01 level (7). A second argument showed that intron positions were correlated with an aspect of the three-dimensional structure of ancient proteins, specifically that intron positions were associated with compact modules of diameters 21, 27, and 33 Å, with P values less than 0.01 (8). Both of these regularities are predictions of any theory that holds that some or all of the introns were used in the progenote to assemble the genes for these proteins by exon shuffling; neither of these regularities is predicted by theories which hold that the introns were inserted into DNA by processes that are unrelated to the ultimate structure of the gene product.
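The phase argument can be made concrete with a minimal sketch. It assumes that each intron position is given as the number of coding bases that precede it, so that the phase is that count modulo three (phase zero between codons, phases one and two within a codon). The function names and the example offsets are hypothetical illustrations and are not taken from the database analyzed here.

```python
from typing import List, Tuple


def intron_phase(coding_bases_before: int) -> int:
    """Phase of an intron, given the number of coding bases upstream of it.

    0 = between codons, 1 = after the first base, 2 = after the second base.
    """
    return coding_bases_before % 3


def internal_exons(intron_offsets: List[int]) -> List[Tuple[int, int, int]]:
    """Return (length, 5' phase, 3' phase) for each exon between consecutive introns."""
    offsets = sorted(intron_offsets)
    return [
        (b - a, intron_phase(a), intron_phase(b))
        for a, b in zip(offsets, offsets[1:])
    ]


def symmetric_spans(intron_offsets: List[int]) -> List[Tuple[int, int]]:
    """Return every pair of introns that bounds a symmetric exon or set of exons.

    A span is symmetric when both flanking introns share a phase, which is
    equivalent to the span length being a multiple of three bases.
    """
    offsets = sorted(intron_offsets)
    return [
        (a, b)
        for i, a in enumerate(offsets)
        for b in offsets[i + 1:]
        if intron_phase(a) == intron_phase(b)
    ]


# Hypothetical intron offsets, chosen only to illustrate the definitions.
example_offsets = [90, 211, 331, 450]
exons = internal_exons(example_offsets)
spans = symmetric_spans(example_offsets)
```

In this made-up example only the middle exon (120 bases, phase one on both sides) is symmetric on its own, while the three exons taken together (360 bases, phase zero on both sides) form a symmetric set. A symmetric span can be inserted, removed, or duplicated without shifting the downstream reading frame, which is why an excess of such spans is a prediction of exon shuffling.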
However, in the last year two papers have appeared that continue the argument that introns are late. One by Cho and Doolittle (9) tries to study a possible coincidence of intron positions in gene pairs that represent duplications that occurred in the progenote, ancient paralogous genes, to ask whether the pattern of intron positions in those genes is more suggestive of intron addition or intron loss. A second paper studying the intron distribution in a large gene family argues that the pattern observed is more one of addition or movement than loss (10). The continuing increase of DNA sequences in the public databases, increasing by a factor of two every 18 months, has led us to reinvestigate this problem using much more data. In this paper, we shall show that the statistical regularities mentioned above can now be analyzed in greater detail with much higher statistical confidence. We reaffirm the basic regularities that we saw before, but now, since there is more data, we can go further in the analysis of the correlation of introns with three-dimensional structural elements. This further analysis shows that the strong correlation is carried by introns that lie between codons (in phase zero), while the introns that lie within the codons (phase one and phase two) do not show strong correlations with three-dimensional structure. This analysis suggests an explicit description of intron positions in terms of both ancient introns and later additions in a way that resolves the conflict between the two viewpoints. We conclude that about 35% of the introns present in ancient genes are ancient, lie primarily in phase zero between codons, and are related to compact elements of protein structure, modules, ranging in diameter between 21 and 33 Å. About 65% of the introns have been added to pre-existing genes, equal fractions in each of the other phases uncorrelated to structure.
This division explains why certain analyses see a large fraction of introns as being added to previously existing genes, while the theory that the original genes were constructed through introns remains the simplest and strongest way of predicting the observed regularities.
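As a rough illustration of what this division implies for the overall phase distribution, the arithmetic can be sketched under two simplifying assumptions that are ours and are not stated as exact in the text: that every ancient intron lies in phase zero (the text says only "primarily"), and that added introns fall equally into all three phases, as described in the abstract. The function name is hypothetical.

```python
from typing import Tuple


def expected_phase_fractions(ancient_fraction: float) -> Tuple[float, float, float]:
    """Expected share of introns in phases 0, 1 and 2 under a two-class mixture.

    Assumptions (simplifications for illustration only):
      * all ancient introns are in phase zero;
      * added introns are split equally among the three phases.
    """
    added_fraction = 1.0 - ancient_fraction
    per_phase_added = added_fraction / 3.0
    phase0 = ancient_fraction + per_phase_added
    return (phase0, per_phase_added, per_phase_added)


# With about 35% ancient introns, as suggested in the text.
phase0, phase1, phase2 = expected_phase_fractions(0.35)
rounded = tuple(round(x, 2) for x in (phase0, phase1, phase2))
```

Under these assumptions phase zero would hold roughly 57% of intron positions and phases one and two roughly 22% each. The figures are a consequence of the stated assumptions only and are not a reported measurement.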