Peppers (Capsicum) are an important agricultural crop worldwide. There are five major cultivated species of Capsicum: C. annuum L., C. frutescens L., C. chinense Jacquin, C. baccatum L. and C. pubescens Ruiz & Pavon (Heiser, 1969). Each of the five cultivated species has 12 chromosomes (Moscone et al., 2007). The genomes of two C. annuum cultivars, Zunla-1 (Qin et al., 2014) and CM334 (Kim et al., 2014), were the first to be sequenced, in 2014. The genomes of C. baccatum PBC81 and C. chinense PI159236 were reported more recently (Kim et al., 2017). According to these studies, the genome sizes of these pepper species range from 3.2 to 3.9 Gb, with repetitive regions making up nearly 80% of the sequence. To explore the genetic variability and diversity that exist within a species, pan-genome studies have recently been reported for several important crop plants, including corn, soybean, rice, and wheat (Golicz et al., 2016; Montenegro et al., 2017). These plant pan-genome studies have shown that significant gene presence–absence variations (PAVs) within a species can contribute to trait variation. For this reason, we constructed and annotated the first pepper pan-genome. Because pepper genomes are highly repetitive, which can lead to the false identification of single nucleotide polymorphisms (SNPs), we used gene PAVs rather than SNPs for our phylogenetic analysis and genome-wide association study (GWAS). Using the read coverage of genes, we also identified genetic variations in some key genes involved in the biosynthetic pathways of capsaicinoids and carotenoids, two traits that distinguish pepper plants from other crops. We have also created a website to provide access to the pepper pan-genome data. The pan-genome was constructed following the approach used in the rice pan-genome study (Sun et al., 2016), with the Zunla-1 genome (Qin et al., 2014) as the reference.
From the next generation sequencing (NGS) data of 383 cultivars from an ongoing resequencing project, comprising 355 C. annuum, four C. baccatum, 11 C. chinense and 13 C. frutescens (Supporting Information Table S1), 610 292 novel contigs were generated, totaling 956 430 739 bp (Table S2) and representing a 28.4% increase relative to the 3.36 Gb genome size of the reference genome. The pepper pan-genome consists of the 13 chromosomes of the reference genome (Chr00 is made of unordered scaffolds) and the novel contigs from this study. Results of the gene structure and function annotation steps are available in Methods S1, Figs S1 and S2, and Tables S3 and S8. A total of 89 181 genes were predicted in the pan-genome and divided into two groups, 51 757 high quality (HQ) genes and 37 424 low quality (LQ) genes, based on their Annotation Edit Distance scores and relationship with transposable elements. Of the HQ genes, 124 were predicted to be alternatively spliced, resulting in a total of 51 900 mRNAs, 6984 (13.5%) of which are from novel contigs. PAV analysis was performed for each predicted HQ gene in each cultivar, and a phylogenetic tree was then constructed (Methods S1). The phylogenetic analysis (Fig. 1a) shows that the 383 pepper cultivars divide into four major species groups, and that the relationships among the four pepper species in our study agree with previous reports that C. chinense and C. frutescens are more closely related to each other than to C. annuum (Baral & Bosland, 2004). Furthermore, C. baccatum differs further from the other three species, exhibiting a longer karyotype length and a more complicated heterochromatic banding pattern (Moscone et al., 2007). We also classified the HQ genes into core genes and species-specific genes. In our study, a gene present in ≥ 50% of the cultivars of a species was considered to be important to that species.
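The ≥ 50% presence rule can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' pipeline: the function names, the toy cultivar labels and the boolean presence calls are all hypothetical. Four species give 2^4 − 1 = 15 non-empty species combinations; a gene that passes the rule in no species gets an empty combination.

```python
from itertools import combinations

# Species labels used only for this toy example.
SPECIES = ("annuum", "baccatum", "chinense", "frutescens")


def species_with_gene(presence, cultivar_species, threshold=0.5):
    """Return the species in which a gene is present in >= threshold of cultivars.

    presence: dict cultivar -> bool (hypothetical per-cultivar presence call)
    cultivar_species: dict cultivar -> species name
    """
    totals, hits = {}, {}
    for cultivar, species in cultivar_species.items():
        totals[species] = totals.get(species, 0) + 1
        if presence.get(cultivar, False):
            hits[species] = hits.get(species, 0) + 1
    return frozenset(
        s for s in totals if hits.get(s, 0) / totals[s] >= threshold
    )


def classify_genes(pav, cultivar_species, threshold=0.5):
    """Group genes by the combination of species that pass the presence rule."""
    groups = {}
    for gene, presence in pav.items():
        key = species_with_gene(presence, cultivar_species, threshold)
        groups.setdefault(key, []).append(gene)
    return groups


def count_nonempty_combinations(species=SPECIES):
    """Number of non-empty species combinations (regions of the Venn diagram)."""
    return sum(
        1 for r in range(1, len(species) + 1) for _ in combinations(species, r)
    )


if __name__ == "__main__":
    # Toy data: two cultivars per species (labels are made up).
    cultivar_species = {
        "A1": "annuum", "A2": "annuum",
        "B1": "baccatum", "B2": "baccatum",
        "C1": "chinense", "C2": "chinense",
        "F1": "frutescens", "F2": "frutescens",
    }
    pav = {
        "geneA": {c: True for c in cultivar_species},   # present everywhere
        "geneB": {"B1": True, "B2": True},              # baccatum only
        "geneC": {"A1": True},                          # half of annuum
        "geneD": {},                                    # passes in no species
    }
    for combo, genes in classify_genes(pav, cultivar_species).items():
        label = "+".join(sorted(combo)) or "unassigned"
        print(label, genes)
    print("Venn regions:", count_nonempty_combinations())
```

In this sketch a gene assigned to all four species corresponds to a core gene, and a gene with an empty combination corresponds to one that falls outside the Venn diagram.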
Using this criterion, most HQ genes were assigned to one or more species, resulting in 15 groups, as shown in the Venn diagram in Fig. 1(b). Of the HQ genes, 28 840 (55.7%) were shared among all four species and were thus considered to be core genes of the pepper pan-genome; 4633 genes (8.9%) were not in any of the 15 groups in the Venn diagram, probably owing either to genetic differences within each species or to variation caused by low NGS depth or uneven sequencing.
We have developed a website, PepperPan, which allows researchers to visualize the pan-genome, the annotation data, the read alignment, and the read coverage for all 383 cultivars. Users can click on each of the 15 regions of the Venn diagram (Fig. 1b) to access the genes present in each combination of species, run simple searches of the annotation data by locus, gene symbol or keyword, or search by gene position, with the results shown in either table or graph format. Users can select the pan-genome from the ‘browsers’ pulldown menu to open the pan-genome browser, then enter a chromosome name, optionally with a range, to view the genome and zoom in to the base level (Fig. 1c). Users can also select from the listed features, such as HQ and LQ genes and mRNAs (with gene structure information), RNAseq StringTie and Trinity transcripts, RepeatMasker results, miRNA and tRNA. Users can select features and cultivars in the left panel to view high-level read coverage, or view alignment details at the base level to show SNPs. We have also aligned the reads of the four C. baccatum cultivars and the 11 C. chinense cultivars to the corresponding genomes from Kim et al. (2017), and have made the alignment and coverage information for these two genomes available on the PepperPan website.
Although GWAS typically shows associations between traits and SNPs, in principle it can also be used to assess associations between traits and any type of genotypic variation.
In our study, the gene PAV values, expressed as the percentage of the gene length covered by reads for each of the 51 757 predicted HQ genes on the pan-genome in each of the 383 pepper cultivars, were used as the genotype data. A single significant association was detected, between red carotenoid content and the predicted gene pan06g005570 (capsanthin/capsorubin synthase, Ccs) on chromosome 6 (Fig. 1d). This case study showed that gene PAVs can be used as the genotypic data for GWAS when traits are controlled by gene PAVs. We also investigated whether key genes in the carotenoid and capsaicinoid biosynthetic pathways carry large deletions by mapping the reads of each cultivar to the pan-genome. For the carotenoid biosynthetic pathway, our read coverage analyses (Tables S4, S5) showed that 26 C. annuum cultivars had a deletion around the pan06g005570 (predicted Ccs) region. All 26 of these cultivars produce yellow or orange fruits (Table S5). By inspecting the read alignments of these cultivars against the pan-genome in the PepperPan genome browser, we confirmed that 24 of the 26 C. annuum cultivars had a 4.43 kb deletion of the pan06g005570 gene region (example cultivar CS16 in Fig. 1e) and that the remaining two cultivars had a 2.96 kb deletion (example cultivar HL2 in Fig. 1e) compared with the red-fruit cultivar NJ03, confirming the GWAS result. We also studied the read coverage for the phytoene synthase gene (Psy) and found that four of the C. annuum cultivars without red fruit color had the same 20 kb deletion, which completely spans the pan04g025380 (predicted Psy) region (Fig. S3a; Tables S4, S5). For the capsaicinoid biosynthetic pathway, we found that 50 of the C. annuum cultivars with low capsaicin content had a 2.5 kb deletion around pan02g021380, the predicted Pungent gene 1 (Pun1) region (Fig. S3b; Tables S6, S7). The same deletion in Pun1 has been previously reported in C. chinense (Lee et al., 2005; Stewart et al., 2005).
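As a rough illustration of how read coverage can serve as a PAV genotype and point to a deletion, the sketch below computes the percentage of a gene's length covered by reads and locates the longest uncovered stretch in a per-base depth track. It is a simplified sketch and not the authors' method (which is described in Methods S1): the function names, the `min_depth` cutoff, the coordinates and the toy depth values are all assumptions made for this example.

```python
def covered_percent(depth, start, end, min_depth=1):
    """Percent of positions in [start, end) whose depth is >= min_depth.

    depth: list of per-base read depths along one sequence (toy input here).
    """
    region = depth[start:end]
    if not region:
        return 0.0
    covered = sum(1 for d in region if d >= min_depth)
    return 100.0 * covered / len(region)


def longest_uncovered_run(depth, min_depth=1):
    """Return (start, length) of the longest run of positions below min_depth."""
    best_start, best_len = 0, 0
    run_start = None
    for i, d in enumerate(depth):
        if d < min_depth:
            if run_start is None:
                run_start = i          # a new uncovered run begins here
            run_len = i - run_start + 1
            if run_len > best_len:
                best_start, best_len = run_start, run_len
        else:
            run_start = None           # run ended at a covered base
    return best_start, best_len


if __name__ == "__main__":
    # Toy depth track: covered flank, 300 bp with no reads, covered flank.
    depth = [12] * 100 + [0] * 300 + [9] * 100

    # Hypothetical gene coordinates on this toy sequence.
    print(covered_percent(depth, 150, 350))   # gene lying inside the gap
    print(covered_percent(depth, 50, 150))    # gene straddling the gap edge
    print(longest_uncovered_run(depth))       # candidate deletion interval
```

A per-gene, per-cultivar table of `covered_percent` values is the kind of genotype matrix the text describes; any cutoff for calling a gene absent would be a further assumption.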
The information we provide on the PepperPan website can enable pepper researchers, including those without bioinformatics skills, to study genes of interest easily by viewing the predicted gene structures in each cultivar, as well as other features on the pan-genome such as repeats, miRNA, tRNA and read alignments. Sequence variations in gene regions of interest will be useful to researchers developing functional markers. Our phylogenetic analysis results will be useful to pepper genetics researchers, as well as to breeders who perform interspecific hybridization. The genetic variants used in our GWAS were gene PAVs, in the form of read coverage variations of genes. This approach successfully identified PAVs in the Ccs gene as being associated with fruit color and should, in general, identify associations when traits are controlled by genes with sizable deletions.
The PepperPan website URL is http://www.pepperpan.org:8012/, where the pan-genome and all the annotation data can be viewed and downloaded. The novel contig sequences reported in this paper have been deposited in the Genome Warehouse in BIG Data Center, Beijing Institute of Genomics (BIG), Chinese Academy of Sciences, under accession number GWHAAAT00000000, which is publicly accessible at http://bigd.big.ac.cn/gwh.
This work was supported by the National Key R&D Program of China (2016YFD0101704).
LO, XZ, YM and BW co-designed the experiments and data analyses, supervised the study, and wrote the manuscript. DL, LO, JL, WC, WL, HG and QZ conducted the bioinformatics work. WC, ZZ, SY, YL and JW conducted the field experiments and sampling. XL, BY, SZ and JZ assisted in the bioinformatics work. XD, JW, HY, BO, F Li and F Liu advised on experiments and data analyses. LO, DL, JL and WC contributed equally to this work.
Please note: Wiley Blackwell are not responsible for the content or functionality of any Supporting Information supplied by the authors.
Any queries (other than missing material) should be directed to the New Phytologist Central Office. Fig. S1 The pan-genome annotation process overview. Fig. S2 The RNAseq assembly process overview. Fig. S3 The deletions around two predicted genes in some cultivars shown in the PepperPan genome browser.