Comprehensive analysis of chloroplast intron-containing genes and conserved splice sites in dicot and monocot plants

Despite the increasing knowledge on the importance of the intron splicing of chloroplast genes during plant growth and stress responses, identification of intron-containing chloroplast genes and determination of splice sites in chloroplast introns are still lacking. Here, we carried out a comprehensive analysis of the chloroplast genome sequences in important plants and crops, including four dicots (Arabidopsis thaliana, Coffea arabica, Nicotiana tabacum, and Panax schinseng) and four monocots (Musa acuminata, Oryza sativa, Triticum aestivum, and Zea mays). The results showed that both dicot and monocot chloroplast genomes harbor 6 intron-containing tRNAs (trnA, trnG, trnI, trnK, trnL, and trnV) and 10-12 intron-containing mRNAs (atpF, rpl2, rpl16, rps16, ndhA, ndhB, petB, petD, rpoC1, rps12, ycf3, and clpP). Notably, rpoC1 and clpP lacked introns in monocot plants, except M. acuminata. Analysis of the nucleotide sequences of chloroplast introns revealed that the 5’-splice sites, 3’-splice sites, and branch-point sites of the chloroplast introns were highly conserved among dicots and monocots. Notably, the 5’-splice sites and 3’-splice sites of the chloroplast introns were similar to those of the nuclear U12 introns, whereas the branch-point sites of the chloroplast introns were homologous to those of the nuclear U2 introns. Taken together, these results indicated that the chloroplast genomes contained strictly limited intron-containing genes with conserved splice sites, suggesting that splicing of chloroplast introns was important for chloroplast biogenesis and function in both dicot and monocot plants.


INTRODUCTION
The chloroplast, the green plastid found only in plant and algal cells, is not only the major cellular organelle for photosynthesis but also plays important roles in many aspects of plant physiology and development, such as the biosynthesis of phytohormones, amino acids, fatty acids, and vitamins, the storage of a variety of products, the assimilation of sulfur and nitrogen, and the sensing of abiotic stresses [1]. Plastids were hypothesized to be derived from endosymbiotic cyanobacteria [2], and the cyanobacterial origin of chloroplasts has been firmly supported by recent molecular phylogenetic analysis [3]. The chloroplast is a double membrane-bound organelle that contains the thylakoid system, where the light reactions of photosynthesis occur. The chloroplast is semi-autonomous and contains a single circular DNA as its own genome. Since the first chloroplast genome was sequenced in tobacco (Nicotiana tabacum), which consists of 155,844 bp containing 4 rRNA genes, 30 tRNA genes, and 50 protein-coding genes [4], the complete chloroplast genome sequences of over 800 plants and algae have been determined and deposited in the NCBI database (http://www.ncbi.nlm.nih.gov), including some important crop plants and model plants in both monocot and dicot species such as Arabidopsis thaliana (154,478 [9], Triticum aestivum (134,540 bp containing 4 rRNA, 30 tRNA, and 71 protein-coding genes) [10], and Zea mays (140,387 bp containing 4 rRNA, 30 tRNA, and 70 protein-coding genes) [11]. The chloroplast genome encodes approximately 120-140 genes that participate in photosynthesis, transcription, and translation, and it is transcribed as polycistronic units by plastid-encoded and nuclear-encoded RNA polymerases [12,13]. Importantly, it has been demonstrated that chloroplast gene expression is regulated mainly at the posttranscriptional level, through RNA splicing, RNA processing, RNA editing, RNA degradation, and translation [14][15][16].
RNA splicing is the process of cutting introns out of precursor RNAs (pre-RNAs) and ligating the exons together to form mature RNA, and it is one of the most important posttranscriptional steps regulating gene expression in the chloroplast as well as in the nucleus. For accurate splicing to occur, specific signals on the RNA precursors must identify where to "cut and paste", and many previous studies have revealed the consensus sequences at the 5'-splice site, 3'-splice site, and branch-point site in the introns of nuclear mRNAs: introns in mammals contain the conserved sequences 5'-AG/GUAAGU-intron-YNCURAC-YnNYAG/G-3', introns in plants contain the conserved sequences 5'-AG/GUAAGU-intron-CRUAY-GCAG/G-3', and introns in yeast contain the conserved sequences 5'-AG/GUAUGU-intron-UACUAAC-YAG/G-3', where N is any base, Y and R are a pyrimidine (U or C) and a purine (A or G), respectively, and the A in the branch-point sequence is the conserved adenine nucleotide at the branch-point site [17].
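The ambiguity codes in these consensus strings (Y, R, N) can be read as character classes. The following is a minimal sketch, not part of the original analysis, that turns such a consensus into a regular expression; the function name `consensus_to_regex` and the dictionary `IUPAC` are illustrative assumptions.

```python
import re

# Ambiguity codes used in the consensus strings quoted above (RNA alphabet).
IUPAC = {
    "A": "A", "C": "C", "G": "G", "U": "U",
    "R": "[AG]",    # purine
    "Y": "[CU]",    # pyrimidine
    "N": "[ACGU]",  # any base
}

def consensus_to_regex(consensus):
    """Translate a consensus such as 'YNCURAC' into a compiled regex (hypothetical helper)."""
    return re.compile("".join(IUPAC[base] for base in consensus.upper()))

# Branch-point consensus sequences quoted in the text.
mammal_branch_point = consensus_to_regex("YNCURAC")
plant_branch_point = consensus_to_regex("CRUAY")

# The yeast branch-point sequence UACUAAC is one instance of the mammalian consensus.
yeast_matches_mammal = mammal_branch_point.fullmatch("UACUAAC") is not None
```

Written this way, the consensus strings can be compared directly: a concrete sequence either falls inside the set described by a consensus or it does not.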
The introns found in approximately 20 chloroplast genes are classified as group II introns, except for a single group I intron found in the trnL gene, by virtue of the conserved features of their primary sequences and predicted secondary structures [18][19][20][21][22][23]. Although group I and group II introns are typically removed via a self-splicing mechanism, chloroplast introns are not self-splicing; their splicing depends on many nuclear-encoded protein factors [23,24]. Because the splicing of chloroplast group II introns differs from that of self-splicing group II introns, it is interesting to determine whether chloroplast introns contain splicing signals similar to or different from the signals found in nuclear introns. To answer this question, we analyzed the nucleotide sequences at the exon-intron junctions of chloroplast genes in selected dicot and monocot plants, and we report that the 5'-splice sites, 3'-splice sites, and branch-point sites of the chloroplast introns are highly conserved among dicots and monocots.

Retrieval of intron-containing genes from the chloroplast genome sequences
The maps of the chloroplast genomes in diverse plant species, including four dicot species (Arabidopsis thaliana, Coffea arabica, Nicotiana tabacum, and Panax schinseng) and four monocot species (Musa acuminata, Oryza sativa, Triticum aestivum, and Zea mays [11]), were obtained from the indicated references, and the nucleotide sequences of the intron-containing tRNA and mRNA genes described in each reference were downloaded from the National Center for Biotechnology Information (NCBI) database (http://ncbi.nlm.nih.gov).

Analysis of the 5'-splice site, 3'-splice site, and branch-point site of chloroplast introns
To identify conserved intron sequences at the 5'- and 3'-splice sites, the sequences at the 5'-end of introns spanning 5 nucleotides (nts) in the exon and 10 nts in the intron, and the sequences at the 3'-end of introns spanning 10 nts in the intron and 5 nts in the exon, were extracted from the predicted exon-intron junction sequences and analyzed using the WEBLOGO program (http://weblogo.threeplusone.com). To predict branch-point sites, approximately 100 nts upstream of the 3'-splice sites were compared with previously confirmed branch-point sequences [25][26], and the sequences showing high similarity were extracted and analyzed using the WEBLOGO program.
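The window extraction described above can be sketched as follows. This is an illustrative sketch only: the function name `splice_site_windows`, the 0-based coordinate convention, and the toy sequence are assumptions, not the procedure or data used in the study.

```python
def splice_site_windows(seq, intron_start, intron_end, exon_nt=5, intron_nt=10):
    """Return the (5'-junction, 3'-junction) windows around one intron.

    seq          -- transcript sequence written 5' to 3'
    intron_start -- 0-based index of the first intron nucleotide (assumed convention)
    intron_end   -- 0-based index one past the last intron nucleotide
    """
    if intron_start < exon_nt or intron_end + exon_nt > len(seq):
        raise ValueError("not enough flanking exon sequence")
    if intron_end - intron_start < intron_nt:
        raise ValueError("intron is shorter than the requested window")
    # 5 exon nts followed by the first 10 intron nts.
    five_prime = seq[intron_start - exon_nt:intron_start + intron_nt]
    # Last 10 intron nts followed by 5 exon nts.
    three_prime = seq[intron_end - intron_nt:intron_end + exon_nt]
    return five_prime, three_prime

# Toy example (made-up sequence, not a real chloroplast gene).
exon1 = "AUGGCUAAGG"
intron = "GUGCGAUUCC" + "AAAAAAAAAA" + "UUUCUAAUAC"
exon2 = "GCUUAGGAUC"
five, three = splice_site_windows(exon1 + intron + exon2, len(exon1), len(exon1) + len(intron))
```

Each window is 15 nts long, so windows collected from many introns are already aligned at the exon-intron boundary and can be passed to a logo program without further alignment.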

Identification and characterization of chloroplast intron-containing genes
The genomes of plant chloroplasts are circular DNA molecules of approximately 130,000-160,000 base pairs that harbor approximately 140 genes, among which 16 to 20 genes contain introns. To determine which genes contain introns, the maps of chloroplast genomes in diverse plant species, including four dicot species (Arabidopsis thaliana, Coffea arabica, Nicotiana tabacum, and Panax schinseng) and four monocot species (Musa acuminata, Oryza sativa, Triticum aestivum, and Zea mays), were obtained from the indicated references, and the structures of the chloroplast genomes were analyzed (Fig. 1). The results showed that there are six intron-containing tRNA genes (trnA, trnG, trnI, trnK, trnL, and trnV) in the chloroplasts of both dicot and monocot plants. Monocot plants contain ten intron-containing mRNA genes (atpF, rpl2, rpl16, rps16, ndhA, ndhB, petB, petD, rps12, and ycf3), whereas dicot plants harbor twelve: the same ten genes plus rpoC1 and clpP. However, M. acuminata, a monocot plant, retains introns in the rpoC1 and clpP genes (Table 1). Notably, three mRNA genes, rps12, ycf3, and clpP, contain two introns. Interestingly, rps12 is split into three separate parts on the chloroplast genome; exon 2 and exon 3 are separated by a cis-intron and are transcribed with the downstream rps7, whereas exon 1 is cotranscribed with the upstream clpP and the downstream rpl20 genes, after which the two separate transcripts are joined by a trans-splicing process to form the mature rps12 mRNA [27].
x denotes the genes whose sequences were included in the present analysis.
* denotes the genes having two introns.

Consensus sequences in the 5'-splice sites, 3'-splice sites, and branch-point sites of the chloroplast introns
To obtain information on the sequences at the 5'-splice sites, 3'-splice sites, and branch-point sites of the chloroplast introns, the nucleotide sequences of 48 intron-containing chloroplast mRNAs in dicots and 42 intron-containing chloroplast mRNAs in monocots (Table 1) were downloaded from the NCBI database. The sequences of 10 nucleotides at the 5'- and 3'-ends of each intron were analyzed using the WEBLOGO software. The results showed that GU at the 5'-end and (A/C)(C/U) at the 3'-end were highly conserved in both dicot and monocot plants (Fig. 2). These predicted 5'- and 3'-splice sites of chloroplast introns differ slightly from the splice sites of nuclear introns (Fig. 3). The major U2-type introns contain the highly conserved GU at the 5'-splice sites and AG at the 3'-splice sites, whereas the minor U12-type introns contain the conserved (G/A)U at the 5'-splice sites and A(G/C) at the 3'-splice sites [26,28] (Fig. 3).
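A sequence logo summarizes, for each position of the aligned windows, how strongly one base dominates. As a rough illustration of that idea, and not a reimplementation of WEBLOGO, the sketch below computes the most frequent base and the information content in bits at each position, ignoring any small-sample correction a logo program may apply. The function name `column_information` and the toy windows are assumptions.

```python
import math
from collections import Counter

def column_information(windows, alphabet="ACGU"):
    """Return a list of (most frequent base, information in bits), one entry per position."""
    length = len(windows[0])
    if any(len(w) != length for w in windows):
        raise ValueError("all windows must have the same length")
    result = []
    for column in zip(*windows):
        counts = Counter(column)
        total = len(column)
        # Shannon entropy of the observed base frequencies at this position.
        entropy = -sum((n / total) * math.log2(n / total) for n in counts.values())
        # Ties are resolved by the order of the alphabet string.
        top_base = max(alphabet, key=lambda base: counts[base])
        # Maximum is log2(4) = 2 bits for a fully conserved RNA position.
        result.append((top_base, math.log2(len(alphabet)) - entropy))
    return result

# Toy windows (made-up): positions 1-2 are invariant GU, position 3 is uniform.
info = column_information(["GUA", "GUC", "GUG", "GUU"])
```

In this toy input, the invariant G and U positions carry the full 2 bits, while the third position, where all four bases are equally frequent, carries none; tall letters in a logo correspond to the former case.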
We then analyzed the intron sequences to identify putative branch-point sites. The sequences of approximately 100 nucleotides upstream of the 3'-splice sites were selected and aligned with the known branch-point sequences of humans and plants [25,26]. The sequences of the putative branch-point sites are shown in Fig. 2. Seventy-two of the 86 putative branch-point sites (83.7%) were found approximately 40-60 nucleotides upstream of the 3'-splice sites. Analysis of the nucleotide sequences using the WEBLOGO software revealed that (C/U)U(C/U)A(U/C) is conserved at the branch-point sites in the chloroplast introns of both dicot and monocot plants (Fig. 2). These predicted branch-point sites in chloroplast introns are not identical to those found in nuclear introns, which contain the conserved (C/U)U(A/G)A(U/C) in the U2-type introns and UU(A/G)A(U/C) in the U12-type introns (Fig. 3).

Correct splicing of introns in pre-RNAs is one of the most important steps in the regulation of gene expression in the chloroplast as well as in the nucleus. Nucleotide sequences at the 5'-splice sites, 3'-splice sites, and branch-point sites are highly conserved in the nuclear introns of plants [26,28] (Fig. 3). By contrast, our current analysis revealed that the 5'- and 3'-splice sites in chloroplast introns are less conserved than those in nuclear introns (Fig. 2), and that the 5'- and 3'-splice sites in chloroplast introns are more similar to those in U12-type introns than to those in U2-type introns (Fig. 3). Notably, more variation was observed between nuclear and chloroplast introns in the sequences and positions of the branch-point sites. The branch-point sites of the nuclear U12-type introns in plants harbor the conserved -UUnAn- sequences and are located approximately 10-16 nucleotides upstream of the 3'-end of the U12 introns, whereas the branch-point sites of the nuclear U2-type introns harbor the conserved -nUnAn- sequences and are located approximately 20-40 nucleotides upstream of the 3'-end of the U2 introns [26,29] (Fig. 3). Interestingly, the sequences at the branch-point sites of chloroplast introns in both dicots and monocots are similar to those of the U2 introns (Fig. 2), whereas their positions are quite different, in that approximately 84% of chloroplast introns have the branch-point site approximately 40-60 nucleotides upstream of the 3'-end of the intron. These conserved and divergent sequences at the 5'-splice, 3'-splice, and branch-point sites of nuclear and chloroplast introns suggest that the nucleus and chloroplasts not only share common machinery for intron splicing but also harbor specific components that mediate intron splicing in each compartment. It would be interesting to determine how the differences in intron sequences between the nucleus and chloroplasts are recognized, and how the introns are correctly spliced out by different splicing complexes.
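The branch-point search can be approximated by scanning the last 100 nucleotides of each intron for the (C/U)U(C/U)A(U/C) consensus and recording how far each candidate lies from the 3'-end. The sketch below is a simplified stand-in for the similarity-based comparison used in the study; the function name `candidate_branch_points`, the distance convention (number of intron nucleotides downstream of the branch A), and the toy intron are assumptions.

```python
import re

# Chloroplast branch-point consensus reported in the text: (C/U)U(C/U)A(U/C).
# The lookahead allows overlapping candidates to be reported.
BRANCH_POINT = re.compile(r"(?=([CU]U[CU]A[CU]))")

def candidate_branch_points(intron, search_nt=100):
    """List (motif, distance) pairs found in the last `search_nt` nts of an intron.

    distance = number of intron nucleotides downstream of the motif's A
    (an assumed convention; other definitions differ by one or two nts).
    """
    offset = max(len(intron) - search_nt, 0)
    region = intron[offset:]
    hits = []
    for match in BRANCH_POINT.finditer(region):
        a_index = offset + match.start() + 3  # A is the fourth base of the motif
        hits.append((match.group(1), len(intron) - 1 - a_index))
    return hits

# Toy intron (made-up sequence) with a single CUCAU motif.
toy_intron = "GUGCG" + "A" * 20 + "CUCAU" + "G" * 48 + "AC"
hits = candidate_branch_points(toy_intron)
in_window = [motif for motif, distance in hits if 40 <= distance <= 60]
```

A pure pattern match of this kind will generally return several candidates per intron in real sequences, so it only narrows the search; choosing among candidates still requires comparison with confirmed branch-point sequences, as described in the text.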

Science & Technology Development, Vol 20, No.T1-2017
CONCLUSIONS
Analysis of the chloroplast genome sequences revealed that introns are found in identical genes in dicot and monocot plants, and that the 5'-splice sites, 3'-splice sites, and branch-point sites of chloroplast introns are highly conserved in dicot and monocot plants. Given

Fig. 2. Conserved sequences of splice sites and branch-point sites in the introns of chloroplast genes. Intron sequences in 48 mRNAs from 4 dicot species (Arabidopsis thaliana, Coffea arabica, Nicotiana tabacum, and Panax schinseng) and in 42 mRNAs from 4 monocot species (Musa acuminata, Oryza sativa, Triticum aestivum, and Zea mays) were analyzed using the WEBLOGO program (http://weblogo.threeplusone.com). The height of the letters at each nucleotide position indicates the degree of sequence conservation.

Fig. 3. Consensus sequences at the 5'-splice, 3'-splice, and branch-point sites of nuclear introns. The nucleotide sequences of the U2-type and U12-type introns in plants, moss, or alga were analyzed using the WEBLOGO program. The height of the letters at each nucleotide position indicates the degree of sequence conservation [26].