Using next generation transcriptome sequencing to predict an ectomycorrhizal metabolome
© Larsen et al; licensee BioMed Central Ltd. 2011
Received: 29 October 2010
Accepted: 13 May 2011
Published: 13 May 2011
Skip to main content
© Larsen et al; licensee BioMed Central Ltd. 2011
Received: 29 October 2010
Accepted: 13 May 2011
Published: 13 May 2011
Mycorrhizae, symbiotic interactions between soil fungi and tree roots, are ubiquitous in terrestrial ecosystems. The fungi contribute phosphorous, nitrogen and mobilized nutrients from organic matter in the soil and in return the fungus receives photosynthetically-derived carbohydrates. This union of plant and fungal metabolisms is the mycorrhizal metabolome. Understanding this symbiotic relationship at a molecular level provides important contributions to the understanding of forest ecosystems and global carbon cycling.
We generated next generation short-read transcriptomic sequencing data from fully-formed ectomycorrhizae between Laccaria bicolor and aspen (Populus tremuloides) roots. The transcriptomic data was used to identify statistically significantly expressed gene models using a bootstrap-style approach, and these expressed genes were mapped to specific metabolic pathways. Integration of expressed genes that code for metabolic enzymes and the set of expressed membrane transporters generates a predictive model of the ectomycorrhizal metabolome. The generated model of mycorrhizal metabolome predicts that the specific compounds glycine, glutamate, and allantoin are synthesized by L. bicolor and that these compounds or their metabolites may be used for the benefit of aspen in exchange for the photosynthetically-derived sugars fructose and glucose.
The analysis illustrates an approach to generate testable biological hypotheses to investigate the complex molecular interactions that drive ectomycorrhizal symbiosis. These models are consistent with experimental environmental data and provide insight into the molecular exchange processes for organisms in this complex ecosystem. The method used here for predicting metabolomic models of mycorrhizal systems from deep RNA sequencing data can be generalized and is broadly applicable to transcriptomic data derived from complex systems.
Within days of germination, 95% of the short roots of most conifers and deciduous trees form ectomycorrhizae (ECM) with soil fungi , a form of symbiosis between plants and fungi whose evolution dates back 360-410 million years . In ectomycorrhizae, the fungus forms a mycelial sheath around the plant's root, called the Hartig net that isolates the root from the soil and inhibits the development of short root hairs. Nutrients are exchanged between fungus and root across the apoplast, a zone that is outside both root and fungus, preventing direct contact between the fungus and plant cytoplasm and requiring that nutrients exchanged cross both fungal and plant cell walls. In ectomycorrhizae symbiosis, fungal and plant metabolisms are connected by a suite of transporters that shuttle essential nutrients across the apoplast from one organism to another. This union of the plant and fungal metabolisms, termed the mycorrhizal metabolome, provides greater environmental fitness to the partners in the symbiotic relationship than can be provided by either organism's metabolism alone. The mutualistic association provides the fungus with a source of photosynthetically derived carbohydrates . As much as 25% of the plant's net photosynthesis is used to support its fungal partner [3, 4]. The metabolic contributions of the fungus in return for those sugars are more diverse. The fungus increases the absorptive surface area of the root by the formation of an intense network of very thin hyphae in the soil, and is thereby able to explore and to access nutrients from a greater volume of soil than could be exploited by the plant's roots alone . Ectomycorrhizae fungi also contribute to tree nutrition by the mobilization of nutrients from organic material in the soil  and through mineral weathering . The network of hyphae in the soil is effective in taking up organic and inorganic nutrient resources such as phosphorus, nitrogen, zinc, copper and provides these nutrients to the host plant [8, 9]. The symbiotic fungus also provides a higher tolerance against abiotic and biotic stresses such as drought, toxic heavy metal concentrations and protection from pathogens [5, 10] and buffers the plant against sudden changes in its environment [11, 12].
An experimental model for the mycorrhizal system uses the ectomycorrhizae fungus Laccaria bicolor and tree species Populus tremuloides (aspen). Not only do these organisms readily form mycorrhizae in the laboratory, but also the L. bicolor and the closely related species Populus trichocarpa genomes have been sequenced and annotated through the efforts of the Department of Energy and Joint Genome Institute's [13, 14]. We used next generation sequencing (NGS) of the fully formed mycorrhizal transcriptome to construct a model of the ectomycorrhizal metabolome. The model is comprised of those expressed ectomycorrhizal genes that code for proteins that are either enzymes participating in metabolism or transporters capable of conveying nutrients from one organism to another. Previous studies reporting on differential expression [15–19] in mycorrhizal systems identify some aspects of regulation and metabolism but do not provide a comprehensive representation of the ectomycorrhizal metabolome. Although differential expression can highlight where a transcriptome differs in one biological condition to another, a full understanding of metabolism does not require that an associated gene or enzyme be differentially expressed to play a key role in a metabolic network, only that it be expressed. These previous studies have focused on discrete elements derived from either aspen or L. bicolor transcription profiles. Our approach simultaneously evaluates the transcript profiles of aspen and L. bicolor to identify the ways which fungal and plant metabolism merge to form this ecologically important symbiotic relationship. In the analysis of the predicted model of the ectomycorrhizal metabolome, we identify those regions in the model where aspen and L. bicolor expressed enzymatic activities that its partner does not. Where those unique metabolic contributions overlap with statistically significant, enriched expressed gene models for transporters, we find the likely nutrients that are exchanged between aspen and L. bicolor in the ectomycorrhizal interaction.
Ectomycorrhizae were established in Woody Plant Media (WPM) as an approach for identification of metabolic pathways unique to L. bicolor or aspen. This plant growth matrix is comprised only of nutrient salts and is devoid of any complex sources of organic nitrogen or phosphorus. Observations or inference for metabolism of complex molecules for ectomycorrhizae grown in this medium must therefore be attributable to the synthetic abilities of L. bicolor and aspen and would not be attributable to the symbiont catabolism of any compounds present in the media. L. bicolor and aspen roots were allowed to form fully mature ectomycorrhizae in culture, then were harvested and RNA extracted. The harvested mycorrhizal RNA is a mixture of transcripts from both L. bicolor and aspen. Using Illumina NGS technology and 'BowStrap' a tool for identifying statistically significant levels of gene expression, we sequenced the mycorrhizal transcriptome and identified the gene models that are detected as statistically significantly expressed at p < 0.001 in both biological replicates (Table S1). We make the assumption that if a gene is detected as expressed, then the protein encoded by that gene is also expressed. The predicted ectomycorrhizal metabolome was constructed by identification of enzymes and transporters associated with cellular metabolic compounds. Expressed genes that code for annotated enzymes were mapped to Kyoto Encyclopedia of Genes and Genomes (KEGG) metabolic pathways and expressed transporter proteins for sugars, phosphorous, and nitrogen containing compounds were identified and mapped to specific metabolites.
In the ectomycorrhizal interaction, the individual enzymes and transporters of plant and fungal metabolism remain largely compartmentalized, prevented from directly interacting by the apoplastic space. To exchange nutrients, plant and fungus must have the capacity for importing and exporting nutrients across their cell walls. For the ectomycorrhizal metabolome, nitrogen, phosphorous and carbohydrate compounds are essential elements of nutrient exchange and Gene Ontology (GO) annotations  were used as a systematic approach to identify expressed genes in the mycorrhizal transcriptome annotated as members of these transporter classes. GO annotations for predicted proteins coded by best gene models from sequenced genomes were generated by automated BLAST homology to proteins of previously identified function . These GO annotations are corroborated by individual publications on specific transporters for amino acids , ammonium [21, 22], sugar , and inorganic phosphate  which have been identified as encoded in the genome.
Expressed mycorrhizal transporters identified by their Gene Ontology annotation in either L. bicolor or aspen
Amino acid transport
Ammonium transporter activity
Sugar porter activity
Inorganic phosphate transporter activity
The lack of enrichment for ammonium transporter genes suggests that ammonium is not one of the primary mediums of interaction between aspen and L. bicolor in spite of the fact that ammonium salts are present in the growth medium. Amino acid transport, enriched in both aspen and L. bicolor suggests that amino acids are a major metabolite that is shared during ECM interaction. A number of nitrogen containing compound such as ammonia and amino acids have been proposed as candidate compounds for nitrogen exchange in ECM systems [26–28]. The correlation of the present model with amino acid transport is consistent with experimental data for arbuscular fungal and plant systems . The transported amino acids are likely originated from the action of fungal enzymes as the growth medium lacks amino acids. Aspen transcriptome is not enriched for sugar transporters, but aspen does not require them. Sucrose molecules enter the apoplast from aspen roots via diffusion, where it is hydrolyzed into hexose sugars by a sugar invertase produced by aspen . The glucose and fructose products of the invertase reaction are substrates for specific fungal hexose transporters . The observed enrichment of sugar transporters in the L. bicolor transcriptome is consistent with the experimental data demonstrating transfer of carbohydrates across the apoplast interface in ectomycorrhizal roots of aspen . The hexose transporter subset of enriched sugar transporters identified in this analysis overlaps with an expression profiling and characterization analysis of hexose transporters proteins encoded in the genome of L. bicolor .
(An interactive version of Figure 2, allowing a user to zoom in on any of the 145 metabolic sub-networks of map01100, is available in Supplementary Data, Figure S1-a http://www.bio.anl.gov/molecular_and_systems_biology/MycorMetabolome/Supplementalfigures.htm.
The utility of the derived metabolic network (Figure 1) as a model for metabolism during mycorrhizal interaction was evaluated by analysis expressed sequences for core metabolic functions. The ability of the sequence data to fully represent expression of core metabolic functions would indicate the suitability of the transcriptomic data to investigate all predicted metabolic pathways. The approach and results are largely consistent with a previous transcript profiling study that delineated the major pathways of carbohydrate metabolism in L. bicolor . To determine if genes that are expected to be expressed in aspen and L. bicolor are detected as expressed, we consider enrichment of core metabolism KEGG maps in the observed transcriptomics data. Expressed enzyme activities in the KEGG pathways for the core metabolic pathways pyruvate metabolism, glycolysis/gluconeogenesis (map00010), citrate cycle (map000230), Fatty acid metabolism (map00071), purine metabolism (map00230), and pyrimidine metabolism (map00240) are significantly enriched (p-value less than 0.0005) in both aspen and L. bicolor transcriptomes.
Metabolic pathways significantly enriched for enzyme activities expressed in both aspen and L. bicolor
KEGG Pathway (map#)
% Shared annotations
Amino Acid Metabolism
Alanine and aspartate metabolism (00252)
Arginine and proline metabolism (00330)
Glutamate metabolism (00251)
Valine, leucine and isoleucine degradation (00280)
Biosynthesis of Polyketides and Terpenoids
Diterpenoid biosynthesis (00904)
Butanoate metabolism (00650)
Citrate cycle (TCA cycle) (00020)
Pentose phosphate pathway (00030)
Starch and sucrose metabolism (00500)
Oxidative phosphorylation (00190)
Ether lipid metabolism (00565)
Fatty acid metabolism (00071)
Glycerolipid metabolism (00561)
Glycerophospholipid metabolism (00564)
Sphingolipid metabolism (00600)
Purine metabolism (00230)
Pyrimidine metabolism (00240)
Calcium signaling pathway (04020)
Phosphatidylinositol signaling system (04070)
Aminoacyl-tRNA biosynthesis (00970)
Expressed mycorrhizal EC activities unique to L. bicolor mapped to enriched KEGG pathways
KEGG Pathway (map#)
Amino Acid Metabolism
Arginine and proline metabolism (00330)
Aminosugars metabolism (00530)
Another set of unique activities in L. bicolor is the capability for the synthesis and utilization of allantoin via ureidoglycolate or urea (illustrated in Additional File 1,figure S1-b and S1-c). This capability is represented by the uric acid oxidase, allantoinase, allantoicase, and urease proteins encoded in the L. bicolor genome and expressed in the mycorrhizal metabolome. The assigned enzyme activities and annotations are supported by sequence alignments to conserved domain database profiles (CDD of NCBI; http://www.ncbi.nlm.nih.gov/Structure/cdd/cdd.shtml), for these enzymes . The L. bicolor urate oxidase protein sequence (EC 18.104.22.168, Protein ID 190858) generates a position-specific scoring matrix (PSSM) alignment to the conserved domain for urate oxidase (d00445) with and E-value of 2.03e-61. The L. bicolor protein sequences for allantoinase (EC22.214.171.124), allantoicase (EC126.96.36.199), and urease (EC188.8.131.52) generate similar PSSM alignments consistent with the assigned activities (allantoinase,TIGR03178, E-value of 8.08e-121; allantoicase, TIGR02961, E-value of 4.71e-99; and urease TIGR01792, E-value of 0e+00. These observations suggest that the main nutrients L. bicolor exchanges with aspen roots for sugars are nitrogen-based compounds more complex than the ammonium salts already present in the media [26, 27].
Expressed mycorrhizal EC activities unique to aspen, mapped to enriched KEGG pathways
KEGG Pathway (map#)
Amino Acid Metabolism
Methionine metabolism (00271)
Phenylalanine metabolism (00360)
Arginine and proline metabolism (00330)
Pyruvate metabolism (00620)
Starch and sucrose metabolism (00500)
Flavonoid biosynthesis (00941)
Carbon fixation in photosynthetic organisms (00710)
Phenylpropanoid biosynthesis (00940)
Biosynthesis of steroids (00100)
Fatty acid biosynthesis (00061)
Xenobiotics Biodegradation and Metabolism
DDT degradation (00351)
Transcriptome data mapped at the level of KEGG pathways enabled identification of unique metabolic capabilities contributed by aspen and L. bicolor to the predicted mycorrhizal metabolome. To identify a priori the specific molecular mechanisms of those contributions it is necessary to consider the mycorrhizal metabolome model as a complete network of compartmentalized metabolisms interacting by transported metabolites through expressed membrane transporters. The process is enabled by identification of regions in the model represented in Figure 1 that exhibit unique interconnecting enzyme activities. Particular emphasis is assigned to those regions of unique activities that are also associated with compounds predicted to be transported by expressed, enriched amino acid transporters or sugar porters. Those identified regions underscore the most likely metabolic compounds that are synthesized by one of the ECM partners and transported across the apoplast to the partner organism. Several regions are identified using this strategy and lead to the identification of five compounds; fructose, sucrose, glycine, glutamate, and allantoin (highlighted in pink in Figure 1 and in interactive Additional File 1, Figure S1). These compounds meet the three criteria imposed for analysis of the predicted mycorrhizal metabolome; 1) the activities are uniquely expressed in one of the organisms, 2) the synthesized compound can be matched to appropriate exchange transporters, and 3) there is a reciprocal pathway in the partner organisms to enable utilization of the synthesized compound.
In ectomycorrhizal symbiosis, sucrose is released into the apoplast where it may be hydrolyzed into hexose sugars by a plant-provided acid hydrolase, EC184.108.40.206 [30, 39]. Five out of the eight aspen genes models annotated with the activity EC220.127.116.11 are significantly expressed in mycorrhizae (illustrated in Additional File 1, Figure S1-d and S1-e). However, no gene in L. bicolor is annotated with this activity indicating the fungus is dependent upon aspen's ability to cleave sucrose into hexoses that can be imported by L. bicolor. This hypothesis from the predicted mycorrhizal metabolome is consistent with accepted models for sugar transfer and with experimental observations . The genome of L. bicolor encodes several hexose importer genes  which are expressed in mycorrhizae (Table 1).
In spite of the presence of abundant ammonium salts in the growth media, we observe that L. bicolor is expressing enzymes for synthesizing nitrogenous compounds and membrane transporters for their export to aspen. Aspen expresses the transporters necessary for uptake and metabolism of those amino acids. The ability to absorb amino acids through the roots appears ubiquitous in plant species [27, 40, 41]. Glutamic acid in particular has been demonstrated to be significantly absorbed through the root systems  and glycine uptake has been observed in trees [41, 43, 44]. Unique enzymatic transformations expressed in L. bicolor generate L-glutamate from ammonium via EC 18.104.22.168, glutamic dehydrogenase. Aspen expressed a unique enzymatic pathway that converts L-glutamte to succinyl-CoA (Additional File 1, Figure S1). Aspen expresses a unique pathway from glycine to glycolysis/glucogenesis (illustrated in Additional File 1, Figure S1-f and S1-g).
Another set of unique activities in L. bicolor is the capability for the synthesis and utilization of allantoin via ureidoglycolate or urea (illustrated in Additional File 1, Figures S1-b and S1-c). Allantoin is used by plants, bacteria, and some fungi as a nitrogen and carbon source  and is a candidate metabolic product that is useful to aspen. The expression of these enzymes in L. bicolor indicates compounds derived from this pathway may supplement the ability of the fungus to supply nitrogen to the plant. Examples of plants that are able to import allantoin through their roots and the ability of some plants to use allantoin as sole nitrogen source have been previously reported [46, 47]. There is evidence that allantoin transporters are expressed in mycorrhizae. L. bicolor possesses at least one gene annotated as an allantoin permease, 248955, which is highly homologous to yeast protein DAL4, allantoin permease (1293888) with a BLASTp eValue 2e-88. Aspen also has at least one allantoin uptake transporter gene, 421134, homologous to Arabidopsis allantoin uptake ATUPS5 (NM_202186.2) with a BLASTp eValue of 8e-166. The current predicted metabolome model does not indicate the utilization of allantoin by aspen roots. This is, however, the result that is expected. Allantoin has been previously identified as one of the primary nitrogenous compounds exported from soybean root nodules to be metabolized in shoots [48, 49]. Assuming that allantoin is used similarly in aspen, that the metabolic pathways for immediate utilization of allantoin should not be observed as expressed in the root, but should be detected in transcriptomic analysis of aspen shoots..
We have generated a predicted model of L. bicolor and aspen root mycorrhizal metabolome using transcriptomic data. The mycorrhizal metabolome is comprised of the expressed metabolic enzymes in the mycorrhizal transcriptome and the transporters required for the exchange of metabolic compounds. In mycorrhizal symbiosis, aspen exchanges with L. bicolor photosynthetic sugars for nutrients. This expectation is validated by the statistically enriched expression of sugar porters by L. bicolor. The enrichment of KEGG amino acid metabolism pathways with unique expressed enzyme activities and the enrichment for expressed amino acid transporters for both aspen and L. bicolor indicate that, for mycorrhizae formed in WPM, L. bicolor's debt to aspen for carbon is paid with organic nitrogen. L. bicolor expresses the metabolic capacity to synthesize nitrogenous compounds such as glycine, glutamate and allantoin, via pathways not expressed in aspen roots. In the growth conditions used here, the predicted exchange compounds are the fructose and glucose as well as organic nitrogen compounds, specifically glycine, glutamate and perhaps allantoin. The predictions suggest L. bicolor is an active metabolic partner in addition to passively extending the absorptive surface of aspen roots. This role encompasses uptake of ammonium from the medium and synthesis of more complex compounds provided to the plant. Additional experiments in different nutrient environments are expected to uncover additional mechanisms of mycorrhizal metabolic interactions.
The model we generated confirms prior biological knowledge and predicts previously unobserved mechanisms of mycorrhizal interaction. To generate the model required transcriptomic analysis in addition to knowledge of aspen and L. bicolor genomic annotations as it is necessary to know not only the complete metabolic capacity of an organism, but what fraction of that capacity is being actively transcribed under specific biological conditions or developmental state. These predictions are suitable for the design of future, hypothesis driven biological experiments. A deepening understanding of the molecular components of mycorrhizal interactions, an important component of terrestrial forest ecosystems, will have applications in global carbon management and sustainability. The approach applied to this ectomycorrhizal system can also be generalized to additional symbiotic metabolomic systems. This method depends on accurate annotation of genomes and a complete picture of metabolism from KEGG. In the JGI gene model annotations for L. bicolor version 1, nearly 60% of the predicted gene models do have annotation and 30% do not share homology to any previously identified protein. As the quality of gene model annotation continues to improve and as more complete metabolic pathways are published, generation of future metabolomic models will become more and more accurate. Our available and interactive model of the mycorrhizal metabolome will serve as an important resource for other investigators.
Mycorrhizal aspen and L. bicolor were grown in biological duplicates. L. bicolor (Maire) Orton (strain S238N) culture was maintained on Modified Melin Norkan's (MMN) media as described  at 20°C. Populus tremuloides seeds were surface sterilized and germinated on McCown's woody plant media (WPM) in Petri dishes as described . One week old germinated seedlings were transferred to Magenta vessels (Sigma, St. Louis, MO) containing the interaction medium (WPM with 1.5% of sucrose). The seedlings were grown under 16 hr light and 8 hr dark cycles at 24°C for 4-5 weeks until fine lateral roots were developed. L. bicolor mycelial plugs were transferred to Magenta vessels with aspen plants for ectomycorrhizae formation. Mycorrhizal roots were visualized approximately after 6 weeks. Mycorrhizal roots were collected, snap frozen in liquid nitrogen and stored at -80°C. Total RNA was extracted from mycorrhizal samples by CTAB method  and RNA quality was assessed by gel electrophoresis prior to library preparation. Total RNA was treated with RQ-DNase (Promega, Madison, WI) to remove any traces of DNA. The transcriptomic sequence was derived from the two independent biological samples but the data sets were combined for this analysis. Analysis of the independent set indicated 94% of aspen reads detected as expressed in one replicate were detected in the other. Significance of overlap for aspen is a p-value less than 1e-256. 95% of L. bicolor reads detected as expressed in one replicate were detected in the other. Significance for overlap for L. bicolor is a p-value less than 1e-256.
Procedures described for preparation of mRNA for the mouse transcriptome analysis  were used with some modifications. Ten ug of total RNA from each sample was hybridized to Sera-mag oligo (dT) beads (Thermo Scientific) for mRNA purification. Purified mRNA was fragmented by addition of 5X fragmentation buffer (Illumina, Hayward, CA) and was heated for 5 min at 94° C in a thermocycler. First strand cDNA was synthesized using random primers to eliminate the general bias towards 3' end of the transcript. Second strand cDNA synthesis was done by adding GEX second strand buffer (Illumina, Hayward, CA), dNTPs, RNaseH and DNA Polymerase I followed by incubation for 2.5 h at 16° C. Second strand cDNA was further subjected to end repair, A-tailing, and adapter ligation in accordance with the manufacturer supplied protocols. Purified cDNA templates were enriched by PCR amplification with Phusion DNA polymerase (Illumina, Hayward, CA) and the samples were cleaned using QIAquick PCR purification columns and eluted in 30 μl EB buffer as per manufacturer's instructions (QIAGEN, CA). Purified cDNA libraries were quantified using Nanodrop spectrophotometer and loaded onto Illumina flow cells with one of the flow cells reserved for a recurrent internal standard as control for sequencing efficiency. Over 25 million sequence reads were generated for analysis.
Where μ is the average and σ is the standard deviation of a re-sampled gene model's RPKM. As this equation will fail to return a value if the average or standard deviation is equal to 0, if the μ is equal to 0, then the CND pValue is set equal to 1, else if σ is equal to 0, then the CND pValue is set equal to 0. A CND pValue close to zero indicates statistically significant levels of detected gene expression.
In our analysis, all sequence reads were trimmed to 46 bp. Optimal read length for expression analysis is impacted by system and sequence characteristics  but the 46 bp read length was empirically determined to yield good sequence alignments to L. bicolor and aspen gene models. Bowtie indexes were generated from sets of published JGI gene models for L. bicolor and aspen. The default Bowtie conditions were used to generate alignments for all sets of sequence reads to gene models, except for setting Bowtie to return all possible sequence alignments. We used 1000 iterations in 'BowStrap' for the calculation of average and standard deviations of RPKM values. The source code for a PERL implementation of 'BowStrap' is available at [http://www.bio.anl.gov/molecular_and_systems_biology/bowstrap/Bowstrap_download.html]. 'Bowtie' and 'BowStrap' output files are available at the Gene Expression Omnibus (GEO) as series record GSE28157 (http://www.ncbi.nlm.nih.gov/geo/ ).
Analysis of alignments generated with Bowtie indicated a total of 53% of the total sequence reads aligned to L. bicolor or aspen gene models. 55.2% of those aligned reads were to L. bicolor gene models and 45.8% to aspen gene models. A small fraction of the total aligned reads, 0.49%, aligned to both L. bicolor and aspen gene models. Of the entire set of aligned reads, 82% aligned to a single location with the balance aligned to two or more possible locations in the set of gene models. Alignment of reads to the genome scaffolds indicate 57% of the sequence reads aligned to L. bicolor or aspen genomic sequence. We observed that 3.2% of sequences aligned to gene models but not to genomic sequence indicating that these sequences align across gene model splice junctions. A fairly high fraction (11.1%) of sequence reads align to genomic sequence but not to gene models. These aligned sequences may indicate the presence of genes that are expressed but have not been identified in the set of gene models or errors in the published gene models. This is consistent with a recently reannotation of the L. bicolor genome which increased the number of gene models by ~6% (JGI gene portal). The revised annotation may also account for a fraction of the 36.1% of sequences that align to neither gene models nor genomic sequence. Some of these unaligned reads may come from errors or deviations in the assembled L. bicolor and aspen genomic sequences. Some fraction of the unaligned sequence reads likely derive from erroneously called or multiple splice sites in the set of gene models. Previous analysis of deep RNA sequencing in L. bicolor have identified that up to 60% of L. bicolor gene models contain at least one deviation from expressed gene sequences . The last likely source of un-aligned sequence reads is sequencing errors and the error rates inherent in Illumina sequencing have been previously well characterized [57–59]. A single insertion or deletion in a sequence read will render it un-alignable under the 'Bowtie' criteria used here. More than two substitution errors in a sequence read will also prevent it from finding an alignment. aspen The observed percentage of reads aligned at a high stringency to the gene models is comparable to similar transcriptomic experiments [60–63]. To generate a measurement of the specificity of reads that were found to align to gene models, we generated a synthetic set of reads that were the reverse, but not the reverse compliment, of all observed sequences reads (entire dataset). This synthetic set of reads retains the frequency and relative distribution of nucleotides from the observed data. Aligning the synthetic read set to gene models identifies a total of 28 (0.0001%) aligned reads. Inspection of the aligned synthetic reads indicated that they are comprised of 2 or 3-mer repeats. Alignment with the synthetic read set demonstrates that observed alignments are highly specific and miss-alignments are not expected.
In our transcriptomic data, at a CND-pValue < 0.001, for aspen there are 27318 expressed gene models out of a total possible 45555 (60.0%) gene models. For L. bicolor, there are 12856 significantly expressed gene models out of 26014 (62.4%).
Annotation to gene model predicted proteins was provided by JGI publicly available downloads using the following files: "Lacc.ecAnnot.txt" for (EC) annotation for L. bicolor best gene models, "Lbicolor_GO.txt" for Gene Ontology (GO) annotation for L. bicolor best gene models, "mntPop.ec.Annotation.txt" for Enzyme Commission (EC) annotation for aspen gene models, and "Poptr1_1.goinfo.tab" for Gene Ontology (GO) annotation for aspen gene models. The genome annotation resources used in this study were derived by a combination of automated and manual approaches that are described in the Additional material associated with the genome sequence publications.
Using a strategy employed by other investigators [64–66], we used gene models from the JGI sequenced and closely related P. trichocarpa as surrogates for the P. tremulous genes. There are 12,813 published P. tremuloides ESTs in NCBI with an average length of 472 bp, for which the longest sequence is 848 bp and the shortest sequence is 101bp. The limited number of available EST sequences does not represent a complete catalogue of P. tremuloides genes and the average sequence length limits functional prediction approaches. To verify that this is a reasonable substitution, we BLAST the P. tremuloides ESTs against the JGI predicted best gene models for P. trichocarpa. 95% of P. tremuloides ESTs hit JGI P. trichocarpa gene models at an e-value threshold of 1e-6. The average percent identity is 93% with a minimum observed 73% identity. 422 (3.3%) P. tremuloides ESTs share 100% identity with P. trichocarpa gene models. While the species are not identical and the possibility of presence of unique genes in one species cannot be completely dismissed, the level of similarity observed is suitable for a system-scale analysis of the P. tremuloides transcriptome.
10% of L. bicolor gene models have EC number annotations and 37% have GO Molecular Function annotation. 13% of aspen gene models have EC number annotations and 41% have GO Molecular Function annotations. Gene models with GO annotations amino acid transport (GO:0006865), Ammonium transporter activity (GO:0008519), Sugar porter activity (GO:0005351), and Inorganic phosphate transporter activity (GO:0005315) are the relevant transporters in the mycorrhizal metabolome.
In this manuscript, we use the term "enzyme function" to describe a specific annotation applied to an enzyme, i.e. "Phosphotransferases with an alcohol group as acceptor". We use "enzyme reactions" to refer to metabolite transformations catalyzed by an enzyme function, i.e "ATP + D-Glycerate ↔ ADP + 3-Phospho-D-glycerate". A unique enzyme function may catalyze more than one enzyme reaction and an enzyme reaction may be catalyzed by more than one unique enzyme function. A metabolite is a molecular compound that is a reactant or product in an enzyme reaction. A network, constructed from annotated unique enzyme functions, enzyme reactions and inferred metabolites, is used to generate the predicted mycorrhizal metabolome. The set of metabolic reactions in KEGG databases was used to represent the set of all possible enzyme reactions in the mycorrhizal metabolome and Enzyme Commission (EC) number annotations for enzyme function for genes models were used to assign unique enzyme functions to the predicted proteins encoded in the mycorrhizal transcriptome. A unique enzyme reaction was included in the mycorrhizal metabolome if a gene annotated to an enzyme function that catalyzed that reaction was detected as significantly expressed in the mycorrhizal transcriptome. All metabolites associated with included enzyme reactions were included in the predicted mycorrhizal metabolome.
Annotation of proteins encoded by predicted gene models were obtained from publically available data resource at the Joint Genome Institute (http://www.jgi.doe.gov/) ftp archive. Lists of reactions in the form of reactants, products, and mediating EC enzyme activities were acquired from KEGG (ftp://ftp.genome.jp/pub/kegg/pathway/). Sets of enzyme activities associated with individual KEGG pathways were also collected from the KEGG website. The KEGG database of metabolic interactions includes 1329 organism-specific metabolomes, including 14 complete plant and 51 fungal metabolomes. Of the 378 pathways in KEGG, 27 are plant-specific pathways. The set of plant and fungal metabolomes includes those specific to P. trichocarpa and L. bicolor. KEGG includes over 8000 individual metabolic reactions, between nearly 15 thousand metabolites and mediated by over 2500 specific EC activities. Every gene model in aspen or L. bicolor annotated with a specific EC activity was mapped to KEGG enzymatic reactions as described below.
Where x is equal to the total number of successes in a sample, n is equal to the sample size, p is equal to the probability of success of each sample, and Annot is a specific annotation. For the application here, the number of successes x is equal to the number of times a particular annotation Annot is present in a subset of gene models and the sample size n is the number of gene models in the subset. The probability p is the proportion of gene models in the superset attributed with annotation Annot. A CBD pValue close to zero indicates an enrichment of that annotation in the subset while a CBD pValue close to one indicates a depletion of that annotation relative to the distribution of the annotation in a larger set.
Significant enrichment of specific annotations in the sets of expressed mycorrhizal genes, relative to the total number of those specific annotations in the annotated genomes and aspen and L. bicolor, was calculated. For the calculation, the number of successes is equal to the number of genes with a specific annotation in the set of expressed gene models. The sample size is the total number of expressed gene models. The probability of success is the frequency of that specific annotation in the complete genome. Specific annotations considered were gene models with an EC activity, sugar transporters, nitrogen transporters, and inorganic phosphate transporters.
KEGG pathways were identified if they were enriched for expressed EC activities in common between the sets of aspen and L. bicolor expressed gene models. For each KEGG pathway, the number of successes is equal to the number of EC activities in the set of expressed aspen and L. bicolor gene models that map to that pathway. The number of trial is the total number of EC activities expressed in mycorrhizae by aspen and L. bicolor. The probability of success is equal to the number of EC activities associated with that KEGG pathway divided by the total number of EC activities expressed by aspen and L. bicolor. Specific KEGG pathways were also identified if they were statistically significantly enriched for mapped genes with EC activities unique to expression in either aspen or L. bicolor. In this analysis, for each KEGG pathway, the number of successes is equal to the number of gene models in the set of expressed aspen or L. bicolor gene models that map to that pathway. The number of samples is equal to the total number of expressed gene models in mycorrhizae by aspen or L. bicolor. The probability of success is equal to the number of aspen or L. bicolor gene models that map to that KEGG pathway divided by the total number of expressed gene models by aspen or L. bicolor. Enriched KEGG pathways were reported at a significance of p < 0.01, and if at least 25% of the genes or activities associated with that KEGG pathway were also in the complete genomic set of aspen or L. bicolor gene models.
The complete set of KEGG pathway EC activities was collected from ftp://ftp.genome.jp/pub/kegg/pathway/ec/. Identifying expressed activities and transported small molecules in KEGG global metabolism was performed using the KEGG PATHWAY Query tool.
Next Generation Sequencing
Joint Genome Institute
Kyoto Encyclopedia of Genes and Genomes
Woody Plant Media
Reads Per Killobase per M RNAseq reads
Cumulative Binomial Distribution
Cumulative Normal Distribution.
The submitted manuscript has been created by UChicago Argonne, LLC, Operator of Argonne National Laboratory ("Argonne"). Argonne, a U.S. Department of Energy Office of Science laboratory, is operated under Contract No. DE-AC02-06CH11357. The U.S. Government retains for itself, and others acting on its behalf, a paid-up nonexclusive, irrevocable worldwide license in said article to reproduce, prepare derivative works, distribute copies to the public, and perform publicly and display publicly, by or on behalf of the Government.
This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.