- Research article
- Open Access
Rapid construction of metabolic models for a family of Cyanobacteria using a multiple source annotation workflow
© Mueller et al.; licensee BioMed Central Ltd. 2013
- Received: 25 September 2013
- Accepted: 19 December 2013
- Published: 27 December 2013
Cyanobacteria are photoautotrophic prokaryotes that exhibit robust growth under diverse environmental conditions with minimal nutritional requirements. They can use solar energy to convert CO2, as well as reduced carbon sources, into biofuels and chemical products. The genus Cyanothece includes unicellular nitrogen-fixing cyanobacteria that have been shown to offer high levels of hydrogen production and nitrogen fixation. The reconstruction of high-quality genome-scale metabolic models for organisms with limited annotation resources remains a challenging task.
Here we reconstruct, analyze, and compare the metabolism of five Cyanothece strains, namely Cyanothece sp. PCC 7424, 7425, 7822, 8801 and 8802, as the genome-scale metabolic reconstructions iCyc792, iCyn731, iCyj826, iCyp752, and iCyh755, respectively. We compare these phylogenetically related Cyanothece strains to assess their bio-production potential. A systematic workflow is introduced for integrating and prioritizing annotation information from the Universal Protein Resource (Uniprot), NCBI Protein Clusters, and the Rapid Annotations using Subsystems Technology (RAST) method. The genome-scale metabolic models include fully traced photosynthesis reactions and respiratory chains, as well as balanced reactions and gene-protein-reaction (GPR) associations. Metabolic differences between the organisms are highlighted, such as the non-fermentative pathway for alcohol production found only in Cyanothece 7424, 8801, and 8802.
Our development workflow provides a path for constructing models using information from curated models of related organisms and reviewed gene annotations. This effort lays the foundation for the expedient construction of curated metabolic models for organisms that, while not being the target of comprehensive research, have a sequenced genome and are related to an organism with a curated metabolic model. Organism-specific models, such as the five presented in this paper, can be used to identify optimal genetic manipulations for targeted metabolite overproduction as well as to investigate the biology of diverse organisms.
- Semi-automated metabolic network reconstruction
- Metabolic modeling
- Genome-scale metabolic model
Genome-scale models (GSMs) are the collection of gene-protein-reaction associations (GPRs), charge and elementally balanced reactions, and constraints on molecular functions found within a cell [1–4]. The constraints placed on molecular function define the possible phenotypes of an organism under specific conditions. There are a number of applications for GSMs beyond the prediction of wild-type phenotypes in varying environments. These include the identification of optimal gene and medium modifications, non-native routes for metabolite production, and lethal gene deletions [5–9]. A genome-scale model of Cyanothece ATCC 51142, iCyt773, was recently published. It contains four compartments, with 811 metabolites and 946 charge and elementally balanced reactions. iCyt773 is an improvement upon the previously published iCce806 model and contains 43 genes and 266 reactions not present in iCce806. A further comparison of the two models is given by Saha et al. The iCyt773 model also captures the diurnal rhythm of Cyanothece metabolism. Since Cyanothece ATCC 51142 is closely related to all five Cyanothece species discussed in this paper, it was used in the development of the reconstructions for five organisms, Cyanothece PCC 7424, 7425, 7822, 8801, and 8802, as iCyc792, iCyn731, iCyj826, iCyp752, and iCyh755, respectively (all five models are included in Additional files 1 and 2). All models were named using their associated KEGG organism code. The objective of this study is to expediently generate models for a collection of members of a genus, using as a foundation an existing high-quality metabolic model for a representative member of the genus, while integrating information from a range of available sources.
The genus Cyanothece belongs to the phylum Cyanobacteria. Cyanobacteria have a number of properties that make them ideal candidates for bio-production. Photosynthetic cyanobacteria bypass the need for sugar carbon substrates while having higher solar energy conversion efficiencies (3–9%) than C3 (2.4%) and C4 plants (3.7%). Cyanothece strains not only generate hydrogen [12, 14–16] but also fix atmospheric nitrogen by temporally segregating photosynthesis and nitrogenase activity [17, 18]. In addition, Cyanothece have the potential to grow in air and can be easily fixed to solid matrices. All five species discussed in this paper are capable of fixing nitrogen and producing hydrogen, although Cyanothece PCC 7425 is the only one that cannot do so in an aerobic environment. Cyanothece PCC 7425 also differs in a number of physical characteristics, enough that it has been suggested it be reclassified into another genus pending further review.
Cyanothece PCC 7424, 7425, 7822, 8801, and 8802 were all sequenced following the promising discoveries made concerning the metabolic capabilities of Cyanothece ATCC 51142. These five species exhibit unique metabolic characteristics that motivated the development of five separate reconstructions. Fragments of a butanol-producing pathway have been postulated to exist in all strains based on an inspection of the Cyanothece genomes. Metabolic capabilities such as the alkane biosynthetic pathway and alternative pathways for breaking down arginine across species have been hypothesized to exist as well. Given differences in metabolism, developed genetic systems, and variations in growth characteristics, phenotype, and rhythms of nitrogen fixation and respiration, it is important to assess the genome-wide metabolic repertoire of each strain separately.
There exist numerous databases devoted to gene annotations for a wide variety of organisms [24–27]. However, the number of gene annotations is skewed towards a handful of extensively studied organisms. Escherichia coli K-12, the strain modeled in the iAF1260 metabolic reconstruction, has approximately 16 times as many reviewed annotations in the Universal Protein Resource (Uniprot) as Cyanothece PCC 7424 (4,326 versus 271). For most (microbial) organisms, Uniprot contains only a small subset of the required gene annotations (typically 200–300). Faced with this paucity of organism-specific gene annotation information, most metabolic reconstructions rely on a single database (typically KEGG) from which to pull gene annotations [24, 29–31]. This may introduce errors into the reconstruction, as absent functionalities could be included in the model due to permissive homology cutoffs or errors in the original annotation source. In addition, specific and non-specific references to the same metabolite (e.g., D-Glucose vs. α-D-Glucose) and generic or unbalanced reactions may also affect the consistency of the reconstruction. Integrating and contrasting information from multiple databases can remedy many of these shortcomings.
A systematic workflow is put forth that addresses the aforementioned challenges. It allows for the parallel reconstruction of genome-scale models for organisms that have a sequenced genome and are closely related to a species with a curated genome-scale model. Using this workflow, reconstructions were developed for all five Cyanothece species using iCyt773 and reviewed annotations from Uniprot, NCBI Protein Clusters, and the Rapid Annotations using Subsystems Technology (RAST) method. These annotations were used to retrieve charge and elementally balanced reactions from both the iCyt773 model and the SEED database for the construction of draft models. No reconciliation between the iCyt773 and SEED reactions or metabolites was required, as iCyt773 was initially constructed using SEED notation when possible. The five models are all capable of producing biomass using the iCyt773 biomass equations under diverse nutrient conditions. All five models are free of thermodynamically infeasible cycles, and the fractions of reactions mapped to specific genes (i.e., GPRs) are within the range of manually curated reconstructions. The use of multiple annotation sources helps to mitigate errors that may arise from a single source. Unlike automated draft models (e.g., those from Model SEED), these models include organism-specific metabolites such as pigments in the biomass equation and fully trace the light reactions. This workflow is also more adept at excluding metabolites present in related species but absent in the reconstructed organism. For example, menaquinone and ubiquinone are known to be absent from Cyanothece, but are often pulled into draft models generated by automated software.
Statistics for the five developed models. For each strain and its reconstruction, the numbers of genes, reactions, and metabolites are listed, along with the reactions unique to that reconstruction.
Model validation using published findings
The effect of a gene knockout on an organism’s phenotype is frequently used to assess GSM quality [10, 38]. However, unlike Synechocystis PCC 6803, which has the CyanoMutants database [39, 40], none of the five species has a detailed repository of known mutants. The ΔnifK mutant of Cyanothece 7822 was shown to be unable to grow in the absence of combined nitrogen (nitrate). This finding implies the critical involvement of nifK in nitrogen fixation. In iCyj826, this gene is part of the GPR of the nitrogen fixation reaction. Because the GPR describes nifK as one of three essential subunits of the enzyme, its deletion prevents that reaction from carrying flux. Upon removal of nifK from iCyj826, the model is unable to grow without the addition of nitrate or ammonium, showing that the model responds to the knockout in the same manner as the organism does in vivo.
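The knockout reasoning above reduces to evaluating a Boolean GPR rule after removing a gene. The Python sketch below illustrates this; the nested-tuple GPR representation and the three-subunit nitrogenase rule (nifH, nifD, nifK) are assumptions for illustration, not the GPR actually encoded in iCyj826.

```python
from typing import Union

# A GPR is either a gene ID (str) or a tuple ("and" | "or", child, child, ...).
GPR = Union[str, tuple]


def gpr_active(rule: GPR, deleted: set) -> bool:
    """Return True if the reaction can still carry flux after deleting `deleted` genes."""
    if isinstance(rule, str):
        # A single gene is functional unless it was knocked out.
        return rule not in deleted
    op, *children = rule
    results = [gpr_active(child, deleted) for child in children]
    if op == "and":
        # Enzyme complex: every subunit is required.
        return all(results)
    if op == "or":
        # Isozymes: any one of them is sufficient.
        return any(results)
    raise ValueError(f"unknown GPR operator: {op!r}")


# Hypothetical nitrogenase rule with three essential subunits.
nitrogenase = ("and", "nifH", "nifD", "nifK")

print(gpr_active(nitrogenase, set()))     # True  (wild type)
print(gpr_active(nitrogenase, {"nifK"}))  # False (the knockout blocks the reaction)
```

In a full model, a reaction whose rule evaluates to False would have its flux bounds set to zero before biomass production is re-evaluated.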
Despite the many similarities between the five species, significant differences also exist. Genes that code for isocitrate lyase and malate synthase (the glyoxylate shunt) are present only in Cyanothece 7424 and 7822, as reflected in the models. 2-Oxoglutarate decarboxylase and succinic semialdehyde dehydrogenase, found in many cyanobacteria, complete the TCA cycle despite the absence of 2-oxoglutarate dehydrogenase. Both enzymes of this alternate pathway are present within iCyt773 and were transferred to all five models. The associated genes are also bidirectional best hits with the two genes in Synechococcus PCC 7002 that are associated with these enzymes. iCyn731, iCyp752, and iCyh755 all contain an alkane biosynthetic pathway similar to the one present within iCyt773. While iCyt773 contains the pathway that enables the production of pentadecane from hexadecenoyl-ACP, Schirmer et al. measured heptadecane but not pentadecane production in Cyanothece 7425. iCyn731 contains only the heptadecane-producing reactions, while iCyp752 and iCyh755 contain pathways for both pentadecane and heptadecane (no specific literature evidence either supporting or contradicting this was found). The two enzymes required, hexadecenoyl-ACP reductase and hexadecenal decarbonylase, have no corresponding annotations or orthologous genes in Cyanothece 7424 or 7822.
Polyhydroxyalkanoates (PHAs) are a complex family of polyesters that can be synthesized by a wide variety of bacteria. Cyanothece 7424, 7425, and 7822 all contain the enzymes β-ketothiolase and acetoacetyl-CoA reductase, which are necessary for the synthesis of polyhydroxyalkanoic acids [43–45]. There are RAST and unreviewed Uniprot annotations that identify genes within each of these three organisms associated with a PHA synthase. The non-fermentative pathway for higher alcohols exists in the 7424, 8801, and 8802 strains. The same pathway has been engineered into E. coli [46, 47] by adding the kivD gene from Lactococcus lactis and the adh2 gene from Saccharomyces cerevisiae. The pathway takes the 2-keto acid intermediates of amino acid biosynthesis and diverts them towards the synthesis of alcohols. The kivD gene codes for a 2-keto acid decarboxylase that acts on a wide range of substrates and converts the 2-keto acids into aldehydes. The workflow identified genes in Cyanothece 7424, 8801, and 8802 that are bidirectional best hits with the kivD gene from Lactococcus lactis and are also annotated with the same EC number as kivD. An alcohol dehydrogenase, such as adh2, then converts these aldehydes into alcohols. The adhA gene (slr1192) in Synechocystis PCC 6803 has been found to have a wide substrate specificity that includes the aldehydes associated with butanol and propanol. All five species contained a gene that was a bidirectional best hit with slr1192. While both the forward and reverse BLAST searches for Cyanothece 7425 had e-values on the order of 10⁻²⁸ and percent identities of 30%, the forward and reverse searches for the other four organisms had e-values between 10⁻¹⁵³ and 10⁻¹³⁸, with percent identities of 58 to 61%.
The presence of orthologs to both a 2-keto acid decarboxylase and alcohol dehydrogenase with wide ranges of specificity in Cyanothece 7424, 8801, and 8802 provides annotation evidence for the hypothesized presence of non-fermentative higher alcohol pathways .
Significant variations in nitrogen metabolism between the five species have been documented. Arginine decarboxylase is present in all five reconstructions, but differences arise in the subsequent agmatine catabolism. Cyanothece 51142 does not contain the genes for the conversion of agmatine to putrescine, and accordingly these reactions are absent from the iCyt773 model [10, 12]. Both iCyc792 and iCyj826 contain agmatinase and urease. The proposed pathway for agmatine breakdown into putrescine in Cyanothece 7425, 8801, and 8802 proceeds through N-carbamoylputrescine, and the two reactions required for this degradation are found in all three associated models. Finally, as predicted by Bandyopadhyay et al., iCyc792, iCyj826, iCyp752, and iCyh755 contain the reactions required to convert putrescine into spermidine and spermine.
Validation of proposed reconstruction workflow
The proposed workflow also completed pathways left unfinished in iCyt773. All five models are capable of converting galactose-1-phosphate to fructose-6-phosphate, as is iCyt773. Three of the models, iCyn731, iCyj826, and iCyh755, also include the reaction that converts galactose into galactose-1-phosphate, enabling them to feed galactose into glycolysis. Tetrahydrobiopterin (BH4) is a pteridine compound that acts as a cofactor for nitric oxide synthases and aromatic amino acid hydroxylases in higher animals. Pteridine glycosides have been found in cyanobacteria, although their function is still unknown, and the first isolated pteridine glycosyltransferase, from Synechococcus PCC 7942, acted on BH4. Even though iCyt773 does not contain the complete BH4 pathway as described by Thony et al., our workflow completed the pathway in all five species and identified a gene that is a bidirectional best hit with the glycosyltransferase gene in Synechococcus PCC 7942. The glycosyltransferase reaction was not included in the models, as it does not exist within the SEED reaction database. All enzymes that were retrieved from annotations but not included in the models, because no associated reaction exists in the subset of the SEED database used for model development, are listed in Additional file 3.
Reactions not transferred from iCyt773 offer insight into divergences between the metabolism of each new organism and that of the reference model. Two of the reactions that were not transferred from iCyt773 to the models for Cyanothece 7424 and 7822 are responsible for the conversion of hexadecenoyl- or octadecenoyl-ACP to n-pentadecane or n-heptadecane, respectively. As previously mentioned, the alkane biosynthetic pathway is accepted to be absent from these organisms. Another compound that is generally not found in the five species is xanthine, a purine base involved in the breakdown of purine ribonucleotides, such as inosine-5′-phosphate and xanthosine-5′-phosphate, into uric acid. iCyt773 can produce xanthine from either hypoxanthine or xanthosine, whereas iCyc792 only contains the reactions for its production from xanthosine and cannot break xanthine down into uric acid. iCyn731 only contains the reactions for production from hypoxanthine, but can convert xanthine into uric acid. The other three models do not contain any reactions involving xanthine and thus cannot process purine ribonucleotides through this pathway. Six reactions that transport metabolites between the cytoplasm and the periplasm or extracellular space, such as molybdate transport via the ABC system, were not transferred. Given the likelihood that such reactions still exist within the other Cyanothece strains, the associated GPRs in iCyt773 may need to be reevaluated for these reactions.
Comparisons with other model development methods
Current model development methods can be broadly characterized as manual, semi-automated, or automated. The workflow presented in this paper is best classified as semi-automated. It allows for more expedited model development while avoiding some of the sources of error that plague automated model generation, and it allows for a wide range of customization. The workflow can be adapted for use with any model, annotation source, or additional reaction set, subject to annotation availability and user preferences.
Many draft models are nowadays generated by identifying homologs and comparing them with the GPRs of curated models [58–61]. Hamilton et al. showed that bidirectional BLAST searches can yield false-positive ortholog pairs, using an e-value cutoff of 10⁻⁵. Here we use a more conservative cutoff of 10⁻³⁰ to safeguard against such instances. When the cutoff was relaxed from 10⁻³⁰ to 10⁻⁵ for the bidirectional BLAST between Cyanothece 51142 and the five species, there were between 280 and 403 additional best-hit pairs for each organism. The number of these pairs involving genes present in iCyt773 ranged from 15 (Cyanothece 7424 and 8801) to 26 (Cyanothece 7425). Manually constructed models rely on reviewing every annotation and manually curating the model, which greatly increases development time. This workflow reduces the need for manual review of each annotation by using only annotations that are reviewed or derived from reviewed sources. Manual curation can then be reserved for certain key steps. Some homology-based draft models include reactions beyond those retrieved from the curated models only if they are required for biomass production [58, 60, 61]. This restricts the inclusion of reactions unique to that organism, or to a subset of organisms to which the reference models do not belong, and introduces the risk of omitting secondary metabolism pathways that could be of great interest. The workflow presented here aims to overcome this through the use of outside annotations to retrieve SEED reactions.
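The cutoff comparison above rests on bidirectional best hits (BBHs). A minimal Python sketch, assuming BLAST output has already been parsed into dictionaries (the gene IDs and e-values are made up for illustration), shows how a stricter e-value cutoff removes weakly supported pairs:

```python
def best_hits(hits, cutoff):
    """Map each query to its lowest-e-value subject, keeping only hits at or below `cutoff`."""
    best = {}
    for query, subjects in hits.items():
        passing = [(evalue, subject) for subject, evalue in subjects if evalue <= cutoff]
        if passing:
            best[query] = min(passing)[1]
    return best


def bidirectional_best_hits(forward, reverse, cutoff=1e-30):
    """Return sorted (reference, target) pairs that are each other's best hit."""
    fwd = best_hits(forward, cutoff)  # reference genome -> target genome
    rev = best_hits(reverse, cutoff)  # target genome -> reference genome
    return sorted((ref, tgt) for ref, tgt in fwd.items() if rev.get(tgt) == ref)


# Hypothetical parsed BLAST results: query -> [(subject, e-value), ...]
forward = {"ref_A": [("tgt_1", 1e-120), ("tgt_2", 1e-40)], "ref_B": [("tgt_2", 1e-12)]}
reverse = {"tgt_1": [("ref_A", 1e-118)], "tgt_2": [("ref_B", 1e-12)]}

print(bidirectional_best_hits(forward, reverse))               # [('ref_A', 'tgt_1')]
print(bidirectional_best_hits(forward, reverse, cutoff=1e-5))  # also includes ('ref_B', 'tgt_2')
```

The ref_B/tgt_2 pair is reciprocal but only at e = 10⁻¹², so it is accepted under the 10⁻⁵ cutoff and rejected under 10⁻³⁰, which is the kind of pair the stricter cutoff is meant to exclude.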
There are a number of approaches for the automated development of metabolic reconstructions [35, 62–64] that afford significant gains in development time, albeit at the expense of some omissions and erroneous additions. The Cyanothece models created using the MIRAGE method contain generalized lipids along with a non-specific acceptor metabolite. Both the KBase and MIRAGE models constructed for Cyanothece 7424 contain menaquinone and ubiquinone, compounds shown to be absent from that organism. Conversely, a number of metabolites in the biomass composition of the five reconstructed models are absent from both the KBase and MIRAGE models (namely, 22 specific lipid metabolites, 4 pigments, and cyanophycin). The model produced through KBase also does not contain the pigment β-carotene. Many of these models do not have compartments other than the cytoplasm and extracellular space [35, 62, 64]. Automated model development can exclude unique metabolic pathways if they are absent from the training set of models. Specifically, both the MIRAGE and KBase models generally lack light reactions.
Other methods that combine manual and automated steps take their own distinct approaches to model reconstruction. The RAVEN toolbox allows a reconstruction to be curated from models of related species using homologs identified through BLAST bidirectional best hits, with additional unique functions supplied through annotations taken from KEGG Orthology. This method was employed for the construction of the Penicillium chrysogenum model iAL1006. Our workflow can currently pull from up to three sources, with the ability to quickly expand the number of sources sampled, resulting in more identified EC numbers with higher confidence.
In this paper we presented a workflow that was used to rapidly develop curated models for five Cyanothece strains using the previously published iCyt773 model and reviewed annotations from multiple sources. The comparisons between these five models align with the established phylogenetic relationships between the species. Specific reactions that were either withheld from transfer from iCyt773 or added from the SEED database demonstrate the efficacy of this workflow, provide insights into the metabolism of the five species, and suggest areas for further curation of the iCyt773 model. This workflow can easily be adapted to work with other metabolic models, annotation sources, and reaction databases. All five models (iCyc792, iCyn731, iCyj826, iCyp752, and iCyh755) are included in the supplementary material.
Draft model development
Reviewed annotations retrieved from Uniprot, NCBI Protein Clusters, and RAST are used to support the inclusion of additional reactions into the draft models. An automated process was used to retrieve annotations that reference specific EC numbers, along with the EC numbers associated with the reactions retrieved using bidirectional BLAST. Only specific EC numbers were used, to avoid the inclusion of unnecessary reactions. For some genes the annotations are inconsistent. These discrepancies are resolved through a manual multi-step procedure shown in Figure 4. First, the EC numbers are checked to confirm that they have not been transferred to a new number. An example of such a transfer can be seen in the annotations for the Cyanothece 7424 gene PCC7424_1895: Uniprot and NCBI Clusters assigned one EC number to the gene, whereas the RAST method assigned a different one. Despite the apparent mismatch, one of these EC numbers had previously been transferred to the other, resolving the conflict between the annotations. If the enzymes are uniquely classified, a literature search, specifically of the InterPro database, is then performed to validate their existence (or non-existence) in the organism. The Cyanothece 7424 gene PCC7424_2477 has an associated EC annotation from iCyt773, whereas RAST assigns two EC numbers to the gene. InterPro states that the enzyme corresponding to one of the RAST EC numbers belongs to a protein family found in hyperthermophilic archaea, thus ruling out its existence in Cyanothece 7424. After using the InterPro information to rule out this possible associated enzyme, the annotation is resolved through the order of confidence (described below), and the iCyt773 EC number is attributed to the gene. Next, any enzymes that are associated with generic metabolites, or with metabolites known to be absent from the organism, are removed. Such filtering can be seen with the Cyanothece 7425 gene Cyan7425_1569.
Both the model and the RAST annotation suggest that succinate dehydrogenase is associated with this gene. However, NCBI Protein Clusters suggests a succinate dehydrogenase specific to ubiquinone. As ubiquinone is not present within Cyanothece, this conflict is resolved in favor of the former. The list of all reactions removed from each model for containing generic metabolites is included in Additional file 4. If discrepancies still exist, annotations are resolved based on a confidence order of iCyt773, Uniprot, NCBI, and RAST. This order reflects the likelihood that a source has been manually reviewed and is applicable to the individual gene in question. iCyt773 GPR relationships were curated specifically for a Cyanothece model. Uniprot reviewed annotations are manually curated individually, the protein cluster annotations used in this study are curated as a group of related genes, and RAST annotations are developed using the manually curated FIGfams [33, 67]. Lower confidence is placed in the RAST annotations, as the automated RAST program could improperly assign annotations in some cases. If all of the enzymes proposed by the other annotation sources are contained within the list of enzymes related to the gene in iCyt773, the annotation is not listed as conflicting and the enzymes from the model are used. On average, between 40 and 50 genes had conflicting annotations, and between 55 and 70% of these conflicts required the order of confidence to resolve. Using multiple sources also allows probable errors in the databases to be identified, including errors in databases not used in model development. One such example is gene PCC7424_2817 in the Cyanothece 7424 genome.
All sources used in this paper, along with KEGG Orthology, indicate that the enzyme associated with this gene is 2-succinyl-5-enolpyruvyl-6-hydroxy-3-cyclohexene-1-carboxylic-acid synthase (EC 2.2.1.9). Both the KEGG and RefSeq annotations list the same enzyme name, but give the EC number as 4.1.1.71 (2-oxoglutarate decarboxylase).
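The reconciliation steps described above can be sketched in Python as follows. This is a hypothetical illustration, not the authors' code: the EC placeholders, the `transferred` map (obsolete EC number to current EC number), and the `excluded` set (enzymes ruled out via InterPro or because they act on generic or absent metabolites such as ubiquinone) are assumptions; in the workflow these lists were compiled manually.

```python
CONFIDENCE_ORDER = ["iCyt773", "Uniprot", "NCBI", "RAST"]


def reconcile(annotations, transferred, excluded):
    """Resolve per-source EC sets for one gene.

    Returns (resolved EC set, True if the order of confidence had to be used).
    """
    cleaned = {}
    for source, ecs in annotations.items():
        ecs = {transferred.get(ec, ec) for ec in ecs}  # follow transferred EC numbers
        ecs -= excluded                                # drop ruled-out enzymes
        if ecs:
            cleaned[source] = ecs
    if not cleaned:
        return set(), False
    model = cleaned.get("iCyt773")
    if model is not None and all(ecs <= model for ecs in cleaned.values()):
        return set(model), False  # other sources only propose enzymes already in the model
    if len({frozenset(ecs) for ecs in cleaned.values()}) == 1:
        return set(next(iter(cleaned.values()))), False  # all sources agree
    for source in CONFIDENCE_ORDER:  # remaining conflict: highest-confidence source wins
        if source in cleaned:
            return set(cleaned[source]), True
    return set(), True  # only reached if no source appears in CONFIDENCE_ORDER


# Hypothetical cases mirroring the three examples in the text.
transferred = {"EC_old": "EC_new"}
excluded = {"EC_archaeal", "EC_SDH_ubiquinone"}

print(reconcile({"Uniprot": {"EC_new"}, "NCBI": {"EC_new"}, "RAST": {"EC_old"}}, transferred, excluded))
# ({'EC_new'}, False)
print(reconcile({"iCyt773": {"EC_X"}, "RAST": {"EC_Y", "EC_archaeal"}}, transferred, excluded))
# ({'EC_X'}, True)
print(reconcile({"iCyt773": {"EC_SDH"}, "RAST": {"EC_SDH"}, "NCBI": {"EC_SDH_ubiquinone"}}, transferred, excluded))
# ({'EC_SDH'}, False)
```

Returning a conflict flag alongside the EC set makes it easy to count how many genes needed the order of confidence, the statistic reported above.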
This resolved list of EC numbers is then referenced against the iCyt773 model. Reactions with a matching EC number are retained, and the remaining EC numbers are used to retrieve reactions from the SEED database. Reactions are only taken from the subset used by the SEED service for GapFilling, as these reactions are confirmed to be charge and elementally balanced. EC numbers that did not have an associated reaction within this set of SEED reactions, and were therefore not included in the models, are compiled in Additional file 3. All duplicate reactions retrieved from iCyt773 are removed, and the remaining reactions necessary for photosynthesis are added. These reactions are known to exist within the organisms, as they can grow autotrophically. Any oxidative phosphorylation or diffusion transport reactions not already in the model are appended, given their evident essentiality. This set of reactions constitutes the draft model. All steps in draft model development are automated except for the EC annotation reconciliation. The time required for this step decreases as more models are developed, since its results can be applied to related organisms.
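A hedged Python sketch of the lookup step, assuming the iCyt773 reactions and the SEED GapFilling reactions have each been indexed by EC number in plain dictionaries (hypothetical data structures and IDs; shared reaction IDs stand in for the SEED notation that iCyt773 uses where possible):

```python
def retrieve_reactions(resolved_ecs, model_ec_to_rxns, seed_ec_to_rxns):
    """Split resolved EC numbers into reference-model reactions, SEED reactions, and unmatched ECs.

    Reaction IDs are assumed to share one namespace, so a SEED reaction already
    taken from the reference model is not added twice.
    """
    from_model, from_seed, unmatched = set(), set(), set()
    for ec in sorted(resolved_ecs):
        if ec in model_ec_to_rxns:
            from_model.update(model_ec_to_rxns[ec])  # keep the curated reactions
        elif ec in seed_ec_to_rxns:
            from_seed.update(seed_ec_to_rxns[ec])    # balanced SEED reactions only
        else:
            unmatched.add(ec)                        # reported separately, not modeled
    return from_model, from_seed - from_model, unmatched


# Hypothetical lookup tables; the IDs are placeholders, not real iCyt773 or SEED entries.
model_map = {"EC1": ["R1"]}
seed_map = {"EC1": ["R1_seed"], "EC2": ["R2", "R1"]}
print(retrieve_reactions({"EC1", "EC2", "EC3"}, model_map, seed_map))
# ({'R1'}, {'R2'}, {'EC3'})
```

Checking the reference model first mirrors the text: curated reactions take precedence, and SEED is consulted only for EC numbers the reference model does not cover.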
Biomass and removal of thermodynamically infeasible cycles
The four biomass descriptions developed for the iCyt773 model were used in the five models. Initially, none of the draft models were capable of producing biomass, so a subset of reactions from iCyt773 had to be added to the draft models to allow biomass generation. A mixed-integer linear program was used to determine the minimal set of additional reactions required for biomass production. All alternative solutions within two reactions of the global minimum were found, and every reaction was examined for evidence suggesting its existence within the organism. Because they were necessary for biomass production, even reactions with no identified evidence were included in the models. Where several alternative solutions existed, the solution containing the most reactions with supporting evidence was chosen. Necessary reactions that could not have been included earlier, as they had no associated enzymes or genes, were added at this point. Between three and eight reactions with a GPR in iCyt773 but no direct literature or annotation evidence were included in order to produce biomass. A substantial number of these reactions did not have both a gene and an enzyme associated in iCyt773, which lowered their chance of inclusion during the initial stages of draft model development (see Additional file 5 for a full list of reactions included in this step). While the initial reaction set was generated for the production of 1% of the maximum biomass attainable when all iCyt773 reactions were included, the inclusion of two reactions expected to be present in all models, the oxygen exchange reaction and the diffusive transport of carbon dioxide between the periplasm and cytoplasm, allowed for biomass production exceeding 90% of the maximum.
The Cyanothece 7425 model (iCyn731) requires two additional reactions to reach maximum biomass production, whereas the other four models achieve it with the carbon dioxide transport and oxygen exchange reactions alone. This process was performed for both autotrophic and heterotrophic growth conditions. For autotrophic growth, 16 reactions were added to iCyc792, 24 to iCyn731, and 18 to iCyj826, iCyp752, and iCyh755. The same approach was used for heterotrophic growth, where only iCyn731 required the inclusion of one additional reaction.
The models were further modified to eliminate thermodynamically infeasible cycles. Flux variability analysis was performed to identify unbounded reaction fluxes. Since iCyt773 contains no thermodynamically infeasible cycles, the reactions added from SEED were solely responsible for creating any cycles. The number of SEED reactions present in cycles varied between 39 in iCyh755 and 51 in both iCyn731 and iCyj826. Three steps were taken to modify the SEED reactions involved in the cycles. First, the Gibbs free energy values provided by SEED were examined: any reaction whose entire free energy range, including estimation error, lay more than 4 kcal/mol away from zero was restricted to the direction favored by its Gibbs free energy. Second, any SEED reactions whose fluxes still hit their bounds were restricted to the direction opposite to that of the cycle. Finally, the annotations of the few SEED reactions still involved in cycles were inspected. All of these reactions were supported solely by RAST annotations. Given the lower confidence in single-source annotations, these reactions (between four and ten per model) were removed. Additional file 6 lists all reaction modifications made to eliminate the cycles.
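The first, Gibbs-energy-based step can be written as a small Python function. This is a sketch under assumptions: ΔG and its uncertainty are given in kcal/mol, and ±1000 is used as a generic flux bound; neither the bound value nor the function name comes from the original workflow.

```python
def constrain_direction(dg, dg_err, threshold=4.0, bound=1000.0):
    """Return (lower, upper) flux bounds for a reversible reaction from its estimated ΔG.

    If the whole interval [dg - dg_err, dg + dg_err] (kcal/mol) lies more than
    `threshold` away from zero, the reaction is fixed in the thermodynamically
    favored direction; otherwise it stays reversible.
    """
    low, high = dg - dg_err, dg + dg_err
    if high < -threshold:   # clearly negative ΔG: forward direction only
        return (0.0, bound)
    if low > threshold:     # clearly positive ΔG: reverse direction only
        return (-bound, 0.0)
    return (-bound, bound)  # range too close to zero: leave reversible


print(constrain_direction(-10.0, 2.0))  # (0.0, 1000.0)
print(constrain_direction(6.0, 1.0))    # (-1000.0, 0.0)
print(constrain_direction(-5.0, 2.0))   # (-1000.0, 1000.0)
```

Requiring the whole uncertainty interval to clear the threshold, rather than only the point estimate, keeps reactions with poorly estimated ΔG reversible.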
GPR relationships were primarily derived from either the previous bidirectional BLAST analysis of iCyt773 reactions or the analysis of retrieved annotations. Bidirectional best hits were previously used to evaluate the presence of each reaction in the new organism. If a reaction is added to the model, the GPR for every isozyme or complete subunit that is present is translated to the list of genes for the new organism.
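One way to express this GPR translation is sketched below in Python. It assumes the reference GPR is stored as a list of isozymes, each a list of required subunit genes, and that a reference-to-target BBH map is available; gene IDs are hypothetical. Under this reading, an isozyme is carried over only if every one of its subunits has a best hit.

```python
def translate_gpr(isozymes, bbh):
    """Translate a reference GPR (list of isozymes, each a list of subunit genes).

    An isozyme is kept only if every subunit gene has a bidirectional best hit;
    an empty result means no complete enzyme was found in the new organism.
    """
    translated = []
    for subunits in isozymes:
        if all(gene in bbh for gene in subunits):
            translated.append([bbh[gene] for gene in subunits])
    return translated


# Hypothetical reference GPR: ref_g1 OR (ref_g2 AND ref_g3)
reference_gpr = [["ref_g1"], ["ref_g2", "ref_g3"]]
bbh = {"ref_g1": "tgt_g9", "ref_g2": "tgt_g4"}  # ref_g3 has no ortholog

print(translate_gpr(reference_gpr, bbh))  # [['tgt_g9']]
```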
The GPR relationships for reactions retrieved from SEED were developed by applying the Autograph method. All genes linked to an enzyme through an annotation were included in the GPR of each reaction associated with that enzyme. If RAST annotations with the correct EC number exist for each of these genes, they are used for the comparison; in all five species, this held for every EC number. Genes that shared the same annotation designation were considered isozymes, while those with different names were considered subunits of a protein complex. A small subset of reactions in the models were taken from iCyt773 because of either evidence of their existence (e.g., photosynthetic reactions) or their requirement for biomass production. Many of these GPR relationships are missing a small number of bidirectional best hits. For these genes, the BLAST cutoff was relaxed to 10⁻¹⁰. These few additional best hits resolved many of the remaining reactions, leaving between six and thirteen reactions without a transferred GPR.
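The isozyme/subunit rule can be illustrated with the short Python sketch below. It is written in the spirit of the Autograph method but is not its implementation; the input format (a gene-to-annotation-name dictionary) and the gene names are assumptions.

```python
from collections import defaultdict


def gpr_from_annotations(gene_names):
    """Build a GPR string from {gene: annotation name}.

    Genes sharing a name are treated as isozymes (OR); different names are
    treated as subunits of one complex (AND).
    """
    by_name = defaultdict(list)
    for gene, name in gene_names.items():
        by_name[name].append(gene)
    clauses = [" or ".join(sorted(by_name[name])) for name in sorted(by_name)]
    if len(clauses) == 1:
        return clauses[0]
    return " and ".join(f"({c})" if " or " in c else c for c in clauses)


# Hypothetical annotations for three genes linked to the same EC number.
print(gpr_from_annotations({"g1": "subunit A", "g2": "subunit B", "g3": "subunit A"}))
# (g1 or g3) and g2
```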
Model simulations and analysis
Flux balance analysis was used in both the model development and model validation phases to determine flux distributions under varying conditions.
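$$
\begin{aligned}
\text{Maximize} \quad & v_{\text{Biomass}} \\
\text{subject to} \quad & \sum_{j=1}^{M} S_{ij}\, v_j = 0, && i = 1, \dots, N \\
& v_{j,\min} \le v_j \le v_{j,\max}, && j = 1, \dots, M
\end{aligned}
$$
where $S_{ij}$ is the stoichiometric coefficient of metabolite $i$ in reaction $j$, $v_{j,\min}$ and $v_{j,\max}$ denote the minimum and maximum flux values for reaction $j$, and $v_j$ is the flux of reaction $j$. $N$ and $M$ denote the total numbers of metabolites and reactions, respectively.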
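A toy instance of this linear program, solved with SciPy's `linprog` (HiGHS backend) rather than the CPLEX/GAMS setup used in this work. The three-reaction network and its bounds are hypothetical.

```python
import numpy as np
from scipy.optimize import linprog


def fba(S, v_min, v_max, biomass_idx):
    """Maximize v_Biomass subject to S v = 0 and v_min <= v <= v_max."""
    S = np.asarray(S, dtype=float)
    n_met, n_rxn = S.shape
    c = np.zeros(n_rxn)
    c[biomass_idx] = -1.0  # linprog minimizes, so negate the biomass flux
    res = linprog(c, A_eq=S, b_eq=np.zeros(n_met),
                  bounds=list(zip(v_min, v_max)), method="highs")
    if res.status != 0:
        raise RuntimeError(res.message)
    return res.x


# Hypothetical toy network: R1 takes up A, R2 converts A to B, R3 drains B as "biomass".
S = [[1.0, -1.0, 0.0],   # metabolite A
     [0.0, 1.0, -1.0]]   # metabolite B
v = fba(S, v_min=[0.0, 0.0, 0.0], v_max=[10.0, 1000.0, 1000.0], biomass_idx=2)
print(v)  # the biomass flux is capped by the uptake bound of 10
```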
Each reaction was assigned a binary variable $y_j$ which, when equal to zero, eliminates flux through reaction $j$. The value of $y_j$ was fixed at one for all reactions present in the draft model. Biomass production was constrained to at least 1% of the maximum value attainable when all iCyt773 reactions were included, and the number of included reactions was minimized.
Flux variability analysis for cycle detection was performed without any constraint on biomass production, so that all possible cycles within the model could be identified. The analysis was repeated after each series of modifications to the reactions participating in cycles.
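A sketch of the cycle check: flux variability analysis with no biomass constraint, flagging reactions whose minimum or maximum flux reaches the default bound. The bound magnitude of 1000, the tolerance, and the toy network are assumptions, and SciPy's `linprog` stands in for the CPLEX/GAMS solver.

```python
import numpy as np
from scipy.optimize import linprog

DEFAULT_BOUND = 1000.0  # assumed magnitude of an "unbounded" flux limit


def _optimize(c, S, bounds):
    res = linprog(c, A_eq=S, b_eq=np.zeros(S.shape[0]), bounds=bounds, method="highs")
    if res.status != 0:
        raise RuntimeError(res.message)
    return res.fun


def fva(S, v_min, v_max):
    """Minimum and maximum flux of each reaction subject only to S v = 0 and the bounds."""
    S = np.asarray(S, dtype=float)
    bounds = list(zip(v_min, v_max))
    ranges = []
    for j in range(S.shape[1]):
        c = np.zeros(S.shape[1])
        c[j] = 1.0
        ranges.append((_optimize(c, S, bounds), -_optimize(-c, S, bounds)))
    return ranges


def cycle_candidates(S, v_min, v_max, names, tol=1e-6):
    """Reactions whose flux range reaches the default bound (no biomass constraint)."""
    return [name for name, (lo, hi) in zip(names, fva(S, v_min, v_max))
            if hi >= DEFAULT_BOUND - tol or lo <= -DEFAULT_BOUND + tol]


# Hypothetical toy: EX_A feeds A, BIO drains A, and R1-R3 form a closed loop A -> B -> C -> A.
names = ["EX_A", "BIO", "R1", "R2", "R3"]
S = [[1, -1, -1, 0, 1],   # A
     [0, 0, 1, -1, 0],    # B
     [0, 0, 0, 1, -1]]    # C
flagged = cycle_candidates(S, [0, 0, -1000, -1000, -1000], [10, 1000, 1000, 1000, 1000], names)
print(flagged)
```

The uptake and biomass reactions are limited by the uptake bound of 10 and are not flagged, whereas the loop reactions can carry any flux up to the default bound in either direction.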
A denotes the total number of reactions shared by the two organisms, whereas B and C denote the numbers of reactions unique to each of the two models.
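Assuming each model is reduced to a set of reaction identifiers that are consistent across models (the identifiers below are hypothetical), the three counts follow from set operations:

```python
from typing import Set, Tuple


def shared_and_unique(model_1: Set[str], model_2: Set[str]) -> Tuple[int, int, int]:
    """Return (A, B, C) for two models given as sets of reaction IDs."""
    a = len(model_1 & model_2)  # A: reactions present in both models
    b = len(model_1 - model_2)  # B: reactions found only in the first model
    c = len(model_2 - model_1)  # C: reactions found only in the second model
    return a, b, c


print(shared_and_unique({"R1", "R2", "R3"}, {"R2", "R3", "R4", "R5"}))
```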
The CPLEX solver (version 12.3, IBM ILOG) was used within the GAMS environment (version 23.3.3, GAMS Development Corporation) to solve the optimization models. All computations were carried out on Intel Xeon X5675 six-core 3.06 GHz processors that are part of the lionxf cluster, built and operated by the Research Computing and Cyberinfrastructure Group of The Pennsylvania State University. All code used in model development was written in Python.
Supported by funding from the Office of Science (BER), U.S. Department of Energy to Drs. Costas D. Maranas and Himadri B. Pakrasi, grant DE-SC0006870.
- Thiele I, Palsson BO: A protocol for generating a high-quality genome-scale metabolic reconstruction. Nat Protoc. 2010, 5: 93-121. 10.1038/nprot.2009.203.
- Price ND, Reed JL, Palsson BO: Genome-scale models of microbial cells: evaluating the consequences of constraints. Nat Rev Microbiol. 2004, 2: 886-897. 10.1038/nrmicro1023.
- Feist AM, Herrgård MJ, Thiele I, Reed JL, Palsson BO: Reconstruction of Biochemical Networks in Microbial Organisms. Nat Rev Microbiol. 2009, 7: 129-143.
- Reed JL, Famili I, Thiele I, Palsson BO: Towards multidimensional genome annotation. Nat Rev Genet. 2006, 7: 130-141. 10.1038/nrg1769.
- Zomorrodi AR, Suthers PF, Ranganathan S, Maranas CD: Mathematical optimization applications in metabolic networks. Metab Eng. 2012, 14: 672-686. 10.1016/j.ymben.2012.09.005.
- Carneiro S, Rocha I, Ferreira E: Application of a genome-scale metabolic model to the inference of nutritional requirements and metabolic bottlenecks during recombinant protein production in Escherichia coli. Microb Cell Factories. 2006, 5: P52. 10.1186/1475-2859-5-S1-P52.
- Ranganathan S, Maranas CD: Microbial 1-butanol production: Identification of non-native production routes and in silico engineering interventions. Biotechnol J. 2010, 5: 716-725. 10.1002/biot.201000171.
- Ranganathan S, Suthers P, Maranas CD: OptForce: An optimization procedure for identifying all genetic manipulations leading to targeted overproductions. PLoS Comput Biol. 2010, 6: e1000744. 10.1371/journal.pcbi.1000744.
- Suthers PF, Zomorrodi A, Maranas CD: Genome-scale gene/reaction essentiality and synthetic lethality analysis. Mol Syst Biol. 2009, 5: 301.
- Saha R, Verseput AT, Berla BM, Mueller TJ, Pakrasi HB, Maranas CD: Reconstruction and Comparison of the Metabolic Potential of Cyanobacteria Cyanothece sp. ATCC 51142 and Synechocystis sp. PCC 6803. PLoS One. 2012, 7: e48285. 10.1371/journal.pone.0048285.
- Vu T, Stolyar S, Pinchuk G, Hill E, Kucek LA, Brown R, Lipton M, Osterman A, Fredrickson J, Konopka A: Genome-scale modeling of light-driven reductant partitioning and carbon fluxes in diazotrophic unicellular cyanobacterium Cyanothece sp. ATCC 51142. PLoS Comput Biol. 2012, 8: e1002460. 10.1371/journal.pcbi.1002460.
- Bandyopadhyay A, Elvitigala T, Welsh E, Stöckel J, Liberton M, Min H, Sherman LA, Pakrasi HB: Novel metabolic attributes of the Genus Cyanothece, comprising a group of unicellular nitrogen-fixing cyanobacteria. mBio. 2011, 2: e00214.
- Dismukes GC, Carrieri D, Bennette N, Ananyev GM, Posewitz MC: Aquatic phototrophs: efficient alternatives to land-based crops for biofuels. Curr Opin Biotechnol. 2008, 19: 235-240.
- Tamagnini P, Axelsson R, Lindberg P, Oxelfelt F, Wunschiers R, Lindblad P: Hydrogenases and hydrogen metabolism of cyanobacteria. Microbiol Mol Biol Rev: MMBR. 2002, 66: 1-20. 10.1128/MMBR.66.1.1-20.2002.
- Min H, Sherman LA: Hydrogen production by the unicellular, diazotrophic cyanobacterium Cyanothece sp. strain ATCC 51142 under conditions of continuous light. Appl Environ Microbiol. 2010, 76: 4293-4301. 10.1128/AEM.00146-10.
- Melnicki MR, Pinchuk GE, Hill EA, Kucek LA, Fredrickson JK, Konopka A, Beliaev AS: Sustained H2 production driven by photosynthetic water splitting in a unicellular cyanobacterium. mBio. 2012, 3: e00197-00112.
- Welsh EA, Liberton M, Stockel J, Loh T, Elvitigala T, Wang C, Wollam A, Fulton RS, Clifton SW, Jacobs JM, et al: The genome of Cyanothece 51142, a unicellular diazotrophic cyanobacterium important in the marine nitrogen cycle. Proc Natl Acad Sci USA. 2008, 105: 15094-15099. 10.1073/pnas.0805418105.
- Stockel J, Jacobs JM, Elvitigala TR, Liberton M, Welsh EA, Polpitiya AD, Gritsenko MA, Nicora CD, Koppenaal DW, Smith RD, Pakrasi HB: Diurnal rhythms result in significant changes in the cellular protein complement in the cyanobacterium Cyanothece 51142. PLoS One. 2011, 6: e16680. 10.1371/journal.pone.0016680.
- Hall D, Markov S, Watanabe Y, Rao K: The potential applications of cyanobacterial photosynthesis for clean technologies. Photosynth Res. 1995, 46: 159-167. 10.1007/BF00020426.
- Porta D, Rippka R, Hernandez-Marine M: Unusual ultrastructural features in three strains of Cyanothece (cyanobacteria). Arch Microbiol. 2000, 173: 154-163. 10.1007/s002039900126.
- Wu B, Zhang B, Feng X, Rubens JR, Huang R, Hicks LM, Pakrasi HB, Tang YJ: Alternative isoleucine synthesis pathway in cyanobacterial species. Microbiology. 2010, 156: 596-602. 10.1099/mic.0.031799-0.
- Min H, Sherman LA: Genetic transformation and mutagenesis via single-stranded DNA in the unicellular, diazotrophic cyanobacteria of the genus Cyanothece. Appl Environ Microbiol. 2010, 76: 7641-7645. 10.1128/AEM.01456-10.
- Bandyopadhyay A, Elvitigala T, Liberton M, Pakrasi HB: Variations in the rhythms of respiration and nitrogen fixation in members of the unicellular diazotrophic cyanobacterial genus Cyanothece. Plant Physiol. 2012, 161: 1334-1346.
- The UniProt Consortium: Reorganizing the protein space at the Universal Protein Resource (UniProt). Nucleic Acids Res. 2012, 40: D71-D75.
- Gillespie JJ, Wattam AR, Cammer SA, Gabbard JL, Shukla MP, Dalay O, Driscoll T, Hix D, Mane SP, Mao C, et al: PATRIC: the comprehensive bacterial bioinformatics resource with a focus on human pathogenic species. Infect Immun. 2011, 79: 4286-4298. 10.1128/IAI.00207-11.
- Kanehisa M, Goto S: KEGG: Kyoto encyclopedia of genes and genomes. Nucleic Acids Res. 2000, 28: 27-30. 10.1093/nar/28.1.27.
- Kanehisa M, Goto S, Sato Y, Furumichi M, Tanabe M: KEGG for integration and interpretation of large-scale molecular datasets. Nucleic Acids Res. 2012, 40: D109-D114. 10.1093/nar/gkr988.
- Feist AM, Henry C, Reed JL, Krummenacker M, Joyce A, Karp P, Broadbelt L, Hatzimanikatis V, Palsson BO: A genome-scale metabolic reconstruction for Escherichia coli K-12 MG1655 that accounts for 1260 ORFs and thermodynamic information. Mol Syst Biol. 2007, 3: 121.
- Balagurunathan B, Jonnalagadda S, Tan L, Srinivasan R: Reconstruction and analysis of a genome-scale metabolic model for Scheffersomyces stipitis. Microb Cell Fact. 2012, 11: 27. 10.1186/1475-2859-11-27.
- Dal'Molin CG, Quek LE, Palfreyman RW, Nielsen LK: AlgaGEM--a genome-scale metabolic reconstruction of algae based on the Chlamydomonas reinhardtii genome. BMC Genomics. 2011, 12 (4): S5.
- Licona-Cassani C, Marcellin E, Quek L, Jacob S, Nielsen L: Reconstruction of the Saccharopolyspora erythraea genome-scale model and its use for enhancing erythromycin production. Antonie Van Leeuwenhoek. 2012, 102: 493-502. 10.1007/s10482-012-9783-2.
- Klimke W, Agarwala R, Badretdin A, Chetvernin S, Ciufo S, Fedorov B, Kiryutin B, O’Neill K, Resch W, Resenchuk S: The National Center for Biotechnology Information's Protein Clusters Database. Nucleic Acids Res. 2009, 37: D216-D223. 10.1093/nar/gkn734.
- Aziz R, Bartels D, Best A, DeJongh M, Disz T, Edwards R, Formsma K, Gerdes S, Glass E, Kubal M, et al: The RAST Server: rapid annotations using subsystems technology. BMC Genomics. 2008, 9: 75. 10.1186/1471-2164-9-75.
- Overbeek R, Begley T, Butler RM, Choudhuri JV, Chuang HY, Cohoon M, de Crécy-Lagard V, Diaz N, Disz T, Edwards R, et al: The subsystems approach to genome annotation and its use in the project to annotate 1000 genomes. Nucleic Acids Res. 2005, 33: 5691-5702. 10.1093/nar/gki866.
- Henry CS, DeJongh M, Best AA, Frybarger PM, Linsay B, Stevens RL: High-throughput generation, optimization and analysis of genome-scale metabolic models. Nat Biotechnol. 2010, 28: 977-982. 10.1038/nbt.1672.
- Collins MD, Jones D: Distribution of Isoprenoid Quinone Structural Types in Bacteria and Their Taxonomic Implications. Microbiol Rev. 1981, 45: 316-354.
- Knoop H, Grundel M, Zilliges Y, Lehmann R, Hoffmann S, Lockau W, Steuer R: Flux balance analysis of cyanobacterial metabolism: the metabolic network of Synechocystis sp. PCC 6803. PLoS Comput Biol. 2013, 9: e1003081. 10.1371/journal.pcbi.1003081.
- Knoop H, Zilliges Y, Lockau W, Steuer R: The metabolic network of Synechocystis sp. PCC 6803: systemic properties of autotrophic growth. Plant Physiol. 2010, 154: 410-422. 10.1104/pp.110.157198.
- Nakao M, Okamoto S, Kohara M, Fujishiro T, Fujisawa T, Sato S, Tabata S, Kaneko T, Nakamura Y: CyanoBase: the cyanobacteria genome database update 2010. Nucleic Acids Res. 2010, 38: D379-D381.
- Nakamura Y, Kaneko T, Miyajima N, Tabata S: Extension of CyanoBase. CyanoMutants: repository of mutant information on Synechocystis sp. strain PCC6803. Nucleic Acids Res. 1999, 27: 66-68. 10.1093/nar/27.1.66.
- Zhang SY, Bryant DA: The Tricarboxylic Acid Cycle in Cyanobacteria. Science. 2011, 334: 1551-1553. 10.1126/science.1210858.
- Schirmer A, Rude MA, Li X, Popova E, del Cardayre SB: Microbial biosynthesis of alkanes. Science. 2010, 329: 559-562. 10.1126/science.1187936.
- Steinbuchel A, Valentin HE: Diversity of Bacterial Polyhydroxyalkanoic Acids. FEMS Microbiol Lett. 1995, 128: 219-228.
- Philip S, Keshavarz T, Roy I: Polyhydroxyalkanoates: biodegradable polymers with a range of applications. J Chem Technol Biotechnol. 2007, 82: 233-247. 10.1002/jctb.1667.
- Rehm BH, Steinbuchel A: Biochemical and genetic analysis of PHA synthases and other proteins required for PHA synthesis. Int J Biol Macromol. 1999, 25: 3-19. 10.1016/S0141-8130(99)00010-0.
- Atsumi S, Hanai T, Liao JC: Non-fermentative pathways for synthesis of branched-chain higher alcohols as biofuels. Nature. 2008, 451: 86-U13. 10.1038/nature06450.
- Clomburg JM, Gonzalez R: Biofuel production in Escherichia coli: the role of metabolic engineering and synthetic biology. Appl Microbiol Biotechnol. 2010, 86: 419-434. 10.1007/s00253-010-2446-1.
- de la Plaza M, de Palencia Fernandez P, Pelaez C, Requena T: Biochemical and molecular characterization of alpha-ketoisovalerate decarboxylase, an enzyme involved in the formation of aldehydes from amino acids by Lactococcus lactis. FEMS Microbiol Lett. 2004, 238: 367-374.
- Russell DW, Smith M, Williamson VM, Young ET: Nucleotide sequence of the yeast alcohol dehydrogenase II gene. J Biol Chem. 1983, 258: 2674-2682.
- Vidal R, Lopez-Maury L, Guerrero MG, Florencio FJ: Characterization of an alcohol dehydrogenase from the Cyanobacterium Synechocystis sp strain PCC 6803 that responds to environmental stress conditions via the Hik34-Rre1 two-component system. J Bacteriol. 2009, 191: 4383-4391. 10.1128/JB.00183-09.
- Papoutsakis ET: Engineering solventogenic clostridia. Curr Opin Biotechnol. 2008, 19: 420-429. 10.1016/j.copbio.2008.08.003.
- Ezeji TC, Qureshi N, Blaschek HP: Bioproduction of butanol from biomass: from genes to bioreactors. Curr Opin Biotechnol. 2007, 18: 220-227. 10.1016/j.copbio.2007.04.002.
- Sillers R, Chow A, Tracy B, Papoutsakis ET: Metabolic engineering of the non-sporulating, non-solventogenic Clostridium acetobutylicum strain M5 to produce butanol without acetone demonstrate the robustness of the acid-formation pathways and the importance of the electron balance. Metab Eng. 2008, 10: 321-332. 10.1016/j.ymben.2008.07.005.
- Yu M, Zhang Y, Tang IC, Yang ST: Metabolic engineering of Clostridium tyrobutyricum for n-butanol production. Metab Eng. 2011, 13: 373-382. 10.1016/j.ymben.2011.04.002.
- Thony B, Auerbach G, Blau N: Tetrahydrobiopterin biosynthesis, regeneration and functions. Biochem J. 2000, 347 (Pt 1): 1-16.
- Choi YK, Hwang YK, Park YS: Molecular cloning and disruption of a novel gene encoding UDP-glucose: tetrahydrobiopterin alpha-glucosyltransferase in the cyanobacterium Synechococcus sp. PCC 7942. FEBS Lett. 2001, 502: 73-78. 10.1016/S0014-5793(01)02667-9.
- Chung HJ, Kim YA, Kim YJ, Choi YK, Hwang YK, Park YS: Purification and characterization of UDP-glucose:tetrahydrobiopterin glucosyltransferase from Synechococcus sp. PCC 7942. Biochim Biophys Acta. 2000, 1524: 183-188. 10.1016/S0304-4165(00)00156-2.
- Sun J, Sayyar B, Butler JE, Pharkya P, Fahland TR, Famili I, Schilling CH, Lovley DR, Mahadevan R: Genome-scale constraint-based modeling of Geobacter metallireducens. BMC Syst Biol. 2009, 3: 15. 10.1186/1752-0509-3-15.
- Pinchuk GE, Hill EA, Geydebrekht OV, De Ingeniis J, Zhang X, Osterman A, Scott JH, Reed SB, Romine MF, Konopka AE, et al: Constraint-based model of Shewanella oneidensis MR-1 metabolism: a tool for data analysis and hypothesis generation. PLoS Comput Biol. 2010, 6: e1000822. 10.1371/journal.pcbi.1000822.
- Sun J, Haveman SA, Bui O, Fahland TR, Lovley DR: Constraint-based modeling analysis of the metabolism of two Pelobacter species. BMC Syst Biol. 2010, 4: 174. 10.1186/1752-0509-4-174.
- Hamilton JJ, Reed JL: Identification of functional differences in metabolic networks using comparative genomics and constraint-based models. PLoS One. 2012, 7: e34670. 10.1371/journal.pone.0034670.
- Reyes R, Gamermann D, Montagud A, Fuente D, Triana J, Urchueguia JF, de Cordoba PF: Automation on the generation of genome-scale metabolic models. J Comput Biol. 2012, 19: 1295-1306. 10.1089/cmb.2012.0183.
- Liao YC, Chen JC, Tsai MH, Tang YH, Chen FC, Hsiung CA: MrBac: a web server for draft metabolic network reconstructions for bacteria. Bioeng Bugs. 2011, 2: 284-287. 10.4161/bbug.2.5.16113.
- Vitkin E, Shlomi T: MIRAGE: a functional genomics-based approach for metabolic network model reconstruction and its application to cyanobacteria networks. Genome Biol. 2012, 13: R111. 10.1186/gb-2012-13-11-r111.
- Agren R, Liu LM, Shoaie S, Vongsangnak W, Nookaew I, Nielsen J: The RAVEN toolbox and its use for generating a genome-scale metabolic model for Penicillium chrysogenum. PLoS Comput Biol. 2013, 9: e1002980. 10.1371/journal.pcbi.1002980.
- Hunter S, Jones P, Mitchell A, Apweiler R, Attwood TK, Bateman A, Bernard T, Binns D, Bork P, Burge S, et al: InterPro in 2011: new developments in the family and domain prediction database. Nucleic Acids Res. 2012, 40: D306-D312. 10.1093/nar/gkr948.
- Meyer F, Overbeek R, Rodriguez A: FIGfams: yet another set of protein families. Nucleic Acids Res. 2009, 37: 6643-6654. 10.1093/nar/gkp698.
- Pruitt KD, Tatusova T, Brown GR, Maglott DR: NCBI Reference Sequences (RefSeq): current status, new features and genome annotation policy. Nucleic Acids Res. 2012, 40: D130-D135. 10.1093/nar/gkr1079.
- Satish Kumar V, Dasika MS, Maranas CD: Optimization based automated curation of metabolic reconstructions. BMC Bioinforma. 2007, 8: 212. 10.1186/1471-2105-8-212.
- Notebaart RA, van Enckevort FH, Francke C, Siezen RJ, Teusink B: Accelerating the reconstruction of genome-scale metabolic networks. BMC Bioinforma. 2006, 7: 296. 10.1186/1471-2105-7-296.
- Orth JD, Thiele I, Palsson BO: What is flux balance analysis? Nat Biotechnol. 2010, 28: 245-248. 10.1038/nbt.1614.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.