Knowledge management for systems biology: a general and visually driven framework applied to translational medicine
© Maier et al; licensee BioMed Central Ltd. 2011
Received: 14 June 2010
Accepted: 5 March 2011
Published: 5 March 2011
To enhance our understanding of complex biological systems such as diseases we need to put all of the available data into context and use this to detect relations, patterns and rules which allow predictive hypotheses to be defined. Life science has become a data-rich science with information about the behaviour of millions of entities such as genes, chemical compounds, diseases, cell types and organs, which is organised in many different databases and/or spread throughout the literature. Existing knowledge such as genotype - phenotype relations or signal transduction pathways must be semantically integrated and dynamically organised into structured networks that are connected with clinical and experimental data. Different approaches to this challenge exist but so far none has proven entirely satisfactory.
To address this challenge we previously developed a generic knowledge management framework, BioXM™, which allows the dynamic, graphic generation of domain specific knowledge representation models based on specific objects and their relations supporting annotations and ontologies. Here we demonstrate the utility of BioXM for knowledge management in systems biology as part of the EU FP6 BioBridge project on translational approaches to chronic diseases. From clinical and experimental data, text-mining results and public databases we generate a chronic obstructive pulmonary disease (COPD) knowledge base and demonstrate its use by mining specific molecular networks together with integrated clinical and experimental data.
We generate the first semantically integrated COPD specific public knowledge base and find that, for the integration of clinical and experimental data with pre-existing knowledge, the configuration based set-up enabled by BioXM reduced implementation time and effort compared to similar systems implemented as classical software development projects. The knowledge base enables the retrieval of sub-networks including protein-protein interaction, pathway, gene - disease and gene - compound data which are used for subsequent data analysis, modelling and simulation. Pre-structured queries and reports enhance usability; establishing their use in everyday clinical settings requires further simplification with a browser based interface which is currently under development.
In biological or clinical research the creation of knowledge, here defined as "the realisation and understanding of patterns and their implications existing in information", relies on data mining. This in turn requires the collection and integration of a diverse set of up-to-date data and the associated context, i.e. information. These sets include unstructured information from the literature, specifically extracted information from the multitude of available databases, experimental data from "-omics" platforms as well as phenotype information and clinical data. Although a large amount of information is stored in numerous different databases (the 2010 NAR database issue lists more than 1200), even more is still embedded in unstructured free text. Over the last 15 years a large number of methods and software tools have been developed to integrate aspects of biological knowledge such as signalling pathways or functional annotation with experimental data. However, it has proven extremely difficult to couple true semantic integration (i.e. the mapping of equivalent meaning and objects) across all information types relevant in a life science project with a flexible and extendible data model, robustness against structural changes in services and data, transparent usage, and low set-up and maintenance requirements (see  for an excellent recent review). In principle this difficulty arises from the high complexity of life science data, which is partly an artefact of the fragmented landscape of data sources but also stems from reasons integral to the life sciences. The ever extending "parts-list of life" itself already offers an astounding number of object classes, from the molecular to the organism, even if common names, identifiers and definitions could be agreed upon.
In addition, experimental data can only be interpreted in the context of the exact identity of the experimental sample, the sample's environment, the sample's processing and the processing and quality of the generated data. Even more than the occasional extension of the "parts-list" from our growing knowledge, technical development continually generates new data types, processing methods and experimental conditions. While life science projects in general will (hopefully) share some concepts, almost every one will require some individual adjustment to integrate and view the relevant information. Therefore an optimal data integration approach will ensure that the data model can be based on existing concepts (ideally ontological, i.e. a controlled, structured vocabulary) yet remains flexible and extendible by the advanced user. In this respect today's most successful (i.e. widely used) data integration approaches such as SRS or Entrez show only weak, cross-reference based data integration without semantic mapping to a common concept (categorised as link/index integration by Köhler and Stein [5, 6]). They depend on pairwise mappings between individual database entries provided by the data source, e.g. from a protein sequence entry to the corresponding transcript. These mappings lack semantic meaning, i.e. the notion that a protein is expressed from a gene cannot be stated or queried. Additional processing and data mapping is required to answer even simple questions such as "which molecular mechanisms are known to be involved in the pathology of chronic obstructive pulmonary disease?". Currently custom-developed data warehouses such as Atlas, BIOZON or BioGateway are the most common technical concept to achieve full semantic integration (in public and industry projects). While these are ideally suited to answer complex queries, their inflexible and pre-determined data model and the necessary, often difficult, data synchronisation result in high set-up and maintenance costs.
Further, adapting the structure of such data warehouses to an ever changing environment or to changing requirements is difficult at best. Fortunately, as more data sources start to adopt semantic web representations such as OWL and RDF, maintaining semantic mappings becomes less of an issue: alongside adopting a common language to transport semantics, many data sources also standardise the semantics they provide, for example by using common entity references and ontologies.
An optimisation, at least regarding data synchronisation, has been to present a semantically fully integrated view of the data while the underlying data is assembled on-the-fly from distributed sources using a coherent data model and semantic mappings [12, 13] (categorised as federation/view integration by Köhler and Stein [5, 6]). Details of this approach vary widely. The ad-hoc data assembly process can be provided by home-made scripts or, more recently, by workflow engines such as Taverna. The data model can be programmed with a specific language as in Kleisli or may make use of standard ontologies as with TAMBIS. Semantic mapping to the common concept can be produced by a view providing environment, such as BioMediator and the Bio2RDF project, or can come from individual integrated data sources. In the latter case the data sources either provide such mappings voluntarily, working for the common good of the "semantic web", or are forced to do so by a closed application environment such as caBIG, Gen2Phen/PaGE-OM [12, 21] or GMOD. While conceptually elegant, these approaches have some disadvantages: the start-up costs are quite high (e.g. [13, 23]), the performance is determined by the slowest and least stable of the integrated resources, complex queries result in large joins which are hard to optimise, and data models are often hard to extend. Ad-hoc desktop data integration and visualisation tools such as Cytoscape, Osprey or ONDEX on the other hand combine excellent flexibility with good performance due to local data storage; however, they do not allow large scale knowledge bases to be collaboratively generated, managed and shared.
Another issue, which is only partially addressed by current data integration solutions, is the need to organise not only public information but project-specific knowledge and data, keep it private or partially private for some time, store and connect experimental results and corresponding metainformation about materials and methods and, if eventually verified, merge it into the pool of common knowledge. This may for example take the form of an existing signal transduction pathway which is privately extended with new members or connections. The extension is then published and discussed within a specific project until it is accepted as common knowledge. While data resources such as GEO provide the option to keep submitted data private for some time, they generally do not allow existing knowledge to be extended as described above or allow existing data to be annotated with private or public comments.
Our challenge was to develop a knowledge management environment that achieves several goals: focus on the management of project-specific knowledge; ease data model generation and extension; provide completely flexible data integration and reporting methods combined with intuitive visual navigation and query generation; and address the issues of set-up and maintenance cost.
To do so we chose to apply different aspects of the approaches described above. In the next sections we describe the creation of a knowledge base for chronic diseases based on the BioXM software platform that efficiently models complex research environments with a flexible management, query and reporting interface which automatically adapts to the conceptualisation of the modelled information.
The BioXM rationale
BioXM has been developed around the concept of object-oriented semantic integration. In this concept semantically identical objects, which represent information about the same real world object, and the meanings of associations between these objects are identified and mapped based on data and descriptive metainformation. In the life sciences this mostly concerns the mapping of biological entities and descriptive data from literature and databases to common instances of objects such as genes, phenotypes or patients. Associations between the entities are mapped as relations (e.g. compound X inhibits protein A) and object - relation information is contextually structured (e.g. gene B expressed in tissue Z at time T after application of compound X). Based on objects as nodes (in BioXM called "elements") and relations as edges, a "semantic network" can be generated which provides semantic information about the connection between participating object instances.
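The element and relation concept can be illustrated with a minimal Python sketch. This is not the BioXM implementation or its API; the class and method names (`Element`, `Relation`, `SemanticNetwork`, `relate`, `neighbours`) are assumptions made for illustration, and only the STAT3 and pancreatic tumor example is taken from the data model description in this article.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Element:
    """A node of the semantic network, e.g. a gene or a disease term."""
    element_type: str
    name: str


@dataclass(frozen=True)
class Relation:
    """A typed edge that carries the meaning of the connection."""
    relation_class: str
    source: Element
    target: Element


@dataclass
class SemanticNetwork:
    """Hypothetical container for relations; not a BioXM class."""
    relations: list = field(default_factory=list)

    def relate(self, relation_class, source, target):
        relation = Relation(relation_class, source, target)
        self.relations.append(relation)
        return relation

    def neighbours(self, element):
        """Return (relation class, other element) pairs one step away."""
        found = []
        for relation in self.relations:
            if relation.source == element:
                found.append((relation.relation_class, relation.target))
            elif relation.target == element:
                found.append((relation.relation_class, relation.source))
        return found


network = SemanticNetwork()
stat3 = Element("Gene", "STAT3")
tumor = Element("Disease term", "pancreatic tumor")
network.relate("Gene-disease", stat3, tumor)
```

Because each edge stores its relation class, asking for the neighbours of the STAT3 element returns not only the connected disease term but also the statement that the connection is a gene - disease association.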
Details about the technical implementation of the BioXM software have been published elsewhere and are only briefly summarised here.
The EU FP6 BioBridge Systems Medicine project http://www.biobridge.eu focused on the integration of genomics and chronic disease phenotype data with modelling and simulation tools for clinicians to support understanding, diagnosis and therapy of chronic diseases. We have configured and extended the generic BioXM knowledge management environment to create the knowledge base for this translational system biology approach, focusing on chronic obstructive pulmonary disease (COPD) as an initial use case.
Data model configuration
Fundamental semantic objects
Represents a basic unit of a knowledge model
"Gene" element type can be used to create the "STAT3" gene element; "Disease term" element type can be used to create the "pancreatic tumor" disease term element
Describes a relationship between semantic objects
"Gene-disease" relation class can be used to create the "STAT3 is associated with disease pancreatic tumor" relation
Extends the properties of a semantic object by a set of attributes
Experimental data (evidence)
Performance optimized extension of an element by a set of attributes
Classifies semantic objects according to a defined hierarchical nomenclature of concepts
"3.2.2.21 DNA-3-methyladenine glycosidase II" entry is part of the "EC numbers" ontology
Gene Ontology to classify biological function
NCI Thesaurus of disease terms taxonomy
Represents sets of semantic objects
Metabolic pathways; protein complexes; a disease process or pattern
A basic unit of a knowledge model populated from an external application/database
dbSNP Sequence Variant; Genome feature
COPD specific knowledge base
Level of curation
High throughput data submission and manually curated from literature
last public version 20.3.07
19 707 interactions
Manually curated from literature
Different evidence codes
Manually curated from literature
Curated from different data sources
Compound-gene, Compound-disease and Gene-disease relationships
259 898 relations
Manually curated from literature
Gene functional information
80 793 human, mouse and rat genes
Curated information integrated from different databases, based on RefSeq genomes
Enzyme related functional information
Manually curated from literature
data (expression, ChIP-chip etc.)
>400 000 individual experiments
21 584 binary interactions
Manually curated from literature
Manually curated from the published literature
Manually curated from literature
current release 31.10.07
Gene - disease relations
Curated from literature
Protein family information
10 340 families
Manually curated from sequence alignments
last release 18.10.04
Automatic collection with manual curation
Interactions and pathways
>600 pathways, >24 000
DNA and protein sequences
Automatic processing and manual curation
User submission followed by automatic clustering
Automatic processing, Swissprot subsection manual curation
Populating the data model
The model described above enabled us to semantically integrate existing public databases and information derived from the literature with clinical and experimental data created during the BioBridge project.
Mapping a resource to the data model requires expertise about the semantic concept of the resource and the configured BioXM data model. To integrate the individual entities of a data source semantically, the mapping method for the entities needs to be defined. If available, BioXM makes use of namespace based standard identifiers, existing cross-references and ontologies for the population of the data model. In most cases the semantics of a given data source are not (yet) described in machine readable form and the initial mapping template needs to be generated manually. The BioXM core framework is extended with pre-defined semantic mappings currently existing for about 70 public data resources and formats (see additional file 2). In addition, text-mining and sequence similarity (BLAST) based mappings are enabled; however, users need to be aware of the pitfalls of these methods as no automatic conflict resolution is attempted.
For the COPD knowledge base we use entities, references and ID mappings provided by EntrezGene, Genbank, RefSeq, HGNC, ENSEMBL, UniProt and EMBL to populate the system with instances of genes and proteins from human, mouse and rat. Starting with EntrezGene we create gene instances and map entities from the other sources iteratively by reference. For each database the quality of the references to external sources needs to be judged individually and the references constrained accordingly to exclude ambiguous connections. UniProt protein entries, for example, provide references to DNA databases, with some references pointing to mRNA, which allows the corresponding gene to be identified uniquely, and others pointing to contigs and whole chromosomes with multiple gene references. Use of references in this case is therefore constrained to the target entry type "mRNA". A new instance is generated for each database entry from the corresponding organisms which cannot be mapped to an existing instance by ID reference; no name based mapping or name conflict resolution is attempted at this stage. As the knowledge base develops, iterative rounds of extension occur with additional data sources. Based on non-ambiguous identifiers we map additional information from the sources described below, with mappings being extended, removed and remapped during each updating round.
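The constraint on cross-reference types can be sketched in a few lines of Python. This is an illustrative assumption about how such a rule might look, not the BioXM import logic: the function name, the dictionary layout of a cross-reference and all accessions (`MRNA_1`, `CONTIG_9`, `GENE_A`) are hypothetical.

```python
def map_entry_to_gene(cross_references, gene_by_mrna):
    """Map a protein entry to a gene using only mRNA references.

    Returns the gene identifier if exactly one gene is reachable,
    otherwise None, in which case a new instance would be created.
    """
    genes = {
        gene_by_mrna[ref["accession"]]
        for ref in cross_references
        # Contig and chromosome references are ignored because they
        # may point to several genes at once.
        if ref["target_type"] == "mRNA" and ref["accession"] in gene_by_mrna
    }
    return next(iter(genes)) if len(genes) == 1 else None


# Hypothetical lookup built from already imported gene instances.
gene_by_mrna = {"MRNA_1": "GENE_A", "MRNA_2": "GENE_B"}

unambiguous = [
    {"target_type": "mRNA", "accession": "MRNA_1"},
    {"target_type": "contig", "accession": "CONTIG_9"},
]
contig_only = [{"target_type": "contig", "accession": "CONTIG_9"}]
conflicting = [
    {"target_type": "mRNA", "accession": "MRNA_1"},
    {"target_type": "mRNA", "accession": "MRNA_2"},
]
```

In this sketch an entry with conflicting mRNA references is left unmapped rather than resolved, in line with the statement that no conflict resolution is attempted at this stage.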
Generating an import template using the import wizard requires no software development knowledge and for many sources only takes minutes (e.g. for protein-protein interaction data which uses UniProt accessions to unambiguously identify the protein entities and the Molecular Interaction Ontology to describe the interaction type and evidence). However, integration can also take up to a week of software development if extensive parsing and transformation of a complex data source such as ENSEMBL is required.
Naming conflicts and lack of descriptive, structured metainformation are the main reasons for the lack of semantic integration in the life sciences, issues that are as much technological as sociological. The use of a structured knowledge management tool within BioBridge ensured all newly produced data makes use of unique identifiers and provides extensive, structured metainformation. This semantic integration and standardisation fostered data exchange as well as social interactions within the project, which are a pre-requisite for translational systems biology projects and their highly diverse multi-subject expert teams. In addition the semantic integration greatly simplifies the future sharing of the produced data as it is immediately available in semantic form.
For the import of data several formats are supported, from simple manually mapped delimiter formats such as tab-delimited to XML formats with potentially fully automatic semantic mapping such as Pedro, SBML or OWL. If machine readable metainformation is provided, such as MIRIAM references in SBML, it is used to automatically map the imported entities to existing instances of semantic objects. In the current version, the knowledge base integrates more than 20 different public databases (see Table 2) representing a total of 80 793 relevant genes (30 246 human, 27 237 mouse and 23 310 rat), 1 307 pathways, 78 528 compounds with related gene/disease information, 1 525 474 protein interactions and the entire Gene Expression Omnibus and PubChem databases, resulting in a total of 3 666 313 connections within the knowledge network. In addition two BioBridge specific datasets, 54 inflammation and tissue specific pathways and 122 COPD and exercise specific metabolite and enzyme concentrations and activities, were manually curated from the literature within the project. The pathway curation followed a standard text-mining supported process as described for example in  while the enzyme concentration and activity curation was fully manual due to the small set of available relevant publications. To our knowledge BioBridge thus provides the first semantically integrated knowledge base of public COPD-specific information. In addition the resource will be continuously extended as more COPD specific data becomes publicly available, e.g. the experimental data generated within BioBridge (160 pre- and post-training expression, metabolite and proteomics data sets) will become publicly available as soon as the consortium has generated an initial analysis of the data. Currently the COPD knowledge base contains almost 10 million experimental result data points, of which almost 6 million come from public data.
In other projects we are currently using BioXM with several hundred million data points on networks with tens of millions of edges and nodes, showing that the approach scales by at least two more orders of magnitude (unpublished data).
Browse, query and retrieve
Users visually browse and query the network simply by right-clicking on any focus of interest (e.g. a gene, a patient or a protein-protein interaction) so that associated entities can be added to the existing network visualisation. The corresponding context menu is dynamic, offering for selection all those entities which, based on the data model, are directly associated with the initial focus (i.e. one step in the network). A researcher could, for example, expand from a gene to include its relationships with diseases. From the disease association it may be of interest to identify patients represented in the gene expression database who share that particular diagnosis. Entities separated by more than one step in the data model can be associated with each other by complex queries which traverse several nodes within the graph and aggregate information to decide whether a connection is valid. These complex queries are transparent to the user, who executes them as part of the graphical navigation when asking for "associated objects" (see below for query construction). Graph based navigation becomes difficult in terms of visualisation layout and performance beyond several thousand objects.
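A multi-step association of this kind, from a gene via a disease to patients, can be sketched as a simple path traversal. The function `follow`, the edge tuples and the relation class names (`gene-disease`, `diagnosed-in`) are hypothetical and chosen only for this illustration; the sketch follows directed edges and omits the aggregation step that decides whether a connection is valid.

```python
def follow(edges, start, path):
    """Follow a sequence of relation classes, starting from one node.

    edges is a list of (source, relation_class, target) tuples and
    path is the ordered list of relation classes to traverse.
    """
    frontier = {start}
    for relation_class in path:
        frontier = {
            target
            for source, edge_class, target in edges
            if edge_class == relation_class and source in frontier
        }
    return frontier


# Hypothetical example network.
edges = [
    ("STAT3", "gene-disease", "pancreatic tumor"),
    ("pancreatic tumor", "diagnosed-in", "patient-1"),
    ("pancreatic tumor", "diagnosed-in", "patient-2"),
    ("COPD", "diagnosed-in", "patient-3"),
]

patients = follow(edges, "STAT3", ["gene-disease", "diagnosed-in"])
```

A single-element path corresponds to the one-step expansion offered by the context menu, while longer paths correspond to the complex queries hidden behind "associated objects".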
The intuitive graphic query system is therefore supplemented by a more complex wizard that allows dynamic networks to be created by in-depth, structured searches which combine semantic terms that are dynamically pre-defined by the data model (see Figure 3 for the query wizard and Figure 2A for a resulting network; additional files 3 and 4 provide details and example data on how to create a query). The query construction resembles natural language and thus allows complex searches to be generated without knowledge of special query languages such as SQL or SPARQL, but some knowledge about the data model must be acquired to work efficiently with the wizard. A search for all patients diagnosed with COPD severity grade above 2 but no cancer who have a low body mass index, for example, would read: "Object to find is a Patient which simultaneously is annotated by Patient diagnostic data which has GOLD attribute greater than 2 and is annotated by Patient Anthropometrics which has BMI-BT attribute less than 18 and never is diagnosed with a NCI Thesaurus entry which is inferred by ontology entry which has name like '*cancer*'". A query can be saved as a "smart folder" or query template for re-use and thus allows experienced users to share their complex queries with less frequent users. For saved queries "Query variables" can be defined so that generic smart folder queries can be adapted to specific questions. In the example above the actual parameters for COPD severity grade, BMI and diagnosed disease might be set as variables for other users to change. Normal folders (yellow) allow users to organise data manually in their private space by drag-and-drop, e.g. to create a permanent list of "favourite genes" or a specific pathway. In contrast the content of "smart folders" is dynamic, as it is actually a query result which immediately updates whenever changes in the content of the knowledge base occur.
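The interplay of query variables and smart folders can be illustrated with a small Python sketch. It is not how BioXM stores or executes queries: the names `make_patient_query` and `SmartFolder`, the dictionary layout of a patient and the patient records are all hypothetical, and the ontology inference of the original query is simplified to a substring match on the diagnosis name.

```python
def make_patient_query(gold_above=2, bmi_below=18, excluded_term="cancer"):
    """Build a patient filter; the three arguments act as query variables."""
    def matches(patient):
        return (
            patient["gold"] > gold_above
            and patient["bmi"] < bmi_below
            and not any(
                excluded_term in diagnosis.lower()
                for diagnosis in patient["diagnoses"]
            )
        )
    return matches


class SmartFolder:
    """A folder whose content is a query result, evaluated on each access."""

    def __init__(self, source, query):
        self.source = source
        self.query = query

    def contents(self):
        return [item for item in self.source if self.query(item)]


# Hypothetical patient records.
patients = [
    {"id": "P1", "gold": 3, "bmi": 17.0, "diagnoses": ["COPD"]},
    {"id": "P2", "gold": 3, "bmi": 17.0, "diagnoses": ["COPD", "Lung cancer"]},
    {"id": "P3", "gold": 2, "bmi": 16.0, "diagnoses": ["COPD"]},
]

folder = SmartFolder(patients, make_patient_query())
before = [p["id"] for p in folder.contents()]

# A new record in the knowledge base shows up without redefining the query.
patients.append({"id": "P4", "gold": 4, "bmi": 17.5, "diagnoses": ["COPD"]})
after = [p["id"] for p in folder.contents()]
```

Another user could re-use the same template with different variables, for instance `make_patient_query(gold_above=1, bmi_below=20)`, without needing to understand how the filter is constructed.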
Defining a query takes from seconds for simple questions of the "search all compounds used as medication" type to tens of minutes for complex questions which traverse the full connectivity of the semantic network. In the same way, the performance of query execution depends directly on the complexity of the query. Queries traversing many connections in the semantic network with entwined constraints may take several minutes to execute, while simple queries return within seconds even with millions of results.
Configuring reports requires no software development skills but does depend on an understanding of the data model configuration. As with the query and import template wizards, view items are drawn from the data model using functions such as "related object", "assigned annotation" or "query result", which can be further restricted to specific types such as "relation of type protein expression". Configuring a new report takes only minutes on average but, as reports can contain query results, can also take tens of minutes if a new complex query needs to be defined. Reports defined for an object such as a gene can be re-used as "nested reports" wherever a gene type object is included as a view item in another report, allowing complex reports to be assembled from simple units.
Report display performance directly depends on the configuration and takes between seconds and several minutes. Simple, fast reports depend on directly related information e.g. a gene report which brings together Sequence Variant, gene-disease and gene-compound information. Complex, slow reports integrate queries to traverse the semantic network and pull together distant information e.g. the medication for the patients for which a given gene was upregulated.
View items can also be used within "information layers" which visualise the information directly on top of a network graph by changing the size and colour of the displayed objects. To define the information layer, ranges of expected numerical or nominal values in the view item are assigned to colour and size ranges for the graphical object display. In a simple case this is used to display expression data on top of a gene network, but by using query results as view items it can also be used to display the number of publications associated with a gene - phenotype association. Within the graph, information layers are executed for every suitable object displayed and, depending on the complexity of the defined view item, the generation of the overlay can take between seconds and several minutes.
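The assignment of value ranges to display properties can be sketched as a lookup over sorted range boundaries. The function `make_layer`, the boundary values and the colour and size choices are assumptions for illustration only and do not describe the BioXM configuration format.

```python
import bisect


def make_layer(boundaries, styles):
    """Create a mapping from numerical values to (colour, size) styles.

    boundaries must be sorted in ascending order; styles needs one more
    entry than boundaries, one for each resulting value range.
    """
    if len(styles) != len(boundaries) + 1:
        raise ValueError("need exactly one style per value range")

    def style_for(value):
        # bisect_right returns the index of the range the value falls in.
        return styles[bisect.bisect_right(boundaries, value)]

    return style_for


# Hypothetical layer for log-ratio expression values.
expression_layer = make_layer(
    boundaries=[-1.0, 1.0],
    styles=[("blue", 10), ("grey", 10), ("red", 20)],
)

# Hypothetical expression values attached to displayed gene nodes.
node_values = {"GENE_A": 2.5, "GENE_B": 0.0, "GENE_C": -2.0}
node_styles = {gene: expression_layer(v) for gene, v in node_values.items()}
```

Because the layer is evaluated once per displayed object, the cost of producing the overlay grows with both the number of nodes and the cost of computing each value, which matches the timing behaviour described above.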
The BioBridge knowledge base implementation enables the integrative analysis of clinical data e.g. questionnaires, anthropometric and physiologic data with gene expression and metabolomics data and literature derived molecular knowledge. The knowledge base is currently used by data analysis and modelling groups within BioBridge to extend literature-derived, COPD-specific molecular networks with probabilistic networks derived from expression data (method described in ). Output of the probabilistic networks together with expression and metabolomics data is then used to tune mathematical models of the central metabolism () for COPD specific simulations.
While the promises of the Semantic Web continue to creep slowly into existence, individual projects need immediate, adaptable solutions which allow project specific knowledge conceptualisations to be set up with low start-up cost and the flexibility to extend, standardise and exchange their data and knowledge. Using a generic knowledge management framework we were able to configure and populate a productively used, project specific systems biology knowledge base within 6 months, whereas similar, software development based integration projects have been reported to take between 2 and 5 years [13, 23]. The COPD knowledge base, set up as the central knowledge management resource of the BioBridge project, provides a free, comprehensive, easy to use resource for all COPD related clinical research and will be continuously extended with the aim of generating the definitive resource on clinical research in COPD. More broadly, our configuration based approach to semantic integration is generally applicable to closing the knowledge management gap between public and project specific data that affects a large number of current systems biology and high-throughput data dependent clinical research projects. To bridge the gap between the current user interface, tuned to suit experienced, frequent users, and everyday clinical application, the BioBridge project developed a simplified web portal interface for a number of use-cases. Based on the feedback from clinical users we recently developed a Foswiki plug-in for BioXM which unifies the simple set-up of a Wiki with the knowledge management functions. From this we will develop a browser based portal as the primary access to the COPD knowledge base. Future directions include: support of additional structured languages for import and export such as BioPAX and CellML; development of a workflow framework for data analysis and integration of algorithmic methods for semantic mapping.
Availability and requirements
The BioBridge COPD knowledge base as a data resource is freely available to academic users (requires pre-installed Java 6.0.4 or newer). Upon request to DM the BioXM software application itself can be made available for academic PhD research projects within the Biomax BioXM PhD collaboration programme.
We thank all the members of the BioBridge team for valuable discussions on the data model and their evaluation of BioXM. This work was supported by the European Commission (FP6) BioBridge LSHG-CT-2006-037939. We thank the anonymous reviewers for their very valuable suggestions and criticism which helped to clarify and structure the manuscript.
- Cochrane GR, Galperin MY: Nucleic Acids Research annual Database Issue and the NAR online Molecular Biology Database Collection in 2010. Nucl Acids Res 2010, 38: D1-4. 10.1093/nar/gkp1077
- Goble C, Stevens R: State of the nation in data integration for bioinformatics. J Biomed Inform 2008, 41: 687-93. 10.1016/j.jbi.2008.01.008
- Etzold T, Ulyanov A, Argos P: SRS: information retrieval system for molecular biology data banks. Methods Enzymol 1996, 266: 114-28.
- Schuler GD, Epstein JA, Ohkawa H, Kans JA: Entrez: molecular biology database and retrieval system. Methods Enzymol 1996, 266: 141-62.
- Köhler J: Integration of life science databases. Drug Discovery Today: BIOSILICO 2004, 2: 61-69.
- Stein LD: Integrating biological databases. Nat Rev Genet 2003, 4: 337-345. 10.1038/nrg1065
- Shah SP, Huang Y, Xu T, Yuen MMS, Ling J, Ouellette BFF: Atlas - a data warehouse for integrative bioinformatics. BMC Bioinformatics 2005, 6: 34. 10.1186/1471-2105-6-34
- Birkland A, Yona G: BIOZON: a system for unification, management and analysis of heterogeneous biological data. BMC Bioinformatics 2006, 7: 70. 10.1186/1471-2105-7-70
- Antezana E, Blondé W, Egaña M, Rutherford A, Stevens R, De Baets B, Mironov V, Kuiper M: BioGateway: a semantic systems biology tool for the life sciences. BMC Bioinformatics 2009, 10(Suppl 10): S11. 10.1186/1471-2105-10-S10-S11
- Web Ontology Language OWL/W3C Semantic Web Activity[http://www.w3.org/2004/OWL/]
- RDF - Semantic Web Standards[http://www.w3.org/RDF/]
- Thorisson GA, Muilu J, Brookes AJ: Genotype-phenotype databases: challenges and solutions for the post-genomic era. Nat Rev Genet 2009, 10: 9-18. 10.1038/nrg2483
- Hu M, Mural R, Liebman M: Biomedical Informatics in Translational Research. Artech House Publishers; 2008.
- Hull D, Wolstencroft K, Stevens R, Goble C, Pocock MR, Li P, Oinn T: Taverna: a tool for building and running workflows of services. Nucleic Acids Res 2006, 34: W729-732. 10.1093/nar/gkl320
- Davidson SB, Wong L: The Kleisli Approach to Data Transformation and Integration. 2001, 135-165.
- Stevens R, Baker P, Bechhofer S, Ng G, Jacoby A, Paton NW, Goble CA, Brass A: TAMBIS: Transparent Access to Multiple Bioinformatics Information Sources. Bioinformatics 2000, 16: 184-186. 10.1093/bioinformatics/16.2.184
- Mork P, Halevy A, Tarczy-hornoch P: A model for data integration systems of biomedical data applied to online genetic databases. Proceedings of the Symposium of the American Medical Informatics Association 2001, 473-477.
- Belleau F, Nolin M, Tourigny N, Rigault P, Morissette J: Bio2RDF: towards a mashup to build bioinformatics knowledge systems. J Biomed Inform 2008, 41: 706-716. 10.1016/j.jbi.2008.03.004
- Berners-Lee T, Hendler J, Lassila O: The Semantic Web. Scientific American 2001, 34-43. 10.1038/scientificamerican0501-34
- Covitz PA, Hartel F, Schaefer C, De Coronado S, Fragoso G, Sahni H, Gustafson S, Buetow KH: caCORE: a common infrastructure for cancer informatics. Bioinformatics 2003, 19: 2404-12. 10.1093/bioinformatics/btg335
- Brookes AJ, Lehvaslaiho H, Muilu J, Shigemoto Y, Oroguchi T, Tomiki T, Mukaiyama A, Konagaya A, Kojima T, Inoue I, Kuroda M, Mizushima H, Thorisson GA, Dash D, Rajeevan H, Darlison MW, Woon M, Fredman D, Smith AV, Senger M, Naito K, Sugawara H: The phenotype and genotype experiment object model (PaGE-OM): a robust data structure for information related to DNA variation. Hum Mutat 2009, 30: 968-977. 10.1002/humu.20973
- Stein LD, Mungall C, Shu S, Caudy M, Mangone M, Day A, Nickerson E, Stajich JE, Harris TW, Arva A, Lewis S: The generic genome browser: a building block for a model organism system database. Genome Res 2002, 12: 1599-1610. 10.1101/gr.403602
- Post LJG, Roos M, Marshall MS, van Driel R, Breit TM: A semantic web approach applied to integrative bioinformatics experimentation: a biological use case with genomics data. Bioinformatics 2007, 23: 3080-3087. 10.1093/bioinformatics/btm461
- Shannon P, Markiel A, Ozier O, Baliga NS, Wang JT, Ramage D, Amin N, Schwikowski B, Ideker T: Cytoscape: a software environment for integrated models of biomolecular interaction networks. Genome Res 2003, 13: 2498-2504. 10.1101/gr.1239303
- Breitkreutz B, Stark C, Tyers M: Osprey: a network visualization system. Genome Biol 2003, 4: R22. 10.1186/gb-2003-4-3-r22
- Köhler J, Baumbach J, Taubert J, Specht M, Skusa A, Rüegg A, Rawlings C, Verrier P, Philippi S: Graph-based analysis and visualization of experimental results with ONDEX. Bioinformatics 2006, 22: 1383-1390.View ArticlePubMedGoogle Scholar
- Bornhövd C, Buchmann A: A Prototype for Metadata-Based Integration of Internet Sources. In Advanced Information Systems Engineering. Volume 1626. Springer Berlin/Heidelberg; 1999:439-445.View ArticleGoogle Scholar
- Noy NF, Crubezy M, Fergerson RW, Knublauch H, Tu SW, Vendetti J, Musen MA: Protege-2000: an open-source ontology-development and knowledge-acquisition environment. AMIA Annu Symp Proc 2003, 953.Google Scholar
- Kaps A, Dyshlevoi K, Heumann K, Jost R, Kontodinas I, Wolff M, Hani J: The BioRS(TM) Integration and Retrieval System: An open system for distributed data integration. JIB 2006, 3.Google Scholar
- Losko S, Heumann K: Semantic data integration and knowledge management to represent biological network associations. Methods Mol Biol 2009, 563: 241-258. full_text full_text full_textView ArticlePubMedGoogle Scholar
- Altschul SF, Madden TL, Schäffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ: Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res 1997, 25: 3389-402. 10.1093/nar/25.17.3389PubMed CentralView ArticlePubMedGoogle Scholar
- Hucka M, Finney A, Sauro HM, Bolouri H, Doyle JC, Kitano H, Arkin AP, Bornstein BJ, Bray D, Cornish-Bowden A, Cuellar AA, Dronov S, Gilles ED, Ginkel M, Gor V, Goryanin II, Hedley WJ, Hodgman TC, Hofmeyr J, Hunter PJ, Juty NS, Kasberger JL, Kremling A, Kummer U, Le Novère N, Loew LM, Lucio D, Mendes P, Minch E, Mjolsness ED, Nakayama Y, Nelson MR, Nielsen PF, Sakurada T, Schaff JC, Shapiro BE, Shimizu TS, Spence HD, Stelling J, Takahashi K, Tomita M, Wagner J, Wang J: The systems biology markup language (SBML): a medium for representation and exchange of biochemical network models. Bioinformatics 2003, 19: 524-31. 10.1093/bioinformatics/btg015View ArticlePubMedGoogle Scholar
- Kanehisa M, Goto S: KEGG: kyoto encyclopedia of genes and genomes. Nucleic Acids Res 2000, 28: 27-30. 10.1093/nar/28.1.27PubMed CentralView ArticlePubMedGoogle Scholar
- Maglott D, Ostell J, Pruitt KD, Tatusova T: Entrez Gene: gene-centered information at NCBI. Nucleic Acids Res 2005, 33: D54-8. 10.1093/nar/gki031PubMed CentralView ArticlePubMedGoogle Scholar
- Benson DA, Karsch-Mizrachi I, Lipman DJ, Ostell J, Wheeler DL: GenBank. Nucleic Acids Res 2008, 36: D25-30. 10.1093/nar/gkm929PubMed CentralView ArticlePubMedGoogle Scholar
- Pruitt KD, Tatusova T, Maglott DR: NCBI reference sequences (RefSeq): a curated non-redundant sequence database of genomes, transcripts and proteins. Nucleic Acids Res 2007, 35: D61-D65. 10.1093/nar/gkl842PubMed CentralView ArticlePubMedGoogle Scholar
- Wain HM, Lush M, Ducluzeau F, Povey S: Genew: the human gene nomenclature database. Nucleic Acids Res 2002, 30: 169-171. 10.1093/nar/30.1.169PubMed CentralView ArticlePubMedGoogle Scholar
- Hubbard T, Barker D, Birney E, Cameron G, Chen Y, Clark L, Cox T, Cuff J, Curwen V, Down T, Durbin R, Eyras E, Gilbert J, Hammond M, Huminiecki L, Kasprzyk A, Lehvaslaiho H, Lijnzaad P, Melsopp C, Mongin E, Pettett R, Pocock M, Potter S, Rust A, Schmidt E, Searle S, Slater G, Smith J, Spooner W, Stabenau A, Stalker J, Stupka E, Ureta-Vidal A, Vastrik I, Clamp M: The Ensembl genome database project. Nucleic Acids Res 2002, 30: 38-41. 10.1093/nar/30.1.38PubMed CentralView ArticlePubMedGoogle Scholar
- Apweiler R, Bairoch A, Wu CH, Barker WC, Boeckmann B, Ferro S, Gasteiger E, Huang H, Lopez R, Magrane M, Martin MJ, Natale DA, O'Donovan C, Redaschi N, Yeh LL: UniProt: the Universal Protein knowledgebase. Nucleic Acids Res 2004, 32: D115-9. 10.1093/nar/gkh131PubMed CentralView ArticlePubMedGoogle Scholar
- Kneale GG, Kennard O: The EMBL nucleotide sequence data library. Biochem Soc Trans 1984, 12: 1011-1014.View ArticlePubMedGoogle Scholar
- Orchard S, Salwinski L, Kerrien S, Montecchi-Palazzi L, Oesterheld M, Stümpflen V, Ceol A, Chatr-aryamontri A, Armstrong J, Woollard P, Salama JJ, Moore S, Wojcik J, Bader GD, Vidal M, Cusick ME, Gerstein M, Gavin A, Superti-Furga G, Greenblatt J, Bader J, Uetz P, Tyers M, Legrain P, Fields S, Mulder N, Gilson M, Niepmann M, Burgoon L, De Las Rivas J, Prieto C, Perreau VM, Hogue C, Mewes H, Apweiler R, Xenarios I, Eisenberg D, Cesareni G, Hermjakob H: The minimum information required for reporting a molecular interaction experiment (MIMIx). Nat Biotechnol 2007, 25: 894-898. 10.1038/nbt1324View ArticlePubMedGoogle Scholar
- Jameson D, Garwood K, Garwood C, Booth T, Alper P, Oliver SG, Paton NW: Data capture in bioinformatics: requirements and experiences with Pedro. BMC Bioinformatics 2008, 9: 183. 10.1186/1471-2105-9-183PubMed CentralView ArticlePubMedGoogle Scholar
- Losko S, Wenger K, Kalus W, Ramge A, Wiehler J, Heumann K: Knowledge Networks of Biological and Medical Data: An Exhaustive and Flexible Solution to Model Life Science Domains. In Data Integration in the Life Sciences. Volume 4075/2006. Springer Berlin/Heidelberg; 2006:232-239. full_textView ArticleGoogle Scholar
- Sameith K, Antczak P, Marston E, Turan N, Maier D, Stankovic T, Falciani F: Functional modules integrating essential cellular functions are predictive of the response of leukaemia cells to DNA damage. Bioinformatics 2008, 24: 2602-7. 10.1093/bioinformatics/btn489View ArticlePubMedGoogle Scholar
- Selivanov VA, de Atauri P, Centelles JJ, Cadefau J, Parra J, Cussó R, Carreras J, Cascante M: The changes in the energy metabolism of human muscle induced by training. J Theor Biol 2008, 252: 402-410. 10.1016/j.jtbi.2007.09.039View ArticlePubMedGoogle Scholar
- Good BM, Wilkinson MD: The Life Sciences Semantic Web is full of creeps! Brief Bioinform 2006, 7: 275-86. 10.1093/bib/bbl025View ArticlePubMedGoogle Scholar
- Demir E, Cary MP, Paley S, Fukuda K, Lemer C, Vastrik I, Wu G, D'Eustachio P, Schaefer C, Luciano J, Schacherer F, Martinez-Flores I, Hu Z, Jimenez-Jacinto V, Joshi-Tope G, Kandasamy K, Lopez-Fuentes AC, Mi H, Pichler E, Rodchenkov I, Splendiani A, Tkachev S, Zucker J, Gopinath G, Rajasimha H, Ramakrishnan R, Shah I, Syed M, Anwar N, Babur O, Blinov M, Brauner E, Corwin D, Donaldson S, Gibbons F, Goldberg R, Hornbeck P, Luna A, Murray-Rust P, Neumann E, Reubenacker O, Samwald M, van Iersel M, Wimalaratne S, Allen K, Braun B, Whirl-Carrillo M, Cheung K, Dahlquist K, Finney A, Gillespie M, Glass E, Gong L, Haw R, Honig M, Hubaut O, Kane D, Krupa S, Kutmon M, Leonard J, Marks D, Merberg D, Petri V, Pico A, Ravenscroft D, Ren L, Shah N, Sunshine M, Tang R, Whaley R, Letovksy S, Buetow KH, Rzhetsky A, Schachter V, Sobral BS, Dogrusoz U, McWeeney S, Aladjem M, Birney E, Collado-Vides J, Goto S, Hucka M, Le Novère N, Maltsev N, Pandey A, Thomas P, Wingender E, Karp PD, Sander C, Bader GD: The BioPAX community standard for pathway data sharing. Nat Biotechnol 2010, 28: 935-942. 10.1038/nbt.1666PubMed CentralView ArticlePubMedGoogle Scholar
- Lloyd CM, Halstead MDB, Nielsen PF: CellML: its future, present and past. Prog Biophys Mol Biol 2004, 85: 433-450. 10.1016/j.pbiomolbio.2004.01.004View ArticlePubMedGoogle Scholar