Functional network of glycan-related molecules: Glyco-Net in Glycoconjugate Data Bank
BMC Systems Biology volume 4, Article number: 91 (2010)
Glycans are involved in a wide range of biological processes, and they play an essential role in functions such as cell differentiation, cell adhesion, pathogen-host recognition, toxin-receptor interactions, signal transduction, cancer metastasis, and immune responses. Elucidating pathways related to post-translational modifications (PTMs) such as glycosylation is of growing importance in post-genome science and technology. Graphical networks describing the relationships among glycan-related molecules, including genes, proteins, lipids and various biological events, are considered extremely valuable and convenient tools for the systematic investigation of PTMs. However, no existing database dynamically draws functional networks related to glycans.
We have created a database called Glyco-Net (http://www.glycoconjugate.jp/functions/) that holds many binary relationships among glycan-related molecules. From the search results, the database dynamically draws figures of the functional relationships among these components as nodes and arrows. A molecule or event corresponds to a node in the network figures, and the relationships between molecules and events are indicated by arrows. Since all components are treated equally, an arrow is also a node.
In this paper, we describe our new database, Glyco-Net, which is the first database to dynamically show networks of the functional profiles of glycan-related molecules. The graphical networks will assist in understanding the roles of PTMs. In addition, since various kinds of bio-objects such as genes, proteins, and inhibitors are treated equally in Glyco-Net, a large amount of information on PTMs can be obtained.
Background
Glycans are involved in a wide range of biological processes, and they play an essential role in functions such as cell-cell interaction, pathogen-host recognition, toxin-receptor interaction, and signal transduction [1–5]. One of their roles is to modulate the functions of many proteins and lipids through post-translational modifications (PTMs). Glycomics is the study of the structural and functional aspects of various glycoconjugates, such as glycoproteins, glycolipids, and proteoglycans produced during PTMs in cells and organisms. The field of glycomics has lagged behind genomics and proteomics, mainly because of the inherent difficulties in analyzing glycan structure and function. However, glycomics is now an emerging field owing to exceptional progress in modern experimental techniques and equipment, including mass spectrometry (MS), high-performance liquid chromatography (HPLC), nuclear magnetic resonance (NMR) and knockout mice [8–15]. A large quantity of information concerning glycan structure and function is expected to accumulate. Bioinformatics of glycans, which suffered from a lack of data in early studies, is now becoming a practical field in the biological sciences related to PTMs. Therefore, the biological, pharmaceutical and medical fields strongly require a new class of glycan database that indicates the relationship between structures and their functions, together with related tools.
Several groups are actively developing both public and commercial glycan databases. Public databases include KEGG [16–18], SWEET-DB in GLYCOSCIENCES.de, the database of the United States Consortium for Functional Glycomics (CFG), and GlycoSuiteDB in the Expert Protein Analysis System (ExPASy) Proteomics Server. GlycoMinds (http://www.glycominds.com) is a commercial database. The Complex Carbohydrate Structure Database (CCSD) [23, 24] was the first database of glycan structures. The CCSD was developed in the 1980s and 1990s by the CarbBank Project and was discontinued in 1999 owing to a lack of funding. The CCSD data are now included in the public glycan databases mentioned above. Although the web service of GLYCOSCIENCES.de is currently unavailable, its developers are working to establish a new base for the database. The Carbohydrate-Active Enzyme (CAZy) database is a database of glycan-related enzymes and proteins, such as glycosyltransferases and lectins. All of these databases, with the exception of CAZy, focus on glycan structures. SWEET-DB mainly develops tools for treating glycan structures and geometry [26–29]. The CFG is constructing carbohydrate chips to investigate the interaction between carbohydrates and proteins for therapy, as well as databases for functional glycomics, such as an annotated database of mass spectrometry. The KEGG GLYCAN database holds over 10,000 glycan structures; in addition, KEGG PATHWAY includes manually drawn graphical pathways for various bio-molecules. ExPASy (http://www.expasy.org/), which includes a protein sequence database, also holds many graphical figures of biochemical pathways. Krambeck and Betenbaugh and Liu et al. have developed systems that dynamically construct structural networks of N- and O-linked glycans, respectively.
Recently, emerging analytical techniques have enabled us to obtain a great deal of information about the relationships not only between glycan structures and functions, but also among glycans, disease phenotypes and the expression of glycan-related genes. In this situation, graphical networks describing the relationships among glycan-related molecules, including genes, proteins, lipids and biological events, are expected to become useful tools for accelerating the integrated study of PTMs. Although KEGG PATHWAY and Biochemical Pathways in ExPASy (http://us.expasy.org/cgi-bin/show_thumbnails.pl) have graphical network figures, these are all manually selected and organized. Since glycomics and glycoproteomics data are expected to increase substantially, the network figures generated from the glycan structures should be drawn from the latest available data in order to give the most current overview of glycan functions.
We have endeavoured for several years to dynamically draw figures of functional networks among glycans, genes, inhibitors, lipids, glycosphingolipids, various biological events, diseases and carbohydrate-binding proteins such as glycosyltransferases and lectins (hereafter denoted as "bio-objects"). Dynamic generation of network figures among bio-objects is a step beyond biosynthesis networks presented as static pictures, such as those in KEGG PATHWAY and ExPASy. Glyco-Net was constructed as a part of the Glycoconjugate Data Bank (GDB) (http://www.glycoconjugate.jp/). Because Glyco-Net focuses on collecting the functional relationships among bio-objects from research articles, each bio-object in Glyco-Net is linked to other databases that provide more detailed information. In this paper, we describe the concept and status of Glyco-Net.
Construction and content
Glyco-Net dynamically draws functional networks among a variety of bio-objects relating to known glycans. Binary relations among glycan-related information are accumulated, as shown on the left side of Figure 1. Two bio-objects are connected by a "verb" which expresses the function of a bio-object, for example "Sugar A links to Sugar B" in Figure 1a. The function "A link to" denotes the alpha linkage of the glycosidic bond formed after sugar transformation. A bio-object is described as a node, and a "verb" is expressed as an arrow in the functional network. Glyco-Net uses over 100 verbs, which are listed in Additional file 1. To make a network figure, identical bio-objects are superimposed at a single node, as shown on the right-hand side of Figure 1b. The binary relationships for bio-objects have been curated manually. We are now constructing an ontology for the wide field of glycan-related research in order to gather these binary relationships automatically from web resources such as PubMed (http://www.pubmed.org); this will be discussed in future articles. This ontology will be a powerful tool in the near future, although the collected data will still have to be verified by eye.
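The superimposition of identical bio-objects described above can be illustrated with a minimal Python sketch. It is not the Glyco-Net implementation (which the authors wrote in Java); the relation records, the `build_network` helper and the (subject, verb, object) layout are assumptions made only for illustration.

```python
from collections import defaultdict

# Hypothetical binary relations in (subject, verb, object) form.
# The names are illustrative and are not records taken from Glyco-Net.
relations = [
    ("Sugar A", "links to", "Sugar B"),
    ("Enzyme X", "catalyzes", "Sugar A"),
    ("Sugar B", "binds to", "Lectin Y"),
]

def build_network(triples):
    """Merge binary relations into one adjacency map keyed by bio-object name.

    Two relations that mention the same bio-object share one key, which
    corresponds to superimposing identical bio-objects at a single node.
    """
    network = defaultdict(list)
    for subject, verb, obj in triples:
        network[subject].append((verb, obj))
        network.setdefault(obj, [])  # keep objects that have no outgoing arrow
    return dict(network)

network = build_network(relations)
for node, arrows in network.items():
    for verb, target in arrows:
        print(f"{node} --{verb}--> {target}")
```

Keying the map on the bio-object name is what joins the separate relations into one network: "Sugar A" appears in two relations but only once as a node.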
The functional networks are dynamically created from the accumulated binary relationships. Since all bio-objects (nodes) and functions (arrows) are treated equally in Glyco-Net, an arrow behaves as a node in some cases. This feature enables us to draw flexible networks. Glyco-Net also provides links, displayed in the object tables, to detailed information on the bio-objects in other biological databases, although these links are not yet available from the network figures. Making the network figures clickable would therefore support further examination. The links are listed in Table 1: gene entries link to a search for the gene in GenBank, sugar entries link to KEGG GLYCAN, enzyme entries link to the ENZYME database in ExPASy, and disease entries link to a disease database such as Online Mendelian Inheritance in Man (OMIM) and to literature databases such as PubMed.
Glyco-Net is expected to be used as an interface between the various biological databases and the functional network of glycan-related bio-objects. The current notation of carbohydrate structures is ad hoc. Because various structural databases exist, such as KEGG GLYCAN, GLYCOSCIENCES.de, and CFG, Glyco-Net only needs to provide links from its glycan data into these databases. At the moment, Glyco-Net holds a limited number of links to carbohydrate structure databases. We will replace the nomenclature of the carbohydrate structures with a more standard one, such as GLYDE, so that other databases can be accessed with the carbohydrate structure as a key.
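One way to picture an arrow that behaves as a node is to give every function its own identifier and allow another function to point at that identifier. The Python sketch below works under that assumption; the `Function` class, its field names and the sample records are hypothetical and are not taken from the Glyco-Net schema.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Function:
    """A binary relation whose source or target may be an object name or a function ID."""
    function_id: str
    source: str
    verb: str
    target: str

# Hypothetical records: F2 targets the function F1 rather than a bio-object.
functions = [
    Function("F1", "Enzyme X", "catalyzes", "Sugar A"),
    Function("F2", "Inhibitor Z", "inhibits", "F1"),
]

def node_ids(funcs):
    """Collect every node: bio-objects and functions are treated alike."""
    nodes = set()
    for f in funcs:
        nodes.update({f.function_id, f.source, f.target})
    return nodes

def edges(funcs):
    """Place each function node between its source and its target."""
    result = []
    for f in funcs:
        result.append((f.source, f.function_id))
        result.append((f.function_id, f.target))
    return result

print(sorted(node_ids(functions)))
print(edges(functions))
```

Because "F1" is an ordinary member of the node set, the relation "Inhibitor Z inhibits F1" needs no special handling, which is the flexibility the text attributes to treating arrows and nodes equally.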
Our database was implemented with a MySQL database system in a Linux environment. The web interface was written in JavaServer Pages (JSP). The search engine and the drawing method were written in the Java programming language.
Glyco-Net aims to collect binary relations that can be extracted from scientific articles such as research papers, i.e. evidence of functions obtained by specific assays. The current data were manually curated from the "Handbook of Glycosyltransferases and Related Genes." Functions observed under different experimental conditions in the assay are all recorded as different functions and appear in the network figure at the same time. The functions need to be classified with an ontology according to the experimental conditions and/or the environment of the bio-objects so that quantitative discussion becomes possible.
Data structure and statistics
Glyco-Net consists of four categories of data, which are shown in Table 2. The first category is the "function", which describes the relationships between biological objects. The second is the "object", which holds detailed information on biological objects such as genes, proteins, lipids, glycans, and diseases. The third is the "assay", which provides information on the assays from which the functions are suggested. The fourth is the "article", which holds the references. These categories of data are divided into several tables; for example, the "object" is divided into seven tables called "sugar", "protein complex", "protein", "gene", "lipid", "disease" and "event". Relationships between articles and other tables are described in the "reference" table. Detailed data structures are provided in Additional file 1.
Currently, Glyco-Net has 3,724 objects (1,149 objects for glycosyltransferases, 2,480 objects for genes, and 95 pieces of data concerning diseases caused by or related to carbohydrate abnormalities), 2,302 pieces of function data, and 1,201 pieces of data concerning the assays that verify the functions of the glycoconjugates. The "article" category contains 1,332 records. Data from articles published after Reference 34 will be added in the future. Furthermore, we have been developing an ontology for Glyco-Net.
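The four data categories and the "reference" table can be pictured with a small relational sketch. The example below uses Python's built-in sqlite3 module in place of the MySQL system named in the text, collapses the seven object tables into a single table with a type column, and invents all table names, column names and rows; the actual structures are those given in Additional file 1.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL; the schema is hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE bio_object     (object_id TEXT PRIMARY KEY, object_type TEXT, name TEXT);
CREATE TABLE bio_function   (function_id TEXT PRIMARY KEY, source_id TEXT, verb TEXT, target_id TEXT);
CREATE TABLE assay          (assay_id TEXT PRIMARY KEY, function_id TEXT, description TEXT);
CREATE TABLE article        (article_id TEXT PRIMARY KEY, citation TEXT);
CREATE TABLE reference_link (article_id TEXT, record_id TEXT);
""")

# Illustrative rows only; none of these are Glyco-Net records.
conn.executemany(
    "INSERT INTO bio_object VALUES (?, ?, ?)",
    [("O1", "protein", "Enzyme X"), ("O2", "sugar", "Sugar A")],
)
conn.execute("INSERT INTO bio_function VALUES (?, ?, ?, ?)", ("F1", "O1", "catalyzes", "O2"))
conn.execute("INSERT INTO assay VALUES (?, ?, ?)", ("A1", "F1", "Hypothetical enzyme assay"))
conn.execute("INSERT INTO article VALUES (?, ?)", ("R1", "Hypothetical article"))
conn.execute("INSERT INTO reference_link VALUES (?, ?)", ("R1", "F1"))

# Resolve one function to readable names and the article that supports it.
rows = conn.execute("""
    SELECT f.function_id, s.name, f.verb, t.name, a.citation
    FROM bio_function AS f
    JOIN bio_object AS s ON s.object_id = f.source_id
    JOIN bio_object AS t ON t.object_id = f.target_id
    JOIN reference_link AS r ON r.record_id = f.function_id
    JOIN article AS a ON a.article_id = r.article_id
""").fetchall()
print(rows)
```

Keeping the article links in a separate table lets one article support several functions, objects or assays without repeating the citation.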
Utility and Discussion
Access to Glyco-Net
The main page of the Glycoconjugate Data Bank (http://www.glycoconjugate.jp) provides links to three databases: 1) "Resources", which is a database of carbohydrate-related compounds, 2) "Structure", which is a 3D structure database of glycans extracted from the Protein Data Bank, and 3) "Glyco-Net", which shows the functional network of carbohydrate-related molecules. Users can browse the function lists and network figures by clicking a bio-object type or by typing a keyword to see the details of the functions.
Simple examples of searching Glyco-Net
Figure 2 shows a simple example of search results from Glyco-Net. Figure 2a shows the result of a keyword search for "adhere", in which several functions were found. The function list includes the function ID, the function itself, a detailed description of the function, and the comments in the function tables. Figure 2b shows the details of the object "cancer cell". The table shows the object ID, the object type, the object name, the synonyms, and the comments on the object. The functions involving the object are listed in part of the table. The "HOPS" number is the number of nodes (functions or objects) separating an entry in the table from the selected object, "cancer cell". In this sample search, HOPS was set to 2. Currently, HOPS is limited to five in Glyco-Net because of a technical limitation. This limit can produce ambiguous results, because the network figures are only partly drawn. To obtain complete network figures, we are developing a novel drawing method and will update the drawing routine in the near future. Clicking the "Show Diagram" button displays a figure of the functional network, as in Figure 2c. The layout of the figure may vary each time it is redrawn. In addition, the resolution of the figure can be changed by selecting the size of the figure; the default size is 1024 × 768 pixels. Figure 2c shows a simple example in which the function "Cancer cells adhere to metastatic sites." (Function ID F0000914) is enhanced by poly-N-acetyllactosamine (Function ID F0000913). Furthermore, since all bio-objects are treated equally, each arrow carries a node that is handled in the same way as other bio-objects such as "cancer cell". This is a characteristic feature of Glyco-Net. The networks grow substantially as the HOPS number increases, and the network figures become quite complicated for a large HOPS.
Figure 3 shows a slightly more complex network than Figure 2c. The "cause" function indicates that an increase in hyaluronan synthase-1 causes cancer metastasis. Any function that relates to the object itself is expressed as a round arrow returning to the object. From the rest of the network figure we can discern that hyaluronan synthase-1 catalyzes two glycosylation reactions: the formation of glycoside linkages between glucuronic acid (GlcA) and GlcNAcβ1-4(GlcAβ1-3GlcNAcβ1-4)n, and between N-acetylglucosamine (GlcNAc) and (GlcAβ1-3GlcNAcβ1-4)n. This is consistent with the actual function of hyaluronan synthase-1, which synthesizes hyaluronic acids composed of repeats of the GlcA-GlcNAc disaccharide unit.
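A HOPS-limited search can be sketched as a breadth-first traversal that stops at a given depth. The Python example below assumes that each function and each object counts as one step and that links can be followed in either direction; the text does not state how Glyco-Net counts HOPS, so this counting, the link layout and the helper names are illustrative assumptions. Only the two function IDs and the limit of five come from the text.

```python
from collections import defaultdict, deque

MAX_HOPS = 5  # upper limit on HOPS reported in the text

# Links for the example of Figure 2c, with each function stored as a node.
# How the nodes are wired together is an assumption made for this sketch.
links = [
    ("cancer cell", "F0000914"),
    ("F0000914", "metastatic site"),
    ("poly-N-acetyllactosamine", "F0000913"),
    ("F0000913", "F0000914"),
]

def build_adjacency(pairs):
    """Build an undirected adjacency map from node pairs."""
    adjacency = defaultdict(set)
    for a, b in pairs:
        adjacency[a].add(b)
        adjacency[b].add(a)
    return adjacency

def neighbourhood(adjacency, start, hops):
    """Return every node within `hops` steps of `start`, with its distance."""
    if not 0 <= hops <= MAX_HOPS:
        raise ValueError(f"hops must be between 0 and {MAX_HOPS}")
    distance = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if distance[node] == hops:
            continue  # do not expand beyond the requested depth
        for neighbour in adjacency[node]:
            if neighbour not in distance:
                distance[neighbour] = distance[node] + 1
                queue.append(neighbour)
    return distance

adjacency = build_adjacency(links)
print(sorted(neighbourhood(adjacency, "cancer cell", 2).items()))
```

Under this counting, a search from "cancer cell" with two hops reaches both function nodes and "metastatic site" but stops before poly-N-acetyllactosamine, which lies three steps away. This shows how a depth limit can leave a network only partly drawn.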
Conclusions
In this paper, we describe our new database, Glyco-Net, which shows graphical networks of glycan-related bio-objects such as genes, proteins, glycoproteins, lipids, glycolipids, and glycans. Each bio-object can easily be linked to available databases such as GenBank, ExPASy, KEGG GLYCAN, GLYCOSCIENCES.de, CFG, and PubMed, although at present the links are available only from the bio-object tables. Dynamic generation of the functional network figures among bio-objects is expected to have great advantages over KEGG PATHWAY and ExPASy, which hold static figures for biosynthesis. Since various kinds of bio-objects such as genes, proteins and inhibitors are treated equally in Glyco-Net, a large amount of information on PTMs can be obtained. Although these features are novel among existing glycan databases, the figures made by Glyco-Net are still too complicated at a larger HOPS at this stage. In addition, the total quantity of data in Glyco-Net remains small. Therefore, we are now constructing an ontology for partly automatic curation from web articles. Automatic curation with an ontology will become a powerful tool, even though the collected data should be verified carefully by scientists. We will also develop a routine that draws the functional network figures clearly. Furthermore, the nomenclature of the glycan structures should be standardized so that glycans can be searched in other structure-based carbohydrate databases without uncertainty. Use of the GLYDE notation appears quite feasible, since our database indicates only the relationships among biological objects relating to glycans. As a result, the details of the objects have to be found in other databases, and we will have to increase the links from our objects to other databases.
Thus, establishing collaborations with researchers in bioinformatics and other biosciences to improve this new type of database will be a significant asset for the further development of Glyco-Net.
Availability and requirements
Glyco-Net is available at http://www.glycoconjugate.jp/functions/.
References
1. Wells L, Vosseller K, Hart GW: Glycosylation of Nucleocytoplasmic Proteins: Signal Transduction and O-GlcNAc. Science. 2001, 291: 2376-2378. 10.1126/science.1058714
2. Rudd PM, Elliott T, Cresswell P, Wilson IA, Dwek RA: Glycosylation and the Immune System. Science. 2001, 291: 2370-2376. 10.1126/science.291.5512.2370
3. Helenius A, Aebi M: Intracellular Functions of N-Linked Glycans. Science. 2001, 291: 2364-2368. 10.1126/science.291.5512.2364
4. Varki A: Glycan-based interactions involving vertebrate sialic-acid-recognizing proteins. Nature. 2007, 446: 1023-1029. 10.1038/nature05816
5. Bishop JR, Schuksz M, Esko JD: Heparan sulphate proteoglycans fine-tune mammalian physiology. Nature. 2007, 446: 1030-1037.
6. Hart GW, Housley MP, Slawson C: Cycling of O-linked β-N-acetylglucosamine on nucleocytoplasmic proteins. Nature. 2007, 446: 1017-1022. 10.1038/nature05815
7. Turnbull J, Field RA: Emerging glycomics technologies. Nature Chemical Biology. 2007, 3: 74-77. 10.1038/nchembio0207-74
8. Nishimura SI, Niikura K, Kurogochi M, Matsushita T, Fumoto M, Hinou H, Kamitani R, Nakagawa H, Deguchi K, Miura N, Monde K, Kondo H: High-throughput protein glycomics: combined use of chemoselective glycoblotting and MALDI-TOF/TOF mass spectrometry. Angew Chem Int. 2005, 44: 91-96. 10.1002/anie.200461685
9. Kita Y, Miura Y, Furukawa JI, Nakano M, Shinohara Y, Ohno M, Takimoto A, Nishimura SI: Quantitative glycomics of human whole serum glycoproteins based on the standardized protocol for liberating N-glycans. Mol Cell Proteom. 2007, 6: 1437-1445. 10.1074/mcp.T600063-MCP200
10. Miura Y, Hato M, Shinohara Y, Kuramoto H, Furukawa JI, Kurogochi M, Shimaoka H, Tada M, Nakanishi K, Ozaki M, Todo S, Nishimura SI: BlotGlycoABC™: An integrated glycoblotting technique for rapid and large-scale clinical glycomics. Mol Cell Proteom. 2008, 7: 370-377. 10.1074/mcp.M700377-MCP200
11. Anumula KR: Advances in fluorescence derivatization methods for high-performance liquid chromatographic analysis of glycoprotein carbohydrates. Anal Biochem. 2006, 350: 1-23. 10.1016/j.ab.2005.09.037
12. Campbell MP, Royle L, Radcliffe CM, Dwek RA, Rudd PM: GlycoBase and autoGU: tools for HPLC-based glycan analysis. Bioinformatics. 2008, 24: 1214-1216. 10.1093/bioinformatics/btn090
13. Artemenko NV, Campbell MP, Rudd PM: GlycoExtractor: A web based interface for high throughput processing of HPLC-glycan data. Journal of Proteome Research. 2010, 9: 2037-2041. 10.1021/pr901213u
14. Kato K, Sasakawa H, Kamiya Y, Utsumi M, Nakano M, Takanashi N, Yamaguchi Y: 920 MHz Ultra-high field NMR approaches to structural glycobiology. Biochim Biophys Acta. 2008, 1780: 619-625.
15. Kurogochi M, Amano M, Fumoto M, Takimoto A, Kondo H, Nishimura SI: Reverse glycoblotting allows rapid enrichment glycoproteomics of biopharmaceuticals and disease-related biomarkers. Angew Chem Int. 2007, 46: 8808-8813. 10.1002/anie.200702919
16. Kanehisa M, Goto S, Kawashima S, Okuno Y, Hattori M: KEGG resources for deciphering the genome. Nucleic Acids Res. 2004, 32: D277-D280. 10.1093/nar/gkh063
17. Aoki KF, Ueda N, Yamaguchi A, Kanehisa M, Akutsu T, Mamitsuka H: Application of a new probabilistic model for recognizing complex patterns in glycans. Bioinformatics. 2004, 20: i6-i14. 10.1093/bioinformatics/bth916
18. Hashimoto K, Goto S, Kawano S, Aoki-Kinoshita KF, Ueda N, Hamajima M, Kawasaki T, Kanehisa M: KEGG as a glycome informatics resource. Glycobiology. 2006, 16: 63R-70R. 10.1093/glycob/cwj010
19. Loss A, Bunsmann P, Bohne A, Loss A, Schwarzer E, Lang E, von der Lieth CW: SWEET-DB: an attempt to create annotated data collections for carbohydrates. Nucleic Acids Res. 2002, 30: 405-408. 10.1093/nar/30.1.405
20. Lütteke T, Bohne-Lang A, Loss A, Goetz T, Frank M, von der Lieth CW: GLYCOSCIENCES.de: an internet portal to support glycomics and glycobiology research. Glycobiology. 2006, 16: 71R-81R. 10.1093/glycob/cwj049
21. Raman R, Venkataraman M, Ramakrishnan S, Lang W, Raguram S, Sasisekharan R: Implementation strategies at the consortium for functional glycomics. Glycobiology. 2006, 16: 82R-90R. 10.1093/glycob/cwj080
22. Cooper CA, Joshi HJ, Harrison MJ, Wilkins MR, Packer NH: GlycoSuiteDB: a curated relational database of glycoprotein glycan structures and their biological sources. Nucleic Acids Res. 2003, 31: 511-513. 10.1093/nar/gkg099
23. Doubet S, Bock K, Smith D, Darvill A, Albersheim P: The complex carbohydrate structure database. Trends in Biochemical Sciences. 1989, 14: 475-477. 10.1016/0968-0004(89)90175-8
24. Doubet S, Albersheim P: CarbBank. Glycobiology. 1992, 2: 505-505. 10.1093/glycob/2.6.505
25. Cantarel BL, Coutinho PM, Rancurel C, Bernard T, Lombard V, Henrissat B: The Carbohydrate-Active EnZymes database (CAZy): an expert resource for Glycogenomics. Nucleic Acids Res. 2009, 37: D233-D238. 10.1093/nar/gkn663
26. Bohne A, Lang E, von der Lieth CW: SWEET - www-based rapid 3D construction of oligo- and polysaccharides. Bioinformatics. 1999, 15: 767-768. 10.1093/bioinformatics/15.9.767
27. Bohne-Lang A, Lang E, Förster T, von der Lieth CW: LINUCS: Linear notation for unique description of carbohydrate sequences. Carbohydr Res. 2001, 336: 1-11. 10.1016/S0008-6215(01)00230-0
28. Lütteke T, Frank M, von der Lieth CW: Data mining the protein data bank: automatic detection and assignment of carbohydrate structures. Carbohydr Res. 2004, 339: 1015-1020. 10.1016/j.carres.2003.09.038
29. Lütteke T, von der Lieth CW: pdb-care (PDB carbohydrate residue check): a program to support annotation of complex carbohydrate structures in PDB files. BMC Bioinformatics. 2004, 5: 69-74. 10.1186/1471-2105-5-69
30. Krambeck FJ, Betenbaugh MJ: A mathematical model of N-linked glycosylation. Biotechnology and Bioengineering. 2005, 92: 711-728. 10.1002/bit.20645
31. Liu G, Marathe DD, Matta KL, Neelamegham S: Systems-level modeling of cellular glycosylation reaction networks: O-linked glycan formation on natural selectin ligands. Bioinformatics. 2008, 24: 2740-2747. 10.1093/bioinformatics/btn515
32. Benson DA, Karsch-Mizrachi I, Lipman DJ, Ostell J, Wheeler DL: GenBank. Nucleic Acids Res. 2007, 35: D21-D25. 10.1093/nar/gkl986
33. Toukach P, Joshi HJ, Ranzinger R, Knirel Y, von der Lieth CW: Sharing of world wide distributed carbohydrate-related digital resources: online connection of the Bacterial Carbohydrate Structure DataBase and GLYCOSCIENCES.de. Nucleic Acids Res. 2007, 35: D280-D286. 10.1093/nar/gkl883
34. Taniguchi N, Honke K, Fukuda M: Handbook of Glycosyltransferases and Related Genes. 2003, Springer Verlag, Tokyo
35. Nakahara T, Hashimoto R, Nakagawa H, Monde K, Miura N, Nishimura SI: Glycoconjugate Data Bank: Structures - an annotated glycan structure database and N-glycan primary structure verification service. Nucleic Acids Res. 2008, 36: D368-D371. 10.1093/nar/gkm833
Acknowledgements
This work was supported in part by the Program of Founding Research Centers for Emerging and Reemerging Infectious Diseases and the National Project on Functional Glycoconjugates Research for New Industry, MEXT Japan, and by a grant for "Development of System and Technology for Advanced Measurement and Analysis (SENTAN)" from the Japan Science and Technology Agency (JST). This study was also supported in part by Grants-in-Aid for the Regional R&D Proposal-Based Program from the Northern Advancement Center for Science & Technology of Hokkaido, Japan. The authors thank Ms. Chikage Chikaoka and Dr. Yasuko Tanaka for their dedicated help in the curation of the data, and Mr. Daisuke Murayama for technical support in the implementation and the correction of the database. NM especially thanks Ms. Kana Tosho for her dedicated help in the preparation of the manuscript.
Authors' contributions
RH designed and constructed the database. NM drafted the manuscript. KH, TS, and NF participated in data curation. SN supervised the whole project. All authors read and approved the final manuscript.