Pathway database
The focus of data management in Pathway is the experiment, identified by an experiment ID (identifier) and its associated metadata (including LIPID MAPS laboratory, the broad category of lipid studied by the lab, biological system, type of experiment, and experiment date). An experiment contains all of the measurements, together with the compounds or gene symbols, for treatments performed by a particular laboratory on a particular date, in the form of sets of time course data, or datasets. A dataset consists of a string representing a treatment type and a series of time points (0, 0.5, 1, etc.). The dataset contains one or more measurements at each time point. Time units belong to the enclosing experiment, while measurement units are contained within the dataset.
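The experiment and dataset relationship described above can be pictured with a small Java sketch. This is only an illustration under stated assumptions: the class and field names (`Experiment`, `Dataset`, `measurementUnit`, `timeUnit`, and so on) are hypothetical and are not taken from the Pathway Editor source, and only a subset of the experiment metadata is modeled.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.TreeMap;

/** Hypothetical model of one time course: a treatment label plus measurements per time point. */
final class Dataset {
    final String treatment;        // string representing the treatment type
    final String measurementUnit;  // measurement units are held by the dataset
    // time point -> one or more measurements, kept sorted by time point
    private final TreeMap<Double, List<Double>> byTimePoint = new TreeMap<>();

    Dataset(String treatment, String measurementUnit) {
        this.treatment = treatment;
        this.measurementUnit = measurementUnit;
    }

    void addMeasurement(double timePoint, double value) {
        byTimePoint.computeIfAbsent(timePoint, t -> new ArrayList<>()).add(value);
    }

    /** Time points in ascending order. */
    List<Double> timePoints() {
        return new ArrayList<>(byTimePoint.keySet());
    }

    /** Measurements recorded at a time point, or an empty list if there are none. */
    List<Double> measurementsAt(double timePoint) {
        List<Double> values = byTimePoint.get(timePoint);
        return values == null
                ? Collections.<Double>emptyList()
                : Collections.unmodifiableList(values);
    }
}

/** Hypothetical model of an experiment: an ID, some metadata, and its datasets. */
final class Experiment {
    final int experimentId;
    final String laboratory;  // one of several metadata fields; others omitted here
    final String timeUnit;    // time units are held by the enclosing experiment
    final List<Dataset> datasets = new ArrayList<>();

    Experiment(int experimentId, String laboratory, String timeUnit) {
        this.experimentId = experimentId;
        this.laboratory = laboratory;
        this.timeUnit = timeUnit;
    }
}
```

Keeping the time unit on `Experiment` and the measurement unit on `Dataset` mirrors the division described in the text; a sorted map is one convenient way to hold several replicate measurements per time point.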
The Pathway database also contains references to small molecules as identified by LIPID MAPS structure database identifiers [20, 21] and to KEGG small molecule compound IDs, to KEGG protein compound IDs, and to gene symbols referenced by Entrez Gene [22]. Microarray and protein array measurements within the database are associated with both a protein compound ID and a gene symbol. That is, the Pathway Editor treats array data as belonging to proteins. Nucleic acid annotations are not currently found in Pathway.
When experiments are loaded into the Pathway database, each dataset becomes associated with a corresponding small molecule ID, protein ID, or combined protein identifier plus gene symbol identifier. The dataset of a compound thus becomes linked to the characteristics of the compound, such as small molecule category, molecular weight, synonyms, comments, and the external database identifiers described in the preceding paragraph. Further details on experiments and datasets may be found in the on-line tutorial [16].
The Pathway database contains KEGG reference metabolic pathways for lipid metabolism. The components of the pathways are parsed from KEGG XML files originally written in the KEGG Markup Language (KGML), and include pathway name and compound or protein name assigned to pathway nodes, node connectivity and stoichiometry, their respective map coordinates, and KEGG database IDs for KEGG LIGAND COMPOUND and KEGG COMPOUND ENZYME [11]. If the KEGG compound has been identified as equivalent to a structure contained in the LIPID MAPS structure database [20], the Pathway database characterization of the node in the KEGG pathway also contains the Pathway small molecule or protein ID described in the previous paragraph, allowing access to all annotations available for that molecular entity.
Visualization and data management
A view of the Pathway Editor showing a sample LIPID MAPS pathway (the Arachidonic acid pathway) is shown in Figure 1. A drawing area in the lower portion of the window frame occupies most of the display. Drawing is performed using JOGL (Java bindings for OpenGL) [23] to simulate a three-dimensional display. The display of time course data for every node, for all nodes of a particular type, or for individual nodes may be configured as either a heatmap or a line chart. Heatmaps use screen space more efficiently than line charts. Above the drawing area is a toolbar containing buttons for commonly used functions: node select, node create, and node connect modes, zoom in, zoom out, a zoom display area, fit to screen, tilt view plane up, tilt view plane down, and show/hide data display. Using scroll bars, the user may scroll horizontally or vertically across a plane in 3D space perpendicular to the user's line of sight. Above the toolbar, and topmost in the frame, is a menu bar containing common program interface menus, as well as specialized submenus.
View preferences, such as node/interaction color, shape, and font size, may be configured by a preference assignment dialog accessed from the popup menu, or by global preference assignment dialogs accessed on the View menu.
The program also uses the JFreeChart code package [24] to generate high-quality charts in separate frames for close inspection of data.
Downloading pathways
The user may initiate a session with the Pathway Editor in a number of ways. Perhaps the most powerful is to download a pathway. A pathway file in the Biopathways Workbench format (ending with the extension .path) may be downloaded from the LIPID MAPS website server [16] by means of a dialog presented via the File menu (Figure 2). Files in the .path format are built using the Pathway Editor and contain a listing of the nodes and processes of a pathway, along with their geometrical layout and the Pathway database identifiers of compounds referenced by each node. The files may also contain experiment IDs, if the pathway designer so desires. The presence of an experiment ID directs the Editor to download all datasets for the experiment from the Pathway database. If the experiment is a microarray or protein array experiment, only the time course data for gene symbols referenced in the file are downloaded, because of the large number of datasets in these kinds of experiments. The Editor traverses each participant in the pathway and determines whether the Pathway database identifier for the participant compound is present in the list of compounds in each experiment. If data is found, datasets are copied from the experiment to the participant object that is represented by the node. The data is displayed in the drawing area along with the pathway (Figure 2). The default display format is heatmap; this may be changed to line chart as desired for each node, or for all nodes of any type. Files in the .path format stored on the user's machine may also be opened.
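The traversal described above, in which each participant's compound identifier is looked up in every loaded experiment and matching datasets are copied to the participant, might be sketched as follows. This is a minimal sketch, not the Editor's actual code: the class `DatasetAssigner`, the method `assign`, and the use of plain maps and dataset names in place of real participant and dataset objects are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper that copies matching datasets onto pathway participants. */
final class DatasetAssigner {

    /**
     * @param participantCompoundIds node label -> Pathway database compound ID
     * @param experiments            one map per experiment: compound ID -> dataset names
     * @return node label -> datasets copied from every experiment that lists the compound
     */
    static Map<String, List<String>> assign(
            Map<String, Integer> participantCompoundIds,
            List<Map<Integer, List<String>>> experiments) {

        Map<String, List<String>> assigned = new LinkedHashMap<>();
        // Visit each participant in the pathway
        for (Map.Entry<String, Integer> participant : participantCompoundIds.entrySet()) {
            // Check whether its compound ID appears in each experiment
            for (Map<Integer, List<String>> experiment : experiments) {
                List<String> found = experiment.get(participant.getValue());
                if (found != null) {
                    // Copy (not share) the datasets onto the participant
                    assigned.computeIfAbsent(participant.getKey(), k -> new ArrayList<>())
                            .addAll(found);
                }
            }
        }
        return assigned;
    }
}
```

Participants whose compound ID appears in no experiment are simply absent from the result, which corresponds to a node displayed without data.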
KEGG pathway files may be downloaded directly from Pathway, again using a dialog available through the File menu (Figure 3). If the database representation of the KEGG pathway is cross-referenced to Pathway compound identifiers, experiment data is assigned by the Pathway Editor in a manner similar to LIPID MAPS .path files. However, references to experiments are not contained within KEGG pathway files, and experiment data must be accessed in separate steps (see the following discussion).
Nodes on the drawing surface of the Pathway Editor are of four types, following KEGG terminology: small molecule, protein, nucleic acid, or unknown (the default type). Each node (i.e., its internal representation as a participant) may contain a referenced compound ID. Protein nodes may also contain a list of one or more gene symbols. Processes, or interactions that connect nodes, are of type metabolic, signaling, or unknown (the default type). These compound IDs, symbols, and types may be assigned using dialogs accessible via mouse buttons and top menu items.
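The node and process types listed above map naturally onto enumerations with an unknown default. The sketch below is illustrative only: the names `NodeType`, `ProcessType`, `Participant`, and `addGeneSymbol` are hypothetical, and restricting gene symbols to protein nodes is an assumption drawn from the description rather than confirmed behavior of the Editor.

```java
import java.util.ArrayList;
import java.util.List;

/** The four node types; UNKNOWN is the default. */
enum NodeType { SMALL_MOLECULE, PROTEIN, NUCLEIC_ACID, UNKNOWN }

/** The three process (interaction) types; UNKNOWN is the default. */
enum ProcessType { METABOLIC, SIGNALING, UNKNOWN }

/** Hypothetical internal representation of a node. */
final class Participant {
    NodeType type = NodeType.UNKNOWN;  // default type
    Integer compoundId;                // null until a compound is assigned
    final List<String> geneSymbols = new ArrayList<>();

    /** Assumption: only protein nodes carry gene symbols. */
    void addGeneSymbol(String symbol) {
        if (type != NodeType.PROTEIN) {
            throw new IllegalStateException("gene symbols apply to protein nodes only");
        }
        geneSymbols.add(symbol);
    }
}

/** Hypothetical representation of a process connecting two participants. */
final class PathwayProcess {
    ProcessType type = ProcessType.UNKNOWN;  // default type
    final Participant from;
    final Participant to;

    PathwayProcess(Participant from, Participant to) {
        this.from = from;
        this.to = to;
    }
}
```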
The user assigns compounds (i.e., small molecules or proteins) to nodes by accessing a database search dialog and searching the Pathway database. The search may use compound names, synonyms, or fragments of names. Once a compound is assigned, its Pathway database compound ID and compound metadata become associated with the node. Compound metadata is updated automatically from the database whenever a .path file is downloaded or opened.
Nodes are created by combined use of toolbar buttons and mouse actions. To create a node in any part of the drawing area, the Node create button in the toolbar is pressed and the drawing area is clicked. To connect nodes, the Node connect button is pressed, a node is selected, and, keeping the mouse button pressed, the mouse is moved to a second node, and the mouse is released. The resulting interaction, or process, may be assigned information using the mouse and pop-up dialogs, in a manner similar to nodes.
Within the Pathway Editor, experiments are managed independently of pathway components. When an experiment is presented to the Pathway Editor, the Pathway Editor automatically associates datasets with nodes containing the compound ID or gene symbol for the dataset. Experiments may be downloaded from the Pathway database separately from pathways (Figure 4), or may be accessed from the local file system. In the latter instance, the files may be in LIPID MAPS data file format (containing one or more experiments that have been previously downloaded in the Pathway Editor and saved by the user), or the files may be in comma-separated value (CSV) format and constructed by the user by way of another program, such as a spreadsheet application [16]. The CSV format does not contain database identifiers (experiment or compound). Such numerical identifiers are created and assigned by the Pathway Editor as necessary. In the case of compounds (node type protein or small molecule), identifiers are assigned on the basis of 1:1 correspondence between names in the CSV file and node labels in the pathway. Microarray or protein array data in CSV files is identified by a node type designation of protein, in combination with a gene symbol.
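The 1:1 name matching used for CSV data can be illustrated with a short sketch. It does not parse the actual CSV layout, which is documented in the tutorial [16]; it only shows how locally generated identifiers might be handed out to names that exactly match a node label. The class `LocalIdAssigner`, the method `assignIds`, the whitespace trimming, and the choice of sequential integers starting at 1 are all assumptions for illustration.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical helper that gives locally generated IDs to CSV names matching node labels. */
final class LocalIdAssigner {

    /**
     * @param csvNames   compound names read from a user-built CSV file
     * @param nodeLabels labels of the nodes in the current pathway
     * @return matched name -> locally generated identifier, in order of first appearance
     */
    static Map<String, Integer> assignIds(List<String> csvNames, Set<String> nodeLabels) {
        Map<String, Integer> ids = new LinkedHashMap<>();
        int nextId = 1;  // assumption: local IDs are simple sequential integers
        for (String raw : csvNames) {
            String name = raw.trim();  // assumption: surrounding whitespace is ignored
            // Exact match against a node label; each name receives one ID only
            if (nodeLabels.contains(name) && !ids.containsKey(name)) {
                ids.put(name, nextId++);
            }
        }
        return ids;
    }
}
```

Names with no matching node label receive no identifier, so their data would not be attached to any node.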
Assigning a compound to a node
During the process of constructing a pathway, a user constructs a node of the desired type, and assigns a compound to a node by searching Pathway for the name of a compound or gene symbol, using a compound information dialog for the node type. Once found and selected, the Pathway database ID for the compound and/or gene symbol then becomes incorporated into the participant object that is contained in the local pathway. If experiment data is loaded in the program, the program then automatically traverses the experiment data contained within the internal pathway object and assigns datasets containing the same database compound ID or gene symbol to the participant contained within the node for display.
A user accepts placement of the compound in the node on the basis of whether the measured data that is presented meets with expectations according to domain knowledge, including early or late responsiveness to a stimulus, and the magnitude of the response. Measured absolute values and ratios may be inspected, and the presentation changed, by right-clicking on an interaction glyph and selecting from a pop-up menu with the mouse, or alternatively, by accessing menu items on the Tools menu for the window. For microarray experiments, the presented data includes statistical p-values of the significance of biological replicates (experiments performed on different dates), technical replicates (multiple within-experiment replicates), or treatment significance (i.e., Kdo2-lipid A vs. control treatments) on the Select tab of the Node information dialog (Figure 5).
A menu item in the View menu enables animated bar charts for visualization of time-dependent changes for each node in the display. This allows dynamic comparisons of the magnitude and direction (up or down) of changes to compounds in the system under study.
The user is further aided in the process of compound assignment by the ability to change the visibility of entire experiments, of datasets for individual compounds, and of datasets for individual gene symbols in the display, as desired. The user may choose to reject assignment of a compound to a node because of poor reproducibility of measurement, or may reject an individual experiment because of inconsistency within the experimental data while still accepting the compound.
Further information on the compound may be obtained by double-clicking a table row containing an identifier in the databases tab of the compound's information dialog, as in Figure 6. The action causes the user's preferred browser program to open a web page presenting database information from LIPID MAPS, KEGG, or Entrez Gene for that identifier.
SBML
The Pathway Editor reads and writes Systems Biology Markup Language (SBML) models through level 2, version 3 [17, 18, 25]. SBML species and reactions can be assigned to pathway nodes and interactions. When saving an SBML model, the LIPID MAPS pathway layout is contained within the model as an annotation. This layout is automatically utilized when the file is subsequently read. If a pathway layout is not found, the user may select an automated layout feature patterned after that for JDesigner [26], or the user may manually select nodes from a list and paste them on the drawing surface one at a time. Interactions are automatically drawn, utilizing lists of SBML species and reactions contained within the model.
The SBML plugin in the Pathway Editor is designed to be fully compatible with the SBML specification [18]. Mathematical expressions relating SBML species, parameters, and compartments can be written and viewed. Of particular interest is the ability to enter and read MIRIAM (Minimum Information Requested In the Annotation of biochemical Models) annotations and SBO (Systems Biology Ontology) references [27–29] in user-selectable table and list formats. Upon double-clicking a row in the table, the user is presented with the database web page for the compound in the user's browser, in a manner similar to the database references presented for .path files, above.