ExprEssence - Revealing the essence of differential experimental data in the context of an interaction/regulation network
© Warsow et al; licensee BioMed Central Ltd. 2010
Received: 5 July 2010
Accepted: 30 November 2010
Published: 30 November 2010
Experimentalists are overwhelmed by high-throughput data and there is an urgent need to condense information into simple hypotheses. For example, large amounts of microarray and deep sequencing data are becoming available, describing a variety of experimental conditions such as gene knockout and knockdown, the effect of interventions, and the differences between tissues and cell lines.
To address this challenge, we developed a method, implemented as a Cytoscape plugin called ExprEssence. As input we take a network of interaction, stimulation and/or inhibition links between genes/proteins, and differential data, such as gene expression data, tracking an intervention or development in time. We condense the network, highlighting those links across which the largest changes can be observed. Highlighting is based on a simple formula inspired by the law of mass action. We can interactively modify the threshold for highlighting and instantaneously visualize results. We applied ExprEssence to three scenarios describing kidney podocyte biology, pluripotency and ageing: 1) We identify putative processes involved in podocyte (de-)differentiation and validate one prediction experimentally. 2) We predict and validate the expression level of a transcription factor involved in pluripotency. 3) Finally, we generate plausible hypotheses on the role of apoptosis, cell cycle deregulation and DNA repair in ageing data obtained from the hippocampus.
Reducing the size of gene/protein networks to the few links affected by large changes makes it possible to screen for putative mechanistic relationships among the genes/proteins that are involved in adaptation to different experimental conditions, yielding important hypotheses, insights and suggestions for new experiments. We note that we do not focus on the identification of 'active subnetworks'. Instead we focus on the identification of single links (which may or may not form subnetworks), and these single links are much easier to validate experimentally than submodules. ExprEssence is available at http://sourceforge.net/projects/expressence/.
The pace of data generation in the life sciences is steadily increasing. Primary data sets grow in depth and accuracy, covering more and more aspects of life. In molecular biology and biomedicine, these include large-scale measurements of DNA/Histone acetylation, transcriptional activity, gene expression and protein abundance. Measuring epigenetic patterns (DNA methylation, DNA/Histone acetylation) on a large scale has become possible only recently [1, 2]. Measuring transcription is entering a new era with the introduction of deep (or next-generation, RNA-seq) sequencing [3, 4]. Proteomics is becoming possible at unprecedented depth, covering ever-larger parts of the proteome on a routine basis. For these primary data, repositories such as the Gene Expression Omnibus database (GEO) or ArrayExpress are constantly expanding.
Often, measurements are differential: they are made for two or more conditions (such as gene knockdown or knockout), for two or more time points (such as time series tracking the consequences of some experimental intervention), or for two or more species (such as mouse and human). Exploiting differential measurements is one key to cope with the flood of data, by focusing on the most pronounced differences.
Life scientists also have to handle a deluge of secondary data, in the form of papers, reviews and curated databases. These may be integrated by automated systems such as STRING, or by manual efforts [12–14]. Exploiting secondary data provides another key to cope with the flood of primary data, by putting them into context and focusing on the most pronounced confirmations and contradictions to what is known already.
In this paper, we propose to interpret differential data in the context of knowledge, yielding the 'essence' of an experiment. Differential data may be provided by two microarrays, and knowledge may be provided by a network describing gene/protein interaction and regulation. In this case, data tracking gene expression in the course of an experiment can be used to identify the most pronounced putative mechanisms. They are identified as those known links between genes/proteins along which expression changes indicate that there may have been some regulatory change, such as the startup or shutdown of an interaction, a stimulation or an inhibition. ExprEssence highlights these links, and it enables the user to filter out all links with no or negligible change. The higher the filter threshold on the amount of change to be displayed, the fewer links are shown, making it straightforward to examine the 'essence' of the experiment. Network condensations are illustrated by pairs of figures (original network - condensed network) in the section on Case Studies. The condensed network contains good candidates for interpreting the experiment in mechanistic terms, giving rise to the design of new experiments. However, all inferences are hypotheses derived from correlations in the experimental data in the context of the a priori knowledge encoded in the network, and it must be kept in mind that correlative data do not necessarily entail mechanistic causality. Moreover, the validity of the hypotheses generated by our method will depend on the coverage and correctness of the network, and on the accuracy of the experimental data.
Starting with the pioneering work of Ideker et al., there is a plethora of methods that combine network data with high-throughput data (such as microarrays), in order to highlight pathways or subnetworks; see the excellent recent reviews of Minguez & Dopazo, Wu et al. and Yu & Li. Notably, few of these methods are readily available as publicly accessible software packages, plugins or web services (see Table and in ). Also, there does not seem to be a gold standard that can be used for validation purposes (see, e.g., Tarca et al. for a recent discussion). Some methods lack validation except for the example for which they were developed, while others are studied for an array of specific examples. In these cases, strong enrichment in plausible Gene Ontology categories or detection of known pathways or annotations is often used to demonstrate utility, as in [19–25]. We found two articles including a comparison of different subnetwork identification methods. The first one by Parkkinen and Kaski introduces variants of the Interaction Component Model (ICM) method, comparing them to the original ICM method, to a method based on hidden modular random fields (HMoF) and to Matisse, using identification of Gene Ontology classes and coverage of protein complexes for two selected data sets (osmotic shock response and DNA damage data) to judge one method over the other. An evaluation of ClustEx, jActiveModules, GXNA and a simple approach based on fold change has also been reported, taking identification of gene sets, pathways and microarray targets known from the literature and from the Gene Ontology for comparison.
In general, it is exceedingly difficult to validate the detection of (sub-)networks or (sub-)pathways: these are complex entities, and ultimate experimental validation is impossible because of this complexity, since experimentalists are usually limited to investigating only a few components in isolation at any given time. Nevertheless, we will compare results of our method with results obtained by jActiveModules, in a separate section following the case studies. In contrast, by just highlighting single links in networks, we tackle a more primitive task, but in this case results can be validated directly by experiment, or by identifying corroborative statements in the literature. In particular, as can be seen from our case studies, the single links that we highlight give rise to predictions about single genes and about single one-step mechanisms that can be investigated in isolation. Therefore, we would like to emphasize the direct utility of our focus on single links and genes, complementing the (sub-)network centric view that is usually employed; to the best of our knowledge, the 'single link and gene' focus is not employed by other methods combining network and high-throughput ('omics') data. In fact, we propose a 'winning combination' of 'network'/'omics' and 'classical' biology, using networks and high-throughput data to highlight single genes and links that may then be validated directly by classical molecular biology, as will be demonstrated in our case studies.
As future work, our formula for link highlighting can, however, be integrated into current methods for pathway/subnetwork detection, possibly improving these considerably. Notably, no such method treats inhibitions and stimulations in a distinct way, as we do. In particular, we envision that the edge score formula of Guo et al., which is based on measuring co-variance, may be replaced by our formula (see below), emphasizing a different aspect of differential gene expression: while Guo et al. identify coordinated changes using their formula, integration of our formula into their framework would identify subnetworks with changes that are consistent with an input network of interactions, stimulations and inhibitions. In any case, we wish to stress that for the identification of coordinated changes, correlation coefficients are most suitable. Our approach, however, identifies a different biological message, namely startups/shutdowns of interactions, stimulations and inhibitions, using an input network that is informative about biological relationships such as stimulations and inhibitions.
ExprEssence is implemented in Java Standard Edition 6. It is a plugin for Cytoscape, an easy-to-install tool for biological network analysis and visualization. Cytoscape is an open source software project and provides basic features such as network layout and modification. Cytoscape can be enhanced for analysis purposes by straightforward installation of plugins.
ExprEssence analyses are based on a network of genes and/or proteins, in a format readable by Cytoscape, such as cys, sif, xgmml or gpml. It may be imported from databases using web services such as the Pathway Commons Web Service Client or the WikiPathways Web Service Client [31, 32] as a 'simplified binary model' (see Fig. Five in ) or it may be downloaded directly from the web. Usually, it reflects expert-curated interaction/regulation data concerning a particular signaling pathway or molecular phenomenon.
Each link (edge) must be typed to represent either an interaction, stimulation or inhibition. It is possible that all links represent physical interactions, as is the case in a pure protein-protein interaction network. Stimulations and inhibitions are directional, whereas interactions can be interpreted as undirected as well as bidirectional.
For each gene (node) at least two numerical values must be given on which a meaningful comparison can be based. For example, these may be expression values, derived from measurements in two experiments E1 and E2.
By default, for better data interchangeability, ExprEssence recognizes Systems Biology Ontology terms, also included in the activity flow language of the Systems Biology Graphical Notation (SBGN), for the specification of interaction types. Thus, each link (edge) must include an attribute called Interactiontype, whose values can be either stimulation (corresponding to SBO:0000170), inhibition (SBO:0000169) or interaction (SBO:0000231). In the networks discussed in this article, a single node is used for a gene and its protein product, and the exact nature of the links (edges) denoting stimulations, inhibitions and interactions depends on the evidence underlying the link. For example, a stimulation may be due to the modification of one protein by another, but it may also be the transcriptional stimulation of a target gene by a transcription factor.
The differential measurement data used for comparison may be integrated into the network as described in the Cytoscape manual. Usually, integration is accomplished by mapping unique gene/protein identifiers in the data to unique gene/protein identifiers in the network. The measurements may be gene expression values, but they may also denote protein abundance, methylation levels, etc.
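As an illustration of the typing requirement, the following minimal sketch checks an edge list for links whose Interactiontype is missing or not one of the three recognized values. The edge representation (a list of dictionaries), the function name `find_untyped_edges` and the example edges are hypothetical and not part of ExprEssence or Cytoscape; only the attribute name and the SBO identifiers come from the text.

```python
# Interaction types and SBO identifiers as listed in the text.
SBO_TERMS = {
    "stimulation": "SBO:0000170",
    "inhibition": "SBO:0000169",
    "interaction": "SBO:0000231",
}


def find_untyped_edges(edges, attribute="Interactiontype"):
    """Return the edges whose type attribute is missing or not recognized."""
    return [edge for edge in edges if edge.get(attribute) not in SBO_TERMS]


# Hypothetical edge list with placeholder node names.
edges = [
    {"source": "A", "target": "B", "Interactiontype": "stimulation"},
    {"source": "B", "target": "C", "Interactiontype": "inhibition"},
    {"source": "A", "target": "C", "Interactiontype": "binds"},  # unknown type
    {"source": "C", "target": "D"},                              # attribute missing
]

untyped = find_untyped_edges(edges)
for edge in untyped:
    print("needs a valid Interactiontype:", edge["source"], "-", edge["target"])
```

A check of this kind would be run before the analysis, since a link without a valid type cannot be scored.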
If the numerical data result from multiple measurements (replicates), the number of replicates has to be declared for each experiment, and for each experiment and each node (gene/protein), the mean value and its corresponding variance have to be given. More specifically, for two experiments E1 and E2 to be compared, node A has either two or four numerical values: If the data consist of a single measurement, for node A these are the two values $A_{E1}$, $A_{E2}$. If replicates are analyzed, the two values $\bar{A}_{E1}$, $\bar{A}_{E2}$ are the mean values and the two variances $s^2_{A,E1}$, $s^2_{A,E2}$ are also provided. The numbers of replicates are n1 and n2. ExprEssence analyses based on replicated measurements, where mean values and variances are used as input, are more reliable than analyses based on single measurements. Specifically, as the variances are used for calculations, feature variation within and between groups is considered and evaluated appropriately. However, comparisons based on single measurements can also be used to suggest underlying mechanisms.
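The per-node input for replicated data can be prepared as in the sketch below, which reduces the replicate values of one node in one experiment to a mean, a variance and a replicate count. The use of the sample variance (denominator n - 1) is an assumption, as the text does not say which variance estimator is expected; the function name and the example values are hypothetical.

```python
from statistics import mean, variance


def summarize_replicates(values):
    """Return (mean, sample variance, number of replicates) for one node."""
    n = len(values)
    if n < 2:
        raise ValueError("at least two replicates are needed for a variance")
    # statistics.variance is the sample variance (denominator n - 1);
    # this choice of estimator is an assumption.
    return mean(values), variance(values), n


# Hypothetical log-scale expression values of node A in experiments E1 and E2.
a_e1 = [7.9, 8.1, 8.0]
a_e2 = [9.4, 9.6, 9.5]

mean_e1, var_e1, n1 = summarize_replicates(a_e1)
mean_e2, var_e2, n2 = summarize_replicates(a_e2)
print("E1:", mean_e1, var_e1, n1)
print("E2:", mean_e2, var_e2, n2)
```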
Identifying change in a network, motivation
For each link in the network we want to measure the amount of change between experiments E1 and E2, where 'change' is a modification in the intensity with which one gene/protein may be influencing another gene/protein; depending on the input data, such influence may be direct physical interaction (in the case of proteins), transcriptional stimulation or inhibition. Therefore, for all links connecting two genes/proteins A and B in the network under consideration, ExprEssence uses the measurements $A_{E1}$, $A_{E2}$ and $B_{E1}$, $B_{E2}$ for the two experiments E1 and E2 to calculate a link score proportional to the amount of change from E1 to E2. The formulae are given in the next section. The sign of the score corresponds to the direction of change, giving a positive score for startups and a negative score for shutdowns. The magnitude of this signed change corresponds to the absolute value of the score. Links with a link score whose absolute value does not exceed a user-defined threshold are deleted from the network. Hence, only those links are kept where changes (startups or shutdowns) are pronounced.
Following the heatmap metaphor, large measurement values for genes are indicated by red color and small values are indicated by green color. Similarly, links with a positive value of the link score are colored in red and indicate startups. Links with a negative value are colored in green and indicate shutdowns.
Note that stimulations are treated in a symmetrical way: S → T is treated the same way as T → S. Indeed, we do not and cannot distinguish S → T and T → S, because in both cases we expect increments in S to be correlated with increments in T : Higher amounts of the stimulator go hand in hand with higher amounts of the target. A similar argument holds for decrements. Motivated by this argument, interaction links (S ↔ T ) are treated in the same way as stimulation links. This makes sense in general, because the amount of A and B interacting with each other increases in proportion to the amount of both interactors. More generally, if the interaction represents a biochemical reaction, a straightforward interpretation of our reasoning is given by the law of mass action, see the next section 'Calculation of the amount of change'.
Calculation of the amount of change
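A reconstruction of the link score for an interaction link between A and B, offered as a sketch rather than as the authors' exact typesetting. It assumes the Welch-type standardization named in the Discussion. The notation (means $\bar{A}_{E1}$, $\bar{A}_{E2}$; variances $s^2_{A,E1}$, $s^2_{A,E2}$; replicate numbers $n_1$, $n_2$; node changes $D_A$, $D_B$; link score $LS$) and the equation numbers (1) and (2) are assumptions; (3) is the number that the text itself uses for the sum $D_A + D_B$.

```latex
\begin{align}
% Standardized change of node A from E1 to E2 (Welch-type statistic)
D_A &= \frac{\bar{A}_{E2} - \bar{A}_{E1}}
            {\sqrt{\dfrac{s^2_{A,E1}}{n_1} + \dfrac{s^2_{A,E2}}{n_2}}} \tag{1} \\[1ex]
% Standardized change of node B, defined analogously
D_B &= \frac{\bar{B}_{E2} - \bar{B}_{E1}}
            {\sqrt{\dfrac{s^2_{B,E1}}{n_1} + \dfrac{s^2_{B,E2}}{n_2}}} \tag{2} \\[1ex]
% Link score of an interaction link between A and B
LS_{\text{interaction}} &= D_A + D_B \tag{3}
\end{align}
% Without replicates no variance is available; the node change then reduces
% to the plain difference of the two measurements:
%   D_A = A_{E2} - A_{E1}, \qquad D_B = B_{E2} - B_{E1}
```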
where $\bar{A}_{E1}$, $\bar{A}_{E2}$: Mean value of gene/protein A under experimental condition E1, E2;
$s^2_{A,E1}$, $s^2_{A,E2}$: Variance of values of gene A under experimental condition E1, E2;
n1, n2: Number of replicates done in experiment E1, E2.
Taking the difference $\bar{A}_{E2} - \bar{A}_{E1}$ and not $\bar{A}_{E1} - \bar{A}_{E2}$ reflects the motivation to denote startups of interactions by a positive score and shutdowns by a negative score.
In the specific case of a physical interaction between two proteins, and log-transformed data, the formula above corresponds to the law of mass action, as follows. The 'activity' of a physical interaction of protein A with protein B can be expressed by the product of the abundances of both, assuming that the expression values correspond to the 'amount' of protein. The 'amount' of the complex AB in experiment 1 can then be compared to the 'amount' of the complex AB in experiment 2, by taking the ratio. Large changes in this ratio indicate that there will be much more or much less of the protein complex, comparing experiment 1 with experiment 2. (Note that we calculate neither equilibrium constants nor reaction kinetics.) As we have two experimental conditions and are interested in the change from E1 to E2, startup of 'activity' is thus proportional to the ratio of the products of the abundances of A and B, taking experiment E2 over experiment E1: $(A_{E2} \cdot B_{E2}) / (A_{E1} \cdot B_{E1})$. In case of log-transformed values, this is the difference of the sums of the measurement values under both conditions: $(A_{E2} + B_{E2}) - (A_{E1} + B_{E1})$. This can be written as $(A_{E2} - A_{E1}) + (B_{E2} - B_{E1})$ and corresponds to $D_A + D_B$ from formula (3). Hence, our formula for the link score of interaction links can be connected directly to the law of mass action.
- 2. As explained above, we can treat the stimulation of a gene/protein A by a gene/protein B in the same way as an interaction of the two proteins with each other and therefore use the same formula to determine the link score: $LS_{\text{stimulation}} = D_A + D_B$ (4)
- 3. Formula (4) can be modified to capture inhibitions A → B (A inhibits B), where A and B are expected to be anticorrelated in their expression/amount: $LS_{\text{inhibition}} = D_A - D_B$ (5)
This equation honors the case where higher amounts of the inhibitor A go hand in hand with lower amounts of the target B and vice versa, whereas correlated changes are penalized (see Figure 2(c) and 2(d)).
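The scoring rules for the three link types can be sketched as follows. This is a minimal illustration, not the plugin's Java code: the Welch-type standardization is assumed from the mention of Welch's formula in the Discussion, the sign convention for inhibitions (inhibitor change minus target change) is inferred from the description above, and the function names and numbers are hypothetical.

```python
import math


def node_change(value_e1, value_e2, var_e1=None, var_e2=None, n1=1, n2=1):
    """Signed change of one node from E1 to E2.

    Without variances this is the plain difference. With variances the
    difference is divided by a Welch-type standard error (assumed form).
    """
    diff = value_e2 - value_e1  # E2 minus E1, so startups come out positive
    if var_e1 is None or var_e2 is None:
        return diff
    standard_error = math.sqrt(var_e1 / n1 + var_e2 / n2)
    if standard_error == 0:
        raise ValueError("zero variance: standardized change is undefined")
    return diff / standard_error


def link_score(d_a, d_b, link_type):
    """Combine the node changes of A and B according to the link type."""
    if link_type in ("interaction", "stimulation"):
        return d_a + d_b
    if link_type == "inhibition":
        return d_a - d_b  # A is the inhibitor, B the target
    raise ValueError("unknown link type: " + link_type)


# Mass-action check with hypothetical abundances: on log2-transformed values
# the interaction score equals the log2 ratio of the abundance products.
a_e1, a_e2 = math.log2(100.0), math.log2(400.0)
b_e1, b_e2 = math.log2(50.0), math.log2(200.0)
score = link_score(node_change(a_e1, a_e2), node_change(b_e1, b_e2), "interaction")
log_ratio = math.log2((400.0 * 200.0) / (100.0 * 50.0))
print(math.isclose(score, log_ratio))

# Inhibition: anticorrelated changes are rewarded, correlated ones cancel.
startup = link_score(2.0, -2.0, "inhibition")
correlated = link_score(2.0, 2.0, "inhibition")
print(startup, correlated)
```

Keeping the node change separate from the link score means the same standardized values can be reused for every link that touches a node.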
Our formulae deliver justified hypotheses also in the cases that are not as straightforward as the cases in Figure 1(a)-(b)/Figure 2(a)-(d), given two additional assumptions, that we call the source principle and the target principle. It is important to note that these complicated cases are characterized by relatively low link scores and additionally they will be marked by wavy lines. Furthermore, they can be identified by inspecting the color-coded measurement values (Figure 2(e)-(k)), which can be made explicit by addition of gene/node labels as in the condensed networks of case studies 2 and 3.
The source principle maintains that changes in the source, if they are large enough, are sufficient for a hypothesis regarding startup/shutdown of a stimulation/inhibition. Even if the value of the target is inconsistent, putting trust into the network data (that is, the stimulation/inhibition link is not questioned), the link then describes a startup/shutdown which is assumed to act on the target, even though it is counteracted by other effectors (Figure 2(e),(f),(i)). The other effectors may or may not be included in the network: we assume that the network is correct, but not necessarily complete. In case of transcriptional stimulations/inhibitions, simple examples of counteracting effectors are transcription factors that act in an opposite way at a different position of the regulatory region of the target gene. Here, we view gene regulation as a 'transcription factor battlefield' [38, 39]. In fact, the target gene may not be observable (expressed) at all without the stimulation that is highlighted. There is an alternative interpretation for an inconsistent target value: The stimulation may not be in the scope of what is being measured. For example, if the values refer to expression levels, a stimulation of the target by phosphorylation goes undetected.
The target principle holds that large changes in the target are sufficient for a hypothesis regarding startup/shutdown of a stimulation/inhibition, even if the value of the source (the stimulator/inhibitor) is inconsistent. Again trusting the network data, the link then describes a startup/shutdown that is becoming relevant because other effectors are now cooperating on the target (Figure 2(g),(h),(k)). Then, strictly speaking, in all these three cases we hypothesize that it is not the stimulation itself that goes up, but its effect on the target gene. Again, we view gene regulation as a 'transcription factor battlefield'. Also, the other effectors may or may not be part of the network. Of course, the inconsistent change in the source has to be lower than the telltale change in the target. Also, the startup of the stimulating effect is assumed to require only a low amount of the stimulator, which is however still exceeded. There is an alternative interpretation for an inconsistent source value: The stimulating effect may simply be delayed in case of a time series, where the stimulator (protein) needs time to accumulate, which may also happen during a period of constant or down-regulated gene expression of the stimulator.
To distinguish straightforward from inconsistent cases by inspection, and to aid the interpretation of links, our plugin offers multi-colored nodes, displaying the measurement values of a gene for a pair of experiments directly within a single node as a pie-chart, as explained in Figure 1 and displayed in Figure 2. To calculate the color for visualization of the values in the pie-chart, we take the 10%, 50% and 90% quantiles of the ordered list of all attribute values. The value associated with the 10% quantile defines the lower threshold. All values below this threshold are visualized by green color of same intensity. Values above this threshold and up to the value corresponding to the 50% quantile get a color defined by linear interpolation between the 10% quantile (green color) and the 50% quantile (white). Analogously, values are visualized by a color between white (50% quantile) and red (90% quantile). Values above the 90% quantile are represented by red color of same intensity. The thresholds and the coloring scheme can be redefined by the user. Furthermore, our plugin provides labeling of selected genes/nodes with the measurement data used for node coloring as shown in the condensed networks of case studies 2 and 3.
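The quantile-based coloring described above can be sketched as a small function that maps a measurement value to an RGB color. The quantile definition (linear interpolation between order statistics), the exact RGB triples for green, white and red, and all names are assumptions made for illustration; the plugin's own implementation may differ.

```python
GREEN, WHITE, RED = (0, 255, 0), (255, 255, 255), (255, 0, 0)


def quantile(sorted_values, q):
    """Quantile of an already sorted list, interpolating linearly (0 <= q <= 1)."""
    pos = q * (len(sorted_values) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_values) - 1)
    frac = pos - lo
    return sorted_values[lo] * (1 - frac) + sorted_values[hi] * frac


def lerp_color(c0, c1, t):
    """Linear interpolation between two RGB colors for t in [0, 1]."""
    return tuple(round(a + (b - a) * t) for a, b in zip(c0, c1))


def make_color_map(all_values, q_low=0.10, q_mid=0.50, q_high=0.90):
    """Build a value -> RGB function from the quantiles of all attribute values."""
    ordered = sorted(all_values)
    low, mid, high = (quantile(ordered, q) for q in (q_low, q_mid, q_high))

    def color(value):
        if value <= low:
            return GREEN            # at or below the 10% quantile: full green
        if value >= high:
            return RED              # at or above the 90% quantile: full red
        if value <= mid:
            t = (value - low) / (mid - low) if mid > low else 1.0
            return lerp_color(GREEN, WHITE, t)
        t = (value - mid) / (high - mid) if high > mid else 1.0
        return lerp_color(WHITE, RED, t)

    return color


# Hypothetical attribute values 0..10; the quantiles are then 1, 5 and 9.
color_of = make_color_map(range(11))
print(color_of(0), color_of(3), color_of(5), color_of(10))
```

The three quantile arguments mirror the statement that thresholds can be redefined by the user.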
We will use this link score to identify those links along which there is a large change between E1 and E2. Links with a link score exceeding a user-defined threshold are colored in red or green; the other links are deleted from the network.
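The condensation step itself amounts to a filter on the absolute link score, as in the following sketch. The tuple representation of links, the function name `condense` and the example scores are hypothetical; only the rule (keep links whose absolute score exceeds the threshold, red for positive, green for negative) is taken from the text.

```python
def condense(links, threshold):
    """Keep links whose absolute score exceeds the threshold and color them.

    links: iterable of (source, target, score) tuples.
    Returns (source, target, score, color) tuples for the retained links.
    """
    kept = []
    for source, target, score in links:
        if abs(score) > threshold:
            color = "red" if score > 0 else "green"  # startup vs. shutdown
            kept.append((source, target, score, color))
    return kept


# Hypothetical scored links.
scored_links = [("A", "B", 4.2), ("B", "C", -3.1), ("C", "D", 0.4)]

kept = condense(scored_links, threshold=1.0)
for link in kept:
    print(link)
```

Raising the threshold shrinks the result monotonically, which is what makes interactive exploration of the threshold meaningful.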
Condensation of networks
After importing the network and measurement data into Cytoscape, the ExprEssence dialog window is used to define which data shall be taken for calculation of the link score and hence for network condensation. As discussed above, the network must include at least two numerical attributes for each gene/protein, so that the formulae can be employed. These two attributes are explicitly selected by the user, indicating their order (E1 versus E2, or E2 versus E1). After selecting two attributes, the user may then indicate that there is variance data available and specify the number of replicates. In this case, the measured values are implicitly assumed to be the mean values for which the variances are provided. Finally, calculations are started and results are displayed in a new network window in Cytoscape. Links with a positive change (startups) are rendered in red, and links with a negative change (shutdowns) are rendered in green. Color saturation and link thickness are directly linked to the link score calculated.
We present results of the application of ExprEssence in three case studies.
We will describe three application scenarios, condensing networks and describing the insights gained from these. As a first example, we condense a network based on literature-curated interaction data of proteins involved in structure and function of the podocyte, which is the cell forming the kidney filtration barrier. The second example will describe how a hand-curated network of interaction and regulation of genes maintaining the pluripotent state of stem cells can be condensed using microarray data tracking an early transition process of embryonic stem cells, yielding a mechanistic hypothesis that was then confirmed experimentally. In a third application, we will take a biological network describing ageing-related processes from the WikiPathways database, integrate publicly available microarray data, and confirm some basic insights into ageing. Cytoscape session files PodocyteCellMatrix.cys, Epiblast.cys and DNA_Damage.cys are provided as Additional Files 1, 2 and 3, and they enable reproduction of figures following the instructions given there.
Case Study 1 - Interaction network of podocyte cell-matrix proteins
Podocytes cover the outer aspect of the capillaries in the kidney glomerulus, where the ultrafiltration of blood takes place. The filtration barrier is composed of endothelial cells, the glomerular basement membrane (GBM) and podocytes. The proper function of podocytes is essential for the ultrafiltration process. Podocytes synthesize the majority of extracellular matrix molecules that are present in the GBM. The podocyte-GBM interface is crucial for mechanical anchorage and inside-out as well as outside-in signaling. Damage or loss of podocytes is estimated to be responsible for about 90% of kidney diseases in humans. To date, several hereditary kidney diseases are known that are caused by mutations in genes involved in the podocyte-GBM interface, e.g. Alport syndrome. Thus, the podocyte-GBM interface is of central importance in kidney biology and pathology.
We constructed a protein interaction network of the podocyte-GBM interface based on expert knowledge.
Podocyte cell lines are a frequently employed tool to study podocyte biology. However, it is well known that podocyte cell lines are partially dedifferentiated as compared to in vivo podocytes. To extract the main differences between the podocyte-GBM interface of in vivo vs. cultured podocytes, we mapped microarray gene expression data of in vivo and cultured mouse podocytes onto the extended network shown in Figure 4. We used publicly available microarray data (GSE10017) generated from a podocyte cell line and from in vivo podocytes, which were isolated as podocalyxin-positive cells in a cell suspension of enzymatically digested mouse glomeruli. By condensing a protein interaction network using gene expression data, we implicitly assume that protein abundance is correlated to gene expression. We log-transformed and quantile-normalized these data.
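The preprocessing named above can be sketched as follows. This is a generic illustration, not the authors' pipeline: the base of the logarithm (2), the mean-of-ranks form of quantile normalization and the arbitrary handling of ties are assumptions, and the function names and toy numbers are hypothetical.

```python
import math


def log2_transform(column):
    """Log-transform one array of positive intensities (base 2 is an assumption)."""
    return [math.log2(value) for value in column]


def quantile_normalize(columns):
    """Give every array the same distribution: the mean of the sorted arrays.

    columns: list of equal-length lists, one per microarray.
    Ties within an array are resolved by position (a simplification).
    """
    n = len(columns[0])
    sorted_cols = [sorted(col) for col in columns]
    # Reference distribution: mean across arrays at each rank.
    rank_means = [sum(col[i] for col in sorted_cols) / len(columns) for i in range(n)]
    result = []
    for col in columns:
        order = sorted(range(n), key=lambda i: col[i])  # indices by ascending value
        normalized = [0.0] * n
        for rank, index in enumerate(order):
            normalized[index] = rank_means[rank]
        result.append(normalized)
    return result


# Toy example with two arrays of three genes each.
arrays = [[5.0, 2.0, 3.0], [4.0, 1.0, 6.0]]
normalized = quantile_normalize(arrays)
print(normalized)
print(log2_transform([8.0, 2.0]))
```

After normalization both arrays contain the same set of values, so differences between conditions reflect rank changes of genes rather than array-wide intensity shifts.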
Pinch and parvin participate in integrin signaling via integrin-linked kinase. This pathway is essential for podocyte function, since mice with podocyte-specific knockout of integrin-linked kinase die from renal failure at the age of 16 weeks. The pinch/parvin interaction is shut down in cultured podocytes (see Figure 5), making it a candidate key interaction reflecting podocyte dedifferentiation in cell culture. In the healthy kidney, pinch and parvin may have an important role in transmitting signals from the extracellular matrix through integrin-linked kinase, to maintain podocytes in a differentiated state.
Neuropilin and its interaction with the guidance molecule semaphorin have been implicated in podocyte differentiation [44, 45]. The interaction of neuropilin with several proteins, including semaphorin, is greatly diminished in cultured podocytes (see Figure 5). ExprEssence uncovers that loss of neuropilin interaction with extracellular molecules also participates in the dedifferentiation of podocytes in culture, as suggested by the in vivo findings.
Massive up-regulation in cultured (= dedifferentiating) podocytes of the interaction between fibronectin 1 and the membrane protein Mag suggests an important and hitherto unknown function of Mag in the regulation of podocyte differentiation through the podocyte-GBM interface. Indeed, we could confirm podocyte expression of myelin-associated glycoprotein (Mag) (Figure 6), which has so far not been implicated in podocyte biology. Since myelin proteins are known to be expressed only in glial cells of the nervous system, it is also notable that knockout of myelin protein zero, another myelin protein preferentially expressed in podocytes within the glomerulus, has been shown to result in proteinuria.
Case Study 2 - Analysis of a pluripotency-related experiment
As Klf4 and Nanog are known to be stimulated by Esrrb [55, 56], these stimulations are also shut down (Figure 8; target principle). Finally, interactions between the transcription factors Stat3, Hdac1, c-Myc & Nanog and Trim28 (also known as TIF1β, a transcription co-regulator (co-repressor) and chromatin modifier [57, 58]) are started. These startups are highlighted because the Trim28 expression value goes up strongly, from 7041 to 9124. The role of these startups is unknown, though they may reflect the general repression of components of the ES cell-specific self-renewal network by Trim28.
Case Study 3 - Analysis of ageing-related experiments
Subnetwork identification by jActiveModules for Case Studies 1-3
To put the results obtained in case studies 1-3 into the context of related work, we used jActiveModules to analyze the same data, identifying 'active modules', that are subnetworks where the constituent genes show significant changes in expression over the two conditions we investigate. As discussed in the section on 'Related Work', the aim of ExprEssence is quite different, namely the identification of single links (interactions, stimulations, inhibitions) and genes affected in the course of an experiment, where the links do not necessarily have to build up a connected subnetwork. Furthermore, ExprEssence exploits the knowledge about stimulations and inhibitions that may be encoded in the network.
We used jActiveModules with default parameters. In contrast to ExprEssence, which takes two expression values per gene (one for each experimental condition), jActiveModules requires one p-value per gene (describing the statistical significance of the expression change between the two experimental conditions; p-values were used as calculated while processing the raw expression data for the case studies).
Overall, we observe an overlap of results between our tool and jActiveModules. In all case studies, jActiveModules did not identify many of the links/effects on genes that we discovered and validated. However, it identified interesting subnetworks (around Nrp1; Klf4-Arid3a; around TP53) that are plausible and worth investigating. Most importantly, however, ExprEssence can distinguish stimulations and inhibitions, and by marking links in thick green or red color, we enable a more informed focus on single links and genes, directly yielding suggestions for experiments that may test the hypotheses we generate.
The most important limitation of our approach is that highlighting is neither necessary nor sufficient for detecting mechanistic change. More specifically, it is quite possible that no change (no startup or shutdown of an interaction, stimulation or inhibition) happens across a highlighted link, or that change happens across a link that is not highlighted. The main reason for this problem is missing accuracy (in terms of sensitivity, i.e. false negatives, and specificity, i.e. false positives) of both network and measured data. In particular, many networks are seriously incomplete, so that we cannot highlight the 'essential' mechanisms simply because there are no links in the network that represent them. For example, the main mechanism may be mediated by a regulatory RNA, which may be neither represented in the network, nor in the expression data gained by microarray experiments. Then, we simply cannot discover it, and the mechanisms that are highlighted will be either minor, or simply false positive. To give another example, imagine that the network data do not cover a gene C that acts on both A and B, but it includes the link A → B. Then, the link may be highlighted even though C is acting on both A and B, and nothing more. Suffice it to say, hypotheses generated with the help of ExprEssence have to be validated experimentally. On the other hand, in a signaling cascade, the mode of change (information flow) may be via phosphorylation events that cannot be measured by expression data. Then, A may stimulate B via the link A → B, but no change is detectable in the differential expression data, and no highlighting occurs.
With our approach towards identification of the critical parts of a gene/protein network using differential data, we offer a means to easily become aware of changes in gene/protein relationships that can be observed by contrasting two experimental conditions. We consider not only physical interactions between proteins but also stimulations and inhibitions, and treat each link type accordingly in order to gain specific insights into regulatory aspects. ExprEssence identifies startup/shutdown along all three link types (interaction, stimulation, inhibition) in a coherent manner. Our method does not depend on a specific type of network or experimental data, as long as edges in the network connect entities influencing each other and the experimental data can be interpreted as measurements proportional to the abundance of the entities.
The statistical basis for comparing the link scores of different edges depends on the input data: if no replicates are available, the plugin works without any measure of variability and allows exploration of the dataset. If replicates are given, the plugin uses Welch's formula to improve the comparability of link scores by taking the variability of the measurements into account.
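To illustrate why replicates improve comparability, the following minimal sketch computes the standard Welch statistic for one gene measured under two conditions. It is not the plugin's code: how ExprEssence folds this quantity into a link score is not shown here, and the function name `welch_statistic` and the example values are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def welch_statistic(cond1, cond2):
    """Welch's t statistic for two groups of replicate measurements.

    Hypothetical helper for illustration; it does not reproduce the
    plugin's internal link score computation.
    """
    n1, n2 = len(cond1), len(cond2)
    if n1 < 2 or n2 < 2:
        raise ValueError("need at least two replicates per condition")
    # Standard error of the difference in means, computed from the two
    # sample variances separately (no equal-variance assumption).
    se = sqrt(variance(cond1) / n1 + variance(cond2) / n2)
    if se == 0:
        raise ValueError("no variability in either condition")
    return (mean(cond2) - mean(cond1)) / se


# Made-up expression values: both comparisons have the same difference
# in means (4.0), but the second one has noisier replicates.
baseline = [1.0, 2.0, 3.0]
consistent = [4.0, 6.0, 8.0]
noisy = [2.0, 6.0, 10.0]

t_consistent = welch_statistic(baseline, consistent)
t_noisy = welch_statistic(baseline, noisy)
```

With the same difference in means, the noisier replicates yield the smaller statistic, so a change of a given size counts for less when the measurements behind it vary more.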
Despite its limitations, ExprEssence is a simple, straightforward and easy-to-use tool for hypothesis building that supports a mechanistic interpretation of experiments and helps to see the forest for the trees in a large amount of data.
PodocyteCellMatrix.cys, Epiblast.cys, DNA_Damage.cys. Cytoscape Session files containing the original network, expression data and condensed network from case studies 1-3.
Funding by the DFG SPP 1356, Pluripotency and Cellular Reprogramming (FU583/2-1), by the BMBF (01GN0901 & 01GN0805 Generation of pluri- and multipotent stem cells) and by European Foundation for the Study of Diabetes (EFSD)/Novo Nordisk is gratefully acknowledged.