ProtNet
ProtNet [12] is a cell automaton that permits the simulation and analysis the dynamic interactions occurring in a cell lattice under a set of interaction rules The simulator iterates for a number of time steps following the interaction rules contained in an input graph in which the edges link interacting proteins. A probability of forming and breaking a bond at each time step is associated to each pair of proteins. The algorithm runs the following steps:
-
An empty lattice containing a given number of cubic sites is created. The linear dimension of the lattice is given as input to ProtNet.
-
A single randomly oriented monomeric protein can occupy a lattice site. The number of proteins for each molecular species is fixed at the beginning of the simulation. Protein monomers are distributed randomly in the lattice cells.
Each simulation step is composed of two phases leading to a change in the cell configuration:
Interaction phase: The entire grid is visited site-by-site. If the site is occupied, the protein in it can make a connection to proteins occupying neighboring sites, thus forming a new complex. Alternatively an existing bond can be broken thus disrupting an existing complex. Binding is stochastic and the probability of forming a complex (pon) is related to the association rate constant (kon) of the interacting protein pair. After association a complex moves as a single entity. Similarly a binding between two proteins can break down with a dissociation probability (poff) related to their dissociation rates. Since kinetic constants describing the association and dissociation of all pairs of proteins are not currently available such probability has been set to an arbitrary value that is the same for all the proteins as suggested by [12].
Diffusion phase: A second site by site visit is carried out. If the site is occupied, the protein and the complex it belongs to are first rotated and then translated. Complexes or proteins can rotate by 90 degrees around their center of mass in a randomly chosen direction. Rotations are rigid, that is, in multi protein structures, the whole complex undergoes a rotation and there is no torsion. For multi protein complexes the probability of rotation is inversely proportional to their diameter. During the rotation and translation proteins occupying target sites can be recursively moved with a probability that is a function of the ratio between the masses of the hitting and hit protein/complex.
As the diffusion coefficient of a molecule in a dilute solution increases linearly with the inverse of the radius, monomers have the highest probability of moving to one of the neighbouring sites at each time step. As a consequence complexes diffuse more slowly. In addition we assumed that all proteins have identical diffusion coefficients.
Proteins in ProtNet have six binding sites, thus a single protein can bind to, at most, six other partners. To prevent the formation of very large complexes, each protein species can participate in a given complex only once.
For each simulation step ProtNet produces a list of the complexes in the reference pseudo-cell, a dynamic and stochastic representation of the information contained in static protein interaction graphs.
Experimental settings
For our experiments the three-dimensional pseudocell linear dimension has been fixed to 50. Thus the whole lattice contains 125000 sites. The lattice is filled with proteins such that the occupancy is 20%. The interaction rules for natural networks are obtained from the human and yeast interactomes. The probability of creating a bond is set to 0.7 for all the protein pairs, while the probability of breaking a bond is set to 0.002. We have carried out simulations for 150000 steps on natural and random pseudocells. A simulation step, with a lattice containing 125000 sites at a protein concentration of 20%, takes 0.03 seconds.
To each natural network we associate a random network obtained by the Erdös-Rényi (ER) model [22] denoted as G(N, p). The method starts with N vertices and randomly links two nodes with probability p. In order to have the same number of edges of the natural networks we set where m is the desired number of edges and N is the number of proteins in the network. We set (m = 4714 and N = 1890).
Network construction
The input used in ProtNet is a list of binary interactions sorted by relevance score [14]. yeast_net is a network extracted from mentha [14]. The complete list of interactions was filtered in order to minimize false-positives and use only interactions described by more than one method. The resulting network is composed of 1890 protein species and 4714 interactions.
human_net, similarly to yeast_net is a network extracted from mentha, and containing a ranked list of human protein interactions that was limited to a size comparable to that of yeast_net.
We associate a node to each protein species in the list and draw an edge for each experimentally reported interaction. The interactomes used for our experiments can be formally represented as undirected unweighted graphs.
Network analysis
To calculate and compare the topological properties of the different networks we used Cytoscape [33], a software platform for the visualization of molecular interaction networks extended with the NetworkAnalyzer plug-in [34]. From the plethora of available metrics we considered Clustering coefficient [30], Transitivity [31], Modularity [35], Connectivity and Degree distribution [36], Average path length, Diameter and Scale-free fitting index [29].
Self-Similarity Index
In order to analyze the similarity between complexes we have introduced a measure called Self-Similarity Index. For our purpose we define a complex as a set of interacting proteins. Let and be the set of distinct protein species in two complexes. Their similarity index is defined as:
where Δ is the symmetric difference of the two complexes. The intersection between the two complexes is raised to the second power to give more importance to larger complexes.
Let be the list of complexes within a pseudocell, we define the Self-Similarity Index of the pseudocell as follows:
-
We first generate a similarity matrix consisting of n rows and n columns as in Additional file 5.
-
Each entry (i,j) of the similarity matrix contains the value of the Similarity Index (S.I.) between the complexes Ci and Cj.
-
Then the Self-Similarity Index can be computed using the following formula:
In other words the Self-Similarity Index of a pseudo-cell C is the mean value of the best similar pairwise similarities of the complexes in C.
Inter-Cells Similarity Index (ICSI)
Given the definition of the similarity index between two complexes it is possible to compare the complex composition of different pseudo-cells and measure their similarity value.
Let and be the list of complexes of two pseudo-cells A and B. The Inter-cells Similarity Index can be computed as follows:
First generate a similarity matrix with n rows and m columns (Additional file 6). Then the algorithm proceeds as follows:
-
Select the two complexes with the largest Similarity Index
-
Store their Similarity Index into a List L
-
Remove from the matrix the row and the column associated to those complexes
-
Repeat the previous three steps until either columns or rows are finished
In other words at each step the algorithm finds the best matching complexes and computes their Similarity Indexes. The Inter-Cells Similarity Index (ICSI) is then computed as the mean value of the Similarity Indexes stored into the list.
Generation of perturbed random networks
The comparison of different types of pseudo-cells to identify the topological metrics responsible for self-organization is based on the use of different initial networks, with tunable topological parameters as input for ProtNet. Different algorithms are available in the literature to generate random networks with tunable parameters [23–27]. Most of them are based on the configuration model proposed by Molloy and Reed which produces a random graph with a prescribed degree sequence. In the configuration model each node has an assigned potential number of edges called stubs. Edges are created randomly choosing two nodes with free stubs. An algorithm used to generate random networks with tunable degree distribution and transitivity is given in [28]. It is based on two mechanisms known as preferential attachment and dynamic growth. The input values are the number of nodes, a list of degrees to assign to nodes and the desired transitivity value. The algorithm works as follows:
-
Start initializing all nodes with a degree drawn from the degree list.
-
A starting node vo is randomly selected from the list of nodes
-
Neighbors are matched in the following way:
-
Form a list called PotentialTriads of all the nodes at distance 2 from the current node vi
-
For each node in PotentialTriads form a connection with the current node with a probability depending on the desired Transitivity coefficient.
-
If no neighbors were selected from PotentialTriads, randomly select a new node to add to the network.
The network grows until all nodes have neighbors.