### The adjacency matrix and notation

We study the properties of an adjacency matrix (network) *A* that satisfies the following three conditions:

(A.1) *A* is symmetric and has dimension *n × n*.

(A.2) The entries of *A* are bounded within [0, 1], that is, 0 ≤ *a*_{ij} ≤ 1 for all 1 ≤ *i*, *j* ≤ *n*.

(A.3) The diagonal elements of *A* are all 1, that is, *a*_{ii} = 1 for all 1 ≤ *i* ≤ *n*.
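Conditions (A.1) to (A.3) are straightforward to check numerically. The following Python sketch is illustrative only: the function name `is_valid_adjacency` and the tolerance `tol` are our own assumptions and are not part of the original analysis.

```python
def is_valid_adjacency(A, tol=1e-12):
    """Return True if the nested list A satisfies conditions (A.1)-(A.3)."""
    n = len(A)
    # (A.1) the matrix must be square
    if any(len(row) != n for row in A):
        return False
    for i in range(n):
        # (A.3) unit diagonal
        if abs(A[i][i] - 1.0) > tol:
            return False
        for j in range(n):
            # (A.2) entries bounded within [0, 1]
            if A[i][j] < -tol or A[i][j] > 1.0 + tol:
                return False
            # (A.1) symmetry
            if abs(A[i][j] - A[j][i]) > tol:
                return False
    return True


example = [[1.0, 0.5, 0.2],
           [0.5, 1.0, 0.4],
           [0.2, 0.4, 1.0]]
print(is_valid_adjacency(example))
```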

### Uniqueness of the conformity for an exactly factorizable network

One can easily show that the vector CF is not unique if an exactly factorizable network contains only *n* = 2 nodes. However, for *n* > 2 the conformity is uniquely defined when dealing with a weighted network where *a*_{ij} > 0.

Specifically, we prove the following statement. If *A* is an *n* × *n* (*n* ≥ 3) dimensional adjacency matrix with positive entries (*a*_{ij} > 0), then the system of equations in (7) has at most one solution CF with positive entries. If the solution exists, it is given by

$$CF_i = \frac{p_i^{1/(n-2)}}{\left(\prod_{j=1}^{n} p_j\right)^{1/(2(n-1)(n-2))}},$$

where $p_i = \prod_{j \neq i} a_{ij}$ denotes the 'product connectivity' of the *i*-th node.

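Proof: by assumption, we have *a*_{ij} = *CF*_{i}*CF*_{j} for all *i* ≠ *j*, for a positive vector CF and *n* ≥ 3. Multiplying both sides of equation (7) over all pairs *i* ≠ *j* yields

$$\prod_{k=1}^{n} \prod_{l \neq k} a_{kl} = \left(\prod_{k=1}^{n} CF_k\right)^{2(n-1)}.$$

Since $\prod_{k} CF_k$ is positive, we find

$$\prod_{k=1}^{n} CF_k = \left(\prod_{k=1}^{n} \prod_{l \neq k} a_{kl}\right)^{1/(2(n-1))}.$$

Similarly, eliminating the *i*-th row and column from *A* yields

$$\prod_{k \neq i} CF_k = \left(\prod_{k \neq i} \prod_{l \neq k, i} a_{kl}\right)^{1/(2(n-2))}.$$

Since $CF_i = \prod_{k} CF_k \, / \prod_{k \neq i} CF_k$, we conclude that *CF*_{i} is uniquely defined by

$$CF_i = \frac{\left(\prod_{k=1}^{n} \prod_{l \neq k} a_{kl}\right)^{1/(2(n-1))}}{\left(\prod_{k \neq i} \prod_{l \neq k, i} a_{kl}\right)^{1/(2(n-2))}}.$$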
### Network concept functions and fundamental network concepts

In general, we define a *network concept function* to be a tensor-valued function (e.g. the connectivity vector) that takes a square matrix (e.g. the network adjacency matrix) as input.

Denote by *M* = [*m*_{ij}] a general *n* × *n* matrix. Then we will study the following network concept functions:

where the components of matrix *B*_{M} in the denominator of the clustering coefficient function are given by *b*_{ij} = 1 if *i* ≠ *j* and *b*_{ii} = *Ind*(*m*_{ii} > 0). Here the indicator function *Ind*(·) takes on the value 1 if the condition is satisfied and 0 otherwise.

For the sake of brevity, we study only a limited selection of network concept functions and do not claim that these are more important than others studied in the literature. Our general formalism for relating fundamental network concepts to their approximate CF-based analogs should allow the reader to adapt our derivations to alternative concepts as well.

Now we are ready to define the fundamental network concepts that are studied in this article.

**Definition 5 (Fundamental Network Concept)** *The fundamental network concepts of a network A are defined by evaluating the network concept functions (equation* (21)*) on A* - *I, i.e.* *FundamentalNetworkConcept* = *NetworkConcept*(*A* - *I*).

As special cases of this definition, we find the following concepts. The **connectivity** (also known as degree) of the *i*-th node is given by

$$k_i = \mathit{Connectivity}_i(A - I) = \sum_{j \neq i} a_{ij}.$$

The **line density** [13] equals the mean adjacency, i.e.

$$\mathit{Density} = \mathit{Density}(A - I) = \frac{\sum_{i} \sum_{j \neq i} a_{ij}}{n(n-1)}.$$

For notational convenience, we sometimes omit the reference to the adjacency matrix and simply write *Density* to denote the fundamental network concepts.

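The normalized connectivity **centralization** (also known as degree centralization) [14] is given by

$$\mathit{Centralization} = \frac{n}{n-2} \left( \frac{\max_i (k_i)}{n-1} - \mathit{Density} \right),$$

where $k_i = \sum_{j \neq i} a_{ij}$ is the connectivity of the *i*-th node.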

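Our definition of the network **heterogeneity** equals the coefficient of variation of the connectivity distribution, i.e.

$$\mathit{Heterogeneity} = \frac{\sqrt{\mathit{variance}(k)}}{\mathit{mean}(k)},$$

where $k = (k_1, \ldots, k_n)$ with $k_i = \sum_{j \neq i} a_{ij}$.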

Note that *Heterogeneity*(*b* * *M*) = *Heterogeneity*(*M*) for a scalar *b* ≠ 0.
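As an illustration, the whole-network concepts above can be evaluated on *A* - *I* with a few lines of Python. This is a sketch under stated assumptions: the centralization uses the normalization *n*/(*n* - 2) · (max(*k*)/(*n* - 1) - *Density*), the heterogeneity uses the population variance (divisor *n*), and all function names are hypothetical.

```python
import math


def minus_identity(A):
    """Return A - I for a nested-list matrix A."""
    n = len(A)
    return [[A[i][j] - (1.0 if i == j else 0.0) for j in range(n)]
            for i in range(n)]


def connectivity(M):
    """Row sums of M."""
    return [sum(row) for row in M]


def density(M):
    n = len(M)
    return sum(connectivity(M)) / (n * (n - 1))


def centralization(M):
    # Assumed normalization: n/(n-2) * (max(k)/(n-1) - Density)
    n = len(M)
    return n / (n - 2) * (max(connectivity(M)) / (n - 1) - density(M))


def heterogeneity(M):
    # Coefficient of variation of the connectivity (population variance)
    k = connectivity(M)
    mean = sum(k) / len(k)
    var = sum((x - mean) ** 2 for x in k) / len(k)
    return math.sqrt(var) / mean


A = [[1.0, 0.5, 0.2, 0.1],
     [0.5, 1.0, 0.4, 0.3],
     [0.2, 0.4, 1.0, 0.6],
     [0.1, 0.3, 0.6, 1.0]]
M = minus_identity(A)
print(connectivity(M), density(M), centralization(M), heterogeneity(M))
```

Rescaling `M` by a positive scalar leaves `heterogeneity(M)` unchanged, in line with the note above.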

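The **clustering coefficient** of node *i* is a density measure of local connections, or 'cliquishness' [19, 20]. Specifically,

$$\mathit{ClusterCoef}_i = \frac{\sum_{j \neq i} \sum_{l \neq i, j} a_{ij} a_{jl} a_{li}}{\left(\sum_{j \neq i} a_{ij}\right)^2 - \sum_{j \neq i} a_{ij}^2}.$$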

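The **topological overlap** between nodes *i* and *j* reflects their relative interconnectedness. It is defined by

$$\mathit{TopOverlap}_{ij} = \frac{l_{ij} + a_{ij}}{\min(k_i, k_j) + 1 - a_{ij}},$$

where *l*_{ij} = ∑_{u≠i,j} *a*_{iu}*a*_{uj} and *k*_{i} = ∑_{u≠i} *a*_{iu} denotes the connectivity of the *i*-th node.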
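The two node-level concepts can be sketched in Python as follows. The formulas coded here are assumptions on our part: the clustering coefficient is taken as ∑_{j≠i} ∑_{l≠i,j} *a*_{ij}*a*_{jl}*a*_{li} divided by (∑_{j≠i} *a*_{ij})^{2} - ∑_{j≠i} *a*_{ij}^{2}, and the topological overlap as (*l*_{ij} + *a*_{ij}) / (min(*k*_{i}, *k*_{j}) + 1 - *a*_{ij}) for *i* ≠ *j*. The function names and the convention of returning 1 for *i* = *j* are hypothetical choices.

```python
def cluster_coef(A, i):
    """Weighted clustering coefficient of node i (diagonal of A is ignored)."""
    n = len(A)
    others = [j for j in range(n) if j != i]
    num = sum(A[i][j] * A[j][l] * A[l][i]
              for j in others for l in others if l != j)
    s1 = sum(A[i][j] for j in others)
    s2 = sum(A[i][j] ** 2 for j in others)
    den = s1 ** 2 - s2
    # Convention (assumption): return 0 when the denominator vanishes
    return num / den if den > 0 else 0.0


def top_overlap(A, i, j):
    """Topological overlap between nodes i and j."""
    if i == j:
        return 1.0  # convention (assumption) for the diagonal
    n = len(A)
    # l_ij = sum over shared neighbours u != i, j of a_iu * a_uj
    l_ij = sum(A[i][u] * A[u][j] for u in range(n) if u != i and u != j)
    k_i = sum(A[i][u] for u in range(n) if u != i)
    k_j = sum(A[j][u] for u in range(n) if u != j)
    return (l_ij + A[i][j]) / (min(k_i, k_j) + 1.0 - A[i][j])


# A fully connected unweighted network of 4 nodes
full = [[1.0] * 4 for _ in range(4)]
print(cluster_coef(full, 0), top_overlap(full, 0, 1))
```

For a fully connected unweighted network both quantities reach their maximum, which is a quick sanity check on the implementation.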

### Network concepts in exactly factorizable networks

In the following, we will present explicit formulas for the fundamental network concepts in Definition 5 when the adjacency matrix *A* is exactly factorizable, i.e. if *a*_{ij} = *CF*_{i}*CF*_{j} for all *i* ≠ *j*. We define the CF-based adjacency matrix as follows

*A*_{CF} := CF CF^{τ} - *diag*(CF^{2}) + *I*, (27)

where *diag*(CF^{2}) denotes the diagonal matrix with diagonal elements *CF*_{i}^{2}, *i* = 1, ..., *n*. Then one can easily show that for exactly factorizable networks

*A* - *I* = *A*_{CF} - *I* = CF CF^{τ} - *diag*(CF^{2}).

Using our definition of network concept functions in equations (21), one can easily derive the following formulas for *NetworkConcept*(*A*_{CF} - *I*) in terms of the quantities *S*_{p}(CF) = ∑_{i} *CF*_{i}^{p}.
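Two such formulas follow directly from row sums of CF CF^{τ} - *diag*(CF^{2}): the connectivity of node *i* equals *CF*_{i} *S*_{1}(CF) - *CF*_{i}^{2}, and the density equals (*S*_{1}(CF)^{2} - *S*_{2}(CF)) / (*n*(*n* - 1)). The Python sketch below checks both identities on a toy conformity vector; the names `power_sum` and `cf_adjacency_minus_identity` and the vector `cf` are hypothetical.

```python
def power_sum(cf, p):
    """S_p(CF) = sum_i CF_i ** p."""
    return sum(c ** p for c in cf)


def cf_adjacency_minus_identity(cf):
    """A_CF - I = CF CF^T - diag(CF^2), i.e. zero diagonal."""
    n = len(cf)
    return [[0.0 if i == j else cf[i] * cf[j] for j in range(n)]
            for i in range(n)]


cf = [0.9, 0.6, 0.4, 0.7, 0.2]
n = len(cf)
M = cf_adjacency_minus_identity(cf)

# Direct evaluation of the network concept functions on A_CF - I
k_direct = [sum(row) for row in M]
density_direct = sum(k_direct) / (n * (n - 1))

# Closed forms in terms of S_1 and S_2
s1, s2 = power_sum(cf, 1), power_sum(cf, 2)
k_formula = [c * s1 - c ** 2 for c in cf]
density_formula = (s1 ** 2 - s2) / (n * (n - 1))

print(max(abs(a - b) for a, b in zip(k_direct, k_formula)))
print(abs(density_direct - density_formula))
```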

### Approximate CF-based network concepts in general networks

When *A*_{CF} - *I* is used as input of a network concept function, it gives rise to a CF-based network concept as detailed in the following

**Definition 6 (CF-based Network Concepts)** *Assume that the conformity vector* CF *can be defined for a general adjacency matrix A. Then the CF-based network concepts are defined by evaluating the network concept functions on A*_{CF} - *I* = CF CF^{τ} - *diag*(CF^{2}), *i.e.*

*NetworkConcept*_{CF} := *NetworkConcept*(*A*_{CF} - *I*).

By definition, fundamental network concepts are equal to their CF-based analogs if *A* is exactly factorizable.

In the following, we define *approximate* CF-based analogs of the fundamental network concepts. The theoretical advantage of these approximate CF-based concepts is that they satisfy simple relationships. Define the *approximate CF-based adjacency matrix* as follows

*A*_{CF,app} = CF CF^{τ}. (30)

Note that only the diagonal elements differ between *A*_{CF,app} and *A*_{CF}. We define the approximate CF-based network concepts by using *A*_{CF,app} as input of the network concept functions as detailed in the following

**Definition 7 (Approximate CF-based Network Concepts)** *The approximate CF-based network concepts of a network A with conformity* CF *are defined by evaluating the network concept functions (equations* (21)*) on A*_{CF,app} = CF CF^{τ}, *i.e.* *NetworkConcept*_{CF,app} := *NetworkConcept*(*A*_{CF,app}).

### In approximately factorizable networks, fundamental network concepts are approximately equal to their approximate CF-based analogs

Here we will provide a heuristic derivation of Observation 2. Since the components of CF are positive, one can easily show that *S*_{4}(CF) ≤ *S*_{2}(CF)^{2}. For many large, exactly factorizable networks, the ratio *S*_{4}(CF)/*S*_{2}(CF)^{2} is close to 0. Since *S*_{4}(CF)/*S*_{2}(CF)^{2} = ||*diag*(CF^{2})||_{F}^{2} / ||*A*_{CF,app}||_{F}^{2}, where ||·||_{F} denotes the Frobenius norm and *diag*(CF^{2}) = *A*_{CF,app} - (*A*_{CF} - *I*), this implies that *A*_{CF} - *I* ≈ *A*_{CF,app}. Since the network concept functions are continuous functions, this implies *NetworkConcept*(*A*_{CF} - *I*) ≈ *NetworkConcept*(*A*_{CF,app}). These derivations are summarized in the following

**Observation 8 (Approximate Formulas for CF-based Concepts)** *If S*_{4}(CF)/*S*_{2}(CF)^{2} ≈ 0, *then*

*NetworkConcept*(*A*_{CF} - *I*) ≈ *NetworkConcept*(*A*_{CF,app}). (31)
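A small numerical experiment illustrates the approximation in equation (31) for one concept, the density. The sketch below is not taken from the original analysis: the network size `n = 200` and the conformity values are arbitrary illustrative choices, and the helper names are hypothetical.

```python
def power_sum(cf, p):
    """S_p(CF) = sum_i CF_i ** p."""
    return sum(c ** p for c in cf)


def density(M):
    """Sum of all entries of M divided by n(n - 1)."""
    n = len(M)
    return sum(sum(row) for row in M) / (n * (n - 1))


n = 200
# Arbitrary conformities between 0.3 and 0.75 (illustrative assumption)
cf = [0.3 + 0.05 * (i % 10) for i in range(n)]

# A_CF - I has a zero diagonal; A_CF,app keeps CF_i^2 on the diagonal
A_cf_minus_I = [[0.0 if i == j else cf[i] * cf[j] for j in range(n)]
                for i in range(n)]
A_cf_app = [[cf[i] * cf[j] for j in range(n)] for i in range(n)]

ratio = power_sum(cf, 4) / power_sum(cf, 2) ** 2
d_exact = density(A_cf_minus_I)
d_app = density(A_cf_app)
rel_diff = abs(d_app - d_exact) / d_exact

print("S4/S2^2:", ratio)
print("relative difference in density:", rel_diff)
```

With these values both printed quantities are small, so the approximate CF-based density is close to the CF-based density, as the observation suggests.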

In particular, for exactly factorizable networks (i.e. *A* - *I* = *A*_{CF} - *I*), this means that the fundamental network concepts can be approximated by their approximate CF-based analogs.

In our real data applications, we show empirically that equation (31) holds even in networks that satisfy the assumptions of Observation 8 only approximately.

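In the appendix (equation (43)), we define a measure of network factorizability as follows

$$F(A) = 1 - \frac{\|A - A_{CF}\|_F^2}{\|A - I\|_F^2} = 1 - \frac{\sum_{i} \sum_{j \neq i} (a_{ij} - CF_i CF_j)^2}{\sum_{i} \sum_{j \neq i} a_{ij}^2}.$$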

Thus, in approximately factorizable networks (i.e. *F*(*A*) close to 1), *A* - *I* can be approximated by *A*_{CF} - *I*. For continuous network concept functions, this implies *NetworkConcept*(*A* - *I*) ≈ *NetworkConcept*(*A*_{CF} - *I*),

i.e. the fundamental network concepts are approximately equal to their CF-based analogs in approximately factorizable networks. Observation 8 states that *NetworkConcept*(*A*_{CF} - *I*) ≈ *NetworkConcept*(*A*_{CF,app}).

Combining the last two equations leads to *NetworkConcept*(*A* - *I*) ≈ *NetworkConcept*(*A*_{CF,app}). These derivations are summarized as follows.

In approximately factorizable networks, the fundamental network concepts are approximately equal to their approximate CF-based analogs, i.e. *FundamentalNetworkConcept* ≈ *NetworkConcept*_{CF,app}.
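To make the notion of an approximately factorizable network concrete, the sketch below computes a factorizability score for a given adjacency matrix and conformity vector. It assumes that the measure has the form *F*(*A*) = 1 - ∑_{i} ∑_{j≠i} (*a*_{ij} - *CF*_{i}*CF*_{j})^{2} / ∑_{i} ∑_{j≠i} *a*_{ij}^{2}; this form, the function name `factorizability` and the toy data are assumptions. How CF is estimated for a general network is not covered here.

```python
def factorizability(A, cf):
    """Assumed form: 1 - ||A - A_CF||_F^2 / ||A - I||_F^2 (off-diagonal sums)."""
    n = len(A)
    num = sum((A[i][j] - cf[i] * cf[j]) ** 2
              for i in range(n) for j in range(n) if i != j)
    den = sum(A[i][j] ** 2
              for i in range(n) for j in range(n) if i != j)
    return 1.0 - num / den


cf = [0.9, 0.8, 0.7, 0.6]
n = len(cf)

# Exactly factorizable network: F(A) equals 1
A_exact = [[1.0 if i == j else cf[i] * cf[j] for j in range(n)]
           for i in range(n)]
F_exact = factorizability(A_exact, cf)

# Perturb one symmetric pair of entries: F(A) drops slightly below 1
A_noisy = [row[:] for row in A_exact]
A_noisy[0][1] += 0.05
A_noisy[1][0] += 0.05
F_noisy = factorizability(A_noisy, cf)

print(F_exact, F_noisy)
```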

### Construction of gene co-expression networks

Gene co-expression networks are constructed from microarray data that measure the transcriptional response of cells to changing conditions. We consider the case of *n* genes with gene expression profiles across *m* microarray samples. Thus, the gene expression profiles are given by an *n* × *m* matrix

X = [*x*_{ij}] = (x_{1} x_{2} ... x_{n})^{τ}, *i* = 1, ..., *n*; *j* = 1, ..., *m*, (33)

where the *i*-th row x_{i}^{τ} is the transcriptional response of the *i*-th gene.

Recently, several groups have suggested thresholding the pairwise Pearson correlation coefficient *cor*(x_{i}, x_{j}) in order to arrive at gene co-expression networks, which are sometimes referred to as 'relevance' networks [11, 32]. In these networks, a node corresponds to the gene expression profile of a given gene. The corresponding adjacency matrix is determined from a measure of co-expression between the genes. In the examples below, we will use the absolute value of the Pearson correlation coefficient between the gene expression profiles to measure co-expression.

To transform the co-expression measure into an adjacency, one can make use of an *adjacency function*. The choice of the adjacency function determines whether the resulting network will be weighted (soft-thresholding) or unweighted (hard-thresholding). The adjacency function is a monotonically increasing function that maps the interval [0, 1] into [0, 1]. A widely used adjacency function is the signum function which implements 'hard' thresholding involving the threshold parameter *τ*. Specifically,

*a*_{ij} = *Signum*(|*cor*(x_{i}, x_{j})|, *τ*) = *Ind*(|*cor*(x_{i}, x_{j})| ≥ *τ*), (34)

where the indicator function *Ind*(·) takes on the value 1 if the condition is satisfied and 0 otherwise. Hard thresholding using the signum function leads to intuitive network concepts (e.g., the node connectivity equals the number of direct neighbors), but it may lead to a loss of information: if τ has been set to 0.8, there will be no connection between two nodes if their similarity equals 0.79.
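Equation (34) can be sketched in Python as follows. The toy expression matrix `X` (3 genes, 5 samples) and the helper names `pearson` and `signum_adjacency` are hypothetical; the diagonal is set to 1 explicitly so that condition (A.3) holds regardless of rounding.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equally long profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def signum_adjacency(X, tau):
    """Unweighted network: a_ij = Ind(|cor(x_i, x_j)| >= tau)."""
    n = len(X)
    return [[1 if i == j or abs(pearson(X[i], X[j])) >= tau else 0
             for j in range(n)]
            for i in range(n)]


# Rows are genes, columns are samples (toy data)
X = [[1.0, 2.0, 3.0, 4.0, 5.0],
     [2.0, 4.0, 6.0, 8.0, 10.5],
     [5.0, 1.0, 4.0, 2.0, 3.0]]
A_hard = signum_adjacency(X, tau=0.65)
print(A_hard)
```

In this toy example only the first two genes are strongly correlated, so they are the only connected pair.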

To avoid the disadvantages of hard thresholding, we proposed a 'soft' thresholding approach that raises the absolute value of the correlation to the power *β* ≥ 1 [21], i.e.

*a*_{ij} = *Power*(|*cor*(x_{i}, x_{j})|, *β*) = |*cor*(x_{i}, x_{j})|^{β}. (35)
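The power adjacency function of equation (35) is a one-liner. The sketch below applies it to precomputed correlation values; the function name `power_adjacency` and the example correlations are hypothetical. It contrasts soft thresholding with the hard-thresholding example of similarities 0.79 and 0.8.

```python
def power_adjacency(cor, beta):
    """Weighted adjacency: |cor| ** beta, with beta >= 1."""
    if beta < 1:
        raise ValueError("beta must be at least 1")
    return abs(cor) ** beta


beta = 7
for cor in (0.79, 0.80, -0.80, 0.30):
    print(cor, power_adjacency(cor, beta))
```

Similarities of 0.79 and 0.8 map to nearby adjacencies, whereas a hard threshold of 0.8 would assign them 0 and 1. The sign of the correlation is discarded because the absolute value is used.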

In our yeast cell cycle gene co-expression network analysis, we followed the analysis steps described in [21]. Briefly, we used the 2001 most varying and connected genes. Next, we used the power adjacency function with *β* = 7 (equation (35)) to construct a weighted gene co-expression network and the signum adjacency function with *τ* = 0.65 (equation (34)) to construct an unweighted network.

Using our R software tutorial, the reader can easily verify that our conclusions are highly robust with respect to a) different ways of constructing co-expression networks and b) different ways of constructing modules.