Given an RNA-seq dataset, we construct a co-splicing network where nodes represent exons and edges are weighted by the correlation between two exon inclusion rate profiles. Given *m* co-splicing networks with the same *n* nodes but different edge weights, we can represent the whole system as a 3^{rd}-order tensor 𝒜 = (*a*_{ijk})_{*n*×*n*×*m*}, where *a*_{ijk} is the weight of the edge between exons *i* and *j* in network *k*. A *frequent co-splicing cluster* (FSC) in the tensor 𝒜 can be defined by two membership vectors: (i) the *exon membership vector* **x** = (*x*_{1}, ..., *x*_{n})^{T}, where *x*_{i} = 1 if exon *i* belongs to the cluster and *x*_{i} = 0 otherwise; and (ii) the *network membership vector* **y** = (*y*_{1}, ..., *y*_{m})^{T}, where *y*_{j} = 1 if the exons of the cluster are heavily interconnected in network *j* and *y*_{j} = 0 otherwise. The summed weight of all edges in the FSC is

*H*(**x**, **y**) = Σ_{i,j,k} *a*_{ijk} *x*_{i} *x*_{j} *y*_{k}.   (1)

Note that only the weights of edges *a*_{ijk} with *x*_{i} = *x*_{j} = *y*_{k} = 1 are counted in *H*(**x**, **y**). Thus, *H*(**x**, **y**) measures the "heaviness" of the FSC defined by **x** and **y**. The problem of discovering a frequent co-splicing cluster can be formulated as a discrete combinatorial optimization problem: *among all patterns of fixed size (K*_{1} *member exons and K*_{2} *member networks), we look for the heaviest*. This is also an integer programming problem: find the binary membership vectors **x** and **y** that jointly maximize *H*(**x**, **y**) under the constraints Σ_{i} *x*_{i} = *K*_{1} and Σ_{j} *y*_{j} = *K*_{2}. However, there are several major drawbacks to this discrete formulation.
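As a concrete check of the heaviness in Eq. (1), the summed weight of a binary pattern can be computed directly from the tensor. The sketch below uses NumPy; the sizes and random weights are illustrative, not from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 6, 4                          # n exons, m networks (toy sizes)
A = rng.random((n, n, m))            # edge weights a_ijk
A = (A + A.transpose(1, 0, 2)) / 2   # make each network symmetric

# Binary membership vectors: exons {0, 1, 2} in networks {0, 1}.
x = np.array([1, 1, 1, 0, 0, 0], dtype=float)
y = np.array([1, 1, 0, 0], dtype=float)

# Eq. (1): H = sum_ijk a_ijk * x_i * x_j * y_k
H = np.einsum('ijk,i,j,k->', A, x, x, y)

# Brute-force check: only edges with x_i = x_j = y_k = 1 are counted.
H_check = sum(A[i, j, k]
              for i in range(n) for j in range(n) for k in range(m)
              if x[i] == x[j] == y[k] == 1)
assert np.isclose(H, H_check)
```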

The first is *parameter dependence*: the size parameters *K*_{1} and *K*_{2} are hard for users to provide and control. The second is *high computational complexity*: the task is provably NP-hard (see additional file 1: Section S1) and therefore not solvable in a reasonable time even for small datasets. The discrete optimization problem is thus infeasible for an integrative analysis of many massive networks. Instead, we solve a continuous optimization problem with the same objective by relaxing the integer constraints to continuous constraints. That is, we look for non-negative real vectors **x** and **y** that jointly maximize *H*(**x**, **y**). This optimization problem is formally expressed as follows:

max *H*(**x**, **y**) = Σ_{i,j,k} *a*_{ijk} *x*_{i} *x*_{j} *y*_{k} subject to *f*(**x**) = 1, *g*(**y**) = 1, **x** ∈ ℝ_{+}^{n}, **y** ∈ ℝ_{+}^{m},   (2)

where ℝ_{+} is the non-negative real space, and *f*(**x**) and *g*(**y**) are vector norms. After solving Eq. (2), users can easily identify the top-ranking networks (after sorting the tensor by **y**) and the top-ranking exons (after sorting each network by **x**) contributing to the objective function. After rearranging the networks in this manner, the FSC with the largest heaviness occupies a corner of the 3D tensor. We can then mask all edges in the heaviest FSC with zeros, and optimize Eq. (2) again to search for the next FSC.
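The sort-and-mask loop can be illustrated on a toy tensor with a planted cluster. Note the membership scores below (summed edge weight per exon and per network) are a stand-in for illustration only; the paper obtains continuous **x** and **y** by actually optimizing Eq. (2):

```python
import numpy as np

rng = np.random.default_rng(1)
n, m = 8, 5
A = rng.random((n, n, m)) * 0.1          # weak background weights
A = (A + A.transpose(1, 0, 2)) / 2
# Plant a heavy cluster: exons {0, 1, 2} tightly connected in networks {0, 1, 2}.
A[np.ix_(range(3), range(3), range(3))] = 0.9

# Stand-in continuous membership scores (NOT the paper's optimizer).
x_cont = A.sum(axis=(1, 2))
y_cont = A.sum(axis=(0, 1))

# Sorting by the scores recovers the planted pattern in the tensor's corner.
top_exons = np.argsort(x_cont)[::-1][:3]
top_nets = np.argsort(y_cont)[::-1][:3]

# Mask the discovered FSC's edges with zeros and search again.
A[np.ix_(top_exons, top_exons, top_nets)] = 0.0
```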

The choice of vector norms in Eq. (2) has a significant impact on the outcome of the optimization. A vector norm defined as ∥**x**∥_{p} = (Σ_{i} |*x*_{i}|^{p})^{1/p}, where *p* > 0, is also called an "*L*_{p}-vector norm". In general, the closer *p* is to zero, the sparser the solution favored by the *L*_{p}-norm; that is, fewer components of the optimized vectors are significantly different from zero [41]. In contrast, as *p* increases, the solution favored by the *L*_{p}-norm grows smoother; in the extreme case *p* → ∞, the elements of the optimized vector are approximately equal to each other. For more details on these vector norms, refer to the additional file 1: Section S2. Our ideal membership vector selects *a small number of exons ("sparse") whose values are close to each other in magnitude ("smooth"), while the rest of the exons are close to zero.* Our past research [12] has shown that this goal can be achieved using the mixed norm *L*_{0,∞}(**x**) = *α*∥**x**∥_{0} + (1 - *α*)∥**x**∥_{∞} (0 < *α* < 1) for *f*(**x**). The norm *L*_{0} favors sparsity while the norm *L*_{∞} encourages smoothness in the non-zero components of **x**. In practice, we approximate *L*_{0,∞}(**x**) with another mixed norm: *L*_{p,2}(**x**) = *α*∥**x**∥_{p} + (1 - *α*)∥**x**∥_{2}, where *p* < 1. Our criteria for the network membership vector are similar. We want the exon cluster to appear in as many networks as possible, so the network membership values should be non-zero and close to each other. This is the typical outcome of optimization using the *L*_{∞} norm. In practice, we approximate *L*_{∞} with *L*_{q}(**y**), where *q* > 1, for *g*(**y**). Therefore, the vector norms *f*(**x**) and *g*(**y**) are fully specified as follows:

*f*(**x**) = *α*∥**x**∥_{p} + (1 - *α*)∥**x**∥_{2} (0 < *p* < 1, 0 < *α* < 1),  *g*(**y**) = ∥**y**∥_{q} (*q* > 1).   (3)
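A small numerical illustration of why these norms encode "sparse and smooth": under an equal *L*_{2} budget, the mixed norm *L*_{p,2} charges less for a sparse vector, and *L*_{q} with large *q* behaves like *L*_{∞}. The parameter values mirror the paper's choices, but the example vectors are arbitrary:

```python
import numpy as np

def f_norm(x, p=0.8, alpha=0.2):
    """Mixed norm L_{p,2}(x) = alpha*||x||_p + (1 - alpha)*||x||_2."""
    lp = np.sum(np.abs(x) ** p) ** (1.0 / p)
    l2 = np.sqrt(np.sum(x ** 2))
    return alpha * lp + (1 - alpha) * l2

def g_norm(y, q=10):
    """L_q norm; for large q it approximates L_inf = max |y_j|."""
    return np.sum(np.abs(y) ** q) ** (1.0 / q)

# Two vectors with the same L2 "energy": one sparse and smooth, one spread out.
sparse = np.array([1.0, 1.0, 0.0, 0.0])
spread = np.full(4, np.sqrt(0.5))
assert np.isclose(np.linalg.norm(sparse), np.linalg.norm(spread))

# The sparse vector costs less under f, so maximizing H subject to
# f(x) = 1 favors sparse-and-smooth solutions.
assert f_norm(sparse) < f_norm(spread)

# L_10 is already close to the maximum component, i.e. the L_inf norm.
assert abs(g_norm(np.array([0.2, 0.5, 0.3])) - 0.5) < 0.01
```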

We performed simulations to determine suitable values for the parameters *p*, *α*, and *q*, by applying our tensor method to collections of random weighted networks. We randomly placed FSCs of varying size, recurrence, and heaviness in a subset of the random networks. We then tried different combinations of *p*, *α*, and *q*, and adopted the combination (*p* = 0.8, *α* = 0.2, and *q* = 10) that led to the discovery of the most FSCs. More details on these simulations are provided in the additional file 1 (Section S4).
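A minimal sketch of such a parameter sweep; `count_recovered` is a hypothetical stand-in for running the full tensor method on simulated networks and counting the planted FSCs that are recovered:

```python
import itertools

def count_recovered(p, alpha, q):
    # Hypothetical stand-in score: the real experiment applies the tensor
    # method to random networks with planted FSCs and counts discoveries.
    return -abs(p - 0.8) - abs(alpha - 0.2) - abs(q - 10) / 10

grid = itertools.product([0.5, 0.8, 1.0], [0.1, 0.2, 0.5], [2, 10, 100])
best = max(grid, key=lambda combo: count_recovered(*combo))
# With this toy score the selected combination is (0.8, 0.2, 10),
# matching the values adopted in the paper.
```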

Since the vector norm *f*(**x**) is non-convex, our tensor method requires an optimization protocol that can deal with non-convex constraints. The quality of the optimum discovered for a non-convex problem depends heavily on the numerical procedure. Standard numerical techniques such as gradient descent converge to a local minimum of the solution space, and different procedures often find different local minima. Thus, it is important to find a theoretically justified numerical procedure. We use an advanced framework known as multi-stage convex relaxation, which has good numerical properties for non-convex optimization problems [41]. In this framework, concave duality is used to construct a sequence of convex relaxations that give increasingly accurate approximations to the original non-convex problem. We approximate the sparse constraint function *f*(**x**) by the convex function *f*_{**v**}(**x**) = **v**^{T}*h*(**x**) - *f*^{*}(**v**), where *h*(**x**) applies the convex function *h*(*x*) = *x*^{2} element-wise and *f*^{*}(**v**) is the concave dual of the function *f* (defined as *f*^{*}(**v**) = inf_{**x**} {**v**^{T}*h*(**x**) - *f*(**x**)}). The vector **v** contains coefficients that are automatically generated during the optimization process. After each optimization, the new coefficient vector **v** yields a convex function *f*_{**v**}(**x**) that more closely approximates the original non-convex function *f*(**x**). Details of our tensor-based optimization method can be found in the additional file 1 (Section S3).
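The concave-duality step can be illustrated on the *L*_{p} part of *f* alone (a simplified sketch, not the full mixed-norm machinery): with *h*(*x*) = *x*^{2}, the tangent of the concave function *u* ↦ *u*^{p/2} at the current iterate yields the coefficients **v**, and the resulting convex surrogate touches the non-convex penalty at the iterate while upper-bounding it elsewhere:

```python
import numpy as np

p = 0.8  # same p as in the mixed norm L_{p,2}

def f_p(x):
    """Non-convex sparse penalty: sum_i |x_i|^p."""
    return np.sum(np.abs(x) ** p)

def relax_coeffs(x):
    """Tangent coefficients v from concave duality with h(x) = x^2:
    u -> u^(p/2) is concave in u = x^2, so v = (p/2) * u^(p/2 - 1)."""
    u = np.maximum(x ** 2, 1e-12)      # guard against division by zero
    return (p / 2) * u ** (p / 2 - 1)

x0 = np.array([0.5, 1.0, 2.0])         # current iterate
v = relax_coeffs(x0)
f_star = v @ x0 ** 2 - f_p(x0)         # dual value at the touching point

def f_v(x):
    """Convex surrogate v^T h(x) - f*(v); quadratic in x."""
    return v @ x ** 2 - f_star

assert np.isclose(f_v(x0), f_p(x0))    # tight at the current iterate
x1 = np.array([0.4, 1.1, 1.9])
assert f_v(x1) >= f_p(x1)              # upper-bounds f_p elsewhere
```

Each stage replaces the non-convex constraint with this quadratic surrogate, solves the resulting convex problem, and recomputes **v** at the new iterate.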

Once the membership vectors (i.e., the solution of Eq. (2)) have been found by optimization, the frequent co-splicing clusters can be intuitively obtained by including those exons and networks with large membership values. However, any given solution can result in multiple overlapping patterns whose "heaviness" is greater than a specified threshold. Here, *heaviness* is defined as the average weight of all edges in the pattern. To identify the most representative pattern, we first rank exons and networks in decreasing order of their membership values in **x** and **y**, respectively. Then we extract two representative patterns that satisfy the heaviness threshold: the pattern that occurs in the most networks while having at least the minimum number of top-ranking exons (e.g., 5), and the pattern with the largest number of top-ranking exons while appearing in at least the minimum number of top-ranking networks (e.g., 3). Both patterns are included as co-splicing clusters in our results. After discovering a pattern, we can mask its edges in those networks where they occur (replacing those elements of the tensor with zeros) and optimize Eq. (2) again to search for the next frequent co-splicing cluster.
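The two-pattern extraction rule can be sketched as follows; `representative_patterns` is an assumed helper (not the paper's exact procedure) that scans prefixes of the exon and network rankings:

```python
import numpy as np

def heaviness(A, exons, nets):
    """Average weight of all edges among `exons` within `nets`."""
    return A[np.ix_(exons, exons, nets)].mean()

def representative_patterns(A, exon_rank, net_rank, thr,
                            min_exons=5, min_nets=3):
    """Return ((n_exons, n_nets) maximizing networks,
               (n_exons, n_nets) maximizing exons) with heaviness >= thr."""
    best_nets, best_exons = None, None
    for ne in range(min_exons, len(exon_rank) + 1):
        for nn in range(min_nets, len(net_rank) + 1):
            if heaviness(A, exon_rank[:ne], net_rank[:nn]) >= thr:
                if best_nets is None or nn > best_nets[1]:
                    best_nets = (ne, nn)
                if best_exons is None or ne > best_exons[0]:
                    best_exons = (ne, nn)
    return best_nets, best_exons

# Toy tensor: the 6 top-ranked exons are heavy in the 4 top-ranked networks.
A = np.full((8, 8, 5), 0.1)
A[np.ix_(range(6), range(6), range(4))] = 0.9
most_nets, most_exons = representative_patterns(
    A, list(range(8)), list(range(5)), thr=0.8)
# most_nets = (5, 4): fewest allowed exons, most networks;
# most_exons = (6, 3): most exons, fewest allowed networks.
```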