Table 1 Overview of methods for estimating regulatory activity from transcriptome data comparing input data, modelling, computational aspects and outcome variables

From: Estimating genome-wide regulatory activity from multi-omics data sets using mathematical optimization

Method Input Model Computation Output
Approach by Schacht et al. - mRNA expression data
- TF binding information
Linear model
\( \widehat{g_{i,s}} = c + \sum_t \beta_t\, b_{t,i} \left( \theta_{a,t}\, \mathrm{act}_{t,s} + \theta_{g,t}\, g_{t,s} \right) \)
with
\( \mathrm{act}_{t,s} = \frac{\sum_i b_{t,i}\, g_{i,s}}{\sum_i b_{t,i}},\quad \theta_{a,t} + \theta_{g,t} = 1,\quad \theta_{a,t}, \theta_{g,t} \in \{0,1\} \)
- Optimization criterion: minimize sum of absolute errors
- Mixed-integer linear programming
- Optimization via Gurobi 5.5
- parameter for each TF: \( \beta_t \)
- decision for each TF whether \( \theta_{a,t} \) or \( \theta_{g,t} \) was chosen
RACER - mRNA expression data
- copy number variation
- DNA methylation
- miRNA expression signals
- TF binding information
- miRNA target site info (c)
Linear models:
1) \( \widehat{g_{i,s}} = c + \theta_{CNV,s}\, \mathrm{CNV}_{i,s} + \theta_{DM,s}\, \mathrm{DM}_{i,s} + \sum_t \beta_{t,s}\, b_{t,i} + \sum_{mi} \beta_{mi,s}\, c_{i,mi}\, \mathrm{miRNA}_{mi,s} \)
2) \( \widehat{g_{i,s}} = \tilde{c} + \tilde{\theta}_{i,CNV}\, \mathrm{CNV}_{i,s} + \tilde{\theta}_{i,DM}\, \mathrm{DM}_{i,s} + \sum_t \gamma_{i,t}\, \beta_{t,s} + \sum_{mi} \gamma_{i,mi}\, \beta_{mi,s} \)
- Optimization criterion: minimize sum of squared errors with L1 norm penalty on linear coefficients
- Elastic-net regularized generalized linear models and LASSO
1) sample-specific TF and miRNA activities \( \beta_{t,s} \) and \( \beta_{mi,s} \)
2) TF-gene \( \gamma_{i,t} \) and miRNA-gene \( \gamma_{i,mi} \) interactions across all samples
RABIT - differential mRNA expression data
- somatic mutations
- DNA methylation
- copy number variation
- TF binding info
- recognition motifs for RNA-binding protein (RBP)
Linear model:
\( \widehat{g_i} = \sum_f \theta_f\, B_{f,i} + \sum_t \beta_t\, b_{t,i} \)
with B: background factors (gene CNA, promoter DNA methylation, promoter degree, promoter CpG content)
- Frisch-Waugh-Lovell method
- model selection procedure to select a subset of significant TFs; TFs with insignificant correlation across tumors are removed
- regulatory activity score for each TF (t-value from the t-test of the linear regression coefficient)
ISMARA - gene expression or chromatin state measurements
- annotation of promoters (number of predicted sites for motifs)
- transcripts and associated promoters
- miRNA target site predictions
Linear model
\( \widehat{g_{p, s}}={c}_p+{c}_s+{\displaystyle \sum_m}{N}_{p, m}\ {\beta}_{m, s} \)
- Optimization criterion: minimize sum of errors
- Bayesian procedure, ridge regression
- Gaussian prior for \( \beta_{m,s} \) to avoid overfitting
- inferred motif activity profiles \( \beta_{m,s} \) with set of TFs and miRNAs binding to sites of these motifs (= key regulators)
- predicted target promoters, associated transcripts and genes
- Network of known interactions between predicted targets and predicted regulatory interactions
- enriched ontology categories
biRte - mRNA differential expression
- miRNA, TF measurements, CNV (optionally)
- regulator (R) – target network
Likelihood model:
\( L_{D,\theta}(R) = p\left( D \mid R, \theta \right) = \prod_{\widehat{D}} p\left( \widehat{D} \mid R, \theta \right) = \prod_{\widehat{D}} \prod_c \prod_i p\left( \widehat{D}_{ic} \mid R_c, \theta \right) \)
- data-specific marginal likelihoods, with hidden state variables estimated via MCMC
- Nested effects model structure learning to reconstruct the transcriptional network
- Estimation of active regulators
- Estimation of associated transcriptional network
ARACNE - microarray expression profiles
none
- local estimation of pairwise gene expression profile mutual information
- Reconstruction of gene regulatory network
  1. Gene expression data are denoted by “g” with gene index “i”, estimated parameters by “β”, TF binding information by “b”, TFs by “t”, samples by “s”, miRNAs by “mi” and model constants by “c”. Other variables are explained in the text.
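
To make the first row of the table concrete, the following minimal Python sketch evaluates the linear model attributed to Schacht et al. on invented toy numbers. It computes the activity act_{t,s} as the binding-weighted mean expression of a TF's targets, predicts gene expression for a fixed choice of θ, and reports the sum of absolute errors, which is the quantity the mixed-integer program minimizes. All names (`tf_activity`, `predict`, `sum_abs_error`, `g_tf`) and all values are hypothetical and not taken from the original work; the optimization over β and θ itself is not shown.

```python
# Toy data, invented for illustration only.
# g[i][s]: expression of target gene i in sample s
g = {"geneA": {"s1": 2.0, "s2": 4.0},
     "geneB": {"s1": 6.0, "s2": 0.0}}
# b[t][i]: binding information of TF t for gene i
b = {"TF1": {"geneA": 1.0, "geneB": 3.0}}
# g_tf[t][s]: expression of the gene encoding TF t (g_{t,s} in the table)
g_tf = {"TF1": {"s1": 1.5, "s2": 2.5}}


def tf_activity(t, s):
    """act_{t,s}: binding-weighted mean expression of the targets of TF t."""
    total_binding = sum(b[t].values())
    if total_binding == 0:
        raise ValueError(f"no binding information for {t}")
    return sum(b_ti * g[i][s] for i, b_ti in b[t].items()) / total_binding


def predict(i, s, c, beta, theta_a):
    """Predicted expression of gene i in sample s.

    theta_a[t] == 1 selects the activity act_{t,s};
    theta_a[t] == 0 selects the TF's own expression g_{t,s}
    (theta_g is implied by theta_a + theta_g = 1).
    """
    prediction = c
    for t, beta_t in beta.items():
        signal = tf_activity(t, s) if theta_a[t] == 1 else g_tf[t][s]
        prediction += beta_t * b[t].get(i, 0.0) * signal
    return prediction


def sum_abs_error(c, beta, theta_a):
    """Optimization criterion: sum of absolute errors over all genes and samples."""
    return sum(abs(g[i][s] - predict(i, s, c, beta, theta_a))
               for i in g for s in g[i])


# Compare the two possible choices of theta for TF1 at fixed c and beta.
beta = {"TF1": 0.5}
error_with_activity = sum_abs_error(1.0, beta, {"TF1": 1})
error_with_expression = sum_abs_error(1.0, beta, {"TF1": 0})
print(error_with_activity, error_with_expression)
```

In the actual method the constant, the β values and the binary θ choices are fitted jointly by a solver; here they are fixed by hand so that the two candidate settings of θ can be compared directly.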