A network based covariance test for detecting multivariate eQTL in saccharomyces cerevisiae

Yuan, Huili; Li, Zhenye; Tang, Nelson L.S.; Deng, Minghua

doi:10.1186/s12918-015-0245-0

PROCEEDINGS
Open access
Published: 11 January 2016

A network based covariance test for detecting multivariate eQTL in saccharomyces cerevisiae

Huili Yuan¹,
Zhenye Li¹,
Nelson L.S. Tang² &
…
Minghua Deng^1,3,4

BMC Systems Biology volume 10, Article number: S8 (2016) Cite this article

2283 Accesses
1 Citations
2 Altmetric
Metrics details

Abstract

Background

Expression quantitative trait locus (eQTL) analysis has been widely used to understand how genetic variations affect gene expressions in the biological systems. Traditional eQTL is investigated in a pair-wise manner in which one SNP affects the expression of one gene. In this way, some associated markers found in GWAS have been related to disease mechanism by eQTL study. However, in real life, biological process is usually performed by a group of genes. Although some methods have been proposed to identify a group of SNPs that affect the mean of gene expressions in the network, the change of co-expression pattern has not been considered. So we propose a process and algorithm to identify the marker which affects the co-expression pattern of a pathway. Considering two genes may have different correlations under different isoforms which is hard to detect by the linear test, we also consider the nonlinear test.

Results

When we applied our method to yeast eQTL dataset profiled under both the glucose and ethanol conditions, we identified a total of 166 modules, with each module consisting of a group of genes and one eQTL where the eQTL regulate the co-expression patterns of the group of genes. We found that many of these modules have biological significance.

Conclusions

We propose a network based covariance test to identify the SNP which affects the structure of a pathway. We also consider the nonlinear test as considering two genes may have different correlations under different isoforms which is hard to detect by linear test.

Background

GWAS aims to detect the association between genetic variation and complex diseases. Recent years, GWAS has found 2000 loci associated to complex diseases by statistical methods [1]. As the development of the next-generation sequencing and other high-throughput technology, various types of genome-scale datasets have been collected, providing opportunity to find the mechanism of genetic variation leading to complex diseases by connect the high-throughout data to GWAS. The eQTL study is one of them, which aims to uncover the genetic effects to gene expression and have been conducted in many organisms [2]–[5]. A common approach in eQTL data analysis is to consider association between each expression trait and each genetic marker through regression analysis. Despite great success with this approach, some regulatory signals may not be detected due to complex interaction between SNPs like epistasis.

Although most eQTL studies considered the expression levels of individual genes as response (single outcome variable), the change of correlation between genes under different genetic status still contains some biological information. For example, post-transcriptional regulations such as phosphorylations and dephosphorylations often affect the activities of transcriptional factors (TFs), which further affect the correlation among TF genes and TF target genes, also the co-expression patterns of the targets of TFs. However, such regulations are hard to be detected if only individual gene considered because there may be little change at the expression levels of individual TF genes. The approach considering “liquid association” (LA) between a pair of genes proposed by [6] is a method to identify such loci, which is later introduced into eQTL study [7]. Subsequently, conditional bi-variate normal model has been developed to capture the change of correlation between a pair of genes [8]–[10].

However, a biological process is usually performed by a group of genes (more than two genes as in the bi-variate model). Network approaches should be used to study these interactions [11]–[13]. If we want to see the effects of a cellular change to the organism, it is better for us to consider the change in a functional gene-set such as a pathway. Therefore, some papers has considered the multivariate circumstances by applying CCA to gene expressions and SNP (or CNV) data [14]–[16]. However, these methods do not consider the network structure when finding the association between gene sets and genetic variant, which will miss the information contained in the network. Li et al. [17], Kim and Xing [18], Zhang and Kim [19], Casale et al. [20] have considered pathway structure when studied the association between genetic variation and gene expression. However, they assume the network structure is the same (static) under different genetic variant. In fact, network structure may be dynamic and biologists have realized that differential network analysis will become a standard mode in network analysis and insightful discoveries could be made with differential network analysis [21]. For example, [22] identified a cancer point mutation in the kinase domain of RET, which causes multiple endocrine neoplasia type 2B by leading to a switch in peptide specificity and then altering the network structure.

So we propose a method to test whether the co-expression pattern in a pathway is affected by a SNP. Our goal is to test for a global change in covariance structure in each pathway, which is different from other network-based methods, which tries to detect non-zero edges from all pairs of genes. When we applied our method to a yeast eQTL dataset, we were able to find some pathway-SNP modules that have biological significance.

Methods

Let (X ₁,X ₂,…,X _p ) be the expression levels of a group of genes and (Z ₁,Z ₂,…,Z _m ) be the set of SNPs. Suppose that there are n independent samples and let (x ₁_i ,x ₂_i ,…,x _pi )_i_=1,…,n denote the expression level of (X ₁,X ₂,…,X _p ) in the ith sample and (Z ₁_i ,Z ₂_i ,…,Z _mi ) denote the SNP types of the SNP set in the ith sample. Since the mean expression levels of (X ₁,X ₂,…,X _p ) are also possibly affected by some SNPs in (Z ₁,Z ₂,…,Z _m ), we can imitate the procedure in [9] that we first perform regression analysis or penalized regression analysis such as Lasso [23] or SCAD [24] to adjust the effects of (Z ₁,Z ₂,…,Z _m ) on the means and then model the residuals. We assume that the covariate-adjusted expression levels are appropriately centered to have mean values of zero and our interest is to test whether the covariate-adjusted covariance of expression levels is changed under each SNP. In our analysis, the group of genes are a pathway in KEGG [25]. Figure 1 describes our strategy to detect pathway-SNP associations. In this manuscript, we define a module as the collection of a SNP and a pathway, and our objective is to find pathway-SNP modules where the SNP affect the co-expression patterns among the genes in the pathway.

Model

We use covariance test to find the pathway-SNP modules. There are three key elements of covariance test for a given gene set S. We consider the strategy similar to [26].

• Calculation of T statistics. We calculate a T statistics that reflects the difference of the covariance matrix of the two classes of samples. The statistics is calculated by estimating the Frobenius norm of the difference of the covariance matrix. We first perform the method by [27] to do the test:
$H_{0} : Σ_{1} = Σ_{2}, H_{1} : Σ_{1} \neq Σ_{2}$
(1)

where Σ ₁ is the covariance matrix of gene expression under one genotype and Σ ₂ is that of gene expression under the other genotype. Then we consider the nonlinear relationship between gene expressions by applying kernel method.
• Estimation of significance level of T statistics. We estimate the statistical significance (nominal P value) of the T statistics by using an empirical SNP-based permutation test procedure that preserves the complex correlation structure of the gene expression data. Specifically, we permute the SNP labels and recompute the T statistics of the gene set for the permuted data, which generates a null distribution for the T statistics. The empirical, nominal P value of the observed T statistics is then calculated relative to this null distribution. Importantly, the permutation of class labels preserves gene-gene correlations and, thus, provides a more biologically reasonable assessment of significance than would be obtained by permuting genes.
• Adjustment for multiple hypothesis testing. When an entire database of gene sets is evaluated, we adjust the estimated significance level to account for multiple hypothesis testing. We first normalize the T statistics for each gene set to account for the size of the set, yielding a normalized T statistics. We then control the proportion of false positives by calculating the false discovery rate (FDR) corresponding to each NT statistics. The FDR is the estimated probability that a set with a given NT statistics represents a false positive finding; it is computed by comparing the tails of the observed and null distributions for the NT statistics. To capture the change of the structure of the gene network, we consider the covariance of the gene expression.

Test for high-dimensional covariance matrices

To simplify the problem, we just consider there are two possible values of each SNP. Covariance matrices under two genotypes of the SNP are denoted as Σ ₁ and Σ ₂, respectively. The primary interest is to test

H_{0} : Σ_{1} = Σ_{2}, H_{1} : Σ_{1} \neq Σ_{2}

which is a nontrivial statistical problem because the number of genes is greater than the number of samples sometimes. The test statistic for the hypothesis is formulated by targeting on tr (Σ ₁−Σ ₂)², the squared Frobenius norm of Σ ₁−Σ ₂ [27]. Specifically, the test statistic is

T_{n_{1}, n_{2}} = A_{n_{1}} + A_{n_{2}} - 2 C_{n_{1} n_{2}}

\begin{matrix} A_{n_{h}} & = \frac{1}{n_{h} (n_{h} - 1)} \sum_{i \neq j} {(X_{hi}^{'} X_{hj})}^{2} \\ - \frac{2}{n_{h} (n_{h} - 1) (n_{h} - 2)} \sum_{i, j, k}^{*} X_{hi}^{'} X_{hj} X_{hj}^{'} X_{hk} \\ + \frac{1}{n_{h} (n_{h} - 1) (n_{h} - 2) (n_{h} - 3)} \sum_{i, j, k, l}^{*} X_{hi}^{'} X_{hj} X_{hk}^{'} X_{hl} \end{matrix}

\begin{matrix} C_{n_{1} n_{2}} & = \frac{1}{n_{1} (n_{2})} \sum_{i} \sum_{j} {(X_{1 i}^{'} X_{2 j})}^{2} \\ - \frac{1}{n_{1} n_{2} (n_{1} - 1)} \sum_{i, k}^{*} \sum_{j} X_{1 i}^{'} X_{2 j} X_{2 j}^{'} X_{1 k} \\ - \frac{1}{n_{1} n_{2} (n_{2} - 1)} \sum_{i, k}^{*} \sum_{j} X_{2 i}^{'} X_{1 j} X_{1 j}^{'} X_{2 k} \\ + \frac{1}{n_{1} n_{2} (n_{1} - 1) (n_{2} - 1)} \sum_{i, k}^{*} \sum_{j, l}^{*} X_{1 i}^{'} X_{2 j} X_{1 k}^{'} X_{2 l} \end{matrix}

where h refers to a subpopulation with a particular SNP.

For test H ₀:Σ ₁=Σ ₂=Σ ₃,H ₁:Σ ₁≠Σ ₂ or Σ ₂≠Σ ₃ We consider tr (Σ ₁−Σ ₂)² + tr (Σ ₂−Σ ₃)². Specifically, the test statistic is

T_{n_{1}, n_{2}} + T_{n_{2}, n_{3}}

where $T_{n_{2}, n_{3}}$ is defined similar to $T_{n_{1}, n_{2}}$ .

Kernel method

We generalize the method of [27] to the kernel space inspired by the method of [28]. We give the similar definition of Frobenius norm and covariance matrix. Let p _x and p _y be Borel probability measures defined on a domain Ω. Given observations X:={x ₁,…,x _m } and Y:={y ₁,…,y _n }, drawn independently and identically distributed(i.i.d.) from p _x and p _y , respectively.

Definition (HSDCC) Given separable reproducing kernel Hilbert space (RKHS) $F$ , and measures p _x ,p _y over ( $X, Γ$ ), we define the Hilbert-Schmidt Different Covariance Criterion(HSDCC) as the squared HS-norm of the difference of covariance Σ _xx and Σ _yy :

HSDCC (p_{x}, p_{y}, F) : = ∥ Σ_{xx} - Σ_{yy} ∥_{HS}^{2}

The detailed computation of above norm can be found in text of the Additional file 1. We give the unbiased statistics to $HSDCC (P_{x}, P_{y}, F)$ like [27]

\begin{matrix} A_{n_{h}} & = \frac{1}{n_{h} (n_{h} - 1)} \sum_{i \neq j} k {(X_{hi}, X_{hj})}^{2} \\ - \frac{2}{n_{h} (n_{h} - 1) (n_{h} - 2)} \sum_{i, j, k}^{*} k (X_{hi}, X_{hj}) k (X_{hj}, X_{hk}) \\ + \frac{1}{n_{h} (n_{h} - 1) (n_{h} - 2) (n_{h} - 3)} \sum_{i, j, k, l}^{*} k (X_{hi}, X_{hj}) k (X_{hk}, X_{hl}) \end{matrix}

\begin{matrix} C_{n_{1} n_{2}} & = \frac{1}{n_{1} (n_{2})} \sum_{i} \sum_{j} k {(X_{1 i}, X_{2 j})}^{2} \\ - \frac{1}{n_{1} n_{2} (n_{1} - 1)} \sum_{i, k}^{*} \sum_{j} k (X_{1 i}, X_{2 j}) k (X_{2 j}, X_{1 k}) \\ - \frac{1}{n_{1} n_{2} (n_{2} - 1)} \sum_{i, k}^{*} \sum_{j} k (X_{2 i}, X_{1 j}) k (X_{1 j}, X_{2 k}) \\ + \frac{1}{n_{1} n_{2} (n_{1} - 1) (n_{2} - 1)} \sum_{i, k}^{*} \sum_{j, l}^{*} k (X_{1 i}, X_{2 j}) k (X_{1 k}, X_{2 l}) \end{matrix}

T_{n_{1}, n_{2}} = A_{n_{1}} + A_{n_{2}} - 2 C_{n_{1} n_{2}}

For test

H ₀:Σ _xx =Σ _yy =Σ _zz ,H ₁:Σ _xx ≠Σ _yy or Σ _yy ≠Σ _zz

We consider $∥ Σ_{xx} - Σ_{yy} ∥_{HS}^{2} + ∥ Σ_{yy} - Σ_{zz} ∥_{HS}^{2}$ .

Specifically, the test statistic is

T_{n_{1}, n_{2}} + T_{n_{2}, n_{3}}

where $T_{n_{2}, n_{3}}$ is defined similar to $T_{n_{1}, n_{2}}$ .

Results

Simulation

Comparison between linear method and kernel method

We performed a simulation study to evaluate the power of the proposed kernel methods, and compared the results with the primary method by [27]. Three models have been considered, as below.

Model 1: X _ijk =Z _ijk + θ Z _i j k₊₁, where Z _ijk were i.i.d. standard normally distributed, and θ=0.5 in the null hypothesis while 0.2 or 0.3 in the alternative hypothesis.

Model 2: $X_{ijk} = Z_{ijk}^{3} + θ Z_{ijk + 1}^{3}$ , where Z _ijk and θ were defined the same as that in Model 1.

Model 3: $X_{ijk} = e^{Z_{ijk}} + θ e^{Z_{ijk + 1}}$ , where Z _ijk and θ were defined the same as that in Model 1.

The correlation between variables are linear in model 1, while the correlation between variables are nonlinear in model 2 and 3.

We chose (p, n ₁,n ₂)=(40, 60, 60) and (80, 120, 120) respectively. The power of the tests are shown by ROC curves (Fig. 2). All the simulation results reported were based on 1000 simulations. We can see from the simulation that kernel methods with some parameters have higher power than the linear test when the true relationships between variables are nonlinear (Model 2 and Model 3). A similar simulation results with different setup of parameters can be found in Additional file 1: Figure S3.

Comparison between Chen et al.’s linear method and other method

We conducted a simulation to compare the power of Chen et al.’s method [27] and Tony Cai et al.’s method [29]. We consider four simulation setups represented different signal quantities and strength, the first of which is the same as the model 2 in [29].

Model 1: Let

\begin{matrix} Σ^{*} & = (σ_{ij}^{*}), where ω_{ij}^{*} = 0 . 5^{| i - j |} for 1 \leq i, j \leq p. \\ Σ & = D^{1 / 2} Σ^{*} D^{1 / 2}, where D = (d_{ij}), d_{ii} = Unif (0.5, 2.5), 1 \leq i \leq p \\ Σ_{1} & = Σ + δI, Σ_{2} = Σ + U + δI, where δ \\ = | min {λ_{min} (Σ + U), λ_{min} (Σ)} | + 0.05, \end{matrix}

U=(u _kl ) be a matrix with eight random nonzero entries, each with a magnitude generated from U n i f(0,4)∗ max1≤j≤p σ _jj . The number of each class samples is 50 and the number of variables is 50.

Model 2: U=(u _kl ) be a matrix with eight random nonzero entries, each with a magnitude generated from U n i f(0,400)∗ max1≤j≤p σ _jj .

Model 3: U=(u _kl ) be a matrix with 500 random nonzero entries, each with a magnitude generated from U n i f(0,4)∗ max1≤j≤p σ _jj .

Model 4: U=(u _kl ) be a matrix with 500 random nonzero entries, each with a magnitude generated from U n i f(0,400)∗ max1≤j≤p σ _jj .

As shown in Fig. 3, under the sparse setups (Model 1 and 2), the results of Tony Cai et al.’s method is much better than those of Chen et al.’s method. Chen et al.’s method is better than Tony Cai et al.’s method when the setups are not sparse (Model 3 and 4). Since Tony Cai et al.’s method corresponds to testing each element in the covariance matrix by Hoteling’s test and then give the judgement according to the maximum statistic of all of the Hoteling’s tests, so Chen et al.’s linear method has higher power than bi-variate model when the setups are not sparse. A similar simulation results with different number of samples can be found in Additional file 1: Figure S2.

Real data results

Associated SNP and pathways

We analyzed the yeast dataset collected by Kruglyak and colleagues [30]. The expression data were downloaded from http://journals.plos.org/plosbiology/article?id=10.1371/journal.pbio.0060083, with 4482 genes measured in 109 segregants derived from a cross between BY and RM. The experiments were performed under two conditions, glucose and ethanol. We did the pre-processing like [10], after which 4419 genes and 820 merged markers remained. We mapped 4419 genes to 103 pathways and analyzed the effect of each SNP to each pathway. Therefore, we tested 103*820 times. The algorithm was implemented in R, which can be found at http://www.math.pku.edu.cn/teachers/dengmh/NetworkBiomarker.

We performed covariance test and kernel covariance with parameter 1 and 10 respectively to the pathway-SNP adjusted modules. We consider 103 pathways in KEGG [25] (the number of genes in each pathway can be found in the Additional file 1: Figure S1) and 820 merged markers under ethanol and glucose condition respectively. We found 72 pathway-SNP modules under ethanol condition and 94 modules under glucose condition. Specifically, we found 36 modules by covariance test, 9 modules by kernel covariance test with parameter 1 and 51 modules by kernel covariance test with parameter 10 under ethanol condition, while 86 modules by covariance test, 3 modules by kernel covariance test with parameter 10 and 12 modules by kernel covariance test with parameter 1 under glucose condition (Fig. 4). Table 1 showed the associated pathways and SNPs under ethanol condition while Table 2 showed the associated pathways and SNPs under glucose condition.

Table 1 New associated pathways and SNPs under ethanol condition

Full size table

Table 2 New associated pathways and SNPs under glucose condition

Full size table

Kernel Method found isoform-specific structure change

In our result, we found Valine, leucine and isoleucine biosynthesis pathway was associated with YCL023C marker only by kernel method under ethanol condition. Figure 5 shows the non-linear correlation between two pairs of genes, YER086W-YCL064C and YLR355C-YCL064C were nonlinear correlated with genotypes of YCL023C. And more than 10 isoforms of YER086W and 6 isoforms of YLR355C have been found (Saccharomyces Genome Database, http://www.yeastgenome.org/). The nonlinear correlation between two pairs of genes might be caused by samples in different isoforms. Specifically, two genes may be positive correlated under one isoform while negetive correlated under another isoform. However, the correlation of two genes might be missed if when we only considered linear correlation.

Linoleic acid metabolism is associated to cell cycle

Our method found YFL029C is associated with linoleic acid metabolism pathway under ethanol condition. With single gene correlation analysis, both of the mean of expression levels of YKR089C (TGL4) and YJR155W (AAD10) were not associated with YFL029C (Fig. 6 middle and right). Specifically, with p-value 0.5174 and 0.002804 (not significant for multiple test). However, the scatterplot after correction shows the correlation of two genes change apparently under YFL029C (Fig. 7 left). Under one status of SNP, the two genes are positive correlated while under the other status of SNP, the two are nearly independent. To understand this from the biological meaning which was showned in Fig. 8 b, we found that marker locates in gene CAK1 (The expression of CAK1 is slightly different under two SNP status which was shown in Fig. 6 left.), the product of which can increase the activity of CDC28 [31]. CDC28 plays an important role in cell cycle. It can control the progress of cell cycle by phosphorylate different transcription factor. In our case, CDC28 phosphorylate ACE2 [32] which can increase the activity of transcription factor, SUA7 [33]. SUA7 is the transcription factor of TGL4, which is a lipase in linoleic acid metabolism pathway. Meanwhile, CDC28 and FKH1 can form complex [34] and FKH1 is the transcription factor of AAD10, which is another enzyme in linoleic acid metabolism pathway. The correlation between YKR089C and its TF was shown in Fig. 7 middle and the correlation between YJR155W and its TF was shown in Fig. 7 right. From the structure of the pathway in KEGG [25] as shown in Fig. 8 a, the different status of the SNP YFL029C might lead to different amounts of intermediate product in the pathway.

Discussion and conclusion

We propose a network based covariance test to identify the marker which affects the structure of a pathway. It has an advantage that a static network structure is not assumed. The biomarker we defined is the SNP associated to the structure of genes in the pathway. Considering two genes may have different correlations under different isoforms which is hard to detect by linear test, so we also consider the nonlinear test. We identified a total of 166 modules, with each module consisting of a group of genes and one eQTL where the eQTL regulate the co-expression patterns of the group of genes. We found that many of these modules have biological interpretations. Till now, we consider the difference of two networks by covariance matrix and covariance operators. We will focus on difference of precision matrix in the future research.

Declarations

This article has been published as part of BMC Systems Biology Volume 10 Supplement 1, 2016: Selected articles from the Fourteenth Asia Pacific Bioinformatics Conference (APBC 2016): Systems Biology. The full contents of the supplement are available online at http://www.biomedcentral.com/bmcsystbiol/supplements/10/S1.

Additional file

References

Visscher PM, Brown MA, McCarthy MI, Yang J: Five years of gwas discovery. Am J Hum Genet. 2012, 90 (1): 7-24. 10.1016/j.ajhg.2011.11.029.
Article CAS PubMed PubMed Central Google Scholar
Albert FW, Kruglyak L: The role of regulatory variation in complex traits and disease. Nat Rev Genet. 2015, 16: 197-212.
Article CAS PubMed Google Scholar
Cookson W, Liang L, Abecasis G, Moffatt M, Lathrop M: Mapping complex disease traits with global gene expression. Nat Rev Genet. 2009, 10 (3): 184-94. 10.1038/nrg2537.
Article CAS PubMed PubMed Central Google Scholar
Nica AC, Dermitzakis ET: Expression quantitative trait loci: present and future. Phil Trans R Soc B: Biol Sci. 2013, 368 (1620): 20120362-10.1098/rstb.2012.0362.
Article Google Scholar
Wang P, Dawson JA, Keller MP, Yandell BS, Thornberry NA, Zhang BB, et al: A model selection approach for expression quantitative trait loci (eqtl) mapping. Genetics. 2011, 187 (2): 611-21. 10.1534/genetics.110.122796.
Article CAS PubMed PubMed Central Google Scholar
Li KC: Genome-wide coexpression dynamics: theory and application. Proc Natl Acad Sci. 2002, 99 (26): 16875-80. 10.1073/pnas.252466999.
Article CAS PubMed PubMed Central Google Scholar
Sun W, Yuan S, Li KC: Trait-trait dynamic interaction: 2d-trait eqtl mapping for genetic variation study. BMC Genomics. 2008, 9 (1): 242-10.1186/1471-2164-9-242.
Article PubMed PubMed Central Google Scholar
Ho YY, Parmigiani G, Louis TA, Cope LM: Modeling liquid association. Biometrics. 2011, 67 (1): 133-41. 10.1111/j.1541-0420.2010.01440.x.
Article PubMed Google Scholar
Chen J, Xie J, Li H: A penalized likelihood approach for bivariate conditional normal models for dynamic co-expression analysis. Biometrics. 2011, 67 (1): 299-308. 10.1111/j.1541-0420.2010.01413.x.
Article PubMed PubMed Central Google Scholar
Wang L, Zheng W, Zhao H, Deng M: Statistical analysis reveals co-expression patterns of many pairs of genes in yeast are jointly regulated by interacting loci. PLoS Genet. 2013, 9 (3): 1003414-10.1371/journal.pgen.1003414.
Article Google Scholar
Basso K, Margolin AA, Stolovitzky G, Klein U, Dalla-Favera R, Califano A: Reverse engineering of regulatory networks in human b cells. Nat Genet. 2005, 37 (4): 382-90. 10.1038/ng1532.
Article CAS PubMed Google Scholar
Bonneau R, Facciotti MT, Reiss DJ, Schmid AK, Pan M, Kaur A, et al: A predictive model for transcriptional control of physiology in a free living cell. Cell. 2007, 131 (7): 1354-65. 10.1016/j.cell.2007.10.053.
Article CAS PubMed Google Scholar
Pereira-Leal JB, Enright AJ, Ouzounis CA: Detection of functional modules from protein interaction networks. Proteins Struct Function Bioinforma. 2004, 54 (1): 49-57. 10.1002/prot.10505.
Article CAS Google Scholar
Witten DM, Tibshirani R, Hastie T: A penalized matrix decomposition, with applications to sparse principal components and canonical correlation analysis. Biostatistics. 2009, 10 (3): 515-34.
Article PubMed PubMed Central Google Scholar
Naylor MG, Lin X, Weiss ST, Raby BA, Lange C: Using canonical correlation analysis to discover genetic regulatory variants. PloS ONE. 2010, 5 (5): 10395-10.1371/journal.pone.0010395.
Article Google Scholar
Lin D, Zhang J, Li J, Calhoun VD, Deng HW, Wang YP: Group sparse canonical correlation analysis for genomic data integration. BMC Bioinformatics. 2013, 14 (1): 245-10.1186/1471-2105-14-245.
Article CAS PubMed PubMed Central Google Scholar
Li Y, Nan B, Zhu J. Multivariate sparse group lasso for the multivariate multiple linear regression with an arbitrary group structure. Biometrics. 2015.
Google Scholar
Kim S, Xing EP: Statistical estimation of correlated genome associations to a quantitative trait network. PLoS Genet. 2009, 5 (8): 1000587-10.1371/journal.pgen.1000587.
Article Google Scholar
Zhang L, Kim S: Learning gene networks under snp perturbations using eqtl datasets. PLoS Comput Biol. 2014, 10 (2): 1003420-10.1371/journal.pcbi.1003420.
Article Google Scholar
Casale FP, Rakitsch B, Lippert C, Stegle O. Efficient set tests for the genetic analysis of correlated traits. Nat Meth. 2015.
Google Scholar
Ideker T, Krogan NJ. Differential network biology. Mol Syst Biol. 2012;8.
Google Scholar
Zhou S, Carraway KL, Eck MJ, Harrison SC, Feldman RA, Mohammadi M, et al: Catalytic specificity of protein-tyrosine kinases is critical for selective signalling. Nature. 1995, 373 (6514): 536-9. 10.1038/373536a0.
Article Google Scholar
Tibshirani R: Regression shrinkage and selection via the lasso. J R Stat Soc. Series B (Methodological). 1996, 58 (1): 267-88.
Google Scholar
Fan J, Li R: Variable selection via nonconcave penalized likelihood and its oracle properties. J Am Stat Assoc. 2001, 96 (456): 1348-60. 10.1198/016214501753382273.
Article Google Scholar
Ogata H, Goto S, Sato K, Fujibuchi W, Bono H, Kanehisa M: KEGG: Kyoto Encyclopedia of Genes and Genomes. Nucleic Acids Res. 1999, 27: 29-34. 10.1093/nar/27.1.29.
Article CAS PubMed PubMed Central Google Scholar
Subramanian A, Tamayo P, Mootha VK, Mukherjee S, Ebert BL, Gillette MA, et al: Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proc Natl Acad Sci of the USA. 2005, 102 (43): 15545-50. 10.1073/pnas.0506580102.
Article CAS Google Scholar
Li J, Chen SX: Two sample tests for high-dimensional covariance matrices. Ann Stat. 2012, 40 (2): 908-40. 10.1214/12-AOS993.
Article Google Scholar
Jain S, Simon HU, Tomita E. Algorithmic Learning Theory. 16th International Conference, ALT 2005, Singapore, October 8–11, 2005, Proceedings. Springer-Verlag Berlin Heidelberg 2005.
Google Scholar
Cai T, Liu W, Xia Y: Two-sample covariance matrix testing and support recovery in high-dimensional and sparse settings. J Am Stat Assoc. 2013, 108 (501): 265-77. 10.1080/01621459.2012.758041.
Article CAS Google Scholar
Smith EN, Kruglyak L: Gene–environment interaction in yeast gene expression. PLoS Biol. 2008, 6 (4): 83-10.1371/journal.pbio.0060083.
Article Google Scholar
Kurat CF, Wolinski H, Petschnigg J, Kaluarachchi S, Andrews B, Natter K, et al: Cdk1/cdc28-dependent activation of the major triacylglycerol lipase tgl4 in yeast links lipolysis to cell-cycle progression. Mol Cell. 2009, 33 (1): 53-63. 10.1016/j.molcel.2008.12.019.
Article CAS PubMed Google Scholar
Ubersax JA, Woodbury EL, Quang PN, Paraz M, Blethrow JD, Shah K, et al: Targets of the cyclin-dependent kinase cdk1. Nature. 2003, 425 (6960): 859-64. 10.1038/nature02062.
Article CAS PubMed Google Scholar
Zhang DY, Dorsey MJ, Voth WP, Carson DJ, Zeng X, Stillman DJ, et al: Intramolecular interaction of yeast tfiib in transcription control. Nucleic Acids Res. 2000, 28 (9): 1913-20. 10.1093/nar/28.9.1913.
Article CAS PubMed PubMed Central Google Scholar
Venters BJ, Wachi S, Mavrich TN, Andersen BE, Jena P, Sinnamon AJ, et al: A comprehensive genomic binding map of gene and chromatin regulatory proteins in saccharomyces. Mol Cell. 2011, 41 (4): 480-92. 10.1016/j.molcel.2011.01.015.
Article CAS PubMed PubMed Central Google Scholar

Download references

Acknowledgements

This work is supported by the National Natural Science Foundation of China (Nos. 31171262, 31428012, 31471246), and the National Key Basic Research Project of China (No. 2015CB910303). We thank Dr. Lin Wang for helpful feedback during the saccharomyces cerevisiae data pre-processing and Dr. Minping Qian and Dr. Kai Song’s discussions. The publication costs for this article are from the National Natural Science Foundation of China (No. 31471246).

Author information

Authors and Affiliations

LMAM, School of Mathematical Sciences, Peking University, Yiheyuan Road, Beijing, 100871, China
Huili Yuan, Zhenye Li & Minghua Deng
Department of Chemical Pathology, Prince of Wales Hospital, Faculty of Medicine, The Chinese University of Hong Kong, Shatin, Hong Kong, China
Nelson L.S. Tang
Center for Quantitative Biology, Peking University, Yiheyuan Road, Beijing, 100871, China
Minghua Deng
Center for Statistical Sciences, Peking University, Yiheyuan Road, Beijing, 100871, China
Minghua Deng

Authors

Huili Yuan
View author publications
You can also search for this author in PubMed Google Scholar
Zhenye Li
View author publications
You can also search for this author in PubMed Google Scholar
Nelson L.S. Tang
View author publications
You can also search for this author in PubMed Google Scholar
Minghua Deng
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Minghua Deng.

Additional information

Competing interests

The authors declare that they have no competing interests.

Authors’ contributions

MD and NT conceived of the project. The flowchart of the strategy was designed by MD and HY. HY and ZL developed the kernel method. HY conducted the simulation and real data analysis. HY wrote the first draft of the manuscript, which all authors revised and approved.

Electronic supplementary material

12918_2015_245_MOESM1_ESM.pdf

Additional file 1: Supplementary materials for the computation of HSDCC and additional figures Figures S1–S3. (PDF 1474 kb) (PDF 1 MB)

Authors’ original submitted files for images

Below are the links to the authors’ original submitted files for images.

Authors’ original file for figure 1

Authors’ original file for figure 2

Authors’ original file for figure 3

Authors’ original file for figure 4

Authors’ original file for figure 5

Authors’ original file for figure 6

Authors’ original file for figure 7

Authors’ original file for figure 8

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made.

The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

To view a copy of this licence, visit https://creativecommons.org/licenses/by/4.0/.

The Creative Commons Public Domain Dedication waiver (https://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.

Reprints and permissions

About this article

Cite this article

Yuan, H., Li, Z., Tang, N.L. et al. A network based covariance test for detecting multivariate eQTL in saccharomyces cerevisiae. BMC Syst Biol 10 (Suppl 1), S8 (2016). https://doi.org/10.1186/s12918-015-0245-0

Download citation

Published: 11 January 2016
DOI: https://doi.org/10.1186/s12918-015-0245-0

A network based covariance test for detecting multivariate eQTL in saccharomyces cerevisiae

Abstract

Background

Results

Conclusions

Background

Methods

Model

Test for high-dimensional covariance matrices

Kernel method

Results

Simulation

Comparison between linear method and kernel method

Comparison between Chen et al.’s linear method and other method

Real data results

Associated SNP and pathways

Kernel Method found isoform-specific structure change

Linoleic acid metabolism is associated to cell cycle

Discussion and conclusion

Declarations

Additional file

References

Acknowledgements

Author information

Authors and Affiliations

Corresponding author

Additional information

Competing interests

Authors’ contributions

Electronic supplementary material

Authors’ original submitted files for images

Rights and permissions

About this article

Cite this article

Share this article

Keywords

BMC Systems Biology

Contact us