Skip to main content
  • Methodology article
  • Open access
  • Published:

Using the ratio of means as the effect size measure in combining results of microarray experiments



Development of efficient analytic methodologies for combining microarray results is a major challenge in gene expression analysis. The widely used effect size models are thought to provide an efficient modeling framework for this purpose, where the measures of association for each study and each gene are combined, weighted by the standard errors. A significant disadvantage of this strategy is that the quality of different data sets may be highly variable, but this information is usually neglected during the integration. Moreover, it is widely known that the estimated standard deviations are probably unstable in the commonly used effect size measures (such as standardized mean difference) when sample sizes in each group are small.


We propose a re-parameterization of the traditional mean difference based effect measure by using the log ratio of means as an effect size measure for each gene in each study. The estimated effect sizes for all studies were then combined under two modeling frameworks: the quality-unweighted random effects models and the quality-weighted random effects models. We defined the quality measure as a function of the detection p-value, which indicates whether a transcript is reliably detected or not on the Affymetrix gene chip. The new effect size measure is evaluated and compared under the quality-weighted and quality-unweighted data integration frameworks using simulated data sets, and also in several data sets of prostate cancer patients and controls. We focus on identifying differentially expressed biomarkers for prediction of cancer outcomes.


Our results show that the proposed effect size measure (log ratio of means) has better power to identify differentially expressed genes, and that the detected genes have better performance in predicting cancer outcomes than the commonly used effect size measure, the standardized mean difference (SMD), under both quality-weighted and quality-unweighted data integration frameworks. The new effect size measure and the quality-weighted microarray data integration framework provide efficient ways to combine microarray results.


Microarray technology has been widely used in identifying differentially expressed genes [1, 2] and in building predictors for disease outcome diagnosis [37]. Although individual microarray studies can be highly informative for this purpose (e.g. van 'tVeer et al., [4]), it is difficult to make a direct comparison among the results obtained by different groups addressing similar biological problems, since laboratory protocols, microarray platforms and analysis techniques used in each study may not be identical [8, 9]. Moreover, most individual studies have relatively small sample sizes, and hence prediction models trained on individual studies by using cross-validation procedures are prone to over-fitting, leading to prediction accuracies that are over-estimated and lack generalizability [10].

Recent studies show that systematic integration of gene expression data from different sources can increase statistical power to detect differentially expressed genes while allowing for an assessment of heterogeneity [1118], and may lead to more robust, reproducible and accurate predictions [19]. Therefore, our ability to develop powerful statistical methods for efficiently integrating related genomic experiments is critical to the success of the massive investment made on genomic studies. Broadly speaking, the strategies to integrate microarray studies can be divided into three categories:

The first category is a combined analysis of all the data. Each data set is first preprocessed to clean and align the signals, and then these preprocessed datasets are put together so that the integrated data set can be treated as though it comes from a single study. In this way, the effective sample size is greatly increased. Several transformation methods have been proposed to process gene expression measures from different studies [9, 14, 17, 20]. For example, Jiang et al. [14] transformed the normalized data sets to have similar distributions and then put the data sets together. Wang et al. [17] standardized gene expression levels based on the means and standard deviations of expression measurements from the arrays of healthy prostate samples. These methods are simple and in many cases, if the transformation is carefully made, the performance of disease outcome prediction can be improved [14]. Nevertheless, there are no consensus or clear guidelines on the best way to perform the necessary data transformations.

The second strategy is to combine analysis results obtained from each study. The basic idea is to combine evidence of differential expression using a summary statistic, such as the p-value, across multiple gene profiling studies and then to adjust for multiple testing. For example, Rhodes et al. [11, 12] combined results from four prostate cancer microarray datasets analyzed on different platforms. Differential expression between the prostate tumor group and the normal group was first assessed independently for each gene in each dataset using the statistical confidence measure, the p-value. Then the study-specific p-values were combined, using the result that -2 log(p-value) has a chi-squared distribution under the null hypothesis of no differential expression. The analysis revealed that stronger significance was obtained from the combined analysis than from the individual studies. Combining p-values is useful in obtaining more precise estimates of significance, but this method does not indicate the direction of significance (e.g., up-or down-regulation) [21]. Instead of integrating p-values directly, some studies explored combining ranks of statistics from different studies [18, 22]. For example, DeConde et al. [22] proposed a rank-aggregation method to combine final microarray results from five prostate cancer studies. The method summarizes majority preferences between pairs of genes across ranked list from different studies. They found this method more reliably identifies differentially expressed genes across studies.

The third strategy involves taking inter-study variability into account when estimating the overall effect for each gene across studies, and then basing conclusions on the distribution of these overall measures. For example, Choi et al. [13] focused on integrating effect size estimates in individual studies into an overall estimate of the average effect size. The effect size is normally used to measure the magnitude of treatment effect in a given study. Inter-study variability was included in the model with an associated prior distribution. This type of model, also termed hierarchical Bayesian random effects, has been used broadly in non-microarray contexts (e.g., DuMouchel and Harris [23]; Smith et al., [24]). Using the same microarray datasets as those used by Rhodes et al. [11], they demonstrated that their method can lead to the discovery of small but consistent expression changes with increased sensitivity and reliability among the datasets. The hierarchical Bayesian random effects meta-analysis model has several favorable features: it provides an overall effect size, and it accounts for inter-study variability, which may improve accuracy of results.

The widely used effect size measure in this type of models is the standardized mean difference [25, 26]. It has been well-known in microarray data analysis that the estimated standard deviation is probably unstable when sample size in each group is small. Therefore, many efforts have been made to overcome the shortcoming by estimating a penalty parameter for smoothing the estimates using information from all genes rather than relying solely on the estimates from an individual gene [1, 27].

However, recent studies show that differentially expressed genes may be best identified using fold-change measures rather than t-like statistics [28]. Fold change is a commonly used measure in small laboratory experiments of gene expression; it is considered to be a natural measure for gene expression changes [29]. In high-throughput microarray analysis, properties of fold change statistics have received little attention. Therefore, more investigation on reparameterization of effect size measures is needed.

Most data integration papers in microarray analysis have not used measures of quality to refine their analyses [9, 1115, 17, 20, 22]. Nevertheless, in classical meta-analysis, quality measures have often been used when combining results across studies. It has been argued that studies of a higher quality will give more accurate estimates of the true parameter of interest, and therefore studies of high quality should receive a higher weight in the analysis summarizing across studies [30]. In gene expression microarrays, many genes may be "off" or not detectable in a particular adult tissue, and in addition, some genes may be poorly measured due to probes that are not sufficiently sensitive or specific. Therefore, the signal strength and clarity will vary across the genes, suggesting that a quality measurement could highlight strong clear signals [31, 32]. Although it is still an open question how to best measure the quality of a gene expression measurement, and how best to use such a quality measure, different strategies can be considered for incorporating quality weights into meta-analysis of microarray studies. For example, we can define a quality threshold and only include genes that are above this threshold in the meta-analysis. However, the choice of threshold will be arbitrary. In a recent study, we proposed a quality measure based on the detection p-values estimated from Affymetrix microarray raw data [16, 31]. Using an effect-size model, we demonstrated that the incorporation of quality weights into the study-specific test statistics, within a meta-analysis of two Affymetrix microarray studies, produced more biological meaningful results than the unweighted analysis [16].

In this paper, we reparameterize the effect size measure for each gene in each study as the log ratio of the mean expressions of the two groups being compared. Following the method proposed by Hu et al. [16], we then place the new effect size measure into a quality-weighted modeling framework. We evaluate and compare the effect size measures (new and old) under the quality-weighted and quality-unweighted data integration frameworks using simulated data sets and real data sets with focus on identifying differentially expressed biomarkers and their performance on cancer outcome prediction.


Quality score measure for Affymetrix microarray data

For Affymetrix expression data, we previously developed a quality measure based on the detection p-values [33] that reflects whether the transcript is reliably expressed above the background in at least one experimental group in each study [16, 31] (see Additional file 1). The sensitivity parameter, v, that alters the tolerance of the quality weight to the detection p-value significance levels, was set to 0.05.

Using log ratio of means as effect size measure

There are many ways to measure effect size for gene g in individual study [25]. A commonly used way is the standardized mean difference (SMD). Let r gl represent the raw expression value for gene g and subject l and x gl = log(r gl ). The standardized mean difference (SMD) of x gl is given by


where and are the sample means of logged gene expression values for gene g in treatment group (t) and control group (c) in a given study, respectively. is the pooled standard deviation for gene g. The estimated variance of the unbiased effect size yg 1is given by Cooper and Hedges [25]


For a study with n(n = n t +n c ) samples, an approximately unbiased estimate of yg 1is given by [26].

Here, we propose an alternative method to measure effect size based on the log ratio of means (ROM), that is, the log fold-change given by


In contrast to the previous approach, here and are the sample means of unlog -transformed gene expression values for gene g in treatment group (t) and control group (c) in a given study, respectively. The estimated variance of the effect size yg 2can be estimated using delta method [34] as follows


where and are the variances of the treatment and control groups, respectively.

Integrative analysis of effect sizes in a quality-adjusted modeling framework

Any defined quality measure can be incorporated into integrative analysis of gene expression profiles using a quality-adjusted meta-analysis framework [16]. The rationale of the framework is that studies of a high quality should receive a higher weight in the analysis summarizing across studies [30]. Here, we follow Hu et al. [16] to place either the SMD effect size measure yg 1or the ROM effect size measure yg 2into a hierarchical model and to test for differences between groups. For either measure, we can write, for study i and measure m (m SMD or ROM),


where is the between-study variability of gene g with effect size measure m, μ gm represents the average measure of differential expression across the I studies for gene g. Here, and μ gm are gene-specific while and y igm are gene and study-specific (i = 1,2,...,I). The quantity measures the effect size variance of gene g, measuring the sampling error for the ithstudy. Following Hu et al. [16], we can estimate μ gm by taking the quality q ig for gene g and study i into account


where q ig and y igm are quality measure and the estimated effect size based on measure m for gene g in study i, respectively. and is the between-study variability [13]. Here we used a random-effects model to combine the estimated effect sizes (see Additional file 1). The variance of this estimator is obtained by


A test statistic to evaluate differential expression of gene g across all I studies can then be computed as


We evaluated the statistical significance of gene g by calculating the p-value corresponding to the z statistic; then we estimated the false discovery rates (FDR) for each significance level, to take into account the number of tests performed [35]. A detailed description of the integrative analysis of effect sizes can be found in the see Additional file 1.

We refer the approaches of estimating z gm using either the log ratio of means (m = 2) or the standardized mean difference (m = 1) as WROM and WSMD, respectively, in the quality-adjusted modeling framework, and as UWROM and UWSMD, respectively, in the quality-unadjusted modeling framework, where q ig = 1.


Model probe-level gene expression profile in a single study

Following previous studies to generate Affymetrix probe level data [31, 36], we modeled the probe-level gene expression for different conditions (e.g. cancer and normal samples) in a single study as:


where Y jgk and W jgk are PM and MM intensities for the probe j in probeset g on array k respectively. O denotes optical noise, independently drawing from and [36]. represent non-specific binding (NSB) noise for PM (XX = PM) and MM (XX = MM), respectively. We set μMM= μPM= 4.6 and assumed that and follow a bivariate normal distribution with mean 0, variance 1, and correlation 0.88. We then generated identically and independently distributed random variates e ~ N(0,0.08), so that and similarly . are quantity proportional to RNA expression for PM (XX = PM) and MM (XX = MM), respectively, and the coefficient 0 < Φ < 1 accounts for the fact that for some probe-pairs the MM detects signal; When probe j of gene g is attached by picking up stray signal, Φ jg is generated as Φ jg ~Beta(0.5,5), otherwise, Φ gj = 0. Since S follows a power law, we set its base to 2. Therefore, if we denote γ g as the baseline log expression level for probeset g, we can select log2(γ g ) expression levels from 0 to 12, which can be generated from γ g ~12* Beta(1,3)+1. δ g is the expected differential expression of gene g in covariate X. α jgk is the signal detecting ability of probe j in gene g on array k, which is assumed to follow a normal distribution with mean zero and signal detection variance . We generated multiplicative errors and independently from N(0, ).

Generate simulated data sets for multiple studies

We generated two Affymetrix microarray data sets, which are assumed to be from two independent studies. In each of the two data sets, we assume treatment group t and control group c with and arrays in the ithstudy, respectively. We generated G genes and assume the proportion of expressed genes is q and the proportion of differentially expressed genes is d of the G*q expressed genes in each study. We ran three simulation models following the above design by varying treatment effects on the signal between 1.0 (small) and 2 (large) with interval 0.5. The specific parameters used in the five models are summarized in Table 1:

Table 1 Parameters used in simulation of probe-level gene expression profile

We used summarized receiver operating characteristic (SROC) curves to compare performance, where the test sensitivities and specificities (true positive and true negative proportions) for a range of p-value cutoffs were averaged over 500 simulated datasets in each study. The SROC curve's overall behavior can be measured by the area under the curve (AUC) [37].

Affymetrix Microarray data

We used gene expression data on prostate tumours and controls from four studies [3841]. The datasets will be referred to by the name of the first author. All these datasets are either publicly available or obtainable upon request. Information about these datasets, such as microarray platforms, the number of samples available, etc, is listed in Table 2. For these four data sets, we used the robust multi-array average (RMA) algorithm [42] to get summarized probeset-level expression data, and then we obtained the unlogged normalized expression data. There are 12,600 common probesets across the four data sets. We performed integrative analysis using the first three data sets in the table (the Welsh data, the LaTulippe data, and the Singh data) to identify differentially expressed genes and then developed our predictive models (the "training data") based on the selected genes. The fourth data set (the Stuart data) was used for testing the models (the "testing" data).

Table 2 Main characteristics of the Affymetrix microarray data sets


Analysis of simulated data sets

We evaluated the performance of our method using simulated Affymetrix probe level expression data generated from a model incorporating probe level effects, optical noise, and non-specific binding, as well as true signals [31, 36]. Following the simulation procedures described in Methods section, we run three simulation models for probe-level gene expression profiles generated from two independent studies. Treatment effects on the signal were varied between 1.0 (small) and 2.0 (large) in the three models. Table 3 shows AUCs for the three simulation models under different weighting and effect size parameterization strategies. As seen from the table, the quality-weighted data integration framework produces better performance than the quality-unweighted data integration framework for SMD and ROM-based effect size (It should be noted that the normalized gene expression values for SMD and ROM-based effect sizes are given in log2 and natural scale, respectively), respectively. In terms of the effect size measures, the proposed log ratio of mean method has higher sensitivity than the standardized mean difference method.

Table 3 Area under the curves of the four meta-analysis models (s = 0.05)

Analysis of prostate cancer Affymetrix microarray data sets

Comparing gene ranks among different meta-analytic procedures

To evaluate the significance of genes identified by quality-adjusted and quality-unadjusted data integration frameworks under ROM and SDM effect size measures, we compared the ranks of a set of known prostate tumor genes. This set of prostate cancer genes are from two sources: The first one is from Welsh study [38], where they discussed four prostate tumor markers or experimentally validated genes in detail (see page 5977 of their paper); the second one is from Tricoli study [43]. In this study, they surveyed the potential markers in prostate cancers diagnosis and presented a detailed analysis of five of them, which were believed to be the most likely candidates. Here we compared the ranks of the nine genes selected by each of the four meta-analysis methods as shown in Table 4. Comparing WROM with WSMD, seven of the nine genes selected by WROM have better ranks (ranked on the top) than those selected by WSMD. Comparing UWROM with UWSMD, six of the nine genes selected by UWROM have better ranks than those selected by UWSMD. This suggests the genes selected by ROM-based meta-analytic frameworks (quality-adjusted and quality-unadjusted) might be more biologically interesting than those selected by SMD-based meta-analytic frameworks.

Table 4 Rank of known and validated prostate cancer markers

It should be noted that some of the known tumor genes identified by our new methods have much better ranks than the conventional methods. For example, the ranks of tumor genes FASN and TACSTD1 are 15 and 6 by WROM and 13 and 6 by UWROM while the ranks of these genes are 72 and 289 by WSMD and 231 and 413 by UWSMD.

In order to evaluate the overlap between genes identified by our meta-analysis procedures and those identified in a single study, we analyzed each of the three training data sets (Singh study, Welsh study and LaTulipper study) as shown in Table 2 using LIMMA (LIMMA: linear models for microarray data analysis), a widely used method for identifying differentially expressed genes in a single study [2]. Here we report results using data from Singh's study because this study has relatively large sample size (50 normal and 52 tumor samples). Table 4 and Figure 1 show comparison of results identified from analyzing Singh study alone and those from a meta-analysis of the three studies. As shown in Table 4, the ranks of the 9 known tumor genes based on only Singh study are relatively low and closer to those based on SMD-based meta-analysis procedures than those based on the ROM-based meta-analysis procedures, suggesting ROM-based meta-analysis procedures may have better performance than SMD-based meta-analysis procedures. Therefore, it is not surprising that the overlap between genes identified by LIMMA and our SMD-based meta-analysis procedures is higher than those identified by ROM- based meta-analysis procedures as shown in Figure 1.

Figure 1
figure 1

Comparison of meta-analysis and single study analysis. Overlap of top-ranked genes by meta-analysis and single study analysis

Comparing prediction performance of top-ranked meta-signatures among meta-analytic procedures

To further confirm the validity and biological relevance of the meta-signatures identified by the proposed effect size measures and different data integration frameworks, we evaluated the discriminative power for the top 150 differentially expressed genes identified by the four meta-analysis methods, respectively, using an independent data set listed in Table 2 (Stuart study). We varied the number of predictors between 1 and all the 150 selected genes and built the SVM prediction models on the training dataset listed in Table 2 (Singh study, Welsh study and LaTulippe study), the models were then tested separately for each number of genes included as predictors on the test data (Stuart et al. 2004). Figures 2 and 3 show the classification accuracies based on SVM models with linear and radial kernels, respectively. It can be seen that meta-signatures identified by ROM-based meta-analytic procedures (e.g. WROM and UWROM) usually have better prediction accuracies than those identified by SMD-based meta-analytic procedures (e.g. WSMD and UWSMD). We also tried other simpler classification methods, such as diagonal linear discriminant analysis (DLDA) [5], to build the prediction models, and similar results were observed (data not shown).

Figure 2
figure 2

Prediction accuracy of the SVM model with linear kernel. Prediction accuracy of the SVM models as a function of the number of differentially expressed genes selected by the four meta-analytic procedures, respectively

Figure 3
figure 3

Prediction accuracy of the SVM model with radial kernel. Prediction accuracy of the SVM models as a function of the number of differentially expressed genes selected by the four meta-analytic procedures, respectively


Many microarray experiments include only a few replications, therefore, it is critical to improve the effect size estimation in meta-analytic procedure. With small sample sizes, the traditional SMD estimates are prone to unpredictable changes, since gene-specific variability can easily be underestimated resulting in large statistics values. In this study, we re-parameterized the traditional SMD-based effect size measure by using a log ratio of means as an effect size measure for each gene in each study. Our results show the new effect size measure has better performance than the traditional one.

Traditional wisdom for statistical analysis recommends that highly skewed data should be transformed prior to analysis. It is therefore unexpected, perhaps, that the ROM measure (where log transforms are taken after calculating means) gives better prediction accuracy than the SMD measure (where log transformation is done prior to calculating means). Since the signals from Affymetrix are expected to be a mixture of background or non-specific binding and true signal, and only the true signal is expected to follow a power law, using the log transformation up front may be introducing variability, in particular for genes with low levels of expression. Furthermore, for genes whose expression levels change dramatically between experimental groups, the apriori log transformation may be inappropriate in the group with low expression levels.

We noticed that the ranks of some of the known tumor genes (e.g. five candidate markers discussed by Tricoli et al. [43] are relatively low in all four data integration methods (WROM, WSMD, ROM and SMD). There are several possible reasons for this. For example, since the patients used in these studies were collected in different places, there may be clinical heterogeneity, which may result in very different expression profiles of the same gene in different studies. It is also possible that the lower ranks of these tumor genes result from the relatively small sample sizes. Integration of more microarray data sets may lead to the discovery of more robust prostate cancer biomarkers.

Our results show that different predictors, including various combinations of differentially expressed genes can lead to similar prediction accuracy. This can make it challenging to select optimal biomarker sets for clinical use. Our recent study [19] showed that many of the differentially expressed genes which have similar classification results are involved in the same or similar biological pathways. In other words, the genes with the best discriminative power likely correspond to a limited set of biological functions or pathways. Hence, the selection of biomarkers for prediction may need to be based on a combination of statistical results and knowledge of pathways.

It is widely known that data from various sources might contain different informativity for a given biological task (such as differential analysis of gene expression levels between case and control). Some data sources might, for example, be more informative than others. A statistically sound data integration framework should, therefore, take these into account. One approach towards this goal is to develop suitable quality measures for different data types and these measures are then integrated into the statistical models. We used a simple quality measure associated with both log-ratio of means based and standardized mean difference based effect sizes. Our analysis showed this measure works well in the real and simulated data sets.


In summary, we combined estimated ROM-based effect sizes for all studies under two data integration frameworks: the quality-unweighted random effects models and the quality-weighted random effects models [16]. Comparing with the SMD-based effect size measure, our real examples and simulation studies showed that the proposed methods have better power to identify differential expressed genes and the detected genes have better accuracies in predicting cancer outcomes. In conclusion, the new effect size measure and the quality-weighted microarray data integration framework provide efficient way to combine microarray results.



ratio of mean


log ratio of mean used as the effect size measure in weighted meta-analysis Framework


log ratio of mean used as the effect size measure in unweighted meta-analysis framework


standardized mean difference


standardized mean difference used as the effect size measure in weighted meta-analysis framework


standardized mean difference used as the effect size measure in unweighted meta-analysis framework


perfect match




maximum likelihood estimation


non-specific binding


robust multi-array average


summarized receiver operating characteristic


area under the curve


false discovery rate


support vector machines


diagonal linear discriminant analysis.


  1. Tusher V, Tibshirani R, Chu G: Significance analysis of microarrays applied to the ionizing radiation response. Proc Natl Acad Sci USA. 2001, 98: 5116-5121. 10.1073/pnas.091062498

    Article  PubMed Central  CAS  PubMed  Google Scholar 

  2. Smyth GK: Linear models and empirical Bayes methods for assessing differential expression in microarray experiments. Statistical Application in Genetics and Molecular Biology. 2004, l: 3-

    Google Scholar 

  3. Golub TR, Slonim DK, Tamayo P, Huard C, Gaasenbeek M, Mesirov JP, Coller H, Loh ML, Downing JR, Caligiuri MA, Bloomfield CD, Lander ES: Molecular classification of cancer: class discovery and class prediction by gene expression monitoring. Science. 1999, 286: 531-536. 10.1126/science.286.5439.531

    Article  CAS  PubMed  Google Scholar 

  4. van't Veer LJ, Dai H, Vijver van de MJ, He YD, Hart AA, Mao M, Peterse HL, Kooy van der K, Marton MJ, Witteveen AT, Schreiber GJ, Kerkhoven RM, Roberts C, Linsley PS, Bernards R, Friend SH: Gene expression profiling predicts clinical outcome of breast cancer. Nature. 2002, 419: 624-629. 10.1038/415530a

    Article  Google Scholar 

  5. Dudoit S, Fridlyand J, Speed TP: Comparison of discrimination methods for the classification of tumors using gene expression data. Journal of the American Statistical Association. 2002, 97: 77-87. 10.1198/016214502753479248.

    Article  CAS  Google Scholar 

  6. Xu L, Tan AC, Naiman DQ, Geman D, Winslow RL: Robust prostate cancer gene emerge from direct integration of inter-study microarray data. Bioinformatics. 2005, 21: 3905-3911. 10.1093/bioinformatics/bti647

    Article  CAS  PubMed  Google Scholar 

  7. Tan Y, Shi L, Tong W, Wang C: Multi-class cancer classification by total principal component regression (TPCR) using microarray gene expression data. Nucleic acids research. 2005, 33: 56-65. 10.1093/nar/gki144

    Article  PubMed Central  CAS  PubMed  Google Scholar 

  8. Bloom G, Yang IV, Boulware D, Kwong KY, Coppola D, Eschrich S, Quackenbush J, Yeatman TJ: Multi-platform, multi-site, microarray-based human tumor classification. American Journal of Pathology. 2004, 164: 9-16.

    Article  PubMed Central  CAS  PubMed  Google Scholar 

  9. Warnat P, Eils R, Brors B: cross-platform analysis of cancer micorarray data improves gene expression based classification of phenotypes. BMC Bioinformatics. 2005, 6: 265- 10.1186/1471-2105-6-265

    Article  PubMed Central  PubMed  Google Scholar 

  10. Cruz JA, Wishart DS: Applications of Machine Learning in Cancer Prediction and Prognosis. Cancer Informatics. 2006, 2: 59-78.

    PubMed Central  Google Scholar 

  11. Rhodes DR, Barrette TR, Rubin MA, Ghosh D, Chinnaiyan AM: Meta-analysis of microarrays: inter-study validation of gene expression profiles reveals pathway dysregulation in prostate cancer. Cancer Research. 2002, 62: 4427-4433.

    CAS  PubMed  Google Scholar 

  12. Rhodes D, Yu J, Shanker K, Deshpande N, Varambally R, Ghosh D, Barrette T, Pandey A, Chinnaiyan A: Large-scale meta-analysis of cancer microarray data identifies common transcriptional profiles of neoplastic transformation and progression. Proc Natl Acad Sci USA. 2004, 101: 9309-14. 10.1073/pnas.0401994101

    Article  PubMed Central  CAS  PubMed  Google Scholar 

  13. Choi JK, Yu U, Kim S, Yoo OJ: Combining multiple microarray studies and modeling inter-study variation. Bioinformatics. 2003, 19 (Suppl): i84-i90. 10.1093/bioinformatics/btg1010.

    Article  PubMed  Google Scholar 

  14. Jiang H, Deng Y, Chen H, Tao L, Sha Q, Chen J, Tsai C, Zhang S: Joint analysis of two microarray gene-expression data sets to select lung adenocarcinoma marker genes. BMC Bioinformatics. 2004, 5: 81- 10.1186/1471-2105-5-81

    Article  PubMed Central  PubMed  Google Scholar 

  15. Stevens JR, Doerge RW: Combining Affymetrix microarray results. BMC Bioinformatics. 2005, 6: 57- 10.1186/1471-2105-6-57

    Article  PubMed Central  PubMed  Google Scholar 

  16. Hu P, Celia GMT, Beyene J: Integrative analysis of multiple gene expression profiles with quality-adjusted effect size models. BMC Bioinformatics. 2005, 6: 128- 10.1186/1471-2105-6-128

    Article  PubMed Central  PubMed  Google Scholar 

  17. Wang J, Do KA, Wen S, Tsavachidis S, McDonnell TJ, Logothetis CJ, Coombes KR: Merging microarray data, robust feature selection, and predicting prognosis in prostate cancer. Cancer Informatics. 2006, 2: 87-97.

    PubMed Central  Google Scholar 

  18. Yang X, Sun X: Meta-analysis of several gene lists for distinct types of cancer: A simple way to reveal common prognostic markers. BMC Bioinformatics. 2007, 8: 118- 10.1186/1471-2105-8-118

    Article  PubMed Central  PubMed  Google Scholar 

  19. Hu P, Celia GMT, Beyene J: Integrative Analysis of Gene Expression Data Including an Assessment of Pathway enrichment for Predicting Prostate Cancer. Cancer Informatics. 2006, 2: 289-300.

    PubMed Central  CAS  Google Scholar 

  20. Shabalin AA, Tjelmeland H, Fan C, Perou CM, Nobel AB: Merging two gene expression studies via cross platform normalization. Bioinformatics. 2008, 24: 1154-1160. 10.1093/bioinformatics/btn083

    Article  CAS  PubMed  Google Scholar 

  21. Hu P, Greenwood CMT, Beyene J: Statistical methods for meta-analysis of microarray data: a comparative study. Information Systems Frontiers. 2006, 8: 9-20. 10.1007/s10796-005-6099-z.

    Article  Google Scholar 

  22. DeConde RP, Hawley S, Falcon S, Clegg N, Knudsen B, Etzioni R: Combining results of microarray experiments: A rank aggregation approach. Statistical Application in Genetics and Molecular Biology. 2006, 5: 15-

    Article  Google Scholar 

  23. DuMouchel WH, Harris JE: Bayes methods for combining the results of cancer studies in humans and other species. Journal of the American Statistical Association. 1983, 78: 293-315. 10.2307/2288631.

    Article  Google Scholar 

  24. Smith TC, Spiegelhalter DJ, Thomas A: Bayesian approaches to random-effects meta-analysis: a comparative study. Stat Med. 1995, 14: 2685-2699. 10.1002/sim.4780142408

    Article  CAS  PubMed  Google Scholar 

  25. Cooper H, Hedges LV: The handbook of research synthesis. 1994, New York: Russell Sage

    Google Scholar 

  26. Hedges LV, Olkin I: Statistical methods for meta-analysis. 1995, Orlando, FL: Academic Press

    Google Scholar 

  27. Parmigiani G, Garrett-Mayer ES, Anbazhagan R, Gabrielson E: A cross-study comparison of gene expression studies for the molecular classification of lung cancer. Clinical Cancer Research. 2004, 10: 2922-2927. 10.1158/1078-0432.CCR-03-0490

    Article  CAS  PubMed  Google Scholar 

  28. Guo L, Lobenhofer EK, Wang C, Shippy R, Harris SC, Zhang L, Mei N, Chen T, Herman D, Goodsaid FM, Hurban P, Phillips KL, Xu J, Deng X, Sun YA, Tong W, Dragan YP, Shi L: Rat toxicogenomic study reveals analytical consistency across microarray platforms. Nature Biotechnology. 2006, 24: 1162-1169. 10.1038/nbt1238

    Article  CAS  PubMed  Google Scholar 

  29. Shi , et al.: The MicroArray Quality Control (MAQC) project shows inter- and intraplatform reproducibility of gene expression measurements. Nature biotechnology. 2006, 24: 1151-1161. 10.1038/nbt1239

    Article  CAS  PubMed  Google Scholar 

  30. Tritchler D: Modelling study quality in meta-analysis. Statistics in Medicine. 1999, 18: 2135-2145. 10.1002/(SICI)1097-0258(19990830)18:16<2135::AID-SIM183>3.0.CO;2-5

    Article  CAS  PubMed  Google Scholar 

  31. Hu P, Beyene J, Greenwood CMT: Tests for differential gene expression using weights in oligonucleotide microarray experiments. BMC Genomics. 2006, 8: 9-20.

    Google Scholar 

  32. Heber S, Sick B: Quality assessment of Affymetrix GeneChip data. OMICS: A Journal of Integrative Biology. 2006, 10: 358-368. 10.1089/omi.2006.10.358

    Article  CAS  PubMed  Google Scholar 

  33. Affymetrix -- Technical Manual.

  34. Oehlert GW: A Note on the delta method. The American Statistician. 1992, 46: 27-29. 10.2307/2684406.

    Google Scholar 

  35. Benjamini Y, Hochberg Y: Controlling the false discovery rate: a practical and powerful approach to multiple testing. Journal of the Royal Statistical Society: Series B. 1995, 85: 289-300.

    Google Scholar 

  36. Wu Z, Irizarry RA, Gentleman R, Martinez MF, Spencer F: A Model Based Background Adjustement for Oligonucleotide Expression Arrays. Journal of the American Statistical Association. 2004, 99: 909-915. 10.1198/016214504000000683.

    Article  Google Scholar 

  37. Hanley JA, McNeil BJ: The meaning and use of the area under a receiver operating characteristic (ROC) curve. Radiology. 1982, 143: 29-36.

    Article  CAS  PubMed  Google Scholar 

  38. Welsh JB, Sapinoso LM, Su AI, Kern SG, Wang J, et al.: Analysis of gene expression identifies candidate markers and pharmacological targets in prostate cancer. Cancer Research. 2001, 61: 5974-5978.

    CAS  PubMed  Google Scholar 

  39. Singh D, Febbo PG, Ross K, Jackson DG, Manola J, Ladd C, Tamayo P, Renshaw AA, D'Amico AV, Richie JP, Lander ES, Loda M, Kantoff PW, Golub TR, Sellers WR: Gene expression correlates of clinical prostate cancer behaviour. Cancer Cell. 2002, 1: 203-209. 10.1016/S1535-6108(02)00030-2

    Article  CAS  PubMed  Google Scholar 

  40. LaTulippe E, Satagopan J, Smith A, Scher H, Scardino P, Reuter V, Gerald WL: Comprehensive gene expression analysis of prostate cancer reveals distinct transcriptional programs associated with metastatic disease. Cancer Research. 2002, 62: 4499-4506.

    CAS  PubMed  Google Scholar 

  41. Stuart RO, Wachsman W, Berry CC, Wang-Rodriguez J, Wasserman L, Klacansky I, Masys D, Arden K, Goodison S, McClelland M, Wang Y, Sawyers A, Kalcheva I, Tarin D, Mercola D: In silico dissection of cell-type-associated patterns of gene expression in prostate cancer. Proc Natl Acad Sci USA. 2004, 101: 615-620. 10.1073/pnas.2536479100

    Article  PubMed Central  CAS  PubMed  Google Scholar 

  42. Irizarry RA, Bolstad BM, Collin F, Cope LM, Hobbs B, Speed TP: Summaries of Affymetrix GeneChip probe level data. Nucleic Acids Research. 2003, 31: E15- 10.1093/nar/gng015

    Article  PubMed Central  PubMed  Google Scholar 

  43. Tricoli JV, Schoenfeldt M, Conley BA: Detection of prostate cancer and predicting progression: current and future diagnostic markers. Clinical cancer research. 2004, 10: 3943-3953. 10.1158/1078-0432.CCR-03-0200

    Article  PubMed  Google Scholar 

Download references


This work was partially supported by grants from the Canadian Institutes of Health Research (CIHR) (grant number 84392), the Natural Sciences and Engineering Research, Council of Canada (NSERC), the Mathematics of Information Technology and Complex Systems (MITACS), and Genome Canada through the Ontario Genomics Institute. We would like to thank three anonymous reviewers for their helpful comments and suggestions.

Author information

Authors and Affiliations


Corresponding author

Correspondence to Joseph Beyene.

Additional information

Authors' contributions

JB initiated the study and proposed the ratio of means effect size measure and weighted data integration framework. CG proposed the quality weight measure and simulation framework. PH carried out all the data analysis and drafted the manuscript. All authors read, contributed to, and approved the final manuscript.

Electronic supplementary material

Authors’ original submitted files for images

Below are the links to the authors’ original submitted files for images.

Authors’ original file for figure 1

Authors’ original file for figure 2

Authors’ original file for figure 3

Rights and permissions

Open Access This article is published under license to BioMed Central Ltd. This is an Open Access article is distributed under the terms of the Creative Commons Attribution License ( ), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Reprints and permissions

About this article

Cite this article

Hu, P., Greenwood, C.M. & Beyene, J. Using the ratio of means as the effect size measure in combining results of microarray experiments. BMC Syst Biol 3, 106 (2009).

Download citation

  • Received:

  • Accepted:

  • Published:

  • DOI: