 Methodology article
 Open Access
Using the ratio of means as the effect size measure in combining results of microarray experiments
BMC Systems Biology volume 3, Article number: 106 (2009)
Abstract
Background
Development of efficient analytic methodologies for combining microarray results is a major challenge in gene expression analysis. The widely used effect size models are thought to provide an efficient modeling framework for this purpose, in which the measures of association for each study and each gene are combined, weighted by their standard errors. A significant disadvantage of this strategy is that the quality of different data sets may be highly variable, but this information is usually neglected during integration. Moreover, it is widely recognized that the estimated standard deviations in commonly used effect size measures (such as the standardized mean difference) can be unstable when the sample size in each group is small.
Results
We propose a reparameterization of the traditional mean-difference-based effect measure by using the log ratio of means as an effect size measure for each gene in each study. The estimated effect sizes for all studies were then combined under two modeling frameworks: quality-unweighted random effects models and quality-weighted random effects models. We defined the quality measure as a function of the detection p-value, which indicates whether or not a transcript is reliably detected on the Affymetrix gene chip. The new effect size measure is evaluated and compared under the quality-weighted and quality-unweighted data integration frameworks using simulated data sets and several data sets of prostate cancer patients and controls. We focus on identifying differentially expressed biomarkers for prediction of cancer outcomes.
Conclusion
Our results show that the proposed effect size measure (log ratio of means) has better power to identify differentially expressed genes, and that the detected genes perform better in predicting cancer outcomes than those found with the commonly used effect size measure, the standardized mean difference (SMD), under both quality-weighted and quality-unweighted data integration frameworks. The new effect size measure and the quality-weighted microarray data integration framework provide efficient ways to combine microarray results.
Background
Microarray technology has been widely used to identify differentially expressed genes [1, 2] and to build predictors for disease outcome diagnosis [3–7]. Although individual microarray studies can be highly informative for this purpose (e.g. van 't Veer et al. [4]), it is difficult to compare directly the results obtained by different groups addressing similar biological problems, since laboratory protocols, microarray platforms and analysis techniques often differ between studies [8, 9]. Moreover, most individual studies have relatively small sample sizes, so prediction models trained on individual studies using cross-validation procedures are prone to overfitting, leading to prediction accuracies that are overestimated and do not generalize [10].
Recent studies show that systematic integration of gene expression data from different sources can increase statistical power to detect differentially expressed genes while allowing for an assessment of heterogeneity [11–18], and may lead to more robust, reproducible and accurate predictions [19]. Powerful statistical methods for efficiently integrating related genomic experiments are therefore critical to the success of the massive investment made in genomic studies. Broadly speaking, strategies to integrate microarray studies fall into three categories:
The first category is a combined analysis of all the data. Each data set is first preprocessed to clean and align the signals, and the preprocessed data sets are then pooled so that the integrated data set can be treated as though it came from a single study. In this way, the effective sample size is greatly increased. Several transformation methods have been proposed to process gene expression measures from different studies [9, 14, 17, 20]. For example, Jiang et al. [14] transformed the normalized data sets to have similar distributions and then pooled them. Wang et al. [17] standardized gene expression levels using the means and standard deviations of expression measurements from the arrays of healthy prostate samples. These methods are simple and, if the transformation is made carefully, can in many cases improve disease outcome prediction [14]. Nevertheless, there is no consensus or clear guideline on the best way to perform the necessary data transformations.
The second strategy is to combine the analysis results obtained from each study. The basic idea is to combine evidence of differential expression across multiple gene profiling studies using a summary statistic, such as the p-value, and then to adjust for multiple testing. For example, Rhodes et al. [11, 12] combined results from four prostate cancer microarray data sets analyzed on different platforms. Differential expression between the prostate tumor group and the normal group was first assessed independently for each gene in each data set, using the p-value as the measure of statistical confidence. The study-specific p-values were then combined, using the result that -2 log(p-value) has a chi-squared distribution with 2 degrees of freedom under the null hypothesis of no differential expression, so that the sum over k independent studies has a chi-squared distribution with 2k degrees of freedom. The combined analysis yielded stronger significance than the individual studies. Combining p-values is useful for obtaining more precise estimates of significance, but it does not indicate the direction of the effect (e.g., up- or down-regulation) [21]. Instead of integrating p-values directly, some studies have combined the ranks of statistics from different studies [18, 22]. For example, DeConde et al. [22] proposed a rank-aggregation method to combine final microarray results from five prostate cancer studies. The method summarizes majority preferences between pairs of genes across the ranked lists from different studies, and the authors found that it identified differentially expressed genes more reliably across studies.
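Fisher's rule for combining p-values, described above, can be sketched in a few lines. This is a minimal sketch assuming independent studies; the function name `fisher_combine` is hypothetical. Because the degrees of freedom (2k) are always even, the chi-squared upper tail has a closed form and no statistics library is needed.

```python
import math

def fisher_combine(p_values):
    """Combine independent p-values for one gene with Fisher's method.

    X = -2 * sum(log p_i) is chi-squared with 2k degrees of freedom under the
    null, where k is the number of studies. For even degrees of freedom,
    P(X > x) = exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!
    """
    k = len(p_values)
    if k == 0:
        raise ValueError("need at least one p-value")
    if any(not 0.0 < p <= 1.0 for p in p_values):
        raise ValueError("p-values must lie in (0, 1]")
    half_x = -sum(math.log(p) for p in p_values)  # this is X / 2
    term, tail = 1.0, 1.0
    for i in range(1, k):
        term *= half_x / i       # (x/2)^i / i!, built up incrementally
        tail += term
    return math.exp(-half_x) * tail

# One gene in three hypothetical studies.
print(fisher_combine([0.04, 0.10, 0.03]))
```

Note that the combined value carries no sign, which is the limitation on direction of effect mentioned above.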
The third strategy takes inter-study variability into account when estimating the overall effect for each gene across studies, and then bases conclusions on the distribution of these overall measures. For example, Choi et al. [13] integrated the effect size estimates from individual studies into an overall estimate of the average effect size; here the effect size measures the magnitude of the treatment effect in a given study. Inter-study variability was included in the model with an associated prior distribution. This type of model, also termed a hierarchical Bayesian random effects model, has been used broadly in non-microarray contexts (e.g., DuMouchel and Harris [23]; Smith et al. [24]). Using the same microarray data sets as Rhodes et al. [11], Choi et al. demonstrated that their method can discover small but consistent expression changes with increased sensitivity and reliability. The hierarchical Bayesian random effects meta-analysis model has several favorable features: it provides an overall effect size, and it accounts for inter-study variability, which may improve the accuracy of results.
The most widely used effect size measure in this type of model is the standardized mean difference [25, 26]. It is well known in microarray data analysis that the estimated standard deviation can be unstable when the sample size in each group is small. Many efforts have therefore been made to overcome this shortcoming by estimating a penalty parameter that smooths the estimates using information from all genes, rather than relying solely on the estimates from an individual gene [1, 27].
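The idea of borrowing strength across genes can be illustrated with a minimal sketch in the spirit of SAM [1]: a positive constant s0, shared by all genes, is added to each gene's pooled standard deviation. Here s0 is simply the median pooled SD across genes, an illustrative assumption (SAM chooses s0 by its own criterion); the gene names and values are hypothetical.

```python
import statistics

def pooled_sd(treat, control):
    """Pooled within-group standard deviation for one gene."""
    n_t, n_c = len(treat), len(control)
    pooled_var = ((n_t - 1) * statistics.variance(treat)
                  + (n_c - 1) * statistics.variance(control)) / (n_t + n_c - 2)
    return pooled_var ** 0.5

def moderated_stat(treat, control, s0):
    """Mean difference over (pooled SD + s0); s0 = 0 gives the raw SMD."""
    diff = statistics.mean(treat) - statistics.mean(control)
    return diff / (pooled_sd(treat, control) + s0)

# Hypothetical log2 expression values, three arrays per group.
genes = {
    "tiny_sd": ([8.00, 8.00, 8.01], [7.00, 7.00, 7.01]),  # SD underestimated by chance
    "noisy": ([9.0, 7.0, 8.0], [6.0, 8.0, 7.0]),
    "flat": ([5.0, 5.5, 4.5], [5.0, 4.5, 5.5]),
}
# Information from all genes: one shared s0 (here the median pooled SD).
s0 = statistics.median(pooled_sd(t, c) for t, c in genes.values())

for name, (t, c) in genes.items():
    print(name, round(moderated_stat(t, c, 0.0), 2), round(moderated_stat(t, c, s0), 2))
```

Without s0 the "tiny_sd" gene gets a statistic above 100 purely because its sample SD is near zero; with s0 it falls to about 2.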
However, recent studies show that differentially expressed genes may be best identified using fold-change measures rather than t-like statistics [28]. Fold change is commonly used in small laboratory experiments on gene expression and is considered a natural measure of gene expression changes [29]. In high-throughput microarray analysis, however, the properties of fold-change statistics have received little attention. Reparameterizations of effect size measures therefore merit further investigation.
Most data integration papers in microarray analysis have not used measures of quality to refine their analyses [9, 11–15, 17, 20, 22]. In classical meta-analysis, however, quality measures have often been used when combining results across studies. It has been argued that higher-quality studies give more accurate estimates of the true parameter of interest and should therefore receive a higher weight in the analysis summarizing across studies [30]. In gene expression microarrays, many genes may be "off" or undetectable in a particular adult tissue, and some genes may be poorly measured because their probes are not sufficiently sensitive or specific. Signal strength and clarity therefore vary across genes, suggesting that a quality measure could highlight strong, clear signals [31, 32]. Although how best to measure the quality of a gene expression measurement, and how best to use such a measure, remain open questions, different strategies can be considered for incorporating quality weights into meta-analysis of microarray studies. For example, one could define a quality threshold and include only genes above this threshold in the meta-analysis; however, the choice of threshold would be arbitrary. In a recent study, we proposed a quality measure based on the detection p-values estimated from Affymetrix microarray raw data [16, 31]. Using an effect size model, we demonstrated that incorporating quality weights into the study-specific test statistics, within a meta-analysis of two Affymetrix microarray studies, produced more biologically meaningful results than the unweighted analysis [16].
In this paper, we reparameterize the effect size measure for each gene in each study as the log ratio of the mean expressions of the two groups being compared. Following the method proposed by Hu et al. [16], we then place the new effect size measure into a quality-weighted modeling framework. We evaluate and compare the effect size measures (new and old) under the quality-weighted and quality-unweighted data integration frameworks using simulated and real data sets, focusing on identifying differentially expressed biomarkers and on their performance in cancer outcome prediction.
Methods
Quality score measure for Affymetrix microarray data
For Affymetrix expression data, we previously developed a quality measure based on the detection p-values [33] that reflects whether the transcript is reliably expressed above background in at least one experimental group in each study [16, 31] (see Additional file 1). The sensitivity parameter v, which sets how tolerant the quality weight is to the detection p-value significance level, was set to 0.05.
Using log ratio of means as effect size measure
There are many ways to measure the effect size for gene g in an individual study [25]. A commonly used one is the standardized mean difference (SMD). Let $r_{gl}$ represent the raw expression value for gene g and subject l, and let $x_{gl} = \log(r_{gl})$. The SMD of $x_{gl}$ is given by
$$y_{g1} = \frac{\bar{x}_{gt} - \bar{x}_{gc}}{s_{gp}},$$
where $\bar{x}_{gt}$ and $\bar{x}_{gc}$ are the sample means of the logged expression values for gene g in the treatment group (t) and the control group (c) of a given study, respectively, and $s_{gp}$ is the pooled standard deviation for gene g. For a study with $n = n_t + n_c$ samples, an approximately unbiased estimate of $y_{g1}$ is given by [26]
$$y_{g1}^{*} = y_{g1} - \frac{3\,y_{g1}}{4n - 9},$$
and the estimated variance of the unbiased effect size is given by Cooper and Hedges [25] as
$$\hat{\sigma}^{2}_{g1} = \frac{1}{n_t} + \frac{1}{n_c} + \frac{(y_{g1}^{*})^{2}}{2n}.$$
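The SMD computation described above (difference of log-scale means over the pooled SD, the small-sample bias correction, and the Cooper and Hedges variance) can be sketched as follows. This is a minimal sketch assuming log-scale inputs for one gene in one study; `smd_effect` is a hypothetical helper name, not part of any package.

```python
import math
import statistics

def smd_effect(treat_log, control_log):
    """Bias-corrected standardized mean difference and its variance.

    d      = (mean_t - mean_c) / s_pooled
    y_star = d - 3d / (4n - 9),  with n = n_t + n_c
    var    = 1/n_t + 1/n_c + y_star^2 / (2n)
    """
    n_t, n_c = len(treat_log), len(control_log)
    n = n_t + n_c
    s_pooled = math.sqrt(((n_t - 1) * statistics.variance(treat_log)
                          + (n_c - 1) * statistics.variance(control_log)) / (n - 2))
    d = (statistics.mean(treat_log) - statistics.mean(control_log)) / s_pooled
    y_star = d - 3.0 * d / (4.0 * n - 9.0)
    var = 1.0 / n_t + 1.0 / n_c + y_star ** 2 / (2.0 * n)
    return y_star, var

y, v = smd_effect([2.0, 3.0, 4.0], [1.0, 2.0, 3.0])
print(y, v)
```

With three arrays per group, the raw SMD of 1.0 is shrunk to 0.8 by the correction, and the variance is 0.72, dominated by the 1/n_t + 1/n_c term.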
Here, we propose an alternative effect size measure based on the log ratio of means (ROM), that is, the log fold change:
$$y_{g2} = \log\left(\frac{\bar{r}_{gt}}{\bar{r}_{gc}}\right).$$
In contrast to the SMD, here $\bar{r}_{gt}$ and $\bar{r}_{gc}$ are the sample means of the unlogged expression values for gene g in the treatment group (t) and the control group (c) of a given study, respectively. The variance of the effect size $y_{g2}$ can be estimated by the delta method [34] as
$$\hat{\sigma}^{2}_{g2} = \frac{s_{gt}^{2}}{n_t\,\bar{r}_{gt}^{2}} + \frac{s_{gc}^{2}}{n_c\,\bar{r}_{gc}^{2}},$$
where $s_{gt}^{2}$ and $s_{gc}^{2}$ are the sample variances of the unlogged expression values in the treatment and control groups, respectively.
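Similarly, the ROM and its delta-method variance can be sketched as below. The sketch assumes unlogged (raw-scale) inputs and the natural logarithm; using log2 instead would rescale the effect by 1/ln 2 and its variance by 1/(ln 2)^2. `rom_effect` is a hypothetical name.

```python
import math
import statistics

def rom_effect(treat_raw, control_raw):
    """Log ratio of means on unlogged data, with a delta-method variance.

    y   = log(mean_t / mean_c)
    var ~= s_t^2 / (n_t * mean_t^2) + s_c^2 / (n_c * mean_c^2)
    """
    n_t, n_c = len(treat_raw), len(control_raw)
    m_t, m_c = statistics.mean(treat_raw), statistics.mean(control_raw)
    if m_t <= 0 or m_c <= 0:
        raise ValueError("group means must be positive")
    y = math.log(m_t / m_c)               # log is taken after averaging
    var = (statistics.variance(treat_raw) / (n_t * m_t ** 2)
           + statistics.variance(control_raw) / (n_c * m_c ** 2))
    return y, var

y, v = rom_effect([200.0, 400.0, 600.0], [100.0, 200.0, 300.0])
print(y, v)
```

Each group contributes its squared coefficient of variation divided by its sample size, which is what the delta method gives for the log of a sample mean.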
Integrative analysis of effect sizes in a quality-adjusted modeling framework
Any defined quality measure can be incorporated into an integrative analysis of gene expression profiles using a quality-adjusted meta-analysis framework [16]. The rationale of the framework is that studies of high quality should receive a higher weight in the analysis summarizing across studies [30]. Here, we follow Hu et al. [16] and place either the SMD effect size measure $y_{g1}$ (bias-corrected as $y_{g1}^{*}$) or the ROM effect size measure $y_{g2}$ into a hierarchical model to test for differences between groups. For either measure, we can write, for study i and measure m (m = 1 for SMD, m = 2 for ROM),
$$y_{igm} = \theta_{igm} + \varepsilon_{igm}, \quad \varepsilon_{igm} \sim N(0, \sigma^{2}_{igm}), \qquad \theta_{igm} = \mu_{gm} + \eta_{igm}, \quad \eta_{igm} \sim N(0, \tau^{2}_{gm}),$$
where $\tau^{2}_{gm}$ is the between-study variability of gene g under effect size measure m and $\mu_{gm}$ represents the average measure of differential expression across the I studies for gene g. Here $\tau^{2}_{gm}$ and $\mu_{gm}$ are gene-specific, while $\sigma^{2}_{igm}$ and $y_{igm}$ are gene- and study-specific (i = 1, 2, ..., I). The quantity $\sigma^{2}_{igm}$ is the effect size variance of gene g, measuring the sampling error for the i-th study. Following Hu et al. [16], we can estimate $\mu_{gm}$ by taking the quality $q_{ig}$ for gene g and study i into account:
$$\hat{\mu}_{gm} = \frac{\sum_{i=1}^{I} q_{ig}\, w_{igm}\, y_{igm}}{\sum_{i=1}^{I} q_{ig}\, w_{igm}},$$
where $q_{ig}$ and $y_{igm}$ are the quality measure and the estimated effect size based on measure m for gene g in study i, respectively, $w_{igm} = 1/(\hat{\sigma}^{2}_{igm} + \hat{\tau}^{2}_{gm})$, and $\hat{\tau}^{2}_{gm}$ is the estimated between-study variability [13]. Here we used a random-effects model to combine the estimated effect sizes (see Additional file 1). The variance of this estimator is
$$\mathrm{Var}(\hat{\mu}_{gm}) = \frac{\sum_{i=1}^{I} q_{ig}^{2}\, w_{igm}}{\left(\sum_{i=1}^{I} q_{ig}\, w_{igm}\right)^{2}}.$$
A test statistic to evaluate differential expression of gene g across all I studies can then be computed as
$$z_{gm} = \frac{\hat{\mu}_{gm}}{\sqrt{\mathrm{Var}(\hat{\mu}_{gm})}}.$$
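The quality-weighted combination for a single gene can be sketched as below, assuming the form used by Hu et al. [16]: the summary is the average of the study effect sizes weighted by q·w with w = 1/(σ² + τ²), and its variance is Σq²w / (Σqw)². The estimator of the between-study variance τ² is deferred to Additional file 1, so this sketch assumes the DerSimonian-Laird method-of-moments estimator, a standard choice in random-effects meta-analysis, and a two-sided normal p-value. The function names are hypothetical. Setting every quality weight to 1 gives the quality-unweighted analysis.

```python
import math

def dersimonian_laird_tau2(y, v):
    """Moment estimate of the between-study variance tau^2 (truncated at 0)."""
    w = [1.0 / vi for vi in v]
    sw = sum(w)
    y_fixed = sum(wi * yi for wi, yi in zip(w, y)) / sw
    q_stat = sum(wi * (yi - y_fixed) ** 2 for wi, yi in zip(w, y))
    denom = sw - sum(wi * wi for wi in w) / sw
    if denom <= 0:                      # a single study carries no tau^2 information
        return 0.0
    return max(0.0, (q_stat - (len(y) - 1)) / denom)

def quality_weighted_z(y, v, q):
    """Quality-weighted random-effects summary for one gene.

    y: study effect sizes y_igm; v: their sampling variances sigma^2_igm;
    q: quality weights q_ig (all 1.0 reproduces the unweighted analysis).
    Returns (mu_hat, var_mu_hat, z, two-sided p-value).
    """
    tau2 = dersimonian_laird_tau2(y, v)
    w = [1.0 / (vi + tau2) for vi in v]
    qw = [qi * wi for qi, wi in zip(q, w)]
    mu = sum(a * yi for a, yi in zip(qw, y)) / sum(qw)
    var = sum(qi * a for qi, a in zip(q, qw)) / sum(qw) ** 2   # sum q^2 w / (sum q w)^2
    z = mu / math.sqrt(var)
    p = math.erfc(abs(z) / math.sqrt(2.0))                      # two-sided normal p-value
    return mu, var, z, p

# Hypothetical ROM effect sizes for one gene in three studies.
print(quality_weighted_z(y=[0.8, 0.6, 1.1], v=[0.05, 0.08, 0.10], q=[1.0, 0.9, 0.4]))
```

A study with quality weight 0 drops out of the summary entirely, while a weight between 0 and 1 shrinks its influence without discarding it, which avoids choosing an arbitrary quality threshold.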
We evaluated the statistical significance of gene g by calculating the p-value corresponding to the z statistic, and then estimated the false discovery rate (FDR) at each significance level to account for the number of tests performed [35]. A detailed description of the integrative analysis of effect sizes can be found in Additional file 1.
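Reference [35] is not reproduced in this section, so the following sketch assumes the Benjamini-Hochberg step-up procedure as one common way to turn gene-wise p-values into FDR-adjusted values; `bh_fdr` is a hypothetical helper.

```python
def bh_fdr(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])   # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity of the adjustment.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

print(bh_fdr([0.01, 0.04, 0.03, 0.20]))
```

Genes whose adjusted value falls below a chosen level (for example 0.05) are declared differentially expressed at that FDR.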
We refer to the approaches that estimate $z_{gm}$ using the log ratio of means (m = 2) or the standardized mean difference (m = 1) as WROM and WSMD, respectively, in the quality-adjusted modeling framework, and as UWROM and UWSMD, respectively, in the quality-unadjusted modeling framework, where $q_{ig}$ = 1 for all genes and studies.
Simulations
Model probe-level gene expression profiles in a single study
Following previous studies that generated Affymetrix probe-level data [31, 36], we modeled the probe-level gene expression for different conditions (e.g. cancer and normal samples) in a single study as
$$Y_{jgk} = O_{jgk} + N^{PM}_{jgk} + S^{PM}_{jgk}, \qquad W_{jgk} = O'_{jgk} + N^{MM}_{jgk} + \Phi_{jg}\, S^{MM}_{jgk},$$
$$\log_2 S^{XX}_{jgk} = \gamma_g + \delta_g X_k + \alpha_{jgk} + \varepsilon^{XX}_{jgk}, \qquad XX \in \{PM, MM\},$$
where $Y_{jgk}$ and $W_{jgk}$ are the PM and MM intensities, respectively, for probe j in probe set g on array k. O denotes optical noise, drawn independently for PM and MM [36]. $N^{XX}_{jgk}$ represents non-specific binding (NSB) noise for PM (XX = PM) and MM (XX = MM). We set $\mu^{MM} = \mu^{PM} = 4.6$ and assumed that the PM and MM deviations of the log NSB terms from these means follow a bivariate normal distribution with mean 0, variance 1 and correlation 0.88. We then generated independent and identically distributed random variates e ~ N(0, 0.08), which enter the log NSB term for PM, and similarly for MM. $S^{XX}_{jgk}$ are quantities proportional to RNA expression for PM (XX = PM) and MM (XX = MM), and the coefficient $0 \le \Phi_{jg} < 1$ accounts for the fact that for some probe pairs the MM probe detects signal: when probe j of gene g picks up stray signal, $\Phi_{jg}$ is generated as $\Phi_{jg} \sim \mathrm{Beta}(0.5, 5)$; otherwise $\Phi_{jg} = 0$. Since S follows a power law, we set its base to 2. Therefore, if we denote by $\gamma_g$ the baseline log expression level for probe set g, we can select log2 expression levels from 0 to 12, generated as $\gamma_g \sim 12 \cdot \mathrm{Beta}(1, 3) + 1$. $\delta_g$ is the expected differential expression of gene g in covariate X. $\alpha_{jgk}$ is the signal-detecting ability of probe j in gene g on array k, which is assumed to follow a normal distribution with mean zero and signal detection variance $\sigma^{2}_{\alpha}$. We generated the multiplicative errors $\varepsilon^{PM}_{jgk}$ and $\varepsilon^{MM}_{jgk}$ independently from $N(0, \sigma^{2}_{\varepsilon})$.
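A simplified generator for one probe set on one array is sketched below, assuming the structure PM = optical noise + NSB + signal, MM = optical noise + NSB + Φ × signal, with log2 signal = γ + δx + α + ε. Only the quantities stated in the text (μ = 4.6, correlation 0.88, e ~ N(0, 0.08), Φ ~ Beta(0.5, 5), γ ~ 12·Beta(1, 3) + 1) come from the article. The optical-noise distribution, the standard deviations of α and ε, the probability that a probe picks up stray signal, the natural-log scale of the NSB terms and the reading of 0.08 as a variance are hypothetical choices made only to obtain a runnable example.

```python
import math
import random

def simulate_probe_set(n_probes, x, delta, rng,
                       mu_nsb=4.6, rho=0.88, e_var=0.08,
                       p_stray=0.3, optical=(20.0, 2.0),   # hypothetical
                       alpha_sd=0.3, eps_sd=0.1):          # hypothetical
    """Simulate PM and MM intensities for one probe set on one array.

    x is the covariate (0 = control, 1 = treatment) and delta the expected
    differential expression on the log2 scale.
    """
    # Baseline log2 expression level of the probe set.
    gamma = 12.0 * rng.betavariate(1.0, 3.0) + 1.0
    pm, mm = [], []
    for _ in range(n_probes):
        # Standard bivariate normal with correlation rho for the PM/MM NSB terms.
        z_pm = rng.gauss(0.0, 1.0)
        z_mm = rho * z_pm + math.sqrt(1.0 - rho ** 2) * rng.gauss(0.0, 1.0)
        # Assumed form: log N = mu + z + e, with e ~ N(0, e_var), natural log.
        nsb_pm = math.exp(mu_nsb + z_pm + rng.gauss(0.0, math.sqrt(e_var)))
        nsb_mm = math.exp(mu_nsb + z_mm + rng.gauss(0.0, math.sqrt(e_var)))
        # Probe effect alpha is shared by PM and MM; multiplicative errors are not.
        alpha = rng.gauss(0.0, alpha_sd)
        s_pm = 2.0 ** (gamma + delta * x + alpha + rng.gauss(0.0, eps_sd))
        s_mm = 2.0 ** (gamma + delta * x + alpha + rng.gauss(0.0, eps_sd))
        # Phi ~ Beta(0.5, 5) only for probes whose MM picks up stray signal.
        phi = rng.betavariate(0.5, 5.0) if rng.random() < p_stray else 0.0
        pm.append(rng.gauss(*optical) + nsb_pm + s_pm)
        mm.append(rng.gauss(*optical) + nsb_mm + phi * s_mm)
    return pm, mm

rng = random.Random(2009)
pm, mm = simulate_probe_set(n_probes=11, x=1, delta=1.5, rng=rng)
print([round(v) for v in pm])
```

Passing a seeded `random.Random` instance keeps each simulated data set reproducible, which matters when results are averaged over many replicates.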
Generate simulated data sets for multiple studies
We generated two Affymetrix microarray data sets, assumed to come from two independent studies. In each data set, we assume a treatment group t and a control group c with $n_{ti}$ and $n_{ci}$ arrays in the i-th study, respectively. We generated G genes and assumed that a proportion q of them are expressed and that a proportion d of the G·q expressed genes are differentially expressed in each study. We ran three simulation models following this design, varying the treatment effect on the signal from 1.0 (small) to 2.0 (large) in steps of 0.5. The specific parameters used in the three models are summarized in Table 1.
We used summarized receiver operating characteristic (SROC) curves to compare performance, where the test sensitivities and specificities (true positive and true negative proportions) for a range of p-value cutoffs were averaged over 500 simulated data sets in each study. The overall behavior of an SROC curve can be summarized by the area under the curve (AUC) [37].
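The averaging described above can be sketched as follows: sensitivity and 1 - specificity are computed at fixed p-value cutoffs in each simulated replicate, averaged over replicates, and the AUC of the averaged curve is obtained with the trapezoidal rule. This is a minimal illustration under those assumptions, not necessarily the exact procedure of [37]; the helper names are hypothetical.

```python
def roc_points(p_values, is_de, cutoffs):
    """(1 - specificity, sensitivity) at each p-value cutoff for one replicate."""
    n_pos = sum(1 for de in is_de if de)
    n_neg = len(is_de) - n_pos
    points = []
    for c in cutoffs:
        tp = sum(1 for p, de in zip(p_values, is_de) if de and p <= c)
        fp = sum(1 for p, de in zip(p_values, is_de) if not de and p <= c)
        points.append((fp / n_neg, tp / n_pos))
    return points

def averaged_auc(replicates, cutoffs):
    """Average ROC points over replicates at fixed cutoffs, then trapezoid AUC.

    replicates: list of (p_values, is_de) pairs, one per simulated data set.
    """
    curves = [roc_points(p, truth, cutoffs) for p, truth in replicates]
    r = len(curves)
    averaged = [(sum(curve[k][0] for curve in curves) / r,
                 sum(curve[k][1] for curve in curves) / r)
                for k in range(len(cutoffs))]
    # Anchor the curve at (0, 0) and (1, 1) and integrate left to right.
    pts = sorted([(0.0, 0.0)] + averaged + [(1.0, 1.0)])
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for (x0, y0), (x1, y1) in zip(pts, pts[1:]))

# Hypothetical replicate: two truly DE genes with small p-values.
rep = ([0.001, 0.002, 0.5, 0.9], [True, True, False, False])
print(averaged_auc([rep, rep], cutoffs=[0.01, 0.1, 0.6, 1.0]))
```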
Affymetrix Microarray data
We used gene expression data on prostate tumours and controls from four studies [38–41]. The data sets are referred to by the name of the first author. All of them are either publicly available or obtainable upon request. Information about these data sets, such as the microarray platforms and the number of samples available, is listed in Table 2. For these four data sets, we used the robust multi-array average (RMA) algorithm [42] to obtain summarized probe set-level expression data, and then obtained the unlogged normalized expression values. There are 12,600 probe sets common to the four data sets. We performed integrative analysis using the first three data sets in the table (the Welsh, LaTulippe and Singh data) to identify differentially expressed genes, and then developed our predictive models (the "training data") based on the selected genes. The fourth data set (the Stuart data) was used to test the models (the "testing data").
Results
Analysis of simulated data sets
We evaluated the performance of our method using simulated Affymetrix probe-level expression data generated from a model incorporating probe-level effects, optical noise and non-specific binding, as well as true signals [31, 36]. Following the simulation procedures described in the Methods section, we ran three simulation models for probe-level gene expression profiles generated from two independent studies, with treatment effects on the signal varied between 1.0 (small) and 2.0 (large). Table 3 shows the AUCs for the three simulation models under different weighting and effect size parameterization strategies. The quality-weighted data integration framework performs better than the quality-unweighted framework for both the SMD-based and the ROM-based effect sizes. (Note that the normalized gene expression values used for the SMD-based and ROM-based effect sizes are on the log2 and natural scales, respectively.) In terms of effect size measures, the proposed log ratio of means has higher sensitivity than the standardized mean difference.
Analysis of prostate cancer Affymetrix microarray data sets
Comparing gene ranks among different meta-analytic procedures
To evaluate the significance of genes identified by the quality-adjusted and quality-unadjusted data integration frameworks under the ROM and SMD effect size measures, we compared the ranks of a set of known prostate tumor genes. These genes come from two sources. The first is the Welsh study [38], in which the authors discussed four prostate tumor markers or experimentally validated genes in detail (see page 5977 of their paper). The second is the Tricoli study [43], which surveyed potential markers for prostate cancer diagnosis and presented a detailed analysis of the five believed to be the most likely candidates. Table 4 compares the ranks assigned to these nine genes by each of the four meta-analysis methods. Seven of the nine genes are ranked higher by WROM than by WSMD, and six of the nine are ranked higher by UWROM than by UWSMD. This suggests that the genes selected by the ROM-based meta-analytic frameworks (quality-adjusted and quality-unadjusted) may be more biologically relevant than those selected by the SMD-based frameworks.
Notably, some of the known tumor genes are ranked much higher by the new methods than by the conventional ones. For example, the tumor genes FASN and TACSTD1 are ranked 15 and 6 by WROM and 13 and 6 by UWROM, whereas they are ranked 72 and 289 by WSMD and 231 and 413 by UWSMD.
To evaluate the overlap between genes identified by our meta-analysis procedures and those identified in a single study, we analyzed each of the three training data sets listed in Table 2 (the Singh, Welsh and LaTulippe studies) using LIMMA (linear models for microarray data analysis), a widely used method for identifying differentially expressed genes in a single study [2]. Here we report results using data from Singh's study because it has a relatively large sample size (50 normal and 52 tumor samples). Table 4 and Figure 1 compare the results from analyzing the Singh study alone with those from a meta-analysis of the three studies. As shown in Table 4, the ranks of the nine known tumor genes based on the Singh study alone are relatively low, and closer to those from the SMD-based meta-analysis procedures than to those from the ROM-based procedures, suggesting that the ROM-based procedures may perform better. It is therefore not surprising that the overlap between the genes identified by LIMMA and by our SMD-based meta-analysis procedures is larger than the overlap with the genes identified by the ROM-based procedures, as shown in Figure 1.
Comparing prediction performance of top-ranked meta-signatures among meta-analytic procedures
To further assess the validity and biological relevance of the meta-signatures identified by the proposed effect size measure and the different data integration frameworks, we evaluated the discriminative power of the top 150 differentially expressed genes identified by each of the four meta-analysis methods on an independent data set listed in Table 2 (the Stuart study). We built SVM prediction models on the training data listed in Table 2 (the Singh, Welsh and LaTulippe studies), varying the number of predictors from 1 to all 150 selected genes, and tested the model for each number of predictor genes on the test data (Stuart et al. 2004). Figures 2 and 3 show the classification accuracies of SVM models with linear and radial kernels, respectively. Meta-signatures identified by the ROM-based meta-analytic procedures (WROM and UWROM) usually have better prediction accuracies than those identified by the SMD-based procedures (WSMD and UWSMD). We also tried simpler classification methods, such as diagonal linear discriminant analysis (DLDA) [5], to build the prediction models, and observed similar results (data not shown).
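The evaluation loop (train on the training data using the top-k ranked genes, then score accuracy on the test set for each k) is sketched below with DLDA rather than an SVM, so that it runs with the Python standard library alone. This is an illustrative swap of classifier, and the helper names and toy data are hypothetical.

```python
import statistics

def dlda_fit(X, y, genes):
    """Per-class means and pooled per-gene variances for the chosen genes."""
    classes = sorted(set(y))
    means = {k: [statistics.mean([row[g] for row, lab in zip(X, y) if lab == k])
                 for g in genes] for k in classes}
    n, n_classes = len(X), len(classes)
    variances = []
    for idx, g in enumerate(genes):
        ss = sum((row[g] - means[lab][idx]) ** 2 for row, lab in zip(X, y))
        variances.append(max(ss / (n - n_classes), 1e-8))  # floor avoids division by zero
    return classes, means, variances

def dlda_predict(model, x, genes):
    """Assign x to the class with the smallest variance-scaled distance."""
    classes, means, variances = model
    def distance(k):
        return sum((x[g] - means[k][idx]) ** 2 / variances[idx]
                   for idx, g in enumerate(genes))
    return min(classes, key=distance)

def accuracy_by_signature_size(train_X, train_y, test_X, test_y, ranked_genes, sizes):
    """Test-set accuracy when using the top-k ranked genes, for each k in sizes."""
    results = {}
    for k in sizes:
        genes = ranked_genes[:k]
        model = dlda_fit(train_X, train_y, genes)
        correct = sum(dlda_predict(model, x, genes) == lab
                      for x, lab in zip(test_X, test_y))
        results[k] = correct / len(test_y)
    return results

# Toy data: gene 0 separates the classes, gene 1 is noise.
train_X = [[1.0, 0.0], [1.2, 0.1], [0.8, -0.1], [3.0, 0.0], [3.2, 0.1], [2.8, -0.1]]
train_y = [0, 0, 0, 1, 1, 1]
test_X, test_y = [[1.1, 0.0], [2.9, 0.0]], [0, 1]
print(accuracy_by_signature_size(train_X, train_y, test_X, test_y, [0, 1], [1, 2]))
```

In the setting above, `ranked_genes` would be the genes ordered by the meta-analysis z statistic and `sizes` would run from 1 to 150.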
Discussion
Many microarray experiments include only a few replicates, so it is critical to improve effect size estimation in meta-analytic procedures. With small sample sizes, traditional SMD estimates are prone to unpredictable changes, since gene-specific variability can easily be underestimated, resulting in large statistic values. In this study, we reparameterized the traditional SMD-based effect size measure by using a log ratio of means as the effect size measure for each gene in each study. Our results show that the new effect size measure performs better than the traditional one.
Traditional wisdom in statistical analysis recommends that highly skewed data be transformed prior to analysis. It is therefore perhaps unexpected that the ROM measure (where the log transformation is applied after calculating means) gives better prediction accuracy than the SMD measure (where the log transformation is applied before calculating means). Since Affymetrix signals are expected to be a mixture of background or non-specific binding and true signal, and only the true signal is expected to follow a power law, applying the log transformation up front may introduce variability, in particular for genes with low expression levels. Furthermore, for genes whose expression levels change dramatically between experimental groups, an a priori log transformation may be inappropriate in the group with low expression levels.
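A toy calculation with hypothetical values illustrates the point about low expression levels: a single near-background value in the control group lowers the mean of the log values much more than it lowers the log of the mean. This only illustrates the argument; it is not evidence that ROM is better in general.

```python
import math
import statistics

def log2_ratio_of_means(treat, control):
    """ROM on the log2 scale: the log is taken after averaging raw values."""
    return math.log2(statistics.mean(treat) / statistics.mean(control))

def mean_log2_difference(treat, control):
    """Log-first contrast (the SMD numerator): average log2 values, then subtract."""
    return (statistics.mean([math.log2(v) for v in treat])
            - statistics.mean([math.log2(v) for v in control]))

treat = [1000.0, 1200.0, 800.0]   # hypothetical raw intensities
control = [5.0, 200.0, 195.0]     # one value near background

print(round(log2_ratio_of_means(treat, control), 2))   # 2.91
print(round(mean_log2_difference(treat, control), 2))  # 4.09
```

The single value of 5 also inflates the spread of the control group on the log scale, which is the extra variability the log-first approach can introduce.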
We noticed that the ranks of some known tumor genes (e.g. the five candidate markers discussed by Tricoli et al. [43]) are relatively low under all four data integration methods (WROM, WSMD, UWROM and UWSMD). There are several possible reasons for this. For example, since the patients in these studies were recruited at different places, there may be clinical heterogeneity, which may result in very different expression profiles of the same gene in different studies. It is also possible that the lower ranks of these tumor genes result from the relatively small sample sizes. Integrating more microarray data sets may lead to the discovery of more robust prostate cancer biomarkers.
Our results show that different predictors, including various combinations of differentially expressed genes, can lead to similar prediction accuracy. This can make it challenging to select optimal biomarker sets for clinical use. Our recent study [19] showed that many of the differentially expressed genes with similar classification results are involved in the same or similar biological pathways. In other words, the genes with the best discriminative power likely correspond to a limited set of biological functions or pathways. Hence, the selection of biomarkers for prediction may need to combine statistical results with knowledge of pathways.
It is widely known that data from different sources may differ in how informative they are for a given biological task (such as the differential analysis of gene expression levels between cases and controls); some data sources may be more informative than others. A statistically sound data integration framework should therefore take this into account. One approach is to develop suitable quality measures for different data types and to integrate these measures into the statistical models. We used a simple quality measure with both the log-ratio-of-means-based and the standardized-mean-difference-based effect sizes, and our analysis showed that it works well in both the real and the simulated data sets.
Conclusion
In summary, we combined estimated ROM-based effect sizes for all studies under two data integration frameworks: quality-unweighted random effects models and quality-weighted random effects models [16]. Compared with the SMD-based effect size measure, our real-data examples and simulation studies showed that the proposed methods have better power to identify differentially expressed genes, and that the detected genes have better accuracy in predicting cancer outcomes. In conclusion, the new effect size measure and the quality-weighted microarray data integration framework provide an efficient way to combine microarray results.
Abbreviations
 ROM:

ratio of means
 WROM:

log ratio of means used as the effect size measure in the weighted meta-analysis framework
 UWROM:

log ratio of means used as the effect size measure in the unweighted meta-analysis framework
 SMD:

standardized mean difference
 WSMD:

standardized mean difference used as the effect size measure in the weighted meta-analysis framework
 UWSMD:

standardized mean difference used as the effect size measure in the unweighted meta-analysis framework
 PM:

perfect match
 MM:

mismatch
 MLE:

maximum likelihood estimation
 NSB:

non-specific binding
 RMA:

robust multi-array average
 SROC:

summarized receiver operating characteristic
 AUC:

area under the curve
 FDR:

false discovery rate
 SVM:

support vector machines
 DLDA:

diagonal linear discriminant analysis.
References
 1.
Tusher V, Tibshirani R, Chu G: Significance analysis of microarrays applied to the ionizing radiation response. Proc Natl Acad Sci USA. 2001, 98: 5116-5121. 10.1073/pnas.091062498
 2.
Smyth GK: Linear models and empirical Bayes methods for assessing differential expression in microarray experiments. Statistical Applications in Genetics and Molecular Biology. 2004, 3: Article 3
 3.
Golub TR, Slonim DK, Tamayo P, Huard C, Gaasenbeek M, Mesirov JP, Coller H, Loh ML, Downing JR, Caligiuri MA, Bloomfield CD, Lander ES: Molecular classification of cancer: class discovery and class prediction by gene expression monitoring. Science. 1999, 286: 531-536. 10.1126/science.286.5439.531
 4.
van 't Veer LJ, Dai H, van de Vijver MJ, He YD, Hart AA, Mao M, Peterse HL, van der Kooy K, Marton MJ, Witteveen AT, Schreiber GJ, Kerkhoven RM, Roberts C, Linsley PS, Bernards R, Friend SH: Gene expression profiling predicts clinical outcome of breast cancer. Nature. 2002, 415: 530-536. 10.1038/415530a
 5.
Dudoit S, Fridlyand J, Speed TP: Comparison of discrimination methods for the classification of tumors using gene expression data. Journal of the American Statistical Association. 2002, 97: 77-87. 10.1198/016214502753479248
 6.
Xu L, Tan AC, Naiman DQ, Geman D, Winslow RL: Robust prostate cancer marker genes emerge from direct integration of inter-study microarray data. Bioinformatics. 2005, 21: 3905-3911. 10.1093/bioinformatics/bti647
 7.
Tan Y, Shi L, Tong W, Wang C: Multi-class cancer classification by total principal component regression (TPCR) using microarray gene expression data. Nucleic Acids Research. 2005, 33: 56-65. 10.1093/nar/gki144
 8.
Bloom G, Yang IV, Boulware D, Kwong KY, Coppola D, Eschrich S, Quackenbush J, Yeatman TJ: Multi-platform, multi-site, microarray-based human tumor classification. American Journal of Pathology. 2004, 164: 9-16.
 9.
Warnat P, Eils R, Brors B: Cross-platform analysis of cancer microarray data improves gene expression based classification of phenotypes. BMC Bioinformatics. 2005, 6: 265. 10.1186/1471-2105-6-265
 10.
Cruz JA, Wishart DS: Applications of Machine Learning in Cancer Prediction and Prognosis. Cancer Informatics. 2006, 2: 59-78.
 11.
Rhodes DR, Barrette TR, Rubin MA, Ghosh D, Chinnaiyan AM: Meta-analysis of microarrays: interstudy validation of gene expression profiles reveals pathway dysregulation in prostate cancer. Cancer Research. 2002, 62: 4427-4433.
 12.
Rhodes D, Yu J, Shanker K, Deshpande N, Varambally R, Ghosh D, Barrette T, Pandey A, Chinnaiyan A: Large-scale meta-analysis of cancer microarray data identifies common transcriptional profiles of neoplastic transformation and progression. Proc Natl Acad Sci USA. 2004, 101: 9309-9314. 10.1073/pnas.0401994101
 13.
Choi JK, Yu U, Kim S, Yoo OJ: Combining multiple microarray studies and modeling interstudy variation. Bioinformatics. 2003, 19 (Suppl): i84-i90. 10.1093/bioinformatics/btg1010
 14.
Jiang H, Deng Y, Chen H, Tao L, Sha Q, Chen J, Tsai C, Zhang S: Joint analysis of two microarray gene-expression data sets to select lung adenocarcinoma marker genes. BMC Bioinformatics. 2004, 5: 81. 10.1186/1471-2105-5-81
 15.
Stevens JR, Doerge RW: Combining Affymetrix microarray results. BMC Bioinformatics. 2005, 6: 57. 10.1186/1471-2105-6-57.
 16.
Hu P, Greenwood CMT, Beyene J: Integrative analysis of multiple gene expression profiles with quality-adjusted effect size models. BMC Bioinformatics. 2005, 6: 128. 10.1186/1471-2105-6-128.
 17.
Wang J, Do KA, Wen S, Tsavachidis S, McDonnell TJ, Logothetis CJ, Coombes KR: Merging microarray data, robust feature selection, and predicting prognosis in prostate cancer. Cancer Informatics. 2006, 2: 87-97.
 18.
Yang X, Sun X: Meta-analysis of several gene lists for distinct types of cancer: a simple way to reveal common prognostic markers. BMC Bioinformatics. 2007, 8: 118. 10.1186/1471-2105-8-118.
 19.
Hu P, Greenwood CMT, Beyene J: Integrative analysis of gene expression data including an assessment of pathway enrichment for predicting prostate cancer. Cancer Informatics. 2006, 2: 289-300.
 20.
Shabalin AA, Tjelmeland H, Fan C, Perou CM, Nobel AB: Merging two gene expression studies via cross-platform normalization. Bioinformatics. 2008, 24: 1154-1160. 10.1093/bioinformatics/btn083.
 21.
Hu P, Greenwood CMT, Beyene J: Statistical methods for meta-analysis of microarray data: a comparative study. Information Systems Frontiers. 2006, 8: 9-20. 10.1007/s10796-005-6099-z.
 22.
DeConde RP, Hawley S, Falcon S, Clegg N, Knudsen B, Etzioni R: Combining results of microarray experiments: a rank aggregation approach. Statistical Applications in Genetics and Molecular Biology. 2006, 5: 15.
 23.
DuMouchel WH, Harris JE: Bayes methods for combining the results of cancer studies in humans and other species. Journal of the American Statistical Association. 1983, 78: 293-315. 10.2307/2288631.
 24.
Smith TC, Spiegelhalter DJ, Thomas A: Bayesian approaches to random-effects meta-analysis: a comparative study. Stat Med. 1995, 14: 2685-2699. 10.1002/sim.4780142408.
 25.
Cooper H, Hedges LV: The handbook of research synthesis. 1994, New York: Russell Sage Foundation.
 26.
Hedges LV, Olkin I: Statistical methods for meta-analysis. 1985, Orlando, FL: Academic Press.
 27.
Parmigiani G, Garrett-Mayer ES, Anbazhagan R, Gabrielson E: A cross-study comparison of gene expression studies for the molecular classification of lung cancer. Clinical Cancer Research. 2004, 10: 2922-2927. 10.1158/1078-0432.CCR-03-0490.
 28.
Guo L, Lobenhofer EK, Wang C, Shippy R, Harris SC, Zhang L, Mei N, Chen T, Herman D, Goodsaid FM, Hurban P, Phillips KL, Xu J, Deng X, Sun YA, Tong W, Dragan YP, Shi L: Rat toxicogenomic study reveals analytical consistency across microarray platforms. Nature Biotechnology. 2006, 24: 1162-1169. 10.1038/nbt1238.
 29.
Shi L, et al.: The MicroArray Quality Control (MAQC) project shows inter- and intraplatform reproducibility of gene expression measurements. Nature Biotechnology. 2006, 24: 1151-1161. 10.1038/nbt1239.
 30.
Tritchler D: Modelling study quality in meta-analysis. Statistics in Medicine. 1999, 18: 2135-2145. 10.1002/(SICI)1097-0258(19990830)18:16<2135::AID-SIM183>3.0.CO;2-5.
 31.
Hu P, Beyene J, Greenwood CMT: Tests for differential gene expression using weights in oligonucleotide microarray experiments. BMC Genomics. 2006, 8: 920.
 32.
Heber S, Sick B: Quality assessment of Affymetrix GeneChip data. OMICS: A Journal of Integrative Biology. 2006, 10: 358-368. 10.1089/omi.2006.10.358.
 33.
Affymetrix Technical Manual. http://www.affymetrix.com/support/technical/manual/expression_manual.affx
 34.
Oehlert GW: A note on the delta method. The American Statistician. 1992, 46: 27-29. 10.2307/2684406.
 35.
Benjamini Y, Hochberg Y: Controlling the false discovery rate: a practical and powerful approach to multiple testing. Journal of the Royal Statistical Society: Series B. 1995, 57: 289-300.
 36.
Wu Z, Irizarry RA, Gentleman R, Martinez MF, Spencer F: A model-based background adjustment for oligonucleotide expression arrays. Journal of the American Statistical Association. 2004, 99: 909-915. 10.1198/016214504000000683.
 37.
Hanley JA, McNeil BJ: The meaning and use of the area under a receiver operating characteristic (ROC) curve. Radiology. 1982, 143: 29-36.
 38.
Welsh JB, Sapinoso LM, Su AI, Kern SG, Wang J, et al.: Analysis of gene expression identifies candidate markers and pharmacological targets in prostate cancer. Cancer Research. 2001, 61: 5974-5978.
 39.
Singh D, Febbo PG, Ross K, Jackson DG, Manola J, Ladd C, Tamayo P, Renshaw AA, D'Amico AV, Richie JP, Lander ES, Loda M, Kantoff PW, Golub TR, Sellers WR: Gene expression correlates of clinical prostate cancer behaviour. Cancer Cell. 2002, 1: 203-209. 10.1016/S1535-6108(02)00030-2.
 40.
LaTulippe E, Satagopan J, Smith A, Scher H, Scardino P, Reuter V, Gerald WL: Comprehensive gene expression analysis of prostate cancer reveals distinct transcriptional programs associated with metastatic disease. Cancer Research. 2002, 62: 4499-4506.
 41.
Stuart RO, Wachsman W, Berry CC, Wang-Rodriguez J, Wasserman L, Klacansky I, Masys D, Arden K, Goodison S, McClelland M, Wang Y, Sawyers A, Kalcheva I, Tarin D, Mercola D: In silico dissection of cell-type-associated patterns of gene expression in prostate cancer. Proc Natl Acad Sci USA. 2004, 101: 615-620. 10.1073/pnas.2536479100.
 42.
Irizarry RA, Bolstad BM, Collin F, Cope LM, Hobbs B, Speed TP: Summaries of Affymetrix GeneChip probe level data. Nucleic Acids Research. 2003, 31: e15. 10.1093/nar/gng015.
 43.
Tricoli JV, Schoenfeldt M, Conley BA: Detection of prostate cancer and predicting progression: current and future diagnostic markers. Clinical Cancer Research. 2004, 10: 3943-3953. 10.1158/1078-0432.CCR-03-0200.
Acknowledgements
This work was partially supported by grants from the Canadian Institutes of Health Research (CIHR) (grant number 84392), the Natural Sciences and Engineering Research Council of Canada (NSERC), the Mathematics of Information Technology and Complex Systems (MITACS), and Genome Canada through the Ontario Genomics Institute. We would like to thank three anonymous reviewers for their helpful comments and suggestions.
Additional information
Authors' contributions
JB initiated the study and proposed the ratio of means effect size measure and weighted data integration framework. CG proposed the quality weight measure and simulation framework. PH carried out all the data analysis and drafted the manuscript. All authors read, contributed to, and approved the final manuscript.
Keywords
 Standardized Mean Difference
 Tumor Gene
 Effect Size Measure
 Diagonal Linear Discriminant Analysis
 Affymetrix Microarray Data