Meta-Analysis of Genome-Wide Association Studies: No Efficiency Gain in Using Individual Participant Data

D Y Lin; D Zeng

doi:10.1002/gepi.20435

Author manuscript; available in PMC: 2014 Jan 2.

Published in final edited form as: Genet Epidemiol. 2010 Jan;34(1):10.1002/gepi.20435. doi: 10.1002/gepi.20435

PMCID: PMC3878085 NIHMSID: NIHMS141823 PMID: 19847795

Abstract

To identify genetic variants with modest effects on complex human diseases, a growing number of networks or consortia are created for sharing data from multiple genome-wide association studies on the same disease or related disorders. A central question in this enterprise is whether to obtain summary results or individual participant data from relevant studies. We show theoretically and numerically that meta-analysis of summary results is statistically as efficient as joint analysis of individual participant data (provided that both analyses are performed properly under the same modeling assumptions). We illustrate this equivalence with case-control data from the Finland-United States Investigation of NIDDM Genetics (FUSION) study. Collating only summary results will increase the number and representativeness of available studies, simplify data collection and analysis, reduce resource utilization, and accelerate discovery.

Keywords: complex diseases, GWAS consortia, joint analysis, mega analysis, SNPs, summary results

INTRODUCTION

Genome-wide association studies (GWAS) have yielded new findings for many complex human diseases. Because complex diseases are influenced by an array of genetic variants, most of which have small to moderate effects, it is difficult for a single GWAS to provide unequivocal findings. Indeed, the odds ratios of disease with SNPs that have been observed in GWAS thus far are typically less than 1.5, and the majority of positive findings have emerged only after aggressive data sharing across multiple studies. For example, the initial findings from individual type 2 diabetes GWAS were ambiguous, but a number of disease loci with odds ratios of 1.1 to 1.4 were identified conclusively after combining results from several studies (Saxena et al. 2007; Zeggini et al. 2007; Scott et al. 2007; Zeggini et al. 2008).

Recognizing the need for and benefits of data sharing, GWAS investigators have formed various networks or consortia to share data on the same disease or related disorders (Kavvoura and Ioannidis 2008). For example, the Psychiatric GWAS Consortium we are involved with has enrolled 47 studies in 5 major disorders (The Psychiatric GWAS Consortium Steering Committee 2009). Some of these consortia have attempted to obtain raw data on individual participants, as opposed to the summary results that are used in traditional meta-analysis. The raw data from all available studies can then be analyzed simultaneously. Such analysis is commonly called joint analysis or mega-analysis. We will use the term mega-analysis and refer to the traditional method of combining summary results as meta-analysis.

A major motivation for obtaining raw, individual-level data is the general perception that mega-analysis is statistically more efficient than meta-analysis since it utilizes much more detailed information. However, obtaining raw data is difficult, costly and time-consuming. Some investigators are unwilling or unable to share raw data. For the Tobacco and Genetics Consortium we are involved with, the majority of the investigators were unable to provide raw data due to IRB issues and/or study policies that prohibit the sharing of raw data. Excluding studies that do not contribute raw data will reduce statistical power and limit the generalizability of the findings. Furthermore, the sheer scale of GWAS data poses significant practical challenges in storing and analyzing raw data from a large number of studies.

We show in this article that meta-analysis (when performed properly) is as efficient as mega-analysis in that the estimates of any genetic effect produced by the two methods have approximately the same variance. Thus, there is no need to obtain raw data. Even if raw data are available, one can analyze the data for each study separately and then combine the summary results through meta-analysis. This will greatly facilitate the analysis, especially if raw data are available only on a subset of studies.

METHODS

We wish to combine results from K studies with n_k participants in the kth study. For the analysis of each SNP, the data consist of (Y_ki, X_ki), where Y_ki is the disease status (1 = disease, 0 = no disease) for the ith participant of the kth study, and X_ki is the corresponding genotype score. (Under the additive mode of inheritance, the genotype score is the number of minor alleles; under the dominant model, the genotype score indicates, by the values 1 versus 0, whether or not the individual has at least one minor allele; under the recessive model, the genotype score indicates, by the values 1 versus 0, whether or not the individual has two minor alleles. For an untyped SNP, the unknown genotype score may be imputed by the expected genotype score.) We assume the following logistic regression model:

\Pr (Y_{ki} = 1) = \frac{e^{α_{k} + β X_{ki}}}{1 + e^{α_{k} + β X_{ki}}},

(1)

where the α_k’s are study-specific intercepts, and β is the log odds ratio representing a common genetic effect across studies.
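The genotype coding and model (1) can be sketched in a few lines of Python. This is only an illustration: the function names `genotype_score` and `disease_probability` are hypothetical and do not come from the article, and an imputed expected genotype score would simply be passed to `disease_probability` as a non-integer value of X.

```python
import math


def genotype_score(minor_alleles, mode="additive"):
    """Map a minor-allele count (0, 1 or 2) to the genotype score X_ki."""
    if minor_alleles not in (0, 1, 2):
        raise ValueError("minor_alleles must be 0, 1 or 2")
    if mode == "additive":
        return minor_alleles                    # number of minor alleles
    if mode == "dominant":
        return 1 if minor_alleles >= 1 else 0   # at least one minor allele
    if mode == "recessive":
        return 1 if minor_alleles == 2 else 0   # two minor alleles
    raise ValueError("mode must be additive, dominant or recessive")


def disease_probability(alpha_k, beta, x):
    """Pr(Y_ki = 1) under model (1) for study intercept alpha_k and score x."""
    eta = alpha_k + beta * x
    return 1.0 / (1.0 + math.exp(-eta))


# Example: a heterozygote under each mode of inheritance.
for mode in ("additive", "dominant", "recessive"):
    print(mode, genotype_score(1, mode))
```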

Let β̂_k be the maximum likelihood estimate of β by maximizing the likelihood function of the kth study

L (α_{k}, β) = \prod_{i = 1}^{n_{k}} \frac{e^{Y_{ki} (α_{k} + β X_{ki})}}{1 + e^{α_{k} + β X_{ki}}},

and let V_k be the variance estimate of β̂_k. Then the inverse-variance meta-analysis estimate of β is

{(\sum_{k = 1}^{K} V_{k}^{- 1})}^{- 1} \sum_{k = 1}^{K} V_{k}^{- 1} {\hat{β}}_{k},

and its variance is estimated by

{(\sum_{k = 1}^{K} V_{k}^{- 1})}^{- 1} .
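As a minimal sketch, the inverse-variance estimate and its variance can be computed from the per-study summary results alone. The function name `inverse_variance_meta` and the numbers in the usage example are illustrative assumptions, not values from the article.

```python
def inverse_variance_meta(betas, variances):
    """Combine per-study estimates beta_k with variance estimates V_k.

    Returns the inverse-variance weighted estimate of beta and its
    estimated variance, (sum_k 1/V_k)^(-1).
    """
    if len(betas) != len(variances) or not betas:
        raise ValueError("need one variance per estimate")
    weights = [1.0 / v for v in variances]   # V_k^{-1}
    total = sum(weights)                     # sum_k V_k^{-1}
    beta = sum(w * b for w, b in zip(weights, betas)) / total
    return beta, 1.0 / total


# Hypothetical log odds ratios and variances from two studies.
beta_hat, var_hat = inverse_variance_meta([0.30, 0.40], [0.01, 0.04])
print(beta_hat, var_hat)
```

With these hypothetical inputs the weights are 100 and 25, so the combined estimate is approximately 0.32 with variance approximately 0.008; the more precise study dominates the average.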

To perform mega-analysis, we obtain the maximum likelihood estimate of β and its variance estimate by maximizing the joint likelihood function

\prod_{k = 1}^{K} L (α_{k}, β) .

We show in the Appendix that the meta-analysis and mega-analysis estimates of β have approximately the same variance, so the two methods have approximately the same efficiency.
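The comparison can be tried numerically with a short sketch. Everything specific here is an assumption made for illustration rather than the article's simulation design: the helper names `fit_logistic`, `simulate_study` and `meta_and_mega`, prospective (cohort-style) sampling, intercepts of -1.0 and -0.5, 4,000 participants per study, and the random seed.

```python
import numpy as np


def fit_logistic(design, y, n_iter=25):
    """Newton-Raphson MLE for logistic regression.

    Returns the coefficient vector and the inverse information matrix,
    which serves as the estimated covariance matrix.
    """
    coef = np.zeros(design.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-design @ coef))
        info = design.T @ (design * (p * (1.0 - p))[:, None])
        coef = coef + np.linalg.solve(info, design.T @ (y - p))
    p = 1.0 / (1.0 + np.exp(-design @ coef))
    info = design.T @ (design * (p * (1.0 - p))[:, None])
    return coef, np.linalg.inv(info)


def simulate_study(rng, n, maf, alpha, beta):
    """Draw (X, Y) prospectively from model (1); X is the minor-allele count."""
    x = rng.binomial(2, maf, size=n).astype(float)
    p = 1.0 / (1.0 + np.exp(-(alpha + beta * x)))
    y = rng.binomial(1, p).astype(float)
    return x, y


def meta_and_mega(studies):
    """Return ((beta, var) from meta-analysis, (beta, var) from mega-analysis)."""
    # Meta-analysis: fit each study separately, then inverse-variance weight.
    betas, variances = [], []
    for x, y in studies:
        coef, cov = fit_logistic(np.column_stack([np.ones_like(x), x]), y)
        betas.append(coef[1])
        variances.append(cov[1, 1])
    w = 1.0 / np.array(variances)
    meta = (np.sum(w * np.array(betas)) / np.sum(w), 1.0 / np.sum(w))

    # Mega-analysis: one joint fit with study-specific intercepts, common beta.
    n_studies = len(studies)
    blocks = []
    for k, (x, _) in enumerate(studies):
        block = np.zeros((len(x), n_studies + 1))
        block[:, k] = 1.0            # indicator column for intercept alpha_k
        block[:, n_studies] = x      # shared genotype column
        blocks.append(block)
    y_all = np.concatenate([y for _, y in studies])
    coef, cov = fit_logistic(np.vstack(blocks), y_all)
    mega = (coef[n_studies], cov[n_studies, n_studies])
    return meta, mega


rng = np.random.default_rng(2010)
studies = [
    simulate_study(rng, 4000, 0.3, -1.0, np.log(1.4)),
    simulate_study(rng, 4000, 0.2, -0.5, np.log(1.4)),
]
(beta_meta, var_meta), (beta_mega, var_mega) = meta_and_mega(studies)
print("meta:", beta_meta, var_meta)
print("mega:", beta_mega, var_mega)
```

The two printed lines should agree closely, which is the approximate equivalence claimed above; the sketch does not replace the formal argument in the Appendix.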

We can add covariates to model (1) in both meta-analysis and mega-analysis. The covariates may include environmental factors or principal components (Price et al. 2006) used to adjust for population stratification. The numbers and types of covariates need not be the same across studies. Meta-analysis of covariate-adjusted genetic effects is approximately as efficient as mega-analysis using individual-level covariate data (see the Appendix for details).

If the effects of some covariates are the same across studies, then one can improve the efficiency of mega-analysis by incorporating this restriction into the joint likelihood function and thus estimating fewer parameters. However, the efficiency gain is usually minimal because the number of covariates is much smaller than the sample sizes of typical GWAS. Interestingly, one can achieve the same efficiency gain by performing a multivariate version of meta-analysis (see the Appendix for details). The multivariate version of meta-analysis is not generally recommended because it requires additional summary results and the assumption of common covariate effects may not be appropriate.

Both meta-analysis and mega-analysis assume a common genetic effect across studies. This assumption does not affect the validity of association testing since the genetic effects are all zero under the null hypothesis of no association. However, it is important to determine whether meta-analysis or mega-analysis is more powerful when the effect sizes are unequal among studies. We show in the Appendix that the estimates produced by meta-analysis and mega-analysis are approximately the same and their variance estimates are also approximately the same when the genetic effects are unequal across studies, so that the two methods have similar statistical powers.

RESULTS

SIMULATION STUDIES

To demonstrate the equivalence between meta-analysis and mega-analysis, we present here some simulation results on combining two case-control studies. We simulated data from model (1), in which the SNP of interest had population minor allele frequencies (MAFs) of 0.3 and 0.2 in studies 1 and 2, respectively, and X_ki was the number of minor alleles. We set α₁ = −3, α₂ = −2.2, and β = log 1.4. We also considered unequal values of β for the two studies. Note that e^β pertains to the odds ratio (OR) of disease with the SNP under the additive mode of inheritance. We obtained various combinations of the numbers of cases and controls for the two studies. For each combination of the simulation parameters, we generated 10 million data sets and performed meta-analysis and mega-analysis of each data set under model (1). The results are summarized in Table 1.
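One way to generate case-control data consistent with model (1) is to sample genotypes conditionally on disease status. The sketch below is an assumption about how such data could be drawn, not a description of the authors' code: it assumes Hardy-Weinberg proportions in the population, and the name `simulate_case_control` is hypothetical.

```python
import numpy as np


def simulate_case_control(rng, n_cases, n_controls, maf, alpha, beta):
    """Sample minor-allele counts for cases and controls under model (1).

    Assumes Hardy-Weinberg genotype frequencies in the population and
    draws X from Pr(X | Y = 1) for cases and Pr(X | Y = 0) for controls.
    """
    genotypes = np.array([0, 1, 2])
    population = np.array([(1 - maf) ** 2, 2 * maf * (1 - maf), maf ** 2])
    risk = 1.0 / (1.0 + np.exp(-(alpha + beta * genotypes)))

    p_case = population * risk            # proportional to Pr(X = x, Y = 1)
    p_case = p_case / p_case.sum()
    p_control = population * (1.0 - risk)  # proportional to Pr(X = x, Y = 0)
    p_control = p_control / p_control.sum()

    x_cases = rng.choice(genotypes, size=n_cases, p=p_case)
    x_controls = rng.choice(genotypes, size=n_controls, p=p_control)
    x = np.concatenate([x_cases, x_controls])
    y = np.concatenate([np.ones(n_cases), np.zeros(n_controls)])
    return x, y


# Study 1 settings from the text: MAF 0.3, alpha_1 = -3, OR = 1.4.
rng = np.random.default_rng(1)
x, y = simulate_case_control(rng, 1000, 1000, 0.3, -3.0, np.log(1.4))
print(x[y == 1].mean(), x[y == 0].mean())
```

Fitting a logistic regression to data sampled this way changes the intercept but still targets the same log odds ratio β, which is why case-control data can be analyzed under model (1).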

Table 1.

Mean effect estimates, standard errors and powers at the 10⁻⁷ significance level for meta-analysis and mega-analysis of case-control data

Study 1 (MAF = 0.3)			Study 2 (MAF = 0.2)			Meta-analysis			Mega-analysis
OR	Cases	Controls	OR	Cases	Controls	Mean	SE	Power	Mean	SE	Power
1.4	1,000	1,000	1.4	1,000	1,000	1.402	0.076	0.812	1.402	0.076	0.814
	1,500	1,500		500	500	1.402	0.074	0.865	1.402	0.074	0.866
	500	500		1,500	1,500	1.402	0.079	0.745	1.402	0.079	0.747
	750	1,500		1,500	750	1.402	0.076	0.814	1.402	0.076	0.815
	1,500	750		750	1,500	1.402	0.076	0.812	1.402	0.076	0.814
1.5	1,000	1,000	1.3	1,000	1,000	1.411	0.077	0.840	1.411	0.077	0.843
	1,500	1,500		500	500	1.459	0.077	0.967	1.459	0.077	0.967
	500	500		1,500	1,500	1.359	0.076	0.543	1.360	0.076	0.550
	750	1,500		1,500	750	1.408	0.076	0.830	1.408	0.076	0.841
	1,500	750		750	1,500	1.413	0.077	0.850	1.414	0.078	0.847
1.3	1,000	1,000	1.5	1,000	1,000	1.383	0.075	0.736	1.383	0.075	0.741
	1,500	1,500		500	500	1.338	0.070	0.594	1.339	0.070	0.599
	500	500		1,500	1,500	1.436	0.081	0.858	1.437	0.081	0.861
	750	1,500		1,500	750	1.386	0.075	0.755	1.386	0.076	0.748
	1,500	750		750	1,500	1.380	0.074	0.720	1.381	0.074	0.737

When the SNP effects are the same between the two studies, the mean estimates of the SNP effects and the standard errors are identical up to the third decimal point between meta-analysis and mega-analysis, and the powers are identical up to the second decimal point. When the SNP effects are different between the two studies, there are some slight differences between the two methods, and either method can be slightly more powerful than the other.

FUSION DATA

For illustration with empirical data, we considered the Finland-United States Investigation of NIDDM Genetics (FUSION) study (Scott et al. 2007). The FUSION study genotyped 1,161 Finnish type 2 diabetes (T2D) cases and 1,174 Finnish normal glucose-tolerant (NGT) controls on 317,503 SNPs on the Illumina HumanHap300 BeadChip in stage 1 of a two-stage design. Based on the stage-1 results and the findings of other studies, the study genotyped 224 SNPs in an additional 1,204 Finnish T2D cases and 1,253 Finnish NGT controls. The subjects with missing genotypes on a particular SNP were excluded from the analysis of that SNP. All subjects have age and sex information.

We performed meta-analysis and mega-analysis of T2D status on the 224 SNPs that were genotyped in both stage 1 and stage 2 of the FUSION study. The results under the additive mode of inheritance are displayed in Figure 1. The individual estimates of odds ratios vary considerably between stages 1 and 2. The combined estimates of odds ratios and the corresponding standard error estimates are virtually identical between meta-analysis and mega-analysis, and consequently the two sets of p-values are virtually identical. The only noticeable differences lie in SNPs 114, 166 and 176, which have observed MAFs of approximately 0.9%, 1.6% and 3.1%. For SNPs with low MAFs, the individual estimates of genetic effects may be unstable, which may cause the combined estimates to be different between meta-analysis and mega-analysis. Such differences are unlikely to alter the rankings of the top SNPs because the p-values associated with rare SNPs tend to be non-significant.

Figure 1. Analysis of stages 1 and 2 data from the FUSION study. The top left panel compares the individual estimates of odds ratios between stages 1 and 2; the top right panel compares the combined estimates of odds ratios between meta-analysis and mega-analysis; the bottom left panel compares the standard error estimates between the two methods; and the bottom right panel compares the − log₁₀(p-values) between the two methods. In each panel, the red line indicates where the values on the two axes are equal.

For further illustration, we included age and sex as covariates in the logistic regression model. When age and sex are allowed to have different effects between stages 1 and 2, meta-analysis and mega-analysis again produce virtually identical results; see Figure 2. When age and sex are assumed to have common effects between stages 1 and 2 in mega-analysis, the results of the two methods differ slightly more; see Figure 3.

Figure 2. Analysis of stages 1 and 2 data from the FUSION study adjusted for age and sex. The top left panel compares the individual estimates of odds ratios between stages 1 and 2; the top right panel compares the combined estimates of odds ratios between meta-analysis and mega-analysis; the bottom left panel compares the standard error estimates between the two methods; and the bottom right panel compares the − log₁₀(p-values) between the two methods. Both meta-analysis and mega-analysis allow age and sex effects to be different between stages 1 and 2. In each panel, the red line indicates where the values on the two axes are equal.

DISCUSSION

Publication bias is a major concern in meta-analysis of literature results. One may reduce or avoid this kind of bias by planning GWAS meta-analysis prospectively to take advantage of all available studies and all available SNPs. By using summary results rather than raw data, one can increase the number of available studies and thus enhance the power of the analysis and the generalizability of the findings.

In many applications, it is desirable to adjust for participant-level covariates, such as principal components and environmental exposures. Such data are not available in published reports. In a consortium setting, the covariate adjustments can be made within each study and the covariate-adjusted estimates of genetic effects can then be combined through meta-analysis. It is logistically much simpler to provide such adjusted estimates than to transfer raw data. Indeed, this is the strategy adopted by the Tobacco and Genetics Consortium and many other consortia. If the covariate effects are the same across studies, then the mega-analysis that incorporates that restriction tends to be more efficient than the traditional meta-analysis. However, the efficiency gain is generally minimal and the same efficiency gain can be achieved by using a multivariate version of meta-analysis (see the Appendix for details).

We have focused on binary traits. In a related paper, Olkin and Sampson (1998) showed that, for comparing treatments with respect to a continuous outcome in clinical trials, meta-analysis is equivalent to mega-analysis if the treatment effects and error variances are constant across trials. It follows from the arguments of the Appendix that all the conclusions of this article hold for quantitative traits and indeed for any traits under any study designs; the details are given in Lin and Zeng (2009).

By working with raw data, one can ensure that all studies use the same quality-control criteria and estimate the same quantities. However, such standardization and harmonization of information can be achieved by requiring all participating investigators to follow a common set of guidelines on quality control and statistical analysis so that the data are filtered and analyzed in the same way across studies before summary results are submitted.

Acknowledgments

The authors are grateful to Drs. Michael Boehnke and Heather Stringham and other FUSION investigators for providing the data used in this article. They are also grateful to Dr. Kuo-Ping Li for his programming assistance. This research was supported by the National Institutes of Health.

APPENDIX

TECHNICAL DETAILS

We adopt the notation of the Methods section. Let α̂_k and β̂_k be the maximum likelihood estimates (MLEs) of α_k and β based on the likelihood function of the kth study, and let α̃_k and β̃ be the MLEs of α_k and β based on the joint likelihood function. Note that β̃ is the mega-analysis estimate of β. Write θ_k = (α_k, β), θ̂_k = (α̂_k, β̂_k) and θ̃_k = (α̃_k, β̃). Also, define

I_{k} (θ_{k}) = \sum_{i = 1}^{n_{k}} υ_{ki} (θ_{k}) X_{ki}^{2} - \left\{\sum_{i = 1}^{n_{k}} υ_{ki} (θ_{k}) X_{ki}\right\}^{2} / \sum_{i = 1}^{n_{k}} υ_{ki} (θ_{k}),

where υ_ki(θ_k) = e^{α_k + βX_ki}/(1 + e^{α_k + βX_ki})². According to the MLE theory (Cox and Hinkley 1979), the variances of β̂_k and β̃ are estimated by $V_{k} = I_{k}^{- 1} ({\hat{θ}}_{k})$ and

Var (\tilde{β}) = \left\{\sum_{k = 1}^{K} I_{k} ({\tilde{θ}}_{k})\right\}^{- 1},

respectively. The inverse-variance meta-analysis estimate of β is

\hat{β} = \left\{\sum_{k = 1}^{K} I_{k} ({\hat{θ}}_{k})\right\}^{- 1} \sum_{k = 1}^{K} I_{k} ({\hat{θ}}_{k}) {\hat{β}}_{k},

(2)

and its variance is estimated by

Var (\hat{β}) = \left\{\sum_{k = 1}^{K} I_{k} ({\hat{θ}}_{k})\right\}^{- 1} .

Note that Var(β̂) takes the same form as Var(β̃): the only difference is that I_k is evaluated at θ̂_k in the former and at θ̃_k in the latter. Denote $n = \sum_{k = 1}^{K} n_{k}$. Under model (1) of the Methods section, α̂_k and α̃_k converge to α_k while β̂_k and β̃ converge to β (as sample sizes n_k increase), so that β̂ also converges to β while Var(n^{1/2}β̂) and Var(n^{1/2}β̃) converge to a common constant. Thus, n^{1/2}(β̂ − β) and n^{1/2}(β̃ − β) are asymptotically normal with mean 0 and with a common variance, which implies that meta-analysis and mega-analysis are asymptotically equivalent.
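The form of I_k can be read off from the information matrix of a single study. The following short derivation is a sketch in the notation above; the symbol J_k for the 2 × 2 information matrix of (α_k, β) is introduced here for illustration and does not appear in the original text, and the argument θ_k of υ_ki is suppressed.

```latex
J_{k} (θ_{k}) =
\begin{pmatrix}
\sum_{i} υ_{ki} & \sum_{i} υ_{ki} X_{ki} \\
\sum_{i} υ_{ki} X_{ki} & \sum_{i} υ_{ki} X_{ki}^{2}
\end{pmatrix},
\qquad
\left[ J_{k}^{- 1} (θ_{k}) \right]_{ββ}
= \frac{\sum_{i} υ_{ki}}{\sum_{i} υ_{ki} \sum_{i} υ_{ki} X_{ki}^{2} - \left( \sum_{i} υ_{ki} X_{ki} \right)^{2}}
= I_{k}^{- 1} (θ_{k}) .
```

Thus I_k is the information for β after the intercept α_k has been eliminated. In the joint likelihood the intercepts α_1, ..., α_K enter through separate studies, so eliminating all of them leaves the sum of the I_k over k as the information for β, which is the quantity inverted in Var(β̃).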

To accommodate covariates, we extend equation (1) of the Methods section as follows:

\Pr (Y_{ki} = 1) = \frac{e^{α_{k} + β X_{ki} + γ_{k}^{T} Z_{ki}}}{1 + e^{α_{k} + β X_{ki} + γ_{k}^{T} Z_{ki}}},

(3)

where Z_ki is the vector of covariates for the ith participant of the kth study, and γ_k is the corresponding vector of log odds ratios. By incorporating the unit component into Z_ki and the intercept α_k into γ_k, equation (3) can be written in a more compact form

\Pr (Y_{ki} = 1) = \frac{e^{β X_{ki} + γ_{k}^{T} Z_{ki}}}{1 + e^{β X_{ki} + γ_{k}^{T} Z_{ki}}} .

The likelihood functions given in the Methods section are modified to reflect the inclusion of covariates in the model. Write θ_k = (β, γ_k). Let θ̂_k and θ̃_k denote the MLEs of θ_k based on the likelihood function of the kth study and the joint likelihood function, respectively. Then all the results of the previous paragraph hold with the redefinition of

I_{k} (θ_{k}) = \sum_{i = 1}^{n_{k}} υ_{ki} (θ_{k}) X_{ki}^{2} - \left\{\sum_{i = 1}^{n_{k}} υ_{ki} (θ_{k}) X_{ki} Z_{ki}^{T}\right\} \left\{\sum_{i = 1}^{n_{k}} υ_{ki} (θ_{k}) Z_{ki} Z_{ki}^{T}\right\}^{- 1} \left\{\sum_{i = 1}^{n_{k}} υ_{ki} (θ_{k}) X_{ki} Z_{ki}\right\},

where $υ_{ki} (θ_{k}) = e^{β X_{ki} + γ_{k}^{T} Z_{ki}} / {(1 + e^{β X_{ki} + γ_{k}^{T} Z_{ki}})}^{2}$.
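The covariate-adjusted I_k is the information for β after the covariate effects have been eliminated, and it can be evaluated directly from one study's data. The sketch below assumes NumPy arrays and a hypothetical function name `adjusted_information`; the covariate matrix Z is taken to include the unit column, as in the compact form of equation (3), and the data in the check are arbitrary simulated values.

```python
import numpy as np


def adjusted_information(x, Z, beta, gamma):
    """Evaluate the covariate-adjusted I_k(theta_k) for one study.

    x     : genotype scores, shape (n,)
    Z     : covariates including a column of ones, shape (n, q)
    beta  : genetic effect (scalar)
    gamma : covariate effects including the intercept, shape (q,)
    """
    eta = beta * x + Z @ gamma
    v = np.exp(eta) / (1.0 + np.exp(eta)) ** 2   # weights v_ki(theta_k)
    s_xx = np.sum(v * x ** 2)                    # sum v X^2
    s_xz = Z.T @ (v * x)                         # sum v X Z
    s_zz = Z.T @ (Z * v[:, None])                # sum v Z Z^T
    return s_xx - s_xz @ np.linalg.solve(s_zz, s_xz)


# Check against the beta entry of the inverse of the full information matrix.
rng = np.random.default_rng(0)
x = rng.binomial(2, 0.3, size=200).astype(float)
Z = np.column_stack([np.ones(200), rng.normal(size=200)])
gamma = np.array([-0.5, 0.2])
info = adjusted_information(x, Z, 0.3, gamma)

design = np.column_stack([x, Z])
eta = 0.3 * x + Z @ gamma
v = np.exp(eta) / (1.0 + np.exp(eta)) ** 2
full = design.T @ (design * v[:, None])
print(1.0 / info, np.linalg.inv(full)[0, 0])
```

The two printed numbers should agree up to rounding, since the expression for I_k is the Schur complement of the covariate block in the full information matrix.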

If the effects of covariates are the same across studies, then equation (3) becomes

\Pr (Y_{ki} = 1) = \frac{e^{α_{k} + β X_{ki} + γ^{T} Z_{ki}}}{1 + e^{α_{k} + β X_{ki} + γ^{T} Z_{ki}}} .

(4)

By expanding X_ki to include Z_ki, equation (4) can be written as

\Pr (Y_{ki} = 1) = \frac{e^{α_{k} + β^{T} X_{ki}}}{1 + e^{α_{k} + β^{T} X_{ki}}},

in which the vector β represents both the genetic effect and the covariate effects. Redefine

I_{k} (θ_{k}) = \sum_{i = 1}^{n_{k}} υ_{ki} (θ_{k}) X_{ki} X_{ki}^{T} - \left\{\sum_{i = 1}^{n_{k}} υ_{ki} (θ_{k}) X_{ki}\right\} \left\{\sum_{i = 1}^{n_{k}} υ_{ki} (θ_{k}) X_{ki}^{T}\right\} / \sum_{i = 1}^{n_{k}} υ_{ki} (θ_{k}),

where υ_ki(θ_k) = e^{α_k + β^TX_ki}/(1 + e^{α_k + β^TX_ki})². By the arguments of the first paragraph, β̂ and β̃ are asymptotically normal with mean β and with a common covariance matrix. Thus, performing the multivariate version of meta-analysis on the vector of parameters β yields an estimate of the genetic effect that is asymptotically as efficient as the mega-analysis estimate when covariate effects are the same across studies.
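A minimal sketch of the multivariate version of meta-analysis follows. It assumes each study reports its estimated vector β̂_k together with the full estimated covariance matrix V_k (the inverse of the matrix-valued I_k); the function name `multivariate_meta` and the numbers in the example are hypothetical.

```python
import numpy as np


def multivariate_meta(estimates, covariances):
    """Matrix-weighted combination of per-study coefficient vectors.

    estimates   : list of length-p vectors (genetic and covariate effects)
    covariances : list of p x p covariance estimates V_k
    Returns the combined vector and its estimated covariance matrix.
    """
    precisions = [np.linalg.inv(np.asarray(V, dtype=float)) for V in covariances]
    total_precision = sum(precisions)                       # sum_k V_k^{-1}
    weighted = sum(P @ np.asarray(b, dtype=float)
                   for P, b in zip(precisions, estimates))  # sum_k V_k^{-1} b_k
    combined_cov = np.linalg.inv(total_precision)
    return combined_cov @ weighted, combined_cov


# Two hypothetical studies with identity covariance matrices.
beta_hat, cov_hat = multivariate_meta([[1.0, 2.0], [3.0, 4.0]],
                                      [np.eye(2), np.eye(2)])
print(beta_hat)
print(cov_hat)
```

With equal covariance matrices the combination reduces to a simple average of the two vectors. The need to collect a full covariance matrix from every study, rather than a single variance, is the extra burden of the multivariate approach.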

Because model (3) has K sets of covariate effects whereas model (4) only has one set, mega-analysis is generally more efficient under model (4) than under model (3). Thus, univariate meta-analysis, which is asymptotically equivalent to mega-analysis under model (3), is generally less efficient than mega-analysis under model (4). However, the efficiency loss is minimal in large samples. Although one can avoid the efficiency loss by performing multivariate meta-analysis, it is more difficult to obtain multivariate than univariate summary statistics.

All the above results assume that the genetic effects are the same across studies. This assumption does not affect the type I error of association testing since all genetic effects are zero under the null hypothesis of no association. Nevertheless, it is of practical importance to determine the relative power of meta-analysis versus mega-analysis when genetic effects are unequal. By taking the differences between the score functions of L_k(α_k, β) and $\prod_{k = 1}^{K} L_{k} (α_{k}, β)$ and applying the mean-value theorem, we can show that

\tilde{β} = \left\{\sum_{k = 1}^{K} I_{k} (θ_{k}^{*})\right\}^{- 1} \sum_{k = 1}^{K} I_{k} (θ_{k}^{*}) {\hat{β}}_{k},

where $θ_{k}^{*}$ lies between θ̂_k and θ̃_k. Thus, β̃ takes the same form as β̂ shown in equation (2), the difference being that I_k is evaluated at $θ_{k}^{*}$ in the former and at θ̂_k in the latter. As indicated before, the only difference between Var(β̃) and Var(β̂) is that I_k is evaluated at θ̃_k in the former and at θ̂_k in the latter. Note that I_k depends on θ_k through υ_ki(θ_k) only. It can be shown that υ_ki(θ_k) does not change its values drastically when θ_k varies between θ̂_k and θ̃_k in case-control studies with modest genetic effects. Thus, β̂ and β̃ are approximately the same, and so are Var(β̂) and Var(β̃). Consequently, the power of meta-analysis is similar to that of mega-analysis even when genetic effects are unequal across studies.

References

Cox DR, Hinkley DV. Theoretical Statistics. Chapman and Hall; 1979.
Kavvoura FK, Ioannidis JPA. Methods for meta-analysis in genetic association studies: a review of their potential and pitfalls. Human Genetics. 2008;123:1–14. doi: 10.1007/s00439-007-0445-9.
Lin DY, Zeng D. On the relative efficiency of using summary statistics versus individual level data in meta-analysis. Unpublished technical report. 2009. doi: 10.1093/biomet/asq006.
Olkin I, Sampson A. Comparison of meta-analysis versus analysis of variance of individual patient data. Biometrics. 1998;54:317–322.
Price AL, Patterson NJ, Plenge RM, Weinblatt ME, Shadick NA, Reich D. Principal components analysis corrects for stratification in genome-wide association studies. Nature Genetics. 2006;38:904–909. doi: 10.1038/ng1847.
Saxena R, Voight BF, Lyssenko V, Burtt NP, de Bakker PIW, Chen H, Roix JJ, Kathiresan S, Hirschhorn JN, Daly MJ, et al. Genome-wide association analysis identifies loci for type 2 diabetes and triglyceride levels. Science. 2007;316:1331–1336. doi: 10.1126/science.1142358.
Scott LJ, Mohlke KL, Bonnycastle LL, Willer CJ, Li Y, Duren WL, Erdos MR, Stringham HM, Chines PS, Jackson AU, et al. A genome-wide association study of type 2 diabetes in Finns detects multiple susceptibility variants. Science. 2007;316:1341–1345. doi: 10.1126/science.1142382.
The Psychiatric GWAS Consortium Steering Committee. A framework for interpreting genome-wide association studies of psychiatric disorders. Molecular Psychiatry. 2008;14:10–17. doi: 10.1038/mp.2008.126.
Zeggini E, Weedon MN, Lindgren CM, Frayling TM, Elliott KS, Lango H, Timpson NJ, Perry JRB, Rayner NW, Freathy RM, et al. Replication of genome-wide association signals in UK samples reveals risk loci for type 2 diabetes. Science. 2007;316:1336–1341. doi: 10.1126/science.1142364.
Zeggini E, Scott LJ, Saxena R, Voight BF, Marchini JL, Hu T, de Bakker PIW, Abecasis GR, Almgren P, Andersen G, et al. Meta-analysis of genome-wide association data and large-scale replication identifies additional susceptibility loci for type 2 diabetes. Nature Genetics. 2008;40:638–645. doi: 10.1038/ng.120.
