Multiple Comparisons Distortions of Parameter Estimates


Neal Jeffries

Abstract: In experiments involving many variables, investigators typically use multiple comparisons procedures to identify differences that are unlikely to be the result of chance. However, investigators rarely consider how the magnitude of the greatest observed effect sizes may have been biased by multiple testing. These questions of bias become important to the extent that investigators focus on the magnitude of the observed effects. For example, such bias can cause problems in attempting to validate results if a biased effect size is used to power a follow-up study. Further, such bias may give rise to conflicting findings when two independent samples are compared; e.g., the variables with the strongest effects in one study may predictably appear much weaker in a second study. An important associated consequence is that confidence intervals constructed using standard distributions may be badly biased. A bootstrap approach is used to estimate and correct the bias in the effect sizes of those variables showing the strongest differences. This bias is not always present; some principles showing which factors may lead to greater bias are given, and a proof of the convergence of the bootstrap distribution is provided.

Key words: Effect size, bootstrap, multiple comparisons
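
The selection bias described in the abstract can be illustrated with a small simulation. The sketch below is not the paper's R code and does not implement its nested percentile bootstrap; it is a simplified bootstrap bias estimate written in Python, and the function names, sample sizes, and number of resamples are all assumptions chosen for illustration. Every variable has a true mean of zero, so any positive value of the largest observed mean is due to selection alone.

```python
import random


def column_means(rows):
    """Mean of each variable (column) across subjects (rows)."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def bootstrap_bias_of_max(rows, n_boot=200, rng=None):
    """Estimate the upward bias of the largest observed mean.

    For each bootstrap resample of subjects, find the variable with the
    largest resampled mean and record how far that mean lies above the
    same variable's mean in the original sample. The average of these
    differences is the bias estimate (an illustrative simplification).
    """
    if rng is None:
        rng = random.Random(0)
    n = len(rows)
    observed = column_means(rows)
    diffs = []
    for _ in range(n_boot):
        # Resample whole subjects so correlations between variables are kept.
        sample = [rows[rng.randrange(n)] for _ in range(n)]
        boot = column_means(sample)
        j = max(range(len(boot)), key=boot.__getitem__)
        diffs.append(boot[j] - observed[j])
    return sum(diffs) / len(diffs)


def simulate(n_subjects=30, n_vars=50, seed=1):
    """Return (largest observed mean, estimated bias, corrected estimate)."""
    rng = random.Random(seed)
    # All true effects are zero (hypothetical null scenario).
    rows = [[rng.gauss(0.0, 1.0) for _ in range(n_vars)]
            for _ in range(n_subjects)]
    largest = max(column_means(rows))
    bias = bootstrap_bias_of_max(rows, rng=rng)
    return largest, bias, largest - bias


if __name__ == "__main__":
    largest, bias, corrected = simulate()
    print("largest observed mean:", round(largest, 3))
    print("estimated bias:       ", round(bias, 3))
    print("corrected estimate:   ", round(corrected, 3))
```

In this null scenario the largest observed mean is well above the true value of zero, and subtracting the bootstrap bias estimate pulls it back toward zero. How well such a correction works in general depends on the factors the paper discusses, such as sample size, the pattern of true effect sizes, and the number of tests.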

Here is a PDF version of this paper. A longer version of the paper is also available.

Here is an expanded discussion of how the nested percentile bootstrap distribution was estimated. Also discussed are the basic bootstrap and bias corrected procedures. Expanded versions of the paper's Tables 1, 2, and 3 are provided with more discussion of the asymmetric coverage of the confidence intervals.

Here are two appendices. The first gives asymptotic justification for applying the bootstrap in these situations, and the second provides analysis showing which factors (e.g. sample size, pattern of true effect size, number of tests) enhance bias.

The simulations were performed using the R programming language, available from http://www.r-project.org. Here are program statements that generate the confidence intervals discussed in the paper.

Questions or comments may be directed to neal.jeffries@nih.gov.