**Multiple Comparisons Distortions of Parameter Estimates**

Neal Jeffries

**Abstract:** In experiments involving many variables, investigators typically use
multiple comparisons procedures to identify differences that are unlikely to be
the result of chance. However, investigators rarely consider how the magnitudes of
the largest observed effect sizes may have been biased by multiple testing. This
bias matters to the extent that investigators focus on the magnitude of the observed
effects. For example, it can cause problems when validating results if a biased
effect size is used to power a follow-up study. It may also produce apparently
conflicting findings when two independent samples are compared: the variables with
the strongest effects in one study may predictably appear much weaker in a second
study. An associated consequence is that confidence intervals constructed using
standard distributions may be badly biased. A bootstrap approach is used to
estimate and correct the bias in the effect sizes of the variables showing the
strongest differences. This bias is not always present; some principles describing
which factors lead to greater bias are given, and a proof of the convergence of the
bootstrap distribution is provided.
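The abstract does not spell out the bootstrap procedure, so the Python sketch below shows one common way to estimate this selection bias. It is an illustration under stated assumptions, not the paper's exact method (the paper's own programs are in R). It assumes two groups of subjects measured on the same m variables, uses the raw difference in group means as the effect size, and treats the original-sample estimates as the "truth" in the bootstrap world: each resample picks its own top variable, and the average amount by which that variable overshoots its original-sample estimate serves as the bias estimate. The function names are hypothetical.

```python
import random
import statistics


def effect_sizes(group_a, group_b):
    """Hypothetical helper: difference in means for each variable (column).

    group_a, group_b: lists of subjects; each subject is a list of m values.
    """
    m = len(group_a[0])
    return [
        statistics.fmean(row[j] for row in group_a)
        - statistics.fmean(row[j] for row in group_b)
        for j in range(m)
    ]


def argmax(values):
    """Index of the largest value."""
    return max(range(len(values)), key=values.__getitem__)


def bootstrap_top_effect(group_a, group_b, n_boot=500, seed=1):
    """Hypothetical sketch of a bootstrap bias correction for the top effect.

    In each resample, the variable with the largest resampled effect is
    selected and compared with that same variable's original-sample effect
    (assumed to play the role of the true effect). The mean overshoot is
    the bias estimate, which is subtracted from the observed top effect.
    """
    rng = random.Random(seed)
    observed = effect_sizes(group_a, group_b)
    top = argmax(observed)
    overshoots = []
    for _ in range(n_boot):
        # Resample subjects with replacement, separately within each group.
        boot_a = rng.choices(group_a, k=len(group_a))
        boot_b = rng.choices(group_b, k=len(group_b))
        boot = effect_sizes(boot_a, boot_b)
        j = argmax(boot)
        overshoots.append(boot[j] - observed[j])
    bias = statistics.fmean(overshoots)
    return top, observed[top], bias, observed[top] - bias


# Usage on simulated null data: 100 variables, 20 subjects per group,
# every true effect equal to zero.
rng = random.Random(0)
m, n = 100, 20
group_a = [[rng.gauss(0, 1) for _ in range(m)] for _ in range(n)]
group_b = [[rng.gauss(0, 1) for _ in range(m)] for _ in range(n)]
top, obs, bias, corrected = bootstrap_top_effect(group_a, group_b, n_boot=200)
print(top, round(obs, 3), round(bias, 3), round(corrected, 3))
```

Because every true effect is zero in this simulated data, the observed top effect reflects selection alone; the estimated bias is positive and the corrected value is pulled downward. Standardized effect sizes could replace raw mean differences without changing the structure of the sketch.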

**Key words: Effect size, bootstrap, multiple comparisons**

Here is a PDF version of this paper. A longer version of the paper is also available.

Here is an expanded discussion of how the nested percentile bootstrap distribution was estimated, along with the basic bootstrap and bias-corrected procedures. Expanded versions of the paper's Tables 1, 2, and 3 are provided, with further discussion of the asymmetric coverage of the confidence intervals.
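As a rough illustration of why standard intervals can fail after selection, the hypothetical Python simulation below (not the nested percentile bootstrap) checks the textbook interval obs ± 1.96·se for the top-ranked variable when all m true effects are zero. It assumes independent, unit-variance variables with n subjects per group, so each observed mean difference is drawn directly as Normal(0, sqrt(2/n)). It also records on which side of the true value the misses fall.

```python
import math
import random


def naive_interval_coverage(m, n, reps=2000, z=1.96, seed=0):
    """Hypothetical check of the textbook interval for the top-ranked variable.

    All m true effects are zero, so the target is the selected variable's
    true effect, 0. Returns (coverage, fraction missed because the interval
    lies entirely above 0, fraction missed because it lies entirely below 0).
    """
    rng = random.Random(seed)
    se = math.sqrt(2.0 / n)  # SE of a difference of two means, unit variance
    covered = too_high = too_low = 0
    for _ in range(reps):
        # Observed effect of the variable selected as the largest.
        top = max(rng.gauss(0.0, se) for _ in range(m))
        lower, upper = top - z * se, top + z * se
        if lower > 0.0:
            too_high += 1
        elif upper < 0.0:
            too_low += 1
        else:
            covered += 1
    return covered / reps, too_high / reps, too_low / reps


for m in (1, 10, 100):
    print(m, naive_interval_coverage(m, n=20))
```

With m = 1 the coverage should be close to the nominal 95%. As m grows, the interval for the selected variable usually lies entirely above the true value, so nearly all misses fall on one side; under these assumptions the coverage for m = 100 is about (0.975)^100, roughly 8%.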

Here are two appendices. The first gives an asymptotic justification for applying
the bootstrap in these situations, and the second analyzes which factors (e.g.,
sample size, pattern of true effect sizes, number of tests) increase the bias.
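The factors named in the second appendix can be explored with a small Monte Carlo experiment. The Python sketch below is a hypothetical illustration, not the appendix's analysis: it assumes independent, unit-variance variables with n subjects per group, so each observed mean difference is drawn as Normal(true effect, sqrt(2/n)), and it estimates the average gap between the top-ranked variable's observed effect and its own true effect.

```python
import math
import random
import statistics


def selection_bias(true_effects, n, reps=1000, seed=0):
    """Hypothetical Monte Carlo estimate of E[observed - true] for the top variable.

    true_effects: list of true mean differences (unit variance assumed).
    n: subjects per group; each observed effect ~ Normal(true, sqrt(2/n)).
    """
    rng = random.Random(seed)
    se = math.sqrt(2.0 / n)
    errors = []
    for _ in range(reps):
        observed = [d + rng.gauss(0.0, se) for d in true_effects]
        j = max(range(len(observed)), key=observed.__getitem__)
        errors.append(observed[j] - true_effects[j])
    return statistics.fmean(errors)


# Number of tests: more null variables give the top effect more room to overshoot.
bias_m10 = selection_bias([0.0] * 10, n=20)
bias_m1000 = selection_bias([0.0] * 1000, n=20, reps=300)
# Sample size: larger n shrinks the noise and hence the bias.
bias_n10 = selection_bias([0.0] * 100, n=10)
bias_n100 = selection_bias([0.0] * 100, n=100)
# Pattern of true effects: one dominant effect is almost always the one
# selected, so its estimate is nearly unbiased.
bias_dominant = selection_bias([5.0] + [0.0] * 99, n=20)
bias_all_null = selection_bias([0.0] * 100, n=20)

results = {
    "m=10, n=20": bias_m10,
    "m=1000, n=20": bias_m1000,
    "m=100, n=10": bias_n10,
    "m=100, n=100": bias_n100,
    "one dominant effect": bias_dominant,
    "all null, m=100": bias_all_null,
}
for label, value in results.items():
    print(f"{label}: {value:.3f}")
```

Under these assumptions, the bias grows with the number of tests, shrinks with sample size, and nearly vanishes when one true effect clearly dominates the rest, which is one way the pattern of true effect sizes can matter.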

The simulations were performed using the R programming language, available from http://www.r-project.org. Here are the program statements that generate the confidence intervals discussed in the paper.

Questions or comments may be directed to neal.jeffries@nih.gov.