In article <4755969a$0$25349$ed362ca5 from nr2.newsreader.com>, Glen M.
Sizemore <gmsizemore2 from yahoo.com> writes
>(trim)
>Hi John. I'm not sure I can help with the sort of subtleties you might have in
>mind. Remember, behavior analysts eschew the use of statistics but, of
>course, I published in journals that require inferential statistics so I
>have used them - usually repeated measures ANOVA. I know that you were
>probably just making conversation, but far be it from me to pass up a chance
>to bad mouth inferential statistics. One of the weird things about p-values
>is that if you reject the null-hypothesis, the p-value becomes, in a sense,
>meaningless! This is because a p-value gives the probability of obtaining
>differences between two groups that are equal to or more extreme than what
>you obtained GIVEN THAT THE NULL-HYPOTHESIS IS TRUE! But if you reject the
>null-hypothesis on the basis of the p-value, what is the quantitative
>meaning of the p-value? Many people think that the p-value "gives the
>likelihood that the data you obtained are due to chance," but that is the
>equivalent of saying that it is the probability that the null-hypothesis is
>true given the data. But, as I have just described, that is not what a
>p-value gives - it gives the probability of obtaining the data given that
>the null hypothesis is true! This does not mean that a small p-value should
>not cause you to reject the null-hypothesis, but as I said in a previous
>post, rejecting the null hypothesis becomes increasingly likely as a
>function of sample size! I am increasingly becoming enamored with Bayesian
>statistics thanks to the unfortunately-absent Michael Olea. That guy is
>really smart (but I think praise makes him somewhat uncomfortable). Bayesian
>statistics do, in fact, give you the p that your hypothesis is true given
>the data. Another thing that people think is that, if the p-value is really
>small, repeating the experiment is likely to reproduce the results of the
>first, but this is not true, as far as I can see. The only way to show that
>a finding is reliable is to replicate it. But a lot of journals discourage
>the submission of experiments that solely function as replications!
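[An aside on the sample-size point in the quoted text. This is my own sketch, not anything from the original post: for a two-sided one-sample z-test at the usual 5% level, the power to reject a fixed small true effect climbs towards 1 as the sample size grows, so with enough data rejection is nearly guaranteed.]

```python
# Sketch (my own illustration): power of a two-sided one-sample z-test
# against a fixed small true effect grows with sample size n.
from statistics import NormalDist

nd = NormalDist()           # standard normal
delta = 0.1                 # assumed true effect, in units of the (known) sd
z_crit = nd.inv_cdf(0.975)  # two-sided 5% critical value, about 1.96

def power(n):
    """P(reject H0) when the true mean is delta and the sd is 1."""
    shift = delta * n ** 0.5
    return nd.cdf(shift - z_crit) + nd.cdf(-shift - z_crit)

for n in (25, 100, 1000, 10000):
    print(n, round(power(n), 3))
```

With delta = 0.1 the rejection probability runs from under 10% at n = 25 to essentially 100% at n = 10000, even though the effect itself never changes.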

Confidence intervals are only a little further into the statistics books
than p-values, and they are a good deal more illuminating. If the
p-value would reject the null hypothesis, the confidence interval gives
you a measure of how far from the null hypothesis the true value can
plausibly be. If the p-value would not reject the null hypothesis, you
get a measure of how big an effect might be hiding under the noise.
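The two readings can be shown with a toy calculation (the numbers here are made up for illustration): a 95% interval for a mean with known sd, once excluding zero and once straddling it.

```python
# Sketch with made-up numbers: a 95% confidence interval for a mean with
# known sd, read the two ways described above.
from statistics import NormalDist

z95 = NormalDist().inv_cdf(0.975)   # about 1.96

def ci(mean, sd, n, z=z95):
    half = z * sd / n ** 0.5
    return mean - half, mean + half

# Interval excludes 0: we reject the null, and the interval says how far
# from 0 the true mean can plausibly be.
print(ci(0.8, 1.0, 50))
# Interval straddles 0: no rejection, but the width bounds how big an
# effect could be hiding under the noise.
print(ci(0.05, 1.0, 50))
```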

One useful application of confidence intervals is to run an experiment
with the intention of dismissing some proposed effect - you can't prove
a negative, but you could come up with a small confidence interval
around zero and say that any possible effect must be negligible. This
would be a reason for doing experiments on folk-wisdom preventative
measures for eyesight even if you didn't believe them; you could advance
the state of knowledge by running a statistically rigorous experiment to
dismiss them once and for all. A related (not exactly identical)
application is testing for bio-equivalence, where a drug manufacturer
has the goal of showing that there is no practical difference between
two drugs from different production processes.
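The usual recipe here is the two one-sided tests (TOST) idea, which amounts to checking that a 90% confidence interval for the difference sits entirely inside a pre-chosen equivalence margin. A sketch, with invented numbers:

```python
# Sketch of the two-one-sided-tests (TOST) recipe: declare equivalence
# when a 90% CI for the difference lies entirely inside a pre-chosen
# margin. The numbers below are invented for illustration.
from statistics import NormalDist

z90 = NormalDist().inv_cdf(0.95)    # about 1.645, for a 90% two-sided CI

def equivalent(diff, se, margin):
    """True if the 90% CI for diff lies strictly inside (-margin, margin)."""
    lo, hi = diff - z90 * se, diff + z90 * se
    return -margin < lo and hi < margin

print(equivalent(0.02, 0.03, 0.10))  # tight CI: effect provably negligible
print(equivalent(0.02, 0.08, 0.10))  # wide CI: too noisy to tell
```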

Note that some of the people bad-mouthing p-values (e.g. various
psychologists) are suggesting confidence intervals as at least one
possible replacement.

If you assume enough you can get some sort of link between p-values and
reproducibility, but the figures aren't very encouraging, largely
because a lot of statistics is done with the minimum possible sample
size, if not smaller. Suppose that you have a two-tailed p-value of
0.001 for a simple test of a normally distributed value with known unit
variance. This means that the observed deviation was about 3.29 sigma.
The difference between two such variables has standard deviation
sqrt(2), so with probability 90% the replicate is no more than
1.28 * sqrt(2) = 1.81 sigma less encouraging than the original. Thus
with probability 90% the replicate comes up with 3.29 - 1.81 = 1.48
sigma or better, but 1.48 sigma is a 2-tailed value of 13.9% or so -
n.s. If the original p-value was 1.0E-6 then the observed deviation
increases to 4.89 sigma, so with probability 90% or more the replicate
gives 3.08 sigma or better, which is a 2-tailed significance of
p = 0.002.
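The arithmetic above can be checked mechanically, under the same assumptions (known unit variance, and a 90% band on how much worse the replicate can be):

```python
# Checks the arithmetic above: known unit variance, 90% probability band
# on how much less encouraging the replicate can be than the original.
from statistics import NormalDist

nd = NormalDist()

def replicate_90pct_worst_case(p_two_tailed):
    z = nd.inv_cdf(1 - p_two_tailed / 2)  # observed deviation in sigma
    slack = nd.inv_cdf(0.90) * 2 ** 0.5   # 1.28 * sqrt(2), about 1.81
    z_rep = z - slack                     # 90% lower bound for the replicate
    p_rep = 2 * (1 - nd.cdf(z_rep))       # its two-tailed p-value
    return z, z_rep, p_rep

print(replicate_90pct_worst_case(0.001))   # about (3.29, 1.48, 0.139)
print(replicate_90pct_worst_case(1.0e-6))  # about (4.89, 3.08, 0.002)
```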
--
A.G.McDowell