Sunday, April 3, 2011

From Andrew Gelman's blog

We talk a lot about inference from Markov chain simulation (Gibbs sampler and Metropolis algorithm), convergence, etc. But inference from simple random samples can be nonintuitive as well.
Consider this example of inference from direct simulation. The left column shows five replications of 95% intervals for a hypothetical parameter theta that has a unit normal distribution, each based on 100 independent simulation draws. Right column: five replications of the same inference, each based on 1000 draws. For both columns, the correct answer is [-1.96, 1.96].
Inferences based on      Inferences based on
100 random draws         1000 random draws
[-1.79, 1.69]            [-1.83, 1.97]
[-1.80, 1.85]            [-2.01, 2.04]
[-1.64, 2.15]            [-2.10, 2.13]
[-2.08, 2.38]            [-1.97, 1.95]
[-1.68, 2.10]            [-2.10, 1.97]
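A table like the one above is easy to reproduce. The sketch below (not Gelman's actual code; the seed and layout are my own choices) draws 100 or 1000 standard-normal simulations and reports the empirical central 95% interval five times for each sample size:

```python
import numpy as np

rng = np.random.default_rng(0)

def interval_95(n_draws, rng):
    """Empirical central 95% interval from n_draws standard-normal draws."""
    draws = rng.standard_normal(n_draws)
    lo, hi = np.percentile(draws, [2.5, 97.5])
    return lo, hi

for n in (100, 1000):
    print(f"n = {n}:")
    for _ in range(5):
        lo, hi = interval_95(n, rng)
        print(f"  [{lo:.2f}, {hi:.2f}]")
```

Each run gives a different set of intervals, which is exactly the point: the interval endpoints are themselves noisy estimates, with noise shrinking only as the square root of the number of draws.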
From one perspective, these estimates are pretty bad: even with 1000 simulations, either bound can easily be off by more than 0.1, and the entire interval width can easily be off by 10%. On the other hand, for the goal of inference about theta, even the far-off estimates above aren't so bad: the interval [-2.08, 2.38] has 97% probability coverage, and [-1.79, 1.69] has 92% coverage.
So, in this sense, my intuitions were wrong and wrong again: first, I thought 100 or 1000 independent draws were pretty good, so I was surprised that these 95% intervals were so bad. But, then, I realized that my corrected intuition was itself flawed: actually, nominal 95% intervals of [-2.08, 2.38] or [-1.79, 1.69] aren't so bad at all.
