BOOTSTRAP CONFIDENCE INTERVALS FOR
THE BINOMIAL PARAMETER:
HOW GOOD IS THEIR COVERAGE WHEN THE
SAMPLE SIZE IS A POISSON RANDOM
VARIABLE
by
Tressa L. Fowler
B.A., University of Colorado at Boulder, 1993
A thesis submitted to the
University of Colorado at Denver
in partial fulfillment
of the requirements for the degree of
Master of Science
Applied Mathematics
This thesis for the Master of Science
degree by
Tressa L. Fowler
has been approved
by
Karen Kafadar
Date
Kent Goodrich
Fowler, Tressa L. (M.S., Applied Mathematics)
Bootstrap Confidence Intervals For The Binomial Parameter: How Well Do
They Approximate The True Interval When The Sample Size Is a Poisson
Random Variable
Thesis directed by Professor Karen Kafadar
ABSTRACT
Certain measures of forecast quality (e.g., probability of detection) are essentially binomial probabilities. However, lack of systematic observations for forecasts
may cause forecast/observation data to violate the assumptions of the binomial
model. In these cases, not only are the numbers of successes random, so are the
numbers of observations.
Interval estimates of measures of forecast quality are more useful than point
estimates for comparing different forecasts. Traditional interval estimates based
on the binomial distribution may underestimate the true variability in the forecast/observation data, yielding a narrower interval than appropriate. This additional variability can be addressed through the use of conditional models, propagation of error formulas, and computer resampling methods. Interval estimates based on these methods are computed and compared for simulated data where the conditional distribution of the number of successes X given the sample size N is binomial and the sample size N is distributed as Poisson. Simulated data include both large
and small samples.
Additionally, counts of observations may not fit the Poisson model well. The
parameter of the Poisson distribution may vary with the weather conditions, seasons,
availability of observers, etc. This may cause counts of observations of weather
hazards to appear to be overdispersed. Using the same methods, interval estimates
are constructed using a second set of simulated data similar to the first, but with
overdispersed Poisson counts.
Single simulations of the intervals for each set of data are compared to each
other. Additionally, the correlation between X and N is estimated and the effect of
this correlation on the intervals is discussed. Finally, the nominal coverage of each
method is estimated via multiple simulations.
This abstract accurately represents the content of the candidate's thesis. I recommend its publication.
Signed
Karen Kafadar
DEDICATION
To Aubrey, who enhances all aspects of my life.
ACKNOWLEDGEMENT
My gratitude is extended to everyone who has touched my life these last 5 years,
but especially to the following people:
KK for her wisdom and encouragement, Ann Landers story, and for sharing
her enthusiasm for statistics and ballet.
Barb for introducing me to the joys of statistics, keeping me employed while
I completed my degree, teaching me how to write a scientific paper, and
suggesting my thesis topic.
Kent for being a tremendous friend and mentor and getting me through Advanced Calc for no more than a smile and a word of thanks.
Doug for being on my committee.
Randy for far too many things to enumerate here, but who has probably added
them all to my tab.
Mom The number one cheerleader for my education.
CONTENTS
Figures .................................................................ix
Chapter
1. Introduction ........................................................1
1.1 Overview .............................................................1
1.2 Background............................................................1
1.3 Motivation and Goals .................................................3
2. The Confidence Intervals.............................................7
2.1 Notation.............................................................7
2.2 Asymptotically Normal Interval.................................. . 8
2.3 Exact ...............................................................8
2.4 Bootstrap............................................................9
2.5 EVE's Rule..........................................................10
2.6 Propagation of Errors Formulas.......................................11
3. Data.................................................................13
3.1 Large Sample Binomial with Poisson Counts ...........................13
3.2 Large Sample Binomial with Overdispersed Poisson Counts..............13
3.3 Small Sample Binomial with Poisson Counts..............................14
4. Correlation.............................................................15
4.1 Correlation between X and N.............................................15
4.2 Correlation between X/N and N.......................................16
5. Intervals Computed For A Single Realization
from the Large Sample Poisson Data Set....................................22
6. Intervals Computed For A Single Realization from
the Overdispersed Large Sample Poisson Data Set...........................32
7. Intervals Computed For A Single Realization
from the Small Poisson Data Set...........................................40
8. Nominal Coverages Estimated Via Simulation................................49
9. Conclusions...............................................................54
References....................................................................56
FIGURES
Figure
1.1 Map of the Continental United States, showing the locations of all
PIREPs received in the three-month period January to March 2001. . 5
1.2 Map of the continental United States showing locations of all PIREPs
received February 15, 2001............................................6
4.1 Correlation between number of trials and number of successes for
large sample Poisson data............................................18
4.2 Correlation between number of trials and number of successes for
large sample overdispersed Poisson data..............................19
4.3 Correlation between number of trials and number of successes for
small sample Poisson data............................................20
4.4 Boxplots of the simulated 2.5 and 97.5 percentiles of the distributions
of normal, adjusted normal, and conditionally binomial data..........21
5.1 Realization of confidence intervals including PENC for a single simulation
of large sample Poisson data with p = 0.5........................24
5.2 Realization of confidence intervals for a single simulation of large
sample Poisson data with p = 0.5....................................25
5.3 Realization of confidence intervals for a single simulation of large
sample Poisson data with p = 0.6.....................................26
5.4 Realization of confidence intervals for a single simulation of large
sample Poisson data with p = 0.7.....................................27
5.5 Realization of confidence intervals for a single simulation of large
sample Poisson data with p = 0.8.....................................28
5.6 Realization of confidence intervals for a single simulation of large
sample Poisson data with p = 0.9.....................................29
5.7 Realization of confidence intervals for a single simulation of large
sample Poisson data with p = 0.95....................................30
5.8 Realization of confidence intervals for a single simulation of large
sample Poisson data with p = 0.99....................................31
6.1 Realization of confidence intervals for a single simulation of overdispersed
large sample Poisson data with p = 0.5...........................33
6.2 Realization of confidence intervals for a single simulation of overdispersed
large sample Poisson data with p = 0.6...........................34
6.3 Realization of confidence intervals for a single simulation of overdispersed
large sample Poisson data with p = 0.7...........................35
6.4 Realization of confidence intervals for a single simulation of overdispersed
large sample Poisson data with p = 0.8...........................36
6.5 Realization of confidence intervals for a single simulation of overdispersed
large sample Poisson data with p = 0.9...........................37
6.6 Realization of confidence intervals for a single simulation of overdispersed
large sample Poisson data with p = 0.95..........................38
6.7 Realization of confidence intervals for a single simulation of overdispersed
large sample Poisson data with p = 0.99..............................39
7.1 Realization of confidence intervals for a single simulation of small
sample Poisson data with p = 0.5..................................42
7.2 Realization of confidence intervals for a single simulation of small
sample Poisson data with p = 0.6..................................43
7.3 Realization of confidence intervals for a single simulation of small
sample Poisson data with p = 0.7..................................44
7.4 Realization of confidence intervals for a single simulation of small
sample Poisson data with p = 0.8..................................45
7.5 Realization of confidence intervals for a single simulation of small
sample Poisson data with p = 0.9..................................46
7.6 Realization of confidence intervals for a single simulation of small
sample Poisson data with p = 0.95.................................47
7.7 Realization of confidence intervals for a single simulation of small
sample Poisson data with p = 0.99..................................48
8.1 Nominal coverage of confidence intervals for large data set by level of
p. (N=Normal, X=Exact, E=EVE's Rule, B=Bootstrap, P=Propagation
of Errors)........................................................51
8.2 Nominal coverage of confidence intervals for overdispersed data set
by level of p. (N=Normal, X=Exact, E=EVE's Rule, B=Bootstrap,
P=Propagation of Errors)...........................................52
8.3 Nominal coverage of confidence intervals for small data set by level of
p. (N=Normal, X=Exact, E=EVE's Rule, P=Propagation of Errors) 53
1. Introduction
1.1 Overview
This introduction describes the problem that motivated this research. Chapter 2 presents possible approaches, describing several different methods of computing confidence intervals. The three different simulated data sets used to assess the confidence intervals are described in Chapter 3. Chapters 5 through 7 show the confidence intervals calculated on a single simulation for each of the three data sets. Nominal coverage estimated from simulation is discussed in Chapter 8. Conclusions are presented in Chapter 9.
1.2 Background
Observations of weather phenomena are inherently problematic. The observational network in the continental U.S. probably has better spatial coverage than anywhere else in the world. However, weather forecasts for aviation are three dimensional, and therefore suffer from an even greater lack of systematic observations.
The only observation platforms are aircraft. The actual observation may be taken
by a person on the airplane, usually the pilot, or by an instrument attached to the
plane. These platforms are clearly nonstationary. Additionally, they are not in the
same place at the same time each day, making replicated observations infeasible.
Over many locations no aircraft fly, and hence there are no observations, while other locations have hundreds. Such observations are certainly not independent, since a pilot report of some hazard such as turbulence in one area may both dissuade further aircraft from flying there and influence another pilot's report (PIREP). Moreover, the
reports themselves are subjective and depend on the person responsible for making
the observation. Finally, no one can control the number of observations taken at
any time; i.e., total number of observations is random.
Most aircraft are not instrumented to detect weather hazards. General Aviation (GA) aircraft account for nearly two-thirds of all PIREPs [Kane et al., 1998].
On the commercial aircraft that are instrumented, frequently the observations are
not recorded. They are available to the pilot in real time, but not stored. Research
aircraft are appropriately instrumented and the data are saved. However, there are
only a few research aircraft in the U.S., so these data are high quality but very rare
[Bernstein et al., 2000].
The number of observations is certainly related to patterns of air traffic. Figure 1.1 shows (in 2 dimensions) the locations of all Pilot Reports (PIREPs) for a 3-month period. The map shows clearly that most of the reports occur near airports or on common air traffic routes, and that many areas have no observations over an entire winter season. Additionally, the map fails to show the vertical distribution of the
PIREPs. Even those locations that appear to have numerous observations in 2
dimensions may have several altitude ranges that lack reports.
While forecast verification frequently focuses on summaries for entire seasons,
verification is also done for shorter time periods such as a single day or week, sometimes even a single hour [Mahoney et al., 1997]. Figure 1.2 shows (in 2 dimensions)
the distribution of PIREPs for a single day, February 15, 2001. The distribution
of the PIREPs is clearly not uniform across the U.S. Several areas have only a few
reports.
Most reports come from GA. Hence, when the weather is particularly bad,
the smart GA pilot does not fly. As a result, the number of available reports near
the major events is likely to be smaller than in locations with no major weather
events.
PIREP data are very problematic, but they are nonetheless the best data
available, and therefore must be used. However, no simple statistical model accounts
for all of the spatial and temporal biases, lack of independence, and random number
of observations. With so many problems to tackle at once, the work presented here uses simulated data that imitate the effect of incorporating only one of the negative characteristics of PIREPs, namely the random number of observations.
1.3 Motivation and Goals
For the purposes of verification, forecast/observation pairs are accumulated
in a 2 x 2 contingency table. Along with other measures, conditional probabilities
such as the probability of detection are computed from this table. These conditional
probabilities are really just estimates of a binomial proportion. Unfortunately, as
discussed in the previous section, the forecast/observation data violate the assumptions of the binomial model. In this case, the confidence intervals calculated via the
usual methods may not provide coverage at the required level.
The intended outcome of this research is to find an approximately valid confidence interval for forecast verification statistics. The two most desirable qualities for
this interval to possess are ease of calculation and correct coverage near the chosen
probability level (validity).
Figure 1.1: Map of the Continental United States, showing the locations of all
PIREPs received in the three-month period January to March 2001.
Figure 1.2: Map of the continental United States showing locations of all PIREPs
received February 15, 2001.
2. The Confidence Intervals
2.1 Notation
The following notation is adopted for the remainder of this document.
α : The probability that the null hypothesis is rejected when it is true.
X : The total number of successes from a binomial random variable.
x : A single observation of X.
N : The sample size of a binomial random variable. In this document we
assume that N is a random variable with a Poisson distribution.
n : A single observation of N.
λ : The parameter of the Poisson distribution, which specifies both the mean
and variance.
p : The binomial proportion.
p̂ : An estimate of the binomial proportion.
E() : The expected value of the specified quantity.
Var() : The variance of the specified quantity.
σ : The standard error. Subscripts are used to distinguish the standard errors
of different quantities.
σ̂ : An estimate of the standard error. Subscripts are used to distinguish the
standard error estimates of different quantities.
ρ : The correlation between X and N.
ρ̂ : The estimate of ρ.
z_α : The αth quantile of the standard Normal distribution.
F_{a,b,α} : The αth quantile of the F-distribution (requiring specification of the
degrees of freedom a and b).
2.2 Asymptotically Normal Interval
The normal confidence interval is the simplest and most widely known method
of computing confidence intervals for the binomial parameter when n is somewhat
large. The estimate p̂ of the binomial parameter is distributed asymptotically as Normal(p, p(1−p)/n), justifying the use of a confidence interval based on the normal distribution. This interval assumes n is fixed:

p̂ ± z_{α/2} √(p̂(1−p̂)/n)    (2.1)
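As a minimal sketch, the interval in Eqn. 2.1 might be computed as follows (the function name is illustrative, and Python's `statistics.NormalDist` is used here only to supply the normal quantile):

```python
import math
from statistics import NormalDist

def normal_interval(x, n, alpha=0.05):
    """Asymptotic normal interval for a binomial proportion (Eqn. 2.1).

    Treats the sample size n as fixed, as the text notes.
    """
    p_hat = x / n
    z = NormalDist().inv_cdf(1 - alpha / 2)   # z_{alpha/2}
    se = math.sqrt(p_hat * (1 - p_hat) / n)   # estimated standard error
    return p_hat - z * se, p_hat + z * se
```

For example, `normal_interval(50, 100)` gives roughly (0.402, 0.598).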
2.3 Exact
Calculation of the exact binomial confidence interval by the usual method
becomes difficult for large n. For this reason, the normal approximation (2.1) is
usually applied. However, to determine how well the normal approximation applies
for our unusual set of data, the exact binomial confidence interval must be calculated
for purposes of comparison. Agresti and Coull [1998] give the following formula for
the exact binomial confidence interval (again, n is assumed fixed):
[1 + (n − x + 1) / (x F_{2x, 2(n−x+1), 1−α/2})]^{−1} < p < [1 + (n − x) / ((x + 1) F_{2(x+1), 2(n−x), α/2})]^{−1}    (2.2)
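The exact interval can also be sketched without F quantiles by solving the defining binomial tail equations directly; the bisection below is numerically equivalent to Eqn. 2.2 (all names are illustrative, and only the standard library is used):

```python
import math

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

def exact_interval(x, n, alpha=0.05):
    """Clopper-Pearson exact interval (Eqn. 2.2), found by bisection on the
    binomial tail probabilities instead of F quantiles."""
    def bisect(target):
        # target is decreasing in p and changes sign on (0, 1)
        lo, hi = 0.0, 1.0
        for _ in range(100):
            mid = (lo + hi) / 2
            if target(mid) > 0:
                lo = mid
            else:
                hi = mid
        return (lo + hi) / 2

    lower = 0.0 if x == 0 else bisect(lambda p: alpha / 2 - (1 - binom_cdf(x - 1, n, p)))
    upper = 1.0 if x == n else bisect(lambda p: binom_cdf(x, n, p) - alpha / 2)
    return lower, upper
```

For x = 50, n = 100 this gives about (0.398, 0.602), slightly wider than the normal interval, as expected of the exact method.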
2.4 Bootstrap
Several types of confidence intervals can be constructed using bootstrap methods [DiCiccio and Efron, 1996; Efron and Tibshirani, 1986]. Only one of them will be considered here, the standard interval. This method begins with the following procedure. Compute a large number, say B, of bootstrap samples (samples drawn with replacement) from F̂, the empirical distribution of the data. Each bootstrap sample of (forecast, observation) pairs yields an estimate of p; for the bth sample (b = 1, …, B), denote this estimate by p̂*(b). The mean of these values, Σ_b p̂*(b)/B, is the bootstrap estimate of p, denoted p̂_B. When each of the B (e.g. 5000) bootstrap estimates p̂*(b) of p is assigned probability 1/B, the result is the bootstrap empirical distribution function of p̂, denoted Ĝ.
The standard error σ̂_B of the binomial proportion p̂ is computed from the bootstrap empirical distribution. This estimate is then substituted for √(p̂(1−p̂)/n) in the standard normal confidence interval formula (Eqn. 2.1):

p̂ ± z_{α/2} σ̂_B    (2.3)
Because the interval is constructed from normal quantiles, this method assumes the (at least asymptotic) normality of the estimate of the binomial parameter.
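A sketch of the standard bootstrap interval for a single binomial sample with fixed n (the thesis resamples (forecast, observation) pairs, so N also varies across its bootstrap samples; the names and the default B here are illustrative):

```python
import math
import random
from statistics import NormalDist

def bootstrap_standard_interval(x, n, B=1000, alpha=0.05, seed=0):
    """Standard bootstrap interval (Eqn. 2.3): estimate the standard error
    of p-hat from B bootstrap replicates, then plug it into the normal
    formula in place of sqrt(p-hat * (1 - p-hat) / n)."""
    rng = random.Random(seed)
    data = [1] * x + [0] * (n - x)      # the observed Bernoulli outcomes
    reps = []
    for _ in range(B):
        resample = [rng.choice(data) for _ in range(n)]  # draw with replacement
        reps.append(sum(resample) / n)                   # p-hat*(b)
    mean = sum(reps) / B
    se_boot = math.sqrt(sum((r - mean) ** 2 for r in reps) / (B - 1))
    p_hat = x / n
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return p_hat - z * se_boot, p_hat + z * se_boot
```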
Confidence intervals can be computed by several other bootstrap methods. However, each of these methods requires a minimum of 1000 bootstrap samples for estimation, while the variance estimate requires fewer replications. Efron and Tibshirani [1986] suggest that 100 replications suffice. However, Booth and Sarkar [1998] show that 100 is unrealistically small. The large number of bootstrap samples required for confidence interval estimation arises from the need to estimate distribution
quantiles, not merely the variance of the distribution. Use of repeated simulation
to estimate the coverage of these other intervals is extremely computer intensive in
terms of both time and memory. Therefore, use of the variance estimate to derive
a confidence interval is the most practical choice.
2.5 EVE's Rule
EVE's Rule for conditional variance [Mood et al., 1974] can be used to derive an estimate of the variance of p̂. By substituting this estimate into the normal confidence interval formula (Eqn. 2.1), we have yet another confidence interval. EVE's Rule is as follows:

Var(p̂) = E(Var(p̂|N)) + Var(E(p̂|N))    (2.4)

Since E(p̂|N) = p, the Var(E(p̂|N)) term is zero, and only the E(Var(p̂|N)) term is estimated, via the bootstrap. Once the variance estimate is obtained, it is substituted into the standard normal formula (Eqn. 2.1) in place of √(p̂(1−p̂)/n). Once again, we are using the quantiles of the standard normal distribution along with the variance estimate to determine the confidence interval, so this method assumes the (at least asymptotic) normality of the binomial parameter.
This method combines theory and resampling methods. The theory alone
leaves us with no answer in this case. However, using the bootstrap on the whole
problem is a naive approach. This approach takes theory as far as it will allow.
When theory runs out, then resampling methods are applied to estimate the piece
that remains.
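Since only the E(Var(p̂|N)) term of Eqn. 2.4 survives, one hypothetical estimator resamples the observed sample sizes; the thesis does not spell out its exact estimator, so this is a sketch under that assumption:

```python
import random

def eve_variance(p_hat, sample_sizes, B=2000, seed=0):
    """Estimate Var(p-hat) via EVE's Rule (Eqn. 2.4). With E(p-hat|N) = p,
    the Var(E(p-hat|N)) term is zero, leaving E(p(1-p)/N); that expectation
    is estimated by resampling the observed sample sizes with replacement,
    with p-hat standing in for the unknown p."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(B):
        n = rng.choice(sample_sizes)   # bootstrap draw of a sample size
        total += p_hat * (1 - p_hat) / n
    return total / B
```

The resulting σ̂ = √eve_variance(…) then replaces √(p̂(1−p̂)/n) in Eqn. 2.1.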
2.6 Propagation of Errors Formulas
Propagation of Errors (PEC) formulas can be used to estimate the variance of X/N using the moments of X and the moments of N in a Taylor expansion
[Ku, 1966]. A better estimate can usually be obtained by using more terms in the
Taylor series. However, one of the main goals of this study is to determine an
appropriate interval that is easy to compute. By using more terms of the Taylor
expansion, this computation would become unwieldy. Even if the intervals were
good in all other respects, it would be difficult to convince users to implement
them.
This variance estimate can be calculated in two ways. It can be computed as if
X and N were independent, or it can account for the correlation between them. The
term in the PEC formula involving correlation is negative, therefore the variance
estimate will be reduced if the variables are positively correlated. With binomial X
and N, any correlation is certain to be positive.
Here, the variance is estimated two ways, one accounts for the correlation in
X and N, and one assumes independence.
The propagation of errors formula for ratios is as follows:

σ²_PEC = Var(p̂) = Var(X/N) ≈ (X/N)² [Var(X)/X² + Var(N)/N² − 2σ_XN/(XN)]    (2.5)
11
(We substitute estimates or observations for the unobserved X, N, p, and σ_XN; e.g. x, n, p̂, and σ̂_XN.)
When we assume independence, the propagation of errors with no correlation (PENC) variance estimate reduces to:

σ²_PENC = Var(p̂) = Var(X/N) ≈ (X/N)² [Var(X)/X² + Var(N)/N²]    (2.6)
Once the variance estimates are obtained, they can be substituted for √(p̂(1−p̂)/n) in the usual normal-based confidence interval (Eqn. 2.1). The resulting confidence intervals are:

p̂ ± z_{α/2} σ̂_PEC    (2.7)

p̂ ± z_{α/2} σ̂_PENC    (2.8)
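Eqns. 2.5 and 2.6 can be sketched with one hypothetical helper; passing a zero covariance gives the PENC estimate:

```python
def poe_ratio_variance(x, n, var_x, var_n, cov_xn=0.0):
    """First-order propagation-of-errors variance of the ratio p-hat = X/N
    (Eqn. 2.5). Passing cov_xn = 0 assumes independence and yields the
    PENC estimate of Eqn. 2.6."""
    r = x / n
    return r * r * (var_x / x**2 + var_n / n**2 - 2.0 * cov_xn / (x * n))
```

Plugging in the binomial-Poisson model moments (Var(X) = pλ, Var(N) = λ, Cov(X, N) = pλ, with x = pλ and n = λ) collapses the PEC variance to p(1−p)/λ, while dropping the covariance inflates it to p(1+p)/λ, consistent with the very wide PENC intervals observed later.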
3. Data
We assume a model for the data of interest where the counts have a binomial
distribution with the sample size N distributed as Poisson. Three sets of data are
used in the analyses presented here. One is a large sample of binomial observations with a range of binomial proportions. The second set is similar, except that the Poisson counts are overdispersed. Finally, a small sample with Poisson counts is simulated.
3.1 Large Sample Binomial with Poisson Counts
Two hundred realizations of a Poisson count with λ = 100 were simulated. Then, for each of these realizations, a binomial number of successes with a probability p ∈ {0.5, 0.6, 0.7, 0.8, 0.9, 0.95, 0.99} was simulated.
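The data of Sections 3.1 and 3.2 might be simulated as sketched below (Python's `random` module has no Poisson generator, so a simple Knuth-style sampler is used; all names are illustrative, and `noise_sd=10` reproduces the Section 3.2 recipe if Normal(0, 10) is read as a standard deviation of 10):

```python
import random

def simulate_counts(lam, p, m, noise_sd=0.0, seed=0):
    """Simulate m realizations of (N, X) with N ~ Poisson(lam) and
    X | N ~ Binomial(N, p). With noise_sd > 0, Normal(0, noise_sd) noise
    is added to lam before each draw, giving overdispersed counts."""
    rng = random.Random(seed)

    def poisson(rate):
        # Knuth's multiplication method; adequate for moderate rates
        limit, k, prod = pow(2.718281828459045, -rate), 0, 1.0
        while prod > limit:
            prod *= rng.random()
            k += 1
        return max(k - 1, 0)

    pairs = []
    for _ in range(m):
        rate = max(lam + rng.gauss(0.0, noise_sd), 0.0) if noise_sd else lam
        n = poisson(rate)
        x = sum(rng.random() < p for _ in range(n))  # Binomial(n, p) draw
        pairs.append((n, x))
    return pairs
```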
3.2 Large Sample Binomial with Overdispersed Poisson Counts
Overdispersed Poisson data were simulated by adding noise to the Poisson
parameter A, then simulating the observations as above. The noise is a simulated
set of data distributed as Normal(0,10).
3.3 Small Sample Binomial with Poisson Counts
This simulated data set was derived in the same manner as the large sample
counts, except that only ten realizations were simulated instead of 200 and the
Poisson parameter λ = 7.
4. Correlation
4.1 Correlation between X and N
In a binomial model with random N, it is clear that the number of successes
is correlated with the number of trials. The sample correlation for each of our data
sets was computed. Figures 4.1 through 4.3 show the correlation between X and N
at each value of p for the three different sets of data, respectively. Correlations are
very high between the number of successes and number of trials. The correlation is
highest, nearly 1, for the overdispersed counts and for probabilities nearer 1. The
lowest correlation, for the large sample Poisson data with p = 0.5, is still about 0.65.
For the small sample data, the correlation between number of successes and sample
size is also very high, between about 0.8 and 1. Our estimates of correlation based
on small samples are not monotonically increasing as one would expect. However,
this is probably attributable to variability in the estimates due to the small sample
size. The true correlation almost certainly increases as p varies from 0.5 to 0.99.
4.2 Correlation between X/N and N
Cov(X/N, N) = E((X/N)·N) − E(X/N)·E(N)    (4.1)
For the overdispersed Poisson data, pieces of this equation are analytically intractable. However, when N is Poisson, the unconditional distribution of X is Poisson(pλ) [Casella and Berger, 2001]. Then the covariance (and therefore the correlation) between X/N and N is zero, as shown below.
Cov(X/N, N) = E((X/N)·N) − E(X/N)·E(N)
= E(X) − p·λ
= pλ − pλ
= 0
While this result is interesting, it does not necessarily follow that the validity of the resulting confidence intervals does not depend on whether N is fixed or random, because the allowance based on z_α √(Var(p̂)) is only an approximation. Bias is still an issue, as illustrated by a brief simulation.
The distribution of X/N, where X|N is distributed as Binomial(N, 0.8) and N is distributed as Poisson(50), was simulated using 5000 variates, and the 125th smallest (2.5%) and 4875th smallest (97.5%) values were found. This is repeated 100 times to get a good estimate of these two quantiles, which are then compared with the
normal limit and adjusted limit. Figure 4.4 shows the boxplots of the upper (97.5%)
and lower (2.5%) quantiles from these simulations. Even with the adjustment, the
actual limits on the distribution of X/N are quite a bit different, at least for this
one case (p = .8, A = 50). The lower limit based on the normal distribution is highly
variable, ranging from .3 to .945 (median 0.676); the upper limit ranges from 0.758
to 1 (median 0.924). The estimate of the 0.025 quantile appears to be 0.680, and
the estimate of the 0.975 quantile is 0.905.
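The simulation just described can be sketched as follows (illustrative names; the Poisson variates again come from a simple Knuth-style sampler, and realizations with N = 0 are discarded since X/N is then undefined):

```python
import random

def ratio_quantiles(p=0.8, lam=50, reps=5000, seed=0):
    """Simulate X/N with X | N ~ Binomial(N, p) and N ~ Poisson(lam),
    and return the 2.5% and 97.5% order statistics (the 125th and 4875th
    of 5000 sorted values), as in the simulation behind Figure 4.4."""
    rng = random.Random(seed)
    ratios = []
    while len(ratios) < reps:
        limit, k, prod = pow(2.718281828459045, -lam), 0, 1.0
        while prod > limit:          # Knuth's Poisson sampler
            prod *= rng.random()
            k += 1
        n = k - 1
        if n <= 0:
            continue                 # X/N undefined when N = 0
        x = sum(rng.random() < p for _ in range(n))
        ratios.append(x / n)
    ratios.sort()
    return ratios[int(0.025 * reps) - 1], ratios[int(0.975 * reps) - 1]
```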
Figure 4.1: Correlation between number of trials and number of successes for large
sample Poisson data.
Figure 4.2: Correlation between number of trials and number of successes for large
sample overdispersed Poisson data.
Figure 4.3: Correlation between number of trials and number of successes for small
sample Poisson data.
Figure 4.4: Boxplots of the simulated 2.5 and 97.5 percentiles of the distributions
of normal, adjusted normal, and conditionally binomial data.
5. Intervals Computed For A Single Realization
from the Large Sample Poisson Data Set
In the next three chapters, intervals are computed on a single simulation
from the three data sets. This facilitates examination of each confidence interval
as if it were being used in a real situation. Usually, only a single sample of data
is collected. Thus all estimates must be made using only that sample. By using a
single realization of the simulated data, we are mimicking this process. One of these
intervals displays undesirable characteristics even on a single sample. This being
the case, we are saved the effort of examining its properties on large numbers of
simulations.
Figure 5.1 shows the confidence intervals computed on the large data set using
each of the six methods. Because the PENC type of interval is so long, the scale of
the graph is such that it is difficult to see the other intervals. For this reason, the
PENC interval is excluded from the remainder of these comparison graphs.
Figures 5.2 through 5.8 show the different intervals for a single simulation of
data. The true (known) parameter value is represented by the horizontal line. The asterisk represents the estimated parameter, and each confidence interval is represented by a vertical line. Additionally, these plots are an interesting illustration of graphical illusions.
The estimate of the binomial proportion is the same for each interval, but the
22
different line lengths make it appear that the asterisk is in a different location for
some intervals.
The Normal, exact, and EVE's Rule intervals are nearly identical. The PEC
interval is very short and the PENC interval is much too long. The bootstrap
interval is inconsistent, sometimes shorter than the normal interval and sometimes
nearly identical. The bootstrap interval fails to cover the true proportion in one
of the seven simulations while the PEC intervals fail to cover the true parameter
in three of the seven. The other intervals for all simulation runs cover the true
parameter.
Figure 5.1: Realization of confidence intervals including PENC for a single simulation
of large sample Poisson data with p = 0.5.
Figure 5.2: Realization of confidence intervals for a single simulation of large sample
Poisson data with p = 0.5.
Figure 5.3: Realization of confidence intervals for a single simulation of large sample
Poisson data with p = 0.6.
Figure 5.4: Realization of confidence intervals for a single simulation of large sample
Poisson data with p = 0.7.
Figure 5.5: Realization of confidence intervals for a single simulation of large sample
Poisson data with p = 0.8.
Figure 5.6: Realization of confidence intervals for a single simulation of large sample
Poisson data with p = 0.9.
Figure 5.7: Realization of confidence intervals for a single simulation of large sample
Poisson data with p = 0.95.
Figure 5.8: Realization of confidence intervals for a single simulation of large sample
Poisson data with p = 0.99.
6. Intervals Computed For A Single Realization from
the Overdispersed Large Sample Poisson Data Set
Figures 6.1 through 6.7 show the different intervals for a single simulation of the overdispersed data. As expected, these intervals are a bit wider than the ones for the large sample without overdispersion. The true (known) parameter value is represented by the horizontal line. As before, the asterisk represents the estimated parameter, and each confidence interval is represented by a vertical line.
All of the intervals cover the true proportion in this case. The bootstrap
interval is shorter than the normal interval for p = 0.7, longer for p = 0.6, and approximately the same length for all other values of p. As the PEC interval is
the shortest and all intervals cover the true parameter, this simulation makes it
seem preferable to the other intervals. However, as is illustrated in Chapter 8, the coverage of this interval is not at the required level.
Figure 6.1: Realization of confidence intervals for a single simulation of overdispersed
large sample Poisson data with p = 0.5.
Figure 6.2: Realization of confidence intervals for a single simulation of overdispersed
large sample Poisson data with p = 0.6.
Figure 6.3: Realization of confidence intervals for a single simulation of overdispersed
large sample Poisson data with p = 0.7.
Figure 6.4: Realization of confidence intervals for a single simulation of overdispersed
large sample Poisson data with p = 0.8.
Figure 6.5: Realization of confidence intervals for a single simulation of overdispersed
large sample Poisson data with p = 0.9.
Figure 6.6: Realization of confidence intervals for a single simulation of overdispersed
large sample Poisson data with p = 0.95.
Figure 6.7: Realization of confidence intervals for a single simulation of overdispersed
large sample Poisson data with p = 0.99.
7. Intervals Computed For A Single Realization
from the Small Poisson Data Set
Figures 7.1 through 7.7 show the different intervals for a single simulation
of data. The true (known) parameter value is represented by the horizontal line.
The asterisk represents the estimated parameter, and each confidence interval is
represented by the vertical line.
As expected, the intervals for the small sample are much wider than those
for the larger samples. For all values of p except 0.5 and 0.99, all of the intervals
include the true parameter. The bootstrap and PEC intervals fail to contain p = 0.5.
Figure 7.1 makes it clear that this particular simulation is simply unlucky, though
not unexpectedly so: the estimate of p is nearly 0.6, and any interval is less likely
to cover the true parameter when the estimate falls this far from the true value.
For p = 0.99, only the exact interval covers the true value. All of the other
intervals rely on the estimated variance. Due to the small sample size and p = 0.99,
all trials in this simulation produced a success. The result is p̂ = 1, and the estimated
sample variance is 0. Clearly, the sample variance cannot be used in this case to
derive an interval estimate of the binomial proportion. In practice, this is unlikely to
cause a problem. When samples are this small, computation of the exact confidence
interval is very simple. The other methods rely on asymptotic normality, an
assumption that is generally rejected when p is very near 0 or 1. It is unlikely that
anyone would bother with any asymptotically normal confidence interval method
in this case, choosing instead the exact interval.
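When every one of the n trials succeeds, the exact (Clopper-Pearson) lower bound in fact has a simple closed form: it is the value of p solving p^n = α/2, namely (α/2)^(1/n), with upper bound 1. A minimal Python sketch of this calculation (the function name is illustrative, not from the thesis):

```python
import math

def exact_interval_all_successes(n, alpha=0.05):
    """Clopper-Pearson exact interval when all n trials are successes.

    With x = n, the lower endpoint solves p**n = alpha/2, giving the
    closed form (alpha/2)**(1/n); the upper endpoint is 1.
    """
    lower = (alpha / 2) ** (1.0 / n)
    return lower, 1.0

# For the small-sample case (n around 10) with every trial a success,
# the exact interval still covers p = 0.99 despite p-hat = 1.
lo, hi = exact_interval_all_successes(10)
print(lo, hi)  # lower endpoint is roughly 0.69
```

Unlike the variance-based methods, this construction never touches the (zero) sample variance, which is why it remains usable at p̂ = 1.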
Figure 7.1: Realization of confidence intervals for a single simulation of small sample
Poisson data with p = 0.5.
Figure 7.2: Realization of confidence intervals for a single simulation of small sample
Poisson data with p = 0.6.
Figure 7.3: Realization of confidence intervals for a single simulation of small sample
Poisson data with p = 0.7.
Figure 7.4: Realization of confidence intervals for a single simulation of small sample
Poisson data with p = 0.8.
Figure 7.5: Realization of confidence intervals for a single simulation of small sample
Poisson data with p = 0.9.
Figure 7.6: Realization of confidence intervals for a single simulation of small sample
Poisson data with p = 0.95.
Figure 7.7: Realization of confidence intervals for a single simulation of small sample
Poisson data with p = 0.99.
8. Nominal Coverages Estimated Via Simulation
To estimate the nominal confidence level for the intervals computed by the
different methods, each set of data was simulated 300 times and the intervals
calculated. The percentage of times each interval covered the true value of the
proportion was computed and is presented in the graphs below. This gives an
estimate of the coverage probability of each type of interval. Producing more than
300 replicates would give a better estimate; however, for the intervals that involve
bootstrap resampling, even 300 replicates is computationally intensive.
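The simulation loop just described can be sketched as follows. This is only a minimal illustration using the normal-approximation interval and the model X | N ~ Binomial(N, p) with N ~ Poisson(λ); the function and parameter names are mine, not the thesis code:

```python
import math
import random

def normal_interval(x, n, z=1.96):
    """Normal-approximation 95% interval for a binomial proportion."""
    p_hat = x / n
    half = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half, p_hat + half

def estimated_coverage(p, lam, reps=300, seed=1):
    """Fraction of `reps` simulations whose interval covers the true p,
    drawing N ~ Poisson(lam) and then X | N ~ Binomial(N, p) each time."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        # Sample N by inverting the Poisson CDF (adequate for moderate lam).
        n, term, u = 0, math.exp(-lam), rng.random()
        cdf = term
        while u > cdf:
            n += 1
            term *= lam / n
            cdf += term
        n = max(n, 1)          # guard: at least one trial is needed
        x = sum(rng.random() < p for _ in range(n))
        lo, hi = normal_interval(x, n)
        hits += lo <= p <= hi
    return hits / reps
```

With 300 replicates the coverage estimate is itself noisy, which is exactly the variability quantified later in this chapter.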
Figures 8.1 to 8.3 show the estimated nominal coverage of each type of
confidence interval at various values of p for the three sets of data, respectively. For the
large set of data, the normal and exact intervals have coverage fairly close to the
95% level. The bootstrap coverage differs slightly from the coverage of the other
intervals. The coverage of the propagation of error interval is far below the specified
level.
With the small sample of data, the regular bootstrap intervals were not
calculated; when the sample is of size 10, the bootstrap is not recommended. Again,
the interval based on propagation of errors does not provide the required level of
coverage. The normal and EVEs rule intervals all have reasonable coverage for the
small sample when p ∈ {0.5, 0.6, 0.7, 0.8}. However, the coverage is below the 95%
level for p ∈ {0.9, 0.95, 0.99}. The exact interval gives coverage at or above the 95%
level for all p.
The true coverage of these intervals for these types of data is neither known
nor derived, but estimated from a large sample, so our estimates of the true
coverage are subject to variability. The number of simulations that result in a
success (i.e., the interval covers the true proportion) is a binomial random variable.
Therefore, the standard error of these estimates is the usual √(p̂(1 − p̂)/n).
With the binomial distribution, one can calculate the expected range of the
proportion of successes for these simulations under the null hypothesis that the true
coverage is 95%. We simply compute the 95% confidence interval for the binomial
proportion with p̂ = 0.95 and n = 300:
p̂ ± 1.96 √(p̂(1 − p̂)/n)
= 0.95 ± 1.96 √(0.95(0.05)/300)
= 0.95 ± 0.025
= (0.925, 0.975)
Based on this interval, in the simulations that follow, any proportion of
successes above 0.925 indicates confidence interval coverage not significantly
different from 95%. When the proportion is below 0.925, that interval failed to
achieve 95% coverage.
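As a quick arithmetic check, the band above can be reproduced in a few lines (a sketch; `n_reps = 300` matches the number of simulations):

```python
import math

# 95% band for an estimated coverage proportion from 300 replicates,
# under the null hypothesis that the true coverage is 0.95.
n_reps = 300
p0 = 0.95
half_width = 1.96 * math.sqrt(p0 * (1 - p0) / n_reps)
band = (round(p0 - half_width, 3), round(p0 + half_width, 3))
print(band)  # (0.925, 0.975)
```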
Figure 8.1: Nominal coverage of confidence intervals for large data set by level of p.
(N=Normal, X=Exact, E=EVEs Rule, B=Bootstrap, P=Propagation of Errors)
Figure 8.2: Nominal coverage of confidence intervals for overdispersed data set by
level of p. (N=Normal, X=Exact, E=EVEs Rule, B=Bootstrap, P=Propagation
of Errors)
Figure 8.3: Nominal coverage of confidence intervals for small data set by level of
p. (N=Normal, X=Exact, E=EVEs Rule, P=Propagation of Errors)
9. Conclusions
The randomness of the sample size for binomial observations seems to have
little effect on the interval estimates of the binomial parameter. The high
correlation between the number of successes and the number of observations makes the
additional variability negligible.
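That correlation can be made precise: under the model X | N ~ Binomial(N, p) with N ~ Poisson(λ), one has Cov(X, N) = pλ and Var(X) = pλ, so Corr(X, N) = √p, which approaches 1 as p does. A Monte Carlo check of this (a sketch; the names are mine):

```python
import math
import random

def simulate_corr(p, lam, reps=4000, seed=2):
    """Monte Carlo estimate of Corr(X, N) for N ~ Poisson(lam),
    X | N ~ Binomial(N, p); under this model the exact value is sqrt(p)."""
    rng = random.Random(seed)
    ns, xs = [], []
    for _ in range(reps):
        # Sample N by inverting the Poisson CDF (adequate for moderate lam).
        n, term, u = 0, math.exp(-lam), rng.random()
        cdf = term
        while u > cdf:
            n += 1
            term *= lam / n
            cdf += term
        ns.append(n)
        xs.append(sum(rng.random() < p for _ in range(n)))
    mn, mx = sum(ns) / reps, sum(xs) / reps
    cov = sum((a - mn) * (b - mx) for a, b in zip(ns, xs)) / reps
    return cov / math.sqrt(
        sum((a - mn) ** 2 for a in ns) / reps
        * sum((b - mx) ** 2 for b in xs) / reps
    )
```

For example, simulate_corr(0.81, 50) should come out close to √0.81 = 0.9, illustrating why the extra variability from N contributes so little when p is large.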
The bootstrap interval displays quite a bit of variability. The coverage of
these intervals is sometimes higher than the nominal level and sometimes lower.
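For reference, a basic percentile bootstrap for a binomial proportion looks like the following. This is one common variant (resampling the n Bernoulli outcomes at the observed sample size) and only a sketch: the thesis's resampling scheme, which must also account for the random sample size N, may differ.

```python
import random

def percentile_bootstrap(successes, n, b=2000, alpha=0.05, seed=3):
    """Percentile bootstrap interval for a binomial proportion: resample
    the n Bernoulli outcomes with replacement b times, then take the
    alpha/2 and 1 - alpha/2 quantiles of the resampled proportions."""
    rng = random.Random(seed)
    data = [1] * successes + [0] * (n - successes)
    stats = sorted(
        sum(rng.choice(data) for _ in range(n)) / n for _ in range(b)
    )
    lo = stats[int(b * alpha / 2)]
    hi = stats[min(b - 1, int(b * (1 - alpha / 2)))]
    return lo, hi
```

Because the endpoints are quantiles of a finite resample distribution, repeated realizations of this interval move around noticeably, consistent with the variability noted above.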
The intervals derived from the propagation of errors formulas do not perform
well at all, probably due to poor estimation of the correlation and to the
approximation by Taylor's expansion. When the correlation is taken into account, the
intervals are too narrow and the coverage is far from the specified level. When the
correlation is ignored, the intervals are far too wide. Additionally, the size of these
intervals increases dramatically as the true proportion approaches 0 or 1.
The EVEs Rule interval is nearly identical to the standard normal interval.
The small amount of extra variability due to N makes little difference, because of the
high correlation between X and N. There should be extra variability in the model
due to the uncertainty in N. However, since X and N are so highly correlated, most
of the variability due to N is already accounted for by the variability due to X. The
correlation is even higher when the counts are overdispersed and/or the proportion
is nearer to 1, so the EVEs rule interval is even closer to the normal interval for
these cases.
This indicates that use of the normal interval for these problematic data is
justified: the violation of the assumption that N is fixed does not seem to adversely
affect the coverage of the resulting confidence interval. For small samples with p
very near 0 or 1, the exact interval is the only method that gives the required level
of coverage, so it is preferred in that case.