BAYESIAN HIERARCHICAL MODELING AND MARKOV CHAIN
SIMULATION FOR CHRONIC WASTING DISEASE
by
Christopher H. Mehl
Bachelor of Science, University of Colorado at Denver, 1998
A thesis submitted to the
University of Colorado at Denver
in partial fulfillment
of the requirements for the degree of
Doctor of Philosophy
Applied Mathematics
2004
This thesis for the Doctor of Philosophy
degree by
Christopher H. Mehl
has been approved
by
Lynn S. Bennethum
Mehl, Christopher H. (Ph.D., Applied Mathematics)
Bayesian Hierarchical Modeling and Markov Chain Simulation for Chronic Wast
ing Disease
Thesis directed by Assistant Professor Craig J. Johns
ABSTRACT
In this thesis, a dynamic spatial model for the spread of Chronic Wasting
Disease in Colorado mule deer is derived from a system of differential equations
that captures the qualitative spatial and temporal behaviour of the disease.
These differential equations are incorporated into an empirical Bayesian hierar
chical model through the unusual step of deterministic autoregressive updates.
Spatial effects in the model are described directly in the differential equations
rather than through the use of correlations in the data. The use of deterministic
updates is a simplification that reduces the number of parameters that must be
estimated, yet still provides a flexible model that gives reasonable predictions for
the disease. The posterior distribution generated by the data model hierarchy
possesses characteristics that are atypical for many Markov chain Monte Carlo
simulation techniques. To address these difficulties, a new MCMC technique is
developed that has qualities similar to recently introduced tempered Langevin
type algorithms. The methodology is used to fit the CWD model, and posterior
parameter estimates are then used to obtain predictions about Chronic Wasting
Disease.
in
This abstract accurately represents the content of the candidates thesis. I
DEDICATION
This thesis is dedicated to my lovely wife, Jennifer. She has loved and supported
me for more than four years, putting up with, among other things, boring dinner
conversation about deer and statistics. Without her, I would never have made
it this far. This is really for you, Jen!
ACKNOWLEDGMENT
This thesis would not have been possible without the generous support of my
family, friends, and those that have provided guidance over the years. Thank you
Liz and Marcia, for collectively knowing everything. Thanks to my committee,
Lynn Bennethum, Steve Sain, and Doug Nychka, for all of their comments and
their time. Thanks to my advisor, Craig Johns, for providing his guidance and
his friendship. Most of all, I would like to thank Karen Kafadar for teaching me
most of what I know about statistics.
CONTENTS
Figures ................................................................. xi
Tables................................................................ xxiv
Chapter
1. Introduction......................................................... 2
2. Models of Epidemics.................................................. 7
2.1 Deterministic Models of Epidemics................................ 7
2.2 Bayesian Hierarchical Models of Epidemics........................... 18
3. A Spatial Model for Chronic Wasting Disease ..................... 26
3.1 Data................................................................ 29
3.2 Prevalence Model for Chronic Wasting Disease..................... 31
3.2.1 Model Motivation.................................................. 32
3.2.2 Incorporating Spatial Mixing...................................... 37
3.2.3 The Hierarchical Statistical Model................................ 43
3.3 Finding a Good Model.............................................. 48
3.4 Accept/Reject Methods for the CWD Model............................. 52
4. Markov Chain Monte Carlo Simulation for the CWD Model ........... 62
4.1 Markov Chain Methods ............................................... 62
4.2 Stochastic Dynamics Methods......................................... 69
4.3 A Tempered Langevin Algorithm for Bounded Densities ............. 75
4.3.1 Markov Chain Simulation Attempts for the CWD Model............. 78
vii
4.3.2 Overcoming the Difficulties....................................... 86
4.3.3 Tuning the MALTS Algorithm ....................................... 94
4.3.4 Test Scenarios with Known Densities............................... 96
4.3.5 Implementation for the CWD Model................................. 137
4.4 Performance of the Markov Chain Sampler........................... 139
5. Results............................................................. 152
5.1 Analysis of the Posterior Estimates.............................. 152
5.2 Model Validation................................................... 160
6. Discussion.......................................................... 168
7. Conclusion.......................................................... 181
Appendix
A. Acceptance Rates for Simulation Comparisons......................... 183
A.l Acceptance Rates for the Truncated Normal Simulations............... 183
A. 2 Acceptance Rates for the Multivariate Normal Simulations.......... 183
B. Tables for the Model Validation Simulations....................... 188
B.l Group One Simulation Results..................................... 188
B.2 Group Two Simulation Results .................................... 188
B.3 Group Three Simulation Results...................................... 189
B.4 Group Four Simulation Results.................................... 189
B.5 Group Five Simulation Results ................................... 189
B.6 Group Six Simulation Results........................................ 189
B.7 Group Seven Simulation Results...................................... 190
B.8 Group Eight Simulation Results...................................... 190
B.9 Group Nine Simulation Results....................................... 191
viii
C. Matlab Code for the Markov Chain Simulations.................. 193
C.l Miscellaneous Probability Functions........................... 193
C.1.1 Function: logmvnkern.m..................................... 193
C.l.2 Function: logtruncmvn.m.................................... 194
C.2 Code for Simulating the Truncated Normal Distributions........ 195
C.2.1 Program: truncnormcomps.m................................. 195
C.2.2 Function: mehlwitch.m...................................... 199
C.2.3 Function: langwitch.m...................................... 201
C.2.4 Function: rwwitch.m........................................ 204
C.3 Code for Simulating the Normal Distributions.................. 206
C.3.1 Program: mvnormalcomps.m.................................. 206
C.3.2 Function: mehlnormal.m..................................... 209
C.3.3 Function: langnormal.m..................................... 212
C.3.4 Function: rwnormal.m....................................... 215
C.4 Code for the Chronic Wasting Disease Model ................... 216
C.4.1 Function: pmaker3.m........................................ 216
C.4.2 Function: postfun3nopi.m................................... 219
C.4.3 Program: prevmehl7g.m..................................... 221
C.4.4 Function: metpar7g.m....................................... 224
C.4.5 Function: metpar7a.m....................................... 228
C.5 Simulation Code for Model Validation ......................... 229
C.5.1 Program: groupltest.m .................................... 230
C.5.2 Function: simrun.m ........................................ 234
C.5.3 Function: metsim7g.m....................................... 237
IX
References
243
x
FIGURES
Figure
3.1 Colorado Deer Data Analysis Units across the state. The larger
spatial units are the DAUs, the smaller units are game management
units.................................................................. 31
3.2 Graph of the derivative of p versus p. Stable points (0,1/4) are on
the p axis............................................................. 36
3.3 Proportion of deer which test positive by year for DAUs 4, 5, and 10.
The total number tested is shown in the background, also by year.
The proportion which test positive decreases as the number tested
increases, as does the variability..................................... 49
3.4 Comparisons of the posterior sample with prior distributions for parame
ters (a) pio.Oj (b) a the acceleration parameter, (c) S the long run preva
lence and (d) 7, the average propensity to migrate. The prior and his
togram approximation of the posterior have been scaled to have maximum
value of one in order to highlight the difference between the two distribu
tions.............................................................. 56
3.5 Contour plot of a by 5. Most of the values occur along a banana
shaped ridge of high probability................................... 58
xi
3.6 Contour plots of variable pairs for the acceptance sampler. Plot (a)
shows the graph of the acceleration a versus the movement parameter
7. Plot (b) shows the graph of the long term sustainable level 5 versus
7. Plot (c) shows the graph of a versus pio,o> the initial prevalence
in DAU ten. Plot (d) shows the graph of p4)o versus the acceleration
a. These graphs indicate correlation among the parameters............ 59
3.7 Contour pair plots of the variables for the acceptance sampler. Plot
(a) shows the graph of the acceleration a versus the initial prevalence
in DAU five, Plot (b) shows the graph of the initial prevalences
pio,o and p4i0 Plot (c) shows the graph of pi0,o and p5i0. Plot (d)
shows the graph of p4)0 versus the p5]0.............................. 60
4.1 Pair plot for all 300,000 iterations of the parameters a and 5 pro
duced by the standard random walk sampler. The distinctive banana
shape seen in the acceptance sample is not present. The algorithm
failed to produce the correct distribution after 300,000 iterations. . 80
4.2 Pair plots for all iterations of the variables for the standard random
walk sampler. The algorithm failed to produce the correct distri
bution after 300,000 iterations. Plot (a) shows the graph of the
acceleration a versus the movement parameter 7. Plot (b) shows
the graph of the long term sustainable level 5 versus 7. Plot (c)
shows the graph of a versus pio.o, the initial prevalence in DAU ten.
Plot (d) shows the graph of p4io versus the acceleration a........... 81
xn
4.3 Pair plot of all iterations for the parameters a and 6 produced by
the Langevin sampler. The distinctive banana shape seen in the
acceptance sample is not present. The step size was on the order of
1 x 10~20. The algorithm failed to produce the correct distribution
after 300,000 iterations............................................. 83
4.4 Pair plot of all iterations for the parameters a and 6 produced by
the tempered Langevin sampler. The distinctive banana shape seen
in the acceptance sample is not present, and the chain was unable
to venture far from its initial position at the origin. The algorithm
failed to produce the correct distribution after 300,000 iterations.
Note from the axes that the chain is tightly concentrated near its
starting value at the origin......................................... 84
4.5 Pair plots of all iterations for the variables for the tempered Langevin
algorithm. The algorithm failed to produce the correct distribution
after 300,000 iterations. Plot (a) shows the graph of the acceleration
a versus the movement parameter 7. Plot (b) shows the graph of
the long term sustainable level 5 versus 7. Plot (c) shows the graph
of a versus p10)o, the initial prevalence in DAU ten. Plot (d) shows
the graph of p^o versus the acceleration a........................... 85
4.6 Univariate normal density with mean zero, variance three. A current
state is labeled X. The higher probability area is to the left of the
current state, lower probability areas are to the right.............. 92
4.7 The bivariate normal distribution truncated on [0,1] with mean
(0.5, 0.5)t and covariance 0.001 /................................. 97
Xlll
4.8 Pairwise plots of the mean parameters 9\ and 92 for the truncated
normal distribution with true mean (0.5, 0.5)r and true covariance
0. 001 I. (a) shows 8,000 iterations for the initial value (0, 0)T. (b)
shows 8,000 iterations for the starting value (0,0.001)T. (c) shows
8,000 iterations for the starting value (0.3,0.4)T. (d) shows 8,000
iterations for the starting value (0.1,0.8)T...........................
4.9 Plots of the log potential scale reductions by thousand iterations for
the mean parameters 9\ and 92 for the MALTS simulation of the
truncated normal distribution. The solid line shows the results for
01, the dotted line shows the results for 02...........................
4.10 Plots of the log potential scale reductions by hundred iterations for
the parameters 9\ and d2 for the the simulation of the 2 dimensional
truncated normal distribution. The solid line is the MALTS algo
rithm, the dashdot is the Langevin algorithm, and the dotted line
is the standard random walk, (a) shows the results for 9\. (b) shows
the results for 92.....................................................
4.11 Plots of the log potential scale reductions by hundred iterations for
the variance of the parameters 9i and 92 for the the simulation of
the 2 dimensional truncated normal distribution. The solid line is
the MALTS algorithm, the dashdot is the Langevin algorithm, and
the dotted line is the standard random walk, (a) shows the results
for 9\. (b) shows the results for 92...................................
4.12 Plots of the log potential scale reductions by hundred iterations for
the parameters 9X and 02 for the the simulation of the 36 dimen
sional truncated normal distribution. The solid line is the MALTS
algorithm, the dashdot is the Langevin algorithm, and the dotted
line is the standard random walk, (a) shows the results for 9X. (b)
shows the results for 02............................................... 107
4.13 Plots of the log potential scale reductions by hundred iterations for
the variance of the parameters 9X and 02 for the the simulation of
the 36 dimensional truncated normal distribution. The solid line is
the MALTS algorithm, the dashdot is the Langevin algorithm, and
the dotted line is the standard random walk, (a) shows the results
for 9\. (b) shows the results for 02................................... 108
4.14 Plots of the log potential scale reductions by time for the parame
ters 0i and 02 for the the simulation of the 2 dimensional truncated
normal distribution. The solid line is the MALTS algorithm, the
dashdot is the Langevin algorithm, and the dotted line is the stan
dard random walk, (a) shows the results for 0!. (b) shows the results
for 02................................................................. 109
4.15 Plots of the log potential scale reductions by time for the variance
of the parameters 0i and 02 for the the simulation of the 2 dimen
sional truncated normal distribution. The solid line is the MALTS
algorithm, the dashdot is the Langevin algorithm, and the dotted
line is the standard random walk, (a) shows the results for 9X. (b)
shows the results for 02............................................... 110
xv
4.16 Plots of the log potential scale reductions by time for the parame
ters 9\ and 02 for the the simulation of the 36 dimensional truncated
normal distribution. The solid line is the MALTS algorithm, the
dashdot is the Langevin algorithm, and the dotted line is the stan
dard random walk, (a) shows the results for 9\. (b) shows the results
for 02................................................................. Ill
4.17 Plots of the log potential scale reductions by time for the variance
of the parameters 9\ and 02 for the the simulation of the 36 dimen
sional truncated normal distribution. The solid line is the MALTS
algorithm, the dashdot is the Langevin algorithm, and the dotted
line is the standard random walk, (a) shows the results for 9\. (b)
shows the results for 02............................................... 112
4.18 Plots of the log potential scale reductions by hundred iterations for
the parameters 9\ and 02 for the the simulation of the two dimen
sional normal distribution, with correlation p = 0.879 and step size
1 x 10~5. The solid line is the MALTS algorithm, the dashdot is
the Langevin algorithm, and the dotted line is the standard random
walk, (a) shows the results for 9\. (b) shows the results for 02. . . 115
4.19 Plots of the log potential scale reductions by hundred iterations for
the variance of the parameters 9\ and 02 for the the simulation of the
two dimensional normal distribution, with correlation p = 0.879 and
step size 1 x 105. The solid line is the MALTS algorithm, the dash
dot is the Langevin algorithm, and the dotted line is the standard
random walk, (a) shows the results for 9\. (b) shows the results for 02.116
xvi
4.20 Plots of the log potential scale reductions by hundred iterations for
the parameters 9X and 02 for the the simulation of the 36 dimensional
normal distribution, with correlation p = 0.879 and step size 1 x 105.
The solid line is the MALTS algorithm, the dashdot is the Langevin
algorithm, and the dotted line is the standard random walk, (a)
shows the results for 9\. (b) shows the results for 02................. 117
4.21 Plots of the log potential scale reductions by hundred iterations for
the variance of the parameters 9X and 92 for the the simulation of the
36 dimensional normal distribution, with correlation p = 0.879 and
step size 1 x 105. The solid line is the MALTS algorithm, the dash
dot is the Langevin algorithm, and the dotted line is the standard
random walk, (a) shows the results for 9X. (b) shows the results for 02118
4.22 Plots of the log potential scale reductions by time for the parame
ters 0i and 02 for the the simulation of the two dimensional normal
distribution, with correlation p = 0.879 and step size 1 x 105. The
solid line is the MALTS algorithm, the dashdot is the Langevin al
gorithm, and the dotted line is the standard random walk, (a) shows
the results for 0i (b) shows the results for 02....................... 119
4.23 Plots of the log potential scale reductions by time for the variance
of the parameters 0i and 02 for the the simulation of the two dimen
sional normal distribution, with correlation p = 0.879 and step size
1 x 10~5. The solid line is the MALTS algorithm, the dashdot is
the Langevin algorithm, and the dotted line is the standard random
walk, (a) shows the results for 9X. (b) shows the results for 02. . . 120
XVII
4.24 Plots of the log potential scale reductions by time for the parame
ters Q\ and 62 for the the simulation of the 36 dimensional normal
distribution, with correlation p 0.879 and step size 1 x 105. The
solid line is the MALTS algorithm, the dashdot is the Langevin al
gorithm, and the dotted line is the standard random walk, (a) shows
the results for 9\. (b) shows the results for 62....................... 121
4.25 Plots of the log potential scale reductions by time for the variance of
the parameters 9X and 92 for the the simulation of the 36 dimensional
normal distribution, with correlation p = 0.879 and step size 1 x 105.
The solid line is the MALTS algorithm, the dashdot is the Langevin
algorithm, and the dotted line is the standard random walk, (a)
shows the results for 9\. (b) shows the results for 02................. 122
4.26 Plots of the log potential scale reductions by hundred iterations for
the parameters 9\ and 02 for the the simulation of the 36 dimensional
normal distribution, with correlation p = 0.992 and step size 1 x 10~5.
The solid line is the MALTS algorithm, the dashdot is the Langevin
algorithm, and the dotted line is the standard random walk, (a)
shows the results for 9\. (b) shows the results for 62................. 124
4.27 Plots of the log potential scale reductions by time for the parame
ters 9\ and #2 for the the simulation of the 36 dimensional normal
distribution, with correlation p = 0.992 and step size 1 x 105. The
solid line is the MALTS algorithm, the dashdot is the Langevin al
gorithm, and the dotted line is the standard random walk, (a) shows
the results for 9\. (b) shows the results for 02....................... 125
XVlll
4.28 Plots of the log potential scale reductions by hundred iterations for
the variance of the parameters 9\ and 02 for the the simulation of the
36 dimensional normal distribution, with correlation p = 0.992 and
step size 1 x 105. The solid line is the MALTS algorithm, the dash
dot is the Langevin algorithm, and the dotted line is the standard
random walk, (a) shows the results for 9\. (b) shows the results for 92l26
4.29 Plots of the log potential scale reductions by time for the variance of
the parameters 9\ and 02 for the the simulation of the 36 dimensional
normal distribution, with correlation p = 0.992 and step size 1 x 10~5.
The solid line is the MALTS algorithm, the dashdot is the Langevin
algorithm, and the dotted line is the standard random walk, (a)
shows the results for 9\. (b) shows the results for 02................. 127
4.30 Plots of the log potential scale reductions by hundred iterations for
the parameters 9i and 92 for the the simulation of the 200 dimensional
normal distribution, with correlation p = 0.992 and step size 1 x 10~5.
The solid line is the MALTS algorithm, the dashdot is the Langevin
algorithm, and the dotted line is the standard random walk, (a)
shows the results for 9\. (b) shows the results for 62................. 128
4.31 Plots of the log potential scale reductions by time for the parame
ters 9\ and 92 for the the simulation of the 200 dimensional normal
distribution, with correlation p = 0.992 and step size 1 x 105. The
solid line is the MALTS algorithm, the dashdot is the Langevin al
gorithm, and the dotted line is the standard random walk, (a) shows
the results for 9\. (b) shows the results for 92....................... 129
xix
4.32 Plots of the acceptance rates by dimension for the multivariate nor
mal simulation with correlation p = 0.992 and step size 1 x 105.
The solid line is the MALTS algorithm, the dashdot is the Langevin
algorithm, and the dotted line is the standard random walk............. 131
4.33 Plots of the log potential scale reductions by hundred iterations for
the parameters 9\ and 92 for the the simulation of the 36 dimensional
normal distribution, with correlation p = 0.992, and step size 1 x
10~4. The solid line is the MALTS algorithm, the dashdot is the
Langevin algorithm, and the dotted line is the standard random walk.
(a) shows the results for 9\. (b) shows the results for 92............. 133
4.34 Plots of the log potential scale reductions by time for the parame
ters 9\ and 92 for the the simulation of the 36 dimensional normal
distribution, with correlation p = 0.992, and step size 1 x 10~4. The
solid line is the MALTS algorithm, the dashdot is the Langevin al
gorithm, and the dotted line is the standard random walk, (a) shows
the results for 0\. (b) shows the results for 92....................... 134
4.35 Plots of the log potential scale reductions by hundred iterations for
the parameters 9\ and 92 for the the simulation of the 36 dimensional
normal distribution, with correlation p = 0.992, and step size 1 x
102. The solid line is the MALTS algorithm, the dashdot is the
Langevin algorithm, and the dotted line is the standard random walk.
(a) shows the results for 9\. (b) shows the results for the variance of
0i..................................................................... 135
xx
4.36 Plots of the log potential scale reductions by time for the parame
ters and 02 for the the simulation of the 36 dimensional normal
distribution, with correlation p 0.992, and step size 1 x 102. The
solid line is the MALTS algorithm, the dashdot is the Langevin al
gorithm, and the dotted line is the standard random walk, (a) shows
the results for 9\. (b) shows the results for the variance of 9\. ... 136
4.37 Time plots of the parameters pio,o and for the Markov chain
simulation, (a) shows the plot of pio,o by iteration, (b) shows the
plot of p4io by iteration............................................ 140
4.38 Time plots of the parameters p5i0 and 7 for the Markov chain sim
ulation. (a) shows the parameter p5i0 by iteration, (b) shows the
parameter 7 by iteration............................................. 141
4.39 Time plots of the parameters a and 5 for the Markov chain sim
ulation. It is not clear from these plots that the chain has reach
stationarity. (a) shows the parameter a versus the iteration, (b)
shows the parameter 5 versus the iteration........................... 143
4.40 Contour plot of the dynamic parameters a by 5 for 250,000 itera
tions of the Markov chain simulation. The contour plot is extremely
close to the acceptance sampler plot in figure 3.5 and produces the
distinctive banana shaped ridge...................................... 144
xxi
4.41 Contour plots produced by 250,000 iterations of the Markov chain
simulation. The large number of points produces a smoother image
than the contour image produced by the acceptance sampler, (a)
shows the parameters 7 versus a. (b) shows the parameters 7 versus
6. (c) shows the parameters a versus pio.o (d) shows the parameters
a versus p4>o........................................................ 145
4.42 Contour plots produced by 250,000 iterations of the Markov chain
simulation. The large number of points produces a smoother image
than the contour image produced by the acceptance sampler, (a)
shows the parameters a versus p$fl (b) shows the parameters p4i0
versus pio,o (c) shows the parameters p^o versus pio.o (d) shows the
parameters p^to versus p4i0.......................................... 146
4.43 Plot of the potential scale reduction by iteration for each parameter.
The dashed line shows the results of the standard random walk. The
solid line shows the results for algorithm 4.4. The graphs illustrate
both the poor performance of the standard random walk and the
convergence to stationarity of 4.4. (a) shows the results for pio.o (b)
shows the results for p4)0. (c) shows the results for p5j0. (d) shows
the results for a. (e) shows the results for 5. (f) shows the results
for 7................................................................ 148
xxii
5.1 Fitted values and corresponding frequentist confidence intervals for
the proportion of infected deer in the endemic region, based on the
componentwise median of the posterior sample. Depicted are (a)
DAU 4, and (b) DAU 5. Observed prevalences are denoted with o,
95% confidence intervals are denoted by and the predicted value
is denoted by V.................................................. 154
5.2 Fitted values and corresponding frequentist confidence intervals for
the proportion of infected deer in the endemic region, based on the
component wise median of the posterior sample. Depicted are (c)
DAU 10, and (d) DAU 27. Observed prevalences are denoted with o,
95% confidence intervals are denoted by and the predicted value
is denoted by V.................................................. 155
5.3 Fitted values and corresponding frequentist confidence intervals for
the proportion of infected deer in the endemic region, based on the
component wise median of the posterior sample. Depicted are (e)
DAU 44. Observed prevalences are denoted with o, 95% confidence
intervals are denoted by and the predicted value is denoted by V. 156
5.4 Predicted prevalences from 1976 to 2250, for all DAUs based on the
median curve based on the posterior Markov chain sample.......... 157
5.5 Solid line is the median track for prevalence based on the posterior
Markov chain sample (N = 300,000). Dashed lines represent point
wise 95% credible intervals based on Markov chain sample, (a) shows
DAU 3. (b) shows DAU 4. (c) shows DAU 16, which borders the
endemic region, (d) shows DAU 17................................. 158
XXUl
TABLES
Table
3.1 Neighborhood relationships between DAUs. Digits refer to DAU
number in Figure 3.1.............................................. 40
3.2 Neighborhood relationships between DAUs. Digits refer to DAU
number in Figure 3.1.............................................. 41
3.3 95% Bayesian credible intervals, posterior estimates, and modes for
the marginal posterior densities simulated using the acceptance sam
pler. All values have been multiplied by 100......................... 55
4.1 Values of the KolmogorovSmirnov test statistic and the correspond
ing pvalues comparing the acceptance sample with the Markov chain
sample for each parameter........................................... 149
4.2 95% Bayesian credible intervals and Monte Carlo estimates for pos
terior density from the Markov chain simulation. All values have
been multiplied by 100.............................................. 150
5.1 Results of the group 3 simulation, with p1)0 = 0.05, a 0.10, 5 =
0.50, and 7 = 0.01. Values are the average result of 50 pseudo data
sets................................................................ 164
5.2 Results of the group 4 simulation, with pli0 = 0.05, ^2,0 = 0.01,
a = 0.10, 5 = 0.05, and 7 = 0.01. All of the true parameter values
are captured by the credible intervals. Values are averaged over 50
pseudo data sets.................................................... 165
xxiv
A.l Acceptance rates for the multivariate truncated normal distribution. 183
A.2 Acceptance rates for each dimension for the multivariate normal dis
tribution with correlation p = 0.879. The step size is 1 x 10~5. . . 184
A.3 Acceptance rates for each dimension for the multivariate normal dis
tribution with correlation p = 0.879. The step size is 1 x 10~4. . . 184
A.4 Acceptance rates for each dimension for the multivariate normal dis
tribution with correlation p = 0.879. The step size is 1 x 102. . . 185
A.5 Acceptance rates for each dimension for the multivariate normal dis
tribution with correlation p 0.970. The step size is 1 x 105. . . 185
A.6 Acceptance rates for each dimension for the multivariate normal dis
tribution with correlation p = 0.970. The step size is 1 x 104. . . 185
A.7 Acceptance rates for each dimension for the multivariate normal dis
tribution with correlation p = 0.970. The step size is 1 x 10~2. . . 186
A.8 Acceptance rates for each dimension for the multivariate normal dis
tribution with correlation p = 0.992. The steps size is 1 x 10~5. . 186
A.9 Acceptance rates for each dimension for the multivariate normal dis
tribution with correlation p = 0.992. The steps size is 1 x 10~5. . 186
A. 10 Acceptance rates for each dimension for the multivariate normal dis
tribution with correlation p = 0.992. The step size is 1 x 104. . . 187
A. 11 Acceptance rates for each dimension for the multivariate normal dis
tribution with correlation p = 0.992. The step size is 1 x 102. . 187
B. l Results of the group 1 simulation, with pli0 = 0.05, a = 0.10, J =
0.05, and 7 = 0.01.................................................. 188
xxv
B.2 Results of the group 2 simulation, with pii0 = 0.05, a 0.10, d =
0.15, and 7 = 0.01..................................................... 189
B.3 Results of the group 5 simulation, with p10 = 0.05, Â£>2,0 = 0.01,
a = 0.10, 5 = 0.15, and 7 = 0.01....................................... 190
B.4 Results of the group 6 simulation, with plj0 = 0.05, p2,o 0.01,
a = 0.10, 5 = 0.50, and 7 = 0.01....................................... 190
B.5 Results of the group 7 simulation, with pi>0 = 0.01, a = 0.10, d =
0.05, and 7 = 0.01..................................................... 191
B.6 Results of the group 8 simulation, with p\$ = 0.01, a = 0.10, S =
0.15, and 7 = 0.01..................................................... 191
B.7 Results of the group 9 simulation, with pito = 0.01, a = 0.10, d =
0.50, and 7 = 0.01..................................................... 192
1
1. Introduction
One of the most fascinating and important branches of population biology is
the modeling of contagious phenomenon, or epidemics. The study of epidemics
is integral to the survival of humanity. The potential impact of a disease as it
moves through a society can be frightening. For example, in fourteenth century
Europe, 25 million people died from the Black Death (pulmonary plague, out of
a population of about 100 million. An estimated half of the 3.5 million Aztecs
died from smallpox. More recently, in 1919, more than 20 million people died
globally in the influenza pandemic [63, 74], Today, there is widespread awareness
and attention given to diseases such as AIDS [52, 63], as well as noninfectious
illnesses such as heart disease and cancer. There are also widespread cases of less
glamorous diseases such as malaria, which has an estimated 350 million cases in
endemic areas [63, 74],
The potential devastation that can be caused by an infectious disease has
caught the attention of both the news media and the public. Potential biological
attacks aside, natural diseases still pose a grave threat to the public. Recent
outbreaks of Severe Acute Respiratory Syndrome (SARS), West Nile Virus, and
avian influenza have all grabbed headlines and prompted the reaction of public
health officials across the world. Even illnesses that do not kill are costly to
society in terms of lost productivity and reduction of the quality of life. Under
standing how diseases evolve dynamically and spatially has and will continue to
help public health officials combat these negative effects. Mathematical mod
2
els that describe the temporal behaviour and the spatial spread of disease are
important in this process.
Mathematical models of epidemics represent a simplification of the biological
processes involved, but they are nonetheless important, because they enable
researchers to understand disease dynamics and to predict the potential impact
of an epidemic. Complicated biological systems often have features that can be
reasonably modeled by relatively simple mathematical systems. Epidemics have
been studied using deterministic, statistical, and stochastic modeling techniques.
These three methods originated at different times and for different reasons, and
at times they seem unrelated, but they have been applied jointly with great
success [33, 63, 17, 52],
Deterministic models describe the spread of disease through systems of or
dinary or partial differential equations [63, 33, 79]. In building a deterministic
model the goal is often to give the model as much freedom as necessary to make
it biologically plausible. Thus a deterministic model can have a large num
ber of parameters, and is judged by the qualitative behaviour it exhibits and
its explanatory capabilities. Little regard is given to available data or parsi
mony. Unfortunately, good but complicated models are often impossible to
solve explicitly, so a modeler employs qualitative analysis or numerical approxi
mation. Many deterministic models begin as simple population models, such as
the logistic equation, and are derived as particular cases of multiple species com
petition models, such as the LotkaVolterra predatorprey model. Variations on
these models include timelag differential equation models and partial differen
tial equations that incorporate age structures. Spatial population processes can
3
be modeled deterministically using discrete spatial structures such as lattices.
More complicated spatial models use deterministic versions of diffusion equa
tions [51, 63, 33]. An important feature of differential equations of epidemics is
that simple models with a small number of parameters often have qualitative
features that are similar to those of more complicated models. This point will be
exploited later in this thesis for building a model for Chronic Wasting Disease.
Stochastic models describe the spread of disease using stochastic differential
equations such as contact processes, diffusion process, or continuous birth and
death process [52, 63, 42], Stochastic models consist of a dynamic model with
random parameters that contain a noise term. These models are often difficult to
solve explicitly. The goal is to capture, as fully as possible, the random nature of
population systems, which can result in models that are mathematically rigorous
but with no biological interpretation [63]. Simplifying assumptions can make
the model tractable, such as the assumption that the noise term is a Brownian
motion [41]. Although these assumptions are mathematically desirable and make
the model both computationally feasible and easier to manipulate, they may
not be biologically appropriate and do not incorporate available data. Often,
however, analytical approximations lead to workable models with features that
are both biologically interpretable and mathematically elegant [51, 63].
Critical to statistical models of epidemics is the direct incorporation of data
to the model [52, 17, 54], The primary considerations for a statistical model are
the type and amount of data, and availability of covariate information. Limited
data availability can make the use of many of the more common deterministic
or stochastic models infeasible, since these models often have an abundance of
4
parameters. Statistical models typically have other considerations as well, such
as the difficulty of fitting models with nonstandard distributions, and the rela
tive ease and benefit of fitting models with normally distributed data. In many
instances, parameters are transformed in an attempt to achieve a model with
normality or linearity, simplifying the calculations for the model. One of the
great benefits of statistical modeling is the ability to reduce the variability in
the parameter estimates through the use of covariates and the introduction of
other explanatory variables. However, transforming the parameter space and
including the explanatory variables and covariates can sometimes come at the
loss of direct interpretability. For example, some of the spatial disease models
employed now do not estimate transmission rates directly, but instead model the
logrelative risk, since the transformed space can be modeled as a Gaussian au
toregressive process [17, 54]. Through the use of Bayesian hierarchical modeling,
the modeler can develop extremely complex but interpretable models. Bayesian
hierarchical models relate multiple parameters that are connected by the struc
ture of the problem. For statistical epidemic models, a priori knowledge of the
qualitative behaviour of the disease is of less concern than for deterministic or
stochastic models. The ease with which the data can be used to fit the model
and the confidence of the parameter estimates are of greater concern. It is the
data that drive the modeling process.
In the body of this thesis, these three modeling paradigms are discussed in
more detail, with the emphasis on the Bayesian hierarchical modeling aspect
of statistical modeling. A dynamic spatial model for Chronic Wasting Disease
(CWD) in Rocky Mountain Mule deer is introduced. To date, a spatial statisti
5
cal model for this disease has not been presented. This model is derived from a
system of differential equations, and describes the spatial dynamics directly in
the differential equations, rather than through correlations in the data. Because
the posterior distributions for the CWD model have some unusual characteris
tics, a new Markov Chain Monte Carlo simulation technique is constructed that
utilized qualities from existing techniques to efficiently sample distributions that
live on bounded parameter spaces. The properties of the new MCMC technique
are examined and compared to existing methods. The model is fit using the new
MCMC technique. The results of the model are discussed and interpreted, and
both the modeling method and the simulation method are empirically validated.
Throughout the paper, [X] denotes the marginal distribution of the random
variable X, [XT] denotes the conditional distribution of X given Y. The mul
tivariate normal distribution with mean jj, and covariance matrix E is denoted
by N(n, E). For matrices, A~l denotes the inverse matrix, and AT denotes the
transpose.
6
2. Models of Epidemics
Epidemic modeling is an intensely interdisciplinary subject, drawing to
gether a variety of mathematical and statistical disciplines. Modeling approaches
range from stochastic processes to deterministic models to empirical data anal
ysis. These approaches are generally guided by biological considerations that
help to construct a model and to determine its effectiveness. The approaches
that are most important in the construction of a model for Chronic Wasting
Disease are deterministic and statistical models.
2.1 Deterministic Models of Epidemics
Deterministic models describe the behaviour of dynamic systems through
the use of ordinary or partial differential equations. The study of dynamics
systems has its origins in the mid1600s, with Sir Isaac Newtons work on grav
itation and planetary motion [75]. Newtons first work on the subject involved
the twobody problem of calculating the motion of the earth around the sun.
Later efforts at extending Newtons methods to the threebody problem for
example, the motion of the sun, earth, and moon led to an eventual realiza
tion that some systems of differential equations were analytically unsolvable.
However, unsolvable models can still be analyzed qualitatively, and this char
acteristic is why these models are still used in many applications. For detailed
discussions of the qualitative analysis of differential equations, see [83, 8, 75].
Perhaps the earliest use of mathematics to deterministically model a pop
ulation can be traced to 1202, when Leonardo Fibonacci proposed the famous
7
sequence of integers to model a rabbit population [33]. The subjects first im
portant early application did not surface until the exponential growth model
introduced in 1798 by Thomas Malthus. The exponential growth model re
sults as the solution to Malthus constant rate of growth assumption, and is a
standard example of the simple separable differential equation:
(2.1)
where N is the size of the population, and r is the growth rate.
Malthus exponential model is a linear differential equation, and hence the
solution predicts unbounded growth or extinction, depending on the choice of
parameters and the initial conditions. Neither unbounded growth nor extinction
occurs in most populations, which typically tend towards a long term stable
population level, called the carrying capacity. Thus equation 2.1 is only useful
as a short term population model.
The need for a selflimiting population model led to the logistic equation for
growth, which was proposed initially by Pierre Verhulst in 1838 [63]. Verhulsts
model is more realistic than the Malthus model as a long term description of
population growth, and it incorporates a carrying capacity parameter K. The
logistic model is now one of the most commonly used models in mathematical
biology, and variations appear in many different applications. It is a relatively
simple nonlinear model, and it exhibits qualitative behaviour generally observed
in many populations. Verhulst suggested the model in 1838 to describe human
populations, although it was not until 1920 that his work received notice, when
Raymond Pearl searched for a model which has solutions characteristic of a
sigmoidal curve and empirically constructed Verhulsts logistic equation [56].
8
However, the logistic model was not derived analytically until 1925 by Alfred
James Lotka [45]. A nice derivation of the logistic model is provided in [63].
The logistic model with its sigmoidal curve will be the starting point for the
Chronic Wasting Disease model presented in this thesis. The logistic model can
be written as:
dN
j = rN(N K), (2.2)
where K is the carrying capacity, r > 0 is the constant growth rate, and N is the
size of the population. The form of equation 2.2 is relatively simple and leads to
an explicit nonlinear model for population growth. The model assumes that the
growth rate r is constant, as is the carrying capacity K. These assumptions are
somewhat simplistic, in that growth rates typically vary with weather conditions,
by regions, or due to other factors. The model can be extended in a variety of
ways to include these factors. One implied property of the simple logistic model
is that new members of the population immediately contribute to the growth,
which is not true for populations that reproduce sexually. Members of these
populations must reach a certain age before they are able to reproduce. There
are several ways of incorporating this property into the differential equation 2.2.
One method is by adding a timedelay term to 2.2, which leads to equation 2.3.
*E = rN(N{ttd)K). (2.3)
Though somewhat crude, the logistic equation 2.2 is useful in providing a
qualitative understanding of simple growth of a single population under limited
resources. One generalization of the logistic model that is important to epidemi
ology is the class of models broadly called competition models. These models
9
describe the interaction of two or more species, and are a simple extension of
the onespecies logistic model. The general two species model is:
tt ~ Ni(r\ t aiiNi I a\2N2) (2.4)
at
N2{r2 + a2i N\ + a22N2),
where Ni, is the size of population one, N2 is the size of population two, ai2
and a2i are interaction terms representing the competition between species for
limited resources, the parameters on and a22 represent the within species com
petition, and ri and r2 are the net birth/death rates for each species. The
model 2.4 is extremely versatile for describing two species interactions, and can
easily be extended to three or more species. Because each of the six parameters
can be zero, positive, or negative, for no effect, enhancement, or impedance, re
spectively, there are a total 36 = 729 variations for the two species model alone,
and each combination leads to a different configuration with a different interpre
tation [63]. The focus here is on two species models, both because the numbers
of variations are even larger for more species (33n for n species), and because
they are of greatest importance in epidemic modeling. Various configurations
for the two species model using the two by four array 2.5
sign(ri)  sign(an) sign(a12) ^
^sign(r2)  sign(a2x) sign(a22).^
It should be noted, however, that not every mathematical model has a biological
interpretation. The variations of 2.4 and 2.5 with the greatest significance are
those that describe biological competition (species competing for the same lim
ited resource), scavenging, symbiosis, and most importantly for epidemiology,
2.5
10
predatorprey relationships. A good examination of several competition models
is provided by [63] and [79].
The predatorprey relationship is represented by the array 2.6, with N\ being
the number of prey and N2 being the number of predators.
Setting an = 022 = 0 removes the within species competition, so that the growth
of the predators is limited only by the availability of prey, and the growth
of the prey is limited only by the number of predators. The first predator
prey competition model was constructed independently by Alfred Lotka in 1925
[45] and Vito Volterra in 1926 [77]. The LotkaVolterra model is important to
epidemic models and will be developed in more detail. There are, in fact, several
kinds of predation scenarios. These include parasitism and cannibalism, but the
most common is that in which one species feeds upon another.
Equation 2.7 shows the LotkaVolterra model, where all coefficients are assumed
to be positive. In the absence of predators {N2), the prey (N\) will multiply
indefinitely at rate rq, while in the absence of prey, the predators will die off at
rate r2. The predators kill off the prey at rate 61, while the skill of the predators
in catching their prey is represented by b2.
Qualitative analysis of the LotkaVolterra model reveals that for any initial
position (Vi(0), N2(0)), the model gives a family of closed curves, which implies
(2.6)
^ = N1(r1 + b1N2)
(2.7)
~7i~ A^2( r2 + b2Ni).
at
11
cyclic behaviour [63]. This means that as the number of prey increases, the
number of predators will increase, until the prey are killed off faster than they
reproduce. Then the predators, lacking food, will decrease also, and the cycle
will then repeat itself.
As mentioned before, most variations on the two species competition model
have no biological interpretation. To construct a workable model for epidemics,
however, the LotkaVolterra model only needs to be altered slightly. Let S = Ni
be the number of susceptible individuals in the population, and let I = N2 be
the number of infected individuals (infectives). For a simple epidemic, the birth
rate of susceptibles, 77, is set to zero. Setting b\ = b2 (3 > 0, and r2 = 7 > 0,
the LotkaVolterra epidemic model is:
dS_
dt
d'I
dt
= /3SI
= I3SI 71.
(2.8)
Equation 2.8 is a special case of the predator prey model 2.7. The funda
mental difference lies in the interpretation of the parameters. Since the death
of a susceptible results in the birth of an infective, for this model, ft can be
interpreted as the infection rate. The parameter 7 is interpreted as the death
rate or removal rate for the disease. The LotkaVolterra epidemic model assumes
that the population experiences homogeneous mixing, that is, every contact pair
individuals in the population is equally likely to meet. This model does not
allow for any new susceptibles to be introduced to the system, so the time frame
is necessarily short. Thus the disease modeled by equation 2.8 will eventually
burn out or run its course, since there are no new susceptibles being added to
12
the population. When only the general dynamics of the disease are of interest,
it is not uncommon to scale both parts of the equation by the total popula
tion N(t), so that both S and I represent the proportion of the total that is
susceptible and infective, respectively.
Model 2.8 can also be represented concisely using a flow chart or compart
mental diagram, which shows each class and indicates the rates at which individ
uals flow into and out of a particular class. Thus equation 2.8 can be represented
as:
gfiS I > j~yl
The arrow between S and I indicates that individuals are flowing out of the
susceptible class and into the infective class at a rate of /SSI, and out of the
infective class at a rate of 71.
Equation 2.8 is a simple model of an epidemic that occurs over a relatively
short period of time, such as influenza. The period over which the disease evolves
is short enough that the birth rate of the susceptible population is negligible. If
the disease has a longer incubation period, a net birth term a can be added for
susceptibles. This yields the equation:
^ = aS (3SI (2.9)
at
fr?31*1
Using the compartmental description, we have:
aS
Equation 2.9 allows the disease to evolve over a longer period of time. This
enables exploration of deeper questions about a disease, such as whether or not
13
the disease has long term sustainability. If the growth of the disease depends on
both susceptibles and infectives, and if the infection rate is too high compared to
the birth rate a, then susceptibles will be removed faster than they are replaced,
causing S to approach zero, thus the number of infectives will also reach zero
as there are no susceptibles left left to infect. Since the infectives die off faster
than they can be replaced, the disease will burn out. Similarly, if the death
rate of the disease, 7, is too high compared to the infection rate, then infectives
will die off faster than they can be replaced by newly infected individuals, again
causing the disease to burn out.
There are many examples of diseases that are unable to sustain themselves
over long periods of time. Measles, for example, will typically run its course
too quickly to sustain itself long term, although other factors such as human
response to control the disease, affect this as well. More deadly diseases also
fall into this category. One example is Ebola, which causes hemorrhagic fever,
leading to massive internal and external bleeding in its victims. The course of
infection is 2 to 21 days, ending in death for up to ninety percent of victims.
So far, the disease has progressed so quickly that it has been unable to sustain
itself in human populations for long periods of time.
Some infectious diseases can sustain themselves, however. One example is
Acquired Immunodeficiency Syndrome (AIDS) caused by the human immunod
eficiency virus (HIV). HIV can have a lengthy incubation period, and is spread
only through fluid contact with mucous membranes or broken skin. The dis
ease can take years to run its course, during which time the carrier can infect
others. The relationship between the infection rate and the death rate are such
14
that the disease is easily able to sustain itself. Other examples of sustainable
diseases include nonlethal infectious diseases such as influenza or the common
cold. These diseases run their course over a period of weeks, and the removal
rate is low. However, the fact that these diseases do not typically kill requires
an alteration to equation 2.8.
Equation 2.8 is the simplest of the SEIR class of epidemic models. These
initials stand for susceptible, exposed, infective, and removed (or quarantined).
The addition of the other classes gives these models a wide range of descriptive
capability, allowing the description of nondeadly diseases as well. For example,
the KermackMcKendrick model adds a quarantined class to describe diseases
that are not deadly or for which victims build immunity [33]. As before, this
model imposes the restriction that exposed susceptibles immediately become
infected. Thus, the system of differential equations is:
Â§=*s/
dQ ,
dT = 7'r'
(2.10)
Infectives no longer necessarily die, but once they recover from the disease they
are no longer susceptible and become quarantined. This model can also account
for the situation where a disease management strategy is in place, and infected
individuals are removed from the population and isolated. Once in the quar
antined group, they are no longer considered susceptible, either because they
are immune to the disease, or because they are isolated. Model 2.10 can be
15
represented using the diagram:
gfiSI^pI^Q
Because there are no additions to the susceptible class, the time period for this
model is short, and the epidemic eventually passes. The addition of a birth rate
is required to describe a disease that would last a longer period of time, possibly
reaching a sustainable level. Because quarantine models typically assume that
an epidemic is in progress and that a control strategy is being applied, it does
not include a birth rate.
A variation on this model includes an exposed class. This model was initially
proposed for measles, where exposed susceptibles do not immediately become
infectives. This occurs when a disease has an incubation period from the point
when an individual is exposed to the disease to the point when that individual
develops the disease [33]. The model can also include a birth rate for suscepti
bles, and allow for long term sustainability. Let r be the length of the incubation
period, and let a be the duration of the disease. Then the model is:
a^spsi^Et^f>r_
Even if the incubation period is constant, the resulting differential equations are
difficult to solve analytically, and must be approximated numerically. The fixed
values of r and a lead to timedelay differential equations, which can be difficult
to solve analytically. An in depth analysis of this model is given in [33].
Epidemiologists and public health officials are not only concerned with how
a disease evolves temporally, but also how it evolves spatially. From the earliest
times, people have been aware that there was a spatial component to disease.
16
During the Black Death epidemic of fourteenth century Europe, residents of
urban areas where the plague had struck would flee to the country, hoping that
the isolation would save them from the disease. Public health professionals are
aware that it is just as important to know how a disease spreads geographically
as it is to know the diseases dynamic characteristics infection rate, etc.
Spatial models of epidemics describe the spread of the disease across a ge
ographic region. There are several ways of modeling the spread within the
context of differential equations models. Without discrete spatial units, a typ
ical method of spatial modeling is through a stochastic diffusion process model
[51, 52, 63]. Nondiscrete spatial systems can also be modeled deterministically,
using partial differential equations similar to those used in modeling reactive
contaminant transport in porous media [18]. These equations are deterministic
analogs to stochastic diffusion models.
When the spatial regions are discretized, a model can be built by first con
structing an epidemic model such as equation 2.9 within each spatial unit, and
then adding a term in each equation that models the interaction between that
unit and every other unit. If there are i = l,...,iV discrete spatial units, then
for each unit i:
^  PM + Siiun + ... + uiN) (2.11)
~ PiSJi ~ Jili + Ii(Vn t ... + Vix)
where u,j is the net migration of susceptibles from unit j to unit i, and Vij is the
net migration of infectives from unit j to unit i [63]. These models are relatively
easy to construct, but the addition of the migration rates, Uij and make the
17
solution to the system of equations analytically intractable.
Even though an analytical solution is not available in most cases, equa
tion 2.11 lends itself to numerical solutions or approximation by difference equa
tions. The firstorder Euler approximation is the easiest approximation to use,
though higherorder approximations also exist. A discussion of most approxi
mation methods can be found in [83]. Numerical approximations are typically
designed with the assumption that the parameters in the differential equation
are known, and the goal is to generate values of the process. This is the re
verse of the statistical setting, where, in most cases, the data consists of discrete
observations of the process, with the parameters unknown. The focus in this
thesis is therefore on difference equation approaches, because these more easily
fit with data types typical of epidemics.
2.2 Bayesian Hierarchical Models of Epidemics
The Bayesian hierarchical modeling strategy has proven to be extremely
effective for modeling complex processes with multiple parameters [82, 78, 81].
Hierarchical models allow complex relationships between multiple parameters to
be separated into several levels. The hierarchical framework helps the analyst
to understand the underlying process linking to data to the model, and to de
velop computational strategies to simulate the desired posterior distributions.
Except in rare cases, simulation is necessary. For an introduction to hierarchical
modeling, see [24].
The overall strategy of the Bayesian hierarchical model has been informally
but concisely outlined in [3] and [17]. The strategy provides a link called the
process between the observed data and the unobserved parameters of interest,
18
denoted by 9.
[DataProcess, 9]
[Process 19]
[*]
(2.12)
The top layer is the observed data, which is modeled by an appropriate likelihood
function. It is assumed that the data is generated by an unknown process, for
example, an epidemic process. The process depends on unobserved parameters,
as shown in the middle layer. It is in this middle layer that the art of statistical
modeling takes place. On the bottom layer are the prior distributions that
represent a priori beliefs about the parameters.
Typically interest centers on the joint posterior distribution of the parame
ters, that is, the conditional distribution of the parameters, given the observed
data.
[0Data,Process] oc [0] [Process0][ DataProcess,0] (213)
The posterior distribution, shown in equation 2.13, is found using Bayes Theo
rem.
Modern disease data typically comes in the form of counts. For human
diseases, the data are numbers of occurrences per unit time. If the data has
a spatial component, then it is grouped by spatial unit as well. This type of
data is often modeled using Poisson random variables. Disease data can also be
binomial in nature, that is, number of positive cases out of a total. This type of
data can also be grouped by spatial and temporal units. The groupings in each
case may not be time and space. For example, the data could be grouped by
19
time and age [5].
Works such as [17], [54] and [25] implement the Bayesian hierarchical strat
egy for an epidemic model where the spatial dependency is modeled as a Markov
random field. These models deal with areal data that is, data aggregated over
discrete spatial regions. Data consists of the number of occurrences of a disease
in a spatial unit, aggregated over some unit of time as well. The count data is
modeled as a Poisson random variable:
[yit\Ei, zit] ~ Poisson^e***), (2.14)
where E, is the expected number of occurrences in spatial unit i, and Za is the
logrelative risk, which accounts for the deviation from the expected number of
cases. The expected number of occurrences per unit area, Ei, can be obtained
from overall death rates or prior, expert knowledge of the disease [50].
Because Poisson data cannot easily generalized to a multivariate setting and
thus cannot directly incorporate spatial effects, the spatial and temporal process
is often modeled in Za Covariate information, denoted by xtt, is added via a
linear model. This yields:
zit = xjta + sit (2.15)
where sit follows a multivariate firstorder autoregressive Gaussian time series,
defined by:
s t Hsti et. (2.16)
The term e< is a multivariate noise term, with et ~ ,/V(0,Â£). The model in
equations 2.14, 2.15, and 2.16, has been used successfully to model spatially
20
dependent count data from a variety of epidemics [54, 16, 17, 50, 78] and other
sources [34].
The dynamic nature of epidemics, especially the change from growth to de
cline to stability, has resulted in a variation of the previous model to describe
an influenza outbreak by [54, 17]. The term et is altered to become the epi
demic forcing term, causing an increase and decrease over the course of the
epidemic. This is done by introducing another parameter Pp(t) as the mean in
the distribution of et.
Suppose that over a time interval to < ti < t2 the number of occurrences
of a disease grows from a state of stability to a full fledged epidemic, and then
declines down to a stable state as the epidemic burns out 2.2. As the epidemic
takes off and increases in intensity, the parameter (3p(t) moves from the value
Po to After it reaches its peak and begins to subside, Pp(t) changes from Pi
to p2. Finally, as the epidemic passes, the value of Pp(t) returns to Po. p(t) is
an index parameter indicating the stage of the epidemic. p(t) = 0,1, or 2 for
stability, growth, or decline, respectively. Thus, Pp(t) is:
Pp(t) <
\
Po, t < t0
P\ to < t < ti
P2 u ^ Â£ < t2
Po, t>t2
Using Pp(t), the error term is then et ~ N(PP(t)l, Â£). This has the result of
turning the epidemic on and off.
The spatial dependencies in the data are built into the structures of the
matrices H and E. This covariance matrix has the form E = a2(I cf>C)~lM,
21
where M is a diagonal matrix containing conditional variances, C is a matrix
of partialregression coefficients with zeros on the diagonal, and (j> is chosen so
that Â£ is positive definite. The form of the covariance matrix is common to the
conditional autoregressive spatial models discussed in [16, 4], The structure of
the covariance matrix is found by making use of the Markovian property of the
spatial dependency. For this particular model, E~l are the diagonal entries of
M, and the entries of the matrix C are given by c^ = (Â§i)1/^2, and zero for i ^ j
[17, 54].
Spatial structure is also built into the matrix of autoregression parameters,
H. This is done by choosing a neighborhood structure for the model. For a given
spatial unit i, it must be determined which of the other units j are neighbors of
unit i, and whether they are first order, second, third, etc. neighbors. For those
units that are not in the neighborhood of i, hij is set to zero. For the influenza
model in [54, 17], the neighborhood structure is chosen to include up to second
order neighbors, and ha r)0, htj = r]i if units i and j are first order neighbors,
and = 772 if units i and j are second order neighbors. The parameters rjk are
then transformed so that a Gaussian prior distribution may be used. To finish
the modeling process, [54, 17, 78] choose appropriate prior distributions for the
remaining parameters, and simulate the resulting posterior distribution.
In the above model, the spatial dynamics of the disease are modeled through
correlations in the transformed parameter space, or by spatial dependencies in
cluded in the autoregressive coefficients, again on the transformed space. This
approach is convenient from a statistical point of view. The transformed param
eters are normally distributed, making computation and analysis of the spatial
22
and temporal dependencies easier. Standard differential equation models de
scribe the dynamics of an epidemic process in terms of numbers of susceptibles
and infectives. Counts are modeled using Poisson or Binomial random variables,
and there are no common multivariate generalizations of the binomial or Poisson
distribution.
For binomial count data, the general strategy is the same. Suppose that
Nn is the total number of individuals in spatial unit i at time t, and let p,( be
the probability that a given individual has the disease. Let yit be the observed
number (out of No) that have the disease. Then we can use the model
[Vit\NiuPit\ ~ Bin(Nit,pit). (2.17)
This type of data is commonly modeled using autologistic models. An example is
a model for prostate cancer given in [5]. Using the transformation Â£it = ln(^^)
yields:
iit + 9i + + V7it + (218)
where 9,, 9t, and ipu are observed covariates that explain temporal, spatial, and
interaction effects. The error term tn follows a normal distribution, with mean
zero and variance v.
The autologistic model can be generalized to include other covariate infor
mation in the equation 2.18. The error term can also be modeled as a Gaussian
autoregressive process, similar to 2.16. Let et be the error term aggregated over
spatial regions, so that:
et Heti + i'I, (2.19)
23
so that ef ~ N(Heti,iyI), and the matrix of autoregression parameters, H,
includes spatial structure as in 2.16.
Both of these models share the general strategy of modeling spatial and
temporal dynamics indirectly by transforming to a convenient parameter space.
While this can be effective, as in [17, 54, 78], it does not directly answer the ques
tion of how fast is the disease moving? The Poisson conditional autoregressive
models of [78], [17], and [54] use the Poisson approximation to the Binomial dis
tribution and therefore rely on large sample sizes and relatively small infection
rates. The practicality of these types of models is examined in [28].
Hierarchical Poisson models have been used to model domestic animal dis
eases as well. An example is a hierarchical model for clinical mastitis in herds
of dairy cattle given in [64], Let Xj be the number of cases of mastitis in herd
i. The hierarchical specification is:
Xi Poisson(Xi) (2.20)
AjGamma(a, Pi)
Pi ~ Gamma(a, b)
where A* is the underlying infection rate in herd i, and Pi is the spatial (herd)
explanatory variable, and a, b, and a are hyperparameters. As in the influenza
and prostate cancer models described above, a conditional autoregressive model
could be employed for the infection rate. A Poisson model is appropriate in
this situation, because the number of cattle that are infected can be accurately
determined. For epidemic models of wild animals, this is not usually the case.
24
Modeling epidemics in wild animal populations presents difficulties that hu
man epidemic models do not ordinarily have. The largest difficulty is the fact
that cases of a disease are not reported. Accurate data on the number of in
fected individuals is rarely available for wild animal populations, so researchers
instead use prevalence data. Prevalence data consists of the number of animals
that test positive for a disease, M, out of the total number tested, N. Thus a
binomial distribution is natural.
For the Chronic Wasting Disease prevalence data, sample sizes are small,
and so a Poisson approximation with its convenient transformation to normality
cannot be employed. Additionally, there is not a large amount of specific data
on transmission rates. What is available, in addition to the prevalence data,
is a qualitative understanding of the behaviour of CWD. Using this general
understanding of the disease, a differential equations model that has the same
qualitative behaviour can be derived. This ODE model will be the basis for a
statistical model that directly uses the available prevalence data via a Bayesian
Hierarchy.
25
3. A Spatial Model for Chronic Wasting Disease
Chronic Wasting Disease (CWD) is a transmissible spongiform encephalopa
thy (TSE) that occurs in wild cervid populations in North America. It is found
in both mule deer (Cervus odocoileus spp.) and the North American elk, also
known as the wapiti (Cervus elaphus nelsoni) [49, 27]. CWD is related to
other TSEs found in domestic animals, such as scrapie in sheep and bovine
spongiform encephalopathy, commonly called mad cow disease [49]. It is also
related to TSEs found in humans, such as classic CreutzfeltJacob disease and
variant CreutzfeltJacob disease. The exact agent that causes CWD has not yet
been positively identified, and the method of transmission is currently unknown.
However, the disease is accompanied by the presence of mutant proteins called
prions in the nervous tissue[14, 49, 27, 31, 32].
Chronic Wasting Disease is always fatal, causing damage to portions of the
brain in infected animals. Animals with CWD show a progressive loss of body
condition, behavioral changes, and eventually death [14, 27]. The course of
infection in mule deer appears to include both latent and clinical periods, and
spans from 18 to 36 months. Once clinical signs appear, few deer survive more
than 12 months [27].
Thus far, chronic wasting disease has been far more common in deer than in
elk. It has been estimated that in infected regions, between 5% to 6% of mule
deer have CWD, compared to only about 1% in elk [27, 14], although recent
data suggests that the infection rate in elk may be increasing [14]. While both
26
mule deer and whitetail deer can contract the disease, in the Rocky Moun
tain region the epidemic seems to be focused in those areas where mule deer
(Odocoileus hemionus) are common and few, if any, whitetailed deer are found.
Consequently, research and data collection on the dynamics of the disease in
the Rocky Mountain area have focused on mule deer [27, 31]. The research that
has been done on whitetailed deer suggests that the species responds similarly
to CWD, but a clustering effect is more likely, since whitetailed deer are more
social than mule deer.
Chronic Wasting Disease was first found near Fort Collins, Colorado, in
1967, and has been spreading over the past two decades. The disease has been
making its way across Colorado and has recently crossed the continental divide
[14]. The incidents of the disease in Central and Western Colorado have been
isolated thus far, although they are occurring with greater frequency. Currently,
the disease appears most concentrated in the region encompassed by Northeast
ern Colorado and Southeastern Wyoming, although there have been recent hot
spots in Wisconsin, South Dakota, Nebraska, and a few reported cases in Utah
and New Mexico.
Since hunting is such big business in many states, the potential consequences
of the CWD epidemic are great. A recent article in the October 24, 2003, edition
of the Denver Post reported that hunting added $599 million to the economy of
the state of Colorado in 2002. The same article estimated the potential economic
impact of CWD nationwide to be $100 billion. Thus, Chronic Wasting Disease
is recognized nationally as a potentially disastrous problem, and every state
except Hawaii has implemented a CWD surveillance program.
27
Current efforts for modeling the spread of chronic wasting disease have fo
cused on the simulation of individual interactions between deer using standard
epidemic models such as a birthdeath process [27] and modeling the spread as
a diffusion process [31], and these models have provided insight into the disease
dynamics. The disease has been spreading throughout the Rocky Mountain Re
gion for more than 30 years, but a statistical model with spatial components
has yet to be presented.
There are several standard differential equations that are commonly used as
epidemic models. These are described previously in this thesis in section 2.1.
One such is the adaptation of the LotkaVolterra predatorprey model (equa
tion 2.9) to diseases given in [63] and [33]. Some of these variants have been
applied to chronic wasting disease because they exhibit the qualitative behavior
observed in many epidemics [27, 32], The differential equations used by [27],
[31], and [32], implicitly model disease dynamics based on individual interac
tions between deer and very small time steps. However, most available data
for these models are aggregated over relatively large time steps and large ar
eas. These authors use stochastic forward integration as the primary tool for
understanding and exploring the spatiotemporal dynamics. The link between
the mathematical model and the data is indirect.
The correlated Poisson model of [78], [17], and [54], discussed in section 2.2,
is effective in modeling the spatial spread of human diseases such as influenza,
but application to CWD is hindered by fundamental differences in data. The
models discussed in section 2.2 rely on the Poisson approximation to the Bino
mial distribution and therefore implicitly rely on large sample sizes and relatively
28
small infection rates [78, 17, 54]. It is extremely difficult and expensive to obtain
prevalence data on a wildlife population, so models for Chronic Wasting Disease
need be to appropriate for small samples.
A Bayesian hierarchical model is constructed for chronic wasting disease
that directly integrates mathematical models related to those of [27], [31] with
a model for observations taken on CWD prevalence. Critical to the model is
a difference equation approximation to traditional differential equation models
of prevalence to model dynamics in time. A mechanism is included that allows
for spatial mixing between regions of interest. Available data is incorporated
through a Bayesian hierarchical framework, thus linking the data and mathe
matical model. This hierarchical approach integrates data and epidemic theory
into a single model, is capable under the constraint of small data sets, and
provides a mechanism for prediction and the study of data collection designs.
3.1 Data
Data for CWD is economically and ecologically expensive to obtain since
the testing procedure, until recently, required sacrificing the animal [14, 32],
Prevalence data consists of the numbers of deer that are tested for CWD in
each DAU, and the number of those tested that are found to have the disease
[49] and [27]. The testing procedure used to determine whether or not an animal
is infected is considered to be extremely accurate, and is performed in highly
controlled conditions [14]. Consequently, the false positive and false negative
rates for the test were assumed to be small enough to be neglected. Currently,
the data is predominantly obtained from hunters, who submit harvested animals
for testing. These submissions are required by the Colorado Division of Wildlife
29
in the endemic areas. For animals taken outside the endemic area, the animals
may be submitted voluntarily but are subject to a $15 testing fee. Because the
type of deer a hunter may take is restricted by age, and in most areas gender,
harvest data may provide a biased view of the disease [15]. Determining these
biases requires more detailed data than is available here. Nevertheless, the
model presented here is valid for the population of huntable deer, although the
definition of huntable can change from region to region.
Wildlife biologists divide the state of Colorado into geographic regions called
data analysis units (DAUs). Different species of animals have different DAU
definitions, so that the spatial units are specific for different species of animals.
For example, deer and elk have different DAUs. Since the focus of the model
presented here is mule deer, DAU will refer only to the data analysis units for
deer. These DAUs can be seen in Figure 3.1. The data shows that there are 13
DAUs where chronic wasting disease is known to exist as of the 2002 hunting
season. These are DAUs 2, 3, 4, 5, 6, 7, 8, 9, 10, 12, 17, 27, and 44 in Figure 3.1.
Because the disease has not yet spread to much of the state, the majority of the
data is from an endemic region located in the northeastern region of the state
of Colorado, consisting of DAUs 4, 5, 10, 27, and 44 [14]. While there have been
a number of cases from units surrounding the endemic region, the Division of
Wildlife believes that the disease is still controllable in the nonendemic areas.
It is thought that the disease will be a permanent presence in the endemic region
[14, 32],
The prevalence data from Colorado was collected over 27 years, from 1976
to 2002. The data were obtained from the Colorado Division of Wildlife, and
30
Figure 3.1: Colorado Deer Data Analysis Units across the state. The larger
spatial units are the DAUs, the smaller units are game management units.
each DAUyear value consists of two numbers M and N, where N is the number
of deer that were tested for CWD and M is the number that were found to be
infected. Because the disease was only recently recognized as an epidemic, much
of the data is from the endemic region and is most complete from the years 1996
2002. Prior to 1996 the data was collected sporadically. The Colorado Division
of Wildlife has recently made the testing results for the 2002 and 2003 hunting
seasons available publicly [14].
3.2 Prevalence Model for Chronic Wasting Disease
31
The approach presented in this thesis utilizes a system of differential equa
tions which exhibit the appropriate qualitative temporal and spatial behaviour.
Solutions to this system are approximated using a set of difference equations on
a scale appropriate for the data. Through a hierarchical Bayesian construction,
the data are used to estimate the unknown parameters that govern these dif
ference equations. One fundamentally different aspect of this approach is the
explicit modeling of spatial dynamics in the differential equations, rather than
indirectly attempting to capture the effect using spatial correlations typical in
statistical models.
3.2.1 Model Motivation
The logistic model 2.2 discussed in section 2.1 is commonly used to model
biological populations [63]. The model is simple, but applicable to a surprising
number of biological applications. The logistic model forms the basis for a
Chronic Wasting Disease model.
For a single population, the logistic model 2.2 exhibits the desirable charac
teristic of having two equilibrium points, at iV = 0 and N = K. A fixed point
or equilibrium point for the differential equation ^ = /(pf) is a point p* such
that f ip*) = 0. These points represent a steady state, since if the system
begins at p* it remains there for all time. Fixed points can be either stable, so
that if the system is perturbed off of the fixed point it will eventually return to
the steady state, or unstable, so that if the system is perturbed off of the fixed
point it will not return. The carrying capacity, K, represents a maximum sus
tainable population. As the population size approaches this value, the growth
will slow. Analysis shows that the point JV = 0 is an unstable equilibrium point,
32
and N = K is a stable equilibrium point. Thus, if the initial value of the pop
ulation is greater than the carrying capacity, N > K, then the population will
decrease back towards this stable point. Likewise, if 0 < N < K, then N is
increasing. This model is easily generalized to a spatial model by aggregating
over discrete spatial units and including migration terms [63],
A differential equation for the prevalence of a disease based on the logistic
model can be formulated by rescaling by K [33, 63]. Rescaling by the carrying
capacity results in a differential equation with the same basic properties as
equation 2.2, but which lives on the interval [0,1]. Beginning with equation 2.2,
divide by the maximum population K. This gives a logistic equation modified
for prevalence, restricted to the unit interval. Let p denote the proportion of
deer in a given DAU that are infected. Then,
^ = ap( 1 p) (3.1)
is a model for the change in prevalence over time.
Let pkt denote the proportion of infected deer in DAU*, for time period t and
pt be the vector of prevalences aggregated over all DAUs for a given year t. For
now, the focus is on the observed qualitative behaviour within a single DAU,
and the spatial dynamics will be added later. From historical observations and
recent studies of CWD [49, 27], it is known that CWD takes approximately 3
years from infection to the death of the animal. For a disease with an incubation
period this long, it is a reasonable supposition that a sustainable level of the
disease is possible. Additionally, if the disease is in fact caused by environmental
sources or carried by another animal, this would indicate that a sustainable
level is likely, since an external carrier could potentially be unaffected by the
33
disease. A classic example of an external host is the flea that carries the plague,
aiding in the spread of the disease and enabling the disease to sustain itself.
Furthermore, CWD has yet to show signs of elimination even in those areas
where the disease has been present the longest, giving further support to the
sustainability assumption. Another reasonable assumption is that if the disease
is not present in a given area, i.e. if pkt 0, then the proportion of deer that
are infected will stay at zero, unless infected deer from another area migrate
in (although this implies that deer are the vector for the infection, and not an
environmental source, and it is not known for certain that this is the case).
These two assumptions seem, on the surface, to be supported by the current
understanding of the disease. They provide support for the choice of the logistic
model 3.1, which has these qualitative characteristics.
The parameter a in equation 3.1 represents a kind of acceleration for the
disease, and can be interpreted as an infection rate. Fixing a as a constant
restricts the dynamics of the model by fixing the infection rate. Instead, by
choosing a = a(p,t) and following the basic tenet of the correlated Poisson
model discussed in section 2.2, more freedom is obtained for the model. This
type of freedom can be extremely useful in describing different types of disease
dynamics. One such dynamic is the observed behaviour in many epidemics of
fast growth in the initial stages of the epidemic, followed by a decline in the
latter stages of the epidemic. The parameter a(p, t) plays a role similar to that
of the control parameter f3p(t) in the influenza model of [17, 54] discussed in
section 2.2.
34
The form of a = a(p,t) must be chosen to provide the appropriate be
haviour. It can be designed to account for a variety of effects, such as spatial
and seasonal effects. With appropriate data, explanatory covariates can also be
included in the expression for a(p,t). In this case, due to the small amount of
data, the desire for appropriate qualitative behaviour guides the form of a(p, t).
Making the assumption that there is a nonzero, longterm sustainable level for
the disease, set a* = a(5 pt). This acts as a varying acceleration. The ac
celeration slows down the closer the proportion gets to 5. When pt = 5, the
acceleration term is zero, and the internal rate of change of the DAU is also
zero. The parameter 5 is the long term proportion of infected deer that the
DAU is able to sustain, and plays a role similar to that of the carrying capacity
in the logistic equation 2.2.
For a single DAU, the differential equation becomes:
^ = ap{5p)(l p). (3.2)
With this modification, prevalence is increasing for levels below 5, and decreasing
for levels above 5. If 5 = 0, then the disease is killing off the existing infectives
faster than it is creating new infectives and there will be an eventual burn out
of the disease. For 5 = 1, then the disease will eventually infect all the deer.
When 0 < 5 < 1, the model has a longrun stable proportion of infectives at
level 5.
Figure 3.2 shows the graph of the derivative, p\ versus p for equation 3.2,
with 5 = 0.25. The equilibrium points, p* are on the p axis, where p' 0. To
graphically determine the nature of the equilibrium point, arrows are drawn to
the right for values p' > 0, and to the left for values p' < 0. When the arrows flow
35
Figure 3.2: Graph of the derivative of p versus p. Stable points (0,1/4) are on
the p axis.
away from each other the point, p*, is unstable, when the arrows flow towards
each other p* is stable. The graph shows that the equilibrium point p* = 0 is
unstable, while the other equilibrium point, p* = 5, is stable, supporting the
interpretation of the parameter 5 as a long term sustainable level. The idea can
be extended to several dimensions as well.
Having developed an appropriate model for the dynamics of prevalence in a
single DAU, the model is extended to multiple DAUs. This is done most directly
36
by replicating over the DAUs to produce the system of differential equations:
=a(<5lpt)OPtO(lPt) (33)
where p( is the vector of prevalences, so that pt = (Pi,t,P2,t, ,P54,t). The symbol
O represents the Hadamard (element by element) matrix product. Importantly,
this system of differential equations allows for different prevalence values in each
DAU.
3.2.2 Incorporating Spatial Mixing
Although equation 3.3 is a relatively easy extension that has the appropriate
general temporal dynamics, it treats each DAU separately and does not take into
consideration any spatial dynamics. In equation 3.3, neighboring DAUs do not
affect each other, which is an unlikely scenario. It is known that the disease is
spreading across the state, so a spatial component for the model is needed.
The spatial dynamics are incorporated into the model by adding a spatial
mixing term to equation 3.3, yielding:
^ = c*(<$l pt) pt (1 pt) + Qpt> (3.4)
where Q is a matrix such that the values <7^ represent the instantaneous net
effect that the prevalence in DAU j has on the prevalence of DAU i. Thus the
rate of change in the prevalence over DAUs consists of an linear combination of
the internal dynamic processes governed by logistic equations, and the external
contributions by the other DAUs.
The multiDAU spatial model is extremely difficult to solve analytically, so
a firstorder Euler discretization is used to approximate the differential equation
37
by a difference equation. This discretization is a simple method of approximating
a derivative, and has the form f'(t) ~ _
Since the data is aggregated by year, the time step is assumed to be At = 1.
The difference equation for (3.4) is therefore:
Pt+i = a(5l pt) O Pt O (1 Pt) + Wpt (3.5)
where IT is a spatial mixing matrix that indirectly represents deer migration,
and is a discrete version of the matrix Q in equation 3.4.
Equation 3.5 is the dynamic spatial model for Chronic Wasting Disease,
and the prevalence data is used to find the parameters for this equation. The
migration of deer changes prevalence in that a single infected deer migrating from
one DAU to another changes the makeup of prevalence for both the from and
to DAUs. However, the matrix W does not directly describe this movement of
deer, but instead describes the movement of the disease prevalence. The change
in prevalence for a given DAU consists of a mixture of the internal dynamic
processes modeled by the logistic equations, and the external contributions from
migration to and from neighboring DAUs. Including the migration matrix W
neither implies nor removes the possibility that deer are the transmission method
for the disease in the model.
The fixed points for the difference equation will be the same as those for the
corresponding differential equation, provided that W1 = 1, where 1 is a unitary
vector [75, 83], and therefore the parameters in the difference equation 3.5 have
the same interpretation as the parameters in the differential equation 3.4.
Various forms of the spatial mixing matrix W could be designed to incorpo
rate elaborate schemes for disease transport. However, because the mechanism
38
by which the disease moves is currently unknown, two simple cases were con
sidered. Assuming that the disease can only migrate between adjoining DAUs,
so that if DAUj and DAUj are neighbors, and DAUj and DAU*, are neighbors,
but DAUj and DAU* are not, then in a single time step (one year in this case)
migration occurs between i and j, and between j and k, but not between i and
k. This is a reasonable assumption supported by biological studies of the behav
ior of deer in the wild and the spread of CWD in Colorado, and the manner in
which the state of Colorado has constructed the DAUs [14, 15, 49, 27, 31, 32],
Let Wij denote an element of the stochastic transition matrix W. For par
simony, fix = 7 if DAU, and DAUj are firstorder neighbors with a sig
nificant proportion of boundary that touch. If DAUj and DAUj have less of
boundary that touch, or are almost touching, then W,j = u < 7. For DAUs
that are separated and hence are not neighbors, = 0. The diagonal values
of W are fixed so that row sums of W are equal to 1. Thus, the can be
loosely interpreted as the average proportion of deer that migrate from DAUj
to DAUj each year. The classification of firstorder neighbors and secondorder
neighbors was admittedly subjective. Tables 3.1 and 3.2 show the relationships
between the DAUs that were used.
In order to ensure that equation 3.5 produces values for pt that are inter
pretable, certain restrictions are made on the parameters a, 5, 7, and uj. For
simplicity, let F(p, 77) = a(5lp)Op(lp)tIUp, where 77 = (a, 6,7, uj). The
parameters 7 and w are the nonzero elements of the matrix W. The relationship
in equation 3.5 is therefore written as pe+i = F'(pt, 77).
39
DAU Near Secondary DAU Near Secondary
1 2,6 7 28 33,45,46,47 48
2 1,3,4 8,9 29 24,52 
3 2,4,9 8,10 30 35,36,52 25
4 2,3,5,10 9,44 31 32,35,36 37
5 4,44,54  32 31,34,45 33
6 1,7,11  33 28,45 32
7 6,8,11,41,42,43 1 34 16,32,37,45 15,26,48
8 7,9,14,43 2,3,15,16,53 35 30,31,36 37
9 3,8,10,17,27 2,4,16,38 36 25,26,30,31,35,37 
10 4,9,27,44 3 37 26,34,36 15,31,35
11 6,7,18,41 12 38 16,17 9,50
12 13,41,51 11,18,19,42,43 39 20,21,40 25,51
13 12,15,22,51 42,43 40 19,25,39 24,51
Table 3.1: Neighborhood relationships between DAUs. Digits refer to DAU
number in Figure 3.1.
40
DAU Near Secondary DAU Near Secondary
14 8,43,53  41 7,11,12 42
15 13,16,22 8,26,34,37,53 42 7,43 12,13,41
16 15,34,38,48,50 8,9,45 43 7,8,14,42 12,13,53
17 9,38,44,49,50 48 44 5,10,17,54 4,27
18 11,19 12,23 45 28,32,33,34,48 16,46
19 18,23,24,40 12,51 46 28,47,48,49,54 45
20 21,39,51 22 47 28,46,54 
21 20,22,25,39  48 16,45,46,49,50 17,28,34
22 13,15,21,25,26 20 49 17,27,46,48,54 50
23 19 18,24 50 16,17,48 38,49
24 19,29,52 23,40 51 12,13,20 19,39,40
25 21,22,26,36,40 30,39,52 52 24,29,30 25
26 22,25,36,37 15,34 53 14 8,15,43
27 9,10,49 44 54 5,44,46,47,49 
Table 3.2: Neighborhood relationships between DAUs. Digits refer to DAU
number in Figure 3.1.
41
The parameters a, 5, 7, and u; are restricted so that 0 < a < min{4/(l
5)2,4/52}, 0 < 6 < 1 and 0 < 00 < 7 < l/max^^ Wij). With these
restrictions and W defined as above, there is the following theorem.
Theorem 3.2.1 Let a, 5, 7, and u be restricted as above. Then the update
function F has the following properties:
F(p,77)[0,l]* Vp[0,l]* (3.6)
F{0,r}) = 0 (3.7)
F(6l,ri) = 6l (3.8)
The first property (3.6) guarantees that the dynamic system will always return
prevalences between zero and one. The second property (3.7) implies that if
there is no disease in the system, the disease cannot propagate. The third (3.8)
implies that the system has a steady state where the disease maintains a stable
relationship with the deer.
PROOF: To show that these properties hold, first note that the form of W is
such that Wl = 1, WT1 1 and > 0. Thus, W51 = (51, and Wp 6 [0,1]^
whenever p [0,1]*". It remains to be shown that g(x) = x + ax(l x)(5 x) &
[0,1] whenever x [0,1]. Note that g(x) = x{\ + a.5 a(l + 5)x + ax2) = xq(x)
where q denotes the quadratic form. Now g has a root at x = 0 and may have
up to two more roots at the roots of q. If the roots of q are complex, then
g(x) 0 if and only if x = 0. It is easy to show that the roots of q are complex
for 0 < a < 4/(1 <5)2. Thus g(x) > 0 on [0,1] when 0 < a < 4/(1 <5)2. Let
h(x) 1 g{x) and note that g(x) < 1 whenever h(x) > 0. Factoring out (1 2;)
in h, argue as before to show that g{x) < 1 for x [0,1] whenever a < 4/62.
42
In what follows, let a (0,1) be the proportional acceleration and the ac
celeration to be 4amin(l/(52,1/(1 6)2). Likewise, rescale u and 7 to represent
the proportion of complete interchange, so that u and 7 are the probabilities
used in the spatial transition matrix W.
3.2.3 The Hierarchical Statistical Model
Because there are different sampling schemes for different stages of the
knowledge of disease prevalence, a simple modification is required. The change
amounts to modeling (location,time) observations from two possible distribu
tions. One distribution corresponds to timeperiods where a DAU has a large
number of deer tested for the disease. In this case, the effort behind the test
ing is primarily geared towards determining and understanding prevalence. The
Division of Wildlife knows that the disease is present, wishes to collect as much
data as possible, and has the additional goal of protecting hunters from eating
infected animals. On the other hand, when the number of tested carcasses is
low, that corresponds with a time when the CDOW is opportunistically search
ing for the disease in an area where it has not yet been found. This data comes in
the form of animals found already dead, and hunters in noninfected areas that
wish to pay for testing. The differences in searching for the disease add biases
which are explicitly included in the model to facilitate estimation. Furthermore,
model selection techniques are used to find a cutoff value which separates the
two processes.
43
The likelihood for the observation at DAUj at year t, denoted by Yit, is
modeled as one of two binomial distributions
Yit ~ B(Niupit) for Nit > C
~ B(Nit, 7r) for Nn < C. (3.9)
depending on some cutoff value C and where Nit is the number of tests completed
for DAUj at year t and pu is the ith component of pt This duality is required
because of the different sampling schemes for different stages of the knowledge
of disease prevalence. The first distribution corresponds to timeperiods where
a DAU has a large number of deer tested for the disease. In these cases, the
Division of Wildlife knows that the disease is present and wishes to protect
hunters from eating infected animals. On the other hand, DAUyears when the
number of tested carcasses is low correspond with the CDOW searching for the
disease in an area where it has not yet been found or confirmed problematic.
This approach is somewhat ad hoc, and only moderately reflect the true nature
of the sampling mechanism. Nonetheless, the modification provides sufficient
flexibility to allow the model to accurately describe the changing prevalences.
This modification essentially eliminates the data values below the cutoff from
the spatiotemporal model, as the posterior of n given all the data is independent
of the other parameters. Because of this independence, data that are used to
improve knowledge regarding 7T cannot be used to increase information about
the other parameters.
The binomial sampling scheme used here is a simplification that considers
the values Niit as fixed. Common practice is to consider these values as random,
44
perhaps modeling them as Poisson or binomial draws from the total population.
There are several reasons why a second layer of hierarchy was not added. First,
although populations estimates were available for most DAUs over a ten year
period, these estimates were considered unreliable. Population estimates are
prepared by game region, and the manner in which they are calculated varies by
region. Game regions can consist of several DAUs, and information about how
the game regions are structured was unavailable. Moreover, the method in which
the populations are estimated has changed over time, as new estimation tech
niques are adopted, both because of scientific advancement and changes in game
region management. Information on what population estimation methods were
used was unavailable. Finally, the testing data was obtained predominantly from
harvested animals. These animals form a subset of the population, because of
strict regulations that govern which animals can be harvested and which cannot.
In most areas of Colorado, huntable deer consists only of male deer two years or
older [14]. There was no information available on what percentage or number
of deer are harvested each year, although the Colorado Division of Wildlife does
collect estimates on this information, grouped by game management units [14].
The simple sampling scheme that was used means that even though all of the
results presented in this thesis may not be applicable to the total population of
deer, they can still be reliably applied to the population of harvested deer.
Given an initial prevalence vector p0 and the parameters in 77, the values of
pt are fixed. It is important to note the difference between this approach and a
standard state space model [23]. In a state space model, the value Xt at time
t is equal to a function F(Xti,rf) plus error. That is, given the parameters 77
45
and the previous value Xt_i, the new value is random. In the approach followed
in this thesis, the value of Xt given the parameter values 77 and the initial value
A0 are deterministic. This approach also differs from the traditional state space
model by using binomial distributions for the observational equation, rather than
normal distributions. By using deterministic rather than stochastic updates, the
size of the parameter space is effectively reduced from 54 x 27 + 4 to 54 + 4. In
fact, the parameter space is smaller because many of the initial conditions (the
54 elements of the vector p0) axe forced to zero.
The likelihood for the prevalence data is given by,
54 27
Â£(Mp0, 7i, C) = J]n[pf(lPt)(7VC}
i=l t=l
x [7rMi(l 7r)^M>]1<0<^c) (3.10)
where 1^ is 1 if condition A is true and zero otherwise, pu is the ith component
of pt, defined by recursion in equation 3.5, Nu is the number of deer from DAU*
that are tested in year t, Mu is the number of those that are infected, 7T is
the probability that a tested deer has CWD given that a small number of deer
are tested in that DAUyeax, C is a cutoff which models the different sampling
schemes relative to ir and 77. The value C = 3 provided a reasonable fit to the
data, and will be discussed later. The posterior distribution of parameters is
proportional to
Po, 77, 7T ) oc Â£(Y  p0, 77, 7T, C ) <7(p0) 9(77) q(n) (3.11)
where q() denotes a prior distribution. For simplicity below, let 6 = (T7,po,7r)
denote the complete vector of unknown parameters.
46
This model has qualitative characteristics, through the deterministic differ
ential equations, that are consistent with current knowledge and expectations of
CWD. It is also flexible enough to accommodate spatial mixing of prevalences
due to deer migration and possibly other factors. The model is formalized with
an Bayesian hierarchy, giving the added benefit of allowing for comparison of
different models, testing whether or not the disease is spreading spatially (if
7 > ui > 0) and whether or not there disease can coexist with the mule deer
population (if 0 < 5 < 1). The Bayesian hierarchy also provides a method of
checking sampling schemes that will maximize information from data collected
in years to come [37].
Prior distributions for a, 8 and it were all independent Beta distributions,
since each parameter in the model is bounded on [0,1]. The prior distributions
for the different parameters were chosen to reflect the current understanding of
CWD researchers and the statewide, large scale observations of the disease [14].
Information for the prior distributions comes from current research and expert
knowledge that is indirectly based on this and other data. Consequently, the
modeling strategy is not a formal Bayesian model, but instead has an empirical
Bayesian flavor.
The overall level of the disease in the endemic area is currently believed to
be between 813% [14]. While the disease still appears to be spreading spatially,
in those areas that have been infected the longest the level of the disease seems
to have stabilized. Consequently, the prior for 5 was chosen to be a Beta(8,82),
which has a mean of 0.09, or 9%., and a standard deviation of 0.0298.
47
Currently, there is little knowledge or expectations for the acceleration pa
rameter a. Detailed study of the dynamics of Chronic Wasting Disease is rel
atively recent but is ongoing [31, 32, 27]. Consequently, a uniform density on
[0,1] was chosen as the prior distribution to reflect this lack of information.
Because the DAUs are defined so that deer will tend to remain within their
DAU, the migration parameters 7 and uj are expected, a priori, to be small.
However, there is a lack of information available on the migration rate of the
disease [14], Consequently, the prior distribution for 7 and ui are chosen to be
Beta( 1, 50) and Beta( 1,100) respectively, subject to the constraints that 7 > w
and that max,^^ W^} < 1.
According to the Colorado Division of Wildlife, the disease was rare before
1990 [14], CWD was unknown to nonexperts in 1975 (t = 0), and no cases were
observed in wild populations prior to 1980. Consequently, the initial prevalence
values were expected to be extremely small. Additionally, because the disease
prevalence has only been observed increasing, those DAUs that are not currently
infected were not allowed to have the disease present at year zero. The prior
distributions for po were therefore either Beta(l, 200) or a point mass at zero.
3.3 Finding a Good Model
The first task in choosing a particular model is to determine a reasonable
value for the cutoff C in the likelihood (3.10). The joint posterior distribution
can be factored so that the posterior distribution of tt is independent of the
posterior for 77 and p0.
For a given N, the binomial distribution which accounts for more data
variability than any other binomial model is the distribution with p = 0.5.
48
Figure 3.3 shows the proportion of deer that test positive by year and the total
number tested for the three DAUs which have been infected the longest. There is
a large amount of variability in the early data, where the number of deer tested is
small. As the number of deer tested increases, the variability in the data appears
to decrease. Thus, a search is conducted for the value of C that yields a cutoff
level that has, as close as possible, half of the observations diseased and half
not. By inspecting the data it is found that this corresponds to a cutoff value of
C = 3. Thus in DAUyears where 4 or more deer are tested for Chronic Wasting
Disease, the modeled prevalence will be from the updates given by equation 3.5.
Otherwise, the prevalence is modeled as Bin(Nit,ir).
Endemic Area
Figure 3.3: Proportion of deer which test positive by year for DAUs 4, 5, and
10. The total number tested is shown in the background, also by year. The
proportion which test positive decreases as the number tested increases, as does
the variability.
49
Given the parameters a, 6, 7, ui and 7r, several model variations can be
formulated by determining how many, and which, of the elements of p0 are
nonzero. Because the data begins in 1976, the year 1975 is set as year 0.
The numbering scheme for the DAUs has nothing to do with Chronic Wasting
Disease, so it is necessary to impose an order on them that is sensible for CWD.
This is done by optimizing the posterior distribution over different arrangements
of DAUs to determine which DAU is likely to have the highest infection rate of
the disease down to which is least. There are a total of 54 DAUs, so to reduce
the search space the focus is only on those DAUs in the endemic area where
the disease is currently found. This assumption is reasonable since, at least
presently, the disease is still spreading, making it unlikely that it would have
disappeared from an area. The optimal ordering for the DAUs in the endemic
area is 10, 4, 5, 3, 17, 9, 54, 27, and 44.
The best model given this ordering can be determined using Bayes Factors
[24], Several sets of models are tested. The first set assumes that the disease
was present in only one DAU in 1975, and that all other DAUs had an initial
value of 0. This is accomplished by putting the prior distribution for p^o as a
point mass at zero for all DAUs except for the one of interest. The second set
of models assumes that two DAUs are infected at time 0 and all others are zero.
The pattern continues adding one DAU at a time, up to nine, the number that
are currently in the endemic region. Additionally, models were compared with
oj > 0 and ui = 0 to determine whether or not the second order migration term
was significant.
50
The different models are compared using Bayes factors. Bayes factors are
ratios of the marginal likelihood of the data under one model to the marginal
likelihood under another model. Hence:
nn/i, , x J C{Y\r],p0,K,Mi)q(po,r],TT\Mi)dpodridn
3 /Â£(Y77,p0,7r, A4J)<7(p0,T7,7r.Mj)dpodT7d^
where Mj implies that the integration is with respect to the prior and likelihood
under model i [24]. Both the numerators and denominators of the Bayes factors
are approximated via a random sample of the prior distributions implied by the
two models Mi and Mj, respectively. For k 1,... K, draw the appropriate
collection of parameters, denoted by Oik, from the prior distribution imposed
by Mi. Let lik denote the evaluation of the loglikelihood implied by (3.10)
evaluated at 0ik and Wik = exp(Â£ik max*^*}) and to*. = Yhkwik/K. Then
BF(fi/ti,Mlj') 6Xp{^jmax tj max}'UJi /'U)j..
The Bayes factor gives a weighting of the relative skill of one model compared
with another, competing, model. Under this criteria, it is found that the model
with P4,o,P5,0)Pio,o > 0, with all other DAUs having zero starting values, and
u 0 is as or more effective than other models with the same number or more
parameters. The estimated Bayes factor was 1.18 when comparing this model
with a similar model that also included nonzero initial values for DAUs 3, 17,
and 9. The next smallest Bayes factor (3.55) was for the comparison with the
model that forced p5jo = 0. The Bayes factors for comparison of the model with
P4,o, P5,0j Pio,o > 0 and ui = 0 and all remaining models were all greater than
100.
The results of the Bayes Factor comparisons are supported by the fact that
the first observed case of Chronic Wasting Disease occurred near Fort Collins,
51
Colorado, which is surrounded by DAUs 4, 5, and 10 [22], These DAUs are
currently the heart of the endemic area. Given this prior knowledge of the
disease, it is unlikely that it would have begun anywhere else.
The particular model that is chosen is that with dynamic parameters
t? = (a, (5,7),
and the initial conditions
Po = (Pio,o > P4,0 j P5,o) >
so that 0 = (77, Po) Thus, it is necessary to find the posterior distribution of
[0Y], where Y denotes the data. Because this distribution does not have a
convenient or known form, computational methods are needed to simulate the
posterior distribution.
3.4 Accept/Reject Methods for the CWD Model
The posterior distribution for this model depends on many nonlinear rela
tionships, which makes direct analysis difficult. Consequently, understanding
the posterior is most easily achieved using a simulated random sample. The
posterior distribution of the CWD model was first sampled in [37] using an
acceptance sampler. The results of this sample illustrate the characteristics
of the posterior distribution, and will help to determine if a more adaptable
Markov chain simulation is sampling from the correct density. Developing a
usable Markov chain simulation might allow for a more complete exploration of
the model by future researchers.
The Accept/Reject or acceptance method of sampling results in a random
sample from the desired distribution. Let f(x) be the target density. Let g(x)
52
be a distribution from which it is easy to sample, called the envelope density.
Further, suppose there exists a constant M such that f(x) < Mg(x), everywhere
on the support of the density /.
Algorithm 3.1 The basic Accept/Reject (A/R) algorithm is:
1. Generate a value X from the density g, and a value U from the uniform
density on [0,1].
2. IfU< m^'x) > then accept the value X.
An accepted X is distributed according to the density f.
The acceptance sampler can work extremely well, but its performance de
pends on finding an envelope density g that closely matches the target density,
as well as knowledge of the constant M. Further, for properly normalized den
sities, the average probability of accepting the value X in the algorithm 3.1 is
1/M, so that the expected number of iterations until a value is accepted is M
[64]. Thus to make the acceptance sampler efficient, the density g must closely
match the target density so that a sizable random sample can be generated in
a reasonable amount of time [64, 25].
Acceptance samplers have the benefit of generating an independent sample
exactly from the target distribution, and they can be applied to multivariate
densities. However, the problem of finding a suitable envelope density if often
prohibitive, especially in higher dimensions, although there are methods such as
Adaptive Rejection Sampling that can reduce this problem for certain kinds of
target densities. In addition, accept/reject samplers are difficult to modify, so
that slight changes in a model can require the construction of a new sampler.
53
There are a total of seven parameters in the CWD model. However, the sev
enth parameter, 7r, is independent of all of the others. Its posterior distribution
has a known form, 5efa(18.5,19.5), so it does not need to be included as part of
the simulation, since it can be simulated directly. Hence only the remaining six
parameters, pio.o, P4,0i Ps,o> <*, and 7, must be sampled. Each parameter has a
beta distribution as its prior, and hence the probability density function is non
zero only on the interval (0,1). Because the posterior distribution is known to
be confined to the six dimensional unit cube, a beta distribution seems a natural
starting point for the envelope distribution. Since there are six parameters, six
independent beta distributions are used with their parameters chosen so that the
modes of the beta distribution match the mode of the posterior distributions.
The mode is calculated by numerically maximizing the log of the posterior func
tion, using the NelderMead nonlinear optimization method [59, 55] built in to
Matlab. The modes for the marginal posterior distributions in this model are
shown in table 3.3. The value of the log posterior distribution at the mode was
2731.9. The specific envelope distributions for the CWD acceptance sampler
were Beta{2.7,106) for the parameter pio.O) Beta(2.1,164.2) for the parameter
p4>0, Beta(0.7,310.7) for the parameter p5i0, Beta(2.5,15.6) for the parameter
a, Beta(9.4,54.4) for the parameter d, and Beta(27.4,3375.3) for the parameter
7. The results of the acceptance sampler are discussed in [37].
The acceptance sampler was run using Matlab, version 6, on a machine with
a 2.2Ghz Pentium 4 processor, with 2GB RAM. The sampler generated a ran
dom value from the posterior roughly every 3 seconds. To generate a sample of
size 25,000 took nearly 24 hours. The slowness of the acceptance sampler makes
54
it impractical for studying variations of the model, or for applying to future
data sets. The long run time was likely due to the complicated nature of the
parameter space and the independence assumed in the envelope distributions.
Nevertheless, the random sample that was generated by the acceptance sampler
allows for examination of the characteristics of the joint posterior, and provides
a guide in designing a more efficient Markov chain sampler.
The 95% Bayesian credible intervals and the posterior estimates for the
six parameters based on this sample are shown in table 3.3. These summary
statistics will be used as an additional check for the Markov chain sampler.
Parameter L. Limit U.Limit Mode Mean Median Std.Dev.
Pio,o 1.67 5.63 2.51 3.35 3.24 1.01
P4.10 0.82 3.17 1.27 1.77 1.69 0.61
P5,0 0.07 0.69 0.22 0.31 0.28 0.16
a 6.67 22.13 14.08 13.01 12.51 3.98
S 11.18 20.39 14.79 15.08 14.84 2.38
7 0.62 1.09 0.81 0.84 0.83 0.12
Table 3.3: 95% Bayesian credible intervals, posterior estimates, and modes
for the marginal posterior densities simulated using the acceptance sampler. All
values have been multiplied by 100.
Scaled histograms of the variables Pio,OiP4,OjP5,o> 5, and j from the pos
terior sample are given in Figure 3.4. For comparison, the prior distribution is
shown with a solid line. The prior and histogram approximation of the posterior
have been scaled to highlight the differences between the two distributions. This
55
Relative Frequency Relative Frequency
gives some indication of both the ignorance in the prior distributions and the
strength of the data in the model. Noting that the prior and posterior distribu
tions are different for each variable, it is evident that the model is making use
of the information in the available data.
Figure 3.4: Comparisons of the posterior sample with prior distributions for pa
rameters (a) pio.Or (b) a the acceleration parameter, (c) 5 the long run prevalence and
(d) 7, the average propensity to migrate. The prior and histogram approximation of
the posterior have been scaled to have maximum value of one in order to highlight
the difference between the two distributions.
Examination of the posterior distribution indicates that the parameters are
highly correlated. This correlation is demonstrated graphically in the pairwise
56
contour plots of the variables. Figure 3.5 shows the contour plot of the param
eters a and (5. The points occur along a ridge of high probability. The pattern
indicates a high negative interdependency between these two parameters. The
ridge has a banana shape that is the most pronounced feature of the pos
terior distribution, and goes far in explaining the slowness of the acceptance
sampler based on independent Beta distributions. Any successful Markov chain
simulation will need to reproduce this ridge.
The pairwise contour plots of the remaining variables are shown in figures 3.6
and 3.7. Each pair plot shows a pattern that indicates a relationship between
the two variables. The initial prevalence values Pw,o, P4,o, and p5)0 show evidence
of positive correlation. The acceleration parameter a shows evidence of negative
correlation with all of the other variables, while 5 and 7 are positively correlated.
Calculating the correlation matrix of the A/R sample confirms the graphical
evidence of relationships amongst the six parameters. The correlation matrix is
shown in 3.13. The initial prevalence in DAU 10 is strongly correlated with the
other initial prevalences themselves, with Corr(pio,pi) = 0.7214 being the high
est correlation within this group of parameters. The other group of variables
the dynamic parameters also exhibits strong interdependence, although it is
clear from the graphical evidence in figure 3.5 that this relationship is nonlin
57
0.05 0.1 0.15 0.2 0.25 0.3
a
Figure 3.5: Contour plot of a by 5. Most of the values occur along a banana
shaped ridge of high probability.
58
x 10~3
14
0.05 0.1 0.15 0.2 0.25 0.3
a
c
8
T
0.02 0.04 0.06 0.08
P10,0
Figure 3.6: Contour plots of variable pairs for the acceptance sampler. Plot
(a) shows the graph of the acceleration a versus the movement parameter 7.
Plot (b) shows the graph of the long term sustainable level S versus 7. Plot
(c) shows the graph of a versus Pio.o, the initial prevalence in DAU ten. Plot
(d) shows the graph of p4)0 versus the acceleration a. These graphs indicate
correlation among the parameters.
59
b
0.02 0.04 0.06 0.06 0.01 0.02 0.03 0.04 0.05
P10,0 P4,0
Figure 3.7: Contour pair plots of the variables for the acceptance sampler.
Plot (a) shows the graph of the acceleration a versus the initial prevalence in
DAU five, psfl. Plot (b) shows the graph of the initial prevalences pio,o and p^p.
Plot (c) shows the graph of p10,o and p5^. Plot (d) shows the graph of p4i0 versus
the p5)0
60
ear.
( 1.0000 0.7214 0.5135 0.4183 0.0461 0.0812 ^
0.7214 1.0000 0.4060 0.4892 0.1321 0.2553
0.5135 0.4060 1.0000 0.4440 0.2194 0.0426
(3.13)
0.41830.4892 0.4440 1.0000 0.8582 0.4340
0.0461 0.1321 0.2194 0.8582 1.0000 0.4406
^ 0.0812 0.2553 0.04260.4340 0.4406 1.0000 )
There appears to be two natural groups of parameters, the initial prevalence
values, pio.o, P4,o, and p5io, and the dynamic parameters a, <5, and 7. There is
strong interdependence within each of these groups, and some evidence, based
on the pairwise graphs in figures 3.6 and 3.7, of a relationship between the
two groups. However, because the model is highly nonlinear, and because the
graphical evidence indicates that the relationships between the parameters are
nonlinear, a more general measure is needed. A canonical correlation analysis
shows the canonical correlation coefficient for the two groups to be 0.8134. This
is a strong indication that the two groups of variables are interdependent. This
interdependence will be an important consideration in the construction of a
Markov chain sampler for the CWD model.
61
4. Markov Chain Monte Carlo Simulation for the CWD Model
For some distributions, qualities such as independence or conditional in
dependence make direct sampling analytically tractable and computationally
feasible. However, more complex models, such as the CWD model in chapter 3,
can lead to situations where the target distribution does not have a recognizable
form, or has characteristics that make direct sampling difficult or impossible to
apply. When direct sampling is not possible, alternative methods such as the
acceptance sampler (see section 3.4) or Markov chain Monte Carlo (MCMC)
approaches must be applied. The complex form of the posterior distribution in
the Chronic Wasting Disease model (equation 3.11) requires the use of computa
tional methods to sample from the posterior distribution. This is not an uncom
mon situation for Bayesian hierarchical models, but in some cases the hierarchy
does allow for some computational simplifications. Analytical construction and
analysis of the posterior distribution is only possible in very simple hierarchical
models, such as the normal model discussed in [24]. The relative advantages
and disadvantages of some common simulation methods are discussed, and an
efficient Markov chain sampler is constructed.
4.1 Markov Chain Methods
A Markov Chain is a sequence of random variables {A^} such that:
[A(n+1)A(1),...,A(n)] = [A(n+1)A(n)] . (4.1)
For the Markov chain {A^}, if there exists a distribution / such that A^*) ~ /
implies A(n+1) /, then the distribution / is called the stationary distribution
62
for the Markov chain [64, 68]. The stationary distribution is a limiting distri
bution, where convergence is with respect to the total variation norm,  rv
The total variation norm between two distributions /i and fi is defined by:
ll/i Mlrv = sup /i(x) f2{x). (4.2)
X
A sequence of distributions / converges to a limiting distribution / if
lim /n f\\TV = 0
n> oo
A Markov chain is called ergodic if the convergence to a stationary distribution
does not depend on the initial state of the chain X(0) [64], The possible values
of the Markov chain are called the states of the chain. A Markov chain is called
reversible if it has the property
[X(n+1)X(") = x] = [X(n+1)X(n+2> = x] . (4.3)
A Markov chain Monte Carlo (MCMC) method for simulating from a distri
bution / is any method that produces an ergodic Markov chain, {X^}, with /
as the stationary distribution. MCMC methods do not produce an independent
sample exactly from the target density /. Instead, they produce a dependent
sample that is approximately distributed according to /. This may seem inferior
to accept/reject methods, which sample exactly from the target distribution, but
Markov chain methods possess a number of benefits that make their use advan
tageous. First, it is usually easy to construct a Markov chain that converges,
at least theoretically, to the target distribution. Markov chain methods are
also easier to generalize and alter than their A/R counterparts, and therefore
facilitate the exploration of more complicated models.
63
The MetropolisHustings algorithm is a very general method for simulating
a Markov chain [64, 25]. Using virtually any distribution, a Markov chain can
be constructed with the target density / as its stationary distribution.
Algorithm 4.1 Suppose that q(y\x) is a conditional density from which it is
easy to simulate. Given a current value x, generate the next value in the
chain as follows.
1. Generate a value Y from the density q(y\x^).
3. Take = Y with probability min(cn, 1). Otherwise, x= x.
Under broad regularity conditions, the algorithm 4.1 produces a Markov
chain with stationary distribution f. The convergence of algorithm 4.1 depends
on the relationship between the target density / and the conditional density
q{x\y). This relationship is given by theorem 4.1.1 taken from [64].
Theorem 4.1.1 Let S denote the support of the density f, and suppose that
S is connected. If the support of the conditional density q contains S, then the
Markov chain produced by the algorithm 41 has f as its stationary distribution.
There are many variations of the Metropolis Hastings algorithm. One of the
most common is random walk MetropolisHastings, which is frequently called
the standard random walk algorithm [25, 64, 58], and is the terminology used in
this thesis. In the standard random walk algorithm, given the current state of
the chain, X^\ the proposed value is generated by Y = + et, where et has
distribution g. The conditional distribution q in algorithm 4.1 is equal to g(yx),
64
so that q now has the symmetric property q{x\y) = q(y\x). In many applications,
g is a normal distribution. If the proposal density g and the target density
in the standard random walk satisfies the conditions of theorem 4.1.1, then
the standard random walk will converge in the limit to the correct stationary
distribution.
The conditional density q in algorithm 4.1 is called the instrumental or
proposal density. Finding a proposal density is typically very easy compared
to finding a suitable envelope density for the acceptance sampler. Thus the
MetropolisHastings algorithm is in theory easy to apply. The resulting Markov
chain will converge in the limit to the desired stationary distribution. Moreover,
if the conditional density satisfies the conditions of theorem 4.1.1, then the
Markov chain produced by algorithm 4.1 is ergodic [64], After a suitably large
number of iterations, T0, called the bum in period, the chain provides a good
approximation to the target distribution.
For multivariate densities, the components can be sampled individually,
and the target distribution constructed by means of the complete conditional
distributions, that is, the distribution of the component X* given all of the
other components. Suppose that the random variable X has the desired mul
tivariate density /, and that X can be written as X = (X^..X,,). Let
[Xi\xi, ...2j_i,x,+i, ...,xn] denote the complete conditional distribution of Xj
given all of the other components of X.
Algorithm 4.2 Given X^ = (x^\... ,Xn^), a Markov chain with stationary
distribution f is produced by the following algorithm:
65
1.
2.
3.
<+1> ~ y lr^ Ai X2 .. T() 5 Xn
y I (*+!) yi2\X\ Tw x3 )
Wl
Repeat for each conditional until...
I
Generate Xn+1^ ~
(t+i)
i )
T(*+i)'
)xnl
The algorithm 4.2 is called the Gibbs sampler, and is a classic example
of a strategy called successive substitution sampling (SSS), which enables high
dimensional problems to be broken into a series of univariate or simple block
sampling problems. The Gibbs sampler is a special case of the algorithm 4.1
in which the proposed value is always accepted. The Gibbs sampler is typically
used when the complete conditional densities have a known form and are easy
to sample. In general, the blocks Xi may be either univariate or multivariate.
For problems in which the complete conditional distributions are unknown, or
do not have a simple form, the conditional densities in 4.2 may be sampled using
a Metropolis Hastings step.
In practice, the burn in period for a Markov chain sampler can be pro
hibitively large. While building a Markov chain that theoretically converges is
relatively easy, building one that works in a reasonable amount of time may be
difficult. This difficulty is more common in higher dimensional problems where
the straight forward application of algorithm 4.2 by means of the complete con
ditional distributions is impossible. Complex models may lead to situations
where the target distribution has characteristics that make the chains produced
by standard MCMC methods mix slowly, that is, the Markov Chain does not
move rapidly about the support of the target distribution, which causes the
66
chain to converge slowly. This leads to a variety of techniques to accelerate con
vergence of the chain. The application of these techniques often involves careful
tuning by trial and error.
Complex multivariate distributions can have a number of features that make
a density difficult to simulate. If the target distribution is multimodal, has
sharp ridges of high probability, exhibits high correlation between individual
variables, or exists on a bounded parameter space, then constructing a Markov
chain that converges to the stationary distribution of interest in a reasonable
amount of time may be difficult. For multimodal target distributions, the chain
can get stuck near one mode, oversampling near that mode and perhaps missing
others entirely. When a target distribution has sharp ridges of high probability,
a Markov chain can bypass these areas if the average step size, that is, the
distance between the current state X and the proposed future state Y, is
too large. The step size is determined by both the mean and the variance
of the proposal distribution. When there is high correlation between individual
variables, sampling each component separately is inefficient, since the acceptance
or rejection of one variable can have a large impact on the acceptance or rejection
of another. With bounded parameter spaces, care must be taken to not propose
outside of the parameter space too often, since too many rejected proposals will
slow down convergence of the chain.
There are a variety of techniques available to alleviate some of the possible
difficulties when sampling from multivariate distributions. When the variables
are highly correlated, reparameterization can remove some of the correlation.
Examples of regression models and random effect models where this approach
67
is especially effective are given in [25]. Implementation of a reparameterization
technique requires the identification of an appropriate transformation, which
may not be an easy task.
The technique of blocking seeks to utilize the correlation between parame
ters to accelerate convergence. Blocking makes use of the correlation between
parameters in a multivariate setting by updating highly correlated variables to
gether to increase the mixing speed of the chain, that is, the speed at which the
chain moves around the support of the target density [29]. This method can
be especially effective in hierarchical models with spatially structured data. A
discussion of how the blocking approach can be used when dealing with areal
data, i.e. data aggregated over discrete spatial regions, is found in [29, 6].
Hit and run samplers are a class of algorithms that have been designed
to simulate target distributions that exhibit signs of multimodality [72, 11,
25]. The hit and run samplers are based on a generalization of the Metropolis
Hastings algorithm [72, 11, 58], and use a proposal distribution that chooses a
direction at random, and then moves a random distance in that direction. The
difference between the hit and run method and the standard random walk is
that the new move is always accepted. The hit and run sampler can be effective
in sampling multimodal target distributions, because the algorithm, with some
small probability, traverses the entire parameter space in a single move, which
allows modehopping when the proposed move is in the direction of another
mode. The hit and run algorithm has problems with mixing when the target
distribution has sharp ridges or spikes of high probability. The hit and run
algorithm may not pick out these areas of high probability sufficiently often,
68
causing the resultant chain to mix slowly. The problem of slow mixing becomes
even more significant in higher dimensions.
4.2 Stochastic Dynamics Methods
Stochastic dynamics methods are a class of algorithms motivated by stochas
tic differential equations. Stochastic dynamics methods are commonly used to
simulate physical systems such as quantum molecular dynamics and heat trans
fer. There are a number of Markov chain methods based on stochastic differential
equations, many of which are specifically designed for use in statistical physics,
although some have seen broader application [58]. The Langevin algorithm is
perhaps the most common, and is based on the discrete simulation of a contin
uous time diffusion process [58, 25, 64]. Stochastic dynamics methods suppress
the random walk behaviour of a Markov chain, which has been shown to improve
convergence results for high dimensional target densities [28].
A general diffusion process Xt X(t) is a continuous time stochastic process
defined as the solution to the stochastic differential equation:
dXt = p{Xt)dt + o{Xt)dBt, (4.4)
where Bt is standard Brownian motion [41]. In equation 4.4, p,(Xt) is called the
drift term or drift coefficient, and (?(Xt) is the diffusion term or diffusion coef
ficient. For simple cases, a solution to 4.4 can be found using the Ito Calculus,
which is derived in [13, 41]. If a solution to equation 4.4 does not exist or Xt
tends to infinity in a finite amount of time, then the diffusion process is called
explosive [41].
If a target density / is continuously differentiable, and if for some real num
bers N,a,b < oo, when (V f{x))Tx < aa; + b, for all x > N, then a diffusion
69
process can be constructed that has / as its stationary distribution [41, 71, 66].
The only nonexplosive, reversible diffusion with this property is the Langevin
diffusion [64], which has the form:
dXt=lV\og{f{Xt))dt + dBt. (4.5)
Where Bt is the standard Brownian motion.
A continuous time stochastic process cannot be directly simulated. Instead,
a discretized version of equation 4.5 must be used. The simplest such discretiza
tion is the Euler discretization. If oj > 0 is the size of the discretization, then
the continuous time diffusion process 4.5 is approximated by a smart random
walk, with steps given by:
Xt+i =Xt + y Vlog f(Xt)+utt, (4.6)
where tt follows a multivariate normal distribution with mean 0 and covariance I.
The random walk in 4.6 is called smart because proposed values are generally
in directions of higher probability than the current state. Equation 4.6 can also
be written as:
Xt+i ~ N(Xt + viog f(Xt),u2I). (4.7)
The Euler discretization in 4.6 and 4.7 is a first order approximation to
the stochastic differential equation 4.5. Other methods based on higher order
approximations have been developed, notably the Ozaki discretization which
uses a second order approximation [67]. A variety of discretization schemes for
stochastic dynamics methods are discussed in [58].
Unfortunately, the behaviour of the continuous time Markov process and
the discretized version may be radically different. This is because the Markov
70
chain based on stochastic process 4.5 samples from the correct distribution only
as the discretization size goes to zero [58]. Consequently, there are many situa
tions where the random walks given by equations 4.6 and 4.7 can be transient,
that is, there exists a state X of the chain such that the expected number of
times the Markov chain visits this state is finite. A Markov chain that is not
transient is recurrent. Situations that lead to transience include those for which
the target distribution is insufficiently smooth, or erratic, so that the gradient
experiences large changes in a local area or has a large magnitude. This erratic
behaviour of the gradient makes the chain unstable. Many of these problems
can be corrected with a MetropolisHastings rejection step [5, 64, 25, 58]. This
method of correcting for the discretization is sometimes called Hybrid Monte
Carlo, Langevin Monte Carlo [58], or a Metropolis Adjusted Langevin Algorithm
(MALA) [71].
Algorithm 4.3 The MALA can be summarized as follows. Let the target dis
tribution be denoted by f(x), and let fi(x) = ^Vlog/(a;). Let a(x,y) be defined
by:
a(x,y) =  Pl 2aJ. " (4.8)
f(x) eXp(li'x^)ll2))
Given the current state Xt, the Markov chain is simulated by:
1. Generate Y ~ N(A + fV log/(**), u2I)
2. Find min(a(At, Y), 1).
3. With probability min(a, 1), set Xt+1 = Y. Otherwise, set Xt+1 = X1.
The proposal value is kept with probability min(a, 1). The rejection step 4.8
turns the Langevin algorithm into a MetropolisHastings algorithm with a smart
71
proposal [25, 64]. The conditional density in the MetropolisHastings algo
rithm 4.1 for the MALA is q{x\y) cc N(y + //(y),w2/), where the form of the
conditional density in 4.8 comes from the Brownian motion in the stochastic
differential equations 4.4 and 4.5. The key difference of the Metropolis adjusted
Langevin algorithm from the standard random walk is that the MALA sup
presses the random walk behaviour of the chain by using the local properties
of the target density (i.e. Vlog/(A"t)) to adjust the proposal distribution [28].
This nudges the chain uphill in the direction of a mode, which is a highly
desirable property in high dimensional problems where the behaviour of the
gradient is smooth [71, 64, 25, 67].
The chain 4.3 will tend to propose values that are in the direction of the
mode. Because algorithm 4.3 is a special case of the Metropolis Hastings algo
rithm, convergence of the chain relies on the properties of algorithm 4.1, and not
those of the Langevin diffusion process. Strategies such as reparameterization
and blocking can be employed with 4.3 to increase the mixing speed. The sim
ulation can be further adjusted for erratic behaviour of the gradient by using a
truncated proposal step to achieve more robust ergodic properties [5, 71]. This
truncation comes through the imposition of an upper limit on the step length
in the proposal distribution [25, 65].
The MALA has been shown to perform as well as or better than the standard
random walk in many settings [66, 67, 71]. As with other kinds of Metropolis
Hastings simulations, convergence properties depend largely on the smoothness
of the target distribution. One situation where the MALA works especially well
is when the target distribution is from the exponential family of distributions. In
72
situations where the target distribution is discontinuous or is not differentiable
at some points, then smoothed proposals can be employed [71].
A recent variation of the Langevin algorithm is the tempered Langevin algo
rithm [67]. The standard MALA approach uses a fixed variance coefficient (w
in equations 4.6 and 4.7). A more general diffusion process can be used to get
another variation of the Langevin algorithm. Note that equation 4.5 is a special
case of equation 4.4, where the drift term is n(Xt) = V log(/(Xt)), and the
diffusion coefficient is a(Xt) = col. If a more general diffusion term is used in
4.4, then the tempered Langevin diffusion is obtained [67].
If the diffusion coefficient is a(x), then the diffusion matrix is given by a
positive definite matrix a(x) = a(x)aT(x). Let 0 < d < Choosing a(x) =
f~2d(x)I I, and setting the drift term to be p(x) = ^^a(x) V log(/(x))
gives the Langevin tempered diffusion [67]. The tempered diffusion is the solu
tion to the stochastic differential equation:
dxt = l=^r2d(Xt)X log(f(Xt))dt + rd(Xt)IdBt (4.9)
The diffusion in equation 4.9 is a more general form of the Langevin diffusion
4.5 and is sometimes called a heated Markov chain. The two special cases of
equation 4.9 that are important are when d = 0, which implies the standard
Langevin diffusion, and d = 1/2, which gives Brownian motion [67]. Choosing
different values of d gives different heated chains.
Heated Markov chains have stronger theoretical convergence properties than
unheated chains, since the diffusion matrix acts as an accelerator [67]. The
accelerator can alleviate some of the problems encountered with multimodal
target distributions. In areas of low probability, the variance a(x) = f~2d(x)I
73
will be larger, causing the chain to take larger steps in those areas. As the chain
enters areas of high probability, a(x) will decrease, leading to a smaller step size.
The discretized version of the diffusion 4.9 is given by:
Xt+i ~ N(At + ^a(Xt)V\ogf(Xt), a(Xt) I). (4.10)
Using a rejection step gives a Metropolis algorithm. To adjust the probability 4.8
for the tempered diffusion, set ui2 = a(Xt), which gives:
oc{XuYt)
m exp(m^i)
/Wexp(m^ggpi)
(4.11)
where /i(x) = ^^a(x)Vlog{f{x)) and a(ATf) = f~2d(Xt).
The tempered Langevin algorithm, like the standard Langevin algorithm 4.3,
is a random walk that makes smart jumps. That is, the chain is likely to
propose values that are in the direction of an area of high probability.
Blocking, hit and run algorithms, Langevin algorithms, and tempered
Langevin algorithms, are only a subset of the many methods for sampling from
complicated distributions. When more than a couple of difficulties are encoun
tered, it can require the innovative use of several of these or other approaches
to generate a sample.
The joint posterior distribution produced by the Chronic Wasting Disease
model presented in chapter 3 has a number of characteristics that make the
implementation of a Markov Chain simulation difficult. Firstly, the posterior
distribution exists on a bounded parameter space. Secondly, calculation of the
gradient at various points of the parameter space indicates that the surface of
the density function contains sharp ridges of high probability.
74
The sharp ridges of high probability mean that a hit and run type algorithm
could miss the area of high probability, thereby oversampling regions where
the density is comparatively low. Langevin algorithms also have difficulty in
this situation, because the steepness of the density causes the magnitude of the
gradient to be extremely large in areas of moderate probability, a common source
of instability in Langevin algorithms. The difficulties are compounded with the
bounded parameter space, since proposals are likely to fall outside the support
of parameter space. When using a normal proposal distribution, as with the
standard Langevin algorithm, too many proposals outside the parameter space
will cause the chain to mix slowly.
One possible method of dealing with this problem is to use a proposal distri
bution in algorithm 4.3 with the same support as the parameter space. However,
even using a different proposal distribution, such as a truncated normal distri
bution on [0,1], can slow the mixing of the Markov chain, because the potential
asymmetry of the truncated normal can cause difficulties for the standard ran
dom walk or MALA. Additionally, the convergence theorems for the stochastic
differential equation 4.5 no longer apply, since the noise term is no longer a
Brownian motion.
4.3 A Tempered Langevin Algorithm for Bounded Densities
The posterior sample produced by the acceptance sampler in [37] and dis
cussed in chapter 3 illustrates the complicated nature of the posterior distribu
tion 3.11. The largest issue is the fact that the support of the posterior density
exists on a bounded parameter space. All of the parameters, pio,o> P*,Qi P5,o,
a, 6, and 7, exist only within the unit interval, whereas standard approaches
75
