Citation
Bayesian hierarchical modeling and Markov chain simulation for chronic wasting disease

Material Information

Title:
Bayesian hierarchical modeling and Markov chain simulation for chronic wasting disease
Creator:
Mehl, Christopher H
Publication Date:
Language:
English
Physical Description:
xxv, 249 leaves ; 28 cm

Subjects

Subjects / Keywords:
Chronic wasting disease -- Mathematical models ( lcsh )
Bayesian statistical decision theory ( lcsh )
Markov processes ( lcsh )
Genre:
bibliography ( marcgt )
theses ( marcgt )
non-fiction ( marcgt )

Notes

Bibliography:
Includes bibliographical references (leaves 243-249).
General Note:
Department of Mathematical and Statistical Sciences
Statement of Responsibility:
by Christopher H. Mehl.

Record Information

Source Institution:
University of Colorado Denver
Holding Location:
Auraria Library
Rights Management:
All applicable rights reserved by the source institution and holding location.
Resource Identifier:
63787886 ( OCLC )
ocm63787886
Classification:
LD1193.L622 2004d M33 ( lcc )

Full Text
BAYESIAN HIERARCHICAL MODELING AND MARKOV CHAIN
SIMULATION FOR CHRONIC WASTING DISEASE
by
Christopher H. Mehl
Bachelor of Science, University of Colorado at Denver, 1998
A thesis submitted to the
University of Colorado at Denver
in partial fulfillment
of the requirements for the degree of
Doctor of Philosophy
Applied Mathematics
2004


This thesis for the Doctor of Philosophy
degree by
Christopher H. Mehl
has been approved
by
Lynn S. Bennethum


Mehl, Christopher H. (Ph.D., Applied Mathematics)
Bayesian Hierarchical Modeling and Markov Chain Simulation for Chronic Wasting Disease
Thesis directed by Assistant Professor Craig J. Johns
ABSTRACT
In this thesis, a dynamic spatial model for the spread of Chronic Wasting
Disease in Colorado mule deer is derived from a system of differential equations
that captures the qualitative spatial and temporal behaviour of the disease.
These differential equations are incorporated into an empirical Bayesian hierar-
chical model through the unusual step of deterministic autoregressive updates.
Spatial effects in the model are described directly in the differential equations
rather than through the use of correlations in the data. The use of deterministic
updates is a simplification that reduces the number of parameters that must be
estimated, yet still provides a flexible model that gives reasonable predictions for
the disease. The posterior distribution generated by the data model hierarchy
possesses characteristics that are atypical for many Markov chain Monte Carlo
simulation techniques. To address these difficulties, a new MCMC technique is
developed that has qualities similar to recently introduced tempered Langevin
type algorithms. The methodology is used to fit the CWD model, and posterior
parameter estimates are then used to obtain predictions about Chronic Wasting
Disease.


This abstract accurately represents the content of the candidate's thesis.


DEDICATION
This thesis is dedicated to my lovely wife, Jennifer. She has loved and supported
me for more than four years, putting up with, among other things, boring dinner
conversation about deer and statistics. Without her, I would never have made
it this far. This is really for you, Jen!


ACKNOWLEDGMENT
This thesis would not have been possible without the generous support of my
family, friends, and those that have provided guidance over the years. Thank you
Liz and Marcia, for collectively knowing everything. Thanks to my committee,
Lynn Bennethum, Steve Sain, and Doug Nychka, for all of their comments and
their time. Thanks to my advisor, Craig Johns, for providing his guidance and
his friendship. Most of all, I would like to thank Karen Kafadar for teaching me
most of what I know about statistics.


CONTENTS
Figures ................................................................. xi
Tables................................................................ xxiv
Chapter
1. Introduction......................................................... 2
2. Models of Epidemics.................................................. 7
2.1 Deterministic Models of Epidemics................................ 7
2.2 Bayesian Hierarchical Models of Epidemics........................... 18
3. A Spatial Model for Chronic Wasting Disease ..................... 26
3.1 Data................................................................ 29
3.2 Prevalence Model for Chronic Wasting Disease..................... 31
3.2.1 Model Motivation.................................................. 32
3.2.2 Incorporating Spatial Mixing...................................... 37
3.2.3 The Hierarchical Statistical Model................................ 43
3.3 Finding a Good Model.............................................. 48
3.4 Accept/Reject Methods for the CWD Model............................. 52
4. Markov Chain Monte Carlo Simulation for the CWD Model ........... 62
4.1 Markov Chain Methods ............................................... 62
4.2 Stochastic Dynamics Methods......................................... 69
4.3 A Tempered Langevin Algorithm for Bounded Densities ............. 75
4.3.1 Markov Chain Simulation Attempts for the CWD Model............. 78


4.3.2 Overcoming the Difficulties....................................... 86
4.3.3 Tuning the MALTS Algorithm ....................................... 94
4.3.4 Test Scenarios with Known Densities............................... 96
4.3.5 Implementation for the CWD Model................................. 137
4.4 Performance of the Markov Chain Sampler........................... 139
5. Results............................................................. 152
5.1 Analysis of the Posterior Estimates.............................. 152
5.2 Model Validation................................................... 160
6. Discussion.......................................................... 168
7. Conclusion.......................................................... 181
Appendix
A. Acceptance Rates for Simulation Comparisons......................... 183
A.1 Acceptance Rates for the Truncated Normal Simulations............... 183
A.2 Acceptance Rates for the Multivariate Normal Simulations.......... 183
B. Tables for the Model Validation Simulations....................... 188
B.1 Group One Simulation Results..................................... 188
B.2 Group Two Simulation Results .................................... 188
B.3 Group Three Simulation Results...................................... 189
B.4 Group Four Simulation Results.................................... 189
B.5 Group Five Simulation Results ................................... 189
B.6 Group Six Simulation Results........................................ 189
B.7 Group Seven Simulation Results...................................... 190
B.8 Group Eight Simulation Results...................................... 190
B.9 Group Nine Simulation Results....................................... 191


C. Matlab Code for the Markov Chain Simulations.................. 193
C.1 Miscellaneous Probability Functions........................... 193
C.1.1 Function: logmvnkern.m..................................... 193
C.1.2 Function: logtruncmvn.m.................................... 194
C.2 Code for Simulating the Truncated Normal Distributions........ 195
C.2.1 Program: truncnormcomps.m................................. 195
C.2.2 Function: mehlwitch.m...................................... 199
C.2.3 Function: langwitch.m...................................... 201
C.2.4 Function: rwwitch.m........................................ 204
C.3 Code for Simulating the Normal Distributions.................. 206
C.3.1 Program: mvnormalcomps.m.................................. 206
C.3.2 Function: mehlnormal.m..................................... 209
C.3.3 Function: langnormal.m..................................... 212
C.3.4 Function: rwnormal.m....................................... 215
C.4 Code for the Chronic Wasting Disease Model ................... 216
C.4.1 Function: pmaker3.m........................................ 216
C.4.2 Function: postfun3nopi.m................................... 219
C.4.3 Program: prevmehl7g.m..................................... 221
C.4.4 Function: metpar7g.m....................................... 224
C.4.5 Function: metpar7a.m....................................... 228
C.5 Simulation Code for Model Validation ......................... 229
C.5.1 Program: group1test.m .................................... 230
C.5.2 Function: simrun.m ........................................ 234
C.5.3 Function: metsim7g.m....................................... 237


References.............................................................. 243


FIGURES
Figure
3.1 Colorado Deer Data Analysis Units across the state. The larger
spatial units are the DAUs, the smaller units are game management
units.................................................................. 31
3.2 Graph of the derivative of p versus p. Stable points (0,1/4) are on
the p axis............................................................. 36
3.3 Proportion of deer which test positive by year for DAUs 4, 5, and 10.
The total number tested is shown in the background, also by year.
The proportion which test positive decreases as the number tested
increases, as does the variability..................................... 49
3.4 Comparisons of the posterior sample with prior distributions for parameters
(a) p_{10,0}, (b) α, the acceleration parameter, (c) δ, the long run prevalence,
and (d) γ, the average propensity to migrate. The prior and histogram
approximation of the posterior have been scaled to have maximum value of
one in order to highlight the difference between the two distributions........ 56
3.5 Contour plot of α by δ. Most of the values occur along a banana shaped
ridge of high probability..................................................... 58


3.6 Contour plots of variable pairs for the acceptance sampler. Plot (a) shows
the graph of the acceleration α versus the movement parameter γ. Plot (b)
shows the graph of the long term sustainable level δ versus γ. Plot (c) shows
the graph of α versus p_{10,0}, the initial prevalence in DAU ten. Plot (d)
shows the graph of p_{4,0} versus the acceleration α. These graphs indicate
correlation among the parameters.............................................. 59
3.7 Contour pair plots of the variables for the acceptance sampler. Plot (a)
shows the graph of the acceleration α versus the initial prevalence in DAU
five. Plot (b) shows the graph of the initial prevalences p_{10,0} and p_{4,0}.
Plot (c) shows the graph of p_{10,0} and p_{5,0}. Plot (d) shows the graph of
p_{4,0} versus p_{5,0}........................................................ 60
4.1 Pair plot for all 300,000 iterations of the parameters α and δ produced by
the standard random walk sampler. The distinctive banana shape seen in
the acceptance sample is not present. The algorithm failed to produce the
correct distribution after 300,000 iterations. . . 80
4.2 Pair plots for all iterations of the variables for the standard random walk
sampler. The algorithm failed to produce the correct distribution after
300,000 iterations. Plot (a) shows the graph of the acceleration α versus
the movement parameter γ. Plot (b) shows the graph of the long term
sustainable level δ versus γ. Plot (c) shows the graph of α versus p_{10,0},
the initial prevalence in DAU ten. Plot (d) shows the graph of p_{4,0} versus
the acceleration α............................................................ 81


4.3 Pair plot of all iterations for the parameters α and δ produced by the
Langevin sampler. The distinctive banana shape seen in the acceptance
sample is not present. The step size was on the order of 1 × 10^{-20}. The
algorithm failed to produce the correct distribution after 300,000 iterations. 83
4.4 Pair plot of all iterations for the parameters α and δ produced by the
tempered Langevin sampler. The distinctive banana shape seen in the
acceptance sample is not present, and the chain was unable to venture far
from its initial position at the origin. The algorithm failed to produce the
correct distribution after 300,000 iterations. Note from the axes that the
chain is tightly concentrated near its starting value at the origin.......... 84
4.5 Pair plots of all iterations for the variables for the tempered Langevin
algorithm. The algorithm failed to produce the correct distribution after
300,000 iterations. Plot (a) shows the graph of the acceleration α versus
the movement parameter γ. Plot (b) shows the graph of the long term
sustainable level δ versus γ. Plot (c) shows the graph of α versus p_{10,0},
the initial prevalence in DAU ten. Plot (d) shows the graph of p_{4,0} versus
the acceleration α............................................................ 85
4.6 Univariate normal density with mean zero, variance three. A current
state is labeled X. The higher probability area is to the left of the
current state, lower probability areas are to the right.............. 92
4.7 The bivariate normal distribution truncated on [0,1] with mean (0.5, 0.5)^T
and covariance 0.001 I........................................................ 97


4.8 Pairwise plots of the mean parameters θ_1 and θ_2 for the truncated normal
distribution with true mean (0.5, 0.5)^T and true covariance 0.001 I. (a)
shows 8,000 iterations for the initial value (0, 0)^T. (b) shows 8,000 iterations
for the starting value (0, 0.001)^T. (c) shows 8,000 iterations for the starting
value (0.3, 0.4)^T. (d) shows 8,000 iterations for the starting value (0.1, 0.8)^T....
4.9 Plots of the log potential scale reductions by thousand iterations for the
mean parameters θ_1 and θ_2 for the MALTS simulation of the truncated
normal distribution. The solid line shows the results for θ_1, the dotted line
shows the results for θ_2......................................................
4.10 Plots of the log potential scale reductions by hundred iterations for the
parameters θ_1 and θ_2 for the simulation of the 2 dimensional truncated
normal distribution. The solid line is the MALTS algorithm, the dash-dot
is the Langevin algorithm, and the dotted line is the standard random walk.
(a) shows the results for θ_1. (b) shows the results for θ_2..................
4.11 Plots of the log potential scale reductions by hundred iterations for the
variance of the parameters θ_1 and θ_2 for the simulation of the 2 dimensional
truncated normal distribution. The solid line is the MALTS algorithm, the
dash-dot is the Langevin algorithm, and the dotted line is the standard
random walk. (a) shows the results for θ_1. (b) shows the results for θ_2.....


4.12 Plots of the log potential scale reductions by hundred iterations for the
parameters θ_1 and θ_2 for the simulation of the 36 dimensional truncated
normal distribution. The solid line is the MALTS algorithm, the dash-dot
is the Langevin algorithm, and the dotted line is the standard random walk.
(a) shows the results for θ_1. (b) shows the results for θ_2................ 107
4.13 Plots of the log potential scale reductions by hundred iterations for the
variance of the parameters θ_1 and θ_2 for the simulation of the 36 dimensional
truncated normal distribution. The solid line is the MALTS algorithm, the
dash-dot is the Langevin algorithm, and the dotted line is the standard
random walk. (a) shows the results for θ_1. (b) shows the results for θ_2... 108
4.14 Plots of the log potential scale reductions by time for the parameters θ_1
and θ_2 for the simulation of the 2 dimensional truncated normal distribution.
The solid line is the MALTS algorithm, the dash-dot is the Langevin
algorithm, and the dotted line is the standard random walk. (a) shows the
results for θ_1. (b) shows the results for θ_2............................... 109
4.15 Plots of the log potential scale reductions by time for the variance of the
parameters θ_1 and θ_2 for the simulation of the 2 dimensional truncated
normal distribution. The solid line is the MALTS algorithm, the dash-dot
is the Langevin algorithm, and the dotted line is the standard random walk.
(a) shows the results for θ_1. (b) shows the results for θ_2................ 110


4.16 Plots of the log potential scale reductions by time for the parameters θ_1
and θ_2 for the simulation of the 36 dimensional truncated normal distribution.
The solid line is the MALTS algorithm, the dash-dot is the Langevin
algorithm, and the dotted line is the standard random walk. (a) shows the
results for θ_1. (b) shows the results for θ_2............................... 111
4.17 Plots of the log potential scale reductions by time for the variance of the
parameters θ_1 and θ_2 for the simulation of the 36 dimensional truncated
normal distribution. The solid line is the MALTS algorithm, the dash-dot
is the Langevin algorithm, and the dotted line is the standard random walk.
(a) shows the results for θ_1. (b) shows the results for θ_2................ 112
4.18 Plots of the log potential scale reductions by hundred iterations for the
parameters θ_1 and θ_2 for the simulation of the two dimensional normal
distribution, with correlation ρ = 0.879 and step size 1 × 10^{-5}. The solid
line is the MALTS algorithm, the dash-dot is the Langevin algorithm, and
the dotted line is the standard random walk. (a) shows the results for θ_1.
(b) shows the results for θ_2. . . 115
4.19 Plots of the log potential scale reductions by hundred iterations for the
variance of the parameters θ_1 and θ_2 for the simulation of the two dimensional
normal distribution, with correlation ρ = 0.879 and step size 1 × 10^{-5}.
The solid line is the MALTS algorithm, the dash-dot is the Langevin
algorithm, and the dotted line is the standard random walk. (a) shows the
results for θ_1. (b) shows the results for θ_2. 116


4.20 Plots of the log potential scale reductions by hundred iterations for the
parameters θ_1 and θ_2 for the simulation of the 36 dimensional normal
distribution, with correlation ρ = 0.879 and step size 1 × 10^{-5}. The solid
line is the MALTS algorithm, the dash-dot is the Langevin algorithm, and
the dotted line is the standard random walk. (a) shows the results for θ_1.
(b) shows the results for θ_2................................................ 117
4.21 Plots of the log potential scale reductions by hundred iterations for the
variance of the parameters θ_1 and θ_2 for the simulation of the 36 dimensional
normal distribution, with correlation ρ = 0.879 and step size 1 × 10^{-5}. The
solid line is the MALTS algorithm, the dash-dot is the Langevin algorithm,
and the dotted line is the standard random walk. (a) shows the results for
θ_1. (b) shows the results for θ_2........................................... 118
4.22 Plots of the log potential scale reductions by time for the parameters θ_1
and θ_2 for the simulation of the two dimensional normal distribution, with
correlation ρ = 0.879 and step size 1 × 10^{-5}. The solid line is the MALTS
algorithm, the dash-dot is the Langevin algorithm, and the dotted line is
the standard random walk. (a) shows the results for θ_1. (b) shows the
results for θ_2.............................................................. 119
4.23 Plots of the log potential scale reductions by time for the variance of the
parameters θ_1 and θ_2 for the simulation of the two dimensional normal
distribution, with correlation ρ = 0.879 and step size 1 × 10^{-5}. The solid
line is the MALTS algorithm, the dash-dot is the Langevin algorithm, and
the dotted line is the standard random walk. (a) shows the results for θ_1.
(b) shows the results for θ_2. . . 120


4.24 Plots of the log potential scale reductions by time for the parameters θ_1
and θ_2 for the simulation of the 36 dimensional normal distribution, with
correlation ρ = 0.879 and step size 1 × 10^{-5}. The solid line is the MALTS
algorithm, the dash-dot is the Langevin algorithm, and the dotted line is
the standard random walk. (a) shows the results for θ_1. (b) shows the
results for θ_2.............................................................. 121
4.25 Plots of the log potential scale reductions by time for the variance of the
parameters θ_1 and θ_2 for the simulation of the 36 dimensional normal
distribution, with correlation ρ = 0.879 and step size 1 × 10^{-5}. The solid
line is the MALTS algorithm, the dash-dot is the Langevin algorithm, and
the dotted line is the standard random walk. (a) shows the results for θ_1.
(b) shows the results for θ_2................................................ 122
4.26 Plots of the log potential scale reductions by hundred iterations for the
parameters θ_1 and θ_2 for the simulation of the 36 dimensional normal
distribution, with correlation ρ = 0.992 and step size 1 × 10^{-5}. The solid
line is the MALTS algorithm, the dash-dot is the Langevin algorithm, and
the dotted line is the standard random walk. (a) shows the results for θ_1.
(b) shows the results for θ_2................................................ 124
4.27 Plots of the log potential scale reductions by time for the parameters θ_1
and θ_2 for the simulation of the 36 dimensional normal distribution, with
correlation ρ = 0.992 and step size 1 × 10^{-5}. The solid line is the MALTS
algorithm, the dash-dot is the Langevin algorithm, and the dotted line is
the standard random walk. (a) shows the results for θ_1. (b) shows the
results for θ_2.............................................................. 125


4.28 Plots of the log potential scale reductions by hundred iterations for the
variance of the parameters θ_1 and θ_2 for the simulation of the 36 dimensional
normal distribution, with correlation ρ = 0.992 and step size 1 × 10^{-5}.
The solid line is the MALTS algorithm, the dash-dot is the Langevin
algorithm, and the dotted line is the standard random walk. (a) shows the
results for θ_1. (b) shows the results for θ_2............................... 126
4.29 Plots of the log potential scale reductions by time for the variance of the
parameters θ_1 and θ_2 for the simulation of the 36 dimensional normal
distribution, with correlation ρ = 0.992 and step size 1 × 10^{-5}. The solid
line is the MALTS algorithm, the dash-dot is the Langevin algorithm, and
the dotted line is the standard random walk. (a) shows the results for θ_1.
(b) shows the results for θ_2................................................ 127
4.30 Plots of the log potential scale reductions by hundred iterations for the
parameters θ_1 and θ_2 for the simulation of the 200 dimensional normal
distribution, with correlation ρ = 0.992 and step size 1 × 10^{-5}. The solid
line is the MALTS algorithm, the dash-dot is the Langevin algorithm, and
the dotted line is the standard random walk. (a) shows the results for θ_1.
(b) shows the results for θ_2................................................ 128
4.31 Plots of the log potential scale reductions by time for the parameters θ_1
and θ_2 for the simulation of the 200 dimensional normal distribution, with
correlation ρ = 0.992 and step size 1 × 10^{-5}. The solid line is the MALTS
algorithm, the dash-dot is the Langevin algorithm, and the dotted line is
the standard random walk. (a) shows the results for θ_1. (b) shows the
results for θ_2.............................................................. 129


4.32 Plots of the acceptance rates by dimension for the multivariate normal
simulation with correlation ρ = 0.992 and step size 1 × 10^{-5}. The solid
line is the MALTS algorithm, the dash-dot is the Langevin algorithm, and
the dotted line is the standard random walk.................................. 131
4.33 Plots of the log potential scale reductions by hundred iterations for the
parameters θ_1 and θ_2 for the simulation of the 36 dimensional normal
distribution, with correlation ρ = 0.992 and step size 1 × 10^{-4}. The solid
line is the MALTS algorithm, the dash-dot is the Langevin algorithm, and
the dotted line is the standard random walk. (a) shows the results for θ_1.
(b) shows the results for θ_2................................................ 133
4.34 Plots of the log potential scale reductions by time for the parameters θ_1
and θ_2 for the simulation of the 36 dimensional normal distribution, with
correlation ρ = 0.992 and step size 1 × 10^{-4}. The solid line is the MALTS
algorithm, the dash-dot is the Langevin algorithm, and the dotted line is
the standard random walk. (a) shows the results for θ_1. (b) shows the
results for θ_2.............................................................. 134
4.35 Plots of the log potential scale reductions by hundred iterations for the
parameters θ_1 and θ_2 for the simulation of the 36 dimensional normal
distribution, with correlation ρ = 0.992 and step size 1 × 10^{-2}. The solid
line is the MALTS algorithm, the dash-dot is the Langevin algorithm, and
the dotted line is the standard random walk. (a) shows the results for θ_1.
(b) shows the results for the variance of θ_1................................ 135


4.36 Plots of the log potential scale reductions by time for the parameters θ_1
and θ_2 for the simulation of the 36 dimensional normal distribution, with
correlation ρ = 0.992 and step size 1 × 10^{-2}. The solid line is the MALTS
algorithm, the dash-dot is the Langevin algorithm, and the dotted line is
the standard random walk. (a) shows the results for θ_1. (b) shows the
results for the variance of θ_1. ... 136
4.37 Time plots of the parameters p_{10,0} and p_{4,0} for the Markov chain
simulation. (a) shows the plot of p_{10,0} by iteration. (b) shows the plot of
p_{4,0} by iteration......................................................... 140
4.38 Time plots of the parameters p_{5,0} and γ for the Markov chain simulation.
(a) shows the parameter p_{5,0} by iteration. (b) shows the parameter γ by
iteration.................................................................... 141
4.39 Time plots of the parameters α and δ for the Markov chain simulation.
It is not clear from these plots that the chain has reached stationarity. (a)
shows the parameter α versus the iteration. (b) shows the parameter δ
versus the iteration......................................................... 143
4.40 Contour plot of the dynamic parameters α by δ for 250,000 iterations of
the Markov chain simulation. The contour plot is extremely close to the
acceptance sampler plot in figure 3.5 and produces the distinctive banana
shaped ridge................................................................. 144


4.41 Contour plots produced by 250,000 iterations of the Markov chain simulation.
The large number of points produces a smoother image than the
contour image produced by the acceptance sampler. (a) shows the parameters
γ versus α. (b) shows the parameters γ versus δ. (c) shows the parameters
α versus p_{10,0}. (d) shows the parameters α versus p_{4,0}................ 145
4.42 Contour plots produced by 250,000 iterations of the Markov chain simulation.
The large number of points produces a smoother image than the
contour image produced by the acceptance sampler. (a) shows the parameters
α versus p_{5,0}. (b) shows the parameters p_{4,0} versus p_{10,0}. (c) shows
the parameters p_{5,0} versus p_{10,0}. (d) shows the parameters p_{5,0} versus
p_{4,0}...................................................................... 146
4.43 Plot of the potential scale reduction by iteration for each parameter. The
dashed line shows the results of the standard random walk. The solid line
shows the results for algorithm 4.4. The graphs illustrate both the poor
performance of the standard random walk and the convergence to stationarity
of 4.4. (a) shows the results for p_{10,0}. (b) shows the results for p_{4,0}.
(c) shows the results for p_{5,0}. (d) shows the results for α. (e) shows the
results for δ. (f) shows the results for γ.................................. 148


5.1 Fitted values and corresponding frequentist confidence intervals for
the proportion of infected deer in the endemic region, based on the
component-wise median of the posterior sample. Depicted are (a)
DAU 4, and (b) DAU 5. Observed prevalences are denoted with o,
95% confidence intervals are denoted by and the predicted value
is denoted by V.................................................. 154
5.2 Fitted values and corresponding frequentist confidence intervals for
the proportion of infected deer in the endemic region, based on the
component wise median of the posterior sample. Depicted are (c)
DAU 10, and (d) DAU 27. Observed prevalences are denoted with o,
95% confidence intervals are denoted by and the predicted value
is denoted by V.................................................. 155
5.3 Fitted values and corresponding frequentist confidence intervals for
the proportion of infected deer in the endemic region, based on the
component wise median of the posterior sample. Depicted are (e)
DAU 44. Observed prevalences are denoted with o, 95% confidence
intervals are denoted by and the predicted value is denoted by V. 156
5.4 Predicted prevalences from 1976 to 2250 for all DAUs, based on the median
curve from the posterior Markov chain sample................................. 157
5.5 Solid line is the median track for prevalence based on the posterior
Markov chain sample (N = 300,000). Dashed lines represent point-
wise 95% credible intervals based on Markov chain sample, (a) shows
DAU 3. (b) shows DAU 4. (c) shows DAU 16, which borders the
endemic region, (d) shows DAU 17................................. 158


TABLES
Table
3.1 Neighborhood relationships between DAUs. Digits refer to DAU
number in Figure 3.1.............................................. 40
3.2 Neighborhood relationships between DAUs. Digits refer to DAU
number in Figure 3.1.............................................. 41
3.3 95% Bayesian credible intervals, posterior estimates, and modes for
the marginal posterior densities simulated using the acceptance sam-
pler. All values have been multiplied by 100......................... 55
4.1 Values of the Kolmogorov-Smirnov test statistic and the correspond-
ing p-values comparing the acceptance sample with the Markov chain
sample for each parameter........................................... 149
4.2 95% Bayesian credible intervals and Monte Carlo estimates for pos-
terior density from the Markov chain simulation. All values have
been multiplied by 100.............................................. 150
5.1 Results of the group 3 simulation, with p_{1,0} = 0.05, α = 0.10, δ = 0.50,
and γ = 0.01. Values are the average result of 50 pseudo data
sets......................................................................... 164
5.2 Results of the group 4 simulation, with p_{1,0} = 0.05, p_{2,0} = 0.01,
α = 0.10, δ = 0.05, and γ = 0.01. All of the true parameter values
are captured by the credible intervals. Values are averaged over 50
pseudo data sets............................................................. 165


A.1 Acceptance rates for the multivariate truncated normal distribution. 183
A.2 Acceptance rates for each dimension for the multivariate normal dis-
tribution with correlation ρ = 0.879. The step size is 1 × 10^{-5}. . . 184
A.3 Acceptance rates for each dimension for the multivariate normal dis-
tribution with correlation ρ = 0.879. The step size is 1 × 10^{-4}. . . 184
A.4 Acceptance rates for each dimension for the multivariate normal dis-
tribution with correlation ρ = 0.879. The step size is 1 × 10^{-2}. . . 185
A.5 Acceptance rates for each dimension for the multivariate normal dis-
tribution with correlation ρ = 0.970. The step size is 1 × 10^{-5}. . . 185
A.6 Acceptance rates for each dimension for the multivariate normal dis-
tribution with correlation ρ = 0.970. The step size is 1 × 10^{-4}. . . 185
A.7 Acceptance rates for each dimension for the multivariate normal dis-
tribution with correlation ρ = 0.970. The step size is 1 × 10^{-2}. . . 186
A.8 Acceptance rates for each dimension for the multivariate normal dis-
tribution with correlation ρ = 0.992. The step size is 1 × 10^{-5}. . 186
A.9 Acceptance rates for each dimension for the multivariate normal dis-
tribution with correlation ρ = 0.992. The step size is 1 × 10^{-5}. . 186
A.10 Acceptance rates for each dimension for the multivariate normal dis-
tribution with correlation ρ = 0.992. The step size is 1 × 10^{-4}. . . 187
A.11 Acceptance rates for each dimension for the multivariate normal dis-
tribution with correlation ρ = 0.992. The step size is 1 × 10^{-2}. . 187
B.1 Results of the group 1 simulation, with p_{1,0} = 0.05, α = 0.10, δ =
0.05, and γ = 0.01.................................................. 188


B.2 Results of the group 2 simulation, with p_{1,0} = 0.05, α = 0.10, δ =
0.15, and γ = 0.01..................................................... 189
B.3 Results of the group 5 simulation, with p_{1,0} = 0.05, p_{2,0} = 0.01,
α = 0.10, δ = 0.15, and γ = 0.01....................................... 190
B.4 Results of the group 6 simulation, with p_{1,0} = 0.05, p_{2,0} = 0.01,
α = 0.10, δ = 0.50, and γ = 0.01....................................... 190
B.5 Results of the group 7 simulation, with p_{1,0} = 0.01, α = 0.10, δ =
0.05, and γ = 0.01..................................................... 191
B.6 Results of the group 8 simulation, with p_{1,0} = 0.01, α = 0.10, δ =
0.15, and γ = 0.01..................................................... 191
B.7 Results of the group 9 simulation, with p_{1,0} = 0.01, α = 0.10, δ =
0.50, and γ = 0.01..................................................... 192


1. Introduction
One of the most fascinating and important branches of population biology is
the modeling of contagious phenomena, or epidemics. The study of epidemics
is integral to the survival of humanity. The potential impact of a disease as it
moves through a society can be frightening. For example, in fourteenth century
Europe, 25 million people died from the Black Death (pulmonary plague), out of
a population of about 100 million. An estimated half of the 3.5 million Aztecs
died from smallpox. More recently, in 1919, more than 20 million people died
globally in the influenza pandemic [63, 74]. Today, there is widespread awareness
and attention given to diseases such as AIDS [52, 63], as well as non-infectious
illnesses such as heart disease and cancer. There are also widespread cases of less
glamorous diseases such as malaria, which has an estimated 350 million cases in
endemic areas [63, 74].
The potential devastation that can be caused by an infectious disease has
caught the attention of both the news media and the public. Potential biological
attacks aside, natural diseases still pose a grave threat to the public. Recent
outbreaks of Severe Acute Respiratory Syndrome (SARS), West Nile Virus, and
avian influenza have all grabbed headlines and prompted the reaction of public
health officials across the world. Even illnesses that do not kill are costly to
society in terms of lost productivity and reduction of the quality of life. Under-
standing how diseases evolve dynamically and spatially has helped and will continue
to help public health officials combat these negative effects. Mathematical models
that describe the temporal behaviour and the spatial spread of disease are
important in this process.
Mathematical models of epidemics represent a simplification of the biological
processes involved, but they are nonetheless important, because they enable
researchers to understand disease dynamics and to predict the potential impact
of an epidemic. Complicated biological systems often have features that can be
reasonably modeled by relatively simple mathematical systems. Epidemics have
been studied using deterministic, statistical, and stochastic modeling techniques.
These three methods originated at different times and for different reasons, and
at times they seem unrelated, but they have been applied jointly with great
success [33, 63, 17, 52].
Deterministic models describe the spread of disease through systems of or-
dinary or partial differential equations [63, 33, 79]. In building a deterministic
model the goal is often to give the model as much freedom as necessary to make
it biologically plausible. Thus a deterministic model can have a large num-
ber of parameters, and is judged by the qualitative behaviour it exhibits and
its explanatory capabilities. Little regard is given to available data or parsi-
mony. Unfortunately, good but complicated models are often impossible to
solve explicitly, so a modeler employs qualitative analysis or numerical approxi-
mation. Many deterministic models begin as simple population models, such as
the logistic equation, and are derived as particular cases of multiple species com-
petition models, such as the Lotka-Volterra predator-prey model. Variations on
these models include time-lag differential equation models and partial differen-
tial equations that incorporate age structures. Spatial population processes can
be modeled deterministically using discrete spatial structures such as lattices.
More complicated spatial models use deterministic versions of diffusion equa-
tions [51, 63, 33]. An important feature of differential equations of epidemics is
that simple models with a small number of parameters often have qualitative
features that are similar to those of more complicated models. This point will be
exploited later in this thesis for building a model for Chronic Wasting Disease.
Stochastic models describe the spread of disease using stochastic differential
equations such as contact processes, diffusion processes, or continuous birth and
death processes [52, 63, 42]. Stochastic models consist of a dynamic model with
random parameters that contain a noise term. These models are often difficult to
solve explicitly. The goal is to capture, as fully as possible, the random nature of
population systems, which can result in models that are mathematically rigorous
but with no biological interpretation [63]. Simplifying assumptions can make
the model tractable, such as the assumption that the noise term is a Brownian
motion [41]. Although these assumptions are mathematically desirable and make
the model both computationally feasible and easier to manipulate, they may
not be biologically appropriate and do not incorporate available data. Often,
however, analytical approximations lead to workable models with features that
are both biologically interpretable and mathematically elegant [51, 63].
Critical to statistical models of epidemics is the direct incorporation of data
into the model [52, 17, 54]. The primary considerations for a statistical model are
the type and amount of data, and availability of covariate information. Limited
data availability can make the use of many of the more common deterministic
or stochastic models infeasible, since these models often have an abundance of
parameters. Statistical models typically have other considerations as well, such
as the difficulty of fitting models with nonstandard distributions, and the rela-
tive ease and benefit of fitting models with normally distributed data. In many
instances, parameters are transformed in an attempt to achieve a model with
normality or linearity, simplifying the calculations for the model. One of the
great benefits of statistical modeling is the ability to reduce the variability in
the parameter estimates through the use of covariates and the introduction of
other explanatory variables. However, transforming the parameter space and
including the explanatory variables and covariates can sometimes come at the
loss of direct interpretability. For example, some of the spatial disease models
employed now do not estimate transmission rates directly, but instead model the
log-relative risk, since the transformed space can be modeled as a Gaussian au-
toregressive process [17, 54]. Through the use of Bayesian hierarchical modeling,
the modeler can develop extremely complex but interpretable models. Bayesian
hierarchical models relate multiple parameters that are connected by the struc-
ture of the problem. For statistical epidemic models, a priori knowledge of the
qualitative behaviour of the disease is of less concern than for deterministic or
stochastic models. The ease with which the data can be used to fit the model
and the confidence of the parameter estimates are of greater concern. It is the
data that drive the modeling process.
In the body of this thesis, these three modeling paradigms are discussed in
more detail, with the emphasis on the Bayesian hierarchical modeling aspect
of statistical modeling. A dynamic spatial model for Chronic Wasting Disease
(CWD) in Rocky Mountain mule deer is introduced. To date, a spatial statistical
model for this disease has not been presented. This model is derived from a
system of differential equations, and describes the spatial dynamics directly in
the differential equations, rather than through correlations in the data. Because
the posterior distributions for the CWD model have some unusual characteris-
tics, a new Markov Chain Monte Carlo simulation technique is constructed that
utilizes qualities from existing techniques to efficiently sample distributions that
live on bounded parameter spaces. The properties of the new MCMC technique
are examined and compared to existing methods. The model is fit using the new
MCMC technique. The results of the model are discussed and interpreted, and
both the modeling method and the simulation method are empirically validated.
Throughout the paper, [X] denotes the marginal distribution of the random
variable X, and [X|Y] denotes the conditional distribution of X given Y. The
multivariate normal distribution with mean μ and covariance matrix Σ is denoted
by N(μ, Σ). For matrices, A^{-1} denotes the inverse matrix, and A^T denotes the
transpose.


2. Models of Epidemics
Epidemic modeling is an intensely interdisciplinary subject, drawing to-
gether a variety of mathematical and statistical disciplines. Modeling approaches
range from stochastic processes to deterministic models to empirical data anal-
ysis. These approaches are generally guided by biological considerations that
help to construct a model and to determine its effectiveness. The approaches
that are most important in the construction of a model for Chronic Wasting
Disease are deterministic and statistical models.
2.1 Deterministic Models of Epidemics
Deterministic models describe the behaviour of dynamic systems through
the use of ordinary or partial differential equations. The study of dynamical
systems has its origins in the mid-1600s, with Sir Isaac Newton's work on grav-
itation and planetary motion [75]. Newton's first work on the subject involved
the two-body problem of calculating the motion of the earth around the sun.
Later efforts at extending Newton's methods to the three-body problem (for
example, the motion of the sun, earth, and moon) led to an eventual realiza-
tion that some systems of differential equations were analytically unsolvable.
However, unsolvable models can still be analyzed qualitatively, and this char-
acteristic is why these models are still used in many applications. For detailed
discussions of the qualitative analysis of differential equations, see [83, 8, 75].
Perhaps the earliest use of mathematics to deterministically model a pop-
ulation can be traced to 1202, when Leonardo Fibonacci proposed the famous
sequence of integers to model a rabbit population [33]. The subject's first im-
portant early application did not surface until the exponential growth model
introduced in 1798 by Thomas Malthus. The exponential growth model re-
sults as the solution to Malthus' constant rate of growth assumption, and is a
standard example of the simple separable differential equation:

\[
\frac{dN}{dt} = rN, \qquad (2.1)
\]

where N is the size of the population, and r is the growth rate.
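For reference, separating variables and integrating gives the closed-form solution (a
standard calculation, stated here for completeness):

\[
\int \frac{dN}{N} = \int r\,dt \quad\Longrightarrow\quad N(t) = N(0)\,e^{rt}.
\]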
Malthus' exponential model is a linear differential equation, and hence the
solution predicts unbounded growth or extinction, depending on the choice of
parameters and the initial conditions. Neither unbounded growth nor extinction
occurs in most populations, which typically tend towards a long term stable
population level, called the carrying capacity. Thus equation 2.1 is only useful
as a short term population model.
The need for a self-limiting population model led to the logistic equation for
growth, which was proposed initially by Pierre Verhulst in 1838 [63]. Verhulst's
model is more realistic than the Malthus model as a long term description of
population growth, and it incorporates a carrying capacity parameter K. The
logistic model is now one of the most commonly used models in mathematical
biology, and variations appear in many different applications. It is a relatively
simple nonlinear model, and it exhibits qualitative behaviour generally observed
in many populations. Verhulst suggested the model in 1838 to describe human
populations, although it was not until 1920 that his work received notice, when
Raymond Pearl searched for a model which has solutions characteristic of a
sigmoidal curve and empirically constructed Verhulst's logistic equation [56].


However, the logistic model was not derived analytically until 1925 by Alfred
James Lotka [45]. A nice derivation of the logistic model is provided in [63].
The logistic model with its sigmoidal curve will be the starting point for the
Chronic Wasting Disease model presented in this thesis. The logistic model can
be written as:
\[
\frac{dN}{dt} = rN\left(1 - \frac{N}{K}\right), \qquad (2.2)
\]
where K is the carrying capacity, r > 0 is the constant growth rate, and N is the
size of the population. The form of equation 2.2 is relatively simple and leads to
an explicit nonlinear model for population growth. The model assumes that the
growth rate r is constant, as is the carrying capacity K. These assumptions are
somewhat simplistic, in that growth rates typically vary with weather conditions,
by regions, or due to other factors. The model can be extended in a variety of
ways to include these factors. One implied property of the simple logistic model
is that new members of the population immediately contribute to the growth,
which is not true for populations that reproduce sexually. Members of these
populations must reach a certain age before they are able to reproduce. There
are several ways of incorporating this property into the differential equation 2.2.
One method is by adding a time-delay term to 2.2, which leads to equation 2.3.
\[
\frac{dN}{dt} = rN\left(1 - \frac{N(t - t_d)}{K}\right). \qquad (2.3)
\]
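As a concrete illustration, equation 2.2 is easily integrated numerically. The MATLAB
sketch below (an illustration only, not code from Appendix C; the parameter values
are arbitrary) uses ode45 to produce the sigmoidal solution curve.

% Numerically integrate the logistic equation 2.2, dN/dt = r*N*(1 - N/K)
r  = 0.8;                                  % growth rate
K  = 100;                                  % carrying capacity
N0 = 5;                                    % initial population size
logistic = @(t, N) r * N * (1 - N / K);    % right-hand side of equation 2.2
[t, N] = ode45(logistic, [0 20], N0);      % integrate over t in [0, 20]
plot(t, N); xlabel('t'); ylabel('N(t)');   % sigmoidal approach to K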
Though somewhat crude, the logistic equation 2.2 is useful in providing a
qualitative understanding of simple growth of a single population under limited
resources. One generalization of the logistic model that is important to epidemi-
ology is the class of models broadly called competition models. These models
describe the interaction of two or more species, and are a simple extension of
the one-species logistic model. The general two species model is:
\[
\frac{dN_1}{dt} = N_1(r_1 + a_{11}N_1 + a_{12}N_2) \qquad (2.4)
\]
\[
\frac{dN_2}{dt} = N_2(r_2 + a_{21}N_1 + a_{22}N_2),
\]

where N_1 is the size of population one, N_2 is the size of population two, a_{12}
and a_{21} are interaction terms representing the competition between species for
limited resources, the parameters a_{11} and a_{22} represent the within species com-
petition, and r_1 and r_2 are the net birth/death rates for each species. The
model 2.4 is extremely versatile for describing two species interactions, and can
easily be extended to three or more species. Because each of the six parameters
can be zero, positive, or negative, for no effect, enhancement, or impedance, re-
spectively, there are a total of 3^6 = 729 variations for the two species model alone,
and each combination leads to a different configuration with a different interpre-
tation [63]. The focus here is on two species models, both because the numbers
of variations are even larger for more species (3^{3n} for n species), and because
they are of greatest importance in epidemic modeling. Various configurations
for the two species model can be represented using the array 2.5:

\[
\left(\begin{array}{c|cc}
\mathrm{sign}(r_1) & \mathrm{sign}(a_{11}) & \mathrm{sign}(a_{12}) \\
\mathrm{sign}(r_2) & \mathrm{sign}(a_{21}) & \mathrm{sign}(a_{22})
\end{array}\right). \qquad (2.5)
\]

It should be noted, however, that not every mathematical model has a biological
interpretation. The variations of 2.4 and 2.5 with the greatest significance are
those that describe biological competition (species competing for the same lim-
ited resource), scavenging, symbiosis, and most importantly for epidemiology,
predator-prey relationships. A good examination of several competition models
is provided by [63] and [79].
The predator-prey relationship is represented by the array 2.6, with N_1 being
the number of prey and N_2 being the number of predators:

\[
\left(\begin{array}{c|cc}
+ & 0 & - \\
- & + & 0
\end{array}\right). \qquad (2.6)
\]

Setting a_{11} = a_{22} = 0 removes the within species competition, so that the growth
of the predators is limited only by the availability of prey, and the growth
of the prey is limited only by the number of predators. The first predator-
prey competition model was constructed independently by Alfred Lotka in 1925
[45] and Vito Volterra in 1926 [77]. The Lotka-Volterra model is important to
epidemic models and will be developed in more detail. There are, in fact, several
kinds of predation scenarios. These include parasitism and cannibalism, but the
most common is that in which one species feeds upon another.
\[
\frac{dN_1}{dt} = N_1(r_1 - b_1N_2) \qquad (2.7)
\]
\[
\frac{dN_2}{dt} = N_2(-r_2 + b_2N_1).
\]

Equation 2.7 shows the Lotka-Volterra model, where all coefficients are assumed
to be positive. In the absence of predators (N_2), the prey (N_1) will multiply
indefinitely at rate r_1, while in the absence of prey, the predators will die off at
rate r_2. The predators kill off the prey at rate b_1, while the skill of the predators
in catching their prey is represented by b_2.
Qualitative analysis of the Lotka-Volterra model reveals that for any initial
position (N_1(0), N_2(0)), the model gives a family of closed curves, which implies
cyclic behaviour [63]. This means that as the number of prey increases, the
number of predators will increase, until the prey are killed off faster than they
reproduce. Then the predators, lacking food, will decrease also, and the cycle
will then repeat itself.
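The closed orbits of equation 2.7 are easy to visualize numerically. The MATLAB
sketch below (illustrative only; the coefficient values and initial state are arbitrary
choices, not values from the thesis) integrates the system with ode45 and plots the
trajectory in the (N_1, N_2) phase plane.

% Lotka-Volterra predator-prey system, equation 2.7
r1 = 1.0;  b1 = 0.1;     % prey growth rate and predation rate
r2 = 0.5;  b2 = 0.02;    % predator death rate and gain per prey caught
lv = @(t, N) [N(1)*(r1 - b1*N(2)); N(2)*(-r2 + b2*N(1))];   % [dN1/dt; dN2/dt]
[t, N] = ode45(lv, [0 50], [40; 9]);                        % initial state (N1(0), N2(0))
plot(N(:,1), N(:,2));                                       % closed curve: cyclic behaviour
xlabel('N_1 (prey)'); ylabel('N_2 (predators)');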
As mentioned before, most variations on the two species competition model
have no biological interpretation. To construct a workable model for epidemics,
however, the Lotka-Volterra model only needs to be altered slightly. Let S = N_1
be the number of susceptible individuals in the population, and let I = N_2 be
the number of infected individuals (infectives). For a simple epidemic, the birth
rate of susceptibles, r_1, is set to zero. Setting b_1 = b_2 = β > 0, and r_2 = γ > 0,
the Lotka-Volterra epidemic model is:
\[
\frac{dS}{dt} = -\beta SI
\]
\[
\frac{dI}{dt} = \beta SI - \gamma I. \qquad (2.8)
\]
Equation 2.8 is a special case of the predator prey model 2.7. The funda-
mental difference lies in the interpretation of the parameters. Since the death
of a susceptible results in the birth of an infective, for this model, β can be
interpreted as the infection rate. The parameter γ is interpreted as the death
rate or removal rate for the disease. The Lotka-Volterra epidemic model assumes
that the population experiences homogeneous mixing, that is, every pair of
individuals in the population is equally likely to meet. This model does not
allow for any new susceptibles to be introduced to the system, so the time frame
is necessarily short. Thus the disease modeled by equation 2.8 will eventually
burn out or run its course, since there are no new susceptibles being added to
the population. When only the general dynamics of the disease are of interest,
it is not uncommon to scale both parts of the equation by the total popula-
tion N(t), so that both S and I represent the proportion of the total that is
susceptible and infective, respectively.
Model 2.8 can also be represented concisely using a flow chart or compart-
mental diagram, which shows each class and indicates the rates at which individ-
uals flow into and out of a particular class. Thus equation 2.8 can be represented
as:
S --βSI--> I --γI-->
The arrow between S and I indicates that individuals are flowing out of the
susceptible class and into the infective class at a rate of βSI, and out of the
infective class at a rate of γI.
Equation 2.8 is a simple model of an epidemic that occurs over a relatively
short period of time, such as influenza. The period over which the disease evolves
is short enough that the birth rate of the susceptible population is negligible. If
the disease has a longer incubation period, a net birth term a can be added for
susceptibles. This yields the equation:
\[
\frac{dS}{dt} = aS - \beta SI \qquad (2.9)
\]
\[
\frac{dI}{dt} = \beta SI - \gamma I.
\]
Using the compartmental description, we have:
--aS--> S --βSI--> I --γI-->
Equation 2.9 allows the disease to evolve over a longer period of time. This
enables exploration of deeper questions about a disease, such as whether or not
the disease has long term sustainability. If the growth of the disease depends on
both susceptibles and infectives, and if the infection rate is too high compared to
the birth rate a, then susceptibles will be removed faster than they are replaced,
causing S to approach zero, thus the number of infectives will also reach zero
as there are no susceptibles left to infect. Since the infectives die off faster
than they can be replaced, the disease will burn out. Similarly, if the death
rate of the disease, γ, is too high compared to the infection rate, then infectives
will die off faster than they can be replaced by newly infected individuals, again
causing the disease to burn out.
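Whether the disease in equation 2.9 sustains itself or burns out is easy to explore by
simulation. The MATLAB sketch below (illustrative only; the rates are arbitrary
values chosen for demonstration, not estimates from the thesis) integrates equation
2.9 for one parameter set; raising γ relative to β, or lowering β relative to the birth
rate a, produces the burn-out behaviour described above.

% Susceptible-infective model with susceptible births, equation 2.9
a     = 0.02;     % net birth rate of susceptibles
beta  = 0.001;    % infection rate
gamma = 0.05;     % removal (death) rate of infectives
si = @(t, x) [a*x(1) - beta*x(1)*x(2); beta*x(1)*x(2) - gamma*x(2)];  % [dS/dt; dI/dt]
[t, x] = ode45(si, [0 500], [200; 1]);                                % S(0) = 200, I(0) = 1
plot(t, x(:,1), t, x(:,2)); legend('S', 'I'); xlabel('t');            % long term behaviour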
There are many examples of diseases that are unable to sustain themselves
over long periods of time. Measles, for example, will typically run its course
too quickly to sustain itself long term, although other factors, such as the human
response to control the disease, affect this as well. More deadly diseases also
fall into this category. One example is Ebola, which causes hemorrhagic fever,
leading to massive internal and external bleeding in its victims. The course of
infection is 2 to 21 days, ending in death for up to ninety percent of victims.
So far, the disease has progressed so quickly that it has been unable to sustain
itself in human populations for long periods of time.
Some infectious diseases can sustain themselves, however. One example is
Acquired Immunodeficiency Syndrome (AIDS) caused by the human immunod-
eficiency virus (HIV). HIV can have a lengthy incubation period, and is spread
only through fluid contact with mucous membranes or broken skin. The dis-
ease can take years to run its course, during which time the carrier can infect
others. The relationship between the infection rate and the death rate is such
that the disease is easily able to sustain itself. Other examples of sustainable
diseases include non-lethal infectious diseases such as influenza or the common
cold. These diseases run their course over a period of weeks, and the removal
rate is low. However, the fact that these diseases do not typically kill requires
an alteration to equation 2.8.
Equation 2.8 is the simplest of the SEIR class of epidemic models. These
initials stand for susceptible, exposed, infective, and removed (or quarantined).
The addition of the other classes gives these models a wide range of descriptive
capability, allowing the description of non-deadly diseases as well. For example,
the Kermack-McKendrick model adds a quarantined class to describe diseases
that are not deadly or for which victims build immunity [33]. As before, this
model imposes the restriction that exposed susceptibles immediately become
infected. Thus, the system of differential equations is:
\[
\frac{dS}{dt} = -\beta SI
\]
\[
\frac{dI}{dt} = \beta SI - \gamma I
\]
\[
\frac{dQ}{dt} = \gamma I. \qquad (2.10)
\]
Infectives no longer necessarily die, but once they recover from the disease they
are no longer susceptible and become quarantined. This model can also account
for the situation where a disease management strategy is in place, and infected
individuals are removed from the population and isolated. Once in the quar-
antined group, they are no longer considered susceptible, either because they
are immune to the disease, or because they are isolated. Model 2.10 can be
represented using the diagram:
S --βSI--> I --γI--> Q
Because there are no additions to the susceptible class, the time period for this
model is short, and the epidemic eventually passes. The addition of a birth rate
is required to describe a disease that would last a longer period of time, possibly
reaching a sustainable level. Because quarantine models typically assume that
an epidemic is in progress and that a control strategy is being applied, they do
not include a birth rate.
A variation on this model includes an exposed class. This model was initially
proposed for measles, where exposed susceptibles do not immediately become
infectives. This occurs when a disease has an incubation period from the point
when an individual is exposed to the disease to the point when that individual
develops the disease [33]. The model can also include a birth rate for suscepti-
bles, and allow for long term sustainability. Let τ be the length of the incubation
period, and let σ be the duration of the disease. Then the model is:
--aS--> S --βSI--> E --τ--> I --σ--> R
Even if the incubation period is constant, the fixed values of τ and σ turn the
system into time-delay differential equations, which are difficult to solve
analytically and must instead be approximated numerically. An in-depth analysis
of this model is given in [33].
Epidemiologists and public health officials are not only concerned with how
a disease evolves temporally, but also how it evolves spatially. From the earliest
times, people have been aware that there was a spatial component to disease.
During the Black Death epidemic of fourteenth century Europe, residents of
urban areas where the plague had struck would flee to the country, hoping that
the isolation would save them from the disease. Public health professionals are
aware that it is just as important to know how a disease spreads geographically
as it is to know the disease's dynamic characteristics, such as its infection rate.
Spatial models of epidemics describe the spread of the disease across a ge-
ographic region. There are several ways of modeling the spread within the
context of differential equations models. Without discrete spatial units, a typ-
ical method of spatial modeling is through a stochastic diffusion process model
[51, 52, 63]. Non-discrete spatial systems can also be modeled deterministically,
using partial differential equations similar to those used in modeling reactive
contaminant transport in porous media [18]. These equations are deterministic
analogs to stochastic diffusion models.
When the spatial regions are discretized, a model can be built by first con-
structing an epidemic model such as equation 2.9 within each spatial unit, and
then adding a term in each equation that models the interaction between that
unit and every other unit. If there are i = 1, ..., N discrete spatial units, then
for each unit i:
dS_i/dt = −β_i S_i I_i + S_i(u_{i1} + ... + u_{iN})                    (2.11)
dI_i/dt = β_i S_i I_i − γ_i I_i + I_i(v_{i1} + ... + v_{iN})
where u_{ij} is the net migration of susceptibles from unit j to unit i, and v_{ij} is the
net migration of infectives from unit j to unit i [63]. These models are relatively
easy to construct, but the addition of the migration rates, u_{ij} and v_{ij}, makes the
solution to the system of equations analytically intractable.
Even though an analytical solution is not available in most cases, equa-
tion 2.11 lends itself to numerical solutions or approximation by difference equa-
tions. The first-order Euler approximation is the easiest approximation to use,
though higher-order approximations also exist. A discussion of most approxi-
mation methods can be found in [83]. Numerical approximations are typically
designed with the assumption that the parameters in the differential equation
are known, and the goal is to generate values of the process. This is the re-
verse of the statistical setting, where, in most cases, the data consists of discrete
observations of the process, with the parameters unknown. The focus in this
thesis is therefore on difference equation approaches, because these more easily
fit with data types typical of epidemics.
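To make the connection between the differential equations and a difference-equation update concrete, the following is a minimal Python sketch (assuming NumPy is available) of a single first-order Euler step for a system in the spirit of equation 2.11; the number of units, the rates, and the migration matrices are hypothetical values chosen only for illustration.

import numpy as np

# One first-order Euler step for a discretized spatial SIR-type system in
# the spirit of equation 2.11.  All rates and the migration matrices U and
# V below are hypothetical.
def euler_step(S, I, beta, gamma, U, V, dt=1.0):
    # U[i, j] and V[i, j] play the roles of u_ij and v_ij: the net migration
    # effect of unit j on unit i for susceptibles and infectives.
    dS = -beta * S * I + S * U.sum(axis=1)
    dI = beta * S * I - gamma * I + I * V.sum(axis=1)
    return S + dt * dS, I + dt * dI

# Three spatial units with made-up parameters; the net migration rates may
# be positive or negative.
rng = np.random.default_rng(0)
S = np.array([0.99, 1.00, 0.95])
I = np.array([0.01, 0.00, 0.05])
beta = np.full(3, 0.5)
gamma = np.full(3, 0.2)
U = rng.uniform(-0.01, 0.01, size=(3, 3))
V = rng.uniform(-0.01, 0.01, size=(3, 3))
np.fill_diagonal(U, 0.0)
np.fill_diagonal(V, 0.0)
for _ in range(10):
    S, I = euler_step(S, I, beta, gamma, U, V)
print(S, I)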
2.2 Bayesian Hierarchical Models of Epidemics
The Bayesian hierarchical modeling strategy has proven to be extremely
effective for modeling complex processes with multiple parameters [82, 78, 81].
Hierarchical models allow complex relationships between multiple parameters to
be separated into several levels. The hierarchical framework helps the analyst
to understand the underlying process linking the data to the model, and to de-
velop computational strategies to simulate the desired posterior distributions.
Except in rare cases, simulation is necessary. For an introduction to hierarchical
modeling, see [24].
The overall strategy of the Bayesian hierarchical model has been informally
but concisely outlined in [3] and [17]. The strategy provides a link, called the
process, between the observed data and the unobserved parameters of interest,
denoted by θ.
[Data | Process, θ]
[Process | θ]                                     (2.12)
[θ]
The top layer is the observed data, which is modeled by an appropriate likelihood
function. It is assumed that the data is generated by an unknown process, for
example, an epidemic process. The process depends on unobserved parameters,
as shown in the middle layer. It is in this middle layer that the art of statistical
modeling takes place. On the bottom layer are the prior distributions that
represent a priori beliefs about the parameters.
Typically interest centers on the joint posterior distribution of the parame-
ters, that is, the conditional distribution of the parameters, given the observed
data.
[θ | Data, Process] ∝ [θ][Process | θ][Data | Process, θ]          (2.13)
The posterior distribution, shown in equation 2.13, is found using Bayes Theo-
rem.
Modern disease data typically comes in the form of counts. For human
diseases, the data are numbers of occurrences per unit time. If the data has
a spatial component, then it is grouped by spatial unit as well. This type of
data is often modeled using Poisson random variables. Disease data can also be
binomial in nature, that is, number of positive cases out of a total. This type of
data can also be grouped by spatial and temporal units. The groupings in each
case may not be time and space. For example, the data could be grouped by
time and age [5].
Works such as [17], [54] and [25] implement the Bayesian hierarchical strat-
egy for an epidemic model where the spatial dependency is modeled as a Markov
random field. These models deal with areal data- that is, data aggregated over
discrete spatial regions. Data consists of the number of occurrences of a disease
in a spatial unit, aggregated over some unit of time as well. The count data is
modeled as a Poisson random variable:
[y_{it} | E_i, z_{it}] ~ Poisson(E_i e^{z_{it}}),                  (2.14)
where E_i is the expected number of occurrences in spatial unit i, and z_{it} is the
log-relative risk, which accounts for the deviation from the expected number of
cases. The expected number of occurrences per unit area, E_i, can be obtained
from overall death rates or prior, expert knowledge of the disease [50].
Because Poisson data cannot easily be generalized to a multivariate setting and
thus cannot directly incorporate spatial effects, the spatial and temporal process
is often modeled in z_{it}. Covariate information, denoted by x_{it}, is added via a
linear model. This yields:
z_{it} = x_{it}^T α + s_{it}                                       (2.15)
where s_{it} follows a multivariate first-order autoregressive Gaussian time series,
defined by:
s_t = H s_{t−1} + e_t.                                             (2.16)
The term e_t is a multivariate noise term, with e_t ~ N(0, Σ). The model in
equations 2.14, 2.15, and 2.16 has been used successfully to model spatially
dependent count data from a variety of epidemics [54, 16, 17, 50, 78] and other
sources [34].
The dynamic nature of epidemics, especially the change from growth to de-
cline to stability, has resulted in a variation of the previous model to describe
an influenza outbreak by [54, 17]. The term e_t is altered to become the epidemic
forcing term, causing an increase and decrease over the course of the epidemic.
This is done by introducing another parameter β_{p(t)} as the mean in the
distribution of e_t.
Suppose that over a time interval t_0 < t_1 < t_2 the number of occurrences
of a disease grows from a state of stability to a full fledged epidemic, and then
declines down to a stable state as the epidemic burns out (Figure 2.2). As the
epidemic takes off and increases in intensity, the parameter β_{p(t)} moves from the
value β_0 to β_1. After it reaches its peak and begins to subside, β_{p(t)} changes from
β_1 to β_2. Finally, as the epidemic passes, the value of β_{p(t)} returns to β_0. Here
p(t) is an index parameter indicating the stage of the epidemic: p(t) = 0, 1, or 2
for stability, growth, or decline, respectively. Thus, β_{p(t)} is:
β_{p(t)} =  β_0,   t < t_0
           β_1,   t_0 ≤ t < t_1
           β_2,   t_1 ≤ t < t_2
           β_0,   t ≥ t_2
Using β_{p(t)}, the error term is then e_t ~ N(β_{p(t)}1, Σ). This has the result of
turning the epidemic on and off.
The spatial dependencies in the data are built into the structures of the
matrices H and Σ. This covariance matrix has the form Σ = σ²(I − φC)⁻¹M,
where M is a diagonal matrix containing conditional variances, C is a matrix
of partial-regression coefficients with zeros on the diagonal, and φ is chosen so
that Σ is positive definite. The form of the covariance matrix is common to the
conditional autoregressive spatial models discussed in [16, 4]. The structure of
the covariance matrix is found by making use of the Markovian property of the
spatial dependency. For this particular model, the diagonal entries of M are E_i⁻¹,
and the entries of the matrix C are c_{ij} = (E_j/E_i)^{1/2} when units i and j are
neighbors, and zero otherwise [17, 54].
Spatial structure is also built into the matrix of autoregression parameters,
H. This is done by choosing a neighborhood structure for the model. For a given
spatial unit i, it must be determined which of the other units j are neighbors of
unit i, and whether they are first order, second, third, etc. neighbors. For those
units that are not in the neighborhood of i, h_{ij} is set to zero. For the influenza
model in [54, 17], the neighborhood structure is chosen to include up to second
order neighbors, with h_{ii} = η_0, h_{ij} = η_1 if units i and j are first order neighbors,
and h_{ij} = η_2 if units i and j are second order neighbors. The parameters η_k are
then transformed so that a Gaussian prior distribution may be used. To finish
the modeling process, [54, 17, 78] choose appropriate prior distributions for the
remaining parameters, and simulate the resulting posterior distribution.
In the above model, the spatial dynamics of the disease are modeled through
correlations in the transformed parameter space, or by spatial dependencies in-
cluded in the autoregressive coefficients, again on the transformed space. This
approach is convenient from a statistical point of view. The transformed param-
eters are normally distributed, making computation and analysis of the spatial
and temporal dependencies easier. Standard differential equation models de-
scribe the dynamics of an epidemic process in terms of numbers of susceptibles
and infectives. Counts are modeled using Poisson or Binomial random variables,
and there are no common multivariate generalizations of the binomial or Poisson
distribution.
For binomial count data, the general strategy is the same. Suppose that
N_{it} is the total number of individuals in spatial unit i at time t, and let p_{it} be
the probability that a given individual has the disease. Let y_{it} be the observed
number (out of N_{it}) that have the disease. Then we can use the model
[y_{it} | N_{it}, p_{it}] ~ Bin(N_{it}, p_{it}).                      (2.17)
This type of data is commonly modeled using autologistic models. An example is
a model for prostate cancer given in [5]. Using the transformation ξ_{it} = ln(p_{it}/(1 − p_{it}))
yields:
ξ_{it} = θ_i + θ_t + ψ_{it} + ε_{it}                               (2.18)
where θ_i, θ_t, and ψ_{it} are observed covariates that explain spatial, temporal, and
interaction effects. The error term ε_{it} follows a normal distribution, with mean
zero and variance ν.
The autologistic model can be generalized to include other covariate infor-
mation in equation 2.18. The error term can also be modeled as a Gaussian
autoregressive process, similar to 2.16. Let e_t be the error term aggregated over
spatial regions, so that:
e_t = H e_{t−1} + ε_t,   ε_t ~ N(0, νI),                           (2.19)
so that e_t ~ N(He_{t−1}, νI), and the matrix of autoregression parameters, H,
includes spatial structure as in 2.16.
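As with the Poisson model above, the generative direction of this binomial formulation is straightforward to sketch. The Python fragment below simulates from a simplified version of equations 2.17-2.19; the effect sizes, H, and ν are hypothetical, and the interaction term ψ_{it} is dropped for brevity.

import numpy as np

# Forward simulation from a simplified version of equations 2.17-2.19.
rng = np.random.default_rng(2)
n_units, n_years = 4, 8
theta_i = rng.normal(-2.0, 0.3, size=n_units)       # spatial effects
theta_t = np.linspace(-0.5, 0.5, n_years)           # temporal effects
H = 0.3 * np.eye(n_units)
nu = 0.05
N = rng.integers(20, 60, size=(n_years, n_units))   # individuals observed

e = np.zeros(n_units)
y = np.zeros((n_years, n_units), dtype=int)
for t in range(n_years):
    e = H @ e + rng.normal(0.0, np.sqrt(nu), size=n_units)   # eq 2.19
    xi = theta_i + theta_t[t] + e                            # eq 2.18, no psi term
    p = 1.0 / (1.0 + np.exp(-xi))                            # invert the logit
    y[t] = rng.binomial(N[t], p)                             # eq 2.17
print(y)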
Both of these models share the general strategy of modeling spatial and
temporal dynamics indirectly by transforming to a convenient parameter space.
While this can be effective, as in [17, 54, 78], it does not directly answer the ques-
tion of how fast the disease is moving. The Poisson conditional autoregressive
models of [78], [17], and [54] use the Poisson approximation to the Binomial dis-
tribution and therefore rely on large sample sizes and relatively small infection
rates. The practicality of these types of models is examined in [28].
Hierarchical Poisson models have been used to model domestic animal dis-
eases as well. An example is a hierarchical model for clinical mastitis in herds
of dairy cattle given in [64]. Let X_i be the number of cases of mastitis in herd
i. The hierarchical specification is:
X_i ~ Poisson(λ_i)                                                 (2.20)
λ_i ~ Gamma(α, β_i)
β_i ~ Gamma(a, b)
where λ_i is the underlying infection rate in herd i, β_i is the spatial (herd)
explanatory variable, and α, a, and b are hyperparameters. As in the influenza
and prostate cancer models described above, a conditional autoregressive model
could be employed for the infection rate. A Poisson model is appropriate in
this situation, because the number of cattle that are infected can be accurately
determined. For epidemic models of wild animals, this is not usually the case.
Modeling epidemics in wild animal populations presents difficulties that hu-
man epidemic models do not ordinarily have. The largest difficulty is the fact
that cases of a disease are not reported. Accurate data on the number of in-
fected individuals is rarely available for wild animal populations, so researchers
instead use prevalence data. Prevalence data consists of the number of animals
that test positive for a disease, M, out of the total number tested, N. Thus a
binomial distribution is natural.
For the Chronic Wasting Disease prevalence data, sample sizes are small,
and so a Poisson approximation with its convenient transformation to normality
cannot be employed. Additionally, there is not a large amount of specific data
on transmission rates. What is available, in addition to the prevalence data,
is a qualitative understanding of the behaviour of CWD. Using this general
understanding of the disease, a differential equations model that has the same
qualitative behaviour can be derived. This ODE model will be the basis for a
statistical model that directly uses the available prevalence data via a Bayesian
Hierarchy.
3. A Spatial Model for Chronic Wasting Disease
Chronic Wasting Disease (CWD) is a transmissible spongiform encephalopa-
thy (TSE) that occurs in wild cervid populations in North America. It is found
in both mule deer (Odocoileus hemionus) and the North American elk, also
known as the wapiti (Cervus elaphus nelsoni) [49, 27]. CWD is related to
other TSEs found in domestic animals, such as scrapie in sheep and bovine
spongiform encephalopathy, commonly called mad cow disease [49]. It is also
related to TSEs found in humans, such as classic Creutzfeldt-Jakob disease and
variant Creutzfeldt-Jakob disease. The exact agent that causes CWD has not yet
been positively identified, and the method of transmission is currently unknown.
However, the disease is accompanied by the presence of misfolded proteins called
prions in the nervous tissue [14, 49, 27, 31, 32].
Chronic Wasting Disease is always fatal, causing damage to portions of the
brain in infected animals. Animals with CWD show a progressive loss of body
condition, behavioral changes, and eventually death [14, 27]. The course of
infection in mule deer appears to include both latent and clinical periods, and
spans from 18 to 36 months. Once clinical signs appear, few deer survive more
than 12 months [27].
Thus far, chronic wasting disease has been far more common in deer than in
elk. It has been estimated that in infected regions, between 5% and 6% of mule
deer have CWD, compared to only about 1% in elk [27, 14], although recent
data suggests that the infection rate in elk may be increasing [14]. While both
mule deer and white-tail deer can contract the disease, in the Rocky Moun-
tain region the epidemic seems to be focused in those areas where mule deer
(Odocoileus hemionus) are common and few, if any, white-tailed deer are found.
Consequently, research and data collection on the dynamics of the disease in
the Rocky Mountain area have focused on mule deer [27, 31]. The research that
has been done on white-tailed deer suggests that the species responds similarly
to CWD, but a clustering effect is more likely, since white-tailed deer are more
social than mule deer.
Chronic Wasting Disease was first found near Fort Collins, Colorado, in
1967, and has been spreading over the past two decades. The disease has been
making its way across Colorado and has recently crossed the continental divide
[14]. The incidents of the disease in Central and Western Colorado have been
isolated thus far, although they are occurring with greater frequency. Currently,
the disease appears most concentrated in the region encompassed by Northeast-
ern Colorado and Southeastern Wyoming, although there have been recent hot
spots in Wisconsin, South Dakota, Nebraska, and a few reported cases in Utah
and New Mexico.
Since hunting is such big business in many states, the potential consequences
of the CWD epidemic are great. A recent article in the October 24, 2003, edition
of the Denver Post reported that hunting added $599 million to the economy of
the state of Colorado in 2002. The same article estimated the potential economic
impact of CWD nationwide to be $100 billion. Thus, Chronic Wasting Disease
is recognized nationally as a potentially disastrous problem, and every state
except Hawaii has implemented a CWD surveillance program.
Current efforts for modeling the spread of chronic wasting disease have fo-
cused on the simulation of individual interactions between deer using standard
epidemic models such as a birth-death process [27] and modeling the spread as
a diffusion process [31], and these models have provided insight into the disease
dynamics. The disease has been spreading throughout the Rocky Mountain Re-
gion for more than 30 years, but a statistical model with spatial components
has yet to be presented.
There are several standard differential equations that are commonly used as
epidemic models. These are described previously in this thesis in section 2.1.
One such is the adaptation of the Lotka-Volterra predator-prey model (equa-
tion 2.9) to diseases given in [63] and [33]. Some of these variants have been
applied to chronic wasting disease because they exhibit the qualitative behavior
observed in many epidemics [27, 32]. The differential equations used by [27],
[31], and [32], implicitly model disease dynamics based on individual interac-
tions between deer and very small time steps. However, most available data
for these models are aggregated over relatively large time steps and large ar-
eas. These authors use stochastic forward integration as the primary tool for
understanding and exploring the spatio-temporal dynamics. The link between
the mathematical model and the data is indirect.
The correlated Poisson model of [78], [17], and [54], discussed in section 2.2,
is effective in modeling the spatial spread of human diseases such as influenza,
but application to CWD is hindered by fundamental differences in data. The
models discussed in section 2.2 rely on the Poisson approximation to the Bino-
mial distribution and therefore implicitly rely on large sample sizes and relatively
small infection rates [78, 17, 54]. It is extremely difficult and expensive to obtain
prevalence data on a wildlife population, so models for Chronic Wasting Disease
need to be appropriate for small samples.
A Bayesian hierarchical model is constructed for chronic wasting disease
that directly integrates mathematical models related to those of [27], [31] with
a model for observations taken on CWD prevalence. Critical to the model is
a difference equation approximation to traditional differential equation models
of prevalence to model dynamics in time. A mechanism is included that allows
for spatial mixing between regions of interest. Available data is incorporated
through a Bayesian hierarchical framework, thus linking the data and mathe-
matical model. This hierarchical approach integrates data and epidemic theory
into a single model, remains workable under the constraint of small data sets, and
provides a mechanism for prediction and the study of data collection designs.
3.1 Data
Data for CWD is economically and ecologically expensive to obtain since
the testing procedure, until recently, required sacrificing the animal [14, 32].
Prevalence data consists of the numbers of deer that are tested for CWD in
each DAU, and the number of those tested that are found to have the disease
[49] and [27]. The testing procedure used to determine whether or not an animal
is infected is considered to be extremely accurate, and is performed in highly
controlled conditions [14]. Consequently, the false positive and false negative
rates for the test were assumed to be small enough to be neglected. Currently,
the data is predominantly obtained from hunters, who submit harvested animals
for testing. These submissions are required by the Colorado Division of Wildlife
in the endemic areas. For animals taken outside the endemic area, the animals
may be submitted voluntarily but are subject to a $15 testing fee. Because the
type of deer a hunter may take is restricted by age, and in most areas gender,
harvest data may provide a biased view of the disease [15]. Determining these
biases requires more detailed data than is available here. Nevertheless, the
model presented here is valid for the population of huntable deer, although the
definition of huntable can change from region to region.
Wildlife biologists divide the state of Colorado into geographic regions called
data analysis units (DAUs). Different species of animals have different DAU
definitions, so that the spatial units are specific for different species of animals.
For example, deer and elk have different DAUs. Since the focus of the model
presented here is mule deer, DAU will refer only to the data analysis units for
deer. These DAUs can be seen in Figure 3.1. The data shows that there are 13
DAUs where chronic wasting disease is known to exist as of the 2002 hunting
season. These are DAUs 2, 3, 4, 5, 6, 7, 8, 9, 10, 12, 17, 27, and 44 in Figure 3.1.
Because the disease has not yet spread to much of the state, the majority of the
data is from an endemic region located in the northeastern region of the state
of Colorado, consisting of DAUs 4, 5, 10, 27, and 44 [14]. While there have been
a number of cases from units surrounding the endemic region, the Division of
Wildlife believes that the disease is still controllable in the non-endemic areas.
It is thought that the disease will be a permanent presence in the endemic region
[14, 32].
The prevalence data from Colorado was collected over 27 years, from 1976
to 2002. The data were obtained from the Colorado Division of Wildlife, and
Figure 3.1: Colorado Deer Data Analysis Units across the state. The larger
spatial units are the DAUs, the smaller units are game management units.
each DAU-year value consists of two numbers M and N, where N is the number
of deer that were tested for CWD and M is the number that were found to be
infected. Because the disease was only recently recognized as an epidemic, much
of the data is from the endemic region and is most complete from the years 1996-
2002. Prior to 1996 the data was collected sporadically. The Colorado Division
of Wildlife has recently made the testing results for the 2002 and 2003 hunting
seasons available publicly [14].
3.2 Prevalence Model for Chronic Wasting Disease
The approach presented in this thesis utilizes a system of differential equa-
tions which exhibit the appropriate qualitative temporal and spatial behaviour.
Solutions to this system are approximated using a set of difference equations on
a scale appropriate for the data. Through a hierarchical Bayesian construction,
the data are used to estimate the unknown parameters that govern these dif-
ference equations. One fundamentally different aspect of this approach is the
explicit modeling of spatial dynamics in the differential equations, rather than
indirectly attempting to capture the effect using spatial correlations typical in
statistical models.
3.2.1 Model Motivation
The logistic model 2.2 discussed in section 2.1 is commonly used to model
biological populations [63]. The model is simple, but applicable to a surprising
number of biological applications. The logistic model forms the basis for a
Chronic Wasting Disease model.
For a single population, the logistic model 2.2 exhibits the desirable charac-
teristic of having two equilibrium points, at N = 0 and N = K. A fixed point
or equilibrium point for the differential equation dp/dt = f(p) is a point p* such
that f(p*) = 0. These points represent a steady state, since if the system
begins at p* it remains there for all time. Fixed points can be either stable, so
that if the system is perturbed off of the fixed point it will eventually return to
the steady state, or unstable, so that if the system is perturbed off of the fixed
point it will not return. The carrying capacity, K, represents a maximum sus-
tainable population. As the population size approaches this value, the growth
will slow. Analysis shows that the point N = 0 is an unstable equilibrium point,
and N = K is a stable equilibrium point. Thus, if the initial value of the pop-
ulation is greater than the carrying capacity, N > K, then the population will
decrease back towards this stable point. Likewise, if 0 < N < K, then N is
increasing. This model is easily generalized to a spatial model by aggregating
over discrete spatial units and including migration terms [63].
A differential equation for the prevalence of a disease based on the logistic
model can be formulated by rescaling by K [33, 63]. Rescaling by the carrying
capacity results in a differential equation with the same basic properties as
equation 2.2, but which lives on the interval [0,1]. Beginning with equation 2.2,
divide by the maximum population K. This gives a logistic equation modified
for prevalence, restricted to the unit interval. Let p denote the proportion of
deer in a given DAU that are infected. Then,
dp/dt = αp(1 − p)                                                  (3.1)
is a model for the change in prevalence over time.
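As a one-line check of this rescaling (assuming equation 2.2 has the standard logistic form dN/dt = αN(1 − N/K), with α denoting its growth rate), substitute p = N/K:

dp/dt = (1/K) dN/dt = (1/K) αN(1 − N/K) = αp(1 − p),

which is exactly equation 3.1.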
Let p_{kt} denote the proportion of infected deer in DAU_k for time period t, and let
p_t be the vector of prevalences aggregated over all DAUs for a given year t. For
now, the focus is on the observed qualitative behaviour within a single DAU,
and the spatial dynamics will be added later. From historical observations and
recent studies of CWD [49, 27], it is known that CWD takes approximately 3
years from infection to the death of the animal. For a disease with an incubation
period this long, it is a reasonable supposition that a sustainable level of the
disease is possible. Additionally, if the disease is in fact caused by environmental
sources or carried by another animal, this would indicate that a sustainable
level is likely, since an external carrier could potentially be unaffected by the
disease. A classic example of an external host is the flea that carries the plague,
aiding in the spread of the disease and enabling the disease to sustain itself.
Furthermore, CWD has yet to show signs of elimination even in those areas
where the disease has been present the longest, giving further support to the
sustainability assumption. Another reasonable assumption is that if the disease
is not present in a given area, i.e., if p_{kt} = 0, then the proportion of deer that
are infected will stay at zero, unless infected deer from another area migrate
in (although this implies that deer are the vector for the infection, and not an
environmental source, and it is not known for certain that this is the case).
These two assumptions seem, on the surface, to be supported by the current
understanding of the disease. They provide support for the choice of the logistic
model 3.1, which has these qualitative characteristics.
The parameter α in equation 3.1 represents a kind of acceleration for the
disease, and can be interpreted as an infection rate. Fixing α as a constant
restricts the dynamics of the model by fixing the infection rate. Instead, by
choosing α = α(p, t) and following the basic tenet of the correlated Poisson
model discussed in section 2.2, more freedom is obtained for the model. This
type of freedom can be extremely useful in describing different types of disease
dynamics. One such dynamic is the observed behaviour in many epidemics of
fast growth in the initial stages of the epidemic, followed by a decline in the
latter stages of the epidemic. The parameter α(p, t) plays a role similar to that
of the control parameter β_{p(t)} in the influenza model of [17, 54] discussed in
section 2.2.
The form of α = α(p, t) must be chosen to provide the appropriate be-
haviour. It can be designed to account for a variety of effects, such as spatial
and seasonal effects. With appropriate data, explanatory covariates can also be
included in the expression for α(p, t). In this case, due to the small amount of
data, the desire for appropriate qualitative behaviour guides the form of α(p, t).
Making the assumption that there is a non-zero, long-term sustainable level for
the disease, set α_t = α(δ − p_t). This acts as a varying acceleration. The ac-
celeration slows down the closer the proportion gets to δ. When p_t = δ, the
acceleration term is zero, and the internal rate of change of the DAU is also
zero. The parameter δ is the long term proportion of infected deer that the
DAU is able to sustain, and plays a role similar to that of the carrying capacity
in the logistic equation 2.2.
For a single DAU, the differential equation becomes:
dp/dt = αp(δ − p)(1 − p).                                          (3.2)
With this modification, prevalence is increasing for levels below δ, and decreasing
for levels above δ. If δ = 0, then the disease is killing off the existing infectives
faster than it is creating new infectives and there will be an eventual burn out
of the disease. If δ = 1, then the disease will eventually infect all the deer.
When 0 < δ < 1, the model has a long-run stable proportion of infectives at
level δ.
Figure 3.2 shows the graph of the derivative, p′, versus p for equation 3.2,
with δ = 0.25. The equilibrium points p* are on the p axis, where p′ = 0. To
graphically determine the nature of the equilibrium point, arrows are drawn to
the right for values p′ > 0, and to the left for values p′ < 0. When the arrows flow
Figure 3.2: Graph of the derivative of p versus p. The fixed points (0 and 1/4) are on
the p axis.
away from each other, the point p* is unstable; when the arrows flow towards
each other, p* is stable. The graph shows that the equilibrium point p* = 0 is
unstable, while the other equilibrium point, p* = δ, is stable, supporting the
interpretation of the parameter δ as a long term sustainable level. The idea can
be extended to several dimensions as well.
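The phase-line reasoning above can be checked numerically. The short Python sketch below evaluates the sign of p′ on either side of each root of equation 3.2, using the illustrative values α = 1 and δ = 0.25 from Figure 3.2 (the root at p = 1 is included for completeness, although the discussion focuses on 0 and δ).

# Sign of p' on either side of each fixed point of equation 3.2 with the
# illustrative values alpha = 1, delta = 0.25 used in Figure 3.2.
alpha, delta = 1.0, 0.25

def dp_dt(p):
    return alpha * p * (delta - p) * (1.0 - p)

for p_star in (0.0, delta, 1.0):
    left, right = dp_dt(p_star - 0.01), dp_dt(p_star + 0.01)
    # A fixed point is stable when the flow on both sides points toward it.
    stable = (left > 0) and (right < 0)
    print(f"p* = {p_star:.2f}: p'(left) = {left:+.5f}, "
          f"p'(right) = {right:+.5f} -> {'stable' if stable else 'unstable'}")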
Having developed an appropriate model for the dynamics of prevalence in a
single DAU, the model is extended to multiple DAUs. This is done most directly
by replicating over the DAUs to produce the system of differential equations:
dp_t/dt = α(δ1 − p_t) ⊙ p_t ⊙ (1 − p_t)                            (3.3)
where p_t is the vector of prevalences, so that p_t = (p_{1,t}, p_{2,t}, ..., p_{54,t}). The symbol
⊙ represents the Hadamard (element by element) matrix product. Importantly,
this system of differential equations allows for different prevalence values in each
DAU.
3.2.2 Incorporating Spatial Mixing
Although equation 3.3 is a relatively easy extension that has the appropriate
general temporal dynamics, it treats each DAU separately and does not take into
consideration any spatial dynamics. In equation 3.3, neighboring DAUs do not
affect each other, which is an unlikely scenario. It is known that the disease is
spreading across the state, so a spatial component for the model is needed.
The spatial dynamics are incorporated into the model by adding a spatial
mixing term to equation 3.3, yielding:
dp_t/dt = α(δ1 − p_t) ⊙ p_t ⊙ (1 − p_t) + Qp_t,                    (3.4)
where Q is a matrix such that the values q_{ij} represent the instantaneous net
effect that the prevalence in DAU j has on the prevalence of DAU i. Thus the
rate of change in the prevalence over DAUs consists of a linear combination of
the internal dynamic processes governed by logistic equations, and the external
contributions by the other DAUs.
The multi-DAU spatial model is extremely difficult to solve analytically, so
a first-order Euler discretization is used to approximate the differential equation
by a difference equation. This discretization is a simple method of approximating
a derivative, and has the form f′(t) ≈ [f(t + Δt) − f(t)]/Δt.
Since the data is aggregated by year, the time step is assumed to be Δt = 1.
The difference equation for (3.4) is therefore:
p_{t+1} = α(δ1 − p_t) ⊙ p_t ⊙ (1 − p_t) + Wp_t,                    (3.5)
where W is a spatial mixing matrix that indirectly represents deer migration,
and is a discrete version of the matrix Q in equation 3.4.
Equation 3.5 is the dynamic spatial model for Chronic Wasting Disease,
and the prevalence data is used to find the parameters for this equation. The
migration of deer changes prevalence in that a single infected deer migrating from
one DAU to another changes the makeup of prevalence for both the from and
to DAUs. However, the matrix W does not directly describe this movement of
deer, but instead describes the movement of the disease prevalence. The change
in prevalence for a given DAU consists of a mixture of the internal dynamic
processes modeled by the logistic equations, and the external contributions from
migration to and from neighboring DAUs. Including the migration matrix W
neither implies nor removes the possibility that deer are the transmission method
for the disease in the model.
The fixed points for the difference equation will be the same as those for the
corresponding differential equation, provided that W1 = 1, where 1 is a vector
of ones [75, 83], and therefore the parameters in the difference equation 3.5 have
the same interpretation as the parameters in the differential equation 3.4.
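A minimal Python sketch of the update in equation 3.5, using a hypothetical three-DAU mixing matrix rather than the 54-DAU neighborhood structure described below, illustrates the interpretation of δ as the long-run level.

import numpy as np

# Deterministic prevalence update of equation 3.5 on a hypothetical
# three-DAU system.  The parameter values are illustrative only.
def F(p, alpha, delta, W):
    # p_{t+1} = alpha * (delta*1 - p) (.) p (.) (1 - p) + W p, where (.) is
    # the Hadamard (elementwise) product.
    return alpha * (delta - p) * p * (1.0 - p) + W @ p

gamma = 0.02
W = np.array([[1 - gamma,     gamma,         0.0],
              [gamma,         1 - 2 * gamma, gamma],
              [0.0,           gamma,         1 - gamma]])  # rows sum to one
alpha, delta = 2.0, 0.09
p = np.array([0.01, 0.0, 0.0])      # disease present only in the first DAU
for year in range(100):
    p = F(p, alpha, delta, W)
print(p)                            # each component approaches delta = 0.09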
Various forms of the spatial mixing matrix W could be designed to incorpo-
rate elaborate schemes for disease transport. However, because the mechanism
by which the disease moves is currently unknown, two simple cases were con-
sidered. It is assumed that the disease can only migrate between adjoining DAUs,
so that if DAU_i and DAU_j are neighbors, and DAU_j and DAU_k are neighbors,
but DAU_i and DAU_k are not, then in a single time step (one year in this case)
migration occurs between i and j, and between j and k, but not between i and
k. This is a reasonable assumption supported by biological studies of the behav-
ior of deer in the wild and the spread of CWD in Colorado, and the manner in
which the state of Colorado has constructed the DAUs [14, 15, 49, 27, 31, 32].
Let w_{ij} denote an element of the stochastic transition matrix W. For par-
simony, fix w_{ij} = γ if DAU_i and DAU_j are first-order neighbors with a sig-
nificant proportion of boundary that touches. If DAU_i and DAU_j have less
boundary that touches, or are almost touching, then w_{ij} = ω < γ. For DAUs
that are separated and hence are not neighbors, w_{ij} = 0. The diagonal values
of W are fixed so that the row sums of W are equal to 1. Thus, the w_{ij} can be
loosely interpreted as the average proportion of deer that migrate from DAU_j
to DAU_i each year. The classification of first-order neighbors and second-order
neighbors was admittedly subjective. Tables 3.1 and 3.2 show the relationships
between the DAUs that were used.
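A sketch of how such a matrix can be assembled from neighbor lists follows; the four-unit neighborhood below is a toy stand-in for the relationships recorded in Tables 3.1 and 3.2.

import numpy as np

# Assemble a spatial mixing matrix W from first- and second-order neighbor
# lists.  The lists here are hypothetical, not the DAU structure.
first_order = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
second_order = {0: [2], 1: [3], 2: [0], 3: [1]}

def build_W(first_order, second_order, gamma, omega):
    n = len(first_order)
    W = np.zeros((n, n))
    for i in range(n):
        for j in first_order[i]:
            W[i, j] = gamma
        for j in second_order[i]:
            W[i, j] = omega
        W[i, i] = 1.0 - W[i].sum()   # diagonal fixed so each row sums to 1
    return W

W = build_W(first_order, second_order, gamma=0.02, omega=0.01)
print(W.sum(axis=1))   # all ones, so W keeps prevalence vectors in [0, 1]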
In order to ensure that equation 3.5 produces values for p_t that are inter-
pretable, certain restrictions are made on the parameters α, δ, γ, and ω. For
simplicity, let F(p, η) = α(δ1 − p) ⊙ p ⊙ (1 − p) + Wp, where η = (α, δ, γ, ω). The
parameters γ and ω are the nonzero off-diagonal elements of the matrix W. The
relationship in equation 3.5 is therefore written as p_{t+1} = F(p_t, η).
DAU Near Secondary DAU Near Secondary
1 2,6 7 28 33,45,46,47 48
2 1,3,4 8,9 29 24,52 -
3 2,4,9 8,10 30 35,36,52 25
4 2,3,5,10 9,44 31 32,35,36 37
5 4,44,54 - 32 31,34,45 33
6 1,7,11 - 33 28,45 32
7 6,8,11,41,42,43 1 34 16,32,37,45 15,26,48
8 7,9,14,43 2,3,15,16,53 35 30,31,36 37
9 3,8,10,17,27 2,4,16,38 36 25,26,30,31,35,37 -
10 4,9,27,44 3 37 26,34,36 15,31,35
11 6,7,18,41 12 38 16,17 9,50
12 13,41,51 11,18,19,42,43 39 20,21,40 25,51
13 12,15,22,51 42,43 40 19,25,39 24,51
Table 3.1: Neighborhood relationships between DAUs. Digits refer to DAU
number in Figure 3.1.
DAU Near Secondary DAU Near Secondary
14 8,43,53 - 41 7,11,12 42
15 13,16,22 8,26,34,37,53 42 7,43 12,13,41
16 15,34,38,48,50 8,9,45 43 7,8,14,42 12,13,53
17 9,38,44,49,50 48 44 5,10,17,54 4,27
18 11,19 12,23 45 28,32,33,34,48 16,46
19 18,23,24,40 12,51 46 28,47,48,49,54 45
20 21,39,51 22 47 28,46,54 -
21 20,22,25,39 - 48 16,45,46,49,50 17,28,34
22 13,15,21,25,26 20 49 17,27,46,48,54 50
23 19 18,24 50 16,17,48 38,49
24 19,29,52 23,40 51 12,13,20 19,39,40
25 21,22,26,36,40 30,39,52 52 24,29,30 25
26 22,25,36,37 15,34 53 14 8,15,43
27 9,10,49 44 54 5,44,46,47,49 -
Table 3.2: Neighborhood relationships between DAUs. Digits refer to DAU
number in Figure 3.1.
The parameters α, δ, γ, and ω are restricted so that 0 < α < min{4/(1 − δ)², 4/δ²},
0 < δ < 1, and 0 < ω < γ, with γ small enough that max_i(Σ_{j≠i} w_{ij}) < 1 so that
the diagonal entries of W remain nonnegative. With these restrictions and W
defined as above, there is the following theorem.
Theorem 3.2.1 Let α, δ, γ, and ω be restricted as above. Then the update
function F has the following properties:
F(p, η) ∈ [0,1]^{54}   for all p ∈ [0,1]^{54}                      (3.6)
F(0, η) = 0                                                        (3.7)
F(δ1, η) = δ1                                                      (3.8)
The first property (3.6) guarantees that the dynamic system will always return
prevalences between zero and one. The second property (3.7) implies that if
there is no disease in the system, the disease cannot propagate. The third (3.8)
implies that the system has a steady state where the disease maintains a stable
relationship with the deer.
PROOF: To show that these properties hold, first note that the form of W is
such that W1 = 1, Wᵀ1 = 1, and w_{ij} ≥ 0. Thus, Wδ1 = δ1, and Wp ∈ [0,1]^{54}
whenever p ∈ [0,1]^{54}. It remains to be shown that g(x) = x + αx(1 − x)(δ − x) ∈
[0,1] whenever x ∈ [0,1]. Note that g(x) = x(1 + αδ − α(1 + δ)x + αx²) = x q(x),
where q denotes the quadratic form. Now g has a root at x = 0 and may have
up to two more roots at the roots of q. If the roots of q are complex, then
g(x) = 0 if and only if x = 0. It is easy to show that the roots of q are complex
for 0 < α < 4/(1 − δ)². Thus g(x) ≥ 0 on [0,1] when 0 < α < 4/(1 − δ)². Let
h(x) = 1 − g(x) and note that g(x) < 1 whenever h(x) > 0. Factoring out (1 − x)
in h, argue as before to show that g(x) ≤ 1 for x ∈ [0,1] whenever α < 4/δ².
In what follows, let a ∈ (0,1) be the proportional acceleration and take the ac-
celeration to be α = 4a·min(1/δ², 1/(1 − δ)²). Likewise, re-scale ω and γ to represent
the proportion of complete interchange, so that ω and γ are the probabilities
used in the spatial transition matrix W.
3.2.3 The Hierarchical Statistical Model
Because there are different sampling schemes for different stages of the
knowledge of disease prevalence, a simple modification is required. The change
amounts to modeling (location,time) observations from two possible distribu-
tions. One distribution corresponds to time-periods where a DAU has a large
number of deer tested for the disease. In this case, the effort behind the test-
ing is primarily geared towards determining and understanding prevalence. The
Division of Wildlife knows that the disease is present, wishes to collect as much
data as possible, and has the additional goal of protecting hunters from eating
infected animals. On the other hand, when the number of tested carcasses is
low, that corresponds with a time when the CDOW is opportunistically search-
ing for the disease in an area where it has not yet been found. This data comes in
the form of animals found already dead, and hunters in non-infected areas that
wish to pay for testing. The differences in searching for the disease add biases
which are explicitly included in the model to facilitate estimation. Furthermore,
model selection techniques are used to find a cutoff value which separates the
two processes.
The likelihood for the observation at DAU_i at year t, denoted by Y_{it}, is
modeled as one of two binomial distributions
Y_{it} ~ B(N_{it}, p_{it})   for N_{it} > C
Y_{it} ~ B(N_{it}, π)       for N_{it} ≤ C,                        (3.9)
depending on some cutoff value C and where N_{it} is the number of tests completed
for DAU_i at year t and p_{it} is the ith component of p_t. This duality is required
because of the different sampling schemes for different stages of the knowledge
of disease prevalence. The first distribution corresponds to time-periods where
a DAU has a large number of deer tested for the disease. In these cases, the
Division of Wildlife knows that the disease is present and wishes to protect
hunters from eating infected animals. On the other hand, DAU-years when the
number of tested carcasses is low correspond with the CDOW searching for the
disease in an area where it has not yet been found or confirmed problematic.
This approach is somewhat ad hoc, and only moderately reflects the true nature
of the sampling mechanism. Nonetheless, the modification provides sufficient
flexibility to allow the model to accurately describe the changing prevalences.
This modification essentially eliminates the data values below the cutoff from
the spatio-temporal model, as the posterior of π given all the data is independent
of the other parameters. Because of this independence, data that are used to
improve knowledge regarding π cannot be used to increase information about
the other parameters.
The binomial sampling scheme used here is a simplification that considers
the values N_{it} as fixed. Common practice is to consider these values as random,
perhaps modeling them as Poisson or binomial draws from the total population.
There are several reasons why a second layer of hierarchy was not added. First,
although population estimates were available for most DAUs over a ten year
period, these estimates were considered unreliable. Population estimates are
prepared by game region, and the manner in which they are calculated varies by
region. Game regions can consist of several DAUs, and information about how
the game regions are structured was unavailable. Moreover, the method in which
the populations are estimated has changed over time, as new estimation tech-
niques are adopted, both because of scientific advancement and changes in game
region management. Information on what population estimation methods were
used was unavailable. Finally, the testing data was obtained predominantly from
harvested animals. These animals form a subset of the population, because of
strict regulations that govern which animals can be harvested and which cannot.
In most areas of Colorado, huntable deer consists only of male deer two years or
older [14]. There was no information available on what percentage or number
of deer are harvested each year, although the Colorado Division of Wildlife does
collect estimates on this information, grouped by game management units [14].
The simple sampling scheme that was used means that the results presented in
this thesis may not be applicable to the total population of deer, but they can
still be reliably applied to the population of harvested deer.
Given an initial prevalence vector p_0 and the parameters in η, the values of
p_t are fixed. It is important to note the difference between this approach and a
standard state space model [23]. In a state space model, the value X_t at time
t is equal to a function F(X_{t−1}, η) plus error. That is, given the parameters η
and the previous value X_{t−1}, the new value is random. In the approach followed
in this thesis, the value of X_t given the parameter values η and the initial value
X_0 is deterministic. This approach also differs from the traditional state space
model by using binomial distributions for the observational equation, rather than
normal distributions. By using deterministic rather than stochastic updates, the
size of the parameter space is effectively reduced from 54 x 27 + 4 to 54 + 4. In
fact, the parameter space is smaller because many of the initial conditions (the
54 elements of the vector p_0) are forced to zero.
The likelihood for the prevalence data is given by
L(M | p_0, η, π, C) = Π_{i=1}^{54} Π_{t=1}^{27} [p_{it}^{M_{it}} (1 − p_{it})^{N_{it}−M_{it}}]^{1{N_{it} > C}}
                                    × [π^{M_{it}} (1 − π)^{N_{it}−M_{it}}]^{1{0 < N_{it} ≤ C}}     (3.10)
where 1{A} is 1 if condition A is true and zero otherwise, p_{it} is the ith component
of p_t, defined by the recursion in equation 3.5, N_{it} is the number of deer from DAU_i
that are tested in year t, M_{it} is the number of those that are infected, π is
the probability that a tested deer has CWD given that a small number of deer
are tested in that DAU-year, and C is a cutoff which models the different sampling
schemes relative to π and η. The value C = 3 provided a reasonable fit to the
data, and will be discussed later. The posterior distribution of the parameters is
proportional to
[p_0, η, π | Y] ∝ L(Y | p_0, η, π, C) q(p_0) q(η) q(π)             (3.11)
where q(·) denotes a prior distribution. For simplicity below, let θ = (η, p_0, π)
denote the complete vector of unknown parameters.
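The following Python sketch evaluates the log of (3.10) for a toy data set (it includes the binomial coefficients, which do not depend on the parameters and therefore only shift the log-likelihood by a constant). The mixing matrix W, the arrays M and N, and all parameter values are made up for illustration; they are not the Colorado data.

import numpy as np
from scipy.stats import binom

# Log of the likelihood in equation 3.10 for a toy data set.
def log_likelihood(M, N, p0, alpha, delta, W, pi, C=3):
    ll, p = 0.0, p0.copy()
    for t in range(M.shape[0]):
        p = alpha * (delta - p) * p * (1.0 - p) + W @ p     # equation 3.5
        big = N[t] > C                                      # prevalence regime
        small = (N[t] > 0) & (N[t] <= C)                    # opportunistic regime
        ll += binom.logpmf(M[t][big], N[t][big], p[big]).sum()
        ll += binom.logpmf(M[t][small], N[t][small], pi).sum()
    return ll

rng = np.random.default_rng(3)
gamma = 0.02
W = np.array([[1 - gamma, gamma, 0.0],
              [gamma, 1 - 2 * gamma, gamma],
              [0.0, gamma, 1 - gamma]])
N = rng.integers(0, 30, size=(5, 3))                 # 5 years, 3 DAUs
M = rng.binomial(N, 0.05)
print(log_likelihood(M, N, p0=np.array([0.02, 0.01, 0.005]),
                     alpha=2.0, delta=0.09, W=W, pi=0.1))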
This model has qualitative characteristics, through the deterministic differ-
ential equations, that are consistent with current knowledge and expectations of
CWD. It is also flexible enough to accommodate spatial mixing of prevalences
due to deer migration and possibly other factors. The model is formalized with
a Bayesian hierarchy, giving the added benefit of allowing for comparison of
different models, testing whether or not the disease is spreading spatially (if
γ > ω > 0) and whether or not the disease can coexist with the mule deer
population (if 0 < δ < 1). The Bayesian hierarchy also provides a method of
checking sampling schemes that will maximize information from data collected
in years to come [37].
Prior distributions for α, δ, and π were all independent Beta distributions,
since each parameter in the model is bounded on [0,1]. The prior distributions
for the different parameters were chosen to reflect the current understanding of
CWD researchers and the statewide, large scale observations of the disease [14].
Information for the prior distributions comes from current research and expert
knowledge that is indirectly based on this and other data. Consequently, the
modeling strategy is not a formal Bayesian model, but instead has an empirical
Bayesian flavor.
The overall level of the disease in the endemic area is currently believed to
be between 8-13% [14]. While the disease still appears to be spreading spatially,
in those areas that have been infected the longest the level of the disease seems
to have stabilized. Consequently, the prior for δ was chosen to be a Beta(8, 82),
which has a mean of 0.09, or 9%, and a standard deviation of 0.0298.
Currently, there is little knowledge or expectations for the acceleration pa-
rameter a. Detailed study of the dynamics of Chronic Wasting Disease is rel-
atively recent but is ongoing [31, 32, 27]. Consequently, a uniform density on
[0,1] was chosen as the prior distribution to reflect this lack of information.
Because the DAUs are defined so that deer will tend to remain within their
DAU, the migration parameters γ and ω are expected, a priori, to be small.
However, there is a lack of information available on the migration rate of the
disease [14]. Consequently, the prior distributions for γ and ω are chosen to be
Beta(1, 50) and Beta(1, 100) respectively, subject to the constraints that γ > ω
and that max_i{Σ_{j≠i} w_{ij}} < 1.
According to the Colorado Division of Wildlife, the disease was rare before
1990 [14]. CWD was unknown to non-experts in 1975 (t = 0), and no cases were
observed in wild populations prior to 1980. Consequently, the initial prevalence
values were expected to be extremely small. Additionally, because the disease
prevalence has only been observed increasing, those DAUs that are not currently
infected were not allowed to have the disease present at year zero. The prior
distributions for p_0 were therefore either Beta(1, 200) or a point mass at zero.
3.3 Finding a Good Model
The first task in choosing a particular model is to determine a reasonable
value for the cutoff C in the likelihood (3.10). The joint posterior distribution
can be factored so that the posterior distribution of π is independent of the
posterior for η and p_0.
For a given N, the binomial distribution which accounts for more data
variability than any other binomial model is the distribution with p = 0.5.
Figure 3.3 shows the proportion of deer that test positive by year and the total
number tested for the three DAUs which have been infected the longest. There is
a large amount of variability in the early data, where the number of deer tested is
small. As the number of deer tested increases, the variability in the data appears
to decrease. Thus, a search is conducted for the value of C that yields a cutoff
level that has, as close as possible, half of the observations diseased and half
not. By inspecting the data it is found that this corresponds to a cutoff value of
C = 3. Thus in DAU-years where 4 or more deer are tested for Chronic Wasting
Disease, the modeled prevalence will be from the updates given by equation 3.5.
Otherwise, the prevalence is modeled as Bin(N_{it}, π).
Figure 3.3: Proportion of deer which test positive by year for DAUs 4, 5, and
10. The total number tested is shown in the background, also by year. The
proportion which test positive decreases as the number tested increases, as does
the variability.
Given the parameters α, δ, γ, ω, and π, several model variations can be
formulated by determining how many, and which, of the elements of p0 are
non-zero. Because the data begins in 1976, the year 1975 is set as year 0.
The numbering scheme for the DAUs has nothing to do with Chronic Wasting
Disease, so it is necessary to impose an order on them that is sensible for CWD.
This is done by optimizing the posterior distribution over different arrangements
of DAUs, ranking them from the DAU likely to have the highest infection rate of
the disease down to the least. There are a total of 54 DAUs, so to reduce
the search space the focus is only on those DAUs in the endemic area where
the disease is currently found. This assumption is reasonable since, at least
presently, the disease is still spreading, making it unlikely that it would have
disappeared from an area. The optimal ordering for the DAUs in the endemic
area is 10, 4, 5, 3, 17, 9, 54, 27, and 44.
The best model given this ordering can be determined using Bayes Factors
[24]. Several sets of models are tested. The first set assumes that the disease
was present in only one DAU in 1975, and that all other DAUs had an initial
value of 0. This is accomplished by putting the prior distribution for p_{i,0} as a
point mass at zero for all DAUs except for the one of interest. The second set
of models assumes that two DAUs are infected at time 0 and all others are zero.
The pattern continues adding one DAU at a time, up to nine, the number that
are currently in the endemic region. Additionally, models were compared with
ω > 0 and ω = 0 to determine whether or not the second order migration term
was significant.
The different models are compared using Bayes factors. Bayes factors are
ratios of the marginal likelihood of the data under one model to the marginal
likelihood under another model. Hence:
BF(M_i, M_j) = ∫ L(Y | η, p_0, π, M_i) q(p_0, η, π | M_i) dp_0 dη dπ
               / ∫ L(Y | η, p_0, π, M_j) q(p_0, η, π | M_j) dp_0 dη dπ
where M_i indicates that the integration is with respect to the prior and likelihood
under model i [24]. Both the numerators and denominators of the Bayes factors
are approximated via a random sample from the prior distributions implied by the
two models M_i and M_j, respectively. For k = 1, ..., K, draw the appropriate
collection of parameters, denoted by θ_{ik}, from the prior distribution imposed
by M_i. Let ℓ_{ik} denote the log-likelihood implied by (3.10) evaluated at θ_{ik},
and let w_{ik} = exp(ℓ_{ik} − max_k{ℓ_{ik}}) and w̄_i = Σ_k w_{ik}/K. Then
BF(M_i, M_j) ≈ exp{ℓ_{i,max} − ℓ_{j,max}} · w̄_i / w̄_j.
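In code, the approximation is a log-sum-exp average of prior draws. The Python sketch below is generic: draw_prior and log_lik are user-supplied placeholders standing in for a model's prior simulator and the log of (3.10); the toy example at the end only checks that the machinery runs.

import numpy as np

# Prior-sample approximation to a marginal likelihood and Bayes factor.
# `draw_prior(rng)` returns one draw from a model's prior and `log_lik(theta)`
# returns the log-likelihood at those parameters; both are placeholders.
def log_marginal(draw_prior, log_lik, K, rng):
    ll = np.array([log_lik(draw_prior(rng)) for _ in range(K)])
    l_max = ll.max()
    w_bar = np.exp(ll - l_max).mean()       # shifted weights, as in the text
    return l_max + np.log(w_bar)            # log of (1/K) * sum_k exp(ll_k)

def bayes_factor(model_i, model_j, K=1000, seed=0):
    rng = np.random.default_rng(seed)
    log_mi = log_marginal(model_i["draw_prior"], model_i["log_lik"], K, rng)
    log_mj = log_marginal(model_j["draw_prior"], model_j["log_lik"], K, rng)
    return np.exp(log_mi - log_mj)

# Toy check: same normal likelihood, priors of different spread.
def toy(scale):
    return {"draw_prior": lambda rng: rng.normal(0.0, scale),
            "log_lik": lambda mu: -0.5 * (1.3 - mu) ** 2}
print(bayes_factor(toy(1.0), toy(10.0), K=5000))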
The Bayes factor gives a weighting of the relative skill of one model compared
with another, competing, model. Under this criterion, it is found that the model
with p_{4,0}, p_{5,0}, p_{10,0} > 0, with all other DAUs having zero starting values, and
ω = 0 is as or more effective than other models with the same number or more
parameters. The estimated Bayes factor was 1.18 when comparing this model
with a similar model that also included non-zero initial values for DAUs 3, 17,
and 9. The next smallest Bayes factor (3.55) was for the comparison with the
model that forced p_{5,0} = 0. The Bayes factors for comparison of the model with
p_{4,0}, p_{5,0}, p_{10,0} > 0 and ω = 0 and all remaining models were all greater than
100.
The results of the Bayes Factor comparisons are supported by the fact that
the first observed case of Chronic Wasting Disease occurred near Fort Collins,
Colorado, which is surrounded by DAUs 4, 5, and 10 [22]. These DAUs are
currently the heart of the endemic area. Given this prior knowledge of the
disease, it is unlikely that it would have begun anywhere else.
The particular model that is chosen is that with dynamic parameters
η = (α, δ, γ),
and the initial conditions
p_0 = (p_{10,0}, p_{4,0}, p_{5,0}),
so that θ = (η, p_0). Thus, it is necessary to find the posterior distribution of
[θ | Y], where Y denotes the data. Because this distribution does not have a
convenient or known form, computational methods are needed to simulate the
posterior distribution.
3.4 Accept/Reject Methods for the CWD Model
The posterior distribution for this model depends on many nonlinear rela-
tionships, which makes direct analysis difficult. Consequently, understanding
the posterior is most easily achieved using a simulated random sample. The
posterior distribution of the CWD model was first sampled in [37] using an
acceptance sampler. The results of this sample illustrate the characteristics
of the posterior distribution, and will help to determine if a more adaptable
Markov chain simulation is sampling from the correct density. Developing a
usable Markov chain simulation might allow for a more complete exploration of
the model by future researchers.
The Accept/Reject or acceptance method of sampling results in a random
sample from the desired distribution. Let f(x) be the target density. Let g(x)
be a distribution from which it is easy to sample, called the envelope density.
Further, suppose there exists a constant M such that f(x) < Mg(x), everywhere
on the support of the density f.
Algorithm 3.1 The basic Accept/Reject (A/R) algorithm is:
1. Generate a value X from the density g, and a value U from the uniform
density on [0,1].
2. If U < f(X)/(M g(X)), then accept the value X.
An accepted X is distributed according to the density f.
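A minimal one-dimensional illustration of Algorithm 3.1 in Python (with SciPy) follows. The Beta(3, 7) target, uniform envelope, and bound M are illustrative stand-ins, not the six-dimensional CWD posterior.

import numpy as np
from scipy.stats import beta

# Accept/Reject sampling (Algorithm 3.1) for a toy one-dimensional target.
target = beta(3.0, 7.0)              # f: pretend this is hard to sample
mode = (3.0 - 1.0) / (3.0 + 7.0 - 2.0)
M = target.pdf(mode)                 # f(x) <= M * g(x) with g = Uniform(0, 1)

def accept_reject(n, seed=4):
    rng = np.random.default_rng(seed)
    draws = []
    while len(draws) < n:
        x = rng.uniform()                        # step 1: X ~ g
        u = rng.uniform()                        # step 1: U ~ Uniform(0, 1)
        if u < target.pdf(x) / (M * 1.0):        # step 2: g(x) = 1 on [0, 1]
            draws.append(x)                      # accepted X has density f
    return np.array(draws)

sample = accept_reject(2000)
print(sample.mean(), target.mean())  # both should be close to 0.3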
The acceptance sampler can work extremely well, but its performance de-
pends on finding an envelope density g that closely matches the target density,
as well as knowledge of the constant M. Further, for properly normalized den-
sities, the average probability of accepting the value X in the algorithm 3.1 is
1/M, so that the expected number of iterations until a value is accepted is M
[64]. Thus to make the acceptance sampler efficient, the density g must closely
match the target density so that a sizable random sample can be generated in
a reasonable amount of time [64, 25].
Acceptance samplers have the benefit of generating an independent sample
exactly from the target distribution, and they can be applied to multivariate
densities. However, the problem of finding a suitable envelope density is often
prohibitive, especially in higher dimensions, although there are methods such as
Adaptive Rejection Sampling that can reduce this problem for certain kinds of
target densities. In addition, accept/reject samplers are difficult to modify, so
that slight changes in a model can require the construction of a new sampler.
There are a total of seven parameters in the CWD model. However, the sev-
enth parameter, π, is independent of all of the others. Its posterior distribution
has a known form, Beta(18.5, 19.5), so it does not need to be included as part of
the simulation, since it can be simulated directly. Hence only the remaining six
parameters, p_{10,0}, p_{4,0}, p_{5,0}, α, δ, and γ, must be sampled. Each parameter has a
beta distribution as its prior, and hence the probability density function is non-
zero only on the interval (0,1). Because the posterior distribution is known to
be confined to the six dimensional unit cube, a beta distribution seems a natural
starting point for the envelope distribution. Since there are six parameters, six
independent beta distributions are used with their parameters chosen so that the
modes of the beta distribution match the mode of the posterior distributions.
The mode is calculated by numerically maximizing the log of the posterior func-
tion, using the Nelder-Mead non-linear optimization method [59, 55] built in to
Matlab. The modes for the marginal posterior distributions in this model are
shown in table 3.3. The value of the log posterior distribution at the mode was
2731.9. The specific envelope distributions for the CWD acceptance sampler
were Beta{2.7,106) for the parameter pio.O) Beta(2.1,164.2) for the parameter
p4>0, Beta(0.7,310.7) for the parameter p5i0, Beta(2.5,15.6) for the parameter
a, Beta(9.4,54.4) for the parameter d, and Beta(27.4,3375.3) for the parameter
7. The results of the acceptance sampler are discussed in [37].
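As an illustration of the mode-matching step, the following sketch locates a posterior mode with a Nelder-Mead search; it uses SciPy rather than the Matlab routine cited above, and log_posterior is a hypothetical stand-in for the actual CWD log posterior, not the thesis code.

```python
import numpy as np
from scipy.optimize import minimize

def log_posterior(theta):
    # Hypothetical stand-in for the CWD log posterior: a product of beta
    # densities on (0,1)^6; outside the unit cube the density is zero.
    if np.any(theta <= 0.0) or np.any(theta >= 1.0):
        return -np.inf
    a = np.array([3.0, 2.0, 2.0, 4.0, 10.0, 25.0])
    b = np.array([100.0, 150.0, 300.0, 20.0, 50.0, 3000.0])
    return np.sum((a - 1) * np.log(theta) + (b - 1) * np.log(1 - theta))

theta0 = np.full(6, 0.05)                       # starting point inside (0,1)^6
result = minimize(lambda t: -log_posterior(t),  # maximize by minimizing the negative
                  theta0, method='Nelder-Mead')
posterior_mode = result.x
```

Given the marginal modes, the envelope Beta(a, b) parameters can then be chosen so that (a - 1)/(a + b - 2) equals the corresponding mode.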
The acceptance sampler was run using Matlab, version 6, on a machine with
a 2.2 GHz Pentium 4 processor with 2 GB of RAM. The sampler generated a ran-
dom value from the posterior roughly every 3 seconds. To generate a sample of
size 25,000 took nearly 24 hours. The slowness of the acceptance sampler makes
it impractical for studying variations of the model, or for applying to future
data sets. The long run time was likely due to the complicated nature of the
parameter space and the independence assumed in the envelope distributions.
Nevertheless, the random sample that was generated by the acceptance sampler
allows for examination of the characteristics of the joint posterior, and provides
a guide in designing a more efficient Markov chain sampler.
The 95% Bayesian credible intervals and the posterior estimates for the
six parameters based on this sample are shown in table 3.3. These summary
statistics will be used as an additional check for the Markov chain sampler.
Parameter    L. Limit   U. Limit    Mode     Mean   Median   Std. Dev.
p_{10,0}         1.67       5.63     2.51     3.35     3.24       1.01
p_{4,0}          0.82       3.17     1.27     1.77     1.69       0.61
p_{5,0}          0.07       0.69     0.22     0.31     0.28       0.16
α                6.67      22.13    14.08    13.01    12.51       3.98
δ               11.18      20.39    14.79    15.08    14.84       2.38
γ                0.62       1.09     0.81     0.84     0.83       0.12

Table 3.3: 95% Bayesian credible intervals, posterior estimates, and modes
for the marginal posterior densities simulated using the acceptance sampler. All
values have been multiplied by 100.
Scaled histograms of the variables p_{10,0}, p_{4,0}, p_{5,0}, α, δ, and γ from the pos-
terior sample are given in Figure 3.4. For comparison, the prior distribution is
shown with a solid line. The prior and histogram approximation of the posterior
have been scaled to highlight the differences between the two distributions. This
gives some indication of both the ignorance in the prior distributions and the
strength of the data in the model. Noting that the prior and posterior distribu-
tions are different for each variable, it is evident that the model is making use
of the information in the available data.
Figure 3.4: Comparisons of the posterior sample with prior distributions for pa-
rameters (a) p_{10,0}, (b) α, the acceleration parameter, (c) δ, the long-run prevalence,
and (d) γ, the average propensity to migrate. The prior and histogram approximation of
the posterior have been scaled to have maximum value of one in order to highlight
the difference between the two distributions.
Examination of the posterior distribution indicates that the parameters are
highly correlated. This correlation is demonstrated graphically in the pairwise
contour plots of the variables. Figure 3.5 shows the contour plot of the param-
eters α and δ. The points occur along a ridge of high probability. The pattern
indicates a strong negative interdependency between these two parameters. The
ridge has a "banana" shape that is the most pronounced feature of the pos-
terior distribution, and goes far in explaining the slowness of the acceptance
sampler based on independent beta distributions. Any successful Markov chain
simulation will need to reproduce this ridge.
The pairwise contour plots of the remaining variables are shown in figures 3.6
and 3.7. Each pair plot shows a pattern that indicates a relationship between
the two variables. The initial prevalence values p_{10,0}, p_{4,0}, and p_{5,0} show evidence
of positive correlation. The acceleration parameter α shows evidence of negative
correlation with all of the other variables, while δ and γ are positively correlated.
Calculating the correlation matrix of the A/R sample confirms the graphical
evidence of relationships amongst the six parameters. The correlation matrix is
shown in 3.13. The initial prevalence in DAU 10 is strongly correlated with the
other initial prevalences, with Corr(p_{10,0}, p_{4,0}) = 0.7214 being the high-
est correlation within this group of parameters. The other group of variables,
the dynamic parameters, also exhibits strong interdependence, although it is
clear from the graphical evidence in figure 3.5 that this relationship is nonlinear.
Figure 3.5: Contour plot of α by δ. Most of the values occur along a banana-
shaped ridge of high probability.
Figure 3.6: Contour plots of variable pairs for the acceptance sampler. Plot
(a) shows the graph of the acceleration α versus the movement parameter γ.
Plot (b) shows the graph of the long-term sustainable level δ versus γ. Plot
(c) shows the graph of α versus p_{10,0}, the initial prevalence in DAU ten. Plot
(d) shows the graph of p_{4,0} versus the acceleration α. These graphs indicate
correlation among the parameters.
Figure 3.7: Contour pair plots of the variables for the acceptance sampler.
Plot (a) shows the graph of the acceleration α versus the initial prevalence in
DAU five, p_{5,0}. Plot (b) shows the graph of the initial prevalences p_{10,0} and p_{4,0}.
Plot (c) shows the graph of p_{10,0} and p_{5,0}. Plot (d) shows the graph of p_{4,0} versus
p_{5,0}.
    (  1.0000   0.7214   0.5135  -0.4183   0.0461   0.0812 )
    (  0.7214   1.0000   0.4060  -0.4892   0.1321   0.2553 )
    (  0.5135   0.4060   1.0000  -0.4440   0.2194  -0.0426 )
    ( -0.4183  -0.4892  -0.4440   1.0000  -0.8582  -0.4340 )     (3.13)
    (  0.0461   0.1321   0.2194  -0.8582   1.0000   0.4406 )
    (  0.0812   0.2553  -0.0426  -0.4340   0.4406   1.0000 )
There appear to be two natural groups of parameters: the initial prevalence
values p_{10,0}, p_{4,0}, and p_{5,0}, and the dynamic parameters α, δ, and γ. There is
strong interdependence within each of these groups, and some evidence, based
on the pairwise graphs in figures 3.6 and 3.7, of a relationship between the
two groups. However, because the model is highly nonlinear, and because the
graphical evidence indicates that the relationships between the parameters are
nonlinear, a more general measure is needed. A canonical correlation analysis
shows the canonical correlation coefficient for the two groups to be 0.8134. This
is a strong indication that the two groups of variables are interdependent. This
interdependence will be an important consideration in the construction of a
Markov chain sampler for the CWD model.
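For reference, the leading canonical correlation between the two parameter groups can be computed from a posterior sample as in the following sketch; the array draws is a hypothetical placeholder for the accept/reject sample, not the actual output of the thesis code.

```python
import numpy as np

def first_canonical_correlation(X, Y):
    """Leading canonical correlation between column blocks X (n x p) and Y (n x q)."""
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    n = X.shape[0]
    Sxx, Syy, Sxy = X.T @ X / (n - 1), Y.T @ Y / (n - 1), X.T @ Y / (n - 1)
    Lx, Ly = np.linalg.cholesky(Sxx), np.linalg.cholesky(Syy)
    # Whiten each block, then take singular values of the cross-covariance.
    K = np.linalg.solve(Lx, Sxy) @ np.linalg.inv(Ly).T
    return np.linalg.svd(K, compute_uv=False)[0]

# Hypothetical placeholder for the posterior draws (n x 6, columns ordered
# p10, p4, p5, alpha, delta, gamma); replace with the actual A/R sample.
rng = np.random.default_rng(1)
draws = rng.multivariate_normal(np.zeros(6), np.eye(6), size=5000)
rho1 = first_canonical_correlation(draws[:, :3], draws[:, 3:])
```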
4. Markov Chain Monte Carlo Simulation for the CWD Model
For some distributions, qualities such as independence or conditional in-
dependence make direct sampling analytically tractable and computationally
feasible. However, more complex models, such as the CWD model in chapter 3,
can lead to situations where the target distribution does not have a recognizable
form, or has characteristics that make direct sampling difficult or impossible to
apply. When direct sampling is not possible, alternative methods such as the
acceptance sampler (see section 3.4) or Markov chain Monte Carlo (MCMC)
approaches must be applied. The complex form of the posterior distribution in
the Chronic Wasting Disease model (equation 3.11) requires the use of computa-
tional methods to sample from the posterior distribution. This is not an uncom-
mon situation for Bayesian hierarchical models, but in some cases the hierarchy
does allow for some computational simplifications. Analytical construction and
analysis of the posterior distribution is only possible in very simple hierarchical
models, such as the normal model discussed in [24]. The relative advantages
and disadvantages of some common simulation methods are discussed, and an
efficient Markov chain sampler is constructed.
4.1 Markov Chain Methods
A Markov chain is a sequence of random variables {X^(n)} such that:

[X^(n+1) | X^(1), ..., X^(n)] = [X^(n+1) | X^(n)].    (4.1)

For the Markov chain {X^(n)}, if there exists a distribution f such that X^(n) ~ f
implies X^(n+1) ~ f, then the distribution f is called the stationary distribution
for the Markov chain [64, 68]. The stationary distribution is a limiting distri-
bution, where convergence is with respect to the total variation norm, ||·||_TV.
The total variation norm between two distributions f_1 and f_2 is defined by:

||f_1 - f_2||_TV = sup_x |f_1(x) - f_2(x)|.    (4.2)

A sequence of distributions f_n converges to a limiting distribution f if

lim_{n→∞} ||f_n - f||_TV = 0.

A Markov chain is called ergodic if the convergence to a stationary distribution
does not depend on the initial state of the chain X^(0) [64]. The possible values
of the Markov chain are called the states of the chain. A Markov chain is called
reversible if it has the property

[X^(n+1) | X^(n) = x] = [X^(n+1) | X^(n+2) = x].    (4.3)
A Markov chain Monte Carlo (MCMC) method for simulating from a distri-
bution / is any method that produces an ergodic Markov chain, {X^}, with /
as the stationary distribution. MCMC methods do not produce an independent
sample exactly from the target density /. Instead, they produce a dependent
sample that is approximately distributed according to /. This may seem inferior
to accept/reject methods, which sample exactly from the target distribution, but
Markov chain methods possess a number of benefits that make their use advan-
tageous. First, it is usually easy to construct a Markov chain that converges,
at least theoretically, to the target distribution. Markov chain methods are
also easier to generalize and alter than their A/R counterparts, and therefore
facilitate the exploration of more complicated models.
The Metropolis-Hastings algorithm is a very general method for simulating
a Markov chain [64, 25]. Using virtually any proposal distribution, a Markov chain can
be constructed with the target density f as its stationary distribution.
Algorithm 4.1 Suppose that q(y|x) is a conditional density from which it is
easy to simulate. Given a current value x^(t), generate the next value in the
chain as follows.
1. Generate a value Y from the density q(y | x^(t)).
2. Compute the ratio α = [f(Y) q(x^(t) | Y)] / [f(x^(t)) q(Y | x^(t))].
3. Take x^(t+1) = Y with probability min(α, 1). Otherwise, x^(t+1) = x^(t).
Under broad regularity conditions, the algorithm 4.1 produces a Markov
chain with stationary distribution f. The convergence of algorithm 4.1 depends
on the relationship between the target density f and the conditional density
q. This relationship is given by theorem 4.1.1, taken from [64].
Theorem 4.1.1 Let S denote the support of the density f, and suppose that
S is connected. If the support of the conditional density q contains S, then the
Markov chain produced by the algorithm 4.1 has f as its stationary distribution.
There are many variations of the Metropolis-Hastings algorithm. One of the
most common is random walk Metropolis-Hastings, which is frequently called
the standard random walk algorithm [25, 64, 58], and is the terminology used in
this thesis. In the standard random walk algorithm, given the current state of
the chain, X^(t), the proposed value is generated by Y = X^(t) + ε_t, where ε_t has
distribution g. The conditional distribution q in algorithm 4.1 is equal to g(y - x),
so that q now has the symmetric property q(x|y) = q(y|x). In many applications,
g is a normal distribution. If the proposal density g and the target density
in the standard random walk satisfy the conditions of theorem 4.1.1, then
the standard random walk will converge in the limit to the correct stationary
distribution.
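A minimal sketch of the standard random walk with normal increments follows; because the proposal is symmetric, the ratio in step 2 of Algorithm 4.1 reduces to f(Y)/f(x^(t)). The correlated normal target is an arbitrary stand-in, not the CWD posterior.

```python
import numpy as np

rng = np.random.default_rng(2)

def random_walk_metropolis(log_f, x0, step_sd, n_iter):
    """Standard random walk: propose Y = X + eps with eps ~ N(0, step_sd^2 I);
    the symmetric proposal cancels in the acceptance ratio."""
    x = np.asarray(x0, dtype=float)
    logf_x = log_f(x)
    chain = np.empty((n_iter, x.size))
    for t in range(n_iter):
        y = x + step_sd * rng.standard_normal(x.size)
        logf_y = log_f(y)
        if np.log(rng.uniform()) < logf_y - logf_x:   # accept with prob min(1, f(y)/f(x))
            x, logf_x = y, logf_y
        chain[t] = x
    return chain

# Example: a correlated bivariate normal target (stand-in for a real posterior).
cov_inv = np.linalg.inv(np.array([[1.0, 0.9], [0.9, 1.0]]))
chain = random_walk_metropolis(lambda x: -0.5 * x @ cov_inv @ x,
                               x0=np.zeros(2), step_sd=0.5, n_iter=5000)
```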
The conditional density q in algorithm 4.1 is called the instrumental or
proposal density. Finding a proposal density is typically very easy compared
to finding a suitable envelope density for the acceptance sampler. Thus the
Metropolis-Hastings algorithm is, in theory, easy to apply. The resulting Markov
chain will converge in the limit to the desired stationary distribution. Moreover,
if the conditional density satisfies the conditions of theorem 4.1.1, then the
Markov chain produced by algorithm 4.1 is ergodic [64]. After a suitably large
number of iterations, T_0, called the burn-in period, the chain provides a good
approximation to the target distribution.
For multivariate densities, the components can be sampled individually,
and the target distribution constructed by means of the complete conditional
distributions, that is, the distribution of the component X_i given all of the
other components. Suppose that the random variable X has the desired mul-
tivariate density f, and that X can be written as X = (X_1, ..., X_n). Let
[X_i | x_1, ..., x_{i-1}, x_{i+1}, ..., x_n] denote the complete conditional distribution of X_i
given all of the other components of X.
Algorithm 4.2 Given X^(t) = (x_1^(t), ..., x_n^(t)), a Markov chain with stationary
distribution f is produced by the following algorithm:
1. Generate X_1^(t+1) ~ [X_1 | x_2^(t), ..., x_n^(t)].
2. Generate X_2^(t+1) ~ [X_2 | x_1^(t+1), x_3^(t), ..., x_n^(t)].
3. Repeat for each conditional until the last component:
   Generate X_n^(t+1) ~ [X_n | x_1^(t+1), ..., x_{n-1}^(t+1)].
The algorithm 4.2 is called the Gibbs sampler, and is a classic example
of a strategy called successive substitution sampling (SSS), which enables high
dimensional problems to be broken into a series of univariate or simple block
sampling problems. The Gibbs sampler is a special case of the algorithm 4.1
in which the proposed value is always accepted. The Gibbs sampler is typically
used when the complete conditional densities have a known form and are easy
to sample. In general, the blocks X_i may be either univariate or multivariate.
For problems in which the complete conditional distributions are unknown, or
do not have a simple form, the conditional densities in 4.2 may be sampled using
a Metropolis-Hastings step.
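As a concrete illustration of Algorithm 4.2, the following sketch runs a Gibbs sampler for a standard bivariate normal with correlation ρ, a textbook case in which both complete conditionals are normal; it is illustrative only and is not part of the CWD model.

```python
import numpy as np

rng = np.random.default_rng(3)

def gibbs_bivariate_normal(rho, n_iter, start=(0.0, 0.0)):
    """Gibbs sampler for a standard bivariate normal with correlation rho.
    Each complete conditional is normal: X1 | X2 = x2 ~ N(rho * x2, 1 - rho^2)."""
    x1, x2 = start
    cond_sd = np.sqrt(1.0 - rho ** 2)
    chain = np.empty((n_iter, 2))
    for t in range(n_iter):
        x1 = rng.normal(rho * x2, cond_sd)   # step 1: draw X1 given the current x2
        x2 = rng.normal(rho * x1, cond_sd)   # step 2: draw X2 given the new x1
        chain[t] = (x1, x2)
    return chain

sample = gibbs_bivariate_normal(rho=0.8, n_iter=10000)
```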
In practice, the burn-in period for a Markov chain sampler can be pro-
hibitively large. While building a Markov chain that theoretically converges is
relatively easy, building one that works in a reasonable amount of time may be
difficult. This difficulty is more common in higher dimensional problems where
the straightforward application of algorithm 4.2 by means of the complete con-
ditional distributions is impossible. Complex models may lead to situations
where the target distribution has characteristics that make the chains produced
by standard MCMC methods mix slowly, that is, the Markov chain does not
move rapidly about the support of the target distribution, which causes the
chain to converge slowly. This leads to a variety of techniques to accelerate con-
vergence of the chain. The application of these techniques often involves careful
tuning by trial and error.
Complex multivariate distributions can have a number of features that make
a density difficult to simulate. If the target distribution is multi-modal, has
sharp ridges of high probability, exhibits high correlation between individual
variables, or exists on a bounded parameter space, then constructing a Markov
chain that converges to the stationary distribution of interest in a reasonable
amount of time may be difficult. For multi-modal target distributions, the chain
can get stuck near one mode, oversampling near that mode and perhaps missing
others entirely. When a target distribution has sharp ridges of high probability,
a Markov chain can bypass these areas if the average step size, that is, the
distance between the current state X and the proposed future state Y, is
too large. The step size is determined by both the mean and the variance
of the proposal distribution. When there is high correlation between individual
variables, sampling each component separately is inefficient, since the acceptance
or rejection of one variable can have a large impact on the acceptance or rejection
of another. With bounded parameter spaces, care must be taken to not propose
outside of the parameter space too often, since too many rejected proposals will
slow down convergence of the chain.
There are a variety of techniques available to alleviate some of the possible
difficulties when sampling from multivariate distributions. When the variables
are highly correlated, reparameterization can remove some of the correlation.
Examples of regression models and random effect models where this approach
is especially effective are given in [25]. Implementation of a reparameterization
technique requires the identification of an appropriate transformation, which
may not be an easy task.
The technique of blocking seeks to utilize the correlation between parame-
ters to accelerate convergence. Blocking makes use of the correlation between
parameters in a multivariate setting by updating highly correlated variables to-
gether to increase the mixing speed of the chain, that is, the speed at which the
chain moves around the support of the target density [29]. This method can
be especially effective in hierarchical models with spatially structured data. A
discussion of how the blocking approach can be used when dealing with areal
data, i.e. data aggregated over discrete spatial regions, is found in [29, 6].
Hit and run samplers are a class of algorithms that have been designed
to simulate target distributions that exhibit signs of multi-modality [72, 11,
25]. The hit and run samplers are based on a generalization of the Metropolis-
Hastings algorithm [72, 11, 58], and use a proposal distribution that chooses a
direction at random, and then moves a random distance in that direction. The
difference between the hit and run method and the standard random walk is
that the new move is always accepted. The hit and run sampler can be effective
in sampling multi-modal target distributions, because the algorithm, with some
small probability, traverses the entire parameter space in a single move, which
allows mode-hopping when the proposed move is in the direction of another
mode. The hit and run algorithm has problems with mixing when the target
distribution has sharp ridges or spikes of high probability. The hit and run
algorithm may not pick out these areas of high probability sufficiently often,
causing the resultant chain to mix slowly. The problem of slow mixing becomes
even more significant in higher dimensions.
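The following sketch illustrates the direction-then-distance idea behind hit and run. Because sampling exactly from the target restricted to the chosen line (which is what makes the move always acceptable, as described above) is rarely tractable, this illustrative variant instead applies a Metropolis accept/reject step along the sampled direction.

```python
import numpy as np

rng = np.random.default_rng(5)

def hit_and_run_step(log_f, x, logf_x, dist_sd=1.0):
    """One direction-then-distance move: draw a uniformly random direction,
    a random signed distance along it, and accept with a Metropolis test."""
    d = rng.standard_normal(x.size)
    d /= np.linalg.norm(d)                 # uniform direction on the unit sphere
    r = dist_sd * rng.standard_normal()    # random signed distance along the line
    y = x + r * d
    logf_y = log_f(y)
    if np.log(rng.uniform()) < logf_y - logf_x:   # symmetric proposal: ratio is f(y)/f(x)
        return y, logf_y
    return x, logf_x

# Example: a few steps on a standard bivariate normal target.
log_f = lambda x: -0.5 * np.sum(x ** 2)
x, lf = np.zeros(2), 0.0
for _ in range(1000):
    x, lf = hit_and_run_step(log_f, x, lf)
```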
4.2 Stochastic Dynamics Methods
Stochastic dynamics methods are a class of algorithms motivated by stochas-
tic differential equations. Stochastic dynamics methods are commonly used to
simulate physical systems such as quantum molecular dynamics and heat trans-
fer. There are a number of Markov chain methods based on stochastic differential
equations, many of which are specifically designed for use in statistical physics,
although some have seen broader application [58]. The Langevin algorithm is
perhaps the most common, and is based on the discrete simulation of a contin-
uous time diffusion process [58, 25, 64]. Stochastic dynamics methods suppress
the random walk behaviour of a Markov chain, which has been shown to improve
convergence results for high dimensional target densities [28].
A general diffusion process X_t ≡ X(t) is a continuous time stochastic process
defined as the solution to the stochastic differential equation:

dX_t = μ(X_t) dt + σ(X_t) dB_t,    (4.4)

where B_t is standard Brownian motion [41]. In equation 4.4, μ(X_t) is called the
drift term or drift coefficient, and σ(X_t) is the diffusion term or diffusion coef-
ficient. For simple cases, a solution to 4.4 can be found using the Itô calculus,
which is derived in [13, 41]. If a solution to equation 4.4 does not exist, or X_t
tends to infinity in a finite amount of time, then the diffusion process is called
explosive [41].
If a target density f is continuously differentiable, and if for some real num-
bers N, a, b < ∞, (∇f(x))^T x <= a||x|| + b for all ||x|| > N, then a diffusion
process can be constructed that has f as its stationary distribution [41, 71, 66].
The only non-explosive, reversible diffusion with this property is the Langevin
diffusion [64], which has the form:

dX_t = (1/2) ∇ log f(X_t) dt + dB_t,    (4.5)

where B_t is standard Brownian motion.
A continuous time stochastic process cannot be directly simulated. Instead,
a discretized version of equation 4.5 must be used. The simplest such discretiza-
tion is the Euler discretization. If ω > 0 is the size of the discretization, then
the continuous time diffusion process 4.5 is approximated by a "smart" random
walk, with steps given by:

X_{t+1} = X_t + (ω²/2) ∇ log f(X_t) + ω ε_t,    (4.6)

where ε_t follows a multivariate normal distribution with mean 0 and covariance I.
The random walk in 4.6 is called smart because proposed values are generally
in directions of higher probability than the current state. Equation 4.6 can also
be written as:

X_{t+1} ~ N(X_t + (ω²/2) ∇ log f(X_t), ω² I).    (4.7)
The Euler discretization in 4.6 and 4.7 is a first order approximation to
the stochastic differential equation 4.5. Other methods based on higher order
approximations have been developed, notably the Ozaki discretization which
uses a second order approximation [67]. A variety of discretization schemes for
stochastic dynamics methods are discussed in [58].
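A minimal sketch of the Euler-discretized update 4.6 (the unadjusted Langevin chain) follows, assuming a differentiable target whose log-density gradient is available; the standard normal example is only a stand-in. As discussed next, the uncorrected chain targets f exactly only as the step size goes to zero.

```python
import numpy as np

rng = np.random.default_rng(6)

def ula_chain(grad_log_f, x0, omega, n_iter):
    """Unadjusted (Euler-discretized) Langevin chain, equation 4.6:
    X_{t+1} = X_t + (omega^2 / 2) grad log f(X_t) + omega * eps_t."""
    x = np.asarray(x0, dtype=float)
    chain = np.empty((n_iter, x.size))
    for t in range(n_iter):
        x = x + 0.5 * omega ** 2 * grad_log_f(x) + omega * rng.standard_normal(x.size)
        chain[t] = x
    return chain

# Example: standard bivariate normal target, for which grad log f(x) = -x.
chain = ula_chain(lambda x: -x, x0=np.zeros(2), omega=0.3, n_iter=5000)
```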
Unfortunately, the behaviour of the continuous time Markov process and
the discretized version may be radically different. This is because the Markov
chain based on stochastic process 4.5 samples from the correct distribution only
as the discretization size goes to zero [58]. Consequently, there are many situa-
tions where the random walks given by equations 4.6 and 4.7 can be transient,
that is, there exists a state X of the chain such that the expected number of
times the Markov chain visits this state is finite. A Markov chain that is not
transient is recurrent. Situations that lead to transience include those for which
the target distribution is insufficiently smooth, or erratic, so that the gradient
experiences large changes in a local area or has a large magnitude. This erratic
behaviour of the gradient makes the chain unstable. Many of these problems
can be corrected with a Metropolis-Hastings rejection step [5, 64, 25, 58]. This
method of correcting for the discretization is sometimes called Hybrid Monte
Carlo, Langevin Monte Carlo [58], or a Metropolis Adjusted Langevin Algorithm
(MALA) [71].
Algorithm 4.3 The MALA can be summarized as follows. Let the target dis-
tribution be denoted by f(x), and let μ(x) = (ω²/2) ∇ log f(x). Let a(x, y) be defined
by:

a(x, y) = [ f(y) exp(-||x - y - μ(y)||² / (2ω²)) ] / [ f(x) exp(-||y - x - μ(x)||² / (2ω²)) ].    (4.8)

Given the current state X_t, the Markov chain is simulated by:
1. Generate Y ~ N(X_t + (ω²/2) ∇ log f(X_t), ω² I).
2. Find min(a(X_t, Y), 1).
3. With probability min(a, 1), set X_{t+1} = Y. Otherwise, set X_{t+1} = X_t.
The proposal value is kept with probability min(a, 1). The rejection step 4.8
turns the Langevin algorithm into a Metropolis-Hastings algorithm with a smart
proposal [25, 64]. The conditional density in the Metropolis-Hastings algo-
rithm 4.1 for the MALA is q(x|y) ∝ N(y + μ(y), ω²I), where the form of the
conditional density in 4.8 comes from the Brownian motion in the stochastic
differential equations 4.4 and 4.5. The key difference of the Metropolis adjusted
Langevin algorithm from the standard random walk is that the MALA sup-
presses the random walk behaviour of the chain by using the local properties
of the target density (i.e., ∇ log f(X_t)) to adjust the proposal distribution [28].
This nudges the chain "uphill" in the direction of a mode, which is a highly
desirable property in high dimensional problems where the behaviour of the
gradient is smooth [71, 64, 25, 67].
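A sketch of Algorithm 4.3 under the same conventions as equations 4.6 to 4.8, with drift μ(x) = (ω²/2)∇log f(x), is given below; the bivariate normal target and its gradient are stand-ins for illustration, not the CWD posterior.

```python
import numpy as np

rng = np.random.default_rng(7)

def mala(log_f, grad_log_f, x0, omega, n_iter):
    """Metropolis-adjusted Langevin algorithm in the style of Algorithm 4.3."""
    def mu(x):
        return 0.5 * omega ** 2 * grad_log_f(x)      # drift toward higher density

    x = np.asarray(x0, dtype=float)
    chain = np.empty((n_iter, x.size))
    for t in range(n_iter):
        y = x + mu(x) + omega * rng.standard_normal(x.size)   # "smart" proposal
        # log of the ratio a(x, y) in 4.8
        log_a = (log_f(y) - log_f(x)
                 - np.sum((x - y - mu(y)) ** 2) / (2.0 * omega ** 2)
                 + np.sum((y - x - mu(x)) ** 2) / (2.0 * omega ** 2))
        if np.log(rng.uniform()) < log_a:
            x = y
        chain[t] = x
    return chain

# Example: the same correlated bivariate normal target used above.
cov_inv = np.linalg.inv(np.array([[1.0, 0.9], [0.9, 1.0]]))
chain = mala(lambda x: -0.5 * x @ cov_inv @ x, lambda x: -cov_inv @ x,
             np.zeros(2), omega=0.4, n_iter=5000)
```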
The chain 4.3 will tend to propose values that are in the direction of the
mode. Because algorithm 4.3 is a special case of the Metropolis Hastings algo-
rithm, convergence of the chain relies on the properties of algorithm 4.1, and not
those of the Langevin diffusion process. Strategies such as reparameterization
and blocking can be employed with 4.3 to increase the mixing speed. The sim-
ulation can be further adjusted for erratic behaviour of the gradient by using a
truncated proposal step to achieve more robust ergodic properties [5, 71]. This
truncation comes through the imposition of an upper limit on the step length
in the proposal distribution [25, 65].
The MALA has been shown to perform as well as or better than the standard
random walk in many settings [66, 67, 71]. As with other kinds of Metropolis-
Hastings simulations, convergence properties depend largely on the smoothness
of the target distribution. One situation where the MALA works especially well
is when the target distribution is from the exponential family of distributions. In
situations where the target distribution is discontinuous or is not differentiable
at some points, smoothed proposals can be employed [71].
A recent variation of the Langevin algorithm is the tempered Langevin algo-
rithm [67]. The standard MALA approach uses a fixed variance coefficient (ω
in equations 4.6 and 4.7). A more general diffusion process can be used to get
another variation of the Langevin algorithm. Note that equation 4.5 is a special
case of equation 4.4, where the drift term is μ(X_t) = (1/2) ∇ log f(X_t), and the
diffusion coefficient is the identity. If a more general diffusion term is used in
4.4, then the tempered Langevin diffusion is obtained [67].
If the diffusion coefficient is σ(x), then the diffusion matrix is given by the
positive definite matrix a(x) = σ(x) σ^T(x). Let 0 <= d <= 1/2. Choosing σ(x) =
f^{-d}(x) I, so that a(x) = f^{-2d}(x) I, and setting the drift term to be μ(x) = ((1 - 2d)/2) a(x) ∇ log f(x),
gives the tempered Langevin diffusion [67]. The tempered diffusion is the solu-
tion to the stochastic differential equation:

dX_t = ((1 - 2d)/2) f^{-2d}(X_t) ∇ log f(X_t) dt + f^{-d}(X_t) I dB_t.    (4.9)
The diffusion in equation 4.9 is a more general form of the Langevin diffusion
4.5 and is sometimes called a heated Markov chain. The two special cases of
equation 4.9 that are important are when d = 0, which implies the standard
Langevin diffusion, and d = 1/2, which gives Brownian motion [67]. Choosing
different values of d gives different heated chains.
Heated Markov chains have stronger theoretical convergence properties than
unheated chains, since the diffusion matrix acts as an accelerator [67]. The
accelerator can alleviate some of the problems encountered with multi-modal
target distributions. In areas of low probability, the variance a(x) = f^{-2d}(x) I
will be larger, causing the chain to take larger steps in those areas. As the chain
enters areas of high probability, a(x) will decrease, leading to a smaller step size.
The discretized version of the diffusion 4.9 is given by:

X_{t+1} ~ N(X_t + ((1 - 2d)/2) f^{-2d}(X_t) ∇ log f(X_t), f^{-2d}(X_t) I).    (4.10)
Using a rejection step gives a Metropolis algorithm. To adjust the probability 4.8
for the tempered diffusion, set ω² = a(X_t), which gives:

a(X_t, Y_t) = [ f(Y_t) exp(-||X_t - Y_t - μ(Y_t)||² / (2a(Y_t))) ] / [ f(X_t) exp(-||Y_t - X_t - μ(X_t)||² / (2a(X_t))) ],    (4.11)

where μ(x) = ((1 - 2d)/2) a(x) ∇ log f(x) and a(X_t) = f^{-2d}(X_t).
The tempered Langevin algorithm, like the standard Langevin algorithm 4.3,
is a random walk that makes smart jumps. That is, the chain is likely to
propose values that are in the direction of an area of high probability.
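A sketch of the tempered variant under the conventions of equations 4.9 to 4.11 follows, with state-dependent step variance a(x) = f^{-2d}(x). The code also includes the Gaussian normalizing factors of the state-dependent proposals in the acceptance ratio, a detail suppressed in the displayed formula; the target and the choice d = 0.1 are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(8)

def tempered_mala(log_f, grad_log_f, x0, d, n_iter):
    """Tempered Langevin sampler: step variance a(x) = f(x)^(-2d) and drift
    mu(x) = ((1 - 2d)/2) a(x) grad log f(x), with a Metropolis correction."""
    def a(x):
        return np.exp(-2.0 * d * log_f(x))          # f(x)^(-2d)

    def mu(x):
        return 0.5 * (1.0 - 2.0 * d) * a(x) * grad_log_f(x)

    x = np.asarray(x0, dtype=float)
    chain = np.empty((n_iter, x.size))
    for t in range(n_iter):
        ax = a(x)
        y = x + mu(x) + np.sqrt(ax) * rng.standard_normal(x.size)
        ay = a(y)
        # Log acceptance ratio; the first line is the Gaussian normalizing
        # factor of the state-dependent proposals (suppressed in 4.11).
        log_r = (0.5 * x.size * (np.log(ax) - np.log(ay))
                 + log_f(y) - log_f(x)
                 - np.sum((x - y - mu(y)) ** 2) / (2.0 * ay)
                 + np.sum((y - x - mu(x)) ** 2) / (2.0 * ax))
        if np.log(rng.uniform()) < log_r:
            x = y
        chain[t] = x
    return chain

# Example: standard bivariate normal target with mild tempering d = 0.1.
chain = tempered_mala(lambda x: -0.5 * np.sum(x ** 2), lambda x: -x,
                      np.zeros(2), d=0.1, n_iter=5000)
```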
Blocking, hit and run algorithms, Langevin algorithms, and tempered
Langevin algorithms are only a subset of the many methods for sampling from
complicated distributions. When more than a couple of difficulties are encoun-
tered, generating a sample can require the innovative use of several of these or
other approaches.
The joint posterior distribution produced by the Chronic Wasting Disease
model presented in chapter 3 has a number of characteristics that make the
implementation of a Markov chain simulation difficult. Firstly, the posterior
distribution exists on a bounded parameter space. Secondly, calculation of the
gradient at various points of the parameter space indicates that the surface of
the density function contains sharp ridges of high probability.
The sharp ridges of high probability mean that a hit and run type algorithm
could miss the area of high probability, thereby over-sampling regions where
the density is comparatively low. Langevin algorithms also have difficulty in
this situation, because the steepness of the density causes the magnitude of the
gradient to be extremely large in areas of moderate probability, a common source
of instability in Langevin algorithms. The difficulties are compounded with the
bounded parameter space, since proposals are likely to fall outside the support
of the parameter space. When using a normal proposal distribution, as with the
standard Langevin algorithm, too many proposals outside the parameter space
will cause the chain to mix slowly.
One possible method of dealing with this problem is to use a proposal distri-
bution in algorithm 4.3 with the same support as the parameter space. However,
even using a different proposal distribution, such as a truncated normal distri-
bution on [0,1], can slow the mixing of the Markov chain, because the potential
asymmetry of the truncated normal can cause difficulties for the standard ran-
dom walk or MALA. Additionally, the convergence theorems for the stochastic
differential equation 4.5 no longer apply, since the noise term is no longer a
Brownian motion.
4.3 A Tempered Langevin Algorithm for Bounded Densities
The posterior sample produced by the acceptance sampler in [37] and dis-
cussed in chapter 3 illustrates the complicated nature of the posterior distribu-
tion 3.11. The largest issue is the fact that the support of the posterior density
exists on a bounded parameter space. All of the parameters, p_{10,0}, p_{4,0}, p_{5,0},
α, δ, and γ, exist only within the unit interval, whereas standard approaches