SIMULATIONS OF PROSTATE
BIOPSY METHODS
by
Catherine Colby Pellish
B.S.E.E., Marquette University, 1985
A thesis submitted to the
University of Colorado at Denver
in partial fulfillment
of the requirements for the degree of
Master of Science
Applied Mathematics
1997
This thesis for the Master of Science
degree by
Catherine Pellish
has been approved
by
William L. Briggs
James E. Koehler
Weldon A. Lodwick
Date
Pellish, Catherine Colby (M.S., Applied Mathematics)
Simulations of Prostate Biopsy Methods
Thesis directed by Associate Professor William L. Briggs
Abstract
An accepted practice in screening for prostate cancer involves a needle
core biopsy of the prostate gland, which can provide information regarding
whether, and how much, cancer is present in a gland. This paper documents several
investigations into prostate gland biopsy techniques. The first phase of study
involves a geometric model of a prostate gland containing one to three tumors.
This mathematical model of the gland is then used to simulate various
biopsy techniques and compare the resulting data. Secondly, the best biopsy
procedure, as determined from the geometric model, is simulated on actual
specimen data which have been digitized. These specimen data are also used
for simulation of the six random systematic core biopsy technique (SRSCB)
currently in clinical use. The results of the geometric model are compared
to the results of the simulation on actual data. Finally, the geometric model
is used in another series of simulations that investigate the number of needle
samples needed to estimate the tumor to gland volume ratio.
This abstract accurately represents the content of the candidate's thesis. I
recommend its publication.
Signed ______________________
William L. Briggs
ACKNOWLEDGEMENTS
I would like to sincerely thank a number of people who consistently
provided me with their support, encouragement and guidance as I pursued the
completion of this thesis. Dr. Bill Briggs, my advisor, served as a constant
source of insight and motivation, as well as providing considerable direction
throughout this process. I am also grateful for the time spent with Dr. Jim
Koehler who had to teach me the finer points of statistics again and again.
My thanks to both of these professors for proving to be excellent academic
sources. I also would like to thank Norm LeMay who, out of the generosity
of his heart and his need for a free lunch, assisted me in running the ANOVA
analysis which this thesis required.
Finally, I must thank my family, Mark, Eric and Corinne, for encouraging
me and making me laugh through every crisis.
CONTENTS
Chapter
1 Introduction
  1.1 Clinical Prostate Biopsy Analysis
  1.2 Summary of Mathematical Methods
2 The Geometric Model
  2.1 Geometric Model of gland and tumor
  2.2 Simulations
  2.3 Statistical Analysis of Results
  2.4 Simulation Results
    2.4.1 Applying the ANOVA to the Biopsy Simulation Data
    2.4.2 ANOVA Mechanics
    2.4.3 Residuals
    2.4.4 The Null and Alternate Hypotheses
    2.4.5 Are the Main Effects all Equal?
    2.4.6 Recognizing Interaction between Factors
    2.4.7 Clinical Distribution of Tumors
3 Digitized Specimen Data
  3.1 Summary of Software Tool
  3.2 Specific Algorithms
    3.2.1 Locating the Apex
    3.2.2 Establishing Needle Positions
  3.3 Simulations
  3.4 Geometric Model vs Clinical Model
  3.5 Optimal Technique vs SRSCB
4 Geometric Model Volume Estimates
  4.1 Tumor Volume Estimates
    4.1.1 One-Dimensional Analysis Line Model
    4.1.2 Two-Dimensional Strip Model
    4.1.3 Three-Dimensional Cylinder Model
  4.2 Experiment Setup
  4.3 Results
  4.4 Interactive Utility
Appendix
A ANOVA Definitions
1. Introduction
1.1 Clinical Prostate Biopsy Analysis
Currently the standard method of determining whether a given prostate
gland is cancerous involves two procedures. The first is the prostate-specific
antigen (PSA) test, which measures the level of antigens in the patient's blood,
a high level indicating a higher possibility of cancerous tissue. The second
procedure is the needle biopsy, which is carried out if the PSA test so indicates.
The clinician conducts this biopsy by inserting a needle-tool, equipped with
ultrasound capabilities, into the patient's rectum. The gland is located and
the urologist fires three needles into the right lobe of the gland and three
needles into the left lobe at approximately symmetric positions. The left-right
division of the gland is determined by the position of the urethra in the
gland. This physical landmark is used as the visual dividing line, enabling
clinicians to execute the biopsy in a systematic manner. The needle-tool is
rotated to the left or right depending on the targeted lobe. This rotation
corresponds to the angle $\phi$. After this slight rotation, the needles are
inserted at a second independent angle, referred to as $\theta$. The choice of a
six-needle biopsy is based on the six random systematic core biopsies (SRSCB)
method developed by Hodge et al. [1] and currently thought to achieve the
best detection rates.
The results from this diagnostic biopsy are then analyzed in order to
determine the best treatment plan for the patient. There are several factors
that help the urologist choose the optimal treatment plan. The first factor is
obviously whether the biopsy shows any tumor cells at all. According to the
Hodge study, 96% of the 83 men diagnosed with cancer had the cancer detected
by SRSCB. However, as investigated by Daneshgari et al. [2], in prostate glands
with low tumor volume, the SRSCB fails to achieve such a high percentage of
detection. This study concluded that an improved biopsy strategy may be
needed in detection of CaP (carcinoma of the prostate) in patients with low
volume cancer. Secondly, the volume of the tumor itself is a deciding factor
in determining treatment. Thirdly, the location of the tumor, specifically if
the tumor penetrates the capsule of the gland, can define a specific treatment
plan. Some of this information is available from a single needle-core biopsy;
more information is gleaned from successive, strategically placed biopsies.
1.2 Summary of Mathematical Methods
As an aid in understanding this problem, as well as researching ways
to improve diagnosis, two methods of analysis are undertaken. The first
method relies on a geometric model of the prostate gland containing one to
three tumors. Various biopsy methods are simulated with this mathematical
model and results are tabulated. The second method involves running the
same biopsy simulations on actual prostate glands which have been digitized
and stored as three-dimensional objects in a computer. The experimental results
from these two methods are then compared. All of the simulations were
executed using software created for this purpose primarily by this author,
although the skeletons of these software tools were engineered during the Spring
1995 Math Clinic on this topic by several participants. The simulations are
written in C and C++, running on a UNIX-based computer. They are extensively
documented and flexible enough to be useful in a variety of experiments
within this realm of research.
2. The Geometric Model
2.1 Geometric Model of gland and tumor
An actual prostate gland is about the size of a walnut with volumes
ranging from 22 cc to 61 cc [3]. The geometry of an ellipsoid closely models
this gland and any tumors present within it. Therefore, an ellipsoid of the
form
$$\frac{x^2}{A^2} + \frac{y^2}{B^2} + \frac{z^2}{C^2} = 1$$
is used to represent the prostate gland. Ellipsoids are also used to represent
each of the tumors. The dimensions of the gland, A, B, and C, are chosen
randomly in the following experimentally determined ranges:
3.0 cm < A < 4.8 cm
3.8 cm < B < 4.6 cm
3.8 cm < C < 5.2 cm
22 cc < (gland volume) < 61 cc.
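The selection of gland dimensions can be sketched as follows. This is a minimal illustration, not the thesis software (which was written in C and C++): the uniform distribution, the function names and the rejection loop are assumptions. The stated volume bounds are reproduced at the extremes of the ranges by the formula pi*A*B*C/6 (about 22.7 cc and 60.1 cc), so the sketch treats A, B and C as full axis lengths; that reading is also an assumption.

```python
import math
import random

# Ranges quoted in the text, in cm. Uniform sampling is an assumption.
A_RANGE = (3.0, 4.8)
B_RANGE = (3.8, 4.6)
C_RANGE = (3.8, 5.2)
VOLUME_RANGE = (22.0, 61.0)  # cc


def gland_volume(a, b, c):
    """Ellipsoid volume, treating a, b, c as full axis lengths (assumption)."""
    return math.pi * a * b * c / 6.0


def sample_gland(rng):
    """Draw (A, B, C) until the volume constraint is met (rejection sampling)."""
    while True:
        a = rng.uniform(*A_RANGE)
        b = rng.uniform(*B_RANGE)
        c = rng.uniform(*C_RANGE)
        if VOLUME_RANGE[0] < gland_volume(a, b, c) < VOLUME_RANGE[1]:
            return a, b, c


if __name__ == "__main__":
    rng = random.Random(0)  # arbitrary seed, for repeatability only
    a, b, c = sample_gland(rng)
    print(f"A={a:.2f} cm, B={b:.2f} cm, C={c:.2f} cm, "
          f"volume={gland_volume(a, b, c):.1f} cc")
```

Under this reading every draw from the three ranges already satisfies the volume constraint, so the rejection loop acts only as a safeguard.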
The prostate is divided into 3 zones: the peripheral, the central and
the transition region. The peripheral zone comprises approximately 70% of
the mass of the prostate gland. It is located in the lower area of the gland,
closest to the rectum. This region is the site of origin of most carcinomas [3].
The central region makes up approximately 25% of the glandular mass and
is resistant to both carcinoma and inflammation [3]. The transition region
contains the remaining 5% of prostate gland tissue and can be the site of some
cancers. Figure 2.1 shows these regions of the prostate gland. Based on this
clinical information, the software-generated tumors are located in the lower
part of the elliptical gland model to simulate tumors residing in the peripheral
zone. Figure 2.2 depicts the geometrical gland and tumor model in the xyz
system. Since the gland model is centered at the origin, the y-coordinate of
the tumor center, $y_c$, is always negative in order to place the tumor in the
peripheral zone of the gland. However, other distributions of $y_c$ could be used
to improve the model.
Tumors are modeled by an equation of the form
$$\frac{(x - x_c)^2}{a^2} + \frac{(y - y_c)^2}{b^2} + \frac{(z - z_c)^2}{c^2} = 1,$$
where $x_c$, $y_c$ and $z_c$ specify the center of the tumor.
The biopsy needle is modeled as a line with the parametric equations
$$x(t) = x_0 + t \sin\theta \sin\phi$$
$$y(t) = y_0 + t \sin\theta \cos\phi$$
$$z(t) = z_0 + t \cos\theta,$$
Figure 2.1. The peripheral (PZ), central (CZ) and transition (TZ)
regions divide the prostate gland into 3 major zones.
Figure 2.2. The gland and tumor are modeled by ellipsoids
in the xyz coordinate system.
where $x_0$, $y_0$ and $z_0$ are the coordinates of the entry point of the needles
(Figure 2.3 and Figure 2.4). The angle $\phi$, measured in the xy plane,
determines a plane. The angle $\theta$ is then assumed to remain in this plane and
is measured from the z-axis. From these definitions, the parametric equations
for the line are determined. The parameter t measures the length of the needle.
Figure 2.3. This figure of the xy plane and needle illustrates
measurement of $\phi$.
Substituting the parametric equations of the needle into the equation for the
tumor, it is possible to determine values of t corresponding to an intersection.
The equation of the tumor is
$$\frac{(x(t) - x_c)^2}{a^2} + \frac{(y(t) - y_c)^2}{b^2} + \frac{(z(t) - z_c)^2}{c^2} = 1.$$
Figure 2.4. This figure of the yz plane and needle illustrates
measurement of $\theta$.
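Replacing x(t), y(t) and z(t) by the parametric equations of the needle
gives
$$t^2\left(\frac{\sin^2\theta \sin^2\phi}{a^2} + \frac{\sin^2\theta \cos^2\phi}{b^2} + \frac{\cos^2\theta}{c^2}\right)
+ t\left(\frac{2(x_0 - x_c)\sin\phi \sin\theta}{a^2} + \frac{2(y_0 - y_c)\sin\theta \cos\phi}{b^2} + \frac{2(z_0 - z_c)\cos\theta}{c^2}\right)
+ \left(\frac{(x_0 - x_c)^2}{a^2} + \frac{(y_0 - y_c)^2}{b^2} + \frac{(z_0 - z_c)^2}{c^2}\right) = 1. \qquad (2.1)$$
Writing equation (2.1) as $A't^2 + B't + C' = 0$, two real roots exist if the
discriminant $B'^2 - 4A'C'$ is positive. In this case we have
$$A' = \frac{\sin^2\theta \sin^2\phi}{a^2} + \frac{\sin^2\theta \cos^2\phi}{b^2} + \frac{\cos^2\theta}{c^2}$$
$$B' = \frac{2(x_0 - x_c)\sin\phi \sin\theta}{a^2} + \frac{2(y_0 - y_c)\sin\theta \cos\phi}{b^2} + \frac{2(z_0 - z_c)\cos\theta}{c^2}$$
$$C' = \frac{(x_0 - x_c)^2}{a^2} + \frac{(y_0 - y_c)^2}{b^2} + \frac{(z_0 - z_c)^2}{c^2} - 1.$$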
If real roots $t_1$ and $t_2$ exist, they give the points where the tumor
ellipsoid and the line intersect. If these values are greater than 0 and less than
the actual needle length, the needle has intersected the tumor. The amount
of tumor extracted by the needle is proportional to the difference between the
two roots of the quadratic, $|t_2 - t_1|$. By comparing the two roots, an estimate
of the volume of the tumor that is contained in the needle can be made. If
real roots do not exist, the needle does not intersect the tumor ellipsoid and
no tumor information is gained by that needle.
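The intersection test described above can be sketched as follows. This is an illustrative sketch, not the thesis code: the function names are hypothetical, the angles are assumed to be given in degrees, and the clipping of the roots to the interval from 0 to the needle length is an added convenience for needles that only partly traverse the tumor.

```python
import math


def needle_direction(theta_deg, phi_deg):
    """Unit direction of the needle from the parametric equations (degrees assumed)."""
    th, ph = math.radians(theta_deg), math.radians(phi_deg)
    return (math.sin(th) * math.sin(ph), math.sin(th) * math.cos(ph), math.cos(th))


def tumor_chord(entry, theta_deg, phi_deg, center, semi_axes, needle_length):
    """Length of needle path inside the tumor ellipsoid, or 0.0 if it is missed."""
    d = needle_direction(theta_deg, phi_deg)
    # Coefficients of the quadratic in t obtained by substituting the line
    # into the ellipsoid equation and moving the 1 to the left-hand side.
    qa = sum((di / s) ** 2 for di, s in zip(d, semi_axes))
    qb = sum(2.0 * (e - c) * di / s ** 2
             for e, c, di, s in zip(entry, center, d, semi_axes))
    qc = sum(((e - c) / s) ** 2 for e, c, s in zip(entry, center, semi_axes)) - 1.0
    disc = qb * qb - 4.0 * qa * qc
    if disc <= 0.0:
        return 0.0  # no real roots (or a tangent touch): nothing is sampled
    root = math.sqrt(disc)
    t1 = (-qb - root) / (2.0 * qa)
    t2 = (-qb + root) / (2.0 * qa)
    # Keep only the part of the chord that lies on the physical needle.
    lo = max(t1, 0.0)
    hi = min(t2, needle_length)
    return max(hi - lo, 0.0)


if __name__ == "__main__":
    # A unit-sphere "tumor" centred 2 cm along the z-axis, needle fired along z.
    print(tumor_chord((0.0, 0.0, 0.0), 0.0, 0.0, (0.0, 0.0, 2.0), (1.0, 1.0, 1.0), 5.0))
```

A detection is then simply a positive return value for at least one needle of the biopsy, and the chord length is the quantity the text describes as proportional to the amount of tumor extracted.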
In this analysis, each biopsy procedure was simulated on 1000 different
gland models and the number of times a tumor was detected per procedure
was recorded. This method does not differentiate between one needle and
several needles detecting the tumor. It simply records a hit or miss per
biopsy procedure. In addition, an estimate of the tumor volume is made
whenever a tumor is detected.
2.2 Simulations
Since a fundamental goal of any biopsy is to determine whether or
not the gland contains cancerous cells, the first series of simulations is intended
to compare the detection rate of several biopsy techniques. The detection rate
is defined as the ratio of the number of times a biopsy procedure detects a
tumor to the total number of biopsies conducted. A set of 54 different biopsy
procedures is simulated with variation in the following parameters: number of
needles, offset between needles in the z direction, $\theta$, and $\phi$.
The distance in the z direction between needles can be a relative
spacing based on the gland dimension in the z direction or an absolute spacing
of 1 cm between each needle. The first method is referred to as relative
spacing since it depends on the gland size and separates the needles by equal
distance. The second is referred to as the absolute spacing and has its basis
in the SRSCB procedure.
As a means of clarification, Figures 2.5 and 2.6 illustrate the analysis
of a single specimen and the execution of the entire experiment. Each of the
54 biopsy procedures is simulated on 1000 different gland models. The random
number generator is seeded once for each series of 1000 simulations using a
specific biopsy technique. Prior to the next technique, the random number
generator is reseeded with the same number, thereby yielding the identical set
of 1000 prostate models. This ensures that each of the biopsies is conducted
on the same set of 1000 simulated glands. The detection rate is determined
for each of these procedures and the results of the simulation are documented
in Table 2.1.
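The reseeding scheme can be sketched as follows. The seed value, the function names and the `detect` callables are hypothetical, and the gland generator is reduced to three uniform draws for brevity; the point is only that constructing the generator from the same seed before each technique reproduces the same 1000 glands.

```python
import random


def generate_glands(seed, count):
    """Rebuild the same list of gland dimensions from a fixed seed."""
    rng = random.Random(seed)
    return [
        (rng.uniform(3.0, 4.8), rng.uniform(3.8, 4.6), rng.uniform(3.8, 5.2))
        for _ in range(count)
    ]


def run_all(techniques, seed=1997, count=1000):
    """Return the detection rate of each technique on an identical gland set.

    `techniques` maps a name to a callable that takes one gland and returns
    True when that biopsy technique detects the tumor (hypothetical interface).
    """
    results = {}
    for name, detect in techniques.items():
        glands = generate_glands(seed, count)  # reseeding gives identical glands
        hits = sum(1 for gland in glands if detect(gland))
        results[name] = hits / count
    return results


if __name__ == "__main__":
    # Placeholder techniques, used only to show the call pattern.
    print(run_all({"always": lambda g: True, "never": lambda g: False}))
```

Because every technique sees the same specimens, differences in detection rate reflect the biopsy settings rather than differences between randomly generated glands.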
Figure 2.5. This flow chart depicts the top-level algorithm
for modeling a single biopsy with several needles.
Figure 2.6. This flow chart depicts the process for the entire
simulation; each biopsy procedure is simulated on 1000
geometric gland models.
2.3 Statistical Analysis of Results
In order to interpret the output from the simulations legitimately, a
statistical tool is needed. First, we must determine whether or not the various
biopsy settings influence the observed detection rate. In other words, is there a
relationship between the settings of any one or combination of the four factors
(number of needles, z-spacing, $\theta$ and $\phi$) and the detection rate, or are the
results completely random, implying that the biopsy specification does not
determine the detection rate? We need a mathematically sound method to
compare the detection rates provided by the simulation and to infer some
conclusions. The statistical model known as Analysis of Variance (ANOVA)
was used to compare the population means between various treatments, thus
resulting in a statistically valid conclusion. This model can be employed to
determine whether the various factors interact and which factors have the most
impact on the outcome.
In order to describe the ANOVA model, a few definitions are required.
(1) Factors are the independent variables that are under investigation.
In this instance, the biopsy parameters (number of needles, spacing
method, $\theta$ and $\phi$) are the factors.
Factor   Number of Needles   Spacing Method   $\theta$   $\phi$
Levels   4                   Absolute         30         30
         6                   Relative         45         45
         8                                    60         60
(2) Factor levels are the values that each of the factors can take on during
a single simulation. As shown in the list of biopsy simulation factors
and levels, each factor does not have the same number of factor levels.
The factor Spacing Method only has two factor levels, whereas the
other three factors each have three factor levels.
(3) A treatment is a particular combination of levels of each of the factors
involved in the experiment, where an experiment is the simulation
of the treatment on 1000 geometric specimens. In this example, a
treatment refers to a biopsy with specific settings (for example, 4 needles,
absolute spacing, $\theta$ = 45 and a specific value of $\phi$). There
are 54 different treatments and therefore 54 different experiments,
corresponding to all the combinations of the levels of the four factors.
(4) A trial is defined to be a simulation of one treatment on one geometric
model. The outcome of a trial is either 1, the biopsy procedure
detected the tumor, or 0, the tumor remained undetected. The outcome
of the experiment is the detection rate achieved by a specific
treatment simulated on 1000 geometric specimens. In other words, the
outcome of the experiment is the ratio of the number of specimens in which
tumor is detected to the total number of specimens simulated and
is referred to as the outcome for the remainder of this thesis.
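The four definitions above can be made concrete with a short sketch. The variable and function names are illustrative only; the factor levels are those listed in the text.

```python
from itertools import product

# Factor levels as listed in the text.
NEEDLES = (4, 6, 8)
SPACING = ("Absolute", "Relative")
THETA = (30, 45, 60)
PHI = (30, 45, 60)

# A treatment is one combination of levels; the full factorial has 3*2*3*3 of them.
treatments = list(product(NEEDLES, SPACING, THETA, PHI))
print(len(treatments))  # 54


def detection_rate(trial_outcomes):
    """Outcome of an experiment: fraction of trials (each 0 or 1) that detected tumor."""
    return sum(trial_outcomes) / len(trial_outcomes)
```

For example, `detection_rate([1, 0, 0, 1])` gives 0.5; in the thesis each experiment would supply 1000 trial outcomes rather than four.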
2.4 Simulation Results
For each of the 54 treatments, the simulation is conducted on 1000
different gland models. The following table summarizes the treatment
parameters as well as the results:
            Treatment Parameters                                   Outcome
Experiment  Number of Needles  Spacing Method  $\theta$  $\phi$    Detection Rate
 1          4                  Relative        45        45        0.252
 2          6                  Relative        45        45        0.307
 3          8                  Relative        45        45        0.335
 4          4                  Absolute        45        45        0.263
 5          6                  Absolute        45        45        0.293
 6          8                  Absolute        45        45        0.298
 7          4                  Relative        60        45        0.267
 8          6                  Relative        60        45        0.341
 9          8                  Relative        60        45        0.369
10          4                  Absolute        60        45        0.270
11          6                  Absolute        60        45        0.320
12          8                  Absolute        60        45        0.339
13          4                  Relative        30        45        0.196
14          6                  Relative        30        45        0.225
15          8                  Relative        30        45        0.255
16          4                  Absolute        30        45        0.207
17          6                  Absolute        30        45        0.221
18          8                  Absolute        30        45        0.221
19          4                  Relative        45        60        0.200
20          6                  Relative        45        60        0.234
21          8                  Relative        45        60        0.268
22          4                  Absolute        45        60        0.211
23          6                  Absolute        45        60        0.225
24          8                  Absolute        45        60        0.228
25          4                  Relative        60        60        0.191
26          6                  Relative        60        60        0.254
27          8                  Relative        60        60        0.268
28          4                  Absolute        60        60        0.209
29          6                  Absolute        60        60        0.240
30          8                  Absolute        60        60        0.246
31          4                  Relative        30        60        0.172
32          6                  Relative        30        60        0.194
33          8                  Relative        30        60        0.219
34          4                  Absolute        30        60        0.188
35          6                  Absolute        30        60        0.197
36          8                  Absolute        30        60        0.197
37          4                  Relative        45        30        0.260
38          6                  Relative        45        30        0.316
39          8                  Relative        45        30        0.341
40          4                  Absolute        45        30        0.264
41          6                  Absolute        45        30        0.305
42          8                  Absolute        45        30        0.316
43          4                  Relative        60        30        0.283
44          6                  Relative        60        30        0.351
45          8                  Relative        60        30        0.385
46          4                  Absolute        60        30        0.279
47          6                  Absolute        60        30        0.346
48          8                  Absolute        60        30        0.372
49          4                  Relative        30        30        0.210
50          6                  Relative        30        30        0.247
51          8                  Relative        30        30        0.273
52          4                  Absolute        30        30        0.225
53          6                  Absolute        30        30        0.245
54          8                  Absolute        30        30        0.247
Table 2.1. The results from the 54 geometric model
experiments are displayed.
2.4.1 Applying the ANOVA to the Biopsy Simulation Data
The biopsy simulation is a multi-factored system, in which the four
parameters (number of needles, spacing, $\theta$ and $\phi$) individually and perhaps
in some combinations may have a measurable effect on the detection rate.
Therefore a factor effects model is used in order to determine the impact
of and interactions between these four parameters. This biopsy simulation
is considered a complete factorial study since all possible combinations of the
four parameters were simulated and evaluated. The indices i, j, k, l refer to the
levels of the factors number of needles, spacing method, $\theta$ and $\phi$ respectively.
In this multi-factored system, a true overall mean, $\mu$, which is equivalent
to the true overall detection rate, is assumed to exist. The entire simulation
results in 54 observed detection rates, $p_{ijkl}$, each of which indicates the
observed detection rate for a given experiment. This set of 54 observed detection
rates is used in the ANOVA to determine estimated factor effects and an
estimated overall mean which are used in the factor effects model. The factor
effects model is used to predict a detection rate, a probability of detection,
$\hat{p}_{ijkl}$, given the levels of the four factors.
A factor level mean is the average detection rate for a group of
treatments that have one common factor level held constant while all others
vary. For example, all outcomes from experiments with Number of Needles = 6
are averaged to yield the factor level mean for the factor Number of Needles
at the level of 6 needles. The overall mean, $\mu$, is simply the average outcome
of all experiments. The difference between each factor level mean and the
overall mean yields the main effect for that factor level. Because this model
has 4 factors each with either 2 or 3 levels, the following main effects are
designated.
$\alpha_i$ the main effect for the factor Number of Needles at each of its
levels (4,6,8): $1 \le i \le 3$.
$\beta_j$ the main effect for the factor Spacing Method at each of its levels
(0,1): $1 \le j \le 2$.
$\gamma_k$ the main effect for the factor $\theta$ at each of its levels (30,45,60):
$1 \le k \le 3$.
$\delta_l$ the main effect for the factor $\phi$ at each of its levels (30,45,60):
$1 \le l \le 3$.
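As an illustration of these definitions, the factor level means and main effects can be computed directly from the detection rates in Table 2.1. The sketch below is not the analysis reported in the thesis: it works on the untransformed rates, the names are hypothetical, and it assumes that the first angle column of Table 2.1 is $\theta$ and the second is $\phi$.

```python
from itertools import product
from statistics import mean

# Detection rates copied from Table 2.1, in experiment order 1..54.
RATES = [
    0.252, 0.307, 0.335, 0.263, 0.293, 0.298,
    0.267, 0.341, 0.369, 0.270, 0.320, 0.339,
    0.196, 0.225, 0.255, 0.207, 0.221, 0.221,
    0.200, 0.234, 0.268, 0.211, 0.225, 0.228,
    0.191, 0.254, 0.268, 0.209, 0.240, 0.246,
    0.172, 0.194, 0.219, 0.188, 0.197, 0.197,
    0.260, 0.316, 0.341, 0.264, 0.305, 0.316,
    0.283, 0.351, 0.385, 0.279, 0.346, 0.372,
    0.210, 0.247, 0.273, 0.225, 0.245, 0.247,
]

FACTORS = ("needles", "spacing", "theta", "phi")

# Table 2.1 ordering: phi changes slowest, then theta, then spacing;
# the number of needles changes fastest.
DESIGN = [
    (needles, spacing, theta, phi)
    for phi, theta, spacing, needles in product(
        (45, 60, 30), (45, 60, 30), ("Relative", "Absolute"), (4, 6, 8)
    )
]


def main_effects(design, rates):
    """Return the overall mean and, per factor, {level: level mean - overall mean}."""
    overall = mean(rates)
    effects = {}
    for pos, name in enumerate(FACTORS):
        levels = sorted({row[pos] for row in design})
        effects[name] = {
            level: mean([r for row, r in zip(design, rates) if row[pos] == level])
            - overall
            for level in levels
        }
    return overall, effects


overall_mean, effects = main_effects(DESIGN, RATES)
for name in FACTORS:
    print(name, {level: round(value, 4) for level, value in effects[name].items()})
```

Because the design is balanced, the main effects of each factor sum to zero, which offers a quick check on the bookkeeping.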
A factor at a particular level may influence another factor either by
inhibiting or enhancing its impact. Because of these interactions between
factors, the interaction effects are included in the model. Pairwise interaction
effects are a measure of the combined effect of two factors, across the different
levels, minus the main effects of these factors. We define these two-way effects
as follows.
$(\alpha\beta)_{ij}$ number of needles and spacing method
$(\alpha\gamma)_{ik}$ number of needles and $\theta$
$(\alpha\delta)_{il}$ number of needles and $\phi$
$(\beta\gamma)_{jk}$ spacing method and $\theta$
$(\beta\delta)_{jl}$ spacing method and $\phi$
$(\gamma\delta)_{kl}$ $\theta$ and $\phi$.
Three-way factor effects are a measure of the interaction effect of three factors.
$(\alpha\beta\gamma)_{ijk}$ number of needles, spacing method and $\theta$
$(\alpha\beta\delta)_{ijl}$ number of needles, spacing method and $\phi$
$(\beta\gamma\delta)_{jkl}$ spacing method, $\theta$ and $\phi$
$(\alpha\gamma\delta)_{ikl}$ number of needles, $\theta$ and $\phi$.
The four-way effect is the measure of the interaction effect of all four factors.
$(\alpha\beta\gamma\delta)_{ijkl}$ number of needles, spacing method, $\theta$ and $\phi$.
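Summary of Variables
True overall mean                                 $\mu$
Estimated overall mean                            $\hat{\mu}$
True treatment mean                               $\mu_{ijkl}$
Estimated treatment mean                          $\hat{\mu}_{ijkl}$
Observed treatment detection rate                 $p_{ijkl}$
Transformed observed treatment detection rate     $Y_{ijkl}$
Estimated treatment detection rate                $\hat{p}_{ijkl}$
Transformed estimated treatment detection rate    $\hat{Y}_{ijkl}$
Average observed detection rate                   $\bar{p}$
True main factor level effects                    $\alpha_i$, $\beta_j$, $\gamma_k$, $\delta_l$
Estimated main factor level effects               $\hat{\alpha}_i$, $\hat{\beta}_j$, $\hat{\gamma}_k$, $\hat{\delta}_l$
True two-way effects                              $(\alpha\beta)_{ij}$, $(\alpha\gamma)_{ik}$, $(\alpha\delta)_{il}$, $(\beta\gamma)_{jk}$, $(\beta\delta)_{jl}$, $(\gamma\delta)_{kl}$
Estimated two-way effects                         $\widehat{(\alpha\beta)}_{ij}$, $\widehat{(\alpha\gamma)}_{ik}$, $\widehat{(\alpha\delta)}_{il}$, $\widehat{(\beta\gamma)}_{jk}$, $\widehat{(\beta\delta)}_{jl}$, $\widehat{(\gamma\delta)}_{kl}$
Table 2.2. A list of the variables used in the ANOVA
is displayed.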
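The factor effects model takes the general form
$$\mu_{ijkl} = \mu + \alpha_i + \beta_j + \gamma_k + \delta_l + (\alpha\beta)_{ij} + (\alpha\gamma)_{ik} + (\alpha\delta)_{il} + (\beta\gamma)_{jk} + (\beta\delta)_{jl} + (\gamma\delta)_{kl}
+ (\alpha\beta\gamma)_{ijk} + (\alpha\beta\delta)_{ijl} + (\beta\gamma\delta)_{jkl} + (\alpha\gamma\delta)_{ikl} + (\alpha\beta\gamma\delta)_{ijkl}.$$
The observed outcome, the detection rate for a particular treatment,
as given in Table 2.1, is $p_{ijkl}$ and is the sum of the true mean for that treatment
and a residual term:
$$p_{ijkl} = \mu_{ijkl} + \varepsilon_{ijkl}.$$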
The goal of the analysis is to formulate a model that predicts the
outcome of a given treatment. Since the true means and true factor effects are
not known, estimates of these terms are determined from the simulation and
used in the model. Estimated values are indicated with the hat (^) notation. The
predicted outcome is represented by the following relationship:
$$\hat{p}_{ijkl} = \hat{\mu} + \hat{\alpha}_i + \hat{\beta}_j + \hat{\gamma}_k + \hat{\delta}_l + \widehat{(\alpha\beta)}_{ij} + \widehat{(\alpha\gamma)}_{ik} + \widehat{(\alpha\delta)}_{il} + \widehat{(\beta\gamma)}_{jk} + \widehat{(\beta\delta)}_{jl} + \widehat{(\gamma\delta)}_{kl}
+ \widehat{(\alpha\beta\gamma)}_{ijk} + \widehat{(\alpha\beta\delta)}_{ijl} + \widehat{(\beta\gamma\delta)}_{jkl} + \widehat{(\alpha\gamma\delta)}_{ikl} + \widehat{(\alpha\beta\gamma\delta)}_{ijkl}.$$
In this equation $\hat{p}_{ijkl}$ is the estimated probability of detecting a tumor at the
factor levels indicated by i, j, k, l. This probability is predicted by the model
using least-squares estimators for the terms in the equation. The probability
of detection is a function of the estimated overall mean, $\hat{\mu}$, and the estimated
effects from the four factors, alone and in combination with one another. Not
all of these effects may be significant. In order to determine which of the
factors do significantly affect the detection rate and therefore belong in the
final model, various means are evaluated. If all the means for a particular
factor (or combination of factors) are equal, varying a factor level does not
add to or subtract from the overall mean and therefore the factor does not
belong in the final model. This equality question is put not only to each
factor individually, but to all the combinations of factors as well.
2.4.2 ANOVA Mechanics
Use of the ANOVA model is founded on several assumptions:
(1) The outcomes follow a normal probability distribution.
(2) Each distribution has the same variance.
(3) The outcomes for each factor level are independent of the other factor
level outcomes.
With these assumptions in mind, note that the probability distributions of a
factor at each of its levels differ only with respect to the mean [4]. Therefore,
the first step in executing the analysis is to determine if the detection rates
are statistically different. Secondly, if they are different, one of the intents of
the ANOVA model is to determine if the difference between the detection rate
of two or more treatments is sufficient, after examining the variability within
the treatments, to conclude that one treatment does indeed produce a higher
detection rate. In addition, by evaluating the statistical data, conclusions may
be drawn as to how each factor, both independently and within established
interaction groups (pairwise, three-way or four-way), influences the outcome.
2.4.3 Residuals
We define $\bar{p}$ to be the average of all observations. The model states
that $p_{ijkl} = \mu_{ijkl} + \varepsilon_{ijkl}$; therefore the residual term is
$\varepsilon_{ijkl} = p_{ijkl} - \mu_{ijkl}$. Since $\mu_{ijkl}$ is estimated by $\hat{\mu}_{ijkl}$, the
estimated residual term is $e_{ijkl} = p_{ijkl} - \hat{\mu}_{ijkl}$,
the difference between the observed and the estimated average detection rate.
The set of all 54 residuals, $e_{ijkl}$, for all i, j, k and l is evaluated for three
characteristics which indicate whether the fitted data are well-suited for the
analysis. These characteristics are:
1. Normality of error terms.
2. Constancy of error variance.
3. Independence of error terms.
Several statistical tests and plots used on the residual data determine
whether one of the three assumptions is violated. These tests revealed that
the error variances were not stable, thus violating the second characteristic. A
transformation was employed to preserve the statistical information in the
output while stabilizing the error variances. Since nothing is lost by employing a
transformation and the error variances are stabilized, the detection rate data
p is transformed to Y via the following relationship:
$$Y = 2 \arcsin(\sqrt{p}).$$
The outcome from these simulations is the detection rate, the proportion of
specimens in which tumor is detected out of the total number of specimens.
The arcsine transformation is the most appropriate transformation when the
outcome is a proportion [4]. All ANOVA data referenced from this point on are
transformed unless noted otherwise. The inverse transformation is calculated
at the conclusion of this analysis to get a true estimate of the probability.
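The transformation and its inverse can be sketched in a few lines. The function names are illustrative; the inverse, p = sin^2(Y/2), follows directly from solving the arcsine relationship for p and applies for Y between 0 and pi.

```python
import math


def arcsine_transform(p):
    """Variance-stabilizing transform of a proportion p in [0, 1]."""
    return 2.0 * math.asin(math.sqrt(p))


def inverse_arcsine_transform(y):
    """Recover the proportion from a transformed value y in [0, pi]."""
    return math.sin(y / 2.0) ** 2


if __name__ == "__main__":
    p = 0.252  # detection rate of experiment 1 in Table 2.1
    y = arcsine_transform(p)
    print(y, inverse_arcsine_transform(y))
```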
2.4.4 The Null and Alternate Hypotheses
A starting point in the ANOVA process is to establish two hypotheses,
a null and an alternate hypothesis. The null hypothesis assumes that all effects
are equal, therefore indicating that specific factor levels do not influence the
outcome. The alternate hypothesis assumes that at least two of the effects are
not the same.
The F-test is used to decide which of these two hypotheses concerning
the data will be accepted. The test consists of computing the ratio of
between-effect variation to within-effect variation. This between-effect variation, which
changes depending on the effect, is called the treatment sum of squares
and is denoted SSA, SSB, SSC, and SSD (see also the Appendix). It is a
measure of the difference between the detection rate of a set of treatments
and the average detection rate over all treatments. The within-effect variation
is called the error sum of squares and is denoted SSE. It is a measure
of the difference between the individual outcome for a given treatment and
the estimated detection rate over that treatment. The error sum of squares
measures variability that is not explained by the SSA, SSB, SSC, or SSD
terms and therefore occurs within the set of treatments. Both of these variation
measurements are evaluated using sum of squares expressions as detailed
in the Appendix. The means of the SSA, SSB, SSC, SSD and SSE terms
are MSA, MSB, MSC, MSD and MSE respectively, and are computed by
dividing by the degrees of freedom, df, associated with each term. This results
in F = MSA/MSE where MSA = SSA/$df_A$ (MSB = SSB/$df_B$, etc.) and
MSE = SSE/$df_E$. Large values of F tend to support the conclusion that not all
the effects are equal ($H_a$), whereas values of F near 1 support the null
hypothesis ($H_0$). In the event that the alternate hypothesis is indicated via
the F-test, the ANOVA also provides the probability of a Type I error. A
Type I error occurs when it is concluded that differences between means
exist when, in fact, they do not (i.e. $H_a$ is accepted when in fact $H_0$ is true). This
information is given in the column labelled Pr(F) in the ANOVA output in
Table 2.3.
2.4.5 Are the Main Effects all Equal?
Following the general process of establishing null and alternate hypotheses
as described above, a pair of null and alternate hypotheses is stated
for each factor in the biopsy model. The null hypothesis assumes that the
main effects for a given factor at each of its levels are equivalent. The alternate
hypothesis assumes that the main effects differ.
$H_0$: $\alpha_1 = \alpha_2 = \alpha_3$      $H_a$: not all $\alpha_i$ are equal.
       $\beta_1 = \beta_2$                         not all $\beta_j$ are equal.
       $\gamma_1 = \gamma_2 = \gamma_3$            not all $\gamma_k$ are equal.
       $\delta_1 = \delta_2 = \delta_3$            not all $\delta_l$ are equal.
The F-test statistic is applied to determine which hypothesis to accept
in each case. The factor sum of squares for each factor, number of needles,
spacing, $\theta$ and $\phi$, denoted SSA, SSB, SSC and SSD, respectively,
is computed as shown in the Appendix. The mean of each of these factor
sum of squares terms is computed by dividing each term by its associated
degrees of freedom so that MSA = SSA/$df_A$, MSB = SSB/$df_B$, etc.,
as detailed in the Appendix. The test statistic is formed for each hypothesis
in the following manner. To test the effect of the first factor, Number
of Needles, F = MSA/MSE; to test the effect of the spacing factor,
F = MSB/MSE; to test the effect of $\theta$, F = MSC/MSE; and to test the
effect of $\phi$, F = MSD/MSE. Accepting the alternate hypothesis means that
a specific setting of the given factor corresponds to a change in detection rate;
thus that factor has an effect on the overall outcome of the biopsy.
                                     Df   Sum of Sq   Mean Sq    F Value    Pr(F)
Main Effects    Needles               2   0.15862     0.07931     607.427   0.0000000
                Spacing               1   0.00498     0.00498      38.209   0.0000011
                $\theta$              2   0.29249     0.14624    1120.073   0.0000000
                $\phi$                2   0.28115     0.14057    1076.661   0.0000000
2-Way Effects   Needles:Spacing       2   0.01641     0.00820      62.846   0.0000000
                Needles:$\theta$      4   0.01444     0.00361      27.653   0.0000000
                Spacing:$\theta$      2   0.00059     0.00029       2.283   0.1206068
                Needles:$\phi$        4   0.00395     0.00098       7.569   0.0002892
                Spacing:$\phi$        2   0.00046     0.00023       1.794   0.1848710
                $\theta$:$\phi$       4   0.02867     0.00716      54.902   0.0000000
Residuals                            28   0.00365     0.00013
Table 2.3. The output from the ANOVA is displayed above. See
Appendix for details of the calculations.
Referring to this ANOVA output, the column of numbers labelled Sum
of Sq refers to the parameters SSA, SSB, SSC and SSD detailed in the
Appendix. The column labelled Mean Sq lists the parameters MSA, MSB,
MSC, MSD. The F Value column lists the F-test outcome for each row (for
example, Needles F Value = MSA/MSE). The larger values in this column tend to support
the alternate hypothesis that the main effect for a given factor differs across
the possible levels for that factor. The final column, Pr(F), gives the probability
of a Type I error. Again, a Type I error occurs if the alternate hypothesis is
concluded when in fact the null hypothesis is true. The row labelled Residuals
indicates the error degrees of freedom, the SSE and the MSE for this analysis.
Based on the numbers in the table, each of the four main effects has a
significant effect on the outcome, with the factor $\theta$ having the
greatest influence on the detection rate, followed by the factors $\phi$
and Number of Needles. This fact is indicated by the high F-value that
corresponds to each of the four factors. The rows labelled with two factor
names (for example, Needles:Spacing) give the ANOVA output corresponding
to pair-wise interactions and include the sum of squares computed for each
pair of factors. The sums of squares for all of the pair-wise interaction
terms (SSAB, SSAC, SSAD, SSBC, SSBD, SSCD) are computed as detailed in the
Appendix. The total treatment sum of squares is
SSTR = SSA + SSB + SSC + SSD + SSAB + SSAC + SSAD + SSBC + SSBD + SSCD.
This sum does not include the sum of squares terms due to the three-way
and four-way interactions because there are not enough degrees of freedom
in the experiment to use the full model.
2.4.6 Recognizing Interaction between Factors
At this point, the F-test has determined that each of the main factor
effects contributes to the overall detection rate. To evaluate the
interaction effects, the F-test is applied again, in this case to
interactions between two, three or four factors. A null and an alternate
hypothesis are formulated for every combination of factors, and sum of
squares terms are computed for the factor groups and used in each F-test.
The null and alternate hypotheses for the pairwise interactions are:
H0: all $(\alpha\beta)_{ij} = 0$     Ha: not all $(\alpha\beta)_{ij} = 0$
    all $(\alpha\gamma)_{ik} = 0$        not all $(\alpha\gamma)_{ik} = 0$
    all $(\alpha\delta)_{il} = 0$        not all $(\alpha\delta)_{il} = 0$
    all $(\beta\gamma)_{jk} = 0$         not all $(\beta\gamma)_{jk} = 0$
    all $(\beta\delta)_{jl} = 0$         not all $(\beta\delta)_{jl} = 0$
    all $(\gamma\delta)_{kl} = 0$        not all $(\gamma\delta)_{kl} = 0$
All three-way combinations are formed, hypotheses are constructed and
F-test results are evaluated.
H0: all $(\alpha\beta\gamma)_{ijk} = 0$     Ha: not all $(\alpha\beta\gamma)_{ijk} = 0$
    all $(\alpha\beta\delta)_{ijl} = 0$         not all $(\alpha\beta\delta)_{ijl} = 0$
    all $(\alpha\gamma\delta)_{ikl} = 0$        not all $(\alpha\gamma\delta)_{ikl} = 0$
    all $(\beta\gamma\delta)_{jkl} = 0$         not all $(\beta\gamma\delta)_{jkl} = 0$
The null and alternate hypotheses are also constructed for the four-way
interaction.
H0: all $(\alpha\beta\gamma\delta)_{ijkl} = 0$
Ha: not all $(\alpha\beta\gamma\delta)_{ijkl} = 0$
Based on the actual ANOVA results in the preceding table, four of the
pair-wise interactions appear strongly significant: Needles:Spacing,
Needles:$\theta$, Needles:$\phi$ and $\theta$:$\phi$. The other two
pair-wise interactions are included in the final model even though the
strength of their significance is uncertain. The ANOVA was executed once
to include all three-way interactions. Since these interactions proved
insignificant, they are not included in the model. There are not enough
degrees of freedom in the experiment to estimate the residuals and test
for the four-way interaction.
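As stated previously, the Y notation indicates the transformed detection
rate (p). At this point the general model, of the form
$Y_{ijklm} = \mu_{....} + \alpha_i + \beta_j + \gamma_k + \delta_l$   (Main effects)
  $+ (\alpha\beta)_{ij} + (\alpha\gamma)_{ik} + (\alpha\delta)_{il} + (\beta\gamma)_{jk} + (\beta\delta)_{jl} + (\gamma\delta)_{kl}$   (Pairwise effects)
  $+ (\alpha\beta\gamma)_{ijk} + (\alpha\beta\delta)_{ijl} + (\alpha\gamma\delta)_{ikl} + (\beta\gamma\delta)_{jkl}$   (Three-way effects)
  $+ (\alpha\beta\gamma\delta)_{ijkl}$   (Four-way effect)
  $+ \epsilon_{ijklm}$   (Residual error)
is reduced to the final model for this analysis:
$\hat{Y}_{ijkl} = \hat{\mu} + \hat{\alpha}_i + \hat{\beta}_j + \hat{\gamma}_k + \hat{\delta}_l + (\widehat{\alpha\beta})_{ij} + (\widehat{\alpha\gamma})_{ik} + (\widehat{\alpha\delta})_{il} + (\widehat{\beta\gamma})_{jk} + (\widehat{\beta\delta})_{jl} + (\widehat{\gamma\delta})_{kl}$.
This model yields the transformed probability of detection at the given
levels for $i$, $j$, $k$ and $l$.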
Now that the factor effects have been identified, the analysis revolves
around determining the factor levels that result in the highest detection rate.
For this part of the analysis, the tables of means and tables of effects are
evaluated.
Grand Mean      1.072

Needles              4         6         8
Mean             0.999      1.09     1.128

Spacing       Relative  Absolute
Mean             1.082     1.063

$\theta$            30        45        60
Mean            0.9723     1.098     1.147

$\phi$              30        45        60
Mean              1.14    1.1104    0.9724
Table 2.4. The ANOVA tables of means list the transformed values.
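Needles by $\theta$      $\theta$ = 30   $\theta$ = 45   $\theta$ = 60
  4                          0.926           1.027           1.045
  6                          0.979           1.113           1.176
  8                          1.012           1.152           1.221

Spacing by $\theta$      $\theta$ = 30   $\theta$ = 45   $\theta$ = 60
  Relative                   0.978           1.111           1.157
  Absolute                   0.967           1.084           1.137

Needles by $\phi$        $\phi$ = 30     $\phi$ = 45     $\phi$ = 60
  4                          1.054           1.028           0.915
  6                          1.161           1.123           0.985
  8                          1.205           1.163           1.017

Spacing by $\phi$        $\phi$ = 30     $\phi$ = 45     $\phi$ = 60
  Relative                   1.148           1.118           0.980
  Absolute                   1.132           1.091           0.965

Needles by Spacing       Relative        Absolute
  4                          0.987           1.011
  6                          1.099           1.080
  8                          1.159           1.097

$\theta$ by $\phi$       $\phi$ = 30     $\phi$ = 45     $\phi$ = 60
  $\theta$ = 30              1.026           0.978           0.913
  $\theta$ = 45              1.159           1.139           0.994
  $\theta$ = 60              1.235           1.196           1.010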
Table 2.5. The transformed values of the pairwise means are shown.
Referring to the ANOVA tables of means, the highest numbers in each
category reflect the best setting for a particular factor. On reading
through the tables of means, the conclusion is that a technique of 8
needles, relative spacing, $\theta$ = 60 and $\phi$ = 30 yields the highest
detection rate. To corroborate this more fully, the interactions that are
deemed significant are analysed to verify that the main effect is not
contradicted by an interaction. Therefore, the table for Needles:$\theta$
is reviewed and it is found that the setting of 8 needles and $\theta$ = 60
again yields the highest mean. The tables for all of the pair-wise
combinations are reviewed to determine that the best settings yield the
highest means in the interaction tables just as they did in the main
effect tables. This proves to be the case, so none of the interactions
contradict the conclusion drawn from the main effect information.
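Number of Needles (4, 6, or 8)      $\hat{\alpha}_1$   $\hat{\alpha}_2$   $\hat{\alpha}_3$
Effect                                  -0.07329           0.01723            0.05607

Spacing (Relative or Absolute)      $\hat{\beta}_1$    $\hat{\beta}_2$
Effect                                   0.009612          -0.009612

$\theta$ (30, 45, or 60)            $\hat{\gamma}_1$   $\hat{\gamma}_2$   $\hat{\gamma}_3$
Effect                                  -0.1001            0.02519            0.07486

$\phi$ (30, 45, or 60)              $\hat{\delta}_1$   $\hat{\delta}_2$   $\hat{\delta}_3$
Effect                                   0.0678            0.03215           -0.09995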
Table 2.6. The main factor level effects from the ANOVA output
are documented.
Needles:Spacing          Relative        Absolute
  4                      -0.02127         0.02127
  6                      -0.00017         0.00017
  8                       0.02143        -0.02143

Needles:$\theta$         $\theta$ = 30   $\theta$ = 45   $\theta$ = 60
  4                       0.02680         0.00244        -0.02925
  6                      -0.01031        -0.00127         0.01158
  8                      -0.01649        -0.00118         0.01767

Spacing:$\theta$         $\theta$ = 30   $\theta$ = 45   $\theta$ = 60
  Relative               -0.004354        0.003708        0.000646
  Absolute                0.004354       -0.003708       -0.000646

Needles:$\phi$           $\phi$ = 30     $\phi$ = 45     $\phi$ = 60
  4                      -0.01271        -0.00292         0.01563
  6                       0.00363         0.00087        -0.00450
  8                       0.00907         0.00206        -0.01113

Spacing:$\phi$           $\phi$ = 30     $\phi$ = 45     $\phi$ = 60
  Relative               -0.001740        0.004148       -0.002407
  Absolute                0.001740       -0.004148        0.002407

$\theta$:$\phi$          $\phi$ = 30     $\phi$ = 45     $\phi$ = 60
  $\theta$ = 30          -0.01404        -0.02664         0.04067
  $\theta$ = 45          -0.00621         0.00978        -0.00357
  $\theta$ = 60           0.02025         0.01686        -0.03711
Table 2.7. The ANOVA table of effects for pairwise interactions
is displayed.
By using the values from the tables of effects, a probability of detection
is calculated for the optimal setting:
$\hat{Y}_{3131} = \hat{\mu} + \hat{\alpha}_3 + \hat{\beta}_1 + \hat{\gamma}_3 + \hat{\delta}_1 + (\widehat{\alpha\beta})_{31} + (\widehat{\alpha\gamma})_{33} + (\widehat{\alpha\delta})_{31} + (\widehat{\beta\gamma})_{13} + (\widehat{\beta\delta})_{11} + (\widehat{\gamma\delta})_{31}$
1.347918 = 1.072 + 0.05607 + 0.009612 + 0.07486 + 0.0678 +
           0.02143 + 0.01767 + 0.00907 + 0.000646 - 0.00174 + 0.02025
This result of 1.347918 is then transformed back (arcsine equation) to
yield a probability of 0.38948 for this setting.
$1.347918 = 2 \arcsin(\sqrt{p})$
$p = (\sin(1.347918/2))^2 = 0.38948$.
Therefore, with the factors set to 8 needles, relative spacing,
$\theta$ = 60 and $\phi$ = 30, the biopsy procedure has a 38.9% probability
of detecting the cancer given the tumor distribution model used. This
estimated probability is best used in comparisons with the other estimated
probabilities rather than as an absolute measure of detection rate.
Therefore the conclusion from this analysis is a relative ranking of
treatments in terms of their detection rate. Since the 1000 simulated
specimens were the same for each treatment, the ANOVA model determines the
relative differences between the detection rates of the various
treatments; it does not necessarily provide enough data to draw
conclusions about absolute detection rates. Table 2.8 lists each
experiment and the probability of detection predicted from the factor
effects model.
                  Treatment Parameters
            Number of  Spacing                        Predicted
Experiment  Needles    Method    $\theta$   $\phi$   Probability
     1         4       Relative     45        45        0.247
     2         6       Relative     45        45        0.297
     3         8       Relative     45        45        0.327
     4         4       Absolute     45        45        0.251
     5         6       Absolute     45        45        0.281
     6         8       Absolute     45        45        0.291
     7         4       Relative     60        45        0.265
     8         6       Relative     60        45        0.337
     9         8       Relative     60        45        0.369
    10         4       Absolute     60        45        0.271
    11         6       Absolute     60        45        0.324
    12         8       Absolute     60        45        0.335
    13         4       Relative     30        45        0.195
    14         6       Relative     30        45        0.227
    15         8       Relative     30        45        0.251
    16         4       Absolute     30        45        0.205
    17         6       Absolute     30        45        0.219
    18         8       Absolute     30        45        0.224
    19         4       Relative     45        60        0.200
    20         6       Relative     45        60        0.236
    21         8       Relative     45        60        0.260
    22         4       Absolute     45        60        0.208
    23         6       Absolute     45        60        0.227
    24         8       Absolute     45        60        0.232
    25         4       Relative     60        60        0.192
    26         6       Relative     60        60        0.247
    27         8       Relative     60        60        0.273
    28         4       Absolute     60        60        0.203
    29         6       Absolute     60        60        0.241
    30         8       Absolute     60        60        0.248
    31         4       Relative     30        60        0.175
    32         6       Relative     30        60        0.196
    33         8       Relative     30        60        0.215
    34         4       Absolute     30        60        0.189
    35         6       Absolute     30        60        0.194
    36         8       Absolute     30        60        0.195
    37         4       Relative     45        30        0.257
    38         6       Relative     45        30        0.314
    39         8       Relative     45        30        0.346
    40         4       Absolute     45        30        0.266
    41         6       Absolute     45        30        0.303
    42         8       Absolute     45        30        0.315
    43         4       Relative     60        30        0.276
    44         6       Relative     60        30        0.354
    45         8       Relative     60        30        0.389
    46         4       Absolute     60        30        0.287
    47         6       Absolute     60        30        0.346
    48         8       Absolute     60        30        0.360
    49         4       Relative     30        30        0.208
    50         6       Relative     30        30        0.246
    51         8       Relative     30        30        0.272
    52         4       Absolute     30        30        0.223
    53         6       Absolute     30        30        0.243
    54         8       Absolute     30        30        0.250
Table 2.8. The probabilities of detection for one tumor
simulations are displayed.
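The prediction for the optimal treatment (experiment 45) can be retraced
with a few lines of Python. This is a minimal sketch: the function names
are illustrative assumptions, and the effect values are the rounded entries
from the tables of means and effects, so the summed value differs from the
printed 1.347918 in the fourth decimal place.

```python
import math


def to_transformed(p):
    """Arcsine transform of a detection probability: Y = 2*arcsin(sqrt(p))."""
    return 2.0 * math.asin(math.sqrt(p))


def to_probability(y):
    """Inverse of the arcsine transform: p = sin(Y/2)^2."""
    return math.sin(y / 2.0) ** 2


grand_mean = 1.072
# Effects for 8 needles, relative spacing, theta = 60, phi = 30,
# in the order: four main effects, then the six pairwise interactions.
optimal_effects = [
    0.05607, 0.009612, 0.07486, 0.0678,
    0.02143, 0.01767, 0.00907, 0.000646, -0.00174, 0.02025,
]

y_hat = grand_mean + sum(optimal_effects)  # transformed detection rate
p_hat = to_probability(y_hat)              # back-transformed probability
print(f"Y = {y_hat:.6f}, p = {p_hat:.3f}")
```

The same two helper functions apply to any other treatment once its ten
effect terms are read from the tables.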
2.4.7 Clinical Distribution of Tumors
The biopsy simulations were conducted a second time on more realistic
geometric glands. By using a clinically derived distribution of the number
of tumors per gland, a better population was available for these biopsy
simulations. A sample size of 1000 was again used but in this experiment,
1/4 of the glands had a single tumor, 1/2 had two tumors and the remaining
1/4 had 3 tumors. The total gland volume was again held to be less than
6.4 cc. This distribution is based on the analysis done by Daneshagari [2].
The ANOVA results are found in the Appendix and yield the same optimal
biopsy procedure with a slightly different probability resulting from the
factor effects model.
By using the values from this second table of effects, a probability of
detection is calculated for the optimal setting:
$\hat{Y}_{3131} = \hat{\mu} + \hat{\alpha}_3 + \hat{\beta}_1 + \hat{\gamma}_3 + \hat{\delta}_1 + (\widehat{\alpha\beta})_{31} + (\widehat{\alpha\gamma})_{33} + (\widehat{\alpha\delta})_{31} + (\widehat{\beta\gamma})_{13} + (\widehat{\beta\delta})_{11} + (\widehat{\gamma\delta})_{31}$
1.7535 = 1.429 + 0.0733 + 0.01507 + 0.07456 + 0.07091 +
         0.02321 + 0.02650 + 0.01412 - 0.005442 - 0.004094 + 0.03638
Transforming this value (arcsine) yields a probability of detection for
the optimal setting of 0.5908. This probability of 59.08% is higher than
the 38.9% achieved by the simulation using geometric models of one tumor,
as would be expected. The predicted probabilities for each of the 54
experiments given this distribution of tumors are shown in Table 2.9.
                  Treatment Parameters
            Number of  Spacing                        Predicted
Experiment  Needles    Method    $\theta$   $\phi$   Probability
     1         4       Relative     45        45        0.417
     2         6       Relative     45        45        0.489
     3         8       Relative     45        45        0.526
     4         4       Absolute     45        45        0.417
     5         6       Absolute     45        45        0.470
     6         8       Absolute     45        45        0.482
     7         4       Relative     60        45        0.427
     8         6       Relative     60        45        0.524
     9         8       Relative     60        45        0.569
    10         4       Absolute     60        45        0.436
    11         6       Absolute     60        45        0.514
    12         8       Absolute     60        45        0.533
    13         4       Relative     30        45        0.353
    14         6       Relative     30        45        0.405
    15         8       Relative     30        45        0.431
    16         4       Absolute     30        45        0.354
    17         6       Absolute     30        45        0.387
    18         8       Absolute     30        45        0.388
    19         4       Relative     45        60        0.358
    20         6       Relative     45        60        0.408
    21         8       Relative     45        60        0.443
    22         4       Absolute     45        60        0.360
    23         6       Absolute     45        60        0.391
    24         8       Absolute     45        60        0.401
    25         4       Relative     60        60        0.322
    26         6       Relative     60        60        0.395
    27         8       Relative     60        60        0.437
    28         4       Absolute     60        60        0.332
    29         6       Absolute     60        60        0.386
    30         8       Absolute     60        60        0.403
    31         4       Relative     30        60        0.326
    32         6       Relative     30        60        0.357
    33         8       Relative     30        60        0.381
    34         4       Absolute     30        60        0.329
    35         6       Absolute     30        60        0.341
    36         8       Absolute     30        60        0.340
    37         4       Relative     45        30        0.417
    38         6       Relative     45        30        0.498
    39         8       Relative     45        30        0.541
    40         4       Absolute     45        30        0.425
    41         6       Absolute     45        30        0.486
    42         8       Absolute     45        30        0.504
    43         4       Relative     60        30        0.436
    44         6       Relative     60        30        0.541
    45         8       Relative     60        30        0.590
    46         4       Absolute     60        30        0.451
    47         6       Absolute     60        30        0.537
    48         8       Absolute     60        30        0.562
    49         4       Relative     30        30        0.351
    50         6       Relative     30        30        0.412
    51         8       Relative     30        30        0.444
    52         4       Absolute     30        30        0.359
    53         6       Absolute     30        30        0.401
    54         8       Absolute     30        30        0.407
Table 2.9. Given the distribution of one to three tumors,
the probabilities of detection predicted by the ANOVA model
are displayed.
A selection of detection rates is graphed in Figure 2.7 to provide
visualization of the relative ranking of various treatments. The plots
indicate 6 and 8 needles, relative spacing and all of the levels for
$\theta$ and $\phi$.
Legend: $\theta$ = 30, 45 and 60, each plotted for 6 needles and for 8 needles.
Figure 2.7. The detection rates for several experiments
are graphed and the common treatment parameters are
noted for each experiment. This gives a visual under-
standing of the ranking of these treatments in terms of
their detection rate.
3. Digitized Specimen Data
3.1 Summary of Software Tool
An analysis program, written in C, was created to simulate needle
biopsies on clinical data provided by the University of Colorado Health
Sciences Center, Pathology Department. The clinical data were gathered
from autopsies, pathologically investigated and digitized [2].
The data for each specimen are stored as a 3-dimensional array of
information. The software uses an input file to determine the
characteristics of a given experiment. These characteristics include the
number of needles, the initial placement of the first needle, the angles
$\theta$ and $\phi$, the spacing between needles, and the needle diameter
and length. In this manner, the analysis software is flexible enough to
handle a variety of simulations. The goal of this biopsy simulation tool
is to provide the means to experiment realistically with various needle
parameters on clinical data in order to determine any correspondence
between biopsy methods and detection rates.
The initial needle position is offset by the distance requested (the
$z$-offset entered by the user), with half of the needles entering the
right lobe and the other half entering the left lobe, in symmetry with
each other. The initial position is determined as an absolute offset (in
cm) from the apex of the gland. The other parameters are used to position
each needle on the specimen data set and determine how much of the
specimen data is to be returned in the needle biopsy. These specimen data
are analyzed to determine whether, and how much, tumor data is present in
the needle. This information is available to the user.
Having read the input file with parameter values, the code begins a loop
on the specimen data files requested for simulation. In this loop, the
three-dimensional specimen data file is opened, the data are read into a
3-d array with all of the background trimmed off, the apex of the gland is
located, and the needle positions are translated into array coordinates.
These coordinates are fed to the biopsy routine, which extracts the
specimen data coinciding with the needle and analyzes the data for tumor
information. The information for the entire experiment is stored in an
output file that documents the needle parameters and the results for each
image data set.
3.2 Specific Algorithms
3.2.1 Locating the Apex
The apex is defined as the first contact with the prostate when ap-
proaching it through the rectum, as done clinically. This location is used as
a landmark for positioning each biopsy needle. In the data set, the algorithm
that searches for this landmark proceeds as follows. The planes are defined as
shown in Figure 3.1.
Each pixel in the three-dimensional specimen file contains a number
indicating the type of data at that location. The possible types are gland,
tumor, capsule or background. Capsule data indicate those pixels defining the
boundary of the gland. The apex is indicated by the first pixel pointing to
capsule data. Therefore one plane of specimen data is evaluated at a time,
until a pixel that points to capsule data is found. This location is recorded as
the apex location.
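The search described above can be sketched in a few lines of Python. The
original tool was written in C and its data layout is not shown here, so
this is only an illustration under stated assumptions: the integer label
codes, the nested-list volume, and the choice of the first array axis as
the direction of approach are all hypothetical.

```python
# Hypothetical label codes; the actual codes used in the specimen files
# are not given in the text.
BACKGROUND, GLAND, TUMOR, CAPSULE = 0, 1, 2, 3


def find_apex(volume):
    """Scan the volume one plane at a time and return the (plane, row, col)
    index of the first capsule voxel, or None if no capsule voxel exists."""
    for p, plane in enumerate(volume):
        for r, row in enumerate(plane):
            for c, label in enumerate(row):
                if label == CAPSULE:
                    return (p, r, c)
    return None


# Tiny 3 x 2 x 3 example: the first plane is all background, the capsule
# first appears in the second plane.
example_volume = [
    [[BACKGROUND, BACKGROUND, BACKGROUND], [BACKGROUND, BACKGROUND, BACKGROUND]],
    [[BACKGROUND, BACKGROUND, BACKGROUND], [BACKGROUND, CAPSULE, BACKGROUND]],
    [[BACKGROUND, CAPSULE, CAPSULE], [CAPSULE, GLAND, CAPSULE]],
]
apex = find_apex(example_volume)
print(apex)
```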
Figure 3.1. The x,y,z axis, as defined for the digital
data, mimic those defined for the geometric models.
3.2.2 Establishing Needle Positions
The starting position, the location of the apex, serves as the landmark
for each additional needle. From this starting point and the additional
user-supplied parameters ($z$-offset, distance between needles) all of the
needle positions are calculated in terms of a vector. This vector,
represented by (x, y, z) coordinates, along with the angles $\theta$ and
$\phi$, locates the needle in the image data. The $z$-offset is assumed to
be in centimeters and is added to the initial (x, y, z) of the starting
position to locate the first needle position. Each time any coordinate is
changed, the new vector may be pointing to gland, tumor, background,
urethra or capsule data. The pixel represented by the vector is read to
ensure that the needle entry position remains located on capsule data. If
it does not, the y coordinate is adjusted so that the entry position of
the needle is on capsule data.
At this point in the algorithm, the first needle position is determined.
There are two ways to space the remaining needles. The user may enter
absolute distances in centimeters or a relative measure taken to be a
percentage of the z dimension of the gland. In addition, a zero percentage
indicates that the spacing is based on the number of needles in the
biopsy; the needles are then equally spaced across the z-axis of the
gland. The remaining needle positions are calculated from the initial
needle position: half of the needles are positioned in the right lobe by
using $\phi$; the remainder are rotated symmetrically into the left lobe.
All of the needles have the x coordinate set to the midpoint of the gland
in the x dimension.
The user-entered distance, in centimeters, is converted to a specific
number of pixels. This z distance is added to the first needle position to
obtain the second needle position, added to the second to obtain the
third, etc. Each time a needle position is calculated, the coordinates are
evaluated to ensure that they point to capsule data. If the gland is too
short in the z direction to handle all the needles requested, the
experiment proceeds with the number of needles that do stay within the
gland.
The experiments that depend on a relative distance between needles require
additional analysis of the yz slice before determining the z offset. The
z diameter of the particular yz slice is calculated. The z distance
required for a needle of a specific length, inserted at a specific angle,
is then subtracted from this z diameter. Rather than having the last
needle pierce more background than gland data, this subtraction enables
the full number of needles to be inserted into the gland. This new z
diameter is then divided into the number of segments required by the
specified percentage. If the user indicates 0% for the distance spacing,
the software calculates the distance based on the number of needles
requested and the diameter of the yz plane.
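The two spacing rules can be illustrated with the following Python sketch.
It is not the original C code: the function names, the pixel resolution
argument, and the treatment of the needle's z extent as a precomputed input
are assumptions made for illustration, and the equal-spacing case assumes
that n needles divide the usable diameter into n - 1 segments.

```python
def absolute_positions(first_z, spacing_cm, pixels_per_cm, n_needles, z_max):
    """Absolute spacing: convert the distance to pixels and add it
    repeatedly; needles that would fall beyond z_max are dropped."""
    step = round(spacing_cm * pixels_per_cm)
    positions = []
    for i in range(n_needles):
        z = first_z + i * step
        if z > z_max:
            break  # the gland is too short for the remaining needles
        positions.append(z)
    return positions


def relative_positions(first_z, z_diameter, needle_z_extent, n_needles,
                       percent=0.0):
    """Relative spacing: shrink the z diameter by the z extent of one
    needle, then step by the requested percentage of what remains.
    A percentage of 0 means equal spacing for the requested needles."""
    usable = z_diameter - needle_z_extent
    if percent == 0:
        step = usable / (n_needles - 1) if n_needles > 1 else 0.0
    else:
        step = usable * percent / 100.0
    return [first_z + i * step for i in range(n_needles)]


print(absolute_positions(10, 0.5, 20, 4, 35))
print(relative_positions(0, 100, 20, 5))
```

In this sketch the absolute rule silently returns fewer needles when the
gland is short, mirroring the behaviour described above, while the relative
rule always returns the requested number.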
3.3 Simulations
The 54 treatments used in the geometric model were used as biopsy
procedures on a maximum of 53 digitized clinical specimens. Some of the
biopsy techniques were simulated on only 52 of these clinical specimens.
Table 3.1 shows the results from these simulations on the digitized
clinical data. The table documents both the multiple-tumor geometric model
hit rate and the number of hits resulting from the same biopsy on the
digitized clinical data. The first five columns indicate the experiment
number and the biopsy parameter settings for the four variables: number of
needles, spacing method, $\theta$ and $\phi$. The column labelled
Detection Rate is the hit rate per 1000 simulations of the geometric
model. The column labelled Number of Hits is the number of hits per number
of digitized clinical samples. Most experiments were run on all 53 of the
digitized specimens. However, some of the simulations resulted in an error
on one or more of the specimens and these specimens were then removed from
the experiment. The final column, labelled Clinical Detection Rate, is the
rate for the experiments on the digitized specimens.
            Number                                               Number   Clinical
            of       Spacing                        Detection   of       Detection
Experiment  Needles  Method    $\theta$   $\phi$   Rate        Hits     Rate
     1         4     Relative     45        45       0.417      8/53     0.1509
     2         6     Relative     45        45       0.489     11/53     0.2075
     3         8     Relative     45        45       0.526      8/52     0.1538
     4         4     Absolute     45        45       0.417      9/53     0.1698
     5         6     Absolute     45        45       0.470     11/53     0.2075
     6         8     Absolute     45        45       0.482     10/52     0.1923
     7         4     Relative     60        45       0.427      9/53     0.1698
     8         6     Relative     60        45       0.524      9/52     0.1731
     9         8     Relative     60        45       0.569     13/53     0.2453
    10         4     Absolute     60        45       0.436     10/53     0.1887
    11         6     Absolute     60        45       0.514     12/53     0.2264
    12         8     Absolute     60        45       0.533     12/53     0.2264
    13         4     Relative     30        45       0.353      7/53     0.1321
    14         6     Relative     30        45       0.405     12/53     0.2264
    15         8     Relative     30        45       0.431      9/53     0.1698
    16         4     Absolute     30        45       0.354      7/53     0.1321
    17         6     Absolute     30        45       0.387      7/53     0.1321
    18         8     Absolute     30        45       0.388      9/53     0.1698
    19         4     Relative     45        60       0.358      6/53     0.1132
    20         6     Relative     45        60       0.408      9/53     0.1698
    21         8     Relative     45        60       0.443     11/52     0.2115
    22         4     Absolute     45        60       0.360      8/53     0.1509
    23         6     Absolute     45        60       0.391     10/53     0.1887
    24         8     Absolute     45        60       0.401     10/53     0.1887
    25         4     Relative     60        60       0.322      8/53     0.1509
    26         6     Relative     60        60       0.395      8/52     0.1538
    27         8     Relative     60        60       0.437      9/52     0.1731
    28         4     Absolute     60        60       0.332      6/52     0.1154
    29         6     Absolute     60        60       0.386      9/52     0.1731
    30         8     Absolute     60        60       0.403      9/52     0.1731
    31         4     Relative     30        60       0.326      5/52     0.0962
    32         6     Relative     30        60       0.357      5/52     0.0962
    33         8     Relative     30        60       0.381      9/52     0.1731
    34         4     Absolute     30        60       0.329      4/52     0.0769
    35         6     Absolute     30        60       0.341      4/52     0.0769
    36         8     Absolute     30        60       0.340      4/52     0.0769
    37         4     Relative     45        30       0.417      6/52     0.1154
    38         6     Relative     45        30       0.498     10/52     0.1923
    39         8     Relative     45        30       0.541     12/52     0.2308
    40         4     Absolute     45        30       0.425      8/52     0.1538
    41         6     Absolute     45        30       0.486     10/52     0.1923
    42         8     Absolute     45        30       0.504     11/52     0.2115
    43         4     Relative     60        30       0.436      6/52     0.1154
    44         6     Relative     60        30       0.541     10/52     0.1923
    45         8     Relative     60        30       0.590     10/53     0.1887
    46         4     Absolute     60        30       0.451      8/52     0.1538
    47         6     Absolute     60        30       0.537     12/52     0.2308
    48         8     Absolute     60        30       0.562     12/52     0.2308
    49         4     Relative     30        30       0.351       -       0.1000
    50         6     Relative     30        30       0.412     11/52     0.2115
    51         8     Relative     30        30       0.444     10/52     0.1923
    52         4     Absolute     30        30       0.359      6/52     0.1154
    53         6     Absolute     30        30       0.401      8/52     0.1538
    54         8     Absolute     30        30       0.407     10/52     0.1923
Table 3.1 The detection rates for the geometric and clinical
simulations are displayed.
3.4 Geometric Model vs Clinical Model
Comparison of the detection rates between the geometric model and the
clinical model reveals that the geometric simulation produces much higher
rates than its clinical counterpart. In attempting to explain this
discrepancy, several characteristics of the experiment are noted.
The distribution of the tumors and the total tumor volume in a given
specimen can impact the detection rate of a treatment. A comparison of the
tumor volumes is graphically displayed in Figures 3.2 and 3.3. As shown by
the histograms, the tumor volumes for the autopsy data tend strongly
toward small (< 0.5 cc) volumes. In contrast, the geometric model produces
tumors with volumes more evenly spread across the spectrum of possible
volumes. In fact, 80% of the autopsy specimens have a total tumor volume
less than 0.5 cc, whereas only 49% of the geometric gland models have a
total tumor volume in this range. This difference in the size of the
tumors can explain some of the difference in detection rate between the
clinical and geometric models.
A second difference is that the relative ranking of detection rates for
the digital data simulations is different from the ranking of detection
rates for the geometric simulations. An example of this discrepancy is
that experiment 9 (8 Needles, Relative Spacing, $\theta$ = 60,
$\phi$ = 45) has a clinical detection rate of 0.2453, or 13 hits out of 53
samples. This detection rate is better than the detection rate of
experiment 45 (8 Needles, Relative Spacing, $\theta$ = 60, $\phi$ = 30),
which is the optimal biopsy as indicated by the geometric simulation. This
difference may be due to the fact that only 53 specimens were used in the
digital simulation in contrast to the 1000 models constructed for the
geometric simulation.
3.5 Optimal Technique vs SRSCB
The optimal technique, determined by the geometric model, consists of 8
needles, relative spacing, $\theta$ = 60 and $\phi$ = 30. The SRSCB
technique uses 6 needles, absolute spacing, $\theta$ = 45 and $\phi$ = 45.
Both techniques were simulated on the geometric model as well as the
digitized clinical data. The optimal technique actually proved slightly
worse at tumor detection than the SRSCB procedure when simulated on the
clinical data. In fact, the optimal method detected tumor in 10 out of 53
specimens (.189). The SRSCB method detected tumor in 11 out of 53
specimens (.207). These results compare with the overall results from the
geometric simulation as follows. The SRSCB had a detection rate of .47 and
the optimal had a detection rate of .59 on the 1000 geometric models. This
discrepancy is addressed by noting the sample size available in the two
simulations and the distribution of tumor volumes as noted earlier.
[Histogram; horizontal axis: Sum of Tumor Volume]
Figure 3.2. The histogram of the clinical data shows the
tumor distribution by volume.
Figure 3.3. The histogram of the geometric data shows
the tumor distribution by volume.
4. Geometric Model Volume Estimates
4.1 Tumor Volume Estimates
The total volume of tumor in a gland is an important piece of infor-
mation for clinicians who use it to improve both the diagnosis and treatment
plan for a patient. The ultrasound used during a biopsy accurately measures
the prostate gland volume so that an approximate ratio of tumor to gland
volume can be used to estimate the volume of tumor in a gland. These sim-
ulations offered an avenue to explore a means of approximating this volume
ratio by using the volume of the needle that contains tumor information and
the total volume of the needle.
Three methods are used to estimate the amount of tumor intersected
by the needle. The needle can be modeled by a line, a strip, or a cylinder
in one, two, and three dimensions, respectively. The length and diameter
of the needle are constant and are set by clinical limits. This incremental
approach began in one dimension in order to simplify aspects of the simulation
during software verification. As the research progressed, the two- and three-
dimensional needles were introduced in order to model the actual biopsy more
closely.
The first method of estimating the volume ratio is
$R = \frac{1}{n} \sum_{i=1}^{n} \frac{v_i}{V_i}$,
where $v_i$ represents the tumor volume within a single needle, $V_i$
represents the volume of that same needle, and $n$ is the number of
needles. This ratio is referred to as the average of the ratios. A second
estimator of the volume ratio is
$r = \frac{\sum_{i=1}^{n} v_i}{\sum_{i=1}^{n} V_i}$,
where $v_i$ is the tumor volume within a single needle and $V_i$ is the
total volume of that needle. This ratio is considered the ratio of the
average volumes since $\frac{1}{n} \sum_{i=1}^{n} v_i$ is the average
tumor volume and $\frac{1}{n} \sum_{i=1}^{n} V_i$ is the average needle
volume. This yields
$r = \frac{\frac{1}{n} \sum_{i=1}^{n} v_i}{\frac{1}{n} \sum_{i=1}^{n} V_i}$.
Both methods of estimating the ratio are documented below.
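The difference between the two estimators is easiest to see on a small
example. The Python sketch below is illustrative only: the function names
and the sample volumes are made up for the demonstration and are not taken
from the simulations.

```python
def average_of_ratios(tumor_volumes, needle_volumes):
    """R: mean over needles of (tumor volume in needle / needle volume)."""
    n = len(tumor_volumes)
    return sum(v / V for v, V in zip(tumor_volumes, needle_volumes)) / n


def ratio_of_averages(tumor_volumes, needle_volumes):
    """r: total tumor volume over total needle volume, which equals the
    ratio of the two averages because the 1/n factors cancel."""
    return sum(tumor_volumes) / sum(needle_volumes)


# Hypothetical three-needle biopsy (arbitrary volume units).
tumor = [0.0, 0.02, 0.01]
needle = [0.1, 0.1, 0.05]
R = average_of_ratios(tumor, needle)
r = ratio_of_averages(tumor, needle)
print(f"R = {R:.4f}, r = {r:.4f}")
```

When every needle has the same volume the two estimators coincide; they
differ only when the needle volumes vary, as in the example above.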
Figure 4.1. This illustration of the gland, tumor and
one-dimensional needle depicts the variables used in de-
termining the volume ratio estimator.
4.1.1 One-Dimensional Analysis Line Model
In this first model, we represent the needle by a line segment as shown
in Figure 4.1. The length of the needle that contains tumor pixels, $l_T$,
is the difference between $t_1$ and $t_2$, the two roots of equation (2.1):
$l_T = |t_1 - t_2|$. A needle length, $L$, of 1.25 cm is used in the
estimate of the volume ratio. Thus the ratio $l_T/L$ is an approximation
of the true tumor to gland volume ratio.
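A minimal Python sketch of the line model follows. Equation (2.1) is not
reproduced in this chapter, so the sketch assumes an axis-aligned
ellipsoidal tumor and a needle parametrized by distance along a unit
direction from its entry point; the function name, the argument layout and
the clipping of the chord to the needle length are assumptions made for
illustration.

```python
import math


def tumor_length_on_needle(entry, direction, center, semi_axes, needle_length):
    """Length of the needle segment [0, needle_length] lying inside an
    axis-aligned ellipsoid, from the roots t1, t2 of the quadratic obtained
    by substituting the line entry + t*u into the ellipsoid equation."""
    norm = math.sqrt(sum(d * d for d in direction))
    u = [d / norm for d in direction]
    a = sum((ui / s) ** 2 for ui, s in zip(u, semi_axes))
    b = 2.0 * sum(ui * (e - c0) / s ** 2
                  for ui, e, c0, s in zip(u, entry, center, semi_axes))
    c = sum(((e - c0) / s) ** 2
            for e, c0, s in zip(entry, center, semi_axes)) - 1.0
    disc = b * b - 4.0 * a * c
    if disc <= 0.0:
        return 0.0  # the line misses (or only grazes) the tumor
    t1 = (-b - math.sqrt(disc)) / (2.0 * a)
    t2 = (-b + math.sqrt(disc)) / (2.0 * a)
    # Keep only the part of the chord that lies on the needle itself.
    return max(min(t2, needle_length) - max(t1, 0.0), 0.0)


L = 1.25  # needle length in cm, as in the text
l_T = tumor_length_on_needle((0, 0, 0), (0, 0, 1), (0, 0, 0.5),
                             (0.25, 0.25, 0.25), L)
print(f"l_T = {l_T:.3f} cm, l_T / L = {l_T / L:.3f}")
```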
4.1.2 Two-Dimensional Strip Model
In the two-dimensional case we represent the needle by a strip. The
needle entry point $(x_0, y_0, z_0)$ is used as a starting point in the
two-dimensional analysis. Two lines are created, each offset from this
starting coordinate by the needle radius. The intersection between these
two lines and the tumor ellipse is determined and the roots of the two
resulting quadratics are used to compute both the occurrence of a
detection and the amount of tumor within the needle. In this case, the
estimate of the volume ratio is the area of the tumor over the area of the
needle. Figure 4.2 defines the lengths used in determining the area. The
area of the tumor is calculated by estimating the needle length which
contains tumor data with the roots of intersection:
$l_{t1} = |t_{11} - t_{12}|$; $l_{t2} = |t_{21} - t_{22}|$.
The area of tumor is then given by $a_T = \frac{w}{2}(l_{t1} + l_{t2})$,
where $w$ is the width of the needle. The area of the needle is calculated
in the same way using the length of the needle:
$a_N = \frac{w}{2}(L + L)$. Thus $a_T/a_N$ serves as an estimate of the
true tumor to gland volume ratio.
Figure 4.2. This illustration of the gland, tumor and
two-dimensional needle depicts the variables used in de-
termining the volume ratio estimator.
4.1.3 Three-Dimensional Cylinder Model
The three-dimensional analysis models the needle as a cylinder and is
similar to the two-dimensional case in that the entry point of the needle
is again used as a center coordinate, this time for four lines. In this
case, the four lines are constructed symmetrically about this point to
generate a cylindrical needle. Then intersections and roots are computed.
A more accurate representation of the volume ratio is obtained using the
volume of the tumor within the needle over the volume of the needle. In
this case, the length is estimated to be the maximum of the lengths
determined from the four sets of intersection roots:
$l_t = \max(|t_{11} - t_{12}|, |t_{21} - t_{22}|, |t_{31} - t_{32}|, |t_{41} - t_{42}|)$.
The volume of the needle depends on the known diameter $d$ and length $L$:
$v_N = \pi (d/2)^2 L$. The estimated volume of the tumor depends on the
needle lengths which contain tumor data as shown in Figure 4.3. This leads
to the tumor volume estimate $v_T = \pi (d/2)^2 l_t$. The ratio $v_T/v_N$
estimates the true tumor to gland volume ratio.
Figure 4.3. This illustration of the gland, tumor and
three-dimensional needle depicts the variables used in
determining the volume ratio estimator.
4.2 Experiment Setup
A second set of experiments utilizing the geometric model explored the
question of accurately estimating the tumor volume to gland volume ratio.
The experiment simulated a biopsy on a single specimen, increasing the
number of needles each iteration and comparing the volume ratio obtained
from the biopsy sample to the known volume ratio. The parameters for the
biopsy include the optimal angles $\theta$ and $\phi$ determined in the
earlier investigation. The optimal number of needles and distancing method
determined from the ANOVA analysis do not apply to this experiment since
the number of needles increases from 6 to 20 and the distancing of these
needles is done so that the maximum number, 20, are equally spaced. The
maximum number of needles was set at 20 due to clinical limitations. The
spacing of the needles is dependent on the maximum number so that, from
one iteration to the next, the needles already placed remain in exactly
the same locations, yielding the same detection information. In this
manner the comparison between a specimen biopsied by 6 needles and the
same specimen biopsied by 10 needles is not dependent on needle position,
but instead compares the gain made by the four additional needles.
The simulation is executed on 1000 specimens, varying the number
of needles from 6 to 20 in increments of 2. The output from this experiment
consists of a file for each specimen that contains the results of each set of
needles, including the tumor to needle volume ratio achieved and the
associated estimates $R$ and $r$. In addition, the actual tumor to
gland volume ratio is noted.
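The nested needle sets used in this experiment can be sketched as follows. The text states only that the 20 positions are equally spaced and that needles keep their location from one iteration to the next; the normalized axis, the function names and the order in which slots are filled (`fill_order`, with a stride of 7) are hypothetical choices made for this illustration.

```python
MAX_NEEDLES = 20  # clinical upper limit stated in the text

def slot_positions(max_needles=MAX_NEEDLES):
    """Centers of equally spaced slots on a normalized axis [0, 1]."""
    return [(k + 0.5) / max_needles for k in range(max_needles)]

def fill_order(max_needles=MAX_NEEDLES, stride=7):
    """Hypothetical order in which slots are filled.

    The stride must be coprime to max_needles so every slot is used once.
    """
    return [(k * stride) % max_needles for k in range(max_needles)]

def needle_set(n, max_needles=MAX_NEEDLES):
    """Positions of the first n needles; smaller sets are subsets of larger ones."""
    slots = slot_positions(max_needles)
    return sorted(slots[k] for k in fill_order(max_needles)[:n])

# One needle set per iteration: 6, 8, ..., 20 needles.
sets = {n: needle_set(n) for n in range(6, MAX_NEEDLES + 1, 2)}
```

Taking prefixes of one fixed order guarantees that the 10-needle set contains the 6-needle set, so any change in the estimate comes only from the four added needles.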
4.3 Results
The results of this experiment were not as anticipated: there appears
to be no pattern of convergence to the actual tumor to gland volume
ratio within the limit of 20 total needles. However, much was learned from
this exercise that provided insight into the next series of investigations.
First, in the great majority of cases a single 8-needle biopsy tends
to overestimate the true tumor to gland volume ratio. Second, a comparison
between the two methods of calculating the error leads to the conclusion that
the sum of the ratios is the more accurate method, at least in this limited
set of trials.
4.4 Interactive Utility
Using the preceding idea as a starting point, an interactive software
tool was created to investigate the volume ratio question in greater detail.
This tool prompts the user for a random number, seeds the random number
generator, creates a gland containing a single tumor and conducts the optimal
8-needle biopsy. This optimal biopsy has 8 needles, relative spacing between
the needles, $\theta = 60$ and the optimal $\phi$. For each needle, the
position, the amount of tumor volume contained in the needle and an estimate
of the volume ratio of tumor to gland are displayed for the user. At this
point, the user is able to choose the location for the next needle. This new
needle is then simulated and the tumor volume information it retrieves is
incorporated into the volume ratio. The user can continue this process of
requesting additional needles and evaluate the estimated volume ratio and its
error from the true ratio. A maximum of 20 needles can be simulated on
a single gland, beginning with the 8 original needles and accumulating the
additional 12 based on user specifications.
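The bookkeeping behind such a tool might look like the sketch below. It is an assumption-laden illustration rather than the utility itself: the class name, the pooled estimator (total tumor length over total needle length) and the sample tumor lengths are all hypothetical, and the estimator is not claimed to match either of the two estimates used in the experiment.

```python
MAX_NEEDLES = 20  # limit stated in the text

class RunningVolumeRatio:
    """Pooled tumor to needle volume ratio, updated one needle at a time."""

    def __init__(self, needle_length):
        self.needle_length = needle_length
        self.tumor_lengths = []

    def add_needle(self, tumor_length):
        """Record the tumor length seen by a new needle; return the new estimate."""
        if len(self.tumor_lengths) >= MAX_NEEDLES:
            raise ValueError("no more than 20 needles per gland")
        if not 0.0 <= tumor_length <= self.needle_length:
            raise ValueError("tumor length must lie within the needle")
        self.tumor_lengths.append(tumor_length)
        return self.estimate()

    def estimate(self):
        """Total tumor volume over total needle volume (assumed estimator)."""
        if not self.tumor_lengths:
            raise ValueError("no needles recorded yet")
        # All cores share one diameter, so pi * (d/2)^2 cancels in the ratio.
        total_needle = len(self.tumor_lengths) * self.needle_length
        return sum(self.tumor_lengths) / total_needle

    def error(self, true_ratio):
        """Signed error of the current estimate from the true ratio."""
        return self.estimate() - true_ratio

# Eight initial needles (hypothetical tumor lengths), then one user-chosen needle.
biopsy = RunningVolumeRatio(needle_length=15.0)
for t in [0.0, 0.0, 3.0, 6.0, 0.0, 0.0, 0.0, 0.0]:
    biopsy.add_needle(t)
initial = biopsy.estimate()
updated = biopsy.add_needle(6.0)
```

Each call to `add_needle` corresponds to one user request for an additional needle, and `error` gives the quantity the user would compare from step to step.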
This area of research is full of open-ended questions where tools such
as this interactive utility can help shed light on answers. With involvement
from clinicians and medical researchers, experiments can be designed to gather
more information regarding the two issues of volume ratio and optimal biopsy
technique. In addition, using the results of this body of research, more real-
istic tumor distributions and geometric models can be constructed to better
understand the impact of treatment parameters on detection rate.
APPENDIX A. ANOVA Definitions
A dot in the subscript indicates averaging over the variable represented
by that index.
The number of levels for Number of Needles: $a = 3$.
The number of levels for Distancing Method: $b = 2$.
The number of levels for $\theta$: $c = 3$.
The number of levels for $\phi$: $d = 3$.
The number of specimens = 1000.
The number of experiments: $abcd = 54$.
In general, $Y$ is an observation, $\bar{Y}$ is the mean of observations,
$\mu$ is the true mean and $\hat{\mu}$ is the least squares estimate of the
true mean.
$Y_{ijkl}$ is the observed detection rate at the factor levels indicated by
$i$, $j$, $k$ and $l$.
$\bar{Y}_{....}$ is the mean of all specimens over all treatment levels
$i, j, k, l$. It indicates the overall detection rate for the entire
experiment.

$$\bar{Y}_{....} = \frac{1}{abcd} \sum_{i=1}^{a} \sum_{j=1}^{b} \sum_{k=1}^{c} \sum_{l=1}^{d} Y_{ijkl}$$
SSTO, or total sum of squares, is a measure of the total variability
of the observations without consideration of factor level.

$$SSTO = \sum_{i=1}^{a} \sum_{j=1}^{b} \sum_{k=1}^{c} \sum_{l=1}^{d} (Y_{ijkl} - \bar{Y}_{....})^2$$

$df_{SSTO}$ is the total degrees of freedom. The SSTO has
$abcd - 1 = 54 - 1 = 53$ degrees of freedom. One degree of freedom is lost
due to the lack of independence between the deviations.
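As a concrete illustration of the overall mean and SSTO, the following sketch evaluates both on a synthetic table of detection rates. The rates produced by `make_rates` are invented for the example and are not results from the simulations; only the factor level counts come from the text.

```python
from itertools import product

a, b, c, d = 3, 2, 3, 3  # factor level counts from the text

def make_rates():
    """Synthetic detection rates, one per treatment (illustrative only)."""
    return {
        (i, j, k, l): 0.50 + 0.05 * i + 0.02 * j + 0.01 * k * l
        for i, j, k, l in product(range(a), range(b), range(c), range(d))
    }

def grand_mean(y):
    """Mean of the observations over all treatment levels."""
    return sum(y.values()) / len(y)

def ssto(y):
    """Total sum of squares: squared deviations from the grand mean."""
    m = grand_mean(y)
    return sum((v - m) ** 2 for v in y.values())

rates = make_rates()
df_ssto = len(rates) - 1  # abcd - 1
```

With 54 treatments the total degrees of freedom come out to 53, matching the count given above.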
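SSTR, or treatment sum of squares, measures the extent of differences
between estimated factor level means and the mean over all treatments.
The greater the difference between factor level means (treatment means), the
greater the value of SSTR.

$$SSTR = \sum_{i=1}^{a} \sum_{j=1}^{b} \sum_{k=1}^{c} \sum_{l=1}^{d} (\hat{Y}_{ijkl} - \bar{Y}_{....})^2$$

Here $\hat{Y}_{ijkl}$ denotes the estimated mean at factor levels
$i, j, k, l$.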
$df_{SSTR}$ is the degrees of freedom. There are $r - 1$ degrees of freedom
for the SSTR, where $r$ is the number of parameters in the model. In the full
model, $r = abcd = 54$, the total combinations of factor levels. In the model
used for this simulation, which contains the overall mean, the main effects
and the pair-wise interactions,

$$r = 1 + (a-1) + (b-1) + (c-1) + (d-1) + (a-1)(b-1) + (a-1)(c-1) + (a-1)(d-1) + (b-1)(c-1) + (b-1)(d-1) + (c-1)(d-1) = 26.$$

One degree of freedom is lost due to the lack of independence between the
deviations.
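The parameter count can be checked with a short calculation. In this sketch the main effects and pair-wise interactions contribute 25 parameters, and counting the overall mean as one more gives $r = 26$; the dictionary keys and function name are labels chosen for the example.

```python
from itertools import combinations
from math import prod

# Factor level counts from the text: a, b, c, d.
levels = {"needles": 3, "spacing": 2, "theta": 3, "phi": 3}

def parameter_count(levels):
    """Overall mean + main effects + all pair-wise interactions."""
    main = sum(n - 1 for n in levels.values())
    two_way = sum((m - 1) * (n - 1)
                  for m, n in combinations(levels.values(), 2))
    return 1 + main + two_way

total = prod(levels.values())  # abcd, the number of treatments
r = parameter_count(levels)
df_sstr = r - 1                # treatment degrees of freedom
df_sse = total - r             # error degrees of freedom
```

The two pieces add up to the total degrees of freedom: $25 + 28 = 53 = abcd - 1$.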
SSE, or error sum of squares, measures variability which is not explained
by the differences between sample means. It is a measure of the variation
within treatments. A smaller value of SSE indicates less variation within
simulations at the same factor level. It sums the squared differences between
each observation and its estimated treatment mean $\hat{Y}_{ijkl}$.

$$SSE = \sum_{i=1}^{a} \sum_{j=1}^{b} \sum_{k=1}^{c} \sum_{l=1}^{d} (Y_{ijkl} - \hat{Y}_{ijkl})^2$$

$df_{SSE}$ is the degrees of freedom. Since SSE is the sum of the errors
across factor level, the degrees of freedom is the sum of the degrees of freedom
for each factor level. It is the total number of simulations minus $r$:
$abcd - r$.
MSE is the mean square for error, defined by $MSE = SSE / df_{SSE}$.
Note: The above definitions imply SSTO = SSTR + SSE. Due to this
relationship, this process is referred to as the partitioning of the total
sum of squares.
In order to measure the variability within a factor level, the factor
sums of squares are computed. These terms are integral to the test
statistic applied to determine whether a factor main effect is significant. In
addition, interaction sums of squares are computed to measure the variability
of the interactions.
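The factor A sum of squares corresponds to the number of needles
factor.

$$SSA = bcd \sum_{i=1}^{a} (\bar{Y}_{i...} - \bar{Y}_{....})^2$$

Similar factor sums of squares are computed for each of the factors:

Factor              Sum of Squares                                                      Mean Square
Number of Needles   $SSA = bcd \sum_{i=1}^{a} (\bar{Y}_{i...} - \bar{Y}_{....})^2$      $MSA = SSA/(a-1)$
Spacing Method      $SSB = acd \sum_{j=1}^{b} (\bar{Y}_{.j..} - \bar{Y}_{....})^2$      $MSB = SSB/(b-1)$
$\theta$            $SSC = abd \sum_{k=1}^{c} (\bar{Y}_{..k.} - \bar{Y}_{....})^2$      $MSC = SSC/(c-1)$
$\phi$              $SSD = abc \sum_{l=1}^{d} (\bar{Y}_{...l} - \bar{Y}_{....})^2$      $MSD = SSD/(d-1)$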
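The factor sums of squares and mean squares can be computed as in the sketch below. The detection rates are synthetic values invented for the example, and the function names are assumptions; only the level counts and the formulas follow the definitions above.

```python
from itertools import product

levels = (3, 2, 3, 3)  # a, b, c, d from the text

def make_rates():
    """Synthetic detection rates, one per treatment (illustrative only)."""
    return {
        key: 0.50 + 0.05 * key[0] + 0.02 * key[1] + 0.01 * key[2] * key[3]
        for key in product(*(range(n) for n in levels))
    }

def factor_ss(y, axis):
    """Sum of squares and mean square for the main effect of one factor."""
    grand = sum(y.values()) / len(y)
    n_levels = levels[axis]
    per_level = len(y) // n_levels  # e.g. bcd observations per level of factor A
    ss = 0.0
    for level in range(n_levels):
        vals = [v for key, v in y.items() if key[axis] == level]
        level_mean = sum(vals) / per_level
        ss += per_level * (level_mean - grand) ** 2
    return ss, ss / (n_levels - 1)

rates = make_rates()
ssa, msa = factor_ss(rates, 0)  # Number of Needles
ssb, msb = factor_ss(rates, 1)  # Spacing Method
```

Passing `axis=2` or `axis=3` gives the corresponding quantities for the two angle factors.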
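The interaction sums of squares are computed as well for use in the
F-test on the interactions. The first three pair-wise interaction sums of
squares are shown below. The others are computed in the same manner.

Number of Needles : Spacing
$$SSAB = cd \sum_{i=1}^{a} \sum_{j=1}^{b} (\bar{Y}_{ij..} - \bar{Y}_{i...} - \bar{Y}_{.j..} + \bar{Y}_{....})^2$$
$$MSAB = SSAB / ((a-1)(b-1))$$

Number of Needles : $\theta$
$$SSAC = bd \sum_{i=1}^{a} \sum_{k=1}^{c} (\bar{Y}_{i.k.} - \bar{Y}_{i...} - \bar{Y}_{..k.} + \bar{Y}_{....})^2$$
$$MSAC = SSAC / ((a-1)(c-1))$$

Number of Needles : $\phi$
$$SSAD = bc \sum_{i=1}^{a} \sum_{l=1}^{d} (\bar{Y}_{i..l} - \bar{Y}_{i...} - \bar{Y}_{...l} + \bar{Y}_{....})^2$$
$$MSAD = SSAD / ((a-1)(d-1))$$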
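A pair-wise interaction sum of squares can be sketched generically for any two factors. As before, the detection rates are synthetic and the helper names are assumptions; the synthetic table is built so that only the two angle factors interact, which makes the result easy to check.

```python
from itertools import product

levels = (3, 2, 3, 3)  # a, b, c, d from the text

def make_rates():
    """Synthetic detection rates with an interaction between the last two factors."""
    return {
        key: 0.50 + 0.05 * key[0] + 0.02 * key[1] + 0.01 * key[2] * key[3]
        for key in product(*(range(n) for n in levels))
    }

def cell_mean(y, fixed):
    """Mean of y over all cells whose indices match `fixed` ({axis: level})."""
    vals = [v for key, v in y.items()
            if all(key[ax] == lv for ax, lv in fixed.items())]
    return sum(vals) / len(vals)

def interaction_ss(y, ax1, ax2):
    """Interaction sum of squares and mean square for factors ax1 and ax2."""
    grand = cell_mean(y, {})
    n1, n2 = levels[ax1], levels[ax2]
    weight = len(y) // (n1 * n2)  # e.g. cd for the A:B interaction
    ss = 0.0
    for p, q in product(range(n1), range(n2)):
        dev = (cell_mean(y, {ax1: p, ax2: q}) - cell_mean(y, {ax1: p})
               - cell_mean(y, {ax2: q}) + grand)
        ss += weight * dev ** 2
    return ss, ss / ((n1 - 1) * (n2 - 1))

rates = make_rates()
ssab, msab = interaction_ss(rates, 0, 1)  # additive in these factors: near zero
sscd, mscd = interaction_ss(rates, 2, 3)  # the built-in interaction
```

In the synthetic table the first two factors act additively, so their interaction term vanishes up to rounding, while the product term between the last two factors produces a nonzero interaction sum of squares.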
The treatment means, $\mu_{ijkl}$, indicate the mean for the treatment at
the $ijkl$ levels of the respective factors.
The overall mean, $\mu$, is the mean across all factors and all levels
(across all $i, j, k, l$).
69
References
(1) Hodge K.K., McNeal J.E., Terris M.K., Stamey T.A. Random sys-
tematic versus directed ultrasound guided transrectal core biopsies of
the prostate. Journal of Urology 142 (1989): 71-74.
(2) Daneshgari, Firouz M.D., Taylor, Gerald D. PhD, Miller, Gary J.
M.D., PhD, Crawford, E. David M.D. Computer Simulation of the
Probability of Detecting Low Volume Carcinoma of the Prostate with
Six Random Systematic Core Biopsies. Urology 45 (April 1989): 604-
609.
(3) McNeal, John M.D. Normal Histology of the Prostate The American
Journal of Surgical Pathology (1988): 619-633.
(4) Neter, John, \Vasserman. William, Applied Linear Statistical Mod-
els, Richard D. Irwin, Inc 1974.
70