Citation
Road accident prediction modeling and diagnostics of accident causality

Material Information

Title:
Road accident prediction modeling and diagnostics of accident causality: a comprehensive methodology
Creator:
Kononov, Jake
Publication Date:
Language:
English
Physical Description:
xv, 159 leaves : illustrations ; 28 cm

Subjects

Subjects / Keywords:
Traffic accidents -- Forecasting -- Mathematical models ( lcsh )
Traffic safety -- Mathematical models ( lcsh )
Traffic accidents -- Forecasting -- Mathematical models ( fast )
Traffic safety -- Mathematical models ( fast )
Genre:
bibliography ( marcgt )
theses ( marcgt )
non-fiction ( marcgt )

Notes

Bibliography:
Includes bibliographical references (leaves 157-159).
General Note:
Department of Civil Engineering
Statement of Responsibility:
by Jake Kononov.

Record Information

Source Institution:
University of Colorado Denver
Holding Location:
Auraria Library
Rights Management:
All applicable rights reserved by the source institution and holding location.
Resource Identifier:
50741610 ( OCLC )
ocm50741610
Classification:
LD1190.E53 2002d .K66 ( lcc )

Full Text
ROAD ACCIDENT PREDICTION MODELING AND DIAGNOSTICS OF
ACCIDENT CAUSALITY A COMPREHENSIVE METHODOLOGY
by
Jake Kononov
B.S., University of Colorado at Denver, 1982
M.S., University of Colorado at Denver, 1990
A thesis submitted to the
University of Colorado at Denver
in partial fulfillment
of the requirements for the degree of
Doctor of Philosophy
Civil Engineering
2002


© 2001 by Jake Kononov
All rights reserved


This thesis for the Doctor of Philosophy
Degree by
Jake Kononov
has been approved
by
Sarosh Khan
Keith Molenaar

Date


Kononov, Jake (Ph.D., Civil Engineering)
Road Accident Prediction Modeling and Diagnostics of Accident Causality
A Comprehensive Methodology
Thesis directed by Professor Bruce N. Janson
ABSTRACT
This dissertation formulated a comprehensive methodology for road accident
prediction and diagnostics of accident causality. It provides a conceptual blueprint
and an analytical framework for the development of the Highway Safety Manual
currently contemplated by the Transportation Research Board. Accident
prediction models known as Safety Performance Functions (SPF) were
developed for the following facilities: rural 4-lane interstate freeways in
mountainous terrain, rural 2-lane arterial highways in mountainous terrain, and
rural 4-lane interstate freeways in rolling and flat terrain. The accident models were
developed using Poisson distributional assumptions. Additionally, a conceptual
formulation of the level of service concept applicable to highway safety was
developed.
Diagnostics of accident causality is performed using the pattern recognition
algorithm and direct diagnostic methods introduced in this dissertation. This
methodology is based on the idea that traffic accidents can be viewed as
independent Bernoulli trials and that it is possible to detect deviation from the
random statistical process by computing a cumulative probability for each of the
accident characteristics. In the course of this study a framework of normative
parameters was developed to provide a knowledge base for the diagnostics of
safety problems. Development of the diagnostic knowledge base and of the pattern
recognition algorithm led to the following finding: the existence of accident patterns
susceptible to correction may or may not be accompanied by
over-representation in accident frequency detected by the safety performance
functions, or by high accident rates. The implication of this finding for road
safety policy is as follows: cost-effective safety improvement counter-measures
may be constructed at locations which exhibit overall accident frequency well
within the expected range. This point is generally overlooked by the public agencies
funding road safety improvement projects.
This abstract accurately represents the content of the candidate's dissertation.
I recommend its publication.
Signed
Bruce N. Janson


A woman of valor, who can find her?
Her price is far above rubies,
She works with her hand
as well as her mind.
A Woman of Valor (XV Century Hebrew Text)
Unknown Author
DEDICATION
To my mother, who is a true woman of valor. A courageous physician, a
respected scientist in the field of immunology and AIDS research, a loving
mother and a devoted wife.


ACKNOWLEDGMENTS
I am very grateful to each of the members of my doctoral committee for their
support during the completion of this thesis.
Very special thanks to my advisor, Professor Janson, for the wise guidance and
instructions he has generously given me throughout my studies.


CONTENTS
Figures
Tables
Chapter
1. Introduction
1.1 Research Problem
2. Review of Extant Literature
2.1 Studies Relating Specific Geometric Features to Safety
2.2 Studies Relating Exposure to Safety
2.3 Studies of Diagnostic Methodologies
3. Research Objectives and Methodology
3.1 Overview
4. Modeling Relationships Between Traffic Volume and Traffic Safety
4.1 Safety Performance Functions
4.2 Philosophy and Methodology of Model Fitting
4.2.1 Choice of the Model Form
4.2.2 Choice of the Underlying Distributional Assumptions
4.3.1 Dataset Preparation
4.3.2 Selection of Minimum Segment Length and its Effect on the Model
4.3.3 Removal of Outliers
4.4.1 Exploratory Analysis and Model Fitting
5. Behavioral Interpretation and Levels of Relative Safety
5.1 Behavioral Zones
5.2 Levels of Service of Safety and Additional Benefits of SPF
6. Direct Diagnostics and Pattern Recognition Analysis
6.1 Direct Diagnostics
6.1.1 Example of Application of Direct Diagnostics Methodology
6.2 Pattern Recognition
6.3 Analytical Framework for Pattern Recognition of Accident Occurrence on Roadway Segments
6.4 Chapter Summary and Menu of Normative Characteristics
7. Integrated Use of Safety Performance Functions and Diagnostic Techniques
7.1 Diagnostic Expert System
7.2 Application of Diagnostic Methodology to a Roadway Segment
8. Conclusions
8.1 General
8.2 Safety Performance Functions
8.3 Diagnostics and Pattern Recognition
8.4 Implications on Policy of Highway Planning and Design
Appendix
A. Rural Flat and Rolling 4-Lane Freeways Abbreviated Dataset
B. Rural Mountainous 2-Lane Arterials Abbreviated Dataset
C. Rural Mountainous 4-Lane Freeways Abbreviated Dataset
References


FIGURES
Figure
4.1 Smoothed Frequency for Total Accidents
4.2 SPF for 2-Lane Roads
4.3 Neural Networks Model
4.4 Data-set Preparation-Freeways
4.5 Data-set Preparation-Arterials
4.6 Scatter Plot-Mountainous Freeways
4.7 Scatter Plot 2-Lane Mountainous Arterials
4.8 Scatter Plot Flat and Rolling Freeways
4.9 Fitted Model Mountainous Freeways
4.10 Mountainous Freeways CURE Plot
4.11 Fitted Model Mountainous Arterials
4.12 Mountainous Arterials CURE Plot
4.13 Fitted Model Flat and Rolling Freeway
4.14 Flat and Rolling Freeways CURE Plot
4.15 Rural Flat and Rolling Freeways with Empirically Calibrated Variance
4.16 Comparison Between Poisson and Empirically and Incrementally Calibrated Error Structure
5.1 Driver Behavioral Zones and Levels of Relative Safety
5.2 Safety Performance of 4 and 6 Lane Mountainous Freeways
6.1 Accident Distribution by Type
6.2 Accident Diagram
6.3 Accidents by Segment
6.4 Case History SPF
6.5 Night-Time Accident Concentration Graph
6.6 Pattern Recognition Algorithm Diagram
6.7 Pattern Intensity Graph
7.1 Expert System-Core Components
7.2 Conceptual Composition of Diagnostic Expert System for Highway Safety
7.3 Study SPF
7.4 Accident Distribution by Type
7.5 Fixed Object Collisions Concentration Graph
7.6 Pattern Intensity Graph
7.7 Fixed Object Collisions by Direction-Cluster 1
7.8 Fixed Object Collisions by Direction-Cluster 2
7.9 Accident Concentration GIS Map


TABLES
Table
4.1 Dataset Extract Mountainous Freeways
4.2 Dataset Extract 2-Lane Mountainous Arterials
4.3 Dataset Extract Flat and Rolling Freeways
4.4 Relationship Between R² and Segment Lengths
4.5 Control Limits Rural Flat and Rolling 4-Lane Freeways
4.6 Control Limits Rural Mountainous 2-Lane Arterials
4.7 Control Limits Rural Mountainous 4-Lane Freeways
5.1 Level of Service Criteria in Reference to Expected Safety Performance for Rural Flat and Rolling 4-Lane Freeways
5.2 Level of Service Criteria in Reference to Expected Safety Performance for Rural Mountainous 2-Lane Arterials
5.3 Level of Service Criteria in Reference to Expected Safety Performance for Rural Mountainous 4-Lane Freeways
6.1 Tabulation of Pattern Intensity Score
6.2 Normative Percentages for Diagnostics - Rural Flat 2-Lane Undivided
6.3 Normative Percentages for Diagnostics - Rural Rolling 2-Lane Undivided
6.4 Normative Percentages for Diagnostics - Rural Mountainous 2-Lane Undivided
6.5 Normative Percentages for Diagnostics - Rural Flat 4-Lane Interstate
6.6 Normative Percentages for Diagnostics - Rural Rolling 4-Lane Interstate
6.7 Normative Percentages for Diagnostics - Rural Mountainous 4-Lane Interstate
7.1 Pattern Intensity Score


"There is nothing as practical as a good theory."
1. Introduction
1.1 Research Problem
Traffic accidents are considered an expected byproduct of highway travel, but
just how many accidents we should "expect" per unit of traffic exposure over a
unit of time is not altogether clear at present. In fact, it constitutes a highly
complex problem faced by the transportation engineering profession today. If
expectations are not well defined or clearly understood, then the question
becomes: how is it possible to identify a deviation from the norm and then do
something about it? Despite many years of modern road building, this most
fundamental question of highway safety has not been adequately addressed to
date.
Over the years, transportation engineers have dealt successfully with the question of
highway capacity. The problem was clearly formulated by the Highway
Research Board in 1944, when the Committee on Highway Capacity was first
established. The first edition of the Highway Capacity Manual (HCM) was
published in 1950; it provided initial fundamentals of capacity for
uninterrupted-flow facilities, signalized intersections, weaving sections and
ramps (TRB, 1950). Since that time there have been four new editions of the
HCM (TRB, 2000). The relationship between traffic volumes, capacity and level
of service for different types of highway facilities is reasonably well understood
at present. Our understanding of highway capacity is enhanced with each
successive publication of the Highway Capacity Manual by the Transportation
Research Board (TRB).
In contrast to highway capacity, the relationship between traffic volume, physical
characteristics of roads and safety is not well understood or known, at least not
with the kind of precision customary in other engineering disciplines. There has
not been a concerted effort by the TRB to produce a Highway Safety Manual
similar in its intent and scope to the HCM. Conceptually such a document
should systematically examine the expected accident byproduct of roadway
segments (freeways, arterials, 2-lane roads etc.) as well as junctions
(intersections, interchanges). Despite countless studies exploring various
aspects of highway safety of transportation facilities there is no consensus
among transportation engineers as to what constitutes a safe or unsafe highway,
intersection or interchange.
A parallel could be drawn with the field of medicine: how can a physician prescribe
blood pressure reducing medication if there is no consensus in the medical
profession as to what constitutes normal or acceptable blood pressure levels?
Perhaps the question should be reformulated as follows: Should a physician
prescribe medicine affecting blood pressure to a patient whose blood pressure
he cannot measure?
Over the years the questions dealing with highway safety have not been
formulated or addressed with sufficient clarity or specificity. The difficulties
normally associated with statistical analysis of highway safety problems are
attributed to the large number of interrelated factors contributing to accidents.
These factors generally include human behavior, environmental conditions, and
vehicle and roadway characteristics. The problem is further complicated by the
lack of reliable exposure data coupled with difficulties of obtaining detailed
geometric design information. Earlier research efforts examined the relationship
between these factors and safety using different approaches and statistical
techniques, and yet because of the complexity of the issue and problems with
obtaining reliable data, no conclusive results have been obtained.
Furthermore, traditionally there is an institutional gap between those engineers
who identify hazardous locations and those who design and build roads. These
functions are generally performed separately, which often prevents the
development of a factual knowledge base on matters of highway safety.
Despite insufficient factual knowledge on the subject and absence of consensus
among the professionals, a great majority of efforts initiated by transportation
engineers to improve safety result in accident reduction. Having acknowledged
this important fact, I would like to pose the following question: Is it responsible
or ethical not to make the most out of limited financial resources allocated to the
improvement of safety on public roads?
Over the last 50 years of modern road building the safety of highways has been
measured in accident rates. The use of accident rates is implicitly based on the
assumption that the number of accidents on a segment of road is directly
proportional to the amount of traffic. In essence the linearity of the relationship
between exposure and safety implies that driver responses to a specific set of
roadway characteristics do not change with an increase in traffic. This
assumption makes for a number of conceptual difficulties, but more importantly
there is a significant amount of empirical data to the contrary. In an effort to
normalize safety on the basis of traffic exposure, the concept of accident rate is
widely used throughout the country. Virtually all Departments of Transportation
across the US publish annual reports compiling accident rates typical of roads
of various functional classes. In the absence of another widely accepted
methodology, the main appeal of using accident rate appears to be simplicity.
Its use, however, often does not reflect the reality of the problem at hand and
can lead to poor investments in safety improvements.
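The linearity assumption behind accident rates can be illustrated with a minimal sketch. It assumes the conventional rate definition of accidents per million vehicle-miles traveled, which the text does not spell out, and it uses a purely hypothetical power-law SPF whose form and coefficients (`a`, `b`) are made up for illustration and are not the models fitted in this dissertation.

```python
def accident_rate(accidents, adt, length_miles, years=1.0):
    """Accidents per million vehicle-miles traveled (assumed rate definition)."""
    million_vehicle_miles = adt * 365.0 * length_miles * years / 1e6
    return accidents / million_vehicle_miles


def hypothetical_spf(adt, a=0.005, b=0.7):
    """Illustrative non-linear SPF: expected accidents per mile per year.

    The power-law form and the values of a and b are invented for this sketch.
    """
    return a * adt ** b


if __name__ == "__main__":
    for adt in (5_000, 10_000, 20_000):
        linear = 2.0 * adt / 5_000        # accident count proportional to ADT
        curved = hypothetical_spf(adt)    # accident count from the assumed SPF
        print(adt,
              round(accident_rate(linear, adt, 1.0), 3),
              round(accident_rate(curved, adt, 1.0), 3))
```

When accident counts grow in proportion to ADT, the rate is the same at every volume. Under any non-linear SPF the rate changes with ADT, so a single "typical" rate cannot describe the facility over its whole range of traffic; the direction of the change depends on the assumed exponent.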
The relationship between the amount of traffic measured in ADT and accident
count for a unit of road section over a unit of time is complex and dynamic. It
reflects the interaction between driver behavior, vehicle characteristics and
roadway environment. In an effort to understand this complex phenomenon it
is critical to consider that changes in driver behavior are induced not only by the
physical characteristics of the roadway and environment but also by other
drivers. In fact it can be said that drivers influence each other differently over a
wide range of traffic exposure and composition.
There is an emerging consensus among traffic safety researchers that a
non-linear relationship exists between traffic exposure and safety. This
relationship is reflected by the Safety Performance Functions (SPF) calibrated
for various classes of roads. One of the main uses of SPF is to identify locations
with potential for accident reduction. While this application is certainly important,
its use is limited to identifying sites exhibiting accident frequency higher than
expected for a specific facility at a specific level of Average Daily Traffic (ADT).
SPF provides no information, however, about the nature of accident
occurrence; it only speaks to the magnitude of the problem. Without being able
to properly and systematically relate accident frequency and severity to roadway
geometries, traffic control devices, roadside features, roadway condition, driver
behavior or vehicle type it is not possible to develop effective counter-measures.
In other words, there can be no effective treatment without accurate
diagnosis.
In the field of medicine, physicians are expected to spend a minimum of 3 years
in apprenticeship after graduation from Medical School. During the periods of
internship and residency, physicians learn how to recognize diseases as well as
how to treat them. In contrast, transportation engineers are only
trained in administering treatment (i.e., designing road safety improvements)
without learning the science of diagnostics. There is no established course of
instruction in the graduate-level civil engineering curriculum that provides a
definitive methodology on how to relate accident causality to the roadway
environment. There is also very little reliable information on this subject in
research literature. Most research efforts are focused on the development of
accident prediction models and identification of "black spots". It is somehow
implied that transportation engineering professionals will always know how to
treat a high accident location once it has been identified, when in reality very little
is known on the subject.
In the course of performing in-depth project level safety assessments for
hundreds of sites a new methodology was developed to provide guidance in the
diagnostics of safety problems and development of appropriate
counter-measures. A data-set was compiled using accident and traffic data for
different classes of roads over a period of 8 years. A framework of 84 normative
parameters was developed to provide guidance in the diagnostics of accident
causality and recognition of accident patterns. Considering that traffic accidents
can be viewed as random Bernoulli trials, it is possible to detect deviation from
the random statistical process by computing a cumulative probability for each of the 84
normative parameters. The deviation from a random statistical process in the
direction of reduced safety generally suggests a potential for accident reduction
related to a specific parameter.
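The cumulative-probability idea can be sketched as follows. This is one plausible reading under stated assumptions, not the dissertation's own algorithm: each accident at a site is treated as an independent Bernoulli trial that exhibits a given characteristic with the normative proportion `p_norm`, and the function returns the binomial probability of seeing at least the observed count by chance. The function name, the example counts and the normative proportion are all hypothetical.

```python
from math import comb


def prob_at_least(k_observed, n_accidents, p_norm):
    """P(X >= k_observed) for X ~ Binomial(n_accidents, p_norm).

    A small value suggests the characteristic is over-represented at the
    site relative to the normative proportion.
    """
    return sum(
        comb(n_accidents, k) * p_norm ** k * (1.0 - p_norm) ** (n_accidents - k)
        for k in range(k_observed, n_accidents + 1)
    )


if __name__ == "__main__":
    # Hypothetical site: 40 accidents, 18 of them at night, against an
    # assumed normative night-time share of 25 percent.
    p = prob_at_least(18, 40, 0.25)
    print(f"P(at least 18 night-time accidents out of 40) = {p:.4f}")
```

Repeating this calculation for each normative parameter gives one probability per accident characteristic. Which threshold marks a pattern worth investigating is a separate choice that this sketch leaves open.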
The diagnostics process of highway safety problems on a section of road is in
many ways similar to making a medical diagnosis. While diagnostics is an
integral part of medicine, much remains to be done by the transportation
engineering profession in order to institutionalize this critical component of
understanding the highway safety problem.


Knowledge is of two kinds. We know a subject
ourselves, or we know where we can find information
upon it.
From James Boswell, Life of Johnson 1791.
2. Review of Extant Literature
The review of extant literature is organized in the following 3 groups:
2.1 Studies relating specific geometric features to safety
2.2 Studies relating exposure to safety
2.3 Studies of diagnostic methodologies
2.1 Studies Relating Specific Geometric Features to Safety
The complex relationship between the driver, vehicle, roadway characteristics
and safety is not well understood. While numerous researchers have attempted to
establish the relationships between safety and specific geometric design
elements, the results are decidedly mixed. The only consensus among
practicing transportation engineers is that none of the documented relationships
is definitive, which is an alarming state of affairs after over half a century of
modern road building. McGee (McGee, 1995) observed that: "Many of these
studies have focused solely on one aspect of the design (e.g., degree of
curvature for individual horizontal curves) without considering other geometric
parameters (e.g., upstream and downstream horizontal alignment). Examining
the relationship between accidents and individual highway design variables
without considering the interactive effect of other parameters can yield biased
and masked relationships."
The tendency to relate isolated geometric characteristics to accidents within the
framework of a predictive model influenced the thinking of several generations of
researchers but did little to improve safety. Despite the commonly held view
that pavement width significantly affects safety, there is little empirical data to
support a scientifically defensible relationship. A 1986 study (Zegeer et
al., 1986) attempted to relate safety to lane width and shoulder width on two-lane
rural roads. This model did not consider the effects of horizontal or vertical
alignment, the frequency of access points or operating speeds. It is possible that
these other factors not considered in the study influence accident frequency
more significantly than shoulder width or lane width, which may partially explain
the relatively low R² of the model.
Glennon (Glennon, 1987) developed an accident prediction model for horizontal
curves. Its main application is to predict the accident reduction associated with
curve flattening while maintaining the same central angle. This model does not
consider curve length, superelevation, the roadside or geometric design
consistency. Another horizontal curve model was developed by Zegeer et
al. (Zegeer et al., 1991). It relates the expected number of accidents on the curve to
the degree of curve, length of curve, traffic volume, pavement width and
presence of spirals. It does not, however, consider the effect of vertical
alignment or the impact of upstream or downstream alignment. Its focus is on
accident prediction within the confines of an isolated curve. The tendency
to isolate only selected geometric features and relate them to safety has many
limitations and makes for a number of conceptual and methodological difficulties.
It negates the very important influence of driver expectancy and design
consistency. For instance, a fairly sharp curve in mountainous terrain can
experience relatively few accidents, while a much milder curve in a different
environment may present serious difficulties to the driver. Therefore, efforts to
relate specific curvature to safety without carefully considering driver expectancy
and roadway environment are generally not successful. The importance of
curve signing, striping, lighting and delineation becomes more critical at locations
where design consistency or driver expectancy is violated. These factors are not
currently considered by any of the accident prediction models.
Hauer, in "Safety in Geometric Design Standards" (Hauer, 1999), asserts that roads
designed to standards are not safe, not unsafe, nor are they appropriately safe;
roads designed to standards have an unpremeditated level of safety. He
correctly observes that "For a road design standard to be the embodiment of
some appropriate safety, it must be true that those who write the standards can
anticipate the extent to which important road design decisions affect safety. It
may come as a surprise that, typically, writers of standards did not know how
what they choose affect safety. To test the verity of this irreverent assertion is
simple. One only has to ask the highway designer or the member of the
standards committee questions such as: Approximately how many crashes will
be saved by increasing the horizontal radius of this road from 100 m to 200 m;
how many by making lanes 12 instead of 11 feet; or by how much will crash
severity be reduced by changing this side-slope from 3:1 to 5:1? If they cannot
answer, then the safety built into the current standards cannot be appropriate."
A clear indication of the veracity of Hauer's claim is the fact that at present we
still do not have a tool that can predict the road safety consequences of
alternative highway designs. Additionally, Hauer observes (Hauer, 1999) that
"In road safety, intuition is a fallible guide and plausible conjectures often turn out
incorrect." Furthermore, current design standards try to represent road users by
certain fixed parameters and fail to recognize the fact that road users remember
the roads traveled and the road behind and adapt to the road ahead. As a
result, the relationship between design standards and road safety is unclear and
the level of safety designed into roads is unpremeditated.
In "A Case for Science-Based Road Safety Design and Management," Hauer
formulated a sound strategy for solving the highway safety problem (Hauer, 1988):
"Much of what made common sense has been tried; (except for the packaging
and protection of car occupants) could not be shown effective. I think this
pessimism to be unjustified and premature. Did we condemn the efficacy of
medicine because spells, leeches and exorcism have had no demonstrable
healing effect? Of course not. The response of medicine was to develop a
knowledge-based profession. The engine of progress was: insistence on
respectable training that is a prerequisite for license to practice, the institution of
pathology, the experiment, measurement, the planned clinical trial. Talent,
money and resolve were found to deliver science-based health care. The use
of spells and exorcism in medicine has much diminished and we even seem to
know when the medical use of leeches can be considered. Not so in road
safety. The comparison of the delivery of road safety to the delivery of health is
not far-fetched. In road safety we are somewhere at the leeches, spells and
exorcism stage. Instead of disillusionment, the response should be a resolve to
embark upon an era of science-based safety management. It alone promises
to deliver results."
2.2 Studies Relating Exposure to Safety
There is an emerging consensus among researchers that there is a non-linear
relationship between exposure and safety; however, there is no consensus as to
what this relationship is. Most of the studies assert that accident rate increases
with an increase in exposure (Hall and Pendleton, 1990).
Harwood concluded (Harwood, 1995): "It would be extremely valuable to know
how safety varies with Volume to Capacity (V/C) ratio and what V/C ratios
provide minimum accident rate. Only limited research has been conducted on
the variation of safety with V/C ratio. More research of this type is needed, over
a greater range of V/C ratios, to establish valid relationships between safety and
traffic congestion to provide a basis for maximizing the safety benefits from
operational improvement projects."
Hall and Pendleton observed (Hall and Pendleton, 1990) that "The implications
of the existence of a definite relationship between traffic accident rates and the
ratio of current or projected traffic volume to capacity is quite significant.
Knowledge of any such relationship would help engineers and planners assess
the safety implications both of projected traffic growth on existing highways and
of highway improvements designed to increase capacity."
Hall and Pendleton expressed a concern about the amount of scatter present in
the initial dataset in previous research efforts. They attribute this scatter to a
variety of driver, vehicle and roadway factors contributing to accident
occurrence. In our experience a significant amount of scatter can be attributed
to the fact that in most previous studies intersection and interchange related
accidents were not isolated from the accidents which occurred on in-between
segments of freeways or arterials. The amount of traffic on the cross-road at an
interchange or a side-road at an intersection, to a large degree, defines the
magnitude of conflict at an interchange or an intersection. All other things being
equal (meaning similar geometries and traffic control), this conflict is largely
responsible for the number of accidents at these facilities. Belanger developed
the following accident prediction model for rural intersections, which supports this
view (Belanger, 1995):
Accidents/year = 0.00204 × (AADT major road)^0.42 × (AADT minor road)^0.51
McDonald (9) developed a similar model for the intersections on divided
highways:
Accidents/year = 0.000783 × (AADT major road)^0.455 × (AADT minor road)^0.633
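A minimal sketch of the two intersection models quoted above, assuming the exponents read 0.42 and 0.51 (Belanger) and 0.455 and 0.633 (McDonald). The function names and the example volumes are hypothetical and serve only to show how the predicted accident frequency responds to the minor-road volume.

```python
def belanger_rural(aadt_major, aadt_minor):
    """Expected accidents per year at a rural intersection (Belanger form)."""
    return 0.00204 * aadt_major ** 0.42 * aadt_minor ** 0.51


def mcdonald_divided(aadt_major, aadt_minor):
    """Expected accidents per year at an intersection on a divided highway
    (McDonald form)."""
    return 0.000783 * aadt_major ** 0.455 * aadt_minor ** 0.633


if __name__ == "__main__":
    major = 8_000  # hypothetical major-road AADT
    for minor in (500, 1_000, 2_000):  # hypothetical minor-road AADT values
        print(minor,
              round(belanger_rural(major, minor), 2),
              round(mcdonald_divided(major, minor), 2))
```

In both forms the minor-road exponent is larger than the major-road exponent, so the predicted frequency is more sensitive to a proportional change in cross-road traffic than to the same proportional change in major-road traffic. That is consistent with the argument for separating junction accidents from segment accidents.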
A study by Zhou and Sisiopiku (Zhou and Sisiopiku, 1997) examined the general
relationship between accident rates and hourly traffic volume/capacity ratios. A
heavily traveled section of urban Interstate in Michigan was selected as the
study location. This location contained 79 ramp access points over 16 miles of
freeway. Because the ramp weaving activities influence the entire segment, it
was assumed that this segment is representative of urban freeways. It is
possible that merging/diverging and weaving maneuvers can account for a
significant proportion of accidents, and therefore ramp volumes or cross-road
volumes may be viewed as explanatory variables not accounted for in the study.
Considering freeway turbulence in the merge/diverge zones and resulting
accidents identified by Janson (Janson et al., 1998), interchanges should be
isolated and examined separately from the freeway segments in between.
Because of merging/diverging and weaving conflicts the relationship between
accident rates and V/C ratio for freeways does not lend itself well to the
examination in densely developed urban areas. It is more meaningful to study
this relationship in rural areas where interchange spacing is sufficiently long that
it is possible to isolate roadway segments from the interchange-related conflicts.
2.3 Studies of Diagnostic Methodologies
Over the last 50 years of modern road building it was somehow implied that
transportation engineering professionals will always know how to treat a high
accident location once it has been identified, when in reality very little is known
on the subject. This sorry state of affairs is best expressed in (Hauer, 1996): If
the site has been identified because its accident record is unusual, one has also
to find out why. Thus, the detailed safety analysis stage is akin to a process of
medical diagnosis, with perhaps a keener awareness of costs and budgets, a
process requiring knowledge of causes, effects, and economics. One might
expect that this task would be performed by specialists whose training in this
matter is extensive and based on knowledge of fact. Unfortunately, this is not
so. For some reason, perhaps because of a fascination with matters statistical
or perhaps because it is a headquarters function, a great deal of thought has
been devoted to the identification stage. Much less has been written about, or
taught to engineers, how to conduct a detailed safety analysis of a site. Yet, not
common sense, practical experience, engineering judgement, or the usual
highway and traffic engineering lore is a sufficient guide. To be effective, it is
not enough to produce reasonable lists of candidate sites to be investigated in
the order of priority. It is also necessary to equip the engineer with the training
and the tools to make a safety diagnosis on the basis of the specific kinds of
accidents that have occurred, the conditions in which they occurred, and the
characteristics of the site. Furthermore, it is necessary to give the engineer
realistic estimates of what safety improvements can be expected. This, at
present is a tall order.
Once again we will turn to the field of medicine for the conceptual and
methodological guidance on how to formulate a solution and also observe an
interesting analogy.
In the United States, the initial impetus for developing a classification of mental
disorders was the need to collect statistical information. In 1917, the Committee
on Statistics of the American Psychiatric Association, together with the National
Commission on Mental Hygiene, formulated a plan that was adopted by the
Bureau of the Census for gathering uniform statistics across mental hospitals.
In 1952 the American Psychiatric Association (APA, 1952) developed an
authoritative guide on diagnostics of mental disorders called Diagnostic and
Statistical Manual of Mental Disorders (DSM-I). In part because of the lack of
widespread acceptance of the mental disorder diagnostic categories the World
Health Organization (WHO) sponsored a comprehensive review of diagnostic
issues that was conducted by the British psychiatrist Stengel. According to the
American Psychiatric Association (APA, 1994), "His report can be credited with
having inspired many of the recent advances in diagnostic methodology - most
especially the need for explicit definitions as a means of promoting reliable
clinical diagnosis." Since 1952 there have been three more editions of DSM.
Over the last half a century thousands of psychiatrists systematically collected
data to advance diagnostic methodology. In contrast to medicine, no such
undertaking by the transportation engineering profession has taken place. In
highway safety, just like in medicine there can be no effective treatment
without accurate diagnosis.
In the fourth edition of the Diagnostic and Statistical Manual of Mental Disorders,
DSM-IV (APA, 1994), the American Psychiatric Association (APA) cautions that
"The diagnostic categories, criteria, and textual descriptions are meant to be
employed by individuals with appropriate clinical training and experience in
diagnosis. It is important that DSM-IV not be applied mechanically by untrained
individuals. The specific diagnostic criteria included in DSM-IV are meant to
serve as guidelines to be informed by clinical judgement and are not meant to
be used in a cookbook fashion." Furthermore, APA cautions that "The proper use
of these criteria requires specialized clinical training that provides both a body
of knowledge and clinical skills." A similar caution is relevant to the practice of
transportation engineering, yet at present a very limited factual knowledge base
exists to assist transportation professionals in making diagnostic decisions.
Since the measuring device has been constructed by the observer...
we have to remember that what we observe is not nature in itself but
nature exposed to our method of questioning.
Physics and Philosophy [1958]
Werner Karl Heisenberg, 1901-1976
3. Research Objectives and Methodology
3.1 Overview
The primary objective of this research was to develop a comprehensive
methodology for identifying locations with potential for accident reduction and
conducting a diagnostic analysis of accident causality. Because of conflicts
between traffic flows at junctions, intersections and interchanges have distinctly
different safety performance characteristics from roadway segments. With this
in mind, intersections and roadway segments should be studied separately.
Three methodologies were developed for identification of locations with potential
for accident reduction:
Calibration of the Safety Performance Functions (SPF)
Direct Diagnostics Analysis
Accident Pattern Recognition Analysis
The use of these methodologies was examined and compared in the course of
the study for the following facilities:
Four lane rural freeways in mountainous terrain
Two lane rural highways in mountainous terrain
Four lane rural freeways in rolling and flat terrain
Following identification of locations exhibiting higher than expected accident
frequency using Safety Performance Functions a methodology for diagnostics
of accident causality at these locations was introduced.
4. Modeling Relationships Between Traffic Volume
and Traffic Safety
4.1 Safety Performance Functions
The relationships between expected level of safety and traffic exposure for
specific types of roadway are known as Safety Performance Functions as
described in (Hauer and Persaud, 1997). Safety Performance Functions reflect
the complex relationship between exposure usually measured in ADT, and
accident count for a unit of road section or a junction over a unit of time. These
safety performance functions are referred to as "simple" because they reflect
the relationship between exposure and safety for road populations sufficiently
similar that the only independent variable required is ADT. We aim to produce
models that relate accident frequency and ADT. The models will be of the form:
Accidents/(mile-year)=f(ADT)
Naturally the question of considering the influence of other explanatory variables
has to be addressed. Why not introduce these variables into the model
explicitly? In other words, why not develop a multiple regression model with as
many variables as possible? Gujarati in Basic Econometrics (Gujarati, 1995)
examined the reasons for limiting the number of variables in the regression
which are discussed below in the framework of accident prediction modeling.
Vagueness of theory: The theory, if any, determining how the accident frequency
is influenced by geometric design features, traffic control devices, traffic
operations, driver behavior or vehicle type is at best incomplete at present. We
might know for certain that ADT influences accident frequency, but we are
largely ignorant or unsure about the other variables affecting it. Therefore, the
error term in the regression was used as a substitute for all the excluded or
omitted variables from the model.
Unavailability of data: Even if we know what some of the excluded variables are
and therefore consider a multiple regression rather than a development of simple
Safety Performance Function (SPF), we do not have quantitative information
about these variables. Geometric design data, for instance, which is thought to
be of interest is generally unavailable. Therefore we made a decision to use a
proxy of relevant geometric design variables by stratifying roads by the functional
classification, number of lanes, type of terrain and urban or rural environment.
Additionally, intersection and interchange-related accidents and roadway
segments were isolated from the data-set prior to fitting of the model.
Core variables vs. peripheral variables: Let us consider that besides average daily
traffic (ADT), lane width, shoulder width, stopping sight distances, horizontal
curvature and vertical grades also affect accident frequency. But it is quite
possible that the joint influence of all or some of these variables may be so small
and at best non-systematic that as a practical matter and for cost considerations
it does not pay to introduce them into the model explicitly. We hoped that their
combined effect within the framework of narrowly defined and carefully stratified
SPF can be treated as the error term in the regression.
Intrinsic randomness in human behavior. All accidents involve some amount of
driver error. Even if we succeed in introducing all the relevant variables into the
model, there is bound to be some intrinsic randomness in driver behavior that
cannot be explained no matter how hard we try. At the system level, however,
it is true that at some locations within SPF inventory drivers are more likely to
make a driving error than at others and identification of these locations within
SPF framework will reveal to us sites with potential for accident reduction.
Inaccurate measurements of additional explanatory variables: Although the
multivariate regression model assumes that all variables are measured accurately,
in practice, traffic, accident and geometric design data are replete with errors of
measurement. Therefore there are some advantages to minimizing introduction
of additional error by limiting the number of explanatory variables.
Principle of parsimony. Parsimony is defined by Webster's New Collegiate
Dictionary (Merriam-Webster, 1980) as "economy in the use of a means to an
end." Employing the Occam's razor principle that explanations of unknown
phenomena be sought first in terms of known quantities leads us to the following
question. If we can explain the accident frequency within a framework of SPF
substantially with only ADT and if the body of factual knowledge is not extensive
enough at present to suggest what other variables might be included, why
introduce more variables? Let the error term represent all other variables.
Wrong functional form: Even if we have identified all correct variables explaining a
phenomenon and somehow are able to obtain accurate data on all of them, in
the absence of accepted theory we do not know the form of the underlying
relationship between the regressand and the regressors. In two-variable models
the functional form of the relationship can often be judged from the scattergram.
But in a multiple regression model, it is not easy to determine the appropriate
functional form.
For all of these reasons the magnitude of the error terms in the calibration of the
SPF assumes an extremely critical role, which we will demonstrate as we
progress.
Level of safety in this study is expressed in terms of the number of accidents per
mile over a period of one year and exposure is measured in Average Daily
Traffic (ADT). The expected level of safety represents the normally expected
number of accidents per mile of freeway associated with a specific level of ADT.
Simple Safety Performance Functions (SPF) were developed for several
facilities:
Four lane rural freeways in mountainous terrain
Two lane rural highways in mountainous terrain
Four lane rural freeways in rolling and flat terrain
Comparison of the accident history of a specific location to the accident
prediction model representing the facility would allow us to assess its potential
for safety improvement.
The scope of application of this methodology is limited to the initial identification
of locations with potential for accident reduction. In order to relate accident
causality to roadside features, traffic control devices, roadway geometries and
traffic operations a diagnostic analysis is usually required.
Development of the SPFs lends itself well to the conceptual formulation of the
Level of Service of Safety. If the level of safety predicted by the SPF represents
normal or expected number of accidents at a specific level of ADT, then the
degree of deviation from the norm can be stratified to represent a specific Level
of Service of Safety.
Another application of developing SPFs is comparing safety of different roads at
the same level of traffic exposure. Such a comparison for instance will enable
us to assess the change in safety attributable to providing additional capacity.
Additionally examination of relationships between traffic exposure and safety for
different roads is expected to provide insights into changes in driver behavior
across the wide range of ADT. Traditional use of accident rates in problem
identification and project selection has delayed comprehensive understanding
of the highway safety problem. The assumption of linearity between safety and
exposure inherent in the use of accident rates, often leads to mis-diagnosis and
poor investments in safety improvements.
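The effect of the linearity assumption can be seen in a small numerical sketch. The accident rate below is the conventional accidents per million vehicle-miles traveled; the power-form function `hypothetical_spf` and its constants are assumptions chosen only to produce a curve that flattens with ADT, and they are not fitted values from this study.

```python
def accident_rate_per_mvmt(accidents, adt, length_mi, days=365):
    """Accidents per million vehicle-miles traveled over the period."""
    return accidents / (adt * days * length_mi / 1e6)


def hypothetical_spf(adt, a=0.02, b=0.5):
    """Hypothetical accidents per mile-year; a and b are NOT calibrated values."""
    return a * adt ** b


# On a 1-mile segment that follows the hypothetical SPF exactly, the rate
# still falls as ADT grows, although the site is "normal" at every ADT.
rates = {adt: accident_rate_per_mvmt(hypothetical_spf(adt), adt, 1.0)
         for adt in (5000, 10000, 20000)}
for adt, rate in rates.items():
    print(adt, round(rate, 3))
```

Under these assumptions, ranking sites by rate alone would favor high-volume sites and penalize low-volume ones even when all of them perform exactly as expected, which is the kind of mis-diagnosis the paragraph above warns about.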
The relationship between exposure and safety is complex and should be
established by employing statistical modeling techniques. The majority opinion
among traffic safety researchers expressed in (Miaou 1993) and in (Maher and
Summersgill, 1996) is that multiple linear regression modeling is not well suited
for studying traffic safety problems, yet there is no consensus as to what
technique should be used. More recent efforts have employed Generalized
Linear Models (GLM) with Poisson and Negative-Binomial assumptions.
Although this technique is more statistically appropriate than linear regression,
it generally provides poor to moderate fit to the data. Considering the limitations
of GLMs and linear regression, the alternate methodology was explored in the
course of the study.
4.2 Philosophy and Methodology of Model Fitting
Modeling in science remains, partly at least, an art. Some principles do exist, however, to guide
the modeler. The first is that all models are wrong; some, though, are better than others and we
can search for the better ones. At the same time we must recognize that eternal truth is not
within our grasp. The second principle (which applies also to artists!) is not to fall in love with the
model, to the exclusion of alternatives.
Monographs on Statistics and Applied Probability [1989]
McCullagh and Nelder
In statistical modeling of traffic accidents, we are interested in discovering what
we can learn about underlying relationships from empirical data containing a
random component. We suppose that some complex phenomenon manifested
by accident occurrence (data generating mechanism) has produced the
observations and we wish to describe it by some simpler, but still realistic, model
that reveals the nature of the underlying relationship. Generally, in a model, we
distinguish between systematic and random variability, where the former
describes the patterns of the phenomenon in which we are particularly
interested. Thus, the distinction between the two depends on the particular
question being asked. Random variability can be described by a probability
distribution, perhaps multivariate, whereas the systematic part generally involves
a regression model, most often, but not necessarily, a function of the mean
parameter (Lindsey, 1997). Fridstrom and Ingebrigsten observed that "Road
casualties are random events. Each single accident is unpredictable in the very
strong sense, that had it been anticipated, it would most probably not have
happened. Yet the number of accidents recorded within reasonably large
geographical units exhibits a striking stability from one year to the next...
Although the single event is all but impossible to predict, the collection of such
events may very well behave in a perfectly predictable way,..." (Fridstrom and
Ingebrigsten 1991). This observation reflects the essence of accident modeling
and suggests that we should view safety of an entity (road segment for instance)
as an underlying stable property that has the nature of a long-term average.
4.2.1 Choice of the Model Form
Based on substantial empirical evidence derived from observing safety
performance of various roads over extended time periods as well as work of
other researchers the following understanding of relationship between safety and
exposure has emerged. Accident rates decline when ADT reaches a certain
threshold endemic to a particular facility in a specific environment. This
understanding suggests a choice of underlying function which would reflect this
phenomenon. Such a function can be represented by a model form which will
show some leveling off associated with approaching some threshold exposure
value. Two model forms are generally employed:
$$E\{y\} = X^{\beta_0}\, e^{\beta_1 X + \cdots} \qquad \text{(power family)}$$
$$E\{y\} = X^{\beta_0}\,(1 + \beta_1 X + \beta_2 X^2 + \cdots) \qquad \text{(polynomial family)}$$
In this E{y} is the annual number of accidents expected to occur on a mile of
road, X is the independent variable (here ADT), and the βs are parameters to be
estimated. Hauer used the Nadaraya-Watson kernel estimator with a Gaussian kernel
to obtain the relationship presented in Figure 4.1 below (Hauer 2001).
[Figure 4.1 Smoothed Frequency for Total Accidents. Horizontal axis: AADT, 0 to 20,000.]
Non-parametric kernel regression used by Hauer is a smoothing technique used
to obtain clues about the form of the function underlying the data. Figure 4.2,
also from (Hauer 2001), shows a safety performance function calibrated for 2-lane
rural highways using Poisson maximum-likelihood estimation. Similar functional
shapes in Figure 4.2 were developed and described in (Kononov 1999) using
Neural Networks-Radial Basis Function. Neural Networks are not constrained
by the underlying distributional assumptions and learn by example, inferring a
model from training data. It is of interest to note that despite the violation of
normality and homoscedasticity assumptions the linear regression fit of the mean
closely approximates a curve fitted using Neural Networks methodology.
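The kernel smoothing idea can be sketched in a few lines. The example below is a minimal Nadaraya-Watson estimator with a Gaussian kernel, applied to the AADT and accidents-per-mile values of one segment from the Table 4.1 extract (I-70, milepoints 49.52 to 61.15). The bandwidth of 1,500 vehicles per day and the function name are assumptions for illustration; Hauer's actual bandwidth and data are not reproduced here.

```python
import math

# (AADT, accidents per mile) for one segment in the Table 4.1 extract, 1986-1999.
AADT = [6550, 6550, 7400, 7856, 9450, 9600, 9486,
        10400, 10600, 12000, 11151, 12524, 13291, 13098]
ACC_PER_MILE = [2.84, 1.97, 2.60, 2.13, 2.13, 3.47, 1.26,
                3.54, 2.68, 2.36, 2.52, 2.60, 2.91, 4.02]


def nadaraya_watson(x0, xs, ys, bandwidth):
    """Kernel-weighted average of ys at x0 using a Gaussian kernel."""
    weights = [math.exp(-0.5 * ((x0 - x) / bandwidth) ** 2) for x in xs]
    return sum(w * y for w, y in zip(weights, ys)) / sum(weights)


# Smoothed accident frequency at a few ADT levels (bandwidth is an assumed value).
for adt in (7000, 10000, 13000):
    print(adt, round(nadaraya_watson(adt, AADT, ACC_PER_MILE, 1500.0), 2))
```

Because the estimate is a weighted average of observed values, it makes no assumption about the functional form; it only suggests a shape that a parametric model can then be chosen to follow.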
[Figure 4.2 SPF for 2-Lane Roads. Vertical axis: Accidents Per Mile/Year; horizontal axis: AADT, 0 to 20,000.]
[Figure 4.3 Neural Networks Model. Panel title: Rural Mountainous 4-Lane Interstate.]
4.2.2 Choice of Underlying Distributional Assumptions
In statistical modeling of traffic accidents, it is assumed that the random
variation follows certain probability laws and can be characterized by a
probability function. It was observed (Miaou and Lum, 1993) that "The use of a
continuous distribution, such as the normal distribution, is at best an
approximation to a truly discrete process. The Poisson distribution, on the other
hand, is a natural initial candidate distribution for such random discrete and,
typically, sporadic events." At the same time if a Poisson assumption is made
about the underlying random variability, it will have a restricting effect of always
equating the variance to the mean. In our experience with accident data this
assumption is not always true. Similar findings are reported by others (Dean and
Lawless, 1989). In many cases accident data exhibit extra variation or
over-dispersion relative to the Poisson model. In other words, the variance of the
data is often greater than the mean.
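A quick way to look for over-dispersion is to compare the sample variance of the counts with their mean. The sketch below uses the yearly totals (PDO + INJ + FAT) of the first segment in the Table 4.1 extract (I-70, milepoints 2.31 to 10.61, 1986 to 1998). It is only an informal check: ADT changed over these years, so part of the spread reflects changing exposure rather than extra-Poisson variation, and the variable names are chosen here for illustration.

```python
import statistics

# Yearly accident totals (PDO + INJ + FAT) for one segment, Table 4.1 extract.
counts = [9, 2, 14, 11, 8, 14, 10, 18, 17, 16, 9, 14, 16]

mean = statistics.mean(counts)
variance = statistics.variance(counts)  # sample variance (n - 1 denominator)
dispersion_ratio = variance / mean      # equals 1 in expectation under Poisson

print(round(mean, 2), round(variance, 2), round(dispersion_ratio, 2))
```

A ratio well above 1, as obtained here, is consistent with the extra variation described above.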
Safety performance of roadway segments observed over extended time periods
represents a set of panel or clustered count data. The potential problem for
clustered accident count data is that the multiple accident counts from the same
road segments, although generated by different traffic volumes, may not be
independent. Negative binomial regression models were proposed to replace
Poisson regression models because accident count data are frequently
over-dispersed as compared with those data suitable for the methods of Poisson
regression models (Miaou and Lum 1993). Like Poisson models, however,
negative binomial models still need the assumption of independence, and,
therefore, may not be entirely appropriate. In order to make explicit allowance
for correlated observations it was suggested (Guo 1996) that multiple counts in
the same cluster are subjected to a cluster-specific random effect. Guo's
approach starts with a conventional Poisson regression model and then subjects
the multiple counts in the same cluster to a cluster-specific random effect
representing unobserved influences shared by all the counts of the cluster. A
gamma-distributed cluster-specific effect in this formulation leads to the
multinomial regression model. Similar ideas are behind the strategies developed
for correlated observations in linear regression modeling (Searle 1987).
The Multinomial Regression Model
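According to the model formulated by Guo, the mean for entity i is given by
equation (4.2.1)
$$E\{Y_{ij} \mid \theta_i\} = \eta_{ij}\,\theta_i = T_{ij}\, e^{x_{ij}\beta}\, \theta_i \tag{4.2.1}$$
In this, $T_{ij}$ is exposure, $x_{ij}$ are covariate values, $\beta$ are regression constants and
$\theta_i$ is a random variable (for entity i) that comes from Gamma distribution the
density of which is in equation (4.2.2)
$$f(\theta_i) = \frac{\varphi^{\varphi}\, \theta_i^{\varphi - 1}\, e^{-\varphi \theta_i}}{\Gamma(\varphi)} \tag{4.2.2}$$
For this distribution the mean is 1 and the variance is $1/\varphi$.
For entity i we observe accident counts $y_{i1}, y_{i2}, \ldots, y_{ij}, \ldots, y_{in_i}$ in time periods
$1, 2, \ldots, j, \ldots, n_i$.
Assume that these accidents are Poisson distributed with means $\eta_{ij}\theta_i$ and $\theta_i$
given:
$$P(y_{i1}, \ldots, y_{in_i} \mid \theta_i) = \prod_{j=1}^{n_i} \frac{(\eta_{ij}\theta_i)^{y_{ij}}\, e^{-\eta_{ij}\theta_i}}{y_{ij}!} \tag{4.2.3}$$
When $\theta_i$ is not given, the probability to observe accident counts
$y_{i1}, y_{i2}, \ldots, y_{in_i}$ is given by
$$P(y_{i1}, \ldots, y_{in_i}) = \int_0^{\infty} \left[ \prod_{j=1}^{n_i} \frac{(\eta_{ij}\theta_i)^{y_{ij}}\, e^{-\eta_{ij}\theta_i}}{y_{ij}!} \right] f(\theta_i)\, d\theta_i \tag{4.2.4}$$
For notational brevity let us adopt the following:
$$y_{i\cdot} = \sum_{j=1}^{n_i} y_{ij}, \qquad \eta_{i\cdot} = \sum_{j=1}^{n_i} \eta_{ij} \tag{4.2.5}$$
$$\prod_{j=1}^{n_i} \theta_i^{y_{ij}} = \theta_i^{y_{i\cdot}}$$
With this,
$$P(y_{i1}, \ldots, y_{in_i}) = \frac{\Gamma(y_{i\cdot} + \varphi)}{\Gamma(\varphi) \prod_{j=1}^{n_i} y_{ij}!} \cdot \frac{\varphi^{\varphi} \prod_{j=1}^{n_i} \eta_{ij}^{y_{ij}}}{(\eta_{i\cdot} + \varphi)^{y_{i\cdot} + \varphi}} \tag{4.2.6}$$
This expression is the same as provided in (Guo 1996) without the derivation
provided above. Guo states that: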
$$\mathrm{VAR}[Y_{ij}] = \eta_{ij} + \eta_{ij}^2 / \varphi = \eta_{ij}\,(1 + \eta_{ij}/\varphi) \tag{4.2.7}$$
That is the variance that pertains to trials in which realizations all have the same
$\eta_{ij}$ but each time a different $\theta_i$.
As written equation 4.2.7 seems to imply that $\varphi$ is a constant that applies to all
entities, no matter what their $\eta_{ij}$. If $\varphi$ is the same for all entities, then for entities
for which $\eta_{ij} \ll \varphi$, $\mathrm{VAR}[Y_{ij}]$ will be close to $\eta_{ij}$ as for a Poisson variable; while
for entities for which $\eta_{ij} \gg \varphi$, $\mathrm{VAR}[Y_{ij}]$ will be close to $\eta_{ij}^2/\varphi$. In estimation,
the weight of an observed value $Y_{ij}$ is approximately inversely proportional to its
variance. The consequence of assuming that $\varphi$ is the same for all entities is
that entities with $\eta_{ij} \gg \varphi$ will have little weight compared to entities with
$\eta_{ij} \ll \varphi$. The practical consequence is that observed values for entities that
have a large mean will have little effect on the regression. Giving little weight to
observations with large $\eta$ may not be the right thing to do in regressions.
This can be remedied by considering $\varphi$ a function of $\bar{\eta}_i$, the average of the $\eta_{ij}$
over all time periods. Thus, e.g., were $\varphi = \psi\,\bar{\eta}_i^{\,\upsilon}$ with $\psi$ and $\upsilon$ constants, then
$\mathrm{VAR}[Y_{ij}]$ would be $\eta_{ij}\,[1 + (\eta_{ij} / \bar{\eta}_i^{\,\upsilon}) / \psi]$. If $\upsilon = 0$ the assumption in equation
4.2.7 holds. If $\upsilon = 1$ then $\mathrm{VAR}[Y_{ij}]$ would be, approximately, $\eta_{ij}\,(1 + 1/\psi)$, etc.
For parameter estimation the implication is that parameter $\varphi$ is replaced by $\varphi_i$
that depends on some parameters and on the traits of entity i. It was suggested
in (Hauer 2001) to use $\varphi_i = \varphi \times$ (segment length).
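The weighting argument can be made concrete with a short numerical sketch. It assumes the variance form $\eta(1 + \eta/\varphi)$ discussed above and the alternative in which the dispersion parameter scales as $\psi\bar{\eta}^{\upsilon}$; the function names and all numerical values are illustrative assumptions, not estimates from this study.

```python
def var_constant_phi(eta, phi):
    """Variance eta * (1 + eta / phi) when phi is the same for all entities."""
    return eta * (1.0 + eta / phi)


def var_scaled_phi(eta, eta_bar, psi, upsilon):
    """Variance when phi is replaced by psi * eta_bar ** upsilon for the entity."""
    phi_i = psi * eta_bar ** upsilon
    return eta * (1.0 + eta / phi_i)


# Relative weight (1 / variance) of a low-mean and a high-mean entity.
phi = 5.0  # assumed value
for eta in (1.0, 100.0):
    w_const = 1.0 / var_constant_phi(eta, phi)
    w_scaled = 1.0 / var_scaled_phi(eta, eta, psi=2.0, upsilon=1.0)
    print(eta, round(w_const, 5), round(w_scaled, 5))
```

With a constant dispersion parameter the high-mean entity receives a far smaller weight than with the scaled parameter, which illustrates why observations with a large mean would have little effect on the regression.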
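Equation 4.2.6 gives the probability to observe $y_{i1}, \ldots, y_{in_i}$. Replacing now $\varphi$
by $\varphi_i$, the log-likelihood for entity i is expressed by eq. (4.2.8) below.
Because the interest in $\ln L_i$ is for maximization with respect to unknown
parameters, the product $\prod_j y_{ij}!$ has been omitted. With $y_{i\cdot}$ and $\eta_{i\cdot}$ denoting the
sums over j of $y_{ij}$ and $\eta_{ij}$ as in (4.2.5),
$$\ln L_i = \ln\Gamma(y_{i\cdot} + \varphi_i) - \ln\Gamma(\varphi_i) + \varphi_i \ln\varphi_i + \sum_{j=1}^{n_i} y_{ij} \ln\eta_{ij} - (y_{i\cdot} + \varphi_i)\ln(\eta_{i\cdot} + \varphi_i) \tag{4.2.8}$$
Maximum likelihood parameters are those that maximize
$$\ln L = \sum_i \ln L_i \tag{4.2.9}$$
Were $\theta_i = 1$ for all i,
$$P(y_{i1}, \ldots, y_{in_i}) = \prod_{j=1}^{n_i} \frac{\eta_{ij}^{y_{ij}}\, e^{-\eta_{ij}}}{y_{ij}!} \tag{4.2.11}$$
Again omitting the $y_{ij}!$ the log-likelihood for entity i is now
$$LL_i = \sum_{j=1}^{n_i} \left[ y_{ij} \ln(\eta_{ij}) \right] - \eta_{i\cdot} \tag{4.2.12}$$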
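The closed-form likelihood can be checked numerically. The sketch below assumes the gamma-mixed Poisson setup described above (Poisson counts with means $\eta_{ij}\theta_i$, and $\theta_i$ gamma distributed with mean 1 and variance $1/\varphi$). It compares the closed-form probability with a direct numerical integration over $\theta_i$, and shows that the log-likelihood approaches the Poisson one when $\varphi$ is very large. The counts, means, and function names are made-up illustrations, not data or code from the study.

```python
import math


def nm_loglik(y, eta, phi, include_factorials=False):
    """Log-likelihood of one entity after integrating out the gamma effect."""
    y_tot, eta_tot = sum(y), sum(eta)
    ll = (math.lgamma(y_tot + phi) - math.lgamma(phi) + phi * math.log(phi)
          + sum(yj * math.log(ej) for yj, ej in zip(y, eta))
          - (y_tot + phi) * math.log(eta_tot + phi))
    if include_factorials:
        ll -= sum(math.lgamma(yj + 1) for yj in y)  # log of the y_ij! product
    return ll


def poisson_loglik(y, eta):
    """Poisson log-likelihood of one entity with the y_ij! terms omitted."""
    return sum(yj * math.log(ej) for yj, ej in zip(y, eta)) - sum(eta)


def marginal_prob_by_integration(y, eta, phi, upper=10.0, steps=20000):
    """Midpoint-rule integral of the Poisson product times the gamma density."""
    h = upper / steps
    total = 0.0
    for k in range(steps):
        theta = (k + 0.5) * h
        log_gamma_pdf = (phi * math.log(phi) + (phi - 1) * math.log(theta)
                         - phi * theta - math.lgamma(phi))
        log_poisson = sum(yj * math.log(ej * theta) - ej * theta
                          - math.lgamma(yj + 1) for yj, ej in zip(y, eta))
        total += math.exp(log_gamma_pdf + log_poisson) * h
    return total


y, eta, phi = [9, 2, 14], [10.0, 11.0, 12.0], 4.0  # illustrative values only
closed = math.exp(nm_loglik(y, eta, phi, include_factorials=True))
numeric = marginal_prob_by_integration(y, eta, phi)
print(closed, numeric)
print(nm_loglik(y, eta, 1e6), poisson_loglik(y, eta))
```

In a calibration, the per-entity log-likelihoods would be summed over all entities and maximized over the regression constants and the dispersion parameter; that optimization step is not shown here.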
>1
Considering that we have available 14 years of accident and traffic data the
multinomial regression with Poisson or Negative-Binomial distributional
assumptions will be used in this study. Its flexibility and statistical properties
make it an attractive methodology to calibrate safety performance functions. It
will also be compared with the modified OLS models with empirically and
incrementally calibrated variance.
4.3.1 Dataset Preparation
All of the dataset preparation was performed using the Colorado Department of
Transportation (CDOT) accident database. The system of rural Colorado
highways was subdivided into the following categories:
1. Mountainous Terrain-4 Lanes Freeways
2. Mountainous Terrain-2 Lanes Arterials
3. Rolling/Flat Terrain-4 Lanes Freeways
On the Interstates accident history for each facility was prepared over the period
of 14 years. Average Daily Traffic (ADT) for each roadway segment for each
of the 14 years was entered into the same dataset. All of the interchange
related accidents were isolated from the accident database prior to fitting the
model. The reason for isolating interchange-related accidents was to remove
the influence of accidents resulting from merge/diverge turbulence at an
interchange. A ramp-freeway junction area is a zone of competing traffic
demands for space. At on-ramps upstream traffic competes for space with
entering on-ramp vehicles in merge areas. In the merge area, individual
on-ramp vehicles attempt to find gaps in the traffic stream of the adjacent freeway
lane. The action of individual merging vehicles entering the freeway creates turbulence
in the traffic stream in the vicinity of the ramp. Approaching freeway vehicles
move toward the left to avoid this turbulence. The turbulence itself as well as
associated lane changing results in an increase of side-swipe and rear-end
collisions in the area around the ramp junction. This increased number of accidents
is atypical of safety performance of the segments of freeways away from the
interchange. Studies of on-ramp junctions (Roess and Ulerio, 1993) showed that
lane changing and turbulence extend 1500 ft downstream of the physical merge
point. At off-ramps the basic maneuver is a diverge. Exiting vehicles must
occupy the lane adjacent to the off-ramp. Thus, as the off-ramp is approached,
exiting vehicles move right. This movement brings about a redistribution of other
freeway vehicles, which move left to avoid the turbulence in the immediate
diverge area. Again studies of off-ramp junctions (Roess and Ulerio, 1993)
showed that area of turbulence extends 1500 ft. upstream of the physical gore.
Adding 1500 ft. of turbulence on both sides of grade separation (3000 ft.) to the
length of the ramps and of the structures will result in isolating approximately one
(1) mile distance centered on the interchange. This distance correlates well with
spread of accidents related to freeway turbulence in the study of interchange
safety (Janson et al., 1998). Figure 4.4 illustrates how the data-set was
prepared. On the 2-lane rural roads the data-set was prepared in a similar
fashion with the exception that intersection related accidents and 0.1 mile
roadway segments containing intersections were removed prior to fitting of the
model. Isolating a distance of approximately 250 ft. on both sides of rural
intersections is a conservative measure, but it will ensure that intersection
related conflicts will not pollute the data-set comprised of non-intersection related
accidents and road segments. Figure 4.5 illustrates how the data-set was
prepared.
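The freeway segmentation rule described above (drop roughly one mile centered on each interchange, keep only the in-between segments of at least 2 miles) can be sketched as a small routine. The function name, the half-width and minimum-length parameters, and the interchange center milepoints are assumptions for illustration; the centers are inferred here from the 1.00-mile gaps between consecutive I-70 segments in the Table 4.1 extract and are not taken from CDOT records.

```python
def segments_between_interchanges(start_mp, end_mp, interchange_mps,
                                  half_width=0.5, min_len=2.0):
    """Return (begin, end) milepoint pairs left after removing interchange zones."""
    kept = []
    cursor = start_mp
    for center in sorted(interchange_mps):
        seg_end = round(center - half_width, 2)   # zone starts half a mile upstream
        if seg_end - cursor >= min_len:
            kept.append((cursor, seg_end))
        cursor = round(center + half_width, 2)    # resume half a mile downstream
    if end_mp - cursor >= min_len:
        kept.append((cursor, end_mp))
    return kept


# Assumed interchange centers, inferred from gaps in the Table 4.1 extract.
for begin, end in segments_between_interchanges(49.52, 86.35, [61.65, 74.68, 81.24]):
    print(begin, end)
```

Segments shorter than the minimum length are dropped altogether, which mirrors the "min. segment length >= 2 mi." note in the dataset preparation figures.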
[Figure 4.4 Dataset Preparation Freeways. Interchange related accidents and the 1 mile freeway segment at each interchange are excluded from the dataset; accidents and freeway segments between interchanges are included in the dataset (min. segment length >= 2 mi.).]
[Figure 4.5 Dataset Preparation Arterials. Segments between intersections are included in the SPF dataset (min. segment length >= 2 mi.); intersection related accidents are studied using Intersection Diagnostic Analysis.]
Dataset sample extract used for the fitting of the Safety Performance Function
(SPF) reflecting Rural 4-Lane Mountainous Interstate is included in Table 4.1.
Dataset sample extract used for the fitting of the Safety Performance Function
(SPF) reflecting Rural 2-Lane Mountainous Highways is included in Table 4.2.
Dataset sample extract used for the fitting of the Safety Performance Function
(SPF) reflecting Rural 4-Lane Rolling and Flat Freeways is included in Table 4.3.
Complete data-sets for all Safety Performance Functions are provided in the
Appendix.
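In the table extracts, the last column appears to be the total of the PDO, INJ and FAT counts divided by the segment length; this reading is inferred from the numbers and is not stated explicitly in the text. A minimal sketch of that calculation, using the first three rows of the Table 4.1 extract and a function name chosen here for illustration:

```python
def accidents_per_mile(length_mi, pdo, inj, fat):
    """Total accidents (PDO + INJ + FAT) per mile of segment for one year."""
    return (pdo + inj + fat) / length_mi


# (Len, PDO, INJ, FAT) from the first three rows of the Table 4.1 extract.
rows = [(8.33, 5, 4, 0), (8.33, 1, 1, 0), (8.33, 8, 5, 1)]
for length_mi, pdo, inj, fat in rows:
    print(round(accidents_per_mile(length_mi, pdo, inj, fat), 2))
```

The rounded results match the Accs Per Mile values listed for those rows.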
Table 4.1 Dataset Extract Mountainous Freeways
Rural Mountainous 4-Lane Interstate (1986 to 1999)
HWY#  Sec  Begin MP  End MP  Len (mi)  Begin Date  End Date  AADT  PDO  INJ  FAT  Accs Per Mile
70 A 2.31 10.61 8.33 01/01/86 12/31/86 3,450 5 4 0 1.08
70 A 2.31 10.61 8.33 01/01/87 12/31/87 3,450 1 1 0 0.24
70 A 2.31 10.61 8.33 01/01/88 12/31/88 3,700 8 5 1 1.68
70 A 2.31 10.61 8.33 01/01/89 12/31/89 4,000 7 3 1 1.32
70 A 2.31 10.61 8.33 01/01/90 12/31/90 4,300 3 4 1 0.96
70 A 2.31 10.61 8.33 01/01/91 12/31/91 4,400 5 9 0 1.68
70 A 2.31 10.61 8.33 01/01/92 12/31/92 4,848 5 4 1 1.20
70 A 2.31 10.61 8.33 01/01/93 12/31/93 5,050 13 5 0 2.16
70 A 2.31 10.61 8.33 01/01/94 12/31/94 5,200 7 9 1 2.04
70 A 2.31 10.61 8.33 01/01/95 12/31/95 5,200 6 9 1 1.92
70 A 2.31 10.61 8.33 01/01/96 12/31/96 5,470 7 2 0 1.08
70 A 2.31 10.61 8.33 01/01/97 12/31/97 5,350 7 7 0 1.68
70 A 2.31 10.61 8.33 01/01/98 12/31/98 5,677 7 7 2 1.92
70 A 49.52 61.15 12.70 01/01/86 12/31/86 6,550 17 17 2 2.84
70 A 49.52 61.15 12.70 01/01/87 12/31/87 6,550 17 8 0 1.97
70 A 49.52 61.15 12.70 01/01/88 12/31/88 7,400 19 13 1 2.60
70 A 49.52 61.15 12.70 01/01/89 12/31/89 7,856 16 10 1 2.13
70 A 49.52 61.15 12.70 01/01/90 12/31/90 9,450 15 12 0 2.13
70 A 49.52 61.15 12.70 01/01/91 12/31/91 9,600 27 17 0 3.47
70 A 49.52 61.15 12.70 01/01/92 12/31/92 9,486 8 8 0 1.26
70 A 49.52 61.15 12.70 01/01/93 12/31/93 10,400 36 7 2 3.54
70 A 49.52 61.15 12.70 01/01/94 12/31/94 10,600 17 16 1 2.68
70 A 49.52 61.15 12.70 01/01/95 12/31/95 12,000 15 15 0 2.36
70 A 49.52 61.15 12.70 01/01/96 12/31/96 11,151 14 18 0 2.52
70 A 49.52 61.15 12.70 01/01/97 12/31/97 12,524 24 9 0 2.60
70 A 49.52 61.15 12.70 01/01/98 12/31/98 13,291 24 13 0 2.91
70 A 49.52 61.15 12.70 01/01/99 12/31/99 13,098 34 16 1 4.02
70 A 62.15 74.18 11.95 01/01/99 12/31/99 10,942 19 17 2 3.18
70 A 75.18 80.74 5.76 01/01/99 12/31/99 11,367 14 5 1 3.47
70 A 81.74 86.35 4.58 01/01/99 12/31/99 12,018 13 4 1 3.93
70 A 97.93 104.76 6.82 01/01/99 12/31/99 15,318 28 15 1 6.45
Table 4.2 Dataset Extract 2-Lane Mountainous Arterials
Rural Mountainous 2-Lane Highways (1986 to 1999)
HWY#  Sec  Begin MP  End MP  Len (mi)  Begin Date  End Date  AADT  PDO  INJ  FAT  Accs Per Mile
5 A 0.05 8.95 8.92 01/01/1987 12/31/1987 180 1 4 0 0.56
5 A 0.05 8.95 8.92 01/01/1988 12/31/1988 240 2 0 0 0.22
5 A 0.05 8.95 8.92 01/01/1989 12/31/1989 240 0 0 0 0.00
5 A 0.05 8.95 8.92 01/01/1990 12/31/1990 280 0 1 0 0.11
5 A 0.05 8.95 8.92 01/01/1991 12/31/1991 270 0 2 0 0.22
5 A 0.05 8.95 8.92 01/01/1992 12/31/1992 274 0 0 0 0.00
5 A 0.05 8.95 8.92 01/01/1993 12/31/1993 300 1 1 0 0.22
5 A 0.05 8.95 8.92 01/01/1994 12/31/1994 280 3 5 0 0.90
5 A 0.05 8.95 8.92 01/01/1995 12/31/1995 280 3 1 0 0.45
5 A 0.05 8.95 8.92 01/01/1996 12/31/1996 283 0 1 0 0.11
5 A 0.05 8.95 8.92 01/01/1997 12/31/1997 297 1 0 0 0.11
5 A 0.05 8.95 8.92 01/01/1998 12/31/1998 310 1 0 0 0.11
5 A 9.11 14.89 5.23 01/01/1987 12/31/1987 180 0 0 0 0.00
5 A 9.11 14.89 5.23 01/01/1988 12/31/1988 240 0 0 0 0.00
5 A 9.11 14.89 5.23 01/01/1989 12/31/1989 240 0 0 0 0.00
5 A 9.11 14.89 5.23 01/01/1990 12/31/1990 280 0 0 0 0.00
5 A 9.11 14.89 5.23 01/01/1991 12/31/1991 270 0 0 0 0.00
5 A 9.11 14.89 5.23 01/01/1992 12/31/1992 274 0 0 0 0.00
5 A 9.11 14.89 5.23 01/01/1993 12/31/1993 300 0 1 0 0.19
5 A 9.11 14.89 5.23 01/01/1994 12/31/1994 280 0 0 0 0.00
5 A 9.11 14.89 5.23 01/01/1995 12/31/1995 280 1 1 0 0.38
5 A 9.11 14.89 5.23 01/01/1996 12/31/1996 283 0 0 0 0.00
5 A 9.11 14.89 5.23 01/01/1997 12/31/1997 297 0 1 0 0.19
5 A 9.11 14.89 5.23 01/01/1998 12/31/1998 310 0 1 0 0.19
6 E 145.80 148.71 2.73 01/01/1987 12/31/1987 1,900 2 2 0 1.46
6 E 145.80 148.71 2.73 01/01/1988 12/31/1988 2,200 2 4 0 2.19
6 E 145.80 148.71 2.88 01/01/1989 12/31/1989 2,200 1 2 1 1.39
6 E 145.80 148.71 2.88 01/01/1990 12/31/1990 2,700 2 4 0 2.08
6 E 145.80 148.71 2.87 01/01/1991 12/31/1991 2,647 1 2 0 1.05
6 E 145.80 148.71 2.87 01/01/1992 12/31/1992 3,584 3 1 0 1.39
6 E 145.80 148.71 2.87 01/01/1993 12/31/1993 3,203 2 2 0 1.39
6 E 145.80 148.71 2.87 01/01/1994 12/31/1994 2,982 1 2 0 1.05


Table 4.3 Dataset Extract Flat and Rolling Freeways
Rural Flat and Rolling 4-Lane Interstate (1986 to 1999)
HWY#  Sec  MP-Begin  MP-End  Len(mi)  Date-Begin  Date-End  AADT  PDO  INJ  FAT  Accs/Mile
25 A 18.23 22.41 4.20 01/01/86 12/31/86 5,300 0 6 1 1.6679
25 A 18.23 22.41 4.20 01/01/87 12/31/87 5,300 3 3 0 1.4296
25 A 18.23 22.41 4.20 01/01/88 12/31/88 5,450 6 3 0 2.1444
25 A 18.23 22.41 4.20 01/01/89 12/31/89 5,550 3 4 0 1.6679
25 A 18.23 22.41 4.20 01/01/90 12/31/90 6,050 8 7 0 3.5740
25 A 18.23 22.41 4.20 01/01/91 12/31/91 6,200 5 6 0 2.6209
25 A 18.23 22.41 4.20 01/01/92 12/31/92 6,851 3 4 1 1.9061
25 A 18.23 22.41 4.20 01/01/93 12/31/93 8,100 15 8 2 5.9566
25 A 18.23 22.41 4.20 01/01/94 12/31/94 8,200 6 7 0 3.0975
25 A 18.23 22.41 4.20 01/01/95 12/31/95 7,700 15 4 0 4.5270
25 A 18.23 22.41 4.20 01/01/96 12/31/96 8,626 8 5 1 3.3357
25 A 18.23 22.41 4.20 01/01/97 12/31/97 8,036 20 7 0 6.4332
25 A 18.23 22.41 4.20 01/01/98 12/31/98 8,528 15 8 2 5.9566
25 A 18.23 22.41 4.20 01/01/99 12/31/99 8,442 8 2 0 2.3827
25 A 34.59 39.99 5.40 01/01/86 12/31/86 5,350 6 2 1 1.6670
25 A 34.59 39.99 5.40 01/01/87 12/31/87 5,350 4 3 0 1.2965
25 A 34.59 39.99 5.40 01/01/88 12/31/88 5,550 3 8 0 2.0374
25 A 34.59 39.99 5.40 01/01/89 12/31/89 5,950 5 3 0 1.4818
25 A 34.59 39.99 5.40 01/01/90 12/31/90 6,200 2 4 0 1.1113
25 A 34.59 39.99 5.40 01/01/91 12/31/91 6,450 4 5 0 1.6670
25 A 34.59 39.99 5.40 01/01/92 12/31/92 6,956 2 9 0 2.0374
25 A 34.59 39.99 5.40 01/01/93 12/31/93 7,300 5 8 0 2.4079
25 A 34.59 39.99 5.40 01/01/94 12/31/94 7,200 5 5 1 2.0374
25 A 34.59 39.99 5.40 01/01/95 12/31/95 7,000 9 7 0 2.9635
25 A 34.59 39.99 5.40 01/01/96 12/31/96 7,574 10 9 1 3.7044
25 A 34.59 39.99 5.40 01/01/97 12/31/97 7,306 10 9 0 3.5192
25 A 34.59 39.99 5.40 01/01/98 12/31/98 7,753 10 7 0 3.1487
25 A 34.59 39.99 5.40 01/01/99 12/31/99 7,668 13 4 0 3.1487
25 A 42.43 48.50 6.01 01/01/86 12/31/86 5,350 2 6 0 1.3313
25 A 42.43 48.50 6.01 01/01/87 12/31/87 5,350 12 11 0 3.8276
25 A 42.43 48.50 6.01 01/01/88 12/31/88 5,650 13 6 0 3.1619
25 A 42.43 48.50 6.01 01/01/89 12/31/89 6,100 2 4 0 0.9985


4.3.2 Selection of Minimum Segment Length and
Its Effect on the Model
Accident prediction models in which the accident count or accident rate is the
dependent variable are sensitive to the selection of the minimum segment
length. This question has been raised and investigated in several studies (e.g.
Resende and Benekohal 1994; Okamato and Koshi 1989; Zegeer et al. 1991).
Most of the findings in these studies converge on the opinion that short road
segments have an undesirable impact on the estimation of the model, and
therefore a preference for longer sections was expressed. While purely
statistical considerations are of importance, let us examine the consequences
of selecting segments shorter than 1 mile for a model where the number of
accidents per mile per year is the dependent variable.
If an accident cluster is contained within a fraction of a mile, then the number of
accidents/mile is estimated by dividing the number of accidents in the segment
by the segment's length. This division creates what can be termed a "fictional
mile" with the same accident density as in the cluster and introduces
unnecessary error into the data-set. For instance, 20 accidents over a 0.2-mile
segment would create a fictional mile with 100 accidents per mile, which in
reality does not exist in the rural environment.
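
The "fictional mile" arithmetic above can be illustrated with a minimal sketch. The function name and the 3-mile comparison segment are illustrative assumptions, not part of the study's data processing.

```python
def accidents_per_mile(accidents: int, length_miles: float) -> float:
    """Accident density obtained by dividing a count by the segment length."""
    if length_miles <= 0:
        raise ValueError("segment length must be positive")
    return accidents / length_miles


# The cluster from the text: 20 accidents on a 0.2-mile segment.
short_density = accidents_per_mile(20, 0.2)

# The same 20 accidents counted within a hypothetical 3-mile segment.
long_density = accidents_per_mile(20, 3.0)

print(f"{short_density:.1f} vs {long_density:.2f} accidents per mile")
```

The same count yields a far smaller density once it is spread over a segment of realistic project length, which is the reason short segments inflate the dependent variable.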
The intent of the SPF is probably not to pinpoint accident clusters, but rather
to evaluate the relative safety of homogeneous roadway segments when compared
with other similar segments within the same functional class. At a conceptual
level, when we discuss the safety of a section of road we think in terms of
longer segments rather than spot locations. Similarly, when conducting capacity
analysis of roadway segments, as opposed to junctions, we work with
longer segments as well. From the standpoint of project development or
corridor evaluation, using longer segments for model fitting makes the model
more compatible with the segments that are being evaluated.
A study by Resende and Benekohal showed that the section lengths used to
compute accident rates directly affect the form of the accident prediction models
and the models' predictive power. For basic sections of rural Interstate
highways and two-lane rural highways a section length of at least 0.5 miles was
recommended by this study. Table 4.4 (Resende and Benekohal 1994) shows
that the predictive power of the model generally increases with the increase of
the minimum segment length.
Table 4.4 Relationship Between R² and Segment Lengths
Section Length   1986                  1987                  1988
(miles)          Sample Size   R²      Sample Size   R²      Sample Size   R²
0.1              928           0.03    993           0.03    1023          0.05
0.2              743           0.08    806           0.06    793           0.13
0.3              636           0.18    676           0.12    663           0.19
0.4              527           0.18    555           0.13    551           0.21
0.5              470           0.21    491           0.14    487           0.25
0.6              413           0.20    430           0.20    424           0.30
0.7              370           0.21    375           0.28    372           0.27
0.8              342           0.23    344           0.28    345           0.31
0.9              312           0.29    316           0.30    318           0.36
1.0              301           0.28    307           0.30    307           0.35


Similar findings are observed in this study as well. The predictive power of the
model for most safety performance functions is maximized when the minimum
segment length is in the 2-3 mile range. As previously discussed, this minimum
segment length has utility in application to project evaluations as well as
statistical advantages. When a segment is very short, 0.1 mile for instance, it is
generally adjacent to other segments with significantly different geometric
characteristics. At 60 mph a 0.1-mile segment is traversed in approximately
6 seconds, which does not allow enough time for the driver to adjust
expectancy and behavior before entering the next segment. From the standpoint
of traffic engineering, driver performance on a short segment may not be
representative of the facility.


4.3.3 Removal of Outliers
Prior to the fitting of the models the outliers were identified and removed. In
general, an outlier in a set of data is an observation which appears to be
inconsistent with the remainder of that set of data (Barnet and Lewis, 1978). A
data point can be considered to be an outlier if it does not appear to be predicted
well by the fitted model (Hayter, 1995). It is also an observation with
inappropriate representations for the population from which the sample is drawn
(Hair et al., 1998). In a sense, these definitions leave it up to the analyst to
decide what will be considered abnormal. Considering that all of the information
about each individual accident is collected in the field by the police officer and
then hand-entered in the database by accident coders, there is ample
opportunity for error. Additionally, traffic volume counts from the volume
database contain a certain amount of error which can infiltrate the model.
Since the data is heteroscedastic, the variance was incrementally calibrated to
reflect the increase in standard deviation associated with an increase in ADT. For
each bin of the dependent variable standardized residuals were calculated as
follows:
$e^{*}_{ij} = e_{ij} / \sigma_j$
where $e_{ij}$ is a residual in data bin $j$ and $\sigma_j$ is the standard
deviation of the residuals in the same data bin. Data points with a standardized
residual with an absolute value larger than 3 were removed. Following isolation
of the outlier locations they were examined for a possible explanation as to why
they exhibited abnormal accident frequencies. In addition to identifying common
coding errors this process revealed the following:


A large group of roadway segments in the 2-lane mountainous roads
category had an abnormally high accident frequency for the amount of traffic
exposure. After plotting them on the map it was observed that these were
roads leading to the gambling communities of Black Hawk, Central City,
Cripple Creek and Victor. Accident history among these locations had two
characteristics in common: an abnormally high number of accidents at night and
abnormally high alcohol involvement. A combination of these factors led us
to the realization that driver behavior and the resulting accident frequency are
atypical of routes within this functional class in the mountainous environment.
Furthermore, because of these unique characteristics, a separate safety
performance function would need to be calibrated for the routes leading to
the gambling communities.

A section of the mountainous 4-lane Interstate exhibited an abnormally low
accident frequency for the first 9 years, but in the last 5 years it performed as
expected. Following further examination it was observed that the section in
question contained the work zone for Glenwood Canyon. This particular
work zone was characterized by low operating speeds, reduced lane and
shoulder width, as well as altered driver expectancy. Once the construction in
the section was completed, the section performed as expected over the last
5 years.
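
The outlier screen described in this section (residuals standardized within data bins, with absolute values above 3 removed) can be sketched as follows. This is a minimal illustration under stated assumptions: the equal-width ADT binning, the bin width, and the function name are hypothetical choices, since the text does not specify how the bins were formed.

```python
from collections import defaultdict
from statistics import pstdev


def flag_outliers(adt, residuals, bin_width=5000.0, threshold=3.0):
    """Return indices of points whose standardized residual exceeds the threshold.

    Points are grouped into equal-width ADT bins (an assumed binning scheme),
    and each residual is divided by the standard deviation of the residuals
    in its own bin.
    """
    bins = defaultdict(list)
    for i, volume in enumerate(adt):
        bins[int(volume // bin_width)].append(i)

    flagged = []
    for members in bins.values():
        if len(members) < 2:
            continue  # a spread cannot be estimated from a single point
        sigma = pstdev([residuals[i] for i in members])
        if sigma == 0:
            continue  # all residuals identical, nothing stands out
        for i in members:
            if abs(residuals[i] / sigma) > threshold:
                flagged.append(i)
    return sorted(flagged)
```

Flagged locations would then be examined individually, as was done for the gambling routes and the Glenwood Canyon work zone, rather than discarded without review.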


4.4.1 Exploratory Analysis and Model Fitting
Accident data for roadway segments was collected in the manner described in
section 4.3.1 by isolating sections influenced by the intersections and
interchanges. Scatter plots displaying 14 years of accident and traffic data for
two types of rural freeways and a two-lane arterial are presented in Figures 4.6,
4.7, and 4.8. We intend to calibrate models which relate accident frequency and
ADT. These models will be of the form:
Accidents/(mile-year) = f(ADT)
where f stands for some function. The motivation for model fitting is to establish
what accident frequency is expected from a certain ADT exposure. Therefore the
model has to fit the accident data reasonably well in all ranges of ADT. Since
ADT is the only variable in the equation and because other variables (curvature,
grade, percent of trucks, etc.) are not explicitly accounted for, the magnitude of
the error terms is a proxy for the influence of these variables. Due to the
complexity of association between ADT and other factors the function f may be
complex in form.

Rural 4-Lane Mountainous Interstate
(1988-1999) Total Graph Sections => 3.0 Miles
Figure 4.6 Scatter Plot Mountainous Freeways

Rural Mountainous 2-Lane Highway
Selected Points Graph - Total Accidents - Sections => 2.0 Miles
Figure 4.7 Scatter Plot 2-Lane Mountainous Arterials


Dependent data in all data-sets exhibit well-defined heteroscedastic
characteristics, where the variance in accident frequency increases with an
increase in the mean. The model parameters were estimated using the multinomial
maximum-likelihood function discussed in section 4.2.2, using a GLM spreadsheet
developed at the University of Toronto (Hauer 2001). The quality of fit was
examined with the Cumulative Residuals (CURE) method described in (Lord and
Persaud 2000) and (Hauer 2001). This method consists of plotting the cumulative
residuals for each independent variable. The goal is to graphically observe how
well the function fits the data-set. To generate a CURE plot, sites are sorted by
their average ADT. Then, for each site, the residual (= predicted accidents -
observed accidents) is computed. The residuals are then added up and a cumulative
residual value is plotted for each value of the independent variable. Because
of the random nature of accident counts, the cumulative residual line represents
a so-called random walk. For a model that fits well in all ranges of ADT, the
cumulative residual plot should oscillate around zero. If the cumulative residual
value steadily increases within an ADT range, this means that within that ADT
range the model predicts more accidents than have been observed. Conversely,
a decreasing cumulative residual line in an ADT range indicates that in that
range more accidents have been observed than are predicted by the model. A
frequent departure of the cumulative residual line beyond two standard
deviations of a random walk indicates the presence of outliers or signifies an
ill-fitting model. In addition to the multinomial model, a modified form of the
ordinary least squares (OLS) regression with empirically calibrated variance was
used for comparison.
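
The CURE procedure described above can be sketched in a few lines. This is a hedged illustration, not the spreadsheet used in the study: the function name is hypothetical, and the two-standard-deviation bound uses a form commonly attributed to Hauer for a random walk whose variance is estimated from the squared residuals, which is an assumption because the text does not state the formula.

```python
import math


def cure_curve(adt, observed, predicted):
    """Return (sorted ADT, cumulative residuals, 2-sigma limits).

    Residuals follow the sign convention of the text:
    residual = predicted accidents - observed accidents.
    """
    order = sorted(range(len(adt)), key=lambda i: adt[i])
    residuals = [predicted[i] - observed[i] for i in order]

    cumulative, cum_squares = [], []
    running, running_sq = 0.0, 0.0
    for r in residuals:
        running += r
        running_sq += r * r
        cumulative.append(running)
        cum_squares.append(running_sq)

    total_sq = cum_squares[-1]
    limits = []
    for s2 in cum_squares:
        # Assumed bound: sigma*^2(n) = sigma^2(n) * (1 - sigma^2(n) / sigma^2(N))
        variance = s2 * (1.0 - s2 / total_sq) if total_sq > 0 else 0.0
        limits.append(2.0 * math.sqrt(max(variance, 0.0)))
    return [adt[i] for i in order], cumulative, limits
```

Plotting the cumulative residuals against the sorted ADT values, together with the positive and negative limits, gives the kind of picture discussed for Figures 4.10, 4.12 and 4.14: a rising stretch marks over-prediction and a falling stretch marks under-prediction.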


Rural Flat and Rolling, 4-Lane Interstate
(1986-1999) Total Graph Sections => 3.0 Miles
Figure 4.8 Scatter Plot Flat and Rolling Freeways
This reality suggests an alternate approach to defining the error term, specifically,
incremental and empirical calibration of the variance. In this approach ADT is
stratified and σ is empirically determined for every range of ADT; it is then
used to compute and plot lower and upper control limits in relation to the
mean prediction. The magnitude of the error term developed using incremental
empirical calibration will then be compared with that generated by the link
function of the best-fitted models.
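
A minimal sketch of the incremental, empirical calibration described above follows. The stratum width, the use of the root-mean-square residual about the fitted mean as the estimate of σ, the multiplier k, and all names are assumptions made for illustration; the text does not specify these details.

```python
import math


def empirical_limits(adt, observed, predict, bin_width=5000.0, k=1.5):
    """Return (stratum midpoint ADT, lower, mean, upper) for each ADT stratum.

    `predict` is any callable giving the fitted mean for an ADT value.
    Sigma is estimated per stratum as the RMS residual about the fitted mean.
    """
    strata = {}
    for volume, count in zip(adt, observed):
        residual = count - predict(volume)
        strata.setdefault(int(volume // bin_width), []).append(residual)

    rows = []
    for index in sorted(strata):
        residuals = strata[index]
        if len(residuals) < 2:
            continue  # too few points to estimate a spread
        sigma = math.sqrt(sum(r * r for r in residuals) / len(residuals))
        midpoint = (index + 0.5) * bin_width
        mean = predict(midpoint)
        lower = max(mean - k * sigma, 0.0)  # accident frequency cannot be negative
        rows.append((midpoint, lower, mean, mean + k * sigma))
    return rows


# Toy data (hypothetical): a linear "model" with residuals of +1 and -1.
def toy_model(volume):
    return 2.0 + volume / 5000.0


demo_adt = [1000, 2000, 3000, 4000]
demo_obs = [toy_model(v) + d for v, d in zip(demo_adt, (1, -1, 1, -1))]
demo_rows = empirical_limits(demo_adt, demo_obs, toy_model)
```

Because σ is estimated separately in each stratum, the resulting band widens wherever the data scatter more, without imposing any distributional form.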


Rural Mountainous 4-Lane Interstate
Figure 4.9 Fitted Model Mountainous Freeways
4-Lane Mountainous Interstate, Poisson assumptions of error structure, polynomial
function family produced the best fit with parameters presented below:
$E\{k\} = \alpha_y X^{\beta_0} (1 + \beta_1 X + \beta_2 X^2)$
$\alpha_{14} = 2.135,\ \beta_0 = 1.052,\ \beta_1 = 1.842,\ \beta_2 = -0.571$
$\frac{Acc.}{mile \cdot year} = 2.135 \left(\frac{ADT}{10000}\right)^{1.052} \left[1 + 1.842 \left(\frac{ADT}{10000}\right) - 0.571 \left(\frac{ADT}{10000}\right)^2\right]$


RM4I CURE
Figure 4.10 Mountainous Freeways CURE Plot
The cumulative residual (CURE) plot produced from maximizing the Poisson
likelihood function for the model presented in Figure 4.9 generally indicates a
satisfactory fit. The random walk stays well within 2 standard deviations and
oscillates around zero. Figure 4.10 shows, however, that the model has a
tendency to over-predict in the range of ADT between 10,000 and 14,000. The
modified OLS model provided in Figure 4.9 is generally within 0.5 to 1 accidents
per mile/year of the multinomial Poisson model. The R² of the polynomial
function of the modified OLS model is 0.7. Despite the conceptual soundness of
the Negative Binomial model it produced very unsatisfactory fits.


Rural Mountainous 2-Lane Highways
Figure 4.11 Fitted Model Mountainous Arterials
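2-Lane Mountainous Arterial, Poisson assumptions of error structure, polynomial
function family produced the best fit with parameters presented below:
$E\{k\} = \alpha_y X^{\beta_0} (1 + \beta_1 X + \beta_2 X^2)$
$\alpha_{14} = 1.1716,\ \beta_0 = 0.524,\ \beta_1 = 6.377,\ \beta_2 = -2.657$
$\frac{Acc.}{mile \cdot year} = 1.1716 \left(\frac{ADT}{10000}\right)^{0.524} \left[1 + 6.377 \left(\frac{ADT}{10000}\right) - 2.657 \left(\frac{ADT}{10000}\right)^2\right]$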


RM2H CURE
Figure 4.12 Mountainous Arterials CURE Plot
The cumulative residual (CURE) plot produced from maximizing the Poisson
likelihood function for the model presented in Figure 4.11 generally indicates a
satisfactory fit. The random walk stays well within 1 standard deviation and
oscillates around zero in most cases between 0 and 10,000 ADT. The model has a
tendency to over-predict in the range between 3,000 and 3,500 ADT. The modified
OLS model provided in Figure 4.11 is generally within 0 to 0.5 accidents per
mile/year of the multinomial Poisson model. The R² of the polynomial
function of the modified OLS model is 0.71. Despite the conceptual soundness
of the Negative Binomial model it produced very unsatisfactory fits.


Rural Flat and Rolling 4-Lane Interstate
Figure 4.13 Fitted Model Flat and Rolling Freeway
4-Lane Flat and Rolling Interstate, Poisson assumptions of error structure,
polynomial function family:
$E\{k\} = \alpha_y X^{\beta_0} (1 + \beta_1 X)$
$\alpha_{14} = 2.863,\ \beta_0 = 1.133,\ \beta_1 = -0.087$
$\frac{Acc.}{mile \cdot year} = 2.863 \left(\frac{ADT}{10000}\right)^{1.133} \left[1 - 0.087 \left(\frac{ADT}{10000}\right)\right]$
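
The three fitted functions share one form and can be evaluated with a short sketch. The coefficients are those printed with Figures 4.9, 4.11 and 4.13; the value 6.377 for the 2-lane arterial is a reading of a digit string that is damaged in the source and should be treated as an assumption. The function and dictionary names are illustrative, and no claim is made that the sketch reproduces any tabulated value exactly.

```python
def spf(adt, alpha, beta0, beta1, beta2=0.0):
    """Expected accidents per mile per year: alpha * X**beta0 * (1 + beta1*X + beta2*X**2).

    X is ADT scaled by 10,000, as in the fitted equations.
    """
    x = adt / 10000.0
    return alpha * x ** beta0 * (1.0 + beta1 * x + beta2 * x * x)


# Coefficients as printed in this section (beta1 for the 2-lane arterial is an
# assumed reading of a garbled value).
PARAMS = {
    "mountainous_4_lane": dict(alpha=2.135, beta0=1.052, beta1=1.842, beta2=-0.571),
    "mountainous_2_lane": dict(alpha=1.1716, beta0=0.524, beta1=6.377, beta2=-2.657),
    "flat_rolling_4_lane": dict(alpha=2.863, beta0=1.133, beta1=-0.087),
}

for name, coefficients in PARAMS.items():
    print(name, round(spf(10000, **coefficients), 2))
```

Writing the model as a single function with a default of zero for the quadratic coefficient lets the two-term flat and rolling model reuse the same code path as the three-term mountainous models.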


RF-RR-41 CURE
Figure 4.14 Flat and Rolling Freeways CURE Plot
The cumulative residual (CURE) plot produced from maximizing the Poisson
likelihood function, presented in Figure 4.14, generally indicates a satisfactory
fit. The random walk stays well within 2 standard deviations and oscillates around
zero in most cases between 5,000 and 60,000 ADT. The model has a tendency
to under-predict, however, in the range between 35,000 and 45,000 ADT. The
modified OLS model provided in Figure 4.13 generally matches the prediction
produced by the multinomial Poisson model. The R² of the polynomial function
of the modified OLS model is 0.73. Despite the conceptual soundness of the
Negative Binomial model it produced very unsatisfactory fits.


Observed accident frequencies reflecting different levels of ADT align
themselves reasonably well into a curvilinear shape. The selected polynomial
functions for all models demonstrate some leveling-off, which reflects a reduction
in accident rate with an increase of ADT beyond certain specific thresholds. The
variability in the dependent variable steadily increases with the increase in the
mean, which is consistent with a Poisson error structure. Although the R² statistic
(R² = 0.70 or better) for all calibrated models in the study is not entirely
appropriate here because of some violations of the normality and
homoscedasticity assumptions, it suggests that the relationship between ADT and
the accident frequency per mile/year is sufficiently strong.
Rather than using a transformation to stabilize variances, the variance was
incrementally and empirically calibrated to reflect the increase in σ associated
with an increase in ADT. An SPF graph reflecting characteristics of Rural Non-
Mountainous Interstates with empirically and incrementally calibrated 1.5
standard deviations is provided in Figure 4.15. Although it is difficult to provide
a definitive explanation at present as to why the variance increases with ADT,
the field of econometrics suggests an interesting analogy which may shed some
light on this issue.
Gujarati (1995) observed that as incomes grow, people have more
discretionary income and hence more scope for choice about the disposition of
their income. Hence, σ² is likely to increase with income. Thus in the
regression of savings on income one is likely to find variance increasing with
income because people have more choices about their savings behavior.
Similarly, companies with larger profits are generally expected to show greater
variability in their dividend policies than companies with lower profits. In the
context of the driving environment the econometric phenomenon discussed
above can be applied as follows: as traffic volume on the road increases, the
opportunities for safe as well as unsafe driving behaviors also increase. For
instance, the traffic can move in platoons at safe speeds, and as a result large
volumes of traffic can be moved without incident, or isolated drivers can
behave aggressively, leading to a variety of accidents.
Rural Flat and Rolling, 4-Lane Interstate
(1986-1999) Total Graph Sections => 3.0 Miles
Figure 4.15
Rural Flat and Rolling Freeways with Empirically Calibrated Variance


Additionally, the data show that for all populations of roads the variance in the
accident frequency increases with ADT, and this fact needs to be considered in
the selection of locations with potential for accident reduction. Predicted mean
values generated by the model generally align themselves reasonably well into
a polynomial equation for all populations of roads. The error structure closely
approximates the Poisson assumption, where the variance is equal to the mean.
The models produced from maximizing the Poisson likelihood function generally
provide a satisfactory fit. The random walk on the CURE plots stays well within 2
standard deviations and oscillates around zero in most ranges of ADT. The
mean predictions generated by the modified OLS models are in most cases
within 0.5 accident per mile-year of the predictions produced by the multinomial
Poisson model. The R² of the modified OLS models is between 0.70 and 0.73.
Despite the conceptual soundness of the Negative Binomial model it produced
unsatisfactory fits. This can possibly be explained by the following. Most
previous efforts of accident modeling used very short and very long segments,
which introduced over-dispersion into the model consistent with Negative
Binomial assumptions. In anticipation of this methodological difficulty we have
assembled data-sets which contain segments of relatively uniform lengths. They
are generally between 2 and 5 miles long. This approach also ensures that
segments representing project lengths, which are usually also 2 to 5 miles long,
are represented in the population of segments from which the model is constructed.
The resulting variance is then much closer to the mean than is implied by the
Negative Binomial assumptions. The data-set assembled in such a way closely
approximates a Poisson distributed error structure.


Figure 4.16 shows the 1.5 sigma limits that were empirically and incrementally
calibrated alongside the 1.5 sigma limits derived from the link function associated
with the Poisson distribution. It confirms that the data is Poisson distributed,
which is consistent with the fact that the maximum likelihood function with
Poisson error structure produces the best fit. Considering that the Poisson error
structure implies that each mean value in the regression, $\lambda_i$, is equal
to the variance $V_i$ (so that $\sigma_i = \sqrt{\lambda_i}$), the Poisson
distribution can be used to compute 95% upper and lower control limits.
Rural Flat and Rolling, 4-Lane Interstate
(1986-1999) Total Graph Sections => 3.0 Miles
Figure 4.16 Comparison Between Poisson and Empirically and
Incrementally Calibrated Error Structure


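$P(x < \lambda_i + 1.5\sigma_i) = \sum_{x=0}^{\lambda_i + 1.5\sigma_i} \frac{e^{-\lambda_i} \lambda_i^{x}}{x!} \approx 0.95$
$P(x < \lambda_i - 1.5\sigma_i) = \sum_{x=0}^{\lambda_i - 1.5\sigma_i} \frac{e^{-\lambda_i} \lambda_i^{x}}{x!} \approx 0.95$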
Tables 4.5, 4.6 and 4.7 present the lower and upper control limits at 95% for the
multinomial models with Poisson error structures.
If a roadway segment consistently exhibits accident frequency above the upper
control limit, it suggests a strong potential for accident reduction. How to
conduct a diagnostic investigation of such a site is discussed in chapters 6 and
7 of this dissertation. If a roadway segment consistently exhibits accident
frequency below the lower control limit, it probably contains design
characteristics which would give us an insight about how to design safer highways.
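
The control limits can be sketched as the mean plus or minus 1.5 standard deviations, with the Poisson standard deviation taken as the square root of the mean and the lower limit floored at zero. This reading is an assumption inferred from the tabulated values (for example, a mean of 5.07 gives limits of about 1.69 and 8.45, as in Table 4.5); the function names are illustrative.

```python
import math


def control_limits(mean, k=1.5):
    """Lower and upper control limits for a Poisson mean (variance = mean)."""
    sigma = math.sqrt(mean)
    return max(mean - k * sigma, 0.0), mean + k * sigma


def poisson_cdf(limit, mean):
    """P(X <= floor(limit)) for X ~ Poisson(mean), summed term by term."""
    if limit < 0:
        return 0.0
    term = math.exp(-mean)  # P(X = 0)
    total = term
    for x in range(1, math.floor(limit) + 1):
        term *= mean / x  # P(X = x) from P(X = x - 1)
        total += term
    return total


lower, upper = control_limits(5.07)
print(round(lower, 2), round(upper, 2), round(poisson_cdf(upper, 5.07), 3))
```

Summing the Poisson terms up to the upper limit shows how much probability mass lies at or below it for a given mean; because accident counts are integers, that coverage varies somewhat with the mean rather than being exactly 0.95.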


Table 4.5 Control Limits Rural Flat and Rolling 4-Lane Freeways
Rural Flat and Rolling 4-Lane Freeway
ADT (Vehicles/Day) Mean (Accidents/Mile/Year) Lower (Accidents/Mile/Year) Upper (Accidents/Mile/Year)
5,000 1.23 0.00 2.89
10,000 2.56 0.16 4.95
15,000 3.84 0.90 6.78
20,000 5.07 1.69 8.45
25,000 6.21 2.47 9.95
30,000 7.26 3.22 11.30
35,000 8.19 3.90 12.48
40,000 8.99 4.49 13.48
45,000 9.63 4.97 14.28
50,000 10.10 5.33 14.87
55,000 10.38 5.55 15.22


Table 4.6 Control Limits Rural Mountainous 2-Lane Arterials
Rural Mountainous 2-Lane Arterials
ADT (Vehicles/Day) Mean (Accidents/Mile/Year) Lower (Accidents/Mile/Year) Upper (Accidents/Mile/Year)
500 0.31 0.00 1.15
1,000 0.54 0.00 1.64
1,500 0.77 0.00 2.09
2,000 1.02 0.00 2.54
2,500 1.28 0.00 2.98
3,000 1.55 0.00 3.42
3,500 1.83 0.00 3.86
4,000 2.12 0.00 4.30
4,500 2.41 0.08 4.74
5,000 2.71 0.24 5.18
5,500 3.01 0.41 5.61
6,000 3.32 0.58 6.05
6,500 3.62 0.77 6.48
7,000 3.93 0.96 6.91
7,500 4.24 1.15 7.33
8,000 4.55 1.35 7.75
8,500 4.86 1.55 8.16
9,000 5.16 1.75 8.56
9,500 5.45 1.95 8.96
10,000 5.75 2.15 9.34
10,500 6.03 2.35 9.71
11,000 6.31 2.54 10.07
11,500 6.57 2.73 10.42
12,000 6.83 2.91 10.75


Table 4.7
Control Limits Rural Mountainous 4-Lane Freeways
Rural Mountainous 4-Lane Freeway
ADT (Vehicles/Day) Mean (Accidents/Mile/Year) Lower (Accidents/Mile/Year) Upper (Accidents/Mile/Year)
4,000 0.87 0.00 2.26
5,000 1.24 0.00 2.91
6,000 1.67 0.00 3.60
7,000 2.14 0.00 4.33
8,000 2.65 0.21 5.10
9,000 3.20 0.52 5.88
10,000 3.78 0.86 6.69
11,000 4.37 1.23 7.51
12,000 4.98 1.63 8.32
13,000 5.59 2.05 9.14
14,000 6.21 2.47 9.94
15,000 6.81 2.90 10.73
16,000 7.41 3.32 11.49
17,000 7.98 3.74 12.22
18,000 8.52 4.14 12.90
19,000 9.03 4.53 13.54
20,000 9.50 4.88 14.13
21,000 9.93 5.20 14.65
22,000 10.29 5.48 15.11
23,000 10.60 5.72 15.48
24,000 10.84 5.90 15.78
25,000 11.00 6.03 15.98
26,000 11.08 6.09 16.08
27,000 11.08 6.08 16.07
28,000 10.97 6.00 15.94
29,000 10.77 5.85 15.69
30,000 10.46 5.61 15.31


5. Behavioral Interpretation and Levels of Relative
Safety
5.1 Behavioral Zones
Safety Performance Functions reflect a central tendency in driver behavior
across a wide range of exposure. Driver behavior is characterized by the
expected frequency of accidents over a period of a year at each level of ADT.
The shape of the SPF (Figure 5.1) common to most highways suggests 3 zones
of driver behavior:
Initial Adaptation Zone
Stabilized Zone
Heightened Driver Attention Zone
Initial Adaptation Zone (Zone 1) is characterized by high driver confidence,
a subjective sense of security and high operating speeds. The propensity to make
a driving error is high here; however, the dynamics of interaction between the
roadway environment and driver behavior gradually lead to safety improvement.
This phenomenon is reflected by the decrease in the number of accidents per
unit of traffic exposure with an increase in ADT. Zone 1 generally represents a
small portion of the Interstate system.


Figure 5.1 Driver Behavioral Zones and Levels of Relative Safety
(Accidents Per Mile Per Year versus Exposure (ADT))


Stabilized Zone (Zone 2) is characterized by a more predictable driving
environment where driver confidence is balanced at the appropriate level with
roadway characteristics of the facility. Operating speeds are somewhat lower
than in Zone 1. In Zone 2 the number of accidents per unit of traffic exposure
remains relatively constant. The Stabilized Zone represents the largest portion
of the highway system.
Heightened Driver Attention Zone (Zone 3) is characterized by an increased
focus on the driving task. The subjective sense of security and driver confidence
are diminished here and, as a result, safety is improved. Operating speeds are
lower than in Zone 2. The number of accidents per unit of traffic exposure
decreases with an increase in ADT. Zone 3 represents a smaller portion of the
highway system than Zone 2.
These changes in driver behavior can possibly be explained by continuous
interaction with and adaptation to different environments. This approach relates
predictable patterns in driver behavior induced by changes in the roadway
environment to the expected accident count per unit of roadway length. It may
seem counterintuitive, but it appears that the driver's sense of security and
driving comfort is often in conflict with the driver's safety. When assessing the
relative safety of a roadway segment it is critical to determine which zone of the
appropriate SPF it fits into.


5.2 Levels of Service of Safety and Additional
Benefits of SPF
Development of the SPFs lends itself well to the conceptual formulation of the
Level of Service of Safety. The concept of level of service uses qualitative
measures that characterize safety performance of a roadway segment in
reference to its expected performance. If the level of safety predicted by the
SPF represents the normal or expected number of accidents at a specific level of
ADT, then the degree of deviation from the norm can be stratified to represent
specific levels of safety. Figure 5.1 provides a pictorial rendering of the concept
using SPF for the 4-lane mountainous Interstate as an example with the
boundary line delineated 1.5 standard deviations from the mean. Three Levels
of Relative Safety (LORS) can be initially proposed:
LORS-Ex (Expected Level of Safety)
LORS-BtEx (Better Than Expected)
LORS-LetEx (Less Than Expected)
Gradual change in the degree of deviation of the LORS boundary line from the
fitted model mean reflects the observed increase of variability in accidents/mile
as ADT increases. This increase is consistent with Poisson error structure. The
boundary lines should be established by computing 95% upper and lower
control limit derived from the Poisson distribution. Based on this line of
reasoning Tables 5.1, 5.2 and 5.3 represent numerical values of the accident
frequencies corresponding to different levels of relative safety. These tables are
calibrated for the three types of facilities examined in this study.
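
The three proposed levels can be sketched as a simple classification against the control limits. This is an illustration under stated assumptions: the boundaries are taken as the expected value plus or minus 1.5 Poisson standard deviations (floored at zero), which matches the tabulated limits, and the function name is hypothetical.

```python
import math


def classify_lors(observed, expected, k=1.5):
    """Assign a Level of Relative Safety to an observed accident frequency.

    `expected` is the SPF mean (accidents/mile/year) at the segment's ADT.
    Boundaries are expected +/- k * sqrt(expected), lower bound floored at 0.
    """
    sigma = math.sqrt(expected)
    lower = max(expected - k * sigma, 0.0)
    upper = expected + k * sigma
    if observed > upper:
        return "LORS-LetEx"  # less safe than expected
    if observed < lower:
        return "LORS-BtEx"   # better than expected
    return "LORS-Ex"         # within the expected band


# Example with the flat and rolling freeway mean at 20,000 ADT (5.07).
for frequency in (1.0, 5.0, 9.0):
    print(frequency, classify_lors(frequency, 5.07))
```

At low ADT the lower boundary is zero, so no segment can be classified as better than expected there; this mirrors the 0.00 entries in the BtEx columns of the tables.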


Table 5.1
Level of Service Criteria in Reference to Expected Safety Performance
for Rural Flat and Rolling 4-Lane Freeways
Rural Flat and Rolling 4-Lane Freeway
ADT (Vehicles/Day) BtEx (Accidents/Mile/Year) Ex (Accidents/Mile/Year) LetEx (Accidents/Mile/Year)
5,000 0.00 0.00 - 2.89 > 2.89
10,000 < 0.16 0.16 - 4.95 > 4.95
15,000 < 0.90 0.90 - 6.78 > 6.78
20,000 < 1.69 1.69 - 8.45 > 8.45
25,000 < 2.47 2.47 - 9.95 > 9.95
30,000 < 3.22 3.22 - 11.30 > 11.30
35,000 < 3.90 3.90 - 12.48 > 12.48
40,000 < 4.49 4.49 - 13.48 > 13.48
45,000 < 4.97 4.97 - 14.28 > 14.28
50,000 < 5.33 5.33 - 14.87 > 14.87
55,000 < 5.55 5.55 - 15.22 > 15.22


Table 5.2
Level of Service Criteria in Reference to Expected Safety Performance
for Rural Mountainous 2-Lane Arterials
Rural Mountainous 2-Lane Arterials
ADT BtEx Ex LetEx
(Vehicles/Day) (Accidents/Mile/Year) (Accidents/Mile/Year) (Accidents/Mile/Year)
500 0.00 0.00 - 1.15 > 1.15
1,000 0.00 0.00 - 1.64 > 1.64
1,500 0.00 0.00 - 2.09 >2.09
2,000 0.00 0.00 - 2.54 >2.54
2,500 0.00 0.00 - 2.98 >2.98
3,000 0.00 0.00 - 3.42 >3.42
3,500 0.00 0.00 - 3.86 > 3.86
4,000 0.00 0.00 - 4.30 >4.30
4,500 <0.08 0.08 - 4.74 >4.74
5,000 <0.24 0.24 - 5.18 >5.18
5,500 <0.41 0.41 - 5.61 >5.61
6,000 <0.58 0.58 - 6.05 >6.05
6,500 <0.77 0.77 - 6.48 >6.48
7,000 <0.96 0.96 - 6.91 >6.91
7,500 < 1.15 1.15 - 7.33 >7.33
8,000 < 1.35 1.35 - 7.75 >7.75
8,500 < 1.55 1.55 - 8.16 >8.16
9,000 < 1.75 1.75 - 8.56 >8.56
9,500 < 1.95 1.95 - 8.96 >8.96
10,000 <2.15 2.15 - 9.34 >9.34
10,500 <2.35 2.35 - 9.71 > 9.71
11,000 <2.54 2.54 - 10.07 > 10.07
11,500 <2.73 2.73 - 10.42 > 10.42
12,000 <2.91 2.91 - 10.75 > 10.75


Table 5.3
Level of Service Criteria in Reference to Expected Safety Performance
for Rural Mountainous 4-Lane Freeways
Rural Mountainous 4-Lane Freeway
ADT BtEx Ex LetEx
(Vehicles/Day) (Accidents/Mile/Year) (Accidents/Mile/Year) (Accidents/Mile/Year)
4,000 0.00 0.00 - 2.26 > 2.26
5,000 0.00 0.00 - 2.91 >2.91
6,000 0.00 0.00 - 3.60 >3.60
7,000 0.00 0.00 - 4.33 >4.33
8,000 <0.21 0.21 - 5.10 > 5.10
9,000 <0.52 0.52 - 5.88 >5.88
10,000 <0.86 0.86 - 6.69 >6.69
11,000 < 1.23 1.23 - 7.51 >7.51
12,000 < 1.63 1.63 - 8.32 >8.32
13,000 <2.05 2.05 - 9.14 >9.14
14,000 <2.47 2.47 - 9.94 >9.94
15,000 <2.90 2.90 - 10.73 > 10.73
16,000 <3.32 3.32 - 11.49 > 11.49
17,000 <3.74 3.74 - 12.22 > 12.22
18,000 <4.14 4.14 - 12.90 > 12.90
19,000 <4.53 4.53 - 13.54 > 13.54
20,000 <4.88 4.88 - 14.13 > 14.13
21,000 <5.20 5.20 - 14.65 > 14.65
22,000 <5.48 5.48 - 15.11 > 15.11
23,000 <5.72 5.72 - 15.48 > 15.48
24,000 <5.90 5.90 - 15.78 > 15.78
25,000 <6.03 6.03 - 15.98 > 15.98
26,000 <6.09 6.09 - 16.08 > 16.08
27,000 <6.08 6.08 - 16.07 > 16.07
28,000 <6.00 6.00 - 15.94 > 15.94
29,000 <5.85 5.85 - 15.69 > 15.69
30,000 <5.61 5.61 - 15.31 > 15.31


Figure 5.2 4 & 6-Lane Mountainous Highways, Total Graphs
Another byproduct of developing SPFs is the ability to compare the safety of
four- and six-lane freeways at the same level of traffic exposure. Preliminary
findings reflected in Figure 5.2 indicate that providing additional capacity in the
mountainous freeway environment also results in additional safety. As ADT
increases from 16,000 to 30,000 vehicles, a safety increase can be observed in
addition to reduced delay. Use of the Safety Performance Functions to compare
the safety of facilities with different geometric characteristics carrying the same
amount of traffic has a significant potential in alternatives evaluation and
transportation planning. The Transportation Equity Act for the 21st Century
(TEA-21) of 1998 requires explicit consideration of safety in the transportation
planning process. While this government mandate is well intentioned, little is
known about how to accomplish it. It is difficult to anticipate the safety of a
highway that has not yet been built. In the planning process for system-level
safety and capacity improvements, the use of well-calibrated SPFs will provide
realistic estimates of the expected safety performance of the network.


Art is the imposing of a pattern on experience, and our aesthetic
enjoyment is recognition of the pattern.
Dialogues of Alfred North Whitehead [1953], Chapter 29
Alfred North Whitehead 1861-1947
6. Direct Diagnostics and Pattern Recognition Analysis
6.1 Direct Diagnostics
Over-representation in the number of accidents above the expected or normal
threshold predicted by the Safety Performance Function is only one of many
indicators of a potential for accident reduction, and it appears that it may not
be the best one. Accident type, severity, road condition, spatial distribution of
accidents, and lighting conditions are only a few of the many important symptoms
of the accident problem. Furthermore, in many cases factors other than
over-representation in frequency are better predictors of susceptibility to
corrective countermeasures.
It is difficult to determine a specific form for the distribution of accidents;
therefore the problem lends itself well to a non-parametric approach, which does
not require assumptions about the shape of the underlying distribution. Accident
occurrence as a process can be thought of as a sequence of Bernoulli trials
where the following holds true:
There are only two outcomes at each trial or observation: an accident of a
specific type has or has not occurred.
The probability of success is the same for each trial: the probability of
occurrence of a specific accident-related event, overturning for instance, is
the same every time an accident occurs.
The trials are independent: each accident is completely independent of the
previous or the following one.
There are a finite number of trials.
The following terminology can be adopted to provide an analytical framework for
pattern recognition through direct diagnostics in accident occurrence.
SFi: Denotes a specific Safety Performance Function representing a roadway
segment or an intersection.
Xai [Xa1, Xa2, ..., Xan]: Represents a feature vector comprised of the accident
listing of the roadway segment, directionally arranged in relation to the roadway
reference system, or reflecting an accident listing at an intersection.
P(SFi): The probability that we are presented with a Safety Performance
Function i.
P(Nai|SFi): The probability that Nai accidents of a specific type would be
observed given a Safety Performance Function SFi.
Pi: The probability of observing a specific accident type during each accident
event.
P(SFi|Nai): The a posteriori probability that we are presented with a Safety
Performance Function SFi given a feature vector Xai containing Nai accidents
of a specific type.
Assume that the feature vector Xai represents a sample of accident history drawn
from a roadway facility represented by a Safety Performance Function SFi. The
probability that exactly Nai accidents of a specific type will be observed out of
a total of Nti accidents is given by the binomial distribution:
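$$X_{ai} \in SF_i: \qquad P(N_{ai}, N_{ti}, P_i) = \binom{N_{ti}}{N_{ai}} P_i^{N_{ai}} (1 - P_i)^{N_{ti} - N_{ai}} \tag{6.1.1}$$
where Nai = 0, 1, 2, ..., n accidents, and
$$\binom{N_{ti}}{N_{ai}} = \frac{N_{ti}!}{(N_{ti} - N_{ai})!\, N_{ai}!}$$
The probability that Nai or fewer accidents will be observed out of Nti Bernoulli
trials can be computed as follows:
$$P(X \le N_{ai}, N_{ti}; P_i) = \sum_{i=0}^{N_{ai}} \frac{N_{ti}!}{(N_{ti} - i)!\, i!}\, P_i^{\,i} (1 - P_i)^{N_{ti} - i} \tag{6.1.2}$$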
The probability that Nai or more accidents will be observed is expressed as:
$$P(X \ge N_{ai}, N_{ti}; P_i) = 1 - P[X \le (N_{ai} - 1)] = 1 - \sum_{i=0}^{N_{ai} - 1} \frac{N_{ti}!}{(N_{ti} - i)!\, i!}\, P_i^{\,i} (1 - P_i)^{N_{ti} - i} \tag{6.1.3}$$
If
$$P(X \ge N_{ai}, N_{ti}; P_i) < P_{cr}$$
where Pcr is some established threshold for making a classification decision,
then the feature vector Xai [Xa1, Xa2, ..., Xan] is classified as not belonging to a
specific Safety Performance Function SFi. In terms of accident analysis, this
means that the roadway segment or junction which generated
Xai [Xa1, Xa2, ..., Xan] contains an element which triggers deviation from a
random statistical process in the direction of reduced safety.
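
The classification rule above can be sketched in a few lines of Python. This is a minimal illustration and not the implementation used in this study: the function names are hypothetical, the default threshold `p_cr=0.05` is an assumed value (the text leaves Pcr unspecified), and the upper tail is summed directly rather than computed as one minus the lower tail, which is mathematically equivalent to Equation 6.1.3.

```python
import math


def binomial_pmf(k, n, p):
    """Probability of exactly k accidents of a given type out of n (Eq. 6.1.1)."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    # Work in log space so that large factorials do not overflow.
    log_pmf = (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
               + k * math.log(p) + (n - k) * math.log1p(-p))
    return math.exp(log_pmf)


def prob_at_least(n_type, n_total, p):
    """P(X >= n_type) out of n_total trials, the quantity in Eq. 6.1.3."""
    return sum(binomial_pmf(i, n_total, p) for i in range(n_type, n_total + 1))


def deviates_from_norm(n_type, n_total, p, p_cr=0.05):
    """True when the observed count is classified as not belonging to SFi."""
    return prob_at_least(n_type, n_total, p) < p_cr


# Hypothetical usage: 7 accidents of one type out of 10, with a norm of 20%.
print(deviates_from_norm(7, 10, 0.20))
```

Summing the tail directly avoids the loss of precision that occurs when a probability very close to 1 is subtracted from 1.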


6.1.1 Example of Application of Direct Diagnostics Methodology
To illustrate the application of the concept of direct diagnostics, let us examine a
case history of diagnosing and addressing a safety problem at an urban
signalized intersection. A total of 246 accidents were reported in the five-year
period, and 97 of them were approach turn accidents. Approach turn accidents
are the most frequent accident type at this location. The accident diagram
(Figure 6.1) presents additional information on the direction of travel, accident
severity and time of accident occurrence.
Accident Types: Hampden Ave. at Monaco Pkwy.
SS Opposite                        2% (5)
SS Same                            7% (18)
All Other Types (2 or less each)   5% (12)
Broadside                          9% (21)
Approach Turn                      39% (97)
Rear End                           38% (91)
Figure 6.1 Accident Distribution by Type
The critical question in the accident analysis of this intersection can be
formulated as follows: Is it normal to experience 97 approach turn accidents out
of 246 total, or is there something present at the site which triggers increased
frequency of these accidents? Direct Diagnostics Analysis can help us to answer
this question. Based on 8 years of records, approach turn accidents represent
17% of the total at urban signalized intersections, so Pi = 0.17. Let us now
compute the probability of observing 97 or more approach turn accidents if 246
accidents have occurred.
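$$P(X \le x) = \sum_{i=0}^{x} \frac{n!}{(n - i)!\, i!}\, p^{i} (1 - p)^{n - i}$$
$$P(X \ge 97) = 1 - P(X \le 96)$$
$$P(X \ge 97) = 1 - \sum_{i=0}^{96} \frac{246!}{(246 - i)!\, i!}\, 0.17^{i} (1 - 0.17)^{246 - i} \approx 0$$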
The probability of observing 97 or more approach turn accidents out of 246 total
accidents at a normal urban signalized intersection is effectively 0, which
suggests that there is a significant potential for accident reduction. In other
words, there is something in the environment of this intersection that triggers
deviation from the random statistical process in the direction of reduced safety.
A field investigation revealed that double left turns at each approach could be
made during the permitted-protected turn phase. A permitted left turn on green
with a double left-turn lane assignment is generally associated with limited sight
distance and consequently a high number of approach turn accidents. This sort
of safety problem at a signalized intersection can be effectively addressed by
introducing protected-only left-turn phasing.
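
The intersection example can be checked numerically with a short script. This is a sketch using only the figures quoted in the text (246 accidents, 97 approach turn, norm of 17%); the variable and function names are illustrative.

```python
from math import comb

N_TOTAL = 246          # accidents reported at the intersection
N_APPROACH_TURN = 97   # approach turn accidents among them
P_NORM = 0.17          # share of approach turn accidents at comparable sites


def prob_at_least(k, n, p):
    """Upper-tail binomial probability P(X >= k) for n Bernoulli trials."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))


expected_count = N_TOTAL * P_NORM
p_value = prob_at_least(N_APPROACH_TURN, N_TOTAL, P_NORM)

print(f"expected approach turn accidents: {expected_count:.1f}")
print(f"P(X >= {N_APPROACH_TURN}) = {p_value:.3e}")
```

Under the norm, roughly 42 approach turn accidents would be expected, so 97 lies far out in the tail. The tail is summed directly because evaluating one minus the lower tail in floating point would lose a probability this small to rounding.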


6.2 Pattern Recognition
There is an important distinction between direct diagnostics and pattern
recognition in studying safety of road segments. If selected accident
characteristics of a roadway segment are compared to some established norm,
more often than not the deviation from the norm will not be statistically
significant. This can be explained by the dilution of the intensity of a particular
component of the accident characteristics profile over the length of the segment.
The longer the segment, the less likely it is to display a significant deviation from
the norm for any of the accident characteristics. This dilution often leads to
significant safety problems that are susceptible to correction being overlooked.
For instance, let us examine a hypothetical roadway segment which is 1 mile long
with the following accident history over a 3-year period:
10 accidents total
7 overturning
2 rear-end
1 fixed object
Let us also assume that overturning accidents on average represent 20% of
the total for this functional class. Considering that each accident can be viewed
as an independent Bernoulli trial with a 20% probability of overturning, we can
compute the probability of having 7 or more overturning accidents out of 10 as
follows:


$$P(X \ge 7, 10; 0.20) = 1 - P(X \le 6, 10; 0.20)$$
$$P(X \ge 7) = 1 - \sum_{i=0}^{6} \frac{10!}{(10 - i)!\, i!}\, 0.20^{i} (1 - 0.20)^{10 - i} = 0.09\%$$
As can be seen from the above calculation, the probability that 7 or more
accidents out of 10 will result in overturning as part of a normal statistical
process is extremely low: 0.09%. Such a low probability suggests that something
in the roadway environment triggers overturning accidents. This element needs to
be identified and corrected.
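
The one-mile calculation can be reproduced with the sketch below, which also tabulates the tail probability for every possible overturning count out of 10. The threshold `P_CR = 0.05` is a hypothetical value chosen only for illustration; the text does not fix Pcr at this point.

```python
from math import comb

N_TOTAL = 10        # accidents on the one-mile segment
P_OVERTURN = 0.20   # norm for overturning accidents in this functional class
P_CR = 0.05         # hypothetical decision threshold (assumption)


def prob_at_least(k, n, p):
    """Upper-tail binomial probability P(X >= k) for n Bernoulli trials."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))


# Tail probability for each possible number of overturning accidents.
tail_probs = {k: prob_at_least(k, N_TOTAL, P_OVERTURN) for k in range(N_TOTAL + 1)}

# Smallest overturning count that would be flagged at the assumed threshold.
smallest_flagged = min(k for k, prob in tail_probs.items() if prob < P_CR)

print(f"P(X >= 7) = {tail_probs[7]:.4%}")
print(f"smallest count flagged at Pcr = {P_CR}: {smallest_flagged}")
```

With these inputs the probability of 7 or more overturning accidents is about 0.09%, and under the assumed threshold a segment with 10 accidents would already be flagged at 5 overturning accidents.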
Let us now consider a different situation where that same segment is located
within the project limits of a 5-mile-long roadway improvement project (Figure 6.3).
Figure 6.3 Accidents by Segment
CONSTRUCTION PROJECT LIMITS
            mile 1   mile 2   mile 3   mile 4   mile 5
Overturns   2        2        7        3        1
Other       8        8        3        7        9
Project total: 50 accidents (14 overturns, 36 other)


The accident history over a 3-year period is as follows:
50 accidents total
14 overturning
10 rear-end
25 fixed object accidents
Let us compute the probability of 14 or more overturning accidents out of 50 total.
$$P(X \ge 14, 50; 0.20) = 1 - P(X \le 13, 50; 0.20) \approx 0.11$$
which is certainly a strong possibility.
If the 5-mile road improvement project were examined for overturning frequency
using only the direct diagnostics method, we would conclude that no
overturning problem is present within the project limits, yet we know that at
least one mile out of five has a serious overturning problem. The question then
becomes: how can we identify a hidden safety problem within project limits? In
other words, how can we systematically recognize a pattern of accidents within a
roadway segment?
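
The dilution effect can be made concrete by running the same binomial test at two scales. The sketch below is only an illustration of the idea, not the pattern recognition procedure developed in this chapter: it tests the whole project using the totals stated in the text, then tests each mile using the counts printed in Figure 6.3. The threshold `P_CR = 0.05` is an assumed value.

```python
from math import comb

P_OVERTURN = 0.20   # norm for overturning accidents, from the text
P_CR = 0.05         # hypothetical decision threshold (assumption)


def prob_at_least(k, n, p):
    """Upper-tail binomial probability P(X >= k) for n Bernoulli trials."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))


# Whole project: 14 overturning accidents out of 50, as stated in the text.
project_p = prob_at_least(14, 50, P_OVERTURN)

# Per-mile (overturns, total accidents) counts as printed in Figure 6.3.
miles = {1: (2, 10), 2: (2, 10), 3: (7, 10), 4: (3, 10), 5: (1, 10)}
mile_p = {m: prob_at_least(k, n, P_OVERTURN) for m, (k, n) in miles.items()}
flagged = [m for m, prob in mile_p.items() if prob < P_CR]

print(f"project-level P(X >= 14) = {project_p:.3f}")
for m, prob in mile_p.items():
    print(f"mile {m}: P = {prob:.4f}")
print("miles flagged:", flagged)
```

At the project level the probability is about 0.11, so nothing is flagged, while the mile-by-mile test singles out mile 3. Fixed one-mile windows are a simplification; a problem straddling a window boundary would call for a moving window.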
The problem is that of detecting deviation outside the boundaries of the
random Bernoulli process in the direction of reduced safety. This deviation is
frequently confined to a very limited area and needs to be recognized (ferreted
out) or classified as such through some form of propagation of continuous
statistical testing. In order to make an appropriate classification decision, some
amount of a priori knowledge is required about the expected system
performance. This knowledge was derived from an extensive data set
describing various characteristics of the accident distribution profile endemic to
specific classes of roads. This data set was compiled for six (6) classes of roads
over a period of 8 years and contains 84 different parameters related to accident
occurrence, such as accident type, severity, roadway conditions, etc. It
represents the a priori knowledge base required for computing a
posteriori probabilities.
In order to further illustrate the need for pattern recognition analysis, let us
examine a case history involving a 2-lane road in a mountainous area. Over
five years of accident history, the 7-mile road segment experienced 142
accidents. Safety Performance Function analysis reveals that accident
frequency is well within the expected range for this type of facility. The SPF graph
reflecting six years of accident history (averaged over a three-year period) for the
roadway segments in the study is presented in Figure 6.4. Although accident
frequency is well within the expected range, examination of the accident listing
revealed unusual concentrations of night time accidents. Figure 15 shows a
cumulative graph of night time accidents within the study limits. Let us test whether
or not the overall number of night time accidents is over-represented using the direct
diagnostic approach, considering that 45 out of 142 accidents occurred under