Citation |

- Permanent Link:
- http://digital.auraria.edu/AA00003442/00001
## Material Information
- Title:
- Road accident prediction modeling and diagnostics of accident causality a comprehensive methodology
- Creator:
- Kononov, Jake
- Publication Date:
- 2002
- Language:
- English
- Physical Description:
- xv, 159 leaves : illustrations ; 28 cm
## Subjects
- Subjects / Keywords:
- Traffic accidents -- Forecasting -- Mathematical models ( lcsh )
- Traffic safety -- Mathematical models ( lcsh )
- Traffic accidents -- Forecasting -- Mathematical models ( fast )
- Traffic safety -- Mathematical models ( fast )
- Genre:
- bibliography ( marcgt )
- theses ( marcgt )
- non-fiction ( marcgt )
## Notes
- Bibliography:
- Includes bibliographical references (leaves 157-159).
- General Note:
- Department of Civil Engineering
- Statement of Responsibility:
- by Jake Kononov.
## Record Information
- Source Institution:
- University of Colorado Denver
- Holding Location:
- Auraria Library
- Rights Management:
- All applicable rights reserved by the source institution and holding location.
- Resource Identifier:
- 50741610 ( OCLC )
- ocm50741610
- Classification:
- LD1190.E53 2002d .K66 ( lcc )

Full Text |

ROAD ACCIDENT PREDICTION MODELING AND DIAGNOSTICS OF ACCIDENT CAUSALITY
A COMPREHENSIVE METHODOLOGY

by Jake Kononov

B.S., University of Colorado at Denver, 1982
M.S., University of Colorado at Denver, 1990

A thesis submitted to the University of Colorado at Denver in partial fulfillment of the requirements for the degree of Doctor of Philosophy, Civil Engineering, 2002

© 2001 by Jake Kononov. All rights reserved.

This thesis for the Doctor of Philosophy degree by Jake Kononov has been approved by
Sarosh Khan
Keith Molenaar

Kononov, Jake (Ph.D., Civil Engineering)
Road Accident Prediction Modeling and Diagnostics of Accident Causality: A Comprehensive Methodology
Thesis directed by Professor Bruce N. Janson

ABSTRACT

This dissertation formulated a comprehensive methodology for road accident prediction and diagnostics of accident causality. It provides a conceptual blueprint and an analytical framework for the development of the Highway Safety Manual currently contemplated by the Transportation Research Board. Accident prediction models known as Safety Performance Functions (SPF) were developed for the following facilities: rural 4-lane interstate freeways in mountainous terrain, rural 2-lane arterial highways in mountainous terrain, and rural 4-lane interstate freeways in rolling and flat terrain. Accident models were developed using Poisson distributional assumptions. Additionally, a conceptual formulation of the level of service concept applicable to highway safety was developed.

Diagnostics of accident causality is performed by using a pattern recognition algorithm and direct diagnostic methods introduced in this dissertation. This methodology is based on the idea that traffic accidents can be viewed as independent Bernoulli trials and that it is possible to detect deviation from the random statistical process by computing a cumulative probability for each of the accident characteristics.
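The Bernoulli-trial idea can be illustrated with a minimal sketch. It assumes that each accident at a site independently falls into a given category (for example, fixed-object collisions) with a known normative share, so that the count in that category is binomial. The function names, the site data, the 20% norm and the 0.95 flagging threshold are all hypothetical values chosen for illustration; they are not taken from the dissertation.

```python
from math import comb


def binomial_cdf(k: int, n: int, p: float) -> float:
    """Return P(X <= k) for X ~ Binomial(n, p)."""
    if k < 0:
        return 0.0
    k = min(k, n)
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def overrepresentation_probability(observed: int, total: int, norm_share: float) -> float:
    """Probability of seeing FEWER than `observed` accidents of one type among
    `total` accidents, if each accident independently has that type with
    probability `norm_share`. Values close to 1 suggest over-representation."""
    return binomial_cdf(observed - 1, total, norm_share)


# Hypothetical site: 40 accidents, 14 of them fixed-object collisions,
# compared against an assumed normative share of 20%.
prob = overrepresentation_probability(observed=14, total=40, norm_share=0.20)
flagged = prob >= 0.95  # assumed threshold, for illustration only
print(f"P(X <= 13) = {prob:.4f}, flagged = {flagged}")
```

A site can be flagged for one accident characteristic under this kind of test even when its total accident count is unremarkable, because the test looks only at the composition of the accidents, not at their number.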
In the course of this study, a framework of normative parameters was developed to provide a knowledge base for the diagnostics of safety problems. Development of the diagnostic knowledge base and of the pattern recognition algorithm led to the following finding: the existence of accident patterns susceptible to correction may or may not be accompanied by over-representation in accident frequency detected by the safety performance functions, or by high accident rates. The implication of this finding for road safety policy is as follows: cost-effective safety improvement counter-measures may be constructed at locations which exhibit overall accident frequency well within the expected range. This point is generally overlooked by the public agencies funding road safety improvement projects.

This abstract accurately represents the content of the candidate's dissertation. I recommend its publication.

Signed
Bruce N. Janson

A woman of valor, who can find her?
Her price is far above rubies,
She works with her hand as well as her mind.
A Woman of Valor (XV Century Hebrew Text), Unknown Author

DEDICATION

To my mother, who is a true woman of valor. A courageous physician, a respected scientist in the field of immunology and AIDS research, a loving mother and a devoted wife.

ACKNOWLEDGMENTS

I am very grateful to each of the members of my doctoral committee for their support during the completion of this thesis. Very special thanks to my advisor, Professor Janson, for the wise guidance and instructions he has generously given me throughout my studies.

CONTENTS

Figures
Tables
Chapter
1. Introduction
1.1 Research Problem
2. Review of Extant Literature
2.1 Studies Relating Specific Geometric Features to Safety
2.2 Studies Relating Exposure to Safety
2.3 Studies of Diagnostic Methodologies
3. Research Objectives and Methodology
3.1 Overview
4. Modeling Relationships Between Traffic Volume and Traffic Safety
4.1 Safety Performance Functions
4.2 Philosophy and Methodology of Model Fitting
4.2.1 Choice of the Model Form
4.2.2 Choice of the Underlying Distributional Assumptions
4.3.1 Dataset Preparation
4.3.2 Selection of Minimum Segment Length and its Effect on the Model
4.3.3 Removal of Outliers
4.4.1 Exploratory Analysis and Model Fitting
5. Behavioral Interpretation and Levels of Relative Safety
5.1 Behavioral Zones
5.2 Levels of Service of Safety and Additional Benefits of SPF
6. Direct Diagnostics and Pattern Recognition Analysis
6.1 Direct Diagnostics
6.1.1 Example of Application of Direct Diagnostics Methodology
6.2 Pattern Recognition
6.3 Analytical Framework for Pattern Recognition of Accident Occurrence on Roadway Segments
6.4 Chapter Summary and Menu of Normative Characteristics
7. Integrated Use of Safety Performance Functions and Diagnostic Techniques
7.1 Diagnostic Expert System
7.2 Application of Diagnostic Methodology to a Roadway Segment
8. Conclusions
8.1 General
8.2 Safety Performance Functions
8.3 Diagnostics and Pattern Recognition
8.4 Implications on Policy of Highway Planning and Design
Appendix
A. Rural Flat and Rolling 4-Lane Freeways Abbreviated Dataset
B. Rural Mountainous 2-Lane Arterials Abbreviated Dataset
C. Rural Mountainous 4-Lane Freeways Abbreviated Dataset
References

FIGURES

Figure
4.1 Smoothed Frequency for Total Accidents
4.2 SPF for 2-Lane Roads
4.3 Neural Networks Model
4.4 Data-set Preparation-Freeways
4.5 Data-set Preparation-Arterials
4.6 Scatter Plot-Mountainous Freeways
4.7 Scatter Plot 2-Lane Mountainous Arterials
4.8 Scatter Plot Flat and Rolling Freeways
4.9 Fitted Model Mountainous Freeways
4.10 Mountainous Freeways CURE Plot
4.11 Fitted Model Mountainous Arterials
4.12 Mountainous Arterials CURE Plot
4.13 Fitted Model Flat and Rolling Freeway
4.14 Flat and Rolling Freeways CURE Plot
4.15 Rural Flat and Rolling Freeways with Empirically Calibrated Variance
4.16 Comparison Between Poisson and Empirically and Incrementally Calibrated Error Structure
5.1 Driver Behavioral Zones and Levels of Relative Safety
5.2 Safety Performance of 4 and 6 Lane Mountainous Freeways
6.1 Accident Distribution by Type
6.2 Accident Diagram
6.3 Accidents by Segment
6.4 Case History SPF
6.5 Night-Time Accident Concentration Graph
6.6 Pattern Recognition Algorithm Diagram
6.7 Pattern Intensity Graph
7.1 Expert System-Core Components
7.2 Conceptual Composition of Diagnostic Expert System for Highway Safety
7.3 Study SPF
7.4 Accident Distribution by Type
7.5 Fixed Object Collisions Concentration Graph
7.6 Pattern Intensity Graph
7.7 Fixed Object Collisions by Direction-Cluster 1
7.8 Fixed Object Collisions by Direction-Cluster 2
7.9 Accident Concentration GIS Map

TABLES

Table
4.1 Dataset Extract Mountainous Freeways
4.2 Dataset Extract 2-Lane Mountainous Arterials
4.3 Dataset Extract Flat and Rolling Freeways
4.4 Relationship Between R² and Segment Lengths
4.5 Control Limits Rural Flat and Rolling 4-Lane Freeways
4.6 Control Limits Rural Mountainous 2-Lane Arterials
4.7 Control Limits Rural Mountainous 4-Lane Freeways
5.1 Level of Service Criteria in Reference to Expected Safety Performance for Rural Flat and Rolling 4-Lane Freeways
5.2 Level of Service Criteria in Reference to Expected Safety Performance for Rural Mountainous 2-Lane Arterials
5.3 Level of Service Criteria in Reference to Expected Safety Performance for Rural Mountainous 4-Lane Freeways
6.1 Tabulation of Pattern Intensity Score
6.2 Normative Percentages for Diagnostics - Rural Flat 2-Lane Undivided
6.3 Normative Percentages for Diagnostics - Rural Rolling 2-Lane Undivided
6.4 Normative Percentages for Diagnostics - Rural Mountainous 2-Lane Undivided
6.5 Normative Percentages for Diagnostics - Rural Flat 4-Lane Interstate
6.6 Normative Percentages for Diagnostics - Rural Rolling 4-Lane Interstate
6.7 Normative Percentages for Diagnostics - Rural Mountainous 4-Lane Interstate
7.1 Pattern Intensity Score

"There is nothing as practical as a good theory."

1. Introduction

1.1 Research Problem

Traffic accidents are considered an expected byproduct of highway travel, but just how many accidents we should "expect" per unit of traffic exposure over a unit of time is not altogether clear at present. In fact, it constitutes a highly complex problem faced by the transportation engineering profession today. If expectations are not well defined or clearly understood, then the question becomes: how is it possible to identify a deviation from the norm and then do something about it? Despite many years of modern road building, this most fundamental question of highway safety has not been adequately addressed to date.

Transportation engineers have dealt successfully over the years with the question of highway capacity. The problem was clearly formulated by the Highway Research Board in 1944, when the Committee on Highway Capacity was first established. The first edition of the Highway Capacity Manual (HCM) was published in 1950; it provided initial fundamentals of capacity for uninterrupted-flow facilities, signalized intersections, weaving sections and ramps (TRB, 1950). Since that time there have been four new editions of the HCM (TRB, 2000). The relationship between traffic volumes, capacity and level of service for different types of highway facilities is reasonably well understood at present. Our understanding of highway capacity is enhanced with each successive publication of the Highway Capacity Manual by the Transportation Research Board (TRB).

In contrast to highway capacity, the relationship between traffic volume, physical characteristics of roads and safety is not well understood or known, at least not with the kind of precision customary in other engineering disciplines. There has not been a concerted effort by the TRB to produce a Highway Safety Manual similar in its intent and scope to the HCM.
Conceptually, such a document should systematically examine the expected accident byproduct of roadway segments (freeways, arterials, 2-lane roads, etc.) as well as junctions (intersections, interchanges). Despite countless studies exploring various aspects of the safety of transportation facilities, there is no consensus among transportation engineers as to what constitutes a safe or unsafe highway, intersection or interchange. A parallel could be drawn with the field of medicine: how can a physician prescribe blood pressure reducing medication if there is no consensus in the medical profession as to what constitutes normal or acceptable blood pressure levels? Perhaps the question should be reformulated as follows: should a physician prescribe medicine affecting blood pressure to a patient whose blood pressure he cannot measure? Over the years the questions dealing with highway safety have not been formulated or addressed with sufficient clarity or specificity.

The difficulties normally associated with statistical analysis of highway safety problems are attributed to the large number of interrelated factors contributing to accidents. These factors generally include human behavior, environmental conditions, and vehicle and roadway characteristics. The problem is further complicated by the lack of reliable exposure data, coupled with difficulties of obtaining detailed geometric design information. Earlier research efforts examined the relationship between these factors and safety using different approaches and statistical techniques, and yet, because of the complexity of the issue and problems with obtaining reliable data, no firm conclusions have been drawn. Furthermore, there is traditionally an institutional gap between those engineers who identify hazardous locations and those who design and build roads. These functions are generally performed separately, which often prevents the development of a factual knowledge base on matters of highway safety.
Despite insufficient factual knowledge on the subject and the absence of consensus among professionals, a great majority of efforts initiated by transportation engineers to improve safety result in accident reduction. Having acknowledged this important fact, I would like to pose the following question: is it responsible or ethical not to make the most of the limited financial resources allocated to the improvement of safety on public roads?

Over the last 50 years of modern road building the safety of highways has been measured in accident rates. The use of accident rates is implicitly based on the assumption that the number of accidents on a segment of road is directly proportional to the amount of traffic. In essence, the linearity of the relationship between exposure and safety implies that driver responses to a specific set of roadway characteristics do not change with an increase in traffic. This assumption makes for a number of conceptual difficulties, but more importantly there is a significant amount of empirical data to the contrary.

In an effort to normalize safety on the basis of traffic exposure, the concept of accident rate is widely used throughout the country. Virtually all Departments of Transportation across the US publish annual reports compiling accident rates typical of roads of various functional classes. In the absence of another widely accepted methodology, the main appeal of using the accident rate appears to be its simplicity. Its use, however, often does not reflect the reality of the problem at hand and often leads to poor investments in safety improvements.

The relationship between the amount of traffic measured in ADT and the accident count for a unit of road section over a unit of time is complex and dynamic. It reflects the interaction between driver behavior, vehicle characteristics and roadway environment.
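The difference between the accident-rate assumption and a non-linear relationship can be made concrete with a short sketch. The power-law form and its coefficients below are hypothetical placeholders chosen only for illustration; they are not the Safety Performance Functions calibrated in this dissertation, and the function names are likewise assumptions.

```python
def million_vehicle_miles(adt: float, length_miles: float, years: float = 1.0) -> float:
    """Exposure in million vehicle-miles traveled (MVMT)."""
    return adt * 365.0 * length_miles * years / 1_000_000.0


def accident_rate(accidents: float, adt: float, length_miles: float, years: float = 1.0) -> float:
    """Accidents per MVMT, the conventional accident-rate measure."""
    return accidents / million_vehicle_miles(adt, length_miles, years)


def hypothetical_spf(adt: float, a: float = 2.0e-5, b: float = 1.3) -> float:
    """Expected accidents per mile per year under an ASSUMED form a * ADT**b.
    The coefficients a and b are illustrative, not calibrated values."""
    return a * adt**b


# If accidents were proportional to traffic (b = 1), the implied rate would be
# the same at every ADT. With b != 1 the implied rate drifts with ADT.
for adt in (5_000, 10_000, 20_000):
    expected = hypothetical_spf(adt)
    rate = accident_rate(expected, adt, length_miles=1.0)
    print(f"ADT={adt:>6}: expected={expected:6.2f} acc/mile/yr, implied rate={rate:.3f} per MVMT")
```

Under any exponent other than 1, the implied rate changes with ADT, so a single "typical" rate cannot serve as the norm across the whole range of traffic volumes for a class of road.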
In an effort to understand this complex phenomenon it is critical to consider that changes in driver behavior are induced not only by the physical characteristics of the roadway and environment but also by other drivers. In fact, it can be said that drivers influence each other differently over a wide range of traffic exposure and composition. There is an emerging consensus among traffic safety researchers that a non-linear relationship exists between traffic exposure and safety. This relationship is reflected by the Safety Performance Functions (SPF) calibrated for various classes of roads.

One of the main uses of SPF is to identify locations with potential for accident reduction. While this application is certainly important, its use is limited to identifying sites exhibiting accident frequency higher than expected for a specific facility at a specific level of Average Daily Traffic (ADT). SPF provides no information, however, on the nature of accident occurrence; it only speaks to the magnitude of the problem. Without being able to properly and systematically relate accident frequency and severity to roadway geometries, traffic control devices, roadside features, roadway condition, driver behavior or vehicle type, it is not possible to develop effective counter-measures. In other words, there can be no effective treatment without accurate diagnosis.

In the field of medicine, physicians are expected to spend a minimum of 3 years in apprenticeship after graduation from medical school. During the periods of internship and residency, physicians learn how to recognize diseases as well as how to treat them. In contrast to medicine, transportation engineers are only trained in administering treatment (i.e., designing road safety improvements) without learning the science of diagnostics.
There is no established course of instruction in the graduate-level civil engineering curriculum that provides a definitive methodology on how to relate accident causality to the roadway environment. There is also very little reliable information on this subject in the research literature. Most research efforts are focused on the development of accident prediction models and identification of "black spots". It is somehow implied that transportation engineering professionals will always know how to treat a high accident location once it has been identified, when in reality very little is known on the subject.

In the course of performing in-depth project-level safety assessments for hundreds of sites, a new methodology was developed to provide guidance in the diagnostics of safety problems and the development of appropriate counter-measures. A data-set was compiled using accident and traffic data for different classes of roads over a period of 8 years. A framework of 84 normative parameters was developed to provide guidance in the diagnostics of accident causality and recognition of accident patterns. Considering that traffic accidents can be viewed as random Bernoulli trials, it is possible to detect deviation from the random statistical process by computing a cumulative probability for each of the 84 normative parameters. A deviation from a random statistical process in the direction of reduced safety generally suggests a potential for accident reduction related to a specific parameter.

The diagnostics process for highway safety problems on a section of road is in many ways similar to making a medical diagnosis. While diagnostics is an integral part of medicine, much remains to be done by the transportation engineering profession in order to institutionalize this critical component of understanding the highway safety problem.

"Knowledge is of two kinds. We know a subject ourselves, or we know where we can find information upon it."
From James Boswell, Life of Johnson, 1791.

2. Review of Extant Literature

The review of extant literature is organized in the following 3 groups:
2.1 Studies relating specific geometric features to safety
2.2 Studies relating exposure to safety
2.3 Studies of diagnostic methodologies

2.1 Studies Relating Specific Geometric Features to Safety

The complex relationship between the driver, vehicle, roadway characteristics and safety is not well understood. While numerous researchers have attempted to establish relationships between safety and specific geometric design elements, the results are decidedly mixed. The only consensus among practicing transportation engineers is that none of the documented relationships is definitive, which is an alarming state of affairs after over half a century of modern road building. McGee (McGhee, 1995) observed that:

"Many of these studies have focused solely on one aspect of the design (e.g., degree of curvature for individual horizontal curves) without considering other geometric parameters (e.g., upstream and downstream horizontal alignment). Examining the relationship between accidents and individual highway design variables without considering the interactive effect of other parameters can yield biased and masked relationships."

The tendency to relate isolated geometric characteristics to accidents within the framework of a predictive model influenced the thinking of several generations of researchers but did little to improve safety. Despite the commonly held view that pavement width significantly affects safety, there is little empirical data to support a scientifically defensible relationship. A 1986 study (Zegeer et al., 1986) attempted to relate safety to lane width and shoulder width on two-lane rural roads. This model did not consider the effects of horizontal or vertical alignment, the frequency of access points or operating speeds.
It is possible that these other factors not considered in the study influence accident frequency more significantly than shoulder width or lane width, which may partially explain the relatively low R² of the model.

Glennon (Glennon, 1987) developed an accident prediction model for horizontal curves. Its main application is to predict accident reduction associated with curve flattening while maintaining the same central angle. This model does not consider curve length, superelevation, and roadside or geometric design consistency. Another horizontal curve model was developed by Zegeer et al. (Zegeer et al., 1991). It relates the expected number of accidents on the curve to the degree of curve, length of curve, traffic volume, pavement width and presence of spirals. It does not consider, however, the effect of vertical alignment or the impact of upstream or downstream alignment. Its focus is on accident prediction within the confines of an isolated curve.

The tendency to isolate only selected geometric features and relate them to safety has many limitations and makes for a number of conceptual and methodological difficulties. It negates the very important influence of driver expectancy and design consistency. For instance, a fairly sharp curve in mountainous terrain can experience relatively few accidents, while a much milder curve in a different environment may present serious difficulties to the driver. Therefore, efforts to relate specific curvature to safety without carefully considering driver expectancy and roadway environment are generally not successful. The importance of curve signing, striping, lighting and delineation becomes more critical at locations where design consistency or driver expectancy is violated. These factors are not currently considered by any of the accident prediction models.
Hauer, in Safety in Geometric Design Standards (Hauer, 1999), asserts that roads designed to standards are not safe, not unsafe, nor are they appropriately safe; roads designed to standards have an unpremeditated level of safety. He correctly observes that:

"For a road design standard to be the embodiment of some appropriate safety, it must be true that those who write the standards can anticipate the extent to which important road design decisions affect safety. It may come as a surprise that, typically, writers of standards did not know how what they choose affect safety. To test the verity of this irreverent assertion is simple. One only has to ask the highway designer or the member of the standards committee questions such as: Approximately how many crashes will be saved by increasing the horizontal radius of this road from 100 m to 200 m; how many by making lanes 12 instead of 11 feet; or by how much will crash severity be reduced by changing this side-slope from 3:1 to 5:1? If they cannot answer, then the safety built into the current standards cannot be appropriate."

A clear indication of the veracity of Hauer's claim is the fact that at present we still do not have a tool that can predict the road safety consequences of alternative highway designs. Additionally, Hauer observes (Hauer, 1999) that in road safety, intuition is a fallible guide and plausible conjectures often turn out incorrect. Furthermore, current design standards try to represent road users by certain fixed parameters and fail to recognize the fact that road users remember the roads traveled and the road behind and adapt to the road ahead. As a result, the relationship between design standards and road safety is unclear and the level of safety designed into roads is unpremeditated.
In A Case for Science-Based Road Safety Design and Management, Hauer formulated a sound strategy for solving the highway safety problem (Hauer, 1988):

"Much of what made common sense has been tried; (except for the packaging and protection of car occupants) could not be shown effective. I think this pessimism to be unjustified and premature. Did we condemn the efficacy of medicine because spells, leeches and exorcism have had no demonstrable healing effect? Of course not. The response of medicine was to develop a knowledge-based profession. The engine of progress was: insistence on respectable training that is a prerequisite for license to practice, the institution of pathology, the experiment, measurement, the planned clinical trial. Talent, money and resolve were found to deliver science-based health care. The use of spells and exorcism in medicine has much diminished and we even seem to know when the medical use of leeches can be considered. Not so in road safety. The comparison of the delivery of road safety to the delivery of health is not far fetched. In road safety we are somewhere at the leeches, spells and exorcism stage. Instead of disillusionment, the response should be a resolve to embark upon an era of science-based safety management. It alone promises to deliver results."

2.2 Studies Relating Exposure to Safety

There is an emerging consensus among researchers that there is a non-linear relationship between exposure and safety; however, there is no consensus as to what this relationship is. Most of the studies assert that the accident rate increases with an increase in exposure (Hall and Pendleton, 1990). Harwood concluded (Harwood, 1995):

"It would be extremely valuable to know how safety varies with Volume to Capacity (V/C) ratio and what V/C ratios provide minimum accident rate. Only limited research has been conducted on the variation of safety with V/C ratio. More research of this type is needed, over a greater range of V/C ratios, to establish valid relationships between safety and traffic congestion to provide a basis for maximizing the safety benefits from operational improvement projects."

Hall and Pendleton observed (Hall and Pendleton, 1990) that:

"The implications of the existence of a definite relationship between traffic accident rates and the ratio of current or projected traffic volume to capacity is quite significant. Knowledge of any such relationship would help engineers and planners assess the safety implications both of projected traffic growth on existing highways and of highway improvements designed to increase capacity."

Hall and Pendleton expressed a concern about the amount of scatter present in the initial dataset in previous research efforts. They attribute this scatter to a variety of driver, vehicle and roadway factors contributing to accident occurrence. In our experience, a significant amount of scatter can be attributed to the fact that in most previous studies intersection and interchange related accidents were not isolated from the accidents which occurred on the in-between segments of freeways or arterials. The amount of traffic on the cross-road at an interchange or a side-road at an intersection, to a large degree, defines the magnitude of conflict at an interchange or an intersection. All other things being equal (meaning similar geometries and traffic control), this conflict is largely responsible for the number of accidents at these facilities.
Belanger developed the following accident prediction model for rural intersections, which supports this view (Belanger, 1995):

$$\text{Accidents/year} = 0.00204\,(\text{AADT major road})^{0.42}\,(\text{AADT minor road})^{0.51}$$

McDonald (9) developed a similar model for intersections on divided highways:

$$\text{Accidents/year} = 0.000783\,(\text{AADT major road})^{0.455}\,(\text{AADT minor road})^{0.633}$$

A study by Zhou and Sisiopiku (Zhou and Sisiopiku, 1997) examined the general relationship between accident rates and hourly traffic volume/capacity ratios. A heavily traveled section of urban Interstate in Michigan was selected as the study location. This location contained 79 ramp access points over 16 miles of freeway. Because the ramp weaving activities influence the entire segment, it was assumed that this segment is representative of urban freeways. It is possible that merging/diverging and weaving maneuvers account for a significant proportion of accidents, and therefore ramp volumes or cross-road volumes may be viewed as explanatory variables not accounted for in the study. Considering freeway turbulence in the merge/diverge zones and the resulting accidents identified by Janson (Janson et al., 1998), interchanges should be isolated and examined separately from the freeway segments in between. Because of merging/diverging and weaving conflicts, the relationship between accident rates and V/C ratio for freeways does not lend itself well to examination in densely developed urban areas. It is more meaningful to study this relationship in rural areas, where interchange spacing is sufficiently long that it is possible to isolate roadway segments from the interchange-related conflicts.

2.3 Studies of Diagnostic Methodologies

Over the last 50 years of modern road building it was somehow implied that transportation engineering professionals will always know how to treat a high accident location once it has been identified, when in reality very little is known on the subject.
This sorry state of affairs is best expressed in (Hauer, 1996): "If the site has been identified because its accident record is unusual, one has also to find out why. Thus, the detailed safety analysis stage is akin to a process of medical diagnosis, with perhaps a keener awareness of costs and budgets, a process requiring knowledge of causes, effects, and economics. One might expect that this task would be performed by specialists whose training in this matter is extensive and based on knowledge of fact. Unfortunately, this is not so. For some reason, perhaps because of a fascination with matters statistical or perhaps because it is a headquarters function, a great deal of thought has been devoted to the identification stage. Much less has been written about, or taught to engineers, how to conduct a detailed safety analysis of a site. Yet, not common sense, practical experience, engineering judgement, or the usual highway and traffic engineering lore is a sufficient guide. To be effective, it is not enough to produce reasonable lists of candidate sites to be investigated in the order of priority. It is also necessary to equip the engineer with the training and the tools to make a safety diagnosis on the basis of the specific kinds of accidents that have occurred, the conditions in which they occurred, and the characteristics of the site. Furthermore, it is necessary to give the engineer realistic estimates of what safety improvements can be expected. This, at present is a tall order."

Once again we will turn to the field of medicine for conceptual and methodological guidance on how to formulate a solution, and also observe an interesting analogy. In the United States, the initial impetus for developing a classification of mental disorders was the need to collect statistical information.
In 1917, the Committee on Statistics of the American Psychiatric Association, together with the National Commission on Mental Hygiene, formulated a plan that was adopted by the Bureau of the Census for gathering uniform statistics across mental hospitals. In 1952 the American Psychiatric Association (APA, 1952) developed an authoritative guide on diagnostics of mental disorders called the Diagnostic and Statistical Manual of Mental Disorders (DSM-I). In part because of the lack of widespread acceptance of the mental disorder diagnostic categories, the World Health Organization (WHO) sponsored a comprehensive review of diagnostic issues that was conducted by the British psychiatrist Stengel. According to the American Psychiatric Association (APA, 1994), "His report can be credited with having inspired many of the recent advances in diagnostic methodology, most especially the need for explicit definitions as a means of promoting reliable clinical diagnosis." Since 1952 there have been three more editions of DSM. Over the last half a century thousands of psychiatrists systematically collected data to advance diagnostic methodology. In contrast to medicine, no such undertaking by the transportation engineering profession has taken place. In highway safety, just like in medicine, there can be no effective treatment without accurate diagnosis.

In the fourth edition of the Diagnostic and Statistical Manual of Mental Disorders, DSM-IV (APA, 1994), the American Psychiatric Association (APA) cautions that "The diagnostic categories, criteria, and textual descriptions are meant to be employed by individuals with appropriate clinical training and experience in diagnosis. It is important that DSM-IV not be applied mechanically by untrained individuals. The specific diagnostic criteria included in DSM-IV are meant to serve as guidelines to be informed by clinical judgement and are not meant to be used in a cookbook fashion."
Furthermore, APA cautions that "The proper use of these criteria requires specialized clinical training that provides both a body of knowledge and clinical skills." A similar caution is relevant to the practice of transportation engineering, yet at present a very limited factual knowledge base exists to assist transportation professionals in making diagnostic decisions.

"Since the measuring device has been constructed by the observer... we have to remember that what we observe is not nature in itself but nature exposed to our method of questioning."
Physics and Philosophy [1958], Werner Karl Heisenberg, 1901-1976

3. Research Objectives and Methodology

3.1 Overview

The primary objective of this research was to develop a comprehensive methodology for identifying locations with potential for accident reduction and conducting a diagnostic analysis of accident causality. Because of conflicts between traffic flows at junctions, intersections and interchanges have distinctly different safety performance characteristics from roadway segments. With this in mind, intersections and roadway segments should be studied separately. Three methodologies were developed for identification of locations with potential for accident reduction:

- Calibration of the Safety Performance Functions (SPF)
- Direct Diagnostics Analysis
- Accident Pattern Recognition Analysis

The use of these methodologies was examined and compared in the course of the study for the following facilities:

- Four lane rural freeways in mountainous terrain
- Two lane rural highways in mountainous terrain
- Four lane rural freeways in rolling and flat terrain

Following identification of locations exhibiting higher than expected accident frequency using Safety Performance Functions, a methodology for diagnostics of accident causality at these locations was introduced.

4. Modeling Relationships Between Traffic Volume and Traffic Safety

4.1 Safety Performance Functions

The relationships between expected level of safety and traffic exposure for specific types of roadway are known as Safety Performance Functions, as described in (Hauer and Persaud, 1997). Safety Performance Functions reflect the complex relationship between exposure, usually measured in ADT, and accident count for a unit of road section or a junction over a unit of time. These safety performance functions are referred to as "simple" because they reflect the relationship between exposure and safety for road populations sufficiently similar that the only independent variable required is ADT. We aim to produce models that relate accident frequency and ADT. The models will be of the form:

$$\text{Accidents/(mile-year)} = f(\text{ADT})$$

Naturally the question of considering the influence of other explanatory variables has to be addressed. Why not introduce these variables into the model explicitly? In other words, why not develop a multiple regression model with as many variables as possible? Gujarati in Basic Econometrics (Gujarati, 1995) examined the reasons for limiting the number of variables in the regression, which are discussed below in the framework of accident prediction modeling.

Vagueness of theory: The theory, if any, determining how the accident frequency is influenced by geometric design features, traffic control devices, traffic operations, driver behavior or vehicle type is at best incomplete at present. We might know for certain that ADT influences accident frequency, but we are largely ignorant or unsure about the other variables affecting it. Therefore, the error term in the regression was used as a substitute for all the variables excluded or omitted from the model.
Unavailability of data: Even if we know what some of the excluded variables are and therefore consider a multiple regression rather than the development of a simple Safety Performance Function (SPF), we do not have quantitative information about these variables. Geometric design data, for instance, which is thought to be of interest, is generally unavailable. Therefore we made a decision to use a proxy for relevant geometric design variables by stratifying roads by functional classification, number of lanes, type of terrain and urban or rural environment. Additionally, intersection and interchange-related accidents and roadway segments were isolated from the data-set prior to fitting of the model.

Core variables vs. peripheral variables: Let us consider that besides average daily traffic (ADT), lane width, shoulder width, stopping sight distances, horizontal curvature and vertical grades also affect accident frequency. But it is quite possible that the joint influence of all or some of these variables may be so small and at best non-systematic that as a practical matter and for cost considerations it does not pay to introduce them into the model explicitly. We hoped that their combined effect within the framework of a narrowly defined and carefully stratified SPF can be treated as the error term in the regression.

Intrinsic randomness in human behavior: All accidents involve some amount of driver error. Even if we succeed in introducing all the relevant variables into the model, there is bound to be some intrinsic randomness in driver behavior that cannot be explained no matter how hard we try. At the system level, however, it is true that at some locations within the SPF inventory drivers are more likely to make a driving error than at others, and identification of these locations within the SPF framework will reveal to us sites with potential for accident reduction.
Inaccurate measurements of additional explanatory variables: Although the multivariate regression model assumes that all variables are measured accurately, in practice traffic, accident and geometric design data are replete with errors of measurement. Therefore there are some advantages to minimizing the introduction of additional error by limiting the number of explanatory variables.

Principle of parsimony: Parsimony is defined by Webster's New Collegiate Dictionary (Merriam-Webster, 1980) as "economy in the use of a means to an end." Employing the Occam's razor principle that explanations of unknown phenomena be sought first in terms of known quantities leads us to the following question. If we can explain the accident frequency within the framework of an SPF substantially with only ADT, and if the body of factual knowledge is not extensive enough at present to suggest what other variables might be included, why introduce more variables? Let the error term represent all other variables.

Wrong functional form: Even if we have identified all the correct variables explaining a phenomenon and somehow are able to obtain accurate data on all of them, in the absence of accepted theory we do not know the form of the underlying relationship between the regressand and the regressors. In two-variable models the functional form of the relationship can often be judged from the scattergram. But in a multiple regression model, it is not easy to determine the appropriate functional form.

For all of these reasons the magnitude of the error terms in the calibration of the SPF assumes an extremely critical role, which we will demonstrate as we progress. Level of safety in this study is expressed in terms of the number of accidents per mile over a period of one year, and exposure is measured in Average Daily Traffic (ADT). The expected level of safety represents the normally expected number of accidents per mile of freeway associated with a specific level of ADT.
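As a minimal sketch of the level-of-safety measure defined above, the following Python snippet computes accidents per mile per year for one segment-year from its severity counts. The helper name `accidents_per_mile_year` is hypothetical and introduced only for illustration; the input values are those of the first row of Table 4.1 (I-70, 1986).

```python
def accidents_per_mile_year(pdo, inj, fat, length_miles):
    """Annual accidents per mile for one segment-year (illustrative helper)."""
    if length_miles <= 0:
        raise ValueError("segment length must be positive")
    # Total accidents are the sum of property-damage-only, injury and fatal counts.
    return (pdo + inj + fat) / length_miles


# First row of Table 4.1: 5 PDO, 4 injury and 0 fatal accidents on 8.33 miles.
density = accidents_per_mile_year(pdo=5, inj=4, fat=0, length_miles=8.33)
print(round(density, 2))  # 1.08
```

The result agrees with the "Accs Per Mile" value tabulated for that row.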
Simple Safety Performance Functions (SPF) were developed for several facilities:

- Four lane rural freeways in mountainous terrain
- Two lane rural highways in mountainous terrain
- Four lane rural freeways in rolling and flat terrain

Comparison of the accident history of a specific location to the accident prediction model representing the facility allows us to assess its potential for safety improvement. The scope of application of this methodology is limited to the initial identification of locations with potential for accident reduction. In order to relate accident causality to roadside features, traffic control devices, roadway geometries and traffic operations, a diagnostic analysis is usually required.

Development of the SPFs lends itself well to the conceptual formulation of the Level of Service of Safety. If the level of safety predicted by the SPF represents the normal or expected number of accidents at a specific level of ADT, then the degree of deviation from the norm can be stratified to represent a specific Level of Service of Safety. Another application of SPFs is comparing the safety of different roads at the same level of traffic exposure. Such a comparison, for instance, will enable us to assess the change in safety attributable to providing additional capacity. Additionally, examination of relationships between traffic exposure and safety for different roads is expected to provide insights into changes in driver behavior across the wide range of ADT.

Traditional use of accident rates in problem identification and project selection has delayed comprehensive understanding of the highway safety problem. The assumption of linearity between safety and exposure inherent in the use of accident rates often leads to misdiagnosis and poor investments in safety improvements. The relationship between exposure and safety is complex and should be established by employing statistical modeling techniques.
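The point about accident rates can be illustrated with a short Python sketch. It assumes a hypothetical power-form SPF, accidents/(mile-year) = a * ADT^b, with made-up parameters `a` and `b` that are not calibrated values from this study. When b is less than 1, the accident rate per million vehicle-miles falls as ADT grows even though nothing about the road has changed, so comparing rates across sites with different ADT can mislead.

```python
def spf_accidents_per_mile_year(adt, a=0.002, b=0.8):
    """Hypothetical power-form SPF; a and b are illustrative, not calibrated."""
    return a * adt ** b


def accident_rate_per_mvmt(accidents_per_mile_year, adt):
    """Accidents per million vehicle-miles traveled on one mile of road."""
    vehicle_miles = adt * 365  # annual vehicle-miles on a 1-mile segment
    return accidents_per_mile_year / vehicle_miles * 1_000_000


for adt in (5_000, 10_000, 20_000):
    expected = spf_accidents_per_mile_year(adt)
    rate = accident_rate_per_mvmt(expected, adt)
    print(f"ADT={adt:>6}  accidents/mile-year={expected:.2f}  rate/MVMT={rate:.3f}")
```

Only when b equals 1 does the rate stay constant across ADT, which is the linearity that the use of accident rates implicitly assumes.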
The majority opinion among traffic safety researchers, expressed in (Miaou 1993) and in (Maher and Summersgill, 1996), is that multiple linear regression modeling is not well suited for studying traffic safety problems, yet there is no consensus as to what technique should be used. More recent efforts have employed Generalized Linear Models (GLM) with Poisson and Negative-Binomial assumptions. Although this technique is more statistically appropriate than linear regression, it generally provides poor to moderate fit to the data. Considering the limitations of GLMs and linear regression, an alternate methodology was explored in the course of the study.

4.2 Philosophy and Methodology of Model Fitting

"Modeling in science remains, partly at least, an art. Some principles do exist, however, to guide the modeler. The first is that all models are wrong; some, though, are better than others and we can search for the better ones. At the same time we must recognize that eternal truth is not within our grasp. The second principle (which applies also to artists!) is not to fall in love with the model, to the exclusion of alternatives."
Monographs on Statistics and Applied Probability [1989], McCullagh and Nelder

In statistical modeling of traffic accidents, we are interested in discovering what we can learn about underlying relationships from empirical data containing a random component. We suppose that some complex phenomenon manifested by accident occurrence (the data generating mechanism) has produced the observations, and we wish to describe it by some simpler, but still realistic, model that reveals the nature of the underlying relationship. Generally, in a model, we distinguish between systematic and random variability, where the former describes the patterns of the phenomenon in which we are particularly interested. Thus, the distinction between the two depends on the particular question being asked.
Random variability can be described by a probability distribution, perhaps multivariate, whereas the systematic part generally involves a regression model, most often, but not necessarily, a function of the mean parameter (Lindsey, 1997). Fridstrom and Ingebrigsten observed that Road casualties are random events. Each single accident is unpredictable in the very strong sense, that had it been anticipated, it would most probably not have happened. Yet the number of accidents recorded within reasonable large geographical units exhibits a striking stability from one year to the next... Although the single event is all but impossible to predict, the collection of such events may very well behave in a perfectly predictable way,...( Fridstrom and 25 Ingebrigsten 1991). This observation reflects the essence of accident modeling and suggests that we should view safety of an entity (road segment for instance) as an underlying stable property that has the nature of a long-term average. 4.2.1 Choice of the Model Form Based on substantial empirical evidence derived from observing safety performance of various roads over extended time periods as well as work of other researchers the following understanding of relationship between safety and exposure has emerged. Accident rates decline when ADT reaches certain threshold endemic to a particular facility in a specific environment. This understanding suggests a choice of underlying function which would reflect this phenomenon. Such a function can be represented by a model form which will show some leveling off associated with approaching some threshold exposure value. Two general model forms are generally employed: E{y} = x^e^x+filX ~ > power family E{y} = X^ (1 + fxX + /32X2...) > polynomial family In this E{y} is the annual number of accidents expected to occur on a mile of road, X is the independent variable (here ADT), and fis are parameters to be estimated. 
Hauer used the Nadaraya-Watson kernel estimator with a Gaussian kernel to obtain the relationship presented in Figure 4.1 (Hauer 2001).

Figure 4.1 Smoothed Frequency for Total Accidents (horizontal axis: AADT)

Non-parametric kernel regression used by Hauer is a smoothing technique used to obtain clues about the form of the function underlying the data. Figure 4.2, also from (Hauer 2001), shows a safety performance function calibrated for 2-lane rural highways using Poisson maximum-likelihood estimation. Similar functional shapes in Figure 4.2 were developed and described in (Kononov 1999) using Neural Networks-Radial Basis Function. Neural Networks are not constrained by the underlying distributional assumptions and learn by example, inferring a model from training data. It is of interest to note that despite the violation of the normality and homoscedasticity assumptions, the linear regression fit of the mean closely approximates the curve fitted using the Neural Networks methodology.

Figure 4.2 SPF for 2-Lane Roads (horizontal axis: AADT)

Figure 4.3 Neural Networks Model (Rural Mountainous 4-Lane Interstate)

4.2.2 Choice of Underlying Distributional Assumptions

In statistical modeling of traffic accidents, it is assumed that the random variation follows certain probability laws and can be characterized by a probability function. It was observed (Miaou and Lum, 1993) that "The use of a continuous distribution, such as the normal distribution, is at best an approximation to a truly discrete process. The Poisson distribution, on the other hand, is a natural initial candidate distribution for such random discrete and, typically, sporadic events." At the same time, if a Poisson assumption is made about the underlying random variability, it will have the restricting effect of always equating the variance to the mean. In our experience with accident data this assumption is not always true.
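A rough way to screen for departure from the Poisson variance-equals-mean property is to compare the sample variance of a series of counts with its sample mean. The sketch below does this for the yearly accident totals (PDO + INJ + FAT) of the first I-70 segment listed in Table 4.1. The function name `dispersion_ratio` is hypothetical. The check is only indicative, because ADT on the segment changed over the period, so part of the spread reflects a changing mean rather than over-dispersion alone.

```python
from statistics import mean, variance

# Total accidents (PDO + INJ + FAT) per year, 1986-1998, for the first
# I-70 segment listed in Table 4.1 (milepoints 2.31 to 10.61).
yearly_counts = [9, 2, 14, 11, 8, 14, 10, 18, 17, 16, 9, 14, 16]


def dispersion_ratio(counts):
    """Sample variance divided by sample mean; close to 1 under a Poisson model."""
    return variance(counts) / mean(counts)


ratio = dispersion_ratio(yearly_counts)
print(f"mean={mean(yearly_counts):.2f}  "
      f"variance={variance(yearly_counts):.2f}  ratio={ratio:.2f}")
```

A ratio noticeably above 1 is consistent with the extra variation discussed in this section, although a formal test would have to condition on ADT.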
Similar findings are reported by others (Dean and Lawless, 1989). In many cases accident data exhibit extra variation or over-dispersion relative to the Poisson model. In other words, the variance of the data is often greater than the mean. Safety performance of roadway segments observed over extended time periods represents a set of panel or clustered count data. The potential problem for clustered accident count data is that the multiple accident counts from the same road segments, although generated by different traffic volumes, may not be independent. Negative binomial regression models were proposed to replace Poisson regression models because accident count data are frequently over-dispersed as compared with those data suitable for the methods of Poisson regression models (Miaou and Lum 1993). Like Poisson models, however, negative binomial models still need the assumption of independence and, therefore, may not be entirely appropriate. In order to make explicit allowance for correlated observations it was suggested (Guo 1996) that multiple counts in the same cluster be subjected to a cluster-specific random effect. Guo's approach starts with a conventional Poisson regression model and then subjects the multiple counts in the same cluster to a cluster-specific random effect representing unobserved influences shared by all the counts of the cluster. A gamma-distributed cluster-specific effect in this formulation leads to the multinomial regression model. Similar ideas are behind the strategies developed for correlated observations in linear regression modeling (Searle 1987).

The Multinomial Regression Model

According to the model formulated by Guo, the mean for entity i is given by equation (4.2.1):

$$E\{Y_{ij} \mid \theta_i\} = T_{ij}\, e^{x_{ij}\beta}\, \theta_i \qquad (4.2.1)$$

In this, $T_{ij}$ is exposure, $x_{ij}$ are covariate values, $\beta$ are regression constants and $\theta_i$ is a random variable (for entity i) that comes from a Gamma distribution, the density of which is given in equation (4.2.2):

$$f(\theta_i) = \frac{\varphi^{\varphi}\, \theta_i^{\varphi - 1}\, e^{-\varphi \theta_i}}{\Gamma(\varphi)} \qquad (4.2.2)$$
That is the variance that pertains to trials in which realizations all have the same
the weight of an observed value Yy is approximately inversely proportional to its
that entities with rjy q> will have little weight compared to entities with in (Hauer 2001) to use (pi = q> x (segment length). Equation 4.2.6 gives the probability to observe yn,.--,yin. Replacing nowqy by (pi, the log-likelihood for entity i is expressed by eq. (4.2.8) below. Because the interest in InLi is for maximization with respect to unknown parameters, the product nyy! has been omitted. Maximum likelihood parameters are those that maximize (4.2.8) (4.2.9) Were#,. = 1 for all i, 34 (4.2.11) p(yn>yn> -JiJ Again omitting the yy! the log-likelihood for entity i is now < LL, = ytj ln( 1J9)] *7, (4.2.12) >1 Considering that we have available 14 years of accident and traffic data the multinomial regression with Poisson or Negative-Binomial distributional assumptions will be used in this study. Its flexibility and statistical properties make it an attractive methodology to calibrate safety performance functions. It will be also compared with the modified OLS models with empirically and incrementally calibrated variance. 35 4.3.1 Dataset Preparation All of the dataset preparation was performed using the Colorado Department of Transportation (CDOT) accident database. The system of rural Colorado highways was subdivided into the following categories: 1. Mountainous Terrain-4 Lanes Freeways 2. Mountainous Terrain-2 Lanes Arterials 3. Rolling/Flat Terrain-4 Lanes Freeways On the Interstates accident history for each facility was prepared over the period of 14 years. Average Daily Traffic (ADT) for each roadway segment for each of the 14 years was entered into the same dataset. All of the interchange related accidents were isolated from the accident database prior to fitting the model. The reason for isolating interchange-related accidents was to remove the influence of accidents resulting from merge/diverge turbulence at an interchange. A ramp-freeway junction area is a zone of competing traffic demands for space. 
At on-ramps upstream traffic competes for space with entering on-ramp vehicles in merge areas. In the merge area, individual on-ramp vehicles attempt to find gaps in the traffic stream of the adjacent freeway lane. The action of individual merging vehicles entering the freeway creates turbulence in the traffic stream in the vicinity of the ramp. Approaching freeway vehicles move toward the left to avoid this turbulence. The turbulence itself, as well as the associated lane changing, results in an increase of side-swipe and rear-end collisions in the area around the ramp junction. This increased number of accidents is atypical of the safety performance of the segments of freeways away from the interchange. Studies of on-ramp junctions (Roess and Ulerio, 1993) showed that lane changing and turbulence extend 1500 ft. downstream of the physical merge point.

At off-ramps the basic maneuver is a diverge. Exiting vehicles must occupy the lane adjacent to the off-ramp. Thus, as the off-ramp is approached, exiting vehicles move right. This movement brings about a redistribution of other freeway vehicles, which move left to avoid the turbulence in the immediate diverge area. Again, studies of off-ramp junctions (Roess and Ulerio, 1993) showed that the area of turbulence extends 1500 ft. upstream of the physical gore. Adding 1500 ft. of turbulence on both sides of the grade separation (3000 ft.) to the length of the ramps and of the structures results in isolating a distance of approximately one (1) mile centered on the interchange. This distance correlates well with the spread of accidents related to freeway turbulence in the study of interchange safety (Janson et al., 1998). Figure 4.4 illustrates how the data-set was prepared.

On the 2-lane rural roads the data-set was prepared in a similar fashion, with the exception that intersection related accidents and 0.1 mile roadway segments containing intersections were removed prior to fitting of the model. Isolating a distance of approximately 250 ft. on both sides of rural intersections is a conservative measure, but it will ensure that intersection related conflicts will not pollute the data-set comprised of non-intersection related accidents and road segments. Figure 4.5 illustrates how the data-set was prepared.

Figure 4.4 Dataset Preparation Freeways (accidents and freeway segments with a minimum length of 2 miles are included in the dataset; the 1-mile zone at each interchange and the interchange related accidents are excluded)

Figure 4.5 Dataset Preparation Arterials (segments with a minimum length of 2 miles are included in the SPF dataset; intersection related accidents are studied using Intersection Diagnostic Analysis)

A dataset sample extract used for the fitting of the Safety Performance Function (SPF) reflecting Rural 4-Lane Mountainous Interstate is included in Table 4.1. A dataset sample extract used for the fitting of the SPF reflecting Rural 2-Lane Mountainous Highways is included in Table 4.2. A dataset sample extract used for the fitting of the SPF reflecting Rural 4-Lane Rolling and Flat Freeways is included in Table 4.3. Complete data-sets for all Safety Performance Functions are provided in the Appendix.
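The freeway segmentation rule described above (remove roughly one mile centered on each interchange, then keep only segments of at least 2 miles) can be sketched as follows. The function name and the interchange milepoints are hypothetical: the milepoints are simply placed at the centers of the 1-mile gaps between consecutive I-70 segments listed in Table 4.1, and are not taken from the CDOT database.

```python
def segments_outside_interchanges(start_mp, end_mp, interchange_mps,
                                  zone_miles=1.0, min_length=2.0):
    """Split a corridor into segments that lie outside interchange zones.

    Each interchange removes a zone `zone_miles` long centered on its
    milepoint; remaining segments shorter than `min_length` are dropped.
    """
    half = zone_miles / 2.0
    segments = []
    cursor = start_mp
    for mp in sorted(interchange_mps):
        zone_start, zone_end = mp - half, mp + half
        if zone_start - cursor >= min_length:
            segments.append((round(cursor, 2), round(zone_start, 2)))
        cursor = max(cursor, zone_end)
    if end_mp - cursor >= min_length:
        segments.append((round(cursor, 2), round(end_mp, 2)))
    return segments


# Hypothetical interchange milepoints (centers of the gaps seen in Table 4.1).
interchanges = [61.65, 74.68, 81.24]
print(segments_outside_interchanges(49.52, 86.35, interchanges))
```

With these assumed milepoints the sketch reproduces the four segment limits that appear in the Table 4.1 extract between milepoints 49.52 and 86.35.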
39 Table 4.1 Dataset Extract Mountainous Freeways Rural Mountainous 4-Lane Interstate (1986 to 1999) HWY Milepoints Dates AADT Accidents # Sec Begin End Len Begin End PDO INJ FAT Accs Per Mile 70 A 2.31 10.61 8.33 01/01/86 12/31/86 3,450 5 4 0 1.08 70 A 2.31 10.61 8.33 01/01/87 12/31/87 3,450 1 1 0 0.24 70 A 2.31 10.61 8.33 01/01/88 12/31/88 3,700 8 5 1 1.68 70 A 2.31 10.61 8.33 01/01/89 12/31/89 4,000 7 3 1 1.32 70 A 2.31 10.61 8.33 01/01/90 12/31/90 4,300 3 4 1 0.96 70 A 2.31 10.61 8.33 01/01/91 12/31/91 4,400 5 9 0 1.68 70 A 2.31 10.61 8.33 01/01/92 12/31/92 4,848 5 4 1 1.20 70 A 2.31 10.61 8.33 01/01/93 12/31/93 5,050 13 5 0 2.16 70 A 2.31 10.61 8.33 01/01/94 12/31/94 5,200 7 9 1 2.04 70 A 2.31 10.61 8.33 01/01/95 12/31/95 5,200 6 9 1 1.92 70 A 2.31 10.61 8.33 01/01/96 12/31/96 5,470 7 2 0 1.08 70 A 2.31 10.61 8.33 01/01/97 12/31/97 5,350 7 7 0 1.68 70 A 2.31 10.61 8.33 01/01/98 12/31/98 5,677 7 7 2 1.92 70 A 49.52 61.15 12.70 01/01/86 12/31/86 6,550 17 17 2 2.84 70 A 49.52 61.15 12.70 01/01/87 12/31/87 6,550 17 8 0 1.97 70 A 49.52 61.15 12.70 01/01/88 12/31/88 7,400 19 13 1 2.60 70 A 49.52 61.15 12.70 01/01/89 12/31/89 7,856 16 10 1 2.13 70 A 49.52 61.15 12.70 01/01/90 12/31/90 9,450 15 12 0 2.13 70 A 49.52 61.15 12.70 01/01/91 12/31/91 9,600 27 17 0 3.47 70 A 49.52 61.15 12.70 01/01/92 12/31/92 9,486 8 8 0 1.26 70 A 49.52 61.15 12.70 01/01/93 12/31/93 10,400 36 7 2 3.54 70 A 49.52 61.15 12.70 01/01/94 12/31/94 10,600 17 16 1 2.68 70 A 49.52 61.15 12.70 01/01/95 12/31/95 12,000 15 15 0 2.36 70 A 49.52 61.15 12.70 01/01/96 12/31/96 11,151 14 18 0 2.52 70 A 49.52 61.15 12.70 01/01/97 12/31/97 12,524 24 9 0 2.60 70 A 49.52 61.15 12.70 01/01/98 12/31/98 13,291 24 13 0 2.91 70 A 49.52 61.15 12.70 01/01/99 12/31/99 13,098 34 16 1 4.02 70 A 62.15 74.18 11.95 01/01/99 12/31/99 10,942 19 17 2 3.18 70 A 75.18 80.74 5.76 01/01/99 12/31/99 11,367 14 5 1 3.47 70 A 81.74 86.35 4.58 01/01/99 12/31/99 12,018 13 4 1 3.93 70 A 97.93 104.76 6.82 01/01/99 12/31/99 15,318 28 15 1 
6.45 40 Table 4.2 Dataset Extract 2-Lane Mountainous Arterials Rural Mountainous 2-Lane Highways (1986 to 1999) HWY Milepoints . Dates AADT Accidents # Sec Begin End Len Begin End PDO INJ FAT Accs Per Mile 5 A 0.05 8.95 8.92 01/01/1987 12/31/1987 180 1 4 0 0.56 5 A 0.05 8.95 8.92 01/01/1988 12/31/1988 240 2 0 0 0.22 5 A 0.05 8.95 8.92 01/01/1989 12/31/1989 240 0 0 0 0.00 5 A 0.05 8.95 8.92 01/01/1990 12/31/1990 280 0 1 0 0.11 5 A 0.05 8.95 8.92 01/01/1991 12/31/1991 270 0 2 0 0.22 5 A 0.05 8.95 8.92 01/01/1992 12/31/1992 274 0 0 0 0.00 5 A 0.05 8.95 8.92 01/01/1993 12/31/1993 300 1 1 0 0.22 5 A 0.05 8.95 8.92 01/01/1994 12/31/1994 280 3 5 0 0.90 5 A 0.05 8.95 8.92 01/01/1995 12/31/1995 280 3 1 0 0.45 5 A 0.05 8.95 8.92 01/01/1996 12/31/1996 283 0 1 0 0.11 5 A 0.05 8.95 8.92 01/01/1997 12/31/1997 297 1 0 0 0.11 5 A 0.05 8.95 8.92 01/01/1998 12/31/1998 310 1 0 0 0.11 5 A 9.11 14.89 5.23 01/01/1987 12/31/1987 180 0 0 0 0.00 5 A 9.11 14.89 5.23 01/01/1988 12/31/1988 240 0 0 0 0.00 5 A 9.11 14.89 5.23 01/01/1989 12/31/1989 240 0 0 0 0.00 5 A 9.11 14.89 5.23 01/01/1990 12/31/1990 280 0 0 0 0.00 5 A 9.11 14.89 5.23 01/01/1991 12/31/1991 270 0 0 0 0.00 5 A 9.11 14.89 5.23 01/01/1992 12/31/1992 274 0 0 0 0.00 5 A 9.11 14.89 5.23 01/01/1993 12/31/1993 300 0 1 0 0.19 5 A 9.11 14.89 5.23 01/01/1994 12/31/1994 280 0 0 0 0.00 5 A 9.11 14.89 5.23 01/01/1995 12/31/1995 280 1 1 0 0.38 5 A 9.11 14.89 5.23 01/01/1996 12/31/1996 283 0 0 0 0.00 5 A 9.11 14.89 5.23 01/01/1997 12/31/1997 297 0 1 0 0.19 5 A 9.11 14.89 5.23 01/01/1998 12/31/1998 310 0 1 0 0.19 6 E 145.80 148.71 2.73 01/01/1987 12/31/1987 1,900 2 2 0 1.46 6 E 145.80 148.71 2.73 01/01/1988 12/31/1988 2,200 2 4 0 2.19 6 E 145.80 148.71 2.88 01/01/1989 12/31/1989 2,200 1 2 1 1.39 6 E 145.80 148.71 2.88 01/01/1990 12/31/1990 2,700 2 4 0 2.08 6 E 145.80 148.71 2.87 01/01/1991 12/31/1991 2,647 1 2 0 1.05 6 E 145.80 148.71 2.87 01/01/1992 12/31/1992 3,584 3 1 0 1.39 6 E 145.80 148.71 2.87 01/01/1993 12/31/1993 3,203 2 2 0 1.39 6 E 
145.80 148.71 2.87 01/01/1994 12/31/1994 2,982 1 2 0 1.05 41 Table 4.3 Dataset Extract Flat and Rolling Freeways Rural Flat and Rolling 4-Lane Interstate (1986 to 1999) HWY Mileooints Dates AADT Accidents # Sec Begin End Len Begin End PDO INJ FAT Accs Per Mile 25 A 18.23 22.41 4.20 01/01/86 12/31/86 5,300 0 6 1 1.6679 25 A 18.23 22.41 4.20 01/01/87 12/31/87 5,300 3 3 0 1.4296 25 A 18.23 22.41 4.20 01/01/88 12/31/88 5,450 6 3 0 2.1444 25 A 18.23 22.41 4.20 01/01/89 12/31/89 5,550 3 4 0 1.6679 25 A 18.23 22.41 4.20 01/01/90 12/31/90 6,050 8 7 0 3.5740 25 A 18.23 22.41 4.20 01/01/91 12/31/91 6,200 5 6 0 2.6209 25 A 18.23 22.41 4.20 01/01/92 12/31/92 6,851 3 4 1 1.9061 25 A 18.23 22.41 4.20 01/01/93 12/31/93 8,100 15 8 2 5.9566 25 A 18.23 22.41 4.20 01/01/94 12/31/94 8,200 6 7 0 3.0975 25 A 18.23 22.41 4.20 01/01/95 12/31/95 7,700 15 4 0 4.5270 25 A 18.23 22.41 4.20 01/01/96 12/31/96 8,626 8 5 1 3.3357 25 A 18.23 22.41 4.20 01/01/97 12/31/97 8,036 20 7 0 6.4332 25 A 18.23 22.41 4.20 01/01/98 12/31/98 8,528 15 8 2 5.9566 25 A 18.23 22.41 4.20 01/01/99 12/31/99 8,442 8 2 0 2.3827 25 A 34.59 39.99 5.40 01/01/86 12/31/86 5,350 6 2 1 1.6670 25 A 34.59 39.99 5.40 01/01/87 12/31/87 5,350 4 3 0 1.2965 25 A 34.59 39.99 5.40 01/01/88 12/31/88 5,550 3 8 0 2.0374 25 A 34.59 39.99 5.40 01/01/89 12/31/89 5,950 5 3 0 1.4818 25 A 34.59 39.99 5.40 01/01/90 12/31/90 6,200 2 4 0 1.1113 25 A 34.59 39.99 5.40 01/01/91 12/31/91 6,450 4 5 0 1.6670 25 A 34.59 39.99 5.40 01/01/92 12/31/92 6,956 2 9 0 2.0374 25 A 34.59 39.99 5.40 01/01/93 12/31/93 7,300 5 8 0 2.4079 25 A 34.59 39.99 5.40 01/01/94 12/31/94 7,200 5 5 1 2.0374 25 A 34.59 39.99 5.40 01/01/95 12/31/95 7,000 9 7 0 2.9635 25 A 34.59 39.99 5.40 01/01/96 12/31/96 7,574 10 9 1 3.7044 25 A 34.59 39.99 5.40 01/01/97 12/31/97 7,306 10 9 0 3.5192 25 A 34.59 39.99 5.40 01/01/98 12/31/98 7,753 10 7 0 3.1487 25 A 34.59 39.99 5.40 01/01/99 12/31/99 7,668 13 4 0 3.1487 25 A 42.43 48.50 6.01 01/01/86 12/31/86 5,350 2 6 0 1.3313 25 A 42.43 48.50 
6.01 01/01/87 12/31/87 5,350 12 11 0 3.8276 25 A 42.43 48.50 6.01 01/01/88 12/31/88 5,650 13 6 0 3.1619 25 A 42.43 48.50 6.01 01/01/89 12/31/89 6,100 2 4 0 0.9985

4.3.2 Selection of Minimum Segment Length and Its Effect on the Model

Accident prediction models in which the accident count or accident rate is the dependent variable are sensitive to the selection of the minimum segment length. This question has been raised and investigated in several studies (e.g. Resende and Benekohal 1994; Okamato and Koshi 1989; Zegeer et al. 1991). Most of these studies conclude that short road segments have an undesirable impact on the estimation of the model, and they therefore express a preference for longer sections. While purely statistical considerations are important, let us examine the consequences of selecting segments shorter than 1 mile for a model where the number of accidents per mile per year is the dependent variable. If an accident cluster is contained within a fraction of a mile, then the number of accidents per mile is estimated by dividing the number of accidents in the segment by the segment's length. This division creates what can be termed a fictional mile with the same accident density as in the cluster and introduces unnecessary error into the data-set. For instance, 20 accidents over a 0.2-mile segment would create a fictional mile with 100 accidents per mile, which in reality does not exist in the rural environment. The intent of the SPF is probably not to pinpoint accident clusters, but rather to evaluate the relative safety of homogeneous roadway segments when compared with other similar segments within the same functional class. At a conceptual level, when we discuss the safety of a section of road we think in terms of longer segments rather than spot locations. Similarly, when conducting capacity analysis of roadway segments, as opposed to junctions, we work with longer segments as well.
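The fictional-mile effect described above can be illustrated with a few lines of arithmetic. The sketch below is only an illustration: the function names and the 1-mile cutoff are assumptions introduced here, not part of the thesis methodology. The second rate uses the first row of Table 4.2 (1 PDO and 4 injury accidents on an 8.92-mile segment).

```python
def accidents_per_mile(accidents, length_miles):
    """Return accident density (accidents per mile) for one segment-year."""
    if length_miles <= 0:
        raise ValueError("segment length must be positive")
    return accidents / length_miles


def keep_for_fitting(length_miles, min_length_miles):
    """True when a segment is long enough to enter the model-fitting data-set."""
    return length_miles >= min_length_miles


# The cluster from the text: 20 accidents on a 0.2-mile segment.
cluster_rate = accidents_per_mile(20, 0.2)        # about 100 accidents per mile

# First row of Table 4.2: 1 PDO + 4 injury + 0 fatal accidents on 8.92 miles.
table_rate = accidents_per_mile(1 + 4 + 0, 8.92)  # about 0.56 accidents per mile

# Hypothetical 1-mile cutoff, chosen only for this illustration.
MIN_LENGTH_MILES = 1.0
print(round(cluster_rate, 2), keep_for_fitting(0.2, MIN_LENGTH_MILES))
print(round(table_rate, 2), keep_for_fitting(8.92, MIN_LENGTH_MILES))
```

A short cluster produces a density far outside the range seen on full-length rural segments, which is why such segments are screened out before fitting.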
From the standpoint of project development or corridor evaluation, using longer segments for model fitting makes the model more compatible with the segments being evaluated. A study by Resende and Benekohal showed that the section lengths used to compute accident rates directly affect the form of the accident prediction models and the models' predictive power. For basic sections of rural Interstate highways and two-lane rural highways, that study recommended a section length of at least 0.5 miles. Table 4.4 (Resende and Benekohal 1994) shows that the predictive power of the model generally increases with the minimum segment length.

Table 4.4 Relationship Between R2 and Segment Lengths

Section      1986                1987                1988
Length       Sample Size  Rsq.   Sample Size  Rsq.   Sample Size  Rsq.
0.1          928          0.03   993          0.03   1023         0.05
0.2          743          0.08   806          0.06   793          0.13
0.3          636          0.18   676          0.12   663          0.19
0.4          527          0.18   555          0.13   551          0.21
0.5          470          0.21   491          0.14   487          0.25
0.6          413          0.20   430          0.20   424          0.30
0.7          370          0.21   375          0.28   372          0.27
0.8          342          0.23   344          0.28   345          0.31
0.9          312          0.29   316          0.30   318          0.36
1            301          0.28   307          0.30   307          0.35

Similar findings are observed in this study as well. The predictive power of the model for most safety performance functions is maximized when the minimum segment length is in the 2 to 3 mile range. As previously discussed, this minimum segment length has utility in application to project evaluations as well as statistical advantages. When a segment is very short, 0.1 mile for instance, it is generally adjacent to other segments with significantly different geometric characteristics. At 60 mph a 0.1-mile segment can be traversed in approximately 6 seconds, which would not allow enough time for the driver to adjust expectancy and behavior before entering the next segment. From the standpoint of traffic engineering, driver performance on a short segment may not be representative of the facility.
4.3.3 Removal of Outliers

Prior to fitting the models, outliers were identified and removed. In general, an outlier in a set of data is an observation which appears to be inconsistent with the remainder of that set of data (Barnet and Lewis, 1978). A data point can be considered an outlier if it does not appear to be predicted well by the fitted model (Hayter, 1995). It is also an observation that is an inappropriate representation of the population from which the sample is drawn (Hair et al., 1998). In a sense, these definitions leave it up to the analyst to decide what will be considered abnormal. Considering that all of the information about each individual accident is collected in the field by a police officer and then hand-entered into the database by accident coders, there is ample opportunity for error. Additionally, traffic volume counts from the volume database contain a certain amount of error, which can propagate into the model. Since the data are heteroscedastic, the variance was incrementally calibrated to reflect the increase in standard deviation associated with the increase in ADT. For each bin of the dependent variable standardized residuals were calculated as follows:

$e^{*}_{ij} = e_{ij} / \sigma_j$

where $e_{ij}$ is a residual in data bin $j$ and $\sigma_j$ is the standard deviation calibrated for that bin. Observations with standardized residuals larger than 3 were removed. Following isolation of the outlier locations, they were examined for a possible explanation as to why they exhibited abnormal accident frequencies. In addition to identifying common coding errors, this process revealed the following:

A large group of roadway segments in the 2-lane mountainous roads category had abnormally high accident frequency for the amount of traffic exposure. After plotting them on the map it was observed that these were roads leading to the gambling communities of Black Hawk, Central City, Cripple Creek and Victor.
Accident history among these locations had two characteristics in common: an abnormally high number of accidents at night and abnormally high alcohol involvement. A combination of these factors led us to the realization that driver behavior and the resulting accident frequency are atypical of routes within this functional class in the mountainous environment. Furthermore, because of these unique characteristics, a separate safety performance function would need to be calibrated for the routes leading to the gambling communities.

A section of the mountainous 4-lane Interstate exhibited an abnormally low accident frequency for the first 9 years, but in the last 5 years it performed as expected. Further examination showed that the section in question contained the work zone for Glenwood Canyon. This particular work zone was characterized by low operating speeds, reduced lane and shoulder width, and altered driver expectancy. Once construction in the section was completed, the section performed as expected over the last 5 years.

4.4.1 Exploratory Analysis and Model Fitting

Accident data for roadway segments was collected in the manner described in section 4.3.1 by isolating sections influenced by the intersections and interchanges. Scatter plots displaying 14 years of accident and traffic data for two types of rural freeways and a two-lane arterial are presented in Figures 4.6, 4.7, and 4.8. We intend to calibrate models which relate accident frequency and ADT. These models will be of the form: Accident/(mile-year) = f(ADT), where f stands for some function.
The motivation for model fitting is to establish what accident frequency is expected from certain ADT exposure. Therefore the model has to fit the accident data reasonably well in all ranges of ADT. Since ADT is the only variable in the equation and because other variables (curvature, grade, percent of trucks, etc.) are not explicitly accounted for, the magnitude of the error terms is a proxy for the influence of these variables. Due to the complexity of association between ADT and other factors the function f may be complex in form.

[Figure 4.6 Scatter Plot Mountainous Freeways: Rural 4-Lane Mountainous Interstate (1988-1999), Total Graph, Sections => 3.0 Miles]

[Figure 4.7 Scatter Plot 2-Lane Mountainous Arterials: Rural Mountainous 2-Lane Highway, Total Accidents]

Dependent data in all data-sets exhibit well-defined heteroscedastic characteristics, where the variance in accident frequency increases with increase in the mean. The model parameters were estimated using the multinomial maximum-likelihood function discussed in section 4.2.2, using a GLM spreadsheet developed at the University of Toronto (Hauer 2001). The quality of fit was examined with the Cumulative Residuals (CURE) method described in (Lord and Persaud 2000) and (Hauer 2001). This method consists of plotting the cumulative residuals for each independent variable. The goal is to graphically observe how well the function fits the data-set. To generate a CURE plot, sites are sorted by their average ADT. Then, for each site, the residual (= predicted accidents - observed accidents) is computed. The residuals are then added up and a cumulative residual value is plotted for each value of the independent variable. Because of the random nature of accident counts, the cumulative residual line represents a so-called random walk. For a model that fits well in all ranges of ADT, the cumulative residual plot should oscillate around zero.
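The CURE construction just described (sort by ADT, take residual = predicted - observed, accumulate) can be sketched in a few lines. This is a minimal sketch under stated assumptions: the function name and the three sample sites are hypothetical, and the two-standard-deviation envelope of the random walk is not computed here.

```python
from itertools import accumulate


def cure_points(sites):
    """Return (ADT, cumulative residual) pairs for a CURE plot.

    sites: iterable of (adt, predicted_accidents, observed_accidents).
    The residual follows the sign convention used in the text:
    predicted minus observed.
    """
    ordered = sorted(sites, key=lambda site: site[0])        # sort by ADT
    residuals = [pred - obs for _, pred, obs in ordered]     # predicted - observed
    return [(site[0], cum) for site, cum in zip(ordered, accumulate(residuals))]


# Hypothetical sites, listed out of ADT order on purpose.
sample_sites = [(3000, 1.5, 2), (1000, 0.5, 0), (2000, 1.0, 1)]
for adt, cumulative in cure_points(sample_sites):
    print(adt, cumulative)
```

With this sign convention a rising line marks an ADT range where the model predicts more accidents than were observed, and a falling line marks the opposite.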
If the cumulative residual value steadily increases within an ADT range, this means that within that ADT range the model predicts more accidents than have been observed. Conversely, a decreasing cumulative residual line in an ADT range indicates that in that range more accidents have been observed than are predicted by the model. A frequent departure of the cumulative residual line beyond two standard deviations of a random walk indicates the presence of outliers or signifies an ill-fitting model. In addition to the multinomial model, a modified form of ordinary least squares (OLS) regression with empirically calibrated variance was used for comparison.

[Figure 4.8 Scatter Plot Flat and Rolling Freeways: Rural Flat and Rolling, 4-Lane Interstate (1986-1999), Total Graph, Sections => 3.0 Miles]

This reality suggests an alternate approach to defining an error term, specifically, incremental and empirical calibration of variance. In this approach ADT is stratified and $\sigma$ is empirically determined for every range of ADT; it is then used to compute and plot lower and upper control limits in relation to the mean prediction. The magnitude of the error term developed using incremental empirical calibration will then be compared with that generated by the link function of the best-fitted models.

Rural Mountainous 4-Lane Interstate

[Figure 4.9 Fitted Model Mountainous Freeways]

4-Lane Mountainous Interstate, Poisson assumptions of error structure, polynomial function family produced the best fit with parameters presented below:

$E\{k\} = \alpha_y X^{\beta_0}(1 + \beta_1 X + \beta_2 X^2)$

$\alpha_{14} = 2.135$, $\beta_0 = 1.052$, $\beta_1 = 1.842$, $\beta_2 = -0.571$

$\frac{\text{Acc.}}{\text{mile-year}} = 2.135\left(\frac{ADT}{10000}\right)^{1.052}\left[1 + 1.842\left(\frac{ADT}{10000}\right) - 0.571\left(\frac{ADT}{10000}\right)^{2}\right]$

[Figure 4.10 Mountainous Freeways CURE Plot: RM4I CURE]

The cumulative residual (CURE) plot produced from maximizing the Poisson likelihood function presented in Figure 4.9 generally provides a satisfactory fit.
The random walk stays well within 2 standard deviations and oscillates around zero. Figure 4.10 shows, however, that the model has a tendency to over-predict in the range of ADT between 10,000 and 14,000. The modified OLS model provided in Figure 4.9 is generally within 0.5 to 1 accidents per mile-year of the multinomial Poisson model. The $R^2$ of the polynomial function of the modified OLS model is 0.7. Despite the conceptual soundness of the Negative Binomial model it produced very unsatisfactory fits.

Rural Mountainous 2-Lane Highways

[Figure 4.11 Fitted Model Mountainous Arterials]

2-Lane Mountainous Arterial, Poisson assumptions of error structure, polynomial function family produced the best fit with parameters presented below:

$E\{k\} = \alpha_y X^{\beta_0}(1 + \beta_1 X + \beta_2 X^2)$

$\alpha_{14} = 1.1716$, $\beta_0 = 0.524$, $\beta_1 = 6.377$, $\beta_2 = -2.657$

$\frac{\text{Acc.}}{\text{mile-year}} = 1.1716\left(\frac{ADT}{10000}\right)^{0.524}\left[1 + 6.377\left(\frac{ADT}{10000}\right) - 2.657\left(\frac{ADT}{10000}\right)^{2}\right]$

[Figure 4.12 Mountainous Arterials CURE Plot: RM2H CURE, horizontal axis ADT]

The cumulative residual (CURE) plot produced from maximizing the Poisson likelihood function presented in Figure 4.11 generally provides a satisfactory fit. The random walk stays well within 1 standard deviation and oscillates around zero in most cases between 0 and 10,000 ADT. The model has a tendency to over-predict in the range between 3,000 and 3,500 ADT. The modified OLS model provided in Figure 4.11 is generally within 0 to 0.5 accidents per mile-year of the multinomial Poisson model. The $R^2$ of the polynomial function of the modified OLS model is 0.71. Despite the conceptual soundness of the Negative Binomial model it produced very unsatisfactory fits.
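The polynomial function family used for the fitted models can be evaluated directly once the parameters are known. The following is a minimal sketch, assuming the functional form stated above with X = ADT/10000; the function name `spf` and the dictionary layout are introduced here for illustration, and only the 4-lane mountainous Interstate parameters quoted in the text are used. The sketch evaluates the printed formula only and is not checked against the tabulated control limits.

```python
def spf(adt, alpha, beta0, beta1, beta2=0.0):
    """Expected accidents per mile-year from the polynomial SPF family:
    alpha * X**beta0 * (1 + beta1*X + beta2*X**2), with X = ADT / 10000.
    """
    x = adt / 10000.0
    return alpha * x ** beta0 * (1.0 + beta1 * x + beta2 * x ** 2)


# Parameters quoted in the text for the rural mountainous 4-lane Interstate.
MOUNTAINOUS_4_LANE = dict(alpha=2.135, beta0=1.052, beta1=1.842, beta2=-0.571)

for adt in (5000, 10000, 20000):
    print(adt, round(spf(adt, **MOUNTAINOUS_4_LANE), 2))
```

Other facility types in the same family can be evaluated by passing their own parameters; a model without a quadratic term simply leaves `beta2` at zero.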
Rural Flat and Rolling 4-Lane Interstate

[Figure 4.13 Fitted Model Flat and Rolling Freeway]

4-Lane Flat and Rolling Interstate, Poisson assumptions of error structure, polynomial function family:

$E\{k\} = \alpha_y X^{\beta_0}(1 + \beta_1 X)$

$\alpha_{14} = 2.863$, $\beta_0 = 1.133$, $\beta_1 = -0.087$

$\frac{\text{Acc.}}{\text{mile-year}} = 2.863\left(\frac{ADT}{10000}\right)^{1.133}\left[1 - 0.087\left(\frac{ADT}{10000}\right)\right]$

[Figure 4.14 Flat and Rolling Freeways CURE Plot: RF-RR-41 CURE]

The cumulative residual (CURE) plot produced from maximizing the Poisson likelihood function presented in Figure 4.14 generally provides a satisfactory fit. The random walk stays well within 2 standard deviations and oscillates around zero in most cases between 5,000 and 60,000 ADT. The model has a tendency to under-predict, however, in the range between 35,000 and 45,000 ADT. The modified OLS model provided in Figure 4.13 generally matches the prediction produced by the multinomial Poisson model. The $R^2$ of the polynomial function of the modified OLS model is 0.73. Despite the conceptual soundness of the Negative Binomial model it produced very unsatisfactory fits.

Observed accident frequencies reflecting different levels of ADT align themselves reasonably well into a curvilinear shape. The selected polynomial functions for all models demonstrate some leveling-off, which reflects a reduction in accident rate with increase of ADT beyond certain thresholds. The variability in the dependent variable steadily increases with the increase in the mean, which is consistent with a Poisson error structure. Although the $R^2$ statistic ($R^2$ = 0.70 or better) for all calibrated models in the study is not entirely appropriate here because of some violations of the normality and homoscedasticity assumptions, it suggests that the relationship between ADT and the accident frequency per mile-year is sufficiently strong. Rather than using a transformation to stabilize variances, the variance was incrementally and empirically calibrated to reflect the increase in $\sigma$ associated with increase in ADT.
An SPF graph reflecting the characteristics of Rural Non-Mountainous Interstates with empirically and incrementally calibrated 1.5 standard deviation limits is provided in Figure 4.15. Although it is difficult to provide a definitive explanation at present as to why the variance increases with ADT, the field of econometrics suggests an interesting analogy which may shed some light on this issue. Gujarati (Gujarati 1995) observed that as incomes grow, people have more discretionary income and hence more scope for choice about the disposition of their income. Hence, $\sigma^2$ is likely to increase with income. Thus in the regression of savings on income one is likely to find variance increasing with income because people have more choices about their savings behavior. Similarly, companies with larger profits are generally expected to show greater variability in their dividend policies than companies with lower profits.

In the context of the driving environment the econometric phenomenon discussed above can be applied as follows: as traffic volume on the road increases, the opportunities for safe as well as unsafe driving behaviors also increase. For instance, the traffic can move in platoons at safe speeds and as a result large volumes of traffic can be moved without incident, or isolated drivers can behave aggressively, leading to a variety of accidents.

[Figure 4.15 Rural Flat and Rolling Freeways with Empirically Calibrated Variance: Rural Flat and Rolling, 4-Lane Interstate (1986-1999), Total Graph, Sections => 3.0 Miles, horizontal axis ADT]

Additionally, the data show that for all populations of roads the variance in accident frequency increases with ADT, and this fact needs to be considered in the selection of locations with potential for accident reduction. Predicted mean values generated by the model generally align themselves reasonably well into a polynomial equation for all populations of roads.
The error structure closely approximates the Poisson assumption, where the variance is equal to the mean. The models produced from maximizing the Poisson likelihood function generally provide a satisfactory fit. The random walk on CURE plots stays well within 2 standard deviations and oscillates around zero in most ranges of ADT. The mean predictions generated by the modified OLS models are in most cases within 0.5 accidents per mile-year of the predictions produced by the multinomial Poisson model. The $R^2$ of the modified OLS model is between 0.70 and 0.73. Despite the conceptual soundness of the Negative Binomial model it produced unsatisfactory fits. This can possibly be explained by the following. Most previous efforts of accident modeling used very short and very long segments, which introduced over-dispersion into the model consistent with Negative-Binomial assumptions. In anticipation of this methodological difficulty we have assembled data-sets which contain segments of relatively uniform lengths. They are generally between 2 and 5 miles long. This approach also ensures that segments representing project lengths, which are usually also 2 to 5 miles long, are represented in the population of segments from which the model is constructed. The resulting variance is then much closer to the mean than is implied by the Negative-Binomial assumptions. The data-set assembled in such a way closely approximates a Poisson distributed error structure.

Figure 4.16 shows the 1.5 sigma limits empirically and incrementally calibrated and the 1.5 sigma limits derived from the link function associated with the Poisson distribution. It confirms that the data are Poisson distributed, which is consistent with the fact that the maximum likelihood function with Poisson error structure produces the best fit.
Considering that the Poisson error structure implies that each mean value in the regression $\lambda_i$ is equal to the variance $V_i$, then

$P(x \le \lambda_i + 1.5\sigma_i) = \sum_{x=0}^{\lambda_i + 1.5\sigma_i} \frac{e^{-\lambda_i}\lambda_i^{x}}{x!} \approx 0.95$

$P(x \le \lambda_i - 1.5\sigma_i) = \sum_{x=0}^{\lambda_i - 1.5\sigma_i} \frac{e^{-\lambda_i}\lambda_i^{x}}{x!} \approx 0.95$

[Figure 4.16 Comparison Between Poisson and Empirically and Incrementally Calibrated Error Structure: Rural Flat and Rolling, 4-Lane Interstate (1986-1999), Total Graph, Sections => 3.0 Miles, horizontal axis ADT]

Tables 4.5, 4.6 and 4.7 present the upper control limit at 95% for the multinomial models with Poisson error structures. If a roadway segment consistently exhibits accident frequency above the upper control limit, it suggests a strong potential for accident reduction. How to conduct a diagnostic investigation of such a site is discussed in chapters 6 and 7 of this dissertation. If a roadway segment consistently exhibits accident frequency below the lower control limit, it probably contains design characteristics which would give us insight into how to design safer highways.
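Under the Poisson assumption that the variance equals the mean, the control limits reduce to the mean plus or minus 1.5 times its square root, truncated at zero. The sketch below is an illustration only: the function name is hypothetical and the truncation at zero is inferred from the 0.00 entries in the tables. It agrees with the rows of Table 4.5 to within about 0.01, the difference coming from rounding of the printed means.

```python
import math


def control_limits(mean, k=1.5):
    """Return (lower, upper) limits for a Poisson mean, where sigma = sqrt(mean).

    The lower limit is truncated at zero because an accident frequency
    cannot be negative.
    """
    sigma = math.sqrt(mean)
    lower = max(0.0, mean - k * sigma)
    upper = mean + k * sigma
    return lower, upper


# Means taken from Table 4.5 (rural flat and rolling 4-lane freeways).
for adt, mean in ((5000, 1.23), (10000, 2.56), (50000, 10.10)):
    lower, upper = control_limits(mean)
    print(adt, round(lower, 2), round(upper, 2))
```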
Table 4.5 Control Limits Rural Flat and Rolling 4-Lane Freeways

Rural Flat and Rolling 4-Lane Freeway
ADT              Mean                    Lower                   Upper
(Vehicles/Day)   (Accidents/Mile/Year)   (Accidents/Mile/Year)   (Accidents/Mile/Year)
5,000            1.23                    0.00                    2.89
10,000           2.56                    0.16                    4.95
15,000           3.84                    0.90                    6.78
20,000           5.07                    1.69                    8.45
25,000           6.21                    2.47                    9.95
30,000           7.26                    3.22                    11.30
35,000           8.19                    3.90                    12.48
40,000           8.99                    4.49                    13.48
45,000           9.63                    4.97                    14.28
50,000           10.10                   5.33                    14.87
55,000           10.38                   5.55                    15.22

Table 4.6 Control Limits Rural Mountainous 2-Lane Arterials

Rural Mountainous 2-Lane Arterials
ADT              Mean                    Lower                   Upper
(Vehicles/Day)   (Accidents/Mile/Year)   (Accidents/Mile/Year)   (Accidents/Mile/Year)
500              0.31                    0.00                    1.15
1,000            0.54                    0.00                    1.64
1,500            0.77                    0.00                    2.09
2,000            1.02                    0.00                    2.54
2,500            1.28                    0.00                    2.98
3,000            1.55                    0.00                    3.42
3,500            1.83                    0.00                    3.86
4,000            2.12                    0.00                    4.30
4,500            2.41                    0.08                    4.74
5,000            2.71                    0.24                    5.18
5,500            3.01                    0.41                    5.61
6,000            3.32                    0.58                    6.05
6,500            3.62                    0.77                    6.48
7,000            3.93                    0.96                    6.91
7,500            4.24                    1.15                    7.33
8,000            4.55                    1.35                    7.75
8,500            4.86                    1.55                    8.16
9,000            5.16                    1.75                    8.56
9,500            5.45                    1.95                    8.96
10,000           5.75                    2.15                    9.34
10,500           6.03                    2.35                    9.71
11,000           6.31                    2.54                    10.07
11,500           6.57                    2.73                    10.42
12,000           6.83                    2.91                    10.75

Table 4.7 Control Limits Rural Mountainous 4-Lane Freeways

Rural Mountainous 4-Lane Freeway
ADT              Mean                    Lower                   Upper
(Vehicles/Day)   (Accidents/Mile/Year)   (Accidents/Mile/Year)   (Accidents/Mile/Year)
4,000            0.87                    0.00                    2.26
5,000            1.24                    0.00                    2.91
6,000            1.67                    0.00                    3.60
7,000            2.14                    0.00                    4.33
8,000            2.65                    0.21                    5.10
9,000            3.20                    0.52                    5.88
10,000           3.78                    0.86                    6.69
11,000           4.37                    1.23                    7.51
12,000           4.98                    1.63                    8.32
13,000           5.59                    2.05                    9.14
14,000           6.21                    2.47                    9.94
15,000           6.81                    2.90                    10.73
16,000           7.41                    3.32                    11.49
17,000           7.98                    3.74                    12.22
18,000           8.52                    4.14                    12.90
19,000           9.03                    4.53                    13.54
20,000           9.50                    4.88                    14.13
21,000           9.93                    5.20                    14.65
22,000           10.29                   5.48                    15.11
23,000           10.60                   5.72                    15.48
24,000           10.84                   5.90                    15.78
25,000           11.00                   6.03                    15.98
26,000           11.08                   6.09                    16.08
27,000           11.08                   6.08                    16.07
28,000           10.97                   6.00                    15.94
29,000           10.77                   5.85                    15.69
30,000           10.46                   5.61                    15.31

5.
Behavioral Interpretation and Levels of Relative Safety

5.1 Behavioral Zones

Safety Performance Functions reflect a central tendency in driver behavior across a wide range of exposure. Driver behavior is characterized by the expected frequency of accidents over a period of a year at each level of ADT. The shape of the SPF (Figure 5.1) common to most highways suggests 3 zones of driver behavior:

Initial Adaptation Zone
Stabilized Zone
Heightened Driver Attention Zone

Initial Adaptation Zone (Zone 1) is characterized by high driver confidence, a subjective sense of security and high operating speeds. The propensity to make a driving error is high here; however, the dynamics of interaction between the roadway environment and driver behavior gradually lead to safety improvement. This phenomenon is reflected by the decrease in the number of accidents per unit of traffic exposure with increase in ADT. Zone 1 generally represents a small portion of the Interstate system.

[Figure 5.1 Driver Behavioral Zones and Levels of Relative Safety: vertical axis Accidents Per Mile Per Year, horizontal axis Exposure (ADT)]

Stabilized Zone (Zone 2) is characterized by a more predictable driving environment where driver confidence is balanced at the appropriate level with the roadway characteristics of the facility. Operating speeds are somewhat lower than in Zone 1. In Zone 2 the number of accidents per unit of traffic exposure remains relatively constant. The Stabilized Zone represents the largest portion of the highway system.

Heightened Driver Attention Zone (Zone 3) is characterized by an increased focus on the driving task. The subjective sense of security and driver confidence are diminished here and, as a result, safety is improved. Operating speeds are lower than in Zone 2. The number of accidents per unit of traffic exposure decreases with increase in ADT.
Zone 3 represents a smaller portion of the highway system than Zone 2. These changes in driver behavior can possibly be explained by continuous interaction with and adaptation to different environments. This approach relates predictable patterns in driver behavior induced by changes in the roadway environment to the expected accident count per unit of roadway length. It may seem counterintuitive, but it appears that the driver's sense of security and driving comfort is often in conflict with the driver's safety. When assessing the relative safety of a roadway segment it is critical to determine which zone of the appropriate SPF it fits into.

5.2 Levels of Service of Safety and Additional Benefits of SPF

Development of the SPFs lends itself well to the conceptual formulation of the Level of Service of Safety. The concept of level of service uses qualitative measures that characterize the safety performance of a roadway segment in reference to its expected performance. If the level of safety predicted by the SPF represents the normal or expected number of accidents at a specific level of ADT, then the degree of deviation from the norm can be stratified to represent specific levels of safety. Figure 5.1 provides a pictorial rendering of the concept using the SPF for the 4-lane mountainous Interstate as an example, with the boundary lines delineated 1.5 standard deviations from the mean. Three Levels of Relative Safety (LORS) can be initially proposed:

LORS-Ex (Expected Level of Safety)
LORS-BtEx (Better Than Expected)
LORS-LetEx (Less Than Expected)

Gradual change in the degree of deviation of the LORS boundary line from the fitted model mean reflects the observed increase of variability in accidents/mile as ADT increases. This increase is consistent with a Poisson error structure. The boundary lines should be established by computing the 95% upper and lower control limits derived from the Poisson distribution.
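The stratification into three levels can be sketched as a simple comparison of an observed accident frequency against the boundary lines. The sketch below assumes boundaries at 1.5 standard deviations from the SPF mean with sigma equal to the square root of the mean, as in the Figure 5.1 example; the function name and the treatment of values exactly on a boundary (assigned to LORS-Ex) are choices made here for illustration.

```python
import math


def level_of_relative_safety(observed, expected, k=1.5):
    """Classify an observed accidents/mile/year value against the SPF mean.

    expected: SPF mean at the segment's ADT.
    Boundaries sit k Poisson standard deviations from the mean, with the
    lower boundary truncated at zero.
    """
    sigma = math.sqrt(expected)
    lower = max(0.0, expected - k * sigma)
    upper = expected + k * sigma
    if observed > upper:
        return "LORS-LetEx"   # less than expected safety
    if observed < lower:
        return "LORS-BtEx"    # better than expected safety
    return "LORS-Ex"          # expected level of safety


# SPF mean of 2.56 accidents/mile/year (flat and rolling freeway at 10,000 ADT).
for observed in (0.1, 2.0, 5.0):
    print(observed, level_of_relative_safety(observed, 2.56))
```

Where the lower boundary is truncated at zero, no observation can fall below it, so LORS-BtEx cannot be assigned at low ADT.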
Based on this line of reasoning Tables 5.1, 5.2 and 5.3 represent numerical values of the accident frequencies corresponding to different levels of relative safety. These tables are calibrated for the three types of facilities examined in this study.

Table 5.1 Level of Service Criteria in Reference to Expected Safety Performance for Rural Flat and Rolling 4-Lane Freeways

Rural Flat and Rolling 4-Lane Freeway
ADT              BtEx                    Ex                      LetEx
(Vehicles/Day)   (Accidents/Mile/Year)   (Accidents/Mile/Year)   (Accidents/Mile/Year)
5,000            0.00                    0.00 - 2.89             > 2.89
10,000           < 0.16                  0.16 - 4.95             > 4.95
15,000           < 0.90                  0.90 - 6.78             > 6.78
20,000           < 1.69                  1.69 - 8.45             > 8.45
25,000           < 2.47                  2.47 - 9.95             > 9.95
30,000           < 3.22                  3.22 - 11.30            > 11.30
35,000           < 3.90                  3.90 - 12.48            > 12.48
40,000           < 4.49                  4.49 - 13.48            > 13.48
45,000           < 4.97                  4.97 - 14.28            > 14.28
50,000           < 5.33                  5.33 - 14.87            > 14.87
55,000           < 5.55                  5.55 - 15.22            > 15.22

Table 5.2 Level of Service Criteria in Reference to Expected Safety Performance for Rural Mountainous 2-Lane Arterials

Rural Mountainous 2-Lane Arterials
ADT              BtEx                    Ex                      LetEx
(Vehicles/Day)   (Accidents/Mile/Year)   (Accidents/Mile/Year)   (Accidents/Mile/Year)
500              0.00                    0.00 - 1.15             > 1.15
1,000            0.00                    0.00 - 1.64             > 1.64
1,500            0.00                    0.00 - 2.09             > 2.09
2,000            0.00                    0.00 - 2.54             > 2.54
2,500            0.00                    0.00 - 2.98             > 2.98
3,000            0.00                    0.00 - 3.42             > 3.42
3,500            0.00                    0.00 - 3.86             > 3.86
4,000            0.00                    0.00 - 4.30             > 4.30
4,500            < 0.08                  0.08 - 4.74             > 4.74
5,000            < 0.24                  0.24 - 5.18             > 5.18
5,500            < 0.41                  0.41 - 5.61             > 5.61
6,000            < 0.58                  0.58 - 6.05             > 6.05
6,500            < 0.77                  0.77 - 6.48             > 6.48
7,000            < 0.96                  0.96 - 6.91             > 6.91
7,500            < 1.15                  1.15 - 7.33             > 7.33
8,000            < 1.35                  1.35 - 7.75             > 7.75
8,500            < 1.55                  1.55 - 8.16             > 8.16
9,000            < 1.75                  1.75 - 8.56             > 8.56
9,500            < 1.95                  1.95 - 8.96             > 8.96
10,000           < 2.15                  2.15 - 9.34             > 9.34
10,500           < 2.35                  2.35 - 9.71             > 9.71
11,000           < 2.54                  2.54 - 10.07            > 10.07
11,500           < 2.73                  2.73 - 10.42            > 10.42
12,000           < 2.91                  2.91 - 10.75            > 10.75

Table 5.3 Level of Service Criteria in Reference to Expected Safety Performance for Rural Mountainous 4-Lane Freeways

Rural Mountainous 4-Lane Freeway
ADT              BtEx                    Ex                      LetEx
(Vehicles/Day)   (Accidents/Mile/Year)   (Accidents/Mile/Year)   (Accidents/Mile/Year)
4,000            0.00                    0.00 - 2.26             > 2.26
5,000            0.00                    0.00 - 2.91             > 2.91
6,000            0.00                    0.00 - 3.60             > 3.60
7,000            0.00                    0.00 - 4.33             > 4.33
8,000            < 0.21                  0.21 - 5.10             > 5.10
9,000            < 0.52                  0.52 - 5.88             > 5.88
10,000           < 0.86                  0.86 - 6.69             > 6.69
11,000           < 1.23                  1.23 - 7.51             > 7.51
12,000           < 1.63                  1.63 - 8.32             > 8.32
13,000           < 2.05                  2.05 - 9.14             > 9.14
14,000           < 2.47                  2.47 - 9.94             > 9.94
15,000           < 2.90                  2.90 - 10.73            > 10.73
16,000           < 3.32                  3.32 - 11.49            > 11.49
17,000           < 3.74                  3.74 - 12.22            > 12.22
18,000           < 4.14                  4.14 - 12.90            > 12.90
19,000           < 4.53                  4.53 - 13.54            > 13.54
20,000           < 4.88                  4.88 - 14.13            > 14.13
21,000           < 5.20                  5.20 - 14.65            > 14.65
22,000           < 5.48                  5.48 - 15.11            > 15.11
23,000           < 5.72                  5.72 - 15.48            > 15.48
24,000           < 5.90                  5.90 - 15.78            > 15.78
25,000           < 6.03                  6.03 - 15.98            > 15.98
26,000           < 6.09                  6.09 - 16.08            > 16.08
27,000           < 6.08                  6.08 - 16.07            > 16.07
28,000           < 6.00                  6.00 - 15.94            > 15.94
29,000           < 5.85                  5.85 - 15.69            > 15.69
30,000           < 5.61                  5.61 - 15.31            > 15.31

[Figure: 4 & 6-Lane Mountainous Highways, Total Graphs]

Another byproduct of developing SPFs is comparing safety of four and six
lane freeways at the same level of traffic exposure. Preliminary findings reflected in Figure 5.2 indicate that providing additional capacity in the mountainous freeway environment also results in additional safety. As ADT increases from 16,000 to 30,000 vehicles, an increase in safety can be observed in addition to reduced delay. Use of the Safety Performance Functions to compare the safety of facilities with different geometric characteristics carrying the same amount of traffic has significant potential in alternatives evaluation and transportation planning. The Transportation Equity Act for the 21st Century (TEA-21) of 1998 requires explicit consideration of safety in the transportation planning process. While this government mandate is well intentioned, little is known about how to accomplish it. It is difficult to anticipate the safety of a highway that has not yet been built. In the planning process for system-level safety and capacity improvements, the use of well-calibrated SPFs will provide realistic estimates of the expected safety performance of the network.

Art is the imposing of a pattern on experience, and our aesthetic enjoyment in recognition of the pattern.
Dialogues of Alfred North Whitehead [1953], Chapter 29
Alfred North Whitehead 1861-1947

6. Direct Diagnostics and Pattern Recognition Analysis

6.1 Direct Diagnostics

Over-representation in the number of accidents above the expected or normal threshold predicted by the Safety Performance Function is only one of many indicators of a potential for accident reduction. (And it appears that it may not be the best one.) Accident type, severity, road condition, spatial distribution of accidents and lighting conditions are only a few of the many important symptoms of the accident problem. Furthermore, in many cases factors other than over-representation in frequency are better predictors of susceptibility to corrective counter-measures.
It is difficult to determine a specific form for the distribution of accidents; therefore the problem lends itself well to a non-parametric approach, which does not require assumptions about the shape of the underlying distribution. Accident occurrence as a process can be thought of as a sequence of Bernoulli trials where the following holds true:

There are only two outcomes at each trial or observation: an accident of a specific type has or has not occurred.
The probability of success is the same for each trial: the probability of occurrence of a specific accident-related event, overturning for instance, is the same every time an accident occurs.
The trials are independent: each accident is completely independent from the previous or the following one.
There are a finite number of trials.

The following terminology can be adopted to provide an analytical framework for pattern recognition through direct diagnostics in accident occurrence.

SFi: Denotes a specific Safety Performance Function representing a roadway segment or an intersection.
Xai [Xa1, Xa2, ..., Xan]: Represents a feature vector comprised of the accident listing of the roadway segment, directionally arranged in relation to the roadway reference system, or reflecting an accident listing at an intersection.
P(SFi): The probability that we are presented with a Safety Performance Function i.
P(Nai/SFi): The probability that Nai accidents of a specific type would be observed given a Safety Performance Function SFi.
Pi: The probability of observing a specific accident type during each accident event.
P(SFi/Nai): The a posteriori probability that we are presented with a Safety Performance Function SFi given a feature vector Xai containing Nai accidents of a specific type.

Assume that feature vector Xai represents a sample of accident history drawn from a roadway facility represented by a safety performance function SFi.
The probability that exactly Nai accidents of a specific type will be observed out of a total of Nti accidents is given by the binomial distribution:

$$X_{ai} \in SF_i: \qquad P(N_{ai}, N_{ti}, P_i) = \binom{N_{ti}}{N_{ai}} P_i^{N_{ai}} (1 - P_i)^{N_{ti} - N_{ai}} \qquad (6.1.1)$$

where Nai = 0, 1, 2, ..., n accidents, and

$$\binom{N_{ti}}{N_{ai}} = \frac{N_{ti}!}{(N_{ti} - N_{ai})!\, N_{ai}!}$$

The probability that Nai or fewer accidents will be observed out of Nti Bernoulli trials can be computed as follows:

$$P(X \le N_{ai}, N_{ti}; P_i) = \sum_{i=0}^{N_{ai}} \frac{N_{ti}!}{(N_{ti} - i)!\, i!} P_i^{\,i} (1 - P_i)^{N_{ti} - i} \qquad (6.1.2)$$

The probability that Nai or more accidents will be observed is expressed as:

$$P(X \ge N_{ai}, N_{ti}; P_i) = 1 - P[X \le (N_{ai} - 1)] = 1 - \sum_{i=0}^{N_{ai} - 1} \frac{N_{ti}!}{(N_{ti} - i)!\, i!} P_i^{\,i} (1 - P_i)^{N_{ti} - i} \qquad (6.1.3)$$

If P(X ≥ Nai, Nti; Pi) < Pcr, where Pcr is some established threshold for making a classification decision, then the feature vector Xai = [Xa1, Xa2, ..., Xan] is classified as not belonging to the specific Safety Performance Function SFi. In terms of accident analysis, it means that the roadway segment or junction which generated Xai contains an element which triggers deviation from a random statistical process in the direction of reduced safety.

6.1.1 Example of Application of Direct Diagnostics Methodology

To illustrate the application of the concept of direct diagnostics, let us examine a case history of diagnosing and addressing a safety problem at an urban signalized intersection. A total of 246 accidents were reported in the five-year period, and 97 of them were approach turn accidents. The approach turn accident is the most frequent accident type at this location.

[Figure 6.1 Accident Distribution by Type. Accident Types: Hampden Ave. at Monaco Pkwy. Approach Turn 39% (97); Rear End 38% (91); Broadside 9% (21); SS Same 7% (18); SS Opposite 2% (5); All Other Types (2 or less each) 5% (12).]

An accident diagram (Figure 6.1) presents additional information on the direction of travel, accident severity and time of accident occurrence.
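Before working through the example, Equations 6.1.1 through 6.1.3 and the classification rule can be sketched in Python. This is a minimal illustration under stated assumptions, not the software used in this study: the function names and the threshold value `P_CR = 0.05` are hypothetical, since the text only calls for "some established threshold". The upper tail is summed directly rather than as one minus the lower tail; the two are algebraically identical to Equation 6.1.3, but the direct sum avoids round-off when the tail probability is very small.

```python
from math import comb

P_CR = 0.05  # hypothetical classification threshold; the text does not fix a value


def binomial_pmf(n_a: int, n_t: int, p: float) -> float:
    """Eq. 6.1.1: probability of exactly n_a accidents of a type out of n_t."""
    return comb(n_t, n_a) * p**n_a * (1.0 - p) ** (n_t - n_a)


def prob_at_most(n_a: int, n_t: int, p: float) -> float:
    """Eq. 6.1.2: probability of n_a or fewer accidents of a type out of n_t."""
    return sum(binomial_pmf(i, n_t, p) for i in range(0, n_a + 1))


def prob_at_least(n_a: int, n_t: int, p: float) -> float:
    """Eq. 6.1.3: probability of n_a or more accidents of a type out of n_t.

    Summing the upper tail equals 1 - P(X <= n_a - 1) and is numerically
    safer for very small probabilities.
    """
    return sum(binomial_pmf(i, n_t, p) for i in range(max(n_a, 0), n_t + 1))


def deviates_from_norm(n_a: int, n_t: int, p: float, p_cr: float = P_CR) -> bool:
    """Classification rule: True when the upper-tail probability is below p_cr."""
    return prob_at_least(n_a, n_t, p) < p_cr


# Intersection of Section 6.1.1: 97 approach turn accidents out of 246,
# tested against the 17% norm quoted for urban signalized intersections.
p_intersection = prob_at_least(97, 246, 0.17)
print(f"P(X >= 97 | 246, 0.17) = {p_intersection:.3e}")
print("Deviates from norm:", deviates_from_norm(97, 246, 0.17))
```

For this intersection the computed tail probability is many orders of magnitude below any conventional threshold, so the choice of `P_CR` does not affect the classification.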
The critical question in the accident analysis of this intersection can be formulated as follows: Is it normal to experience 97 approach turn accidents out of 246 total, or is there something present at the site which triggers an increased frequency of these accidents? Direct diagnostics analysis can help us answer this question. Based on 8 years of records, approach turn accidents represent 17% of the total at urban signalized intersections; therefore Pi = 0.17. Let us now compute the probability of observing 97 or more approach turn accidents if 246 accidents have occurred.

$$P(X \ge 97) = 1 - P(X \le 96) = 1 - \sum_{i=0}^{96} \frac{246!}{(246 - i)!\, i!}\, 0.17^{\,i} (1 - 0.17)^{246 - i}$$

The probability of observing 97 or more approach turn accidents out of 246 total accidents at a normal urban signalized intersection approaches 0, which suggests that there is a significant potential for accident reduction. In other words, there is something in the environment of this intersection which triggers deviation from the random statistical process in the direction of reduced safety. Field investigation revealed that double left turns at each approach could be made during the permitted-protected turn phase. A permitted left turn on green with a double left turn lane assignment is generally associated with limited sight distance and consequently a high number of approach turn accidents. This sort of safety problem at a signalized intersection can be effectively addressed by introducing protected-only left turn phasing.

6.2 Pattern Recognition

There is an important distinction between direct diagnostics and pattern recognition in studying the safety of road segments. If selected accident characteristics of a roadway segment are compared to some established norm, more often than not the deviation from the norm will not be statistically significant. This can be explained by the dilution of the intensity of a particular component of the accident characteristics profile over the length of the segment.
The longer the segment, the less likely it is to display a significant deviation from the norm for any of the accident characteristics. This dilution often leads to overlooking significant safety problems that are susceptible to correction. For instance, let us examine a hypothetical roadway segment which is 1 mile long with the following accident history over a 3-year period:

- 10 accidents total
- 7 overturning
- 2 rear-end
- 1 fixed object

Let us also assume that overturning accidents on average represent 20% of the total for this functional class. Considering that each accident can be viewed as an independent Bernoulli trial with a 20% probability of overturning, we can compute the probability of having 7 or more overturning accidents out of 10 as follows:

$$P(X \ge 7, 10; 0.20) = 1 - P(X \le 6, 10; 0.20) = 1 - \sum_{i=0}^{6} \frac{10!}{(10 - i)!\, i!}\, 0.20^{\,i} (1 - 0.20)^{10 - i} = 0.09\%$$

As can be seen from the above calculation, the probability that 7 or more accidents out of 10 will result in overturning as part of a normal statistical process is extremely low: 0.09%. Such a low probability suggests that something in the roadway environment triggers overturning accidents. This element needs to be identified and corrected. Let us now consider a different situation where that same segment is located within the project limits of a 5-mile long roadway improvement project (Figure 6.3).

Figure 6.3 Accidents by Segment (construction project limits)

- Mile 1: 2 overturns, 8 other
- Mile 2: 2 overturns, 8 other
- Mile 3: 7 overturns, 3 other
- Mile 4: 3 overturns, 7 other
- Mile 5: 1 overturn, 9 other
- Project total: 50 accidents, 14 overturns, 36 other

The accident history over a 3-year period is as follows:

- 50 accidents total
- 14 overturning
- 10 rear-end
- 25 fixed object

Let us compute the probability of 14 or more overturning accidents out of 50 total.
$$P(X \ge 14, 50; 0.20) = 1 - P(X \le 13, 50; 0.20) = 1 - \sum_{i=0}^{13} \frac{50!}{(50 - i)!\, i!}\, 0.20^{\,i} (1 - 0.20)^{50 - i} = 0.12$$

This is certainly a strong possibility. If the 5-mile road improvement project were examined for overturning frequency using only the direct diagnostics method, we would conclude that no overturning problem is present within the project limits, yet we know that at least one mile out of five has a serious overturning problem. The question then becomes: how can we identify a hidden safety problem within the project limits? In other words, how can we systematically recognize a pattern of accidents within a roadway segment? The problem is that of detecting deviation outside the boundaries of the random Bernoulli process in the direction of reduced safety. This deviation is frequently confined to a very limited area and needs to be recognized (ferreted out) or classified as such through some form of propagation of continuous statistical testing.

In order to make an appropriate classification decision, some amount of a priori knowledge is required about the expected system performance. This knowledge was derived from an extensive data set describing various characteristics of the accident distribution profile endemic to specific classes of roads. The data set was compiled for six classes of roads over a period of 8 years and contains 84 different parameters related to accident occurrence, such as accident type, severity and roadway conditions. It represents the a priori knowledge base required for computing a posteriori probabilities.

To further illustrate the need for pattern recognition analysis, let us examine a case history involving a 2-lane road in a mountainous area. Over five years of accident history, the 7-mile road segment experienced 142 accidents. Safety Performance Function analysis reveals that accident frequency is well within the expected range for this type of facility.
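Returning briefly to the hypothetical five-mile project of Figure 6.3, the dilution effect can be reproduced numerically. The following is a minimal Python sketch under stated assumptions, not the pattern recognition algorithm of this dissertation: the fixed one-mile test window, the function name `prob_at_least` and the threshold `P_CR = 0.05` are hypothetical choices made for illustration only. The per-mile counts are read from Figure 6.3, and the project totals (14 of 50) are taken as stated in the text.

```python
from math import comb

P_OVERTURN = 0.20  # norm assumed in the text for this functional class
P_CR = 0.05        # hypothetical classification threshold


def prob_at_least(k: int, n: int, p: float) -> float:
    """Upper-tail binomial probability P(X >= k) for n trials (Eq. 6.1.3)."""
    return sum(comb(n, i) * p**i * (1.0 - p) ** (n - i)
               for i in range(max(k, 0), n + 1))


# (overturning, other) accident counts for miles 1 through 5, from Figure 6.3
miles = [(2, 8), (2, 8), (7, 3), (3, 7), (1, 9)]

# Project-wide test, using the totals stated in the text
project_tail = prob_at_least(14, 50, P_OVERTURN)

# Mile-by-mile test: each mile is treated as its own set of Bernoulli trials
per_mile_tail = [prob_at_least(overturn, overturn + other, P_OVERTURN)
                 for overturn, other in miles]
flagged = [mile for mile, tail in enumerate(per_mile_tail, start=1)
           if tail < P_CR]

print(f"Project-wide P(X >= 14 | 50): {project_tail:.3f}")
for mile, tail in enumerate(per_mile_tail, start=1):
    print(f"Mile {mile}: P = {tail:.4f}")
print("Miles flagged:", flagged)
```

Under these assumptions only mile 3 falls below the threshold, while the project-wide test does not, which is the hidden problem that a segment-wide comparison overlooks.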
The SPF graph reflecting six years of accident history (averaged over a three-year period) for the roadway segments in the study is presented in Figure 6.4. Although accident frequency is well within the expected range, examination of the accident listing revealed unusual concentrations of nighttime accidents. Figure 15 shows a cumulative graph of nighttime accidents within the study limits. Let us test whether or not the overall number of nighttime accidents is overrepresented using the direct diagnostic approach, considering that 45 out of 142 accidents occurred under