IDENTIFICATION OF CRASH CONTRIBUTING FACTORS: EFFECTS OF SPATIAL AUTOCORRELATION AND SAMPLE DATA SIZE

U. R. R. Manepalli
Civil, Architectural and Environmental Engineering
Missouri University of Science and Technology

*G. H. Bham, Ph.D.
Civil Engineering Department
University of Alaska, Anchorage

*Corresponding Author
E-mail: [email protected], [email protected]

Total words in abstract = 255
Total words in paper = 5143 + 255 + 1*250 (Figures) + 6*250 (Tables) = 7148

TRB 2013 Annual Meeting Paper revised from original submittal.

ABSTRACT

This paper uses sample sets of crash data to examine the similarities in crash contributing factors among various counties that have similar effects on spatial autocorrelation (SA). Moran’s I and Getis-Ord Gi* statistics were used to determine the correlation, and the multinomial logistic regression to identify the crash contributing factors. Seventy-five counties in the state of Arkansas were divided into five categories based on the Z-values of the Getis-Ord Gi* statistic. Depending on the sample size, crash data from a county or a group of counties from each of these categories were used, and factors contributing to crashes in each of the categories were identified based on the crash severity index. Results indicated that most of the crash contributing factors identified for each category were also identified by the crash data from a county or a group of counties in that category.
Pulaski County, which had the highest Z-value in the first category, indicated the largest cluster of crashes, and its sample crash data identified the highest percentage (55%) of the factors that contributed to crashes in that category. From the sample data used, the multinomial logistic regression indicated the following factors to be positively associated with crash severity: nighttime driving, driving under the influence of alcohol, roadway gradient, curved alignment, rural areas, and head-on and sideswipe-same-direction collision types. The results of this research can be used by departments of transportation for better allocation of funds, by identifying crash contributing factors that are associated with higher levels of crash severity through the analysis of smaller sets of data.

Keywords
Spatial autocorrelation, multinomial logistic regression, Moran’s I, Getis-Ord Gi* statistic, Geographic Information Systems (GIS)

INTRODUCTION

In 2010, the United Nations General Assembly adopted a resolution that proclaimed the current decade (2011-2020) the ‘Decade of Action for Road Safety’ (1). By highlighting the need for increased activities at the national, regional and global levels, the action aims to rethink approaches to traffic safety. Traffic fatalities and injuries constitute a major public health concern worldwide, with nearly 1.3 million fatalities and between 20 million and 50 million injuries resulting from road crashes (1). Highway crashes also cost society an estimated $230 billion or more a year in the United States (2). Thus, there is a vital need to identify crash contributing factors at sites with fatal and severe injury crashes.

In recent years, the techniques for crash prediction and for the estimation of crash contributing factors have improved.
With improved methods, the locations of crash concentrations and the main crash causal factors can be identified. Further research to identify these crash contributing factors with minimal resources is essential because budgets for departments of transportation (DOTs) are declining. The Missouri DOT will cut its expenditures by $512 million over the next five years. The future annual budget for construction and maintenance is also expected to be reduced by half, from $1.2 billion to $600 million (3). Therefore, it is imperative to examine techniques that save personnel time and resources and that can identify major crash contributing factors with limited data, which in turn will save lives and reduce crash costs to society.

Past studies have used crash data to identify high spatial concentrations of crashes (4-8). Kim et al. (9) identified spatial and temporal patterns among crashes. McMahon (10) used buffering, cluster analysis, and spatial queries to analyze pedestrian crash risk in Geographic Information Systems (GIS). Peled et al. (11) generated maps of the distribution of crash concentrations. Premo (12) used global spatial statistics to determine the presence of spatial autocorrelation (SA) in archaeological data. Local spatial statistics were then used to identify local trends, and previously undetected archaeological trends were discovered. In traffic safety, when researchers considered the effects of SA, they found a significant impact on the estimation of crash risk factors (13-15). Huang et al. (16) and Siddiqui et al. (17) developed Bayesian models to relate various county-level socioeconomic factors and traffic data to crash occurrence while accounting for possible SA among adjacent counties. Li et al. (18) used a Bayesian approach to identify and rank road segments for hotspot identification.
They generated three-dimensional maps to express the crash risk for various road segments and recommended determining the contributing factors for high-risk road segments. In summary, extensive research has been conducted on identifying spatial clusters. However, determining whether any relation exists among these clusters requires further research.

Identification of crash contributing factors is a vital area of research; however, very few researchers have examined the validity of using a sample of the data (e.g., data at the level of a county) to identify factors that contribute to crashes across a state. Research in this area can play a significant role in identifying crash clusters (10). The first law of geography states that “everything is related to everything else, but near things are more related than distant things” (19). Therefore, it can be hypothesized that, spatially, crashes have some similarities that may include common crash contributing factors. SA is used in this paper to identify these similarities between crashes (events). Further, to reduce fatal and severe injury crashes, multinomial logistic regression (MLR) is used to determine the relative risk among crashes, given that a crash has occurred. These objectives are the focus of this paper. The details of SA and MLR are described in the subsequent sections. To identify sample data for analyses, a county or a set of counties with the highest crash severity index (CSI) was chosen. Details of the CSI are presented under ‘Methodology’.

Further, Guerts et al. (20) analyzed crash sites in Belgium to identify hotspots. A sensitivity analysis indicated that 190 sites (23.8%) showed a higher risk of crashes when crash severity was considered. Hauer et al. (21) indicated that identification of sites with high crash severity led to more cost-effective projects.
The literature reviewed indicates that considering crash severity, rather than crash frequency alone, leads to more cost-effective projects. Crash severity is incorporated in this paper through the crash severity index.

This paper next presents the data set analyzed, a description of the methodology, spatial autocorrelation (SA), the various indices used in this paper, and the multinomial logistic regression (MLR) technique. This is followed by the analysis and application of results. The paper ends with discussion, conclusions and recommendations for future research.

CRASH DATA ANALYZED

Arkansas crash data from eight Interstate, 19 U.S. and 239 State highways were used for this study. The data comprised 112,695 crashes in 75 counties from 2004 through 2006. Tables 1 and 2 list the factors used to analyze crash injury severity and the SA among them. The distance between crashes, required to calculate SA, was obtained from the crash data set.

METHODOLOGY

The first step in the methodology was the categorization of counties. The second step identified the county(ies) from each category that provided the sample data. The county with the highest crash severity index (CSI) was selected to represent the sample data. For MLR models, the minimum sample size required is 2,000 crashes (22). Therefore, the county(ies) were chosen to satisfy this criterion. The third step identified the crash contributing factors from the sample data and from the remaining data in each category. In the final step, the percentage of common crash contributing factors between the sample data (county) and the remaining data for each category was computed.
‘Common’ refers to statistically significant crash contributing factors that followed a similar trend (increase or decrease) in the odds ratio in both the county data and the data for the remaining counties in each category. Details of SA, the categorization of the counties, the computation of the CSI, and MLR are presented in the following subsections.

Spatial Autocorrelation (SA) Indices

The basic principle of SA is similar to the first law of geography mentioned previously. SA is defined as the correlation of a variable with itself in space. SA measures the strength of autocorrelation and tests the assumption of independence. A variable is said to be spatially autocorrelated if there are any systematic patterns in its spatial distribution. SA is positive if nearby areas (regions) are alike. Negative autocorrelation applies to neighboring areas that are unlike, and no SA is exhibited by random patterns. Spatial autocorrelation indices, however, do not explain why locations that indicate a cluster of crashes have a higher incidence of crashes compared to other locations; therefore, SA methods cannot identify factors that cause crashes (23). In this paper, Moran’s I was used to determine SA. The Getis-Ord Gi* statistic was then used to identify the clusters of crashes by county. Z-values of Gi* were used to categorize these counties.

Moran’s I is one of the oldest indicators of SA (24). It compares the value of a variable at one location with its value at other locations. Similar to a correlation coefficient, it varies between -1.0 and +1.0. A positive correlation indicates clustering (i.e., higher crash concentrations), whereas a negative correlation indicates dispersion or lower crash concentration. Moran’s I can be expressed as:
$$ I = \frac{n \sum_{i} \sum_{j} w_{ij} (Y_i - \bar{Y})(Y_j - \bar{Y})}{\left( \sum_{i \neq j} w_{ij} \right) \sum_{i} (Y_i - \bar{Y})^2} \tag{1} $$

where:
n = crash frequency,
w_ij = weight used to compare crashes at locations i and j,
\bar{Y} = mean crash severity index,
Y_i = crash severity at location i, and
Y_j = crash severity at location j.

The term w_ij is an element of the spatial weight matrix. In a contiguity matrix, the interaction receives a weight of 1 if location j is adjacent to location i; otherwise, it is zero. Moran’s I compares the sum of the cross-products of values at different locations, weighted by the inverse of the distance between the locations. The significance of Moran’s I can be evaluated by a Z-value as:

$$ Z(I) = \frac{I - E(I)}{S(I)} \tag{2} $$

where E(I), the expected value of Moran’s I (without SA), can be computed as:

$$ E(I) = \frac{-1}{n - 1} \tag{3} $$

To calculate the Z-value, the standard deviation S(I) is computed as:

$$ S(I) = \sqrt{\frac{n^2 (n - 1) S_1 - n (n - 1) S_2 - 2 S_0^2}{(n + 1)(n - 1) S_0^2}} \tag{4} $$

where:

$$ S_0 = \sum_{i \neq j} w_{ij}, \quad S_1 = \frac{1}{2} \sum_{i \neq j} (w_{ij} + w_{ji})^2, \quad \text{and} \quad S_2 = \sum_{k} \left( \sum_{j} w_{kj} + \sum_{i} w_{ik} \right)^2 $$

In the above formulas, i, j, and k represent the locations of crashes. Values of Z greater than +1.96 and less than -1.96 indicate positive and negative spatial autocorrelation, respectively, at a significance level of 5%.

Getis-Ord Gi* Statistic

The G-statistic, developed by Getis and Ord, analyzes the evidence of spatial patterns and represents a global SA index (25, 26). The Gi* statistic, on the other hand, is a local SA index. It is more suitable for discerning cluster structures of high or low concentration. A simple form of the Gi* statistic is (27):

$$ G_i^* = \frac{\sum_{j=1}^{n} w_{ij} x_j}{\sum_{j=1}^{n} x_j} \tag{5} $$

where Gi* is the SA statistic of an event i over n events (crashes) (4).
The term x_j characterizes the magnitude of the variable x at event j over all n events; in this study it is the CSI value determined at a particular location. The distribution of the Gi* statistic can be observed from the underlying distribution of the variable x (4). The threshold distance (the proximity of one crash to another) in this study was set to zero to indicate that all features were considered neighbors of all other features. This threshold was applied over the entire region of the study.

The standardized Gi* is essentially a Z-value and can be associated with statistical significance:

$$ G_i^* = \frac{\sum_{j=1}^{n} w_{ij} x_j - \bar{X} \sum_{j=1}^{n} w_{ij}}{S \sqrt{\dfrac{n \sum_{j=1}^{n} w_{ij}^2 - \left( \sum_{j=1}^{n} w_{ij} \right)^2}{n - 1}}} \tag{6} $$

where:

$$ S = \sqrt{\frac{\sum_{j=1}^{n} x_j^2}{n} - (\bar{X})^2} \tag{7} $$

Positive and negative Gi* statistic values correspond to clusters of crashes with high- and low-value events, respectively. A Gi* close to zero implies a random distribution of events.

Categorization of Counties

The categorization of counties was based on the Z-value of the Gi* statistic calculated for each of the counties. This categorization can be based on six different classification schemes, i.e., equal interval, defined interval, quantile, natural breaks, geometric interval, and standard deviation. The natural breaks scheme was best suited for the present study (28, 29). In the natural breaks scheme, the classes are based on groupings inherent in the data. Break points are identified that best group similar values and maximize the differences between classes.

In this study, the Jenks algorithm was used to determine the natural breaks (28). This algorithm is commonly used to classify the data in a choropleth map, a type of thematic map that uses shading to represent classes of a feature associated with specific areas (e.g., a population density map).
The Jenks algorithm generates a series of values that best represent the actual breaks in the data, as opposed to some arbitrary classification scheme. Thus, it preserves the true clustering of data values. The algorithm creates k classes such that the variance within classes is minimized (30).

Selection of Counties for Analysis of Crash Data

From each category, a county or a set of counties, starting with the highest crash severity index (CSI), was selected as a data sample. The highest CSI was used as the criterion because it provided the greatest variability in the crash data and was therefore a better choice of sample data than alternatives such as random selection. The CSI was used to incorporate crash severity and to associate crash contributing factors with high levels of crash severity. A high CSI indicates a large number of fatal crashes or a high frequency of crashes at various levels of severity. The CSI was computed as (31):

$$ \mathrm{CSI} = S_1 W_1 + S_2 W_2 + S_3 W_3 + S_4 W_4 + S_5 W_5 \tag{8} $$

where:
S1 = frequency of crashes involving fatalities,
S2 = frequency of crashes involving incapacitating injuries,
S3 = frequency of crashes involving moderate injury,
S4 = frequency of crashes involving complaint of pain,
S5 = frequency of crashes involving property-damage-only (PDO), and
Wi = weights (542, 29, 11, 6, and 1) (32) assigned to crash severity levels, i = 1, 2, ..., 5.

The weights used are based on the comprehensive crash costs per person for each level of crash severity (S1-S5) as used in the Highway Safety Manual (HSM) (32) and Manepalli et al. (31). They were calculated as the ratio of the crash cost at each severity level to the cost of a property-damage-only crash.
The crash costs were $4,008,900 for a fatal crash (S1), $216,000 for major injury (S2), $79,000 for minor injury (S3), $44,900 for complaint of pain (S4) and $7,400 for property-damage-only crashes (S5); i.e., the weight for a fatal crash equals 542 (4,008,900/7,400). Similarly, the other weights were computed and rounded to the nearest integer.

Multinomial Logistic Regression

Logistic regression (LR) can be used to predict dependent variables from different types of independent variables, and can compute the percent of variance associated with the dependent variable that is explained by the independent variables. LR can also rank the relative importance of independent variables, assess interaction effects, and explain the impact of covariate control variables. The impact of predictor variables can be explained in terms of the odds ratios. In this study, crash severity with five different levels (S1-S5) was used as the dependent variable. As the dependent variable has more than two categories, multinomial logistic regression (MLR) was selected for this study. Crash severity was calculated relative to property-damage-only crashes, as explained previously. The details of the independent variables and their levels are presented in Table 1.

All independent variables were checked using the variance inflation factor (VIF) to ensure that multicollinearity did not exist in the data. The VIF was found to be less than 10 for all of the variables; hence, multicollinearity was not observed. The variables selected for model development depended on the quality of the data. Only certain factors were retained for analysis, as some factors had missing values. When more than 10% of the values were missing, that factor was not considered. For the factors presented in Table 1, no more than 1% of the values were missing. Mallows’ Cp was used to retain the variables. A smaller value of Cp indicates a better model (33).
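The weight derivation and Equation 8 can be checked with a short script. The sketch below is illustrative only: the function names `severity_weights` and `crash_severity_index` and the example county counts are hypothetical and do not come from the paper; only the crash costs, and the weights they reproduce, are taken from the text above.

```python
# Crash costs per severity level (S1..S5) as quoted in the text, in dollars.
CRASH_COSTS = [4_008_900, 216_000, 79_000, 44_900, 7_400]


def severity_weights(costs):
    """Divide each cost by the PDO cost (last entry) and round to the nearest integer."""
    pdo_cost = costs[-1]
    return [round(cost / pdo_cost) for cost in costs]


def crash_severity_index(frequencies, weights):
    """Equation 8: CSI = sum of S_i * W_i over the five severity levels."""
    if len(frequencies) != len(weights):
        raise ValueError("need one frequency per severity level")
    return sum(s * w for s, w in zip(frequencies, weights))


weights = severity_weights(CRASH_COSTS)
print(weights)  # [542, 29, 11, 6, 1]

# Hypothetical county (not from the paper): 2 fatal, 10 incapacitating,
# 40 moderate-injury, 100 complaint-of-pain and 500 PDO crashes.
example_counts = [2, 10, 40, 100, 500]
print(crash_severity_index(example_counts, weights))  # 2914
```

Note that 216,000/7,400 is about 29.2, so the published weight of 29 is reproduced by rounding to the nearest integer rather than by rounding up.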
The Statistical Analysis System (SAS) (34) was used to perform MLR using the CATMOD (Categorical Modeling) procedure to identify the factors that contribute to crashes and are positively associated with crash severity. The CATMOD procedure has been used in the past for linear modeling, log-linear modeling, logistic regression, and repeated measurement analysis (33).

INTERPRETATION AND ANALYSIS OF RESULTS

In this section, the results for SA are presented first. The results of MLR are then presented for the county selected in the first category, followed by the results for the rest of that category. To avoid repetition, consolidated results for the remaining categories are summarized at the end of the section.

Results for Spatial Autocorrelation Indices

Table 2 presents the results of Moran’s I, demonstrating that the total crash frequency and the different crash severity levels exhibit SA, and that the crashes were not random chance events. The crash severity levels proved significant at various levels. The counties in Arkansas were divided into five categories using Gi*; Figure 1 shows these categories, and Table 3 presents the results by category and lists the number of counties in each category. For some categories, more than one county was selected, as MLR models need a minimum sample size of 2,000 crashes (22). As mentioned earlier, a set of counties was identified based on the highest CSI. Table 3 also presents the size of the sample data, in terms of the CSI and crash frequency ratios, used in identifying the crash contributing factors. Columns D and E indicate the CSI computed for the sample county(ies) and for the data from the remaining counties, respectively.

The first category included three counties: Pulaski, Benton and Washington. Pulaski County covers the Arkansas state capital of Little Rock.
Little Rock has a population of 183,133 and an area of 116.8 square miles (35). Fayetteville, another major city in Arkansas, with a population of 72,202 and an area of 44.5 square miles, is located in Washington County. Several Interstate highways pass through Little Rock. The AADT on one of those highways, I-630, was 52,297 vehicles/day. Benton and Washington Counties border the states of Missouri and Oklahoma and have a high volume of traffic and a high incidence of crashes. These counties, therefore, indicated a high Z-value of the Gi* statistic.

Results for Multinomial Logistic Regression

Table 4 presents the results for Pulaski County (the county with the highest CSI in the first category), and Table 5 shows the collective results for the remaining counties in the first category; only results that are statistically significant at the 0.05 level are presented. The factors common to both Pulaski County and the other counties in the first category follow a similar trend (i.e., an increase or decrease in the odds ratio) with respect to the factors identified for the first category. These factors are shaded in the tables. Details of the analysis of the odds ratio are presented below, along with a few examples for each case. For detailed results on all of the other categories, the interested reader is referred to Manepalli (36).

Table 4 indicates that during darkness, fatal crashes were more likely to occur than property-damage-only crashes, and the odds ratio increased by a factor of 1.28 if other variables remained constant. Similarly, the relative risk of fatal crashes was greater than that of property-damage-only crashes in rural areas and on curved roads. For details on the calculation of the odds ratio, the interested reader is referred to Bham et al. (33).
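In a logit model, the odds ratio for a factor is the exponential of its estimated coefficient, so a value above 1 means that the odds of a given severity level relative to property-damage-only increase when the factor is present, and a value below 1 means they decrease. The sketch below illustrates this relationship and the ‘similar trend’ comparison used to flag common factors. It is a minimal illustration under stated assumptions, not the SAS CATMOD analysis used in the paper: the function names are hypothetical, and every number except the 1.28 quoted from Table 4 is made up for the example.

```python
import math


def odds_ratio(coefficient):
    """Odds ratio implied by a logit coefficient: exp(beta)."""
    return math.exp(coefficient)


def same_trend(or_sample, or_remaining):
    """True when both odds ratios lie on the same side of 1,
    i.e. both indicate an increase or both indicate a decrease."""
    return (or_sample - 1.0) * (or_remaining - 1.0) > 0


# Coefficient that corresponds to the odds ratio of 1.28 reported for darkness.
beta_darkness = math.log(1.28)
print(round(odds_ratio(beta_darkness), 2))  # 1.28

# Hypothetical odds ratios for the remaining counties (not from the paper).
print(same_trend(1.28, 1.10))  # True: both above 1, so the factor counts as common
print(same_trend(1.28, 0.50))  # False: opposite directions, so it does not
```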
Table 5 indicates that the odds of a major injury crash relative to a property-damage-only crash were higher by a factor of 4.66 for a head-on collision than for a sideswipe-same-direction (SSSD) collision, given that all other variables remained constant. Similarly, major injury crashes were more likely than property-damage-only crashes on rural roads and curved roads, and as a result of single-vehicle crashes (SVC). The risk of a major injury crash relative to a property-damage-only crash decreased by a factor of 0.50 in rear-end collisions compared to SSSD collisions. Similarly, major injury crashes were more likely than property-damage-only crashes when alcohol was involved, when AADT was high, when the crash occurred on a weekend, and in the case of SSSD collisions. To reduce the repetitive nature of the results, only the results of the first category are explained in the text.

Summary

Table 6 summarizes the results for each of the five categories at each crash severity level. It indicates the number of crash contributing factors that were common to the county or set of counties with the highest CSI and to the remaining counties collectively within a category, i.e., factors that showed similar trends in the odds ratio for both the sample county(ies) and the remaining counties.

For the third category, crashes involving complaint of pain were more frequently the result of SVC than of SSSD collisions, but a similar trend in the odds ratio was not observed. For Craighead County, the odds ratio increased (SVC vs. SSSD); however, overall for the third category, it decreased. Therefore, SVC or SSSD collision types cannot be considered factors that are positively associated with crash severity.

Rural areas were positively associated with crash severity.
The following factors also had a positive association with crash severity: darkness, driving under the influence of alcohol, roadways on a grade, curved roadways, and head-on and sideswipe-same-direction (SSSD) collisions. A positive association of these factors with crash severity has been found in other studies as well (33).

Additional Inferences

The factors identified were based on the classification by SA in the first step. The division of the counties by SA indices, therefore, plays a major role in identifying factors that contribute to crashes. These results are supported by the factors identified by MLR as well. An analysis conducted for each highway individually in Arkansas would require significant computational time and resources. A safety analyst’s time would be significantly reduced if the factors that contribute to crashes could be analyzed at the county level. The summary of results indicates that an analysis of a set of counties in a state was sufficient to identify 49% of the factors that contributed to crashes statewide. Tables 2, 3 (column G) and 6 demonstrate that crash data from a group of selected counties (13 out of 75 counties) represented 25% of the total crash data and identified 49% of the common factors that contributed to the crashes. The results from Table 3 (column G) indicate that a small proportion of the crash data can be used to identify the major crash contributing factors. This proportion varies among categories, and it is remarkable that overall such a small proportion of the crash data (25%) identifies a large share (49%) of the crash contributing factors.

Previous studies (4) have shown that areas with negative Z-values of the Gi* statistic contain fewer clusters of events. The fourth and fifth categories, with Z-values from -.481 to -.098 and from -.775 to -.481, respectively, had fewer crash clusters.
As values close to zero imply a random distribution of spatial events, the third category (-.098, .560), and to a lesser degree the second category (.560, 1.769), demonstrate a random distribution of spatial events.

APPLICATION TO CURRENT PRACTICE

State DOTs allocate funds for the application of remedial measures to hotspots. The results of this study indicate that as Z-values decrease, crash concentrations also decrease. DOTs can allocate funds to, and concentrate on, counties with higher Z-values rather than giving equal importance to all counties. State DOTs can initiate a statewide improvement program, such as widening shoulders or adding median barriers, by identifying the factors from the sample data using SA.

For specific improvements, each highway should be analyzed separately; however, the entire length of each highway need not be analyzed. For instance, I-40 passes through the entire state of Arkansas. SA indices indicated the highest positive Z-value for Pulaski County. Sections of I-40 in Pulaski County should, therefore, receive the highest priority rather than the entire length (285 miles) of I-40 in Arkansas (35).

The procedure presented in this paper can be used to identify crash clusters, i.e., counties that exhibit similar levels of clustering. Categories with positive Z-values require more attention to detail than categories with negative Z-values. This will aid DOTs in improving the allocation of funds under the budget constraints of the current economy of the United States. This procedure requires limited personnel time for data analysis; however, statistical knowledge is required for such an analysis.
DISCUSSION, CONCLUSIONS AND RECOMMENDATIONS

The methodology proposed in this paper can allow departments of transportation (DOTs) to effectively identify factors that contribute to crashes by using a sample of crash data, thus allowing more time to study the crash contributing factors associated with crash severity.

The methodology categorized the 75 counties in Arkansas into five categories. From these categories, counties with the highest crash severity index were selected. Pulaski County, in the first category, had the highest Z-value (6.16) and identified 55% of the crash contributing factors that were common with the remaining counties in the category.

Using either Moran’s I or the Getis-Ord Gi* statistic alone does not suffice to reveal the effects of spatial autocorrelation (SA) in crash data analysis. The use of both spatial indices is recommended because Moran’s I indicates the presence of SA and Getis-Ord Gi* indicates the relative level of clustering. DOTs should consider categorizing counties based on the Z-values of the Gi* statistic if global SA exists.

Factors contributing to crashes were identified using multinomial logistic regression (MLR) in this paper. The odds ratio was used to identify the factors positively associated with crash severity. Rural areas had a positive association with crash severity, in addition to the following factors: darkness, driving under the influence of alcohol, roadways on a grade, curved roadways, and head-on and sideswipe-same-direction collisions. The positive association of these factors with crash severity was also found in another study (33).

Further research is recommended to identify and compare factors that contribute to crashes over time using crash data for more than three years.
The effect of selecting counties with a low frequency of certain levels of crash severity is also a subject for future research. Analysis of additional factors related to roadway geometry will provide more insight into the methodology used in this paper. Further, the authors plan to study factors such as land use distribution for spatial autocorrelation and to identify the associated crash contributing factors.

ACKNOWLEDGMENTS

The authors are grateful to the Arkansas Highway and Transportation Department for providing the data for this research. They would also like to thank Drs. V. A. Samaranayke and Dominique Lord for their input, and the four anonymous reviewers for their comments, which helped improve the quality of the paper.

REFERENCES

1. Decade of Action for Road Safety 2011 to 2020. http://www.decadeofaction.org/. Accessed July 1, 2011.
2. NHTSA (2006). http://www-nrd.nhtsa.dot.gov/Pubs/810623.PDF. Accessed March 10, 2010.
3. MoDOT's Bolder Five-Year Direction, A Presentation to the Missouri Highways and Transportation Commission, June 2011. http://www.modot.org/bolderfiveyeardirection/documents/FINAL_June8plan.pdf. Accessed July 1, 2011.
4. Songchitruksa, P., and Zeng, X. Getis-Ord Spatial Statistics for Identifying Hot Spots Using Incident Management Data. Proceedings of Transportation Research Board 89th Annual Meeting, Washington, D.C., 2010.
5. Depue, L. Safety Management Systems: A Synthesis of Highway Practice. NCHRP Synthesis 322, Transportation Research Board of the National Academies, Washington, D.C., 2003.
6. Norden, M., J. Orlansky and H. Jacobs. Application of Statistical Quality-Control Techniques to Analysis of Highway-Accident Data. Bulletin 117, HRB, National Research Council, Washington, D.C., 1956, pp. 17-31.
7. Hakkert, A. S. and D. Mahalel. Estimating the Number of Accidents at Intersections from a Known Traffic Flow on the Approaches. Accident Analysis & Prevention, Vol. 10, No. 1, 1978, pp. 69-79.
8. McGuigan, D. R. D. The Use of Relationships between Road Accidents and Traffic Flow in “Black-Spot” Identification. Traffic Engineering and Control, 1981, pp. 448-453.
9. Kim, K., D. Takeyama and L. Nitz. Moped safety in Honolulu Hawaii. Journal of Safety Research, Vol. 26, No. 3, 1995, pp. 177-185.
10. McMahon, P. A Quantitative and Qualitative Analysis of the Factors Contributing to Collisions between Pedestrians and Vehicles along Roadway Segments. Masters Project, Department of City and Regional Planning, University of North Carolina at Chapel Hill, NC, 1999.
11. Peled, A., Haj-Yehia, B., and Hakkert, A. S. ArcInfo Based Geographical Information System for Road Safety Analysis and Improvement. http://www.esri.com/library/userconf/proc96/TO50/PAP005/P5.HTM. Accessed November, 2009.
12. Premo, L. S. Local spatial autocorrelation statistics quantify multi-scale patterns in distributional data: an example from the Maya Lowlands. Journal of Archaeological Science, Vol. 31, 2004, pp. 855-866.
13. Huang, H., Chin, H. C., and Haque, M. M. Empirical Evaluation of Alternative Approaches in Identifying Crash Hot Spots: Naïve Ranking, Empirical Bayes, and Full Bayes. In Transportation Research Record: Journal of the Transportation Research Board, No. 2013, 2009, pp. 32-41.
14. Quddus, M. A. Modeling Area-Wide Count Outcomes with Spatial Correlation and Heterogeneity: An Analysis of London Crash Data. Accident Analysis and Prevention, Vol. 40, No. 4, 2008, pp. 1486-1497.
15. Miaou, S., Song, J. J., and Mallick, B. K. Roadway traffic crash mapping: a space-time modeling approach. Journal of Transportation and Statistics, Vol. 6, No. 1, 2003, pp. 33-57.
16. Huang, H., Abdel-Aty, M. A., and Darwiche, A. L. County-level crash risk analysis in Florida: Bayesian spatial modeling. Proceedings of Transportation Research Board 89th Annual Meeting, Washington, D.C., 2010.
17. Siddiqui, C., Abdel-Aty, M., and Choi, K. Macroscopic spatial analysis of pedestrian and bicycle crashes. Accident Analysis & Prevention, Vol. 45, No. 45, 2012, pp. 382-391.
18. Li, L., Zhu, L., and Sui, D. Z. A GIS-based Bayesian approach for analyzing spatial-temporal patterns of intra-city motor vehicle crashes. In Journal of Transport Geography, 2007, pp. 274-285.
19. Tobler, W. A computer movie simulating urban growth in the Detroit region. Economic Geography, Vol. 46, No. 9, 1970, pp. 234-240.
20. Guerts, E., Wets, G., Brijis, T., and Vanhoof, K. Identification and ranking of black spots. In Transportation Research Record: Journal of the Transportation Research Board, No. 1897, 2004, pp. 34-42.
21. Hauer, E., Allery, B. K., Konnov, J., and Griffith, M. S. How Best to Rank Sites with Promise. In Transportation Research Record: Journal of the Transportation Research Board, No. 1897, 2004, pp. 48-54.
22. Ye, F. and Lord, D. Comparing Three Commonly Used Crash Severity Models on Sample Size Requirements: Multinomial Logit, Ordered Probit and Mixed Logit Models. Proceedings of the Annual Meeting of the Transportation Research Board, Washington, D.C., 2011.
23. Mitra, S. Spatial autocorrelation and Bayesian spatial statistical method for analyzing fatal and injury crash prone intersections. Proceedings of Transportation Research Board 88th Annual Meeting, Washington, D.C., 2009.
24. Moran, P. A. P. Notes on continuous stochastic phenomena. Biometrika, No. 37, 1950, pp. 17-33.
25. Getis, A. and Ord, J. K. The analysis of Spatial Association by use of Distance Statistics. Geographical Analysis, Vol. 24, No. 3, 1992, pp. 189-206.
26. Ord, J. K. and Getis, A.
Local Spatial Autocorrelation Statistics: Distributional Issues and an Application. Geographic Analysis, Vol. 27, No. 4, 1995, pp. 286-306. 27. McGuigan, D. R. D. The Use of Relationships between Road Accidents and Traffic Flow in “Black-Spot” Identification. Traffic Engineering and Control, 1981, pp. 448–453. 28. http://go.owu.edu/~jbkrygie/krygier_html/geog_353/geog_353_lo/geog_353_lo07.html Accessed, June 23, 2010. J. B. Krygier: Geography lecture notes. 29. http://webhelp.esri.com/arcgisdesktop/9.2/index.cfm?TopicName=Standard_classification_sc hemes Accessed, June 23, 2010. ArcGIS 9.2 Desktop Help Online Manual. 30. http://danieljlewis.org/2010/06/07/jenks-natural-breaks-algorithm-in-python/ Accessed, June 25, 2010. Daniel J Lewis, University College London, Department of Geography. 31. Manepalli, U. R. R., Bham, G, H., and Samaranayake, V. A. An Evaluation of Crash Frequency, Crash Severity, and Composite Rank Methods for Hotspot Identification, No.123349. Proceedings of Transportation Research Board 91th Annual Meeting, Washington D. C., 2012. 32. Highway Safety Manual, First Edition, 2010. Vol. 1 Section 4-66, Table 4A-1. 33. Bham, G. H., Javvadi, B, S., and Manepalli, U. R. R. Multinomial logistic regression model for single-vehicle and multivehicle collisions on urban U.S. Highways in Arkansas. Journal of Transportation Engineering, Vol. 138, No. 6, 2012, pp. 786-797. 34. SAS online documentation manual 9.1.2., http://support.sas.com/onlinedoc/912/docMainpage.jsp Accessed, December, 2009. 35. Bham, G. H., and Manepalli, U. R. R. Identification and Analysis of High Crash Segments on Interstate, US, and State Highway Systems of Arkansas. Report MBTC 2099/3006, 2009. 36. Manepalli, U. R. R., A Statewide Analysis of Highway Safety in Arkansas, M.S. thesis, Missouri University of Science and Technology, Rolla, MO, 2011. 26 16 TRB 2013 Annual Meeting Paper revised from original submittal. Manepalli and Bham 1 Table 1. 
List of Independent Variables

| Terms | Variables | Levels of Variables |
|---|---|---|
| ATM | Atmospheric Conditions | Clear, Rain |
| LGT | Light Conditions | Dark, Daylight |
| RSUR | Roadway Surface | Dry, Wet |
| RU | Roadway Type | Rural, Urban |
| RALI | Roadway Alignment | Curve, Straight |
| RPRO | Roadway Profile | Grade, Level |
| TOH | Roadway Classification | Divided, Undivided |
| TOC | Collision Types | Angle, Head-On, Rear End, Sideswipe Same Direction (SSSD), Single Vehicle Crashes (SVC), Sideswipe Opposite Direction (SWOD) |
| WK | Days of the week | Weekdays (M-F), Weekends (Sat, Sun) |
| DUI | Driving Under the Influence | Yes, No |
| AADT | Annual Average Daily Traffic | <20,000; 20,000-40,000; 40,000-60,000; 60,000-80,000; 80,000-100,000; 100,000-120,000 |

Table 2. Results of Spatial Autocorrelation Index, Moran’s I

| Variables | Global Moran's I | Z-score | α |
|---|---|---|---|
| TCF | 0.06 | 2.50 | 0.05* |
| S1 (Fatal) | 0.08 | 2.98 | 0.01** |
| S2 (Major Injury) | 0.10 | 3.33 | 0.01** |
| S3 (Minor Injury) | 0.07 | 2.72 | 0.01** |
| S4 (Complain of Pain) | 0.07 | 2.75 | 0.01** |
| S5 (Property Damage Only) | 0.05 | 2.26 | 0.05* |

Notes: *statistically significant at 95%, **statistically significant at 99%
TCF = Total Crash Frequency

Table 3. Results by category, highest CSI in each category, and ratio of crash data

| Category (A) | Number of counties (B) | Counties with highest CSI@ (C) | CSI! (D) | Total CSI$ (E) | CSI ratio* (F) | Crash Freq. ratio (G) | Range of Z value of Gi* statistic (H) |
|---|---|---|---|---|---|---|---|
| First | 3 | Pulaski | 137,627 | 276,755 | .50 | .51 | 1.7678741, 6.161180 |
| Second | 9 | Garland | 52,189 | 324,668 | .16 | .27 | 0.559918, 1.768740 |
| Third | 13 | Craighead | 28,676 | 298,379 | .10 | .17 | -0.097831, 0.559918 |
| Fourth | 25 | Madison, Cleburne, Logan | 45,707 | 273,196 | .17 | .16 | -0.481099, -0.097832 |
| Fifth | 25 | Chicot, Montgomery, Polk, Perry, Little River, Clay, Columbia | 53,477 | 133,861 | .40 | .57 | -0.775375, -0.481100 |
| Total | 75 | 13^ | 317,676 | 1,306,859 | .24 | .34 | - |

Notes:
@ satisfies the condition of minimum sample size of 2000 in terms of crash frequency
! CSI computed for county/counties in col. ‘C’
$ CSI computed for counties in col. ‘B’
* Ratio of CSI values in col. D and col. E
^ total number of counties in col. C

Table 4. Factors Positively Associated with Crash Severity, Pulaski County (First Category)

| Parameter | Factors | Estimate | Standard Error | Chi Square | Pr > Chi Square | Odds Ratio |
|---|---|---|---|---|---|---|
| Fatal vs Property Damage Crashes | | | | | | |
| LGT | Dark vs Daylight | 0.25 | 0.12 | 3.92 | 0.0476 | 1.28 |
| RU | Rural vs Urban | 0.71 | 0.13 | 29.31 | <.0001 | 2.04 |
| RALI | Curve vs Straight | 0.34 | 0.13 | 6.25 | 0.0124 | 1.40 |
| DUI | No vs Yes | -1.17 | 0.13 | 86.71 | <.0001 | 0.31 |
| Major Injuries vs Property Damage Crashes | | | | | | |
| Intercept | | -2.49 | 0.18 | 185.59 | <.0001 | |
| RU | Rural vs Urban | 0.43 | 0.08 | 29.24 | <.0001 | 1.54 |
| RALI | Curve vs Straight | 0.29 | 0.08 | 13.05 | 0.0003 | 1.33 |
| TOC | Angle vs SSSD | -0.39 | 0.17 | 5.36 | 0.0206 | 0.68 |
| TOC | Head-on vs SSSD | 1.86 | 0.23 | 64.43 | <.0001 | 6.41 |
| TOC | Rear-end vs SSSD | -0.58 | 0.15 | 15.34 | <.0001 | 0.56 |
| TOC | SVC vs SSSD | 0.69 | 0.13 | 26.32 | <.0001 | 2.00 |
| TOC | SWOD vs SSSD | -1.50 | 0.25 | 36.62 | <.0001 | 0.22 |
| DUI | No vs Yes | -0.77 | 0.08 | 91.27 | <.0001 | 0.46 |
| Minor Injuries vs Property Damage Crashes | | | | | | |
| Intercept | | -1.19 | 0.09 | 158.92 | <.0001 | |
| RU | Rural vs Urban | 0.13 | 0.05 | 7.86 | 0.0051 | 1.14 |
| TOH | Divided vs Undivided | 0.15 | 0.04 | 18.29 | <.0001 | 1.16 |
| TOC | Angle vs SSSD | 0.17 | 0.07 | 5.32 | 0.0210 | 1.18 |
| TOC | Head-on vs SSSD | 1.13 | 0.16 | 51.00 | <.0001 | 3.10 |
| TOC | Rear-end vs SSSD | -0.50 | 0.07 | 49.05 | <.0001 | 0.61 |
| TOC | SVC vs SSSD | 0.61 | 0.07 | 72.22 | <.0001 | 1.85 |
| TOC | SWOD vs SSSD | -1.18 | 0.10 | 137.05 | <.0001 | 0.31 |
| WK | Weekdays vs Weekends | -0.10 | 0.03 | 9.03 | 0.0027 | 0.91 |
| DUI | No vs Yes | -0.42 | 0.05 | 70.48 | <.0001 | 0.66 |
| AADT | 20000 vs 120000 | 0.25 | 0.06 | 17.50 | <.0001 | 1.29 |
| AADT | 80000 vs 120000 | -0.16 | 0.07 | 5.59 | 0.0180 | 0.85 |
| Complain of Pain vs Property Damage Crashes | | | | | | |
| Intercept | | -0.73 | 0.08 | 79.38 | <.0001 | |
| TOC | Angle vs SSSD | 0.18 | 0.06 | 9.04 | 0.0026 | 1.20 |
| TOC | Head-on vs SSSD | 0.63 | 0.15 | 17.80 | <.0001 | 1.88 |
| TOC | Rear-end vs SSSD | 0.30 | 0.06 | 29.19 | <.0001 | 1.35 |
| TOC | SVC vs SSSD | -0.19 | 0.07 | 8.56 | 0.0034 | 0.82 |
| TOC | SWOD vs SSSD | -0.46 | 0.07 | 47.60 | <.0001 | 0.63 |
| WK | Weekdays vs Weekends | -0.09 | 0.03 | 12.68 | 0.0004 | 0.91 |
| AADT | 40000 vs 120000 | -0.12 | 0.04 | 7.53 | 0.0061 | 0.89 |
| AADT | 80000 vs 120000 | -0.09 | 0.05 | 3.89 | 0.0486 | 0.91 |

Note: *shading indicates common factors in Tables 4 and 5 (includes similar increase/decrease in the odds ratio)

Table 5. Factors Positively Associated with Crash Severity (First Category#)

| Parameter | Factors | Estimate | Standard Error | Chi-Square | Pr > Chi Square | Odds Ratio |
|---|---|---|---|---|---|---|
| Fatal vs Property Damage Crash | | | | | | |
| Intercept | | -3.1045 | 0.3359 | 85.4 | <.0001 | |
| RU | Rural vs Urban | 0.7018 | 0.1378 | 25.94 | <.0001 | 2.02 |
| TOH | Divided vs Undivided | 0.4316 | 0.1503 | 8.25 | 0.0041 | 1.54 |
| TOC | Angle vs SSSD | 0.6343 | 0.2995 | 4.48 | 0.0342 | 1.89 |
| TOC | Head-on vs SSSD | 3.052 | 0.3328 | 84.08 | <.0001 | 21.16 |
| TOC | Rear-end vs SSSD | -2.0406 | 0.5333 | 14.64 | 0.0001 | 0.13 |
| TOC | SVC vs SSSD | 0.5922 | 0.2861 | 4.28 | 0.0385 | 1.81 |
| TOC | SWOD vs SSSD | -1.5616 | 0.6343 | 6.06 | 0.0138 | 0.21 |
| DUI | No vs Yes | -1.2111 | 0.1275 | 90.22 | <.0001 | 0.30 |
| AADT | 40000 vs 120000 | -0.7311 | 0.2897 | 6.37 | 0.0116 | 0.48 |
| Major Injury vs Property Damage Crash | | | | | | |
| Intercept | | -1.6167 | 0.1538 | 110.53 | <.0001 | |
| RU | Rural vs Urban | 0.6714 | 0.0668 | 101.16 | <.0001 | 1.96 |
| RALI | Curve vs Straight | 0.2893 | 0.0642 | 20.31 | <.0001 | 1.34 |
| TOH | Divided vs Undivided | 0.2977 | 0.0734 | 16.46 | <.0001 | 1.35 |
| TOC | Head-on vs SSSD | 1.539 | 0.231 | 44.37 | <.0001 | 4.66 |
| TOC | Rear-end vs SSSD | -0.7016 | 0.1394 | 25.33 | <.0001 | 0.50 |
| TOC | SVC vs SSSD | 0.6892 | 0.126 | 29.91 | <.0001 | 1.99 |
| TOC | SWOD vs SSSD | -1.0892 | 0.2264 | 23.15 | <.0001 | 0.34 |
| WK | Weekday vs Weekend | -0.19 | 0.0565 | 11.31 | 0.0008 | 0.83 |
| DUI | No vs Yes | -0.6524 | 0.078 | 69.9 | <.0001 | 0.52 |
| AADT | 40000 vs 120000 | -0.6028 | 0.1233 | 23.91 | <.0001 | 0.55 |
| Minor Injury vs Property Damage Crash | | | | | | |
| Intercept | | -0.8127 | 0.0949 | 73.4 | <.0001 | |
| ATM | Clear vs Rain | 0.1561 | 0.081 | 3.72 | 0.0538 | 1.17 |
| RU | Rural vs Urban | 0.1381 | 0.0439 | 9.89 | 0.0017 | 1.15 |
| RPRO | Grade vs Level | 0.1299 | 0.032 | 16.45 | <.0001 | 1.14 |
| TOH | Divided vs Undivided | 0.1946 | 0.0404 | 23.24 | <.0001 | 1.21 |
| TOC | Angle vs SSSD | 0.1734 | 0.0622 | 7.77 | 0.0053 | 1.19 |
| TOC | Head-on vs SSSD | 1.1991 | 0.1375 | 76.04 | <.0001 | 3.32 |
| TOC | Rear-end vs SSSD | -0.5982 | 0.0646 | 85.82 | <.0001 | 0.55 |
| TOC | SVC vs SSSD | 0.7107 | 0.072 | 97.38 | <.0001 | 2.04 |
| TOC | SWOD vs SSSD | -1.2811 | 0.1159 | 122.23 | <.0001 | 0.28 |
| WK | Weekday vs Weekend | -0.1116 | 0.0318 | 12.31 | 0.0005 | 0.89 |
| DUI | No vs Yes | -0.5 | 0.0504 | 98.23 | <.0001 | 0.61 |
| AADT | 20000 vs 120000 | 0.2738 | 0.0685 | 15.97 | <.0001 | 1.31 |
| AADT | 60000 vs 120000 | -0.238 | 0.1131 | 4.43 | 0.0353 | 0.79 |
| Complain of Pain vs Property Damage Crash | | | | | | |
| Intercept | | -0.3083 | 0.0756 | 16.61 | <.0001 | |
| LGT | Dark vs Daylight | 0.0538 | 0.0234 | 5.27 | 0.0217 | 1.06 |
| RALI | Curve vs Straight | 0.0746 | 0.0335 | 4.97 | 0.0258 | 1.08 |
| RPRO | Grade vs Level | 0.0566 | 0.0236 | 5.73 | 0.0167 | 1.06 |
| TOH | Divided vs Undivided | 0.1422 | 0.0295 | 23.18 | <.0001 | 1.15 |
| TOC | Rear-end vs SSSD | 0.2054 | 0.0457 | 20.2 | <.0001 | 1.23 |
| TOC | SWOD vs SSSD | -0.5199 | 0.0668 | 60.55 | <.0001 | 0.59 |
| WK | Weekday vs Weekend | -0.0678 | 0.0238 | 8.15 | 0.0043 | 0.93 |
| DUI | No vs Yes | -0.1589 | 0.0469 | 11.49 | 0.0007 | 0.85 |
| AADT | 20000 vs 120000 | 0.1518 | 0.0474 | 10.26 | 0.0014 | 1.16 |
| AADT | 40000 vs 120000 | -0.1173 | 0.0456 | 6.62 | 0.0101 | 0.89 |

Note: *shading indicates common factors in Tables 4 and 5 (includes similar increase/decrease in the odds ratio); # excludes Pulaski County

Table 6. Summary of Results: Crash Frequency and Severity Contributing Factors for Selected County and Category

| S. No (1) | Description (2) | Fatal (3) | Major Injury (4) | Minor Injury (5) | Complain of pain (6) |
|---|---|---|---|---|---|
| I | Category 1 | | | | |
| 1 | No. of contributing factors for Pulaski County | 4 | 8 | 11 | 8 |
| 2 | No. of contributing factors in Category 1 (excluding Pulaski County) | 9 | 10 | 13 | 10 |
| 3 | No. of contributing factors common to I.1 and I.2 | 2 | 7 | 10 | 4 |
| 4 | Percentage of all crashes resulting from factors common to I.1 and I.2 | 22 | 70 | 77 | 40 |
| 5 | Percentage of commonly identified factors for the Category 1 (((2+7+10+4)/(9+10+13+10))*100) | 55 | | | |
| II | Category 2 | | | | |
| 1 | No. of contributing factors for Garland County | 3 | 8 | 10 | 7 |
| 2 | No. of contributing factors in Category 2 (excluding Garland County) | 7 | 10 | 15 | 12 |
| 3 | No. of contributing factors common to II.1 and II.2 | 3 | 6 | 10 | 5 |
| 4 | Percentage of all crashes resulting from factors common to II.1 and II.2 | 43 | 60 | 67 | 42 |
| 5 | Percentage of commonly identified factors for the Category 2 | 54 | | | |
| III | Category 3 | | | | |
| 1 | No. of contributing factors for Craighead County | 4 | 4 | 8 | 4 |
| 2 | No. of contributing factors in Category 3 (excluding Craighead County) | 8 | 13 | 10 | 8 |
| 3 | No. of contributing factors common to III.1 and III.2 | 3 | 4 | 7 | 3 |
| 4 | Percentage of all crashes resulting from factors common to III.1 and III.2 | 38 | 31 | 70 | 38 |
| 5 | Percentage of commonly identified factors for the Category 3 | 44 | | | |
| IV | Category 4 | | | | |
| 1 | No. of contributing factors for Madison, Cleburne, Logan counties | 6 | 5 | 4 | 5 |
| 2 | No. of contributing factors in Category 4 (excluding Madison, Cleburne, Logan counties) | 6 | 11 | 10 | 8 |
| 3 | No. of contributing factors common to IV.1 and IV.2 | 3 | 5 | 3 | 4 |
| 4 | Percentage of all crashes resulting from factors common to IV.1 and IV.2 | 50 | 45 | 30 | 50 |
| 5 | Percentage of commonly identified factors for the Category 4 | 43 | | | |
| V | Category 5 | | | | |
| 1 | No. of contributing factors for Chicot, Montgomery, Polk, Perry, Little River, Clay and Columbia counties | 5 | 4 | 7 | 1 |
| 2 | No. of contributing factors in Category 5 (excluding Chicot, Montgomery, Polk, Perry, Little River, Clay and Columbia counties) | 3 | 5 | 7 | 5 |
| 3 | No. of contributing factors common to V.1 and V.2 | 1 | 3 | 4 | 0 |
| 4 | Percentage of all crashes resulting from factors common to V.1 and V.2 | 33 | 60 | 57 | 0 |
| 5 | Percentage of commonly identified factors for the Category 5 | 40 | | | |
| | Consolidated total percentage of commonly identified factors | 49 | | | |

Note: The frequency of crash contributing factors identified is with respect to property damage crashes only

Figure 1. Counties categorized by Gi* statistic for three years of Arkansas crash data
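As a consistency check on the tabulated results, note that the odds ratios in Tables 4 and 5 appear to be the exponentials of the corresponding coefficient estimates, which is the standard relationship in logistic regression, and that Table 6 spells out the Category 1 percentage as ((2+7+10+4)/(9+10+13+10))*100. The Python sketch below reproduces a few of these values. It is only an illustration: the function names `odds_ratio` and `percent_common` are hypothetical and do not come from the paper, and the inputs are copied from the tables.

```python
import math


def odds_ratio(estimate: float) -> float:
    """Convert a logit coefficient estimate into an odds ratio (assumed exp relationship)."""
    return math.exp(estimate)


def percent_common(common: list[int], category: list[int]) -> float:
    """Share (in percent) of category-level factors also found in the sampled county data."""
    return 100.0 * sum(common) / sum(category)


# Selected estimates from Table 5 (fatal vs property damage crashes).
table5_fatal = {
    "Rural vs Urban": 0.7018,
    "Head-on vs SSSD": 3.052,
    "DUI No vs Yes": -1.2111,
}
ratios = {factor: round(odds_ratio(est), 2) for factor, est in table5_fatal.items()}

# Category 1 counts from Table 6, ordered as fatal, major, minor, complain of pain.
common_cat1 = [2, 7, 10, 4]      # factors common to Pulaski County and Category 1
category_cat1 = [9, 10, 13, 10]  # factors identified for Category 1
pct_cat1 = percent_common(common_cat1, category_cat1)

for factor, ratio in ratios.items():
    print(f"{factor}: odds ratio {ratio}")
print(f"Category 1 commonly identified factors: {pct_cat1:.0f}%")
```

An odds ratio above 1 (for example, head-on versus sideswipe same direction) indicates higher odds of the more severe outcome relative to a property damage crash, while a value below 1 indicates lower odds.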