
Uncovering Information Hidden in Crash Narratives: Enhancing Safety Reporting for Pedestrians and Bicyclists in Transit Bus Collisions

Project Description

Pedestrians and bicyclists are among the least protected road users and are disproportionately affected by traffic crashes. In 2022, the National Highway Traffic Safety Administration reported 42,795 traffic fatalities in the United States, including 7,522 pedestrians (18 percent) and 1,105 bicyclists (3 percent). Transit buses were involved in 6,731 crashes in 2022, including collisions with pedestrians and bicyclists. These statistics underscore the safety risks that pedestrians and bicyclists face, particularly in interactions with transit buses, which often result in severe outcomes because of the physical characteristics of buses and the greater exposure of pedestrians and bicyclists. Despite ongoing improvements in crash reporting, existing datasets lack the detailed behavioral and contextual factors needed to inform targeted interventions. Structured data alone often fails to capture important elements such as pedestrian or bicyclist distraction, alcohol impairment, pedestrian crossing violations, or unusual crash scenarios. As a result, safety professionals and transit agencies are limited in their ability to develop strategies that reduce injury severity and prevent future incidents. This project addresses these deficiencies by leveraging the rich crash narrative data in the National Transit Database Major Safety Events Dataset. It applies artificial intelligence and natural language processing methods, including text mining, topic modeling (Latent Dirichlet Allocation), hierarchical clustering, and contextual analysis, to extract meaningful patterns from thousands of crash narratives involving pedestrians and bicyclists. Initial results from over 5,600 crash narratives demonstrate that 20 distinct crash topics can be identified using topic modeling. These topics represent a wide range of behavioral and situational factors, such as pedestrian and bicyclist distraction, limited visibility, and boarding or alighting incidents. By transforming unstructured text into structured indicators, this project enhances transit safety reporting and provides new insights to inform safety practices for pedestrians and bicyclists involved in bus-related crashes.
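The topic-modeling step named above can be illustrated with a toy sketch. The project description mentions Latent Dirichlet Allocation but does not say how the model is fitted, so everything below is an assumption made for illustration: the collapsed Gibbs sampler, the hyperparameters alpha and beta, the stopword list, the function names, and the four invented narratives (which are not taken from the National Transit Database). A real analysis of thousands of narratives would more likely rely on an established library.

```python
import random
import re

# Hypothetical stopword list and invented narratives, for illustration only.
STOPWORDS = {"the", "a", "an", "and", "of", "to", "was", "by", "at", "in",
             "on", "while", "from", "with", "as", "into"}
TOY_NARRATIVES = [
    "Pedestrian looking at phone stepped into the path of the bus",
    "Pedestrian distracted by phone walked into the side of the bus",
    "Passenger fell while alighting from the bus at the stop",
    "Passenger slipped while boarding the bus at the stop",
]


def tokenize(text):
    """Lowercase, keep alphabetic tokens, drop stopwords."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]


def lda_gibbs(docs, n_topics, n_iter=200, alpha=0.1, beta=0.01, seed=0):
    """Fit LDA to tokenized docs with a collapsed Gibbs sampler.

    Returns (vocab, doc_topic counts, topic_word counts).
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    n_words = len(vocab)

    doc_topic = [[0] * n_topics for _ in docs]
    topic_word = [[0] * n_words for _ in range(n_topics)]
    topic_total = [0] * n_topics
    assignments = []

    # Start from a random topic assignment for every token.
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            z = rng.randrange(n_topics)
            z_doc.append(z)
            doc_topic[d][z] += 1
            topic_word[z][word_id[w]] += 1
            topic_total[z] += 1
        assignments.append(z_doc)

    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                wid = word_id[w]
                z = assignments[d][i]
                # Remove the token's current assignment from the counts.
                doc_topic[d][z] -= 1
                topic_word[z][wid] -= 1
                topic_total[z] -= 1
                # Conditional probability of each topic, up to a constant.
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][wid] + beta)
                    / (topic_total[k] + n_words * beta)
                    for k in range(n_topics)
                ]
                z = rng.choices(range(n_topics), weights=weights)[0]
                assignments[d][i] = z
                doc_topic[d][z] += 1
                topic_word[z][wid] += 1
                topic_total[z] += 1

    return vocab, doc_topic, topic_word


def top_words(vocab, topic_word, n=3):
    """List the n highest-count words for each topic."""
    return [
        [vocab[i] for i in sorted(range(len(vocab)), key=lambda i: -row[i])[:n]]
        for row in topic_word
    ]
```

Calling lda_gibbs([tokenize(t) for t in TOY_NARRATIVES], n_topics=2) and then top_words on the result lists the most frequent words per topic. With so little text the topics are not meaningful; the sketch only shows the mechanics of assigning each word in a narrative to a latent topic.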

Outputs

The project will produce several key outputs aimed at enhancing transit safety analysis, data quality, and knowledge transfer. A comprehensive final research report will document the study’s methodology, findings, and recommendations, with a focus on how unstructured crash narratives can be transformed into structured insights using natural language processing. Peer-reviewed journal papers and conference presentations, such as those at the Transportation Research Board (TRB), will disseminate technical results on topic modeling, clustering, and narrative analysis for transit safety. A cleaned and enriched crash database will be developed by integrating structured transit crash records with new variables extracted from narrative data, such as indicators of distraction, impairment, and unusual crash scenarios. This enhanced dataset will be made available for further research and policy analysis. In support of technology transfer, the research team will engage stakeholders through webinars and presentations tailored to transportation agencies and planning organizations. To advance education and workforce development, the project will actively involve graduate students in data science and transportation safety research. Students will gain hands-on experience applying AI methods to real-world safety data and will integrate findings into their thesis and dissertation work. Insights from this research will also be incorporated into the curriculum of a graduate-level transportation safety course at the University of Tennessee, Knoxville. Collectively, these outputs will enhance the analytical capabilities of the transportation safety community and lay the groundwork for more informed crash reporting and safer transit environments for pedestrians and bicyclists.
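One simple way to picture the enriched crash database is as structured records joined to indicator flags derived from each narrative. The sketch below is a hypothetical illustration, not the project's method: the keyword patterns, the field names (event_id, mode), and the sample records and narratives are all invented, and the project may well derive its indicators from topic models rather than keyword rules.

```python
import re

# Hypothetical keyword rules; real indicators would need careful validation.
INDICATOR_PATTERNS = {
    "distraction": re.compile(
        r"\b(distract\w*|cell ?phone|headphones|texting)\b", re.IGNORECASE),
    "impairment": re.compile(
        r"\b(intoxicat\w*|impair\w*|alcohol|drunk)\b", re.IGNORECASE),
    "crossing_violation": re.compile(
        r"\b(jaywalk\w*|against the (signal|light)"
        r"|outside (of )?(the )?crosswalk)\b", re.IGNORECASE),
}

# Invented structured records and narratives, keyed by an assumed event_id.
SAMPLE_RECORDS = [
    {"event_id": 1, "mode": "bus"},
    {"event_id": 2, "mode": "bus"},
    {"event_id": 3, "mode": "bus"},
]
SAMPLE_NARRATIVES = {
    1: "Pedestrian was looking at a cell phone and stepped off the curb "
       "outside the crosswalk.",
    2: "Bicyclist appeared intoxicated and struck the side of the bus.",
}


def extract_indicators(narrative):
    """Return one True/False flag per indicator for a single narrative."""
    return {name: bool(pattern.search(narrative))
            for name, pattern in INDICATOR_PATTERNS.items()}


def enrich(records, narratives):
    """Attach narrative-derived flags to each structured record.

    Records without a narrative get all flags set to False.
    """
    enriched = []
    for record in records:
        text = narratives.get(record["event_id"], "")
        enriched.append({**record, **extract_indicators(text)})
    return enriched
```

Calling enrich(SAMPLE_RECORDS, SAMPLE_NARRATIVES) returns one dictionary per record with the original fields plus the three flags. Keyword rules like these miss paraphrases and negations ("was not distracted"), which is one reason a narrative-mining approach would need to go beyond them.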

Outcomes/Impacts

This research will substantially improve the understanding of transit bus collisions involving pedestrians and bicyclists by uncovering complex crash patterns and identifying unique crash types and rare edge-case scenarios that are often missed in conventional databases. By transforming unstructured narrative data into actionable insights, the study will enhance the quality and completeness of crash reporting systems, leading to more targeted safety interventions. The enriched crash database and analytical findings will guide transportation agencies in developing evidence-based operational strategies, infrastructure improvements, and driver training programs specifically tailored to high-risk conditions involving buses and pedestrians or bicyclists. Over time, these insights are expected to reduce the severity and frequency of crashes and enhance system reliability. The study’s findings will also inform policymakers and regulatory bodies on the importance of integrating narrative elements into national safety datasets, supporting more responsive and context-aware regulations. Furthermore, the application of advanced AI and NLP methods will demonstrate how innovative analytics can be successfully integrated into safety research, inspiring further adoption across public agencies and academic communities. Ultimately, this work will provide transportation professionals with data-driven tools to design safer multimodal systems and reduce crash-related costs and risks in urban transit environments.

Dates

12/1/2025 to 11/30/2026


Universities

University of Tennessee, Knoxville

 

Principal Investigator

Asad J. Khattak

akhattak@utk.edu

https://orcid.org/0000-0002-0790-7794

 

Candace Brakewood

cbrakewo@utk.edu

https://orcid.org/0000-0003-2769-7808

 

Project Partners

The University of Tennessee, Knoxville

Center for Transportation Research

 

Research Project Funding

Federal: $60,039

Non-Federal: $45,632

 

Contract Number

69A3552348336

 

Project Number

25UTK04

 

Research Priority

Promoting Safety
