Mining Police Crash Report Narratives: A Natural Language Processing Approach to Identify Bus-Stop Related Crashes

Project Description

Transit riders are a particularly vulnerable population: they often walk to and from bus stops, wait in areas where multiple transportation modes interact, and cross the road at least once during a round trip. Prior studies have identified a significant relationship between transit elements (i.e., stops, corridors, and ridership levels) and pedestrian crash locations. Yet national databases record few such crashes. The Fatality Analysis Reporting System (FARS) reported 196 transit bus stop-related pedestrian crashes (2014–2022), and the Crash Report Sampling System (CRSS) reported 93 (2016–2022). These small counts appear inconsistent with rising pedestrian crash trends in the U.S. They suggest potential underreporting due to inconsistent definitions, the lack of standardized fields for transit bus stop-related crashes, or variation in how crashes are coded.

To address this gap, artificial intelligence methods such as natural language processing (NLP), and specifically named entity recognition (NER), can extract transit bus stop-related details from police crash report narratives. NER will be applied to crash report datasets from Minnesota and Tennessee to identify such crashes. The model will be trained, validated, and tested for generalizability using metrics such as precision and recall. Results will be cross-analyzed with national databases (FARS, CRSS) to test the hypothesis that transit bus stop-related crashes are underreported. Misclassified cases will also be analyzed to identify error patterns. NER has been widely used to improve crash data quality, but it has not yet been applied specifically to identify transit bus stop-related crashes. This approach could streamline data collection, reduce manual review time, and improve the accuracy of pedestrian crash data. By addressing a critical gap in crash reporting, this work will improve the ability to study the risks faced by transit riders and inform safety improvements at bus stops.
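The description does not specify the NER model or its features. The sketch below is therefore a minimal, hypothetical illustration written in Python rather than R. It swaps in a simple rule-based pattern matcher for a trained NER model and shows two pieces the description names. First, it tags bus stop mentions in a narrative as labeled character spans. Second, it scores crash-level flags with precision and recall against manually reviewed labels. The patterns, the `BUS_STOP` label, the function names, and the example narratives are all invented for illustration.

```python
import re
from dataclasses import dataclass

# Hypothetical surface patterns for bus stop mentions. A trained NER model
# would learn such cues from annotated narratives rather than hard-coding them.
BUS_STOP_PATTERNS = [
    r"\bbus\s+stop\b",
    r"\bbus\s+shelter\b",
    r"\btransit\s+stop\b",
    r"\bwaiting\s+for\s+(?:the|a)\s+bus\b",
]
BUS_STOP_RE = re.compile("|".join(BUS_STOP_PATTERNS), re.IGNORECASE)


@dataclass
class Entity:
    text: str
    start: int  # character offset where the mention begins
    end: int    # character offset just past the mention
    label: str = "BUS_STOP"  # hypothetical entity label


def tag_bus_stop_entities(narrative: str) -> list[Entity]:
    """Return every span in the narrative that matches a bus stop pattern."""
    return [Entity(m.group(0), m.start(), m.end())
            for m in BUS_STOP_RE.finditer(narrative)]


def is_bus_stop_related(narrative: str) -> bool:
    """Flag a crash as bus stop-related if at least one entity is found."""
    return bool(tag_bus_stop_entities(narrative))


def precision_recall(predicted: list[bool], actual: list[bool]) -> tuple[float, float]:
    """Crash-level precision and recall against manually reviewed labels."""
    pairs = list(zip(predicted, actual))
    tp = sum(1 for p, a in pairs if p and a)      # flagged and truly related
    fp = sum(1 for p, a in pairs if p and not a)  # flagged but not related
    fn = sum(1 for p, a in pairs if a and not p)  # related but missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


if __name__ == "__main__":
    # Made-up narratives and labels, for illustration only.
    narratives = [
        "PED WAS STANDING AT THE BUS STOP WHEN V1 LEFT THE ROADWAY.",
        "V1 REAR-ENDED V2 AT THE INTERSECTION.",
        "PED EXITED THE BUS AND CROSSED IN FRONT OF IT.",
        "V1 STRUCK V2, WHICH WAS WAITING FOR A BUS TO PASS.",
    ]
    actual = [True, False, True, False]
    predicted = [is_bus_stop_related(n) for n in narratives]
    print(predicted)  # [True, False, False, True]
    print(precision_recall(predicted, actual))  # (0.5, 0.5)
```

Precision is the share of flagged crashes that are truly bus stop-related. Recall is the share of truly related crashes that get flagged. In this toy run, the pattern list misses the "exited the bus" narrative, which is a false negative. It also fires on "waiting for a bus to pass", which is a false positive. These are the kinds of error patterns that a review of misclassified cases is meant to surface. A trained model would replace the fixed pattern list.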

Outputs

A key output of this project is a domain-specific NER algorithm that state agencies can use to accurately classify transit bus stop-related crashes from police-reported crash narratives. The code will be developed using open-source software (such as the statistical programming language R) and posted on platforms such as GitHub, along with documentation on how to run it. This will help enable technology transfer to other interested users, such as state DOTs, transit agencies, consultants, and other practitioners.

Outcomes/Impacts

This project will deliver a tool that automatically identifies transit bus stop-related crashes from police narratives, enabling more consistent crash data collection. Improved documentation of these crashes will support research on the risks faced by transit passengers walking to, from, or waiting at bus stops. These insights can guide policymakers and practitioners in selecting countermeasures that enhance pedestrian safety and reduce serious traffic crashes in the United States.

Dates

12/1/2025 to 11/30/2026

Universities

University of Tennessee Knoxville

Principal Investigator

Candace Brakewood

cbrakewo@utk.edu

https://orcid.org/0000-0003-2769-7808

Project Partners

None

Research Project Funding

Federal: $84,983

Non-Federal: $19,206

Contract Number

69A3552348336

Project Number

25UTK05

Research Priority

Promoting Safety