
Complementary Data Sources for Road Transport Statistics: Use of Machine Learning in Providing Additional Insights into Road Crashes

Symbol: E/ESCWA/CL4.SIT/2020/TP.16
Issued in: 2021

This technical paper originated from a recent transport data questionnaire that ESCWA circulated among the national statistical offices (NSOs) of the Arab countries. ESCWA then launched a pilot project on the use of complementary data sources on car crashes. The initial aim was to check whether data provided by NSOs could be combined with, or complemented by, other data sources, including big data, to better understand the causes of car crashes. A second aim was to show member countries affordable, readily available solutions for improving how they collect and analyse such data. Data sources from both the private and public sectors were investigated, and several key challenges emerged: siloed data, a lack of regulatory frameworks for data sharing, a lack of transparency between governmental agencies, and general sensitivities towards sharing data.

Given these challenges, open data on car crashes from the United Kingdom were used instead, and street data from OpenStreetMap (OSM) were added as a complementary source to show what data analytics and new data sources can offer. Three machine learning models (decision trees, gradient-boosted trees and random forests) were then used to predict the injury severity of a crash. The preliminary results point to possible relationships between the input features and crash severity. Further work is needed to obtain more reliable results and to bring the experimental nature of machine learning more in line with official statistics, especially as the necessary data become more accessible. The exercise highlights the potential benefits of using machine learning algorithms to understand car crashes. These include the ability to follow the logic behind a decision tree's predictions and to see which features the models consider important.
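To make the two benefits mentioned above more concrete (readable prediction logic and feature importance), here is a minimal pure-Python sketch of a single CART-style decision tree classifier. It is not the paper's model or code. The feature names `speed_limit`, `dark` and `rural` and all records are hypothetical, synthetic stand-ins, not the UK crash or OSM variables. Importance is computed as the weighted decrease in Gini impurity per feature, similar in spirit to the impurity-based importances that common libraries report. Random forests and gradient-boosted trees combine many such trees, which this sketch does not attempt.

```python
from collections import Counter

# HYPOTHETICAL feature names and SYNTHETIC records, not the UK crash data or
# the paper's variables: (speed limit in mph, dark: 1/0, rural road: 1/0).
FEATURES = ["speed_limit", "dark", "rural"]
ROWS = [(30, 0, 0), (30, 1, 0), (40, 0, 0), (40, 1, 1), (60, 0, 1), (60, 1, 1),
        (70, 0, 1), (70, 1, 1), (70, 1, 0), (30, 0, 1), (30, 1, 1)]
LABELS = ["slight", "slight", "slight", "slight", "slight", "serious",
          "slight", "serious", "serious", "slight", "slight"]


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values()) if n else 0.0


def best_split(rows, labels):
    """Try every (feature, threshold) pair; return (feature, threshold, gain) or None."""
    parent, n, best = gini(labels), len(rows), None
    for f in range(len(rows[0])):
        for t in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            if not left or not right:
                continue
            gain = parent - (len(left) * gini(left) + len(right) * gini(right)) / n
            if best is None or gain > best[2]:
                best = (f, t, gain)
    return best


def build(rows, labels, depth, max_depth, importances, total):
    """Grow the tree recursively, adding each split's weighted impurity decrease."""
    split = best_split(rows, labels) if depth < max_depth else None
    if split is None or split[2] <= 0:
        # Pure node, depth limit or no useful split: predict the majority class.
        return {"leaf": Counter(labels).most_common(1)[0][0]}
    f, t, gain = split
    importances[f] += gain * len(rows) / total
    go_left = [r[f] <= t for r in rows]

    def subset(side):
        return ([r for r, g in zip(rows, go_left) if g == side],
                [y for y, g in zip(labels, go_left) if g == side])

    left_rows, left_labels = subset(True)
    right_rows, right_labels = subset(False)
    return {"feature": f, "threshold": t,
            "left": build(left_rows, left_labels, depth + 1, max_depth, importances, total),
            "right": build(right_rows, right_labels, depth + 1, max_depth, importances, total)}


def fit(rows, labels, max_depth=3):
    """Return (tree, feature importances normalised to sum to 1)."""
    importances = [0.0] * len(rows[0])
    tree = build(rows, labels, 0, max_depth, importances, len(rows))
    total = sum(importances)
    return tree, [v / total if total else 0.0 for v in importances]


def predict(node, row):
    """Follow the if/else path from the root down to a leaf."""
    while "leaf" not in node:
        node = node["left"] if row[node["feature"]] <= node["threshold"] else node["right"]
    return node["leaf"]


def rules(node, names, indent=""):
    """Render the tree as human-readable if/else rules."""
    if "leaf" in node:
        return [f"{indent}predict: {node['leaf']}"]
    name, t = names[node["feature"]], node["threshold"]
    return ([f"{indent}if {name} <= {t}:"] + rules(node["left"], names, indent + "  ")
            + [f"{indent}else:  # {name} > {t}"] + rules(node["right"], names, indent + "  "))


if __name__ == "__main__":
    tree, importances = fit(ROWS, LABELS)
    print("\n".join(rules(tree, FEATURES)))
    for name, score in zip(FEATURES, importances):
        print(f"{name}: {score:.2f}")
```

On this toy data the tree splits first on `speed_limit`, then on `dark`, and `rural` receives zero importance. The printed rules show why a given record is predicted as serious, which is the kind of transparency the summary attributes to decision trees.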