Evaluation and Comparison of Classification Models for Predicting Crash Severity

Rossi Passarella; Isbatudinia Isbatudinia; Mastura Diana Marieska; Dedy Kurniawan; Romi Fadillah Rahmat; Harumi Veny

doi:10.3311/PPtr.41673

Authors

Rossi Passarella

Affiliation

Department of Computer Engineering, Faculty of Computer Science, Universitas Sriwijaya, Jalan Palembang-Prabumulih, KM 32 Inderalaya, 30662 Kabupaten Ogan Ilir, South Sumatera, Indonesia
Isbatudinia Isbatudinia

Affiliation

Department of Computer Engineering, Faculty of Computer Science, Universitas Sriwijaya, Jalan Palembang-Prabumulih, KM 32 Inderalaya, 30662 Kabupaten Ogan Ilir, South Sumatera, Indonesia
Mastura Diana Marieska

Affiliation

Department of Informatics, Faculty of Computer Science, Universitas Sriwijaya, Jalan Palembang-Prabumulih, KM 32 Inderalaya, 30662 Kabupaten Ogan Ilir, South Sumatera, Indonesia
Dedy Kurniawan

Affiliation

Department of Information System, Faculty of Computer Science, Universitas Sriwijaya, Jalan Palembang-Prabumulih, KM 32 Inderalaya, 30662 Kabupaten Ogan Ilir, South Sumatera, Indonesia
Romi Fadillah Rahmat

Affiliation

Department of Information Technology, Faculty of Computer Science and Information Technology, Universitas Sumatera Utara, 9 Universitas Road, 20155 Medan, North Sumatra, Indonesia
Harumi Veny

Affiliation

Faculty of Chemical Engineering, MARA University of Technology, Jalan Ilmu 1/1, 40450 Shah Alam, Selangor Darul Ehsan, Malaysia

Abstract

Traffic accidents pose a significant challenge to modern transportation systems, impacting both safety and infrastructure. This study aimed to evaluate and compare the effectiveness of seven classification models (Random Forest, XGBoost, Logistic Regression, Decision Tree, Gaussian Naive Bayes, Support Vector Machine (SVM), and K-Nearest Neighbors (KNN)) in predicting the severity of traffic accidents. The study used the UK Road Safety Data, pre-processed through variable selection, outlier treatment, and data normalization. The models' hyperparameters were tuned using Bayesian optimization, and their performance was assessed based on accuracy, precision, recall, and F1-score. The results indicated that Random Forest achieved the highest accuracy of 99.3% on unseen data, closely followed by XGBoost at 99%. Notably, the Random Forest model in this study outperformed the XGBoost model reported in a similar study. The statistical findings emphasize the benefit of employing a comprehensive set of accident-related variables, advanced pre-processing techniques, and optimized hyperparameter tuning for developing reliable crash severity prediction models. Based on the findings, the Random Forest model is strongly recommended for practical implementation to enhance road safety.
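
As an illustration of the four evaluation measures named in the abstract, the following is a minimal sketch of how accuracy, precision, recall, and F1-score can be computed for a multi-class severity label. It is not the authors' code: the function name `classification_metrics`, the severity labels, the toy predictions, and the choice of macro averaging across classes are all assumptions made here for illustration.

```python
from collections import Counter


def classification_metrics(y_true, y_pred):
    """Return accuracy and macro-averaged precision, recall, and F1.

    Macro averaging (an unweighted mean over classes) is an assumption;
    the source does not state which averaging scheme was used.
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equal length")

    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()

    # Count true positives, false positives, and false negatives per class.
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1
            fn[truth] += 1

    def safe_div(num, den):
        # Treat an undefined ratio (zero denominator) as 0.0.
        return num / den if den else 0.0

    precisions, recalls, f1_scores = [], [], []
    for label in labels:
        precision = safe_div(tp[label], tp[label] + fp[label])
        recall = safe_div(tp[label], tp[label] + fn[label])
        precisions.append(precision)
        recalls.append(recall)
        f1_scores.append(safe_div(2 * precision * recall, precision + recall))

    n_classes = len(labels)
    return {
        "accuracy": sum(tp.values()) / len(y_true),
        "precision": sum(precisions) / n_classes,
        "recall": sum(recalls) / n_classes,
        "f1": sum(f1_scores) / n_classes,
    }


# Hypothetical toy data, not taken from the study.
y_true = ["slight", "slight", "serious", "fatal"]
y_pred = ["slight", "serious", "serious", "fatal"]
metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

Reporting precision, recall, and F1 alongside accuracy matters when severity classes are imbalanced, because a model can reach high accuracy while rarely predicting the less frequent classes.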

Keywords:

traffic accidents, severity prediction, machine learning, classification models, random forest, road safety
