Evaluation and Comparison of Classification Models for Predicting Crash Severity

Authors

  • Rossi Passarella
    Affiliation
    Department of Computer Engineering, Faculty of Computer Science, Universitas Sriwijaya, Jalan Palembang-Prabumulih, KM 32 Inderalaya, 30662 Kabupaten Ogan Ilir, South Sumatera, Indonesia
  • Isbatudinia Isbatudinia
    Affiliation
    Department of Computer Engineering, Faculty of Computer Science, Universitas Sriwijaya, Jalan Palembang-Prabumulih, KM 32 Inderalaya, 30662 Kabupaten Ogan Ilir, South Sumatera, Indonesia
  • Mastura Diana Marieska
    Affiliation
    Department of Informatics, Faculty of Computer Science, Universitas Sriwijaya, Jalan Palembang-Prabumulih, KM 32 Inderalaya, 30662 Kabupaten Ogan Ilir, South Sumatera, Indonesia
  • Dedy Kurniawan
    Affiliation
    Department of Information System, Faculty of Computer Science, Universitas Sriwijaya, Jalan Palembang-Prabumulih, KM 32 Inderalaya, 30662 Kabupaten Ogan Ilir, South Sumatera, Indonesia
  • Romi Fadillah Rahmat
    Affiliation
    Department of Information Technology, Faculty of Computer Science and Information Technology, Universitas Sumatera Utara, 9 Universitas Road, 20155 Medan, North Sumatra, Indonesia
  • Harumi Veny
    Affiliation
    Faculty of Chemical Engineering, MARA University of Technology, Jalan Ilmu 1/1, 40450 Shah Alam, Selangor Darul Ehsan, Malaysia
https://doi.org/10.3311/PPtr.41673

Abstract

Traffic accidents pose a significant challenge to modern transportation systems, impacting both safety and infrastructure. This study evaluated and compared the effectiveness of seven classification models in predicting the severity of traffic accidents: Random Forest, XGBoost, Logistic Regression, Decision Tree, Gaussian Naive Bayes, Support Vector Machine (SVM), and K-Nearest Neighbors (KNN). The study used the UK Road Safety Data, pre-processed through variable selection, outlier treatment, and data normalization. The models' hyperparameters were tuned using Bayesian optimization, and performance was assessed by accuracy, precision, recall, and F1-score. Random Forest achieved the highest accuracy on unseen data at 99.3%, closely followed by XGBoost at 99%. Notably, the Random Forest model in this study outperformed the XGBoost model reported in a similar study. The statistical findings emphasize the benefit of combining a comprehensive set of accident-related variables, advanced pre-processing techniques, and optimized hyperparameter tuning when developing reliable crash severity prediction models. Based on these findings, the Random Forest model is strongly recommended for practical implementation to enhance road safety.
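
As an illustration of the four evaluation metrics named in the abstract, the following minimal sketch computes accuracy together with macro-averaged precision, recall, and F1-score for a multi-class severity label. The function name classification_metrics, the macro-averaging choice, and the toy labels are assumptions made for this example; the abstract does not state which averaging scheme the authors used, and the values below are not taken from the study.

```python
from collections import Counter


def classification_metrics(y_true, y_pred):
    """Return accuracy and macro-averaged precision, recall, and F1.

    Hypothetical helper for illustration only; macro averaging is an
    assumption, not something the abstract specifies.
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equal in length")

    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1   # correct prediction for this class
        else:
            fp[pred] += 1    # predicted class gains a false positive
            fn[truth] += 1   # true class gains a false negative

    precisions, recalls, f1s = [], [], []
    for label in labels:
        predicted = tp[label] + fp[label]
        actual = tp[label] + fn[label]
        precision = tp[label] / predicted if predicted else 0.0
        recall = tp[label] / actual if actual else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)

    n = len(labels)
    return {
        "accuracy": sum(tp.values()) / len(y_true),
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }


# Toy labels, made up for this example (not data from the study).
y_true = ["slight", "slight", "slight", "serious", "serious", "fatal"]
y_pred = ["slight", "slight", "serious", "serious", "serious", "fatal"]
metrics = classification_metrics(y_true, y_pred)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Macro averaging weights every severity class equally, which matters when rare classes such as fatal crashes are heavily outnumbered; a weighted or micro average would give different numbers on imbalanced data.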

Keywords:

traffic accidents, severity prediction, machine learning, classification models, random forest, road safety

Published Online

2026-03-26

How to Cite

Passarella, R., Isbatudinia, I., Diana Marieska, M., Kurniawan, D., Fadillah Rahmat, R., Veny, H. (2026) “Evaluation and Comparison of Classification Models for Predicting Crash Severity”, Periodica Polytechnica Transportation Engineering. https://doi.org/10.3311/PPtr.41673

Section

Articles