Evaluation and Comparison of Classification Models for Predicting Crash Severity
Abstract
Traffic accidents pose a significant challenge to modern transportation systems, affecting both safety and infrastructure. This study evaluated and compared seven classification models (Random Forest, XGBoost, Logistic Regression, Decision Tree, Gaussian Naive Bayes, Support Vector Machine (SVM), and K-Nearest Neighbors (KNN)) for predicting the severity of traffic accidents. The models were trained on the Road Safety Data from the UK, pre-processed through variable selection, outlier treatment, and data normalization. Hyperparameters were tuned using Bayesian optimization, and performance was assessed using accuracy, precision, recall, and F1-score. Random Forest achieved the highest accuracy on unseen data at 99.3%, closely followed by XGBoost at 99%. Notably, the Random Forest model in this study outperformed the XGBoost model reported in a similar study. The findings emphasize the benefit of combining a comprehensive set of accident-related variables, advanced pre-processing techniques, and optimized hyperparameter tuning when developing reliable crash severity prediction models. Based on these findings, the Random Forest model is strongly recommended for practical implementation to enhance road safety.
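
For readers unfamiliar with the four evaluation metrics named above, the following is a minimal sketch of how they can be computed from predicted and true severity labels. It is illustrative only and is not the code used in this study: the function name `severity_metrics`, the example labels ("slight", "serious", "fatal"), and the choice of macro averaging across classes are all assumptions made for the example.

```python
from collections import Counter


def severity_metrics(y_true, y_pred):
    """Return accuracy and macro-averaged precision, recall, and F1-score.

    Hypothetical helper for illustration; macro averaging (an unweighted
    mean over classes) is an assumption, not a detail taken from the study.
    """
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and equal in length")

    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1   # correct prediction for this class
        else:
            fp[pred] += 1    # predicted class was wrong
            fn[truth] += 1   # true class was missed

    labels = sorted(set(y_true) | set(y_pred))
    precisions, recalls, f1s = [], [], []
    for label in labels:
        predicted = tp[label] + fp[label]
        actual = tp[label] + fn[label]
        precision = tp[label] / predicted if predicted else 0.0
        recall = tp[label] / actual if actual else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)

    n = len(labels)
    return {
        "accuracy": sum(tp.values()) / len(y_true),
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }


# Toy example with made-up labels (not data from the study).
y_true = ["slight", "slight", "serious", "fatal"]
y_pred = ["slight", "serious", "serious", "fatal"]
metrics = severity_metrics(y_true, y_pred)
```

Reporting precision, recall, and F1-score alongside accuracy matters when severity classes are imbalanced, because a model can reach high accuracy by favoring the most frequent class.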

