The Use of Machine Learning Models and SHAP Interaction Values to Predict the Soil Swelling Index
Abstract
Predicting the soil swelling index (CS) is crucial for geotechnical engineers to ensure the stability of civil engineering structures. Recently, machine learning (ML) models have attracted great interest from researchers for predicting the soil swelling index. However, due to the black-box nature of ML models, their predictions remain difficult to interpret. This study aims to predict the soil swelling index using ML algorithms and to interpret the resulting predictions. First, it evaluates the predictive capability of the Gaussian process regression (GPR) algorithm and compares it with that of the artificial neural network (ANN) for predicting the soil swelling index. Second, the SHAP (SHapley Additive exPlanations) algorithm, a recent explainable artificial intelligence (XAI) method, is applied to interpret the predictions of the complex GPR and ANN models. The compiled experimental database covers 362 clayey samples gathered from different sites located in Northern Algeria. The modeling involved six input features, namely the liquid limit (LL), plastic limit (PL), plasticity index (PI), water content (ωn), dry density (γd), and void ratio (e), to predict the soil swelling index. Statistical metrics showed good performance for both models, with R² = 0.78 for GPR and R² = 0.79 for ANN. A comparative study based on the Wilcoxon signed-rank test and the sign test indicated that the ANN outperforms the GPR. Based on the interpretations obtained with the SHAP algorithm, the liquid limit (LL) and plastic limit (PL) are the two main input features influencing CS, with higher values of LL and PL increasing the model's output.
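To make the attribution idea behind SHAP concrete, the following is a minimal sketch that computes exact Shapley values for a single sample over the six input features. It is not the study's GPR or ANN model and does not use the SHAP library: `toy_model`, its coefficients, and the `sample` and `baseline` values are hypothetical and chosen only for illustration. Features outside a coalition are replaced by a baseline value, which is one common simplification.

```python
from itertools import combinations
from math import factorial

# The six input features named in the abstract (ASCII names are an assumption).
FEATURES = ["LL", "PL", "PI", "wn", "gamma_d", "e"]


def toy_model(x):
    """Hypothetical stand-in for a trained regressor; coefficients are made up."""
    return (0.002 * x["LL"] + 0.001 * x["PL"] + 0.0005 * x["PI"]
            + 0.0003 * x["wn"] - 0.01 * x["gamma_d"] + 0.02 * x["e"]
            + 0.00001 * x["LL"] * x["PL"])  # simple LL-PL interaction term


def shapley_values(model, sample, baseline):
    """Exact Shapley values for one sample relative to a baseline point."""
    n = len(FEATURES)

    def value(subset):
        # Features in the coalition take the sample's value, the rest the baseline's.
        mixed = {f: (sample[f] if f in subset else baseline[f]) for f in FEATURES}
        return model(mixed)

    phi = {}
    for f in FEATURES:
        others = [g for g in FEATURES if g != f]
        total = 0.0
        for k in range(n):  # coalition sizes 0 .. n-1
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = set(subset)
                # Marginal contribution of f when added to coalition s.
                total += weight * (value(s | {f}) - value(s))
        phi[f] = total
    return phi


# Hypothetical sample and baseline (illustrative numbers, not data from the study).
sample = {"LL": 60.0, "PL": 28.0, "PI": 32.0, "wn": 22.0, "gamma_d": 16.5, "e": 0.70}
baseline = {"LL": 45.0, "PL": 23.0, "PI": 22.0, "wn": 18.0, "gamma_d": 17.0, "e": 0.60}

phi = shapley_values(toy_model, sample, baseline)
for name in sorted(phi, key=lambda f: abs(phi[f]), reverse=True):
    print(f"{name:8s} {phi[name]:+.5f}")
```

By construction, the attributions sum to the difference between the model output at the sample and at the baseline. Exact enumeration visits all 2^6 = 64 coalitions, which is cheap for six features; for larger feature sets or real models, approximate explainers are the usual choice.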