Multi-class Traffic Accident Risk Assessment Based on Interpretable Random Forest

Abstract: To examine how strongly different factors influence different accident categories, three groups of factors were considered: road conditions, weather conditions and traffic flow state. An improved road traffic accident risk assessment model based on random forest was established, with its hyperparameters tuned by grid search. The model assesses whether there is an accident risk, whether the risk is of a vehicle-vehicle or a human-vehicle collision, and whether it is of an injury or a fatal accident. To quantify the contribution of each influencing factor to the assessment results, a method for explaining the causes of traffic accident risk based on SHAP (Shapley additive explanations) was proposed. Accident data from the Jingkai Expressway, the South Sixth Ring Road and other road sections in Beijing were used to calibrate and test the model, and the results were compared with traditional random forest, logistic regression and support vector machine (SVM) models. The results show that the model performs best on human-vehicle collision risk: test accuracy is high, and recall (REC) improves by 30%, 40% and 40% over the traditional random forest, logistic regression and SVM models, respectively. Performance on overall traffic accident risk and injury accident risk ranks second, with improvements of about 20%, 10% and 10% over the same three baseline models. For vehicle-vehicle collisions, recall improves by 30% over the logistic regression model. There is no significant improvement for fatal accidents. Among the influencing factors, space headway in the current lane, time occupancy and precipitation contribute about 30%, 30% and 10%, respectively, to the overall accident risk assessment. For each of the accident subcategories, however, precipitation is the dominant factor, followed by space headway in the current lane and time occupancy.
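The SHAP-based explanation rests on Shapley values, which split the gap between a model's output for one sample and its output for a reference sample among the input features. The sketch below computes exact Shapley values by enumerating feature coalitions, using only the Python standard library. It is an illustration only: the scoring function `toy_risk`, its thresholds and weights, the feature names and the baseline sample are all hypothetical and are not the paper's model or data. In practice a tree-specific SHAP estimator would be applied to the fitted random forest rather than this brute-force enumeration.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values of predict at x, relative to a baseline sample.

    Features outside a coalition take their baseline value.
    The cost grows as 2^n, so this is only usable for a few features.
    """
    names = list(x)
    n = len(names)

    def value(coalition):
        point = {k: (x[k] if k in coalition else baseline[k]) for k in names}
        return predict(point)

    phi = {}
    for i in names:
        others = [k for k in names if k != i]
        total = 0.0
        for size in range(n):
            # Shapley weight for coalitions of this size that exclude feature i
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {i}) - value(s))
        phi[i] = total
    return phi


def toy_risk(p):
    """Hypothetical risk score; thresholds and weights are made up."""
    short_headway = p["headway_m"] < 30
    high_occupancy = p["occupancy"] > 0.5
    raining = p["rain"] == 1
    score = 0.1
    score += 0.3 * short_headway
    score += 0.3 * high_occupancy
    score += 0.1 * raining
    score += 0.2 * (raining and short_headway)  # interaction term
    return score


# Hypothetical sample to explain and hypothetical reference sample
x = {"headway_m": 20, "occupancy": 0.6, "rain": 1}
baseline = {"headway_m": 80, "occupancy": 0.2, "rain": 0}

phi = shapley_values(toy_risk, x, baseline)
for name, contribution in phi.items():
    print(f"{name}: {contribution:.3f}")
```

The contributions sum to the difference between the score at `x` and the score at `baseline`, and the interaction between rain and short headway is shared equally between those two features, which is why each of them receives more than its standalone weight.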

       
