Data Cleaning Method for Municipal Wastewater Treatment Based on Improved Random Forest

    • Abstract: Operating data from municipal wastewater treatment processes often contain mixed types of abnormal data. To address this problem, a data cleaning method based on an improved random forest is proposed. First, an isolation forest anomaly detection model is designed to identify outliers in the data. Second, an improved random forest regression model is established, which makes the random forest better able to handle mixed types of abnormal data and fits and predicts the data trend. Finally, after the mixed abnormal data are removed, the improved random forest is used to compensate for the resulting missing data, completing the cleaning of the wastewater data. Tests on real operating data show that the method improves the accuracy of compensation for mixed-type missing data.
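The detect, remove, and compensate sequence described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the improvements made to the random forest, so the sketch uses a plain isolation forest on the measured values and a plain bagged regression-tree forest fitted on the time index. All function names, the 0.6 score threshold, the tree counts, and the synthetic sine-wave series with four injected spikes are assumptions made purely for illustration.

```python
import math
import random

def c_factor(n):
    """Average path length of an unsuccessful binary-search-tree lookup over n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (math.log(n - 1) + 0.5772156649) - 2.0 * (n - 1) / n

def build_itree(xs, depth, max_depth, rng):
    """Isolation tree over scalar values; a leaf is stored as its point count."""
    if depth >= max_depth or len(xs) <= 1 or min(xs) == max(xs):
        return len(xs)
    split = rng.uniform(min(xs), max(xs))
    left = [x for x in xs if x < split]
    right = [x for x in xs if x >= split]
    return (split, build_itree(left, depth + 1, max_depth, rng),
            build_itree(right, depth + 1, max_depth, rng))

def path_length(x, node):
    depth = 0
    while isinstance(node, tuple):
        node = node[1] if x < node[0] else node[2]
        depth += 1
    return depth + c_factor(node)  # leaf size stands in for the unbuilt subtree

def anomaly_scores(xs, n_trees=100, sample_size=256, seed=0):
    """Isolation-forest score in (0, 1]; values near 1 indicate outliers."""
    rng = random.Random(seed)
    size = min(sample_size, len(xs))
    max_depth = math.ceil(math.log2(size))
    trees = [build_itree(rng.sample(xs, size), 0, max_depth, rng) for _ in range(n_trees)]
    norm = n_trees * c_factor(size)
    return [2.0 ** (-sum(path_length(x, t) for t in trees) / norm) for x in xs]

def build_rtree(pts, min_leaf=3):
    """Regression tree on (t, y) pairs sorted by t; splits minimise squared error."""
    n = len(pts)
    total = sum(y for _, y in pts)
    if n < 2 * min_leaf:
        return total / n
    best_i, best_gain, run = None, 0.0, 0.0
    for i in range(1, n):
        run += pts[i - 1][1]
        if i < min_leaf or n - i < min_leaf or pts[i - 1][0] == pts[i][0]:
            continue
        gain = run * run / i + (total - run) ** 2 / (n - i)  # larger gain = lower SSE
        if best_i is None or gain > best_gain:
            best_i, best_gain = i, gain
    if best_i is None:
        return total / n
    threshold = (pts[best_i - 1][0] + pts[best_i][0]) / 2
    return (threshold, build_rtree(pts[:best_i], min_leaf),
            build_rtree(pts[best_i:], min_leaf))

def predict_tree(node, t):
    while isinstance(node, tuple):
        node = node[1] if t < node[0] else node[2]
    return node

def fit_forest(pts, n_trees=30, seed=0):
    """Bagging: each tree is fitted to a bootstrap resample of the kept points."""
    rng = random.Random(seed)
    return [build_rtree(sorted(rng.choices(pts, k=len(pts)))) for _ in range(n_trees)]

def clean_series(series, threshold=0.6):
    """Flag outliers, drop them, and refill the gaps with the forest's mean prediction."""
    scores = anomaly_scores(series)
    flagged = [t for t, s in enumerate(scores) if s > threshold]
    kept = [(t, y) for t, y in enumerate(series) if scores[t] <= threshold]
    forest = fit_forest(kept)
    cleaned = list(series)
    for t in flagged:
        cleaned[t] = sum(predict_tree(tree, t) for tree in forest) / len(forest)
    return cleaned, flagged

def make_demo_series(n=200, seed=1):
    """Hypothetical sensor signal: a noisy sine wave with four injected outliers."""
    rng = random.Random(seed)
    truth = [5.0 + math.sin(2 * math.pi * t / 100) for t in range(n)]
    series = [v + rng.gauss(0, 0.05) for v in truth]
    spikes = {30: 15.0, 85: -4.0, 130: 10.0, 175: 0.0}
    for t, v in spikes.items():
        series[t] = v
    return series, truth, sorted(spikes)

series, truth, spike_idx = make_demo_series()
cleaned, flagged = clean_series(series)
print("flagged indices:", flagged)
print("max abs error at injected spikes:",
      max(abs(cleaned[t] - truth[t]) for t in spike_idx))
```

Scoring only the measured value keeps the sketch short, but it can only catch outliers that fall outside the normal value range. Real operating data would likely need multivariate features and a tuned threshold, neither of which the abstract specifies.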

       
