Abstract:
To reduce the impact of different types of abnormal data in municipal wastewater treatment processes, this paper proposed a data cleaning method based on an improved random forest. First, an isolation forest anomaly detection model was designed to detect outliers. Second, the random forest regression model was improved so that it could predict mixed-type missing data. Third, the detected abnormal data were eliminated. Finally, the improved random forest was used to predict and compensate for the mixed-type missing data. The method was tested on municipal wastewater treatment data, and the results show that it improves the accuracy of compensation for mixed-type missing data.
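The detect, eliminate, and compensate pipeline can be sketched with scikit-learn. This is a minimal sketch under assumptions that do not come from the paper. "Mixed-type" is taken to mean continuous plus integer-coded categorical columns in one float array. Rows flagged by the isolation forest are dropped entirely. The imputation step is a missForest-style loop (a plain `RandomForestRegressor` or `RandomForestClassifier` per column, iterated from a mean/mode initial fill), which stands in for the paper's improved random forest because the abstract does not specify that model. The names `drop_outliers` and `impute_mixed` are hypothetical.

```python
import numpy as np
from sklearn.ensemble import IsolationForest, RandomForestClassifier, RandomForestRegressor


def drop_outliers(X, numeric, contamination=0.05, seed=0):
    """Flag outlier rows with an isolation forest and remove them.

    Assumption: only rows whose numeric columns are fully observed are scored;
    rows with missing numeric values are kept and left for imputation.
    """
    X = np.asarray(X, dtype=float)
    complete = ~np.isnan(X[:, numeric]).any(axis=1)
    forest = IsolationForest(contamination=contamination, random_state=seed)
    labels = forest.fit_predict(X[complete][:, numeric])  # -1 marks an outlier
    outlier = np.zeros(len(X), dtype=bool)
    outlier[np.where(complete)[0]] = labels == -1
    return X[~outlier], outlier


def impute_mixed(X, categorical, n_iter=3, seed=0):
    """missForest-style imputation (a stand-in, not the paper's improved model).

    Categorical columns (indices in `categorical`) must be integer-coded floats.
    """
    X = np.asarray(X, dtype=float).copy()
    missing = np.isnan(X)
    # Initial fill: mean for continuous columns, mode for categorical columns.
    for j in range(X.shape[1]):
        observed = X[~missing[:, j], j]
        if j in categorical:
            values, counts = np.unique(observed, return_counts=True)
            X[missing[:, j], j] = values[np.argmax(counts)]
        else:
            X[missing[:, j], j] = observed.mean()
    # Refine: predict each incomplete column from all the other columns.
    for _ in range(n_iter):
        for j in np.where(missing.any(axis=0))[0]:
            j = int(j)
            others = np.delete(X, j, axis=1)
            obs = ~missing[:, j]
            model_cls = RandomForestClassifier if j in categorical else RandomForestRegressor
            model = model_cls(n_estimators=100, random_state=seed)
            model.fit(others[obs], X[obs, j])
            X[~obs, j] = model.predict(others[~obs])
    return X
```

For example, `kept, flags = drop_outliers(X, numeric=[0, 1])` followed by `impute_mixed(kept, categorical={2})` cleans a three-column array whose last column is categorical. Using a classifier for categorical columns keeps every compensated value inside the observed category set, which plain regression would not guarantee.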