Abstract:
It is difficult to build accurate mathematical models of difficult-to-measure parameters, such as quality or environmental indices, in complex industrial processes because of the various physical and chemical reactions involved. Neural network-based data-driven modeling methods suffer from shortcomings such as poor interpretability and large sample requirements. To address these problems, a deep ensemble forest regression (DEFR) modeling method was proposed. First, a random sampling strategy over both the sample space and the feature space was used to obtain training subsets, which were then used to construct T sub-forest models based on decision trees (DT). Next, the layer regression vector was selected using the K-nearest neighbor (KNN) criterion and combined with the raw features to obtain the augmented layer regression vector, which gave the output of the input-layer forest model. The middle-layer forest models were constructed in the same way until the model depth reached the preset number of layers. Finally, the sub-models of the output-layer forest were constructed from the augmented layer regression vector of the preceding layer, and the prediction of the DEFR model was obtained by weighting the outputs of the T sub-forest models. The concrete compressive strength dataset from the University of California Irvine (UCI) repository and dioxin emission mass concentration data from an actual municipal solid waste incineration process were used to verify the effectiveness of the proposed method.
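The layer-wise procedure above can be illustrated with a small NumPy-only sketch. It is not the authors' implementation, and several parts rest on assumptions. The decision trees are a minimal CART-style regressor. Each sub-forest draws its training subset with a bootstrap over rows and a random 70% of the columns. The abstract does not define the "KNN criterion", so this sketch reads it hypothetically as keeping the K sub-forest outputs whose training RMSE is lowest, meaning the outputs "nearest" the target. The output-layer weights are assumed to be inverse-RMSE weights. For simplicity, the augmented vectors are built from in-sample layer outputs, whereas a practical cascade would more likely use out-of-fold predictions. All function names (`fit_defr`, `predict_defr`, and the others) are hypothetical.

```python
import numpy as np

def fit_tree(X, y, depth, min_leaf=2):
    """Greedy CART-style regression tree with squared-error splits; leaves are floats."""
    if depth == 0 or len(y) < 2 * min_leaf or np.all(y == y[0]):
        return float(y.mean())
    best = None
    for j in range(X.shape[1]):
        vals = np.unique(X[:, j])
        for t in (vals[:-1] + vals[1:]) / 2:  # midpoints between distinct values
            left = X[:, j] <= t
            nl = int(left.sum())
            if nl < min_leaf or len(y) - nl < min_leaf:
                continue
            sse = ((y[left] - y[left].mean()) ** 2).sum() + ((y[~left] - y[~left].mean()) ** 2).sum()
            if best is None or sse < best[0]:
                best = (sse, j, t, left)
    if best is None:
        return float(y.mean())
    _, j, t, left = best
    return (j, t, fit_tree(X[left], y[left], depth - 1, min_leaf),
            fit_tree(X[~left], y[~left], depth - 1, min_leaf))

def predict_tree(node, x):
    while isinstance(node, tuple):  # internal node: (feature, threshold, left, right)
        j, t, l, r = node
        node = l if x[j] <= t else r
    return node

def fit_subforest(X, y, n_trees, depth, rng):
    n = len(y)
    return [fit_tree(X[idx], y[idx], depth) for idx in (rng.integers(0, n, n) for _ in range(n_trees))]

def predict_subforest(trees, X):
    return np.mean([[predict_tree(t, x) for x in X] for t in trees], axis=0)

def fit_layer(X, y, T, n_trees, depth, rng, feat_frac=0.7):
    """Build T sub-forests, each on a random (rows, columns) training subset (assumed scheme)."""
    n, p = X.shape
    k = max(1, int(np.ceil(feat_frac * p)))
    subs = []
    for _ in range(T):
        rows = rng.integers(0, n, n)                    # sample-space sampling (bootstrap)
        cols = rng.choice(p, size=k, replace=False)     # feature-space sampling
        subs.append((cols, fit_subforest(X[rows][:, cols], y[rows], n_trees, depth, rng)))
    return subs

def layer_outputs(subs, X):
    """Layer regression vector: one column per sub-forest, shape (n, T)."""
    return np.column_stack([predict_subforest(f, X[:, cols]) for cols, f in subs])

def fit_defr(X, y, n_layers=2, T=4, K=2, n_trees=3, depth=3, seed=0):
    rng = np.random.default_rng(seed)
    layers, Z = [], X
    for _ in range(n_layers - 1):                       # input layer and middle layers
        subs = fit_layer(Z, y, T, n_trees, depth, rng)
        out = layer_outputs(subs, Z)
        rmse = np.sqrt(((out - y[:, None]) ** 2).mean(axis=0))
        keep = np.argsort(rmse)[:K]                     # hypothetical reading of the "KNN criterion"
        layers.append((subs, keep))
        Z = np.hstack([X, out[:, keep]])                # augmented layer regression vector
    subs = fit_layer(Z, y, T, n_trees, depth, rng)      # output layer
    rmse = np.sqrt(((layer_outputs(subs, Z) - y[:, None]) ** 2).mean(axis=0))
    w = 1.0 / (rmse + 1e-12)
    return layers, subs, w / w.sum()                    # assumed inverse-RMSE weights

def predict_defr(model, X):
    layers, subs, w = model
    Z = X
    for layer_subs, keep in layers:
        Z = np.hstack([X, layer_outputs(layer_subs, Z)[:, keep]])
    return layer_outputs(subs, Z) @ w                   # weighted sum of the T sub-forest outputs
```

For example, `model = fit_defr(X, y, n_layers=3)` builds an input layer, one middle layer, and an output layer, and `predict_defr(model, X_new)` returns one prediction per row. Re-attaching the raw features X at every layer keeps the original inputs available to deeper layers, so each layer only needs to refine the previous layer's outputs.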