Modeling Method of Deep Ensemble Forest Regression With Its Application

Abstract: Because complex industrial processes involve a variety of physical and chemical reactions, it is difficult to build accurate mathematical models of difficult-to-measure parameters such as quality or environmental indices. Commonly used neural-network-based data-driven modeling methods suffer from poor interpretability and require large numbers of samples. To solve these problems, a non-neural-network modeling method, deep ensemble forest regression (DEFR), was proposed. First, a random sampling strategy over both the sample space and the feature space was used to obtain training subsets, from which T sub-forest models based on decision trees (DT) were constructed. The layer regression vector selected by the K-nearest neighbor (KNN) criterion was combined with the raw features to obtain the augmented layer regression vector, which served as the output of the input-layer forest model. Then, the middle-layer forest models were constructed in the same way until the model depth reached the preset number of layers. Finally, the sub-forest models of the output layer were constructed on the augmented layer regression vector of the preceding layer, and the prediction of the DEFR model was obtained by weighting the T outputs of these sub-forest models. Simulation experiments on the concrete compressive strength data from the University of California, Irvine (UCI) repository and on the dioxin emission mass concentration data of an actual municipal solid waste incineration process verified the effectiveness of the proposed method.
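The abstract gives the DEFR pipeline only in outline. Below is a minimal pure-Python sketch of that pipeline under several labeled assumptions, not the authors' implementation: each sub-forest is a small set of CART-style regression trees grown on a random feature subset with rows drawn with replacement; the KNN criterion is interpreted as keeping the K sub-forests whose training outputs lie nearest (in Euclidean distance) to the target vector; and the output-layer weights are assumed to be normalized inverse training mean squared errors. All function and parameter names (`fit_defr`, `knn_select`, `T`, `K`, `n_layers`, and so on) are hypothetical, and the demo uses synthetic data rather than the UCI or dioxin data sets.

```python
import random


def sse(values):
    """Sum of squared deviations from the mean (the CART regression impurity)."""
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values)


def fit_tree(X, y, depth, min_leaf=3, n_cuts=8):
    """Greedy CART-style regression tree: a leaf is a float, a split is (feature, threshold, left, right)."""
    mean = sum(y) / len(y)
    if depth == 0 or len(y) < 2 * min_leaf:
        return mean
    best = None  # (score, feature, threshold)
    for j in range(len(X[0])):
        values = sorted(row[j] for row in X)
        for q in range(1, n_cuts):  # only a few quantile thresholds, to keep pure Python fast
            t = values[q * len(values) // n_cuts]
            left = [v for row, v in zip(X, y) if row[j] <= t]
            right = [v for row, v in zip(X, y) if row[j] > t]
            if len(left) >= min_leaf and len(right) >= min_leaf:
                score = sse(left) + sse(right)
                if best is None or score < best[0]:
                    best = (score, j, t)
    if best is None:
        return mean
    _, j, t = best
    left_idx = [i for i, row in enumerate(X) if row[j] <= t]
    right_idx = [i for i, row in enumerate(X) if row[j] > t]
    def grow(idx):
        return fit_tree([X[i] for i in idx], [y[i] for i in idx], depth - 1, min_leaf, n_cuts)
    return (j, t, grow(left_idx), grow(right_idx))


def predict_tree(node, x):
    while isinstance(node, tuple):
        j, t, left, right = node
        node = left if x[j] <= t else right
    return node


def fit_subforest(X, y, rng, n_trees=3, depth=4, sample_ratio=0.8, feature_ratio=0.7):
    """One sub-forest: a random feature subset (feature space) and rows drawn per tree (sample space)."""
    n, d = len(X), len(X[0])
    feats = rng.sample(range(d), max(1, int(feature_ratio * d)))
    trees = []
    for _ in range(n_trees):
        rows = [rng.randrange(n) for _ in range(int(sample_ratio * n))]  # drawn with replacement
        trees.append(fit_tree([[X[i][j] for j in feats] for i in rows], [y[i] for i in rows], depth))
    return feats, trees


def predict_subforest(subforest, x):
    feats, trees = subforest
    z = [x[j] for j in feats]
    return sum(predict_tree(tree, z) for tree in trees) / len(trees)


def layer_outputs(layer, X):
    """n x T matrix of sub-forest predictions, i.e. one layer regression vector per sample."""
    return [[predict_subforest(sf, x) for sf in layer] for x in X]


def knn_select(P, y, K):
    """ASSUMED KNN criterion: indices of the K sub-forests whose output column is nearest to y."""
    dist = [sum((row[t] - v) ** 2 for row, v in zip(P, y)) for t in range(len(P[0]))]
    return sorted(range(len(dist)), key=dist.__getitem__)[:K]


def augment(X, P, keep):
    """Augmented layer regression vector: raw features followed by the K selected outputs."""
    return [x + [p[t] for t in keep] for x, p in zip(X, P)]


def fit_defr(X, y, T=6, K=3, n_layers=2, seed=0):
    """Input layer plus (n_layers - 1) middle layers, then an output layer of T sub-forests."""
    rng = random.Random(seed)
    hidden, Z = [], X
    for _ in range(n_layers):
        layer = [fit_subforest(Z, y, rng) for _ in range(T)]
        P = layer_outputs(layer, Z)
        keep = knn_select(P, y, K)
        hidden.append((layer, keep))
        Z = augment(X, P, keep)  # next layer sees raw features + selected outputs
    out = [fit_subforest(Z, y, rng) for _ in range(T)]
    P = layer_outputs(out, Z)
    # ASSUMED weighting: inverse training MSE of each output sub-forest, normalized to sum to 1.
    mse = [sum((row[t] - v) ** 2 for row, v in zip(P, y)) / len(y) for t in range(T)]
    inv = [1.0 / (e + 1e-12) for e in mse]
    weights = [w / sum(inv) for w in inv]
    return hidden, out, weights


def predict_defr(model, X):
    hidden, out, weights = model
    Z = X
    for layer, keep in hidden:  # replay the cascade exactly as during training
        Z = augment(X, layer_outputs(layer, Z), keep)
    return [sum(w * p for w, p in zip(weights, row)) for row in layer_outputs(out, Z)]


# Usage on synthetic data (not the UCI or dioxin data used in the paper).
data_rng = random.Random(42)
X = [[data_rng.uniform(-1, 1) for _ in range(4)] for _ in range(150)]
y = [3 * x[0] + x[1] ** 2 + data_rng.gauss(0, 0.1) for x in X]
model = fit_defr(X, y)
pred = predict_defr(model, X)
rmse = (sum((p - v) ** 2 for p, v in zip(pred, y)) / len(y)) ** 0.5
print(f"training RMSE: {rmse:.3f}")
```

Each layer appends its selected outputs to the raw features instead of passing on predictions alone, so deeper layers can still draw on the original inputs, which matches the augmented-vector construction described above. The printed training RMSE is only a smoke test; assessing the method as in the paper would require held-out data.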

       
