An Ensemble Learning and Category Indicator Based Text Categorizing Method

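    • Summary: A text classification method based on an ensemble learning mechanism and category indicators is proposed. Working within the AdaBoost.MH algorithmic framework, the method adaptively computes a category indicating function in each round and combines all member category indicating functions by weight to approximate the ideal category indicating function. The classifier built from the final category indicating function is simple, easy to update, and generalizes well. Experiments on the standard corpus TanCorp-12 show that the method suits real-time applications that demand high classification efficiency. They also show that ensemble learning can be used to learn certain knowledge precisely and that this knowledge can then be applied in a weak classifier, yielding simple and efficient classification.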

       

      Abstract: Ensemble learning is motivated by the goal of boosting a weak classifier into a strong classifier with high generalization ability. However, this generalization ability often comes at the cost of high complexity and intensive computation. This paper proposes a text categorization method based on ensemble learning and category indicators. An AdaBoost.MH-based mechanism adaptively computes a category indicating function in every round; the individual category indicating functions are then combined by weight to obtain an approximation of the expected category indicating function. The classifier built on the combined category indicating function has low computational cost, is easy to update with new features, and is suitable for real-time applications. Furthermore, the proposed method is proved to be equivalent to an ensemble classifier and therefore has high generalization ability. Experiments on the TanCorp-12 corpus show that the proposed method achieves good performance in text categorization tasks and outperforms many text categorization methods.
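
      The idea summarized in the abstract can be illustrated with a minimal Python sketch. It is not the authors' algorithm: the abstract does not define the category indicating function, so the per-round indicator used here (a normalized difference of weighted document counts), the averaging of indicator values over a document's words, and all function names are assumptions chosen for illustration. The round weight and the re-weighting step follow the standard AdaBoost.MH scheme for hypotheses with values in [-1, 1]. Because each round's hypothesis is linear in the per-word indicators, scoring with the combined indicator gives the same result as scoring with the full ensemble, which mirrors the equivalence stated in the abstract.

```python
import math
from collections import defaultdict

EPS = 1e-12  # smoothing constant (an assumption of this sketch)


def member_indicator(docs, labels, categories, D):
    """One round's indicator g_t(word, category) in [-1, 1], computed from weights D."""
    pos = defaultdict(float)  # weight of docs containing the word and labelled with the category
    neg = defaultdict(float)  # weight of docs containing the word but not labelled with it
    for i, words in enumerate(docs):
        for c in categories:
            bucket = pos if labels[i] == c else neg
            for w in set(words):
                bucket[(w, c)] += D[(i, c)]
    keys = set(pos) | set(neg)
    return {k: (pos[k] - neg[k]) / (pos[k] + neg[k] + EPS) for k in keys}


def score(words, indicator, c):
    """Mean indicator value of a document's distinct words for category c."""
    ws = set(words)
    return sum(indicator.get((w, c), 0.0) for w in ws) / max(len(ws), 1)


def train(docs, labels, categories, rounds=5):
    """Return the combined indicator and the list of (alpha, member indicator) pairs."""
    n = len(docs)
    # AdaBoost.MH keeps one weight per (document, category) pair.
    D = {(i, c): 1.0 / (n * len(categories)) for i in range(n) for c in categories}
    combined = defaultdict(float)
    members = []
    for _ in range(rounds):
        g = member_indicator(docs, labels, categories, D)
        # Weighted correlation between this round's hypothesis and the true labels.
        r = sum(D[(i, c)] * (1 if labels[i] == c else -1) * score(docs[i], g, c)
                for i in range(n) for c in categories)
        r = max(min(r, 1 - 1e-9), -1 + 1e-9)  # keep the logarithm finite
        alpha = 0.5 * math.log((1 + r) / (1 - r))
        for k, v in g.items():
            combined[k] += alpha * v  # weighted combination of member indicators
        members.append((alpha, g))
        # Increase the weight of (document, category) pairs that were scored badly.
        for (i, c) in D:
            y = 1 if labels[i] == c else -1
            D[(i, c)] *= math.exp(-alpha * y * score(docs[i], g, c))
        z = sum(D.values())
        for k in D:
            D[k] /= z
    return dict(combined), members


def classify(words, indicator, categories):
    """Pick the category with the highest mean combined indicator value."""
    return max(categories, key=lambda c: score(words, indicator, c))


# Toy data, invented for this example.
docs = [
    ["goal", "match", "team"],
    ["team", "coach", "match"],
    ["team", "transfer", "market"],
    ["stock", "market", "bank"],
    ["bank", "loan", "market"],
]
labels = ["sports", "sports", "sports", "finance", "finance"]
categories = ["sports", "finance"]

combined, members = train(docs, labels, categories)
print(classify(["goal", "team"], combined, categories))
print(classify(["loan", "stock"], combined, categories))
```

      At prediction time only the combined table is needed: classifying a document is a lookup and an average per category, and a new feature can be added by inserting entries into the table without touching the rest.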

       
