Text Classification Model Based on Multiconlitron

    • Abstract: To address the text classification problem, a multiconlitron model for text classification is proposed from the perspective of piecewise linear learning. First, the text sample set is preprocessed, including feature selection and feature-term weighting. Then, multiconlitrons are constructed with the growing support multiconlitron algorithm (GSMA) and the support multiconlitron algorithm (SMA), respectively, to classify the sample set. Based on the maximum-margin idea of the support vector machine, the model separates two-class data by integrating linear classifiers, and it has the advantages of simple computation and strong adaptability. Experimental results on standard text data sets show that the classifiers constructed by the model achieve good text classification performance, and comparisons with other typical text classification methods also confirm the effectiveness of the method.

       
