Classifying Text Corpus Based on Information Gain Weight of Feature
-
Abstract
To improve the training speed of classifiers without sacrificing accuracy, this work presents three classification algorithms based on the information gain of features: IG-C1, IG-C2 and IG-C. Each classifies unlabeled text according to the feature weights generated in the feature selection phase. All three approaches share two characteristics: lower time complexity and simpler implementation. A performance comparison of these algorithms with Naive Bayes and the Vector Space Model on the Reuters-21578 and 20 Newsgroups data sets shows that the IG-C algorithm performs best.
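For readers unfamiliar with the weighting, the following minimal sketch shows how the information gain of a single term is commonly computed in text categorization: the entropy of the class distribution minus the expected entropy after splitting the documents on the presence of the term. It illustrates only the standard definition. The decision rules of IG-C1, IG-C2 and IG-C are not specified in this abstract and are not reproduced here, and the function names, the set-of-words document representation and the toy corpus are assumptions made for illustration.

```python
import math
from collections import Counter


def entropy(counts):
    """Shannon entropy (in bits) of a distribution given as raw counts."""
    counts = list(counts)
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


def information_gain(docs, labels, term):
    """Information gain of `term` with respect to the class labels.

    docs   : list of sets of words (hypothetical representation)
    labels : list of class labels, one per document
    """
    n = len(labels)
    if n == 0:
        return 0.0

    # Class counts overall, and split by presence or absence of the term.
    overall = Counter(labels)
    present = Counter(l for d, l in zip(docs, labels) if term in d)
    absent = Counter(l for d, l in zip(docs, labels) if term not in d)

    n_present = sum(present.values())
    n_absent = n - n_present

    # Expected class entropy after observing whether the term occurs.
    conditional = (
        (n_present / n) * entropy(present.values())
        + (n_absent / n) * entropy(absent.values())
    )
    return entropy(overall.values()) - conditional


# Toy corpus (illustrative only, not taken from the paper's data sets).
docs = [
    {"goal", "match"},
    {"goal", "team"},
    {"vote", "election"},
    {"vote", "party"},
]
labels = ["sport", "sport", "politics", "politics"]

# Rank the vocabulary by information gain, highest first.
vocabulary = sorted(set().union(*docs))
ranking = sorted(
    vocabulary,
    key=lambda t: information_gain(docs, labels, t),
    reverse=True,
)
print(ranking[:2])
```

In this toy corpus the terms "goal" and "vote" separate the two classes perfectly and therefore receive the highest weight, while a term that occurs in only one document receives a smaller one. A classifier driven by such weights can reuse the values already produced during feature selection, which is consistent with the low training cost claimed above.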
-
-