Abstract:
To improve the training speed of classifiers without sacrificing accuracy, this work presents three classification algorithms based on the information gain of features: IG-C1, IG-C2 and IG-C. Each classifies unlabeled text according to the feature weights generated in the feature selection phase. All three approaches share two characteristics: lower time complexity and simpler implementation. A performance comparison against Naive Bayes and the Vector Space Model on the Reuters-21578 and 20 Newsgroups data sets shows that the IG-C algorithm performs best.
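
For orientation, the information gain of a term, the quantity from which the feature weights are derived, can be sketched as below. This is a minimal illustration of the standard definition, IG(t) = H(C) - [P(t) H(C|t) + P(not t) H(C|not t)], assuming binary term presence and documents represented as sets of terms. It is not the IG-C1, IG-C2 or IG-C algorithm, whose details the abstract does not give; the function names `entropy` and `information_gain` and the toy data are hypothetical.

```python
import math
from collections import Counter


def entropy(counts):
    """Shannon entropy (in bits) of a class distribution given as raw counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


def information_gain(docs, labels, term):
    """Information gain of `term` with respect to the class labels.

    docs   : list of sets of terms (assumed representation, binary presence)
    labels : list of class labels, one per document
    """
    n = len(docs)
    if n == 0:
        return 0.0

    # Class counts among documents that do / do not contain the term.
    with_term = Counter()
    without_term = Counter()
    for doc, label in zip(docs, labels):
        if term in doc:
            with_term[label] += 1
        else:
            without_term[label] += 1

    n_with = sum(with_term.values())
    n_without = n - n_with

    prior = entropy(list(Counter(labels).values()))
    conditional = (
        (n_with / n) * entropy(list(with_term.values()))
        + (n_without / n) * entropy(list(without_term.values()))
    )
    return prior - conditional


# Toy corpus (hypothetical data, for illustration only).
docs = [{"ball", "goal"}, {"ball", "team"}, {"vote", "law"}, {"vote", "team"}]
labels = ["sport", "sport", "politics", "politics"]

weights = {t: information_gain(docs, labels, t) for t in ("ball", "team")}
```

In this toy corpus, "ball" separates the two classes perfectly and so receives the maximum gain of one bit, while "team" occurs equally often in both classes and receives a gain of zero.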