Abstract:
Many algorithms based on data distribution solve the problem of semi-supervised text categorization effectively. However, they may perform poorly when the distribution of the labeled data differs from that of the unlabeled data. To address this problem, a semi-supervised text classification algorithm based on feature mapping was proposed. First, three sets of features were selected from the labeled data, the unlabeled data, and the test data, respectively, using different feature selection methods, and their values were initialized. Second, three feature mapping functions were studied and used to recalculate the weight of each feature. Finally, the expectation-maximization (EM) algorithm was employed to classify the text data. Experiments on standard data sets show that the proposed algorithm is effective.
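
The final EM classification step can be illustrated with a minimal sketch. The abstract does not specify the base classifier, the feature selection methods, or the feature mapping functions, so the code below makes its own assumptions: it uses a multinomial naive Bayes model with Laplace smoothing as the base classifier, treats raw word counts as feature weights, and omits the feature mapping stage entirely. All function names and the toy documents are hypothetical and serve only as illustration.

```python
import math
from collections import Counter


def train_nb(docs, posteriors, classes, vocab):
    """M-step: estimate priors and word likelihoods from soft class labels."""
    prior, cond = {}, {}
    for c in classes:
        weight = sum(p[c] for p in posteriors)
        # Laplace-smoothed class prior.
        prior[c] = (weight + 1.0) / (len(docs) + len(classes))
        counts = Counter()
        for doc, p in zip(docs, posteriors):
            for word in doc:
                counts[word] += p[c]  # fractional count weighted by P(c | doc)
        total = sum(counts.values())
        # Laplace-smoothed word likelihoods over the whole vocabulary.
        cond[c] = {w: (counts[w] + 1.0) / (total + len(vocab)) for w in vocab}
    return prior, cond


def posterior(doc, prior, cond, classes):
    """E-step for one document: P(class | doc) under the current model."""
    logp = {
        c: math.log(prior[c])
        + sum(math.log(cond[c][w]) for w in doc if w in cond[c])
        for c in classes
    }
    top = max(logp.values())  # subtract the maximum for numerical stability
    unnorm = {c: math.exp(v - top) for c, v in logp.items()}
    z = sum(unnorm.values())
    return {c: v / z for c, v in unnorm.items()}


def em_classify(labeled, unlabeled, classes, iterations=10):
    """Fit naive Bayes on labeled data, then refine it with EM on unlabeled data."""
    vocab = sorted({w for doc, _ in labeled for w in doc}
                   | {w for doc in unlabeled for w in doc})
    lab_docs = [doc for doc, _ in labeled]
    # Labeled documents keep hard (one-hot) posteriors throughout.
    lab_post = [{c: 1.0 if c == y else 0.0 for c in classes} for _, y in labeled]
    prior, cond = train_nb(lab_docs, lab_post, classes, vocab)
    for _ in range(iterations):
        unl_post = [posterior(doc, prior, cond, classes) for doc in unlabeled]
        prior, cond = train_nb(lab_docs + unlabeled, lab_post + unl_post,
                               classes, vocab)
    return prior, cond


def predict(doc, prior, cond, classes):
    """Return the most probable class for a tokenized document."""
    post = posterior(doc, prior, cond, classes)
    return max(classes, key=lambda c: post[c])


# Toy data (hypothetical): two labeled and four unlabeled tokenized documents.
classes = ["sport", "politics"]
labeled = [(["ball", "goal", "team"], "sport"),
           (["vote", "party", "law"], "politics")]
unlabeled = [["goal", "match", "team"], ["law", "court", "vote"],
             ["match", "league", "ball"], ["court", "party", "senate"]]

prior, cond = em_classify(labeled, unlabeled, classes)
print(predict(["match", "league"], prior, cond, classes))
```

In this sketch the words "match" and "league" never occur in the labeled documents, so any evidence about them comes from the unlabeled documents through the EM iterations. In the proposed algorithm, the recalculated feature weights would presumably replace the raw counts used here.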