Semi-supervised Classification Using Feature Distribution
Abstract
It is crucial for semi-supervised learning (SSL) to reduce the dimensionality of the feature space through feature selection. The popular information gain (IG) selection method, which favors high-frequency words, ignores similarity between classes, so the classification performance of IG-selected features is unstable. This paper proposes a feature distribution selection method that helps IG retain features carrying highly discriminative category information. To address the inherent efficiency problem of the expectation maximization (EM) algorithm, unlabeled documents with the maximum posterior category probability are transferred from the unlabeled collection to the labeled collection, which markedly reduces the number of iterations of the improved EM. Finally, experimental evaluation on two data sets, Reuters-21578 and Epinion.com, shows that the semi-supervised learning method using feature distribution performs well under the micro-averaged F1 criterion.
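The transfer step of the improved EM can be illustrated with a minimal sketch. It is not the paper's implementation: the base classifier (multinomial Naive Bayes with Laplace smoothing), the number of documents moved per round (`per_round`), the round cap (`max_rounds`), and all function names are assumptions introduced here for illustration. Each round retrains the classifier on the labeled collection, scores every unlabeled document, and moves the documents with the highest posterior category probability into the labeled collection.

```python
import math
from collections import Counter


def train_nb(docs, labels, vocab, classes):
    """Fit multinomial Naive Bayes with Laplace smoothing (assumed base classifier)."""
    log_prior, log_like = {}, {}
    for c in classes:
        class_docs = [d for d, y in zip(docs, labels) if y == c]
        # Smoothed class prior.
        log_prior[c] = math.log((len(class_docs) + 1) / (len(docs) + len(classes)))
        counts = Counter(w for d in class_docs for w in d if w in vocab)
        total = sum(counts.values()) + len(vocab)
        # Smoothed per-class word likelihoods.
        log_like[c] = {w: math.log((counts[w] + 1) / total) for w in vocab}
    return log_prior, log_like


def posterior(doc, model):
    """Return P(class | doc) for every class; out-of-vocabulary words are skipped."""
    log_prior, log_like = model
    scores = {
        c: log_prior[c] + sum(log_like[c][w] for w in doc if w in log_like[c])
        for c in log_prior
    }
    top = max(scores.values())
    unnorm = {c: math.exp(s - top) for c, s in scores.items()}  # stable softmax
    z = sum(unnorm.values())
    return {c: v / z for c, v in unnorm.items()}


def transfer_em(labeled, labels, unlabeled, per_round=1, max_rounds=50):
    """Move the most confidently classified unlabeled documents into the
    labeled collection each round (per_round and max_rounds are assumptions)."""
    labeled, labels, pool = list(labeled), list(labels), list(unlabeled)
    classes = sorted(set(labels))
    vocab = {w for d in labeled + pool for w in d}
    rounds = 0
    while pool and rounds < max_rounds:
        model = train_nb(labeled, labels, vocab, classes)  # M-step
        scored = []
        for i, doc in enumerate(pool):  # E-step: posterior of each unlabeled doc
            post = posterior(doc, model)
            best = max(post, key=post.get)
            scored.append((post[best], i, best))
        scored.sort(reverse=True)
        chosen = scored[:per_round]
        for _, i, best in chosen:
            labeled.append(pool[i])
            labels.append(best)
        moved = {i for _, i, _ in chosen}
        pool = [d for i, d in enumerate(pool) if i not in moved]
        rounds += 1
    return train_nb(labeled, labels, vocab, classes), labeled, labels
```

Documents are represented as lists of tokens, for example `transfer_em([["goal", "match"], ["stock", "market"]], ["sport", "finance"], [["goal", "team"]])`. Because the unlabeled pool shrinks every round, the loop ends after at most `len(unlabeled) / per_round` rounds, which is one way to read the reduced iteration count described above.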