Adaptive Method for Chinese New Word Identification Based on Multi-features
Abstract
To improve new word identification in Chinese word segmentation, the authors propose an adaptive, multi-feature method for offline corpus processing. Candidate words are evaluated with several features: context entropy, likelihood ratio, frequency ratio against a background corpus, and boundary verification against a basic segmentation. All of these features are integrated into an adaptive SVM classifier. Candidate new words are extracted efficiently from a PAT-array with much less space overhead, and the method can identify new words of arbitrary n-gram length. The results show that the method identifies new words quickly and uses considerably less memory.
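As an illustration of one of the features named above, the following is a minimal sketch of context entropy: the Shannon entropy of the characters that appear immediately to the left and right of a candidate string in a corpus. High entropy on both sides suggests the candidate occurs in varied contexts and may be a free-standing word. The function names `context_entropy` and `_entropy`, the base-2 logarithm, and the linear scan with `str.find` are assumptions made for this sketch; the abstract does not give the authors' exact formulation, and their method extracts candidates from a PAT-array rather than by rescanning the text.

```python
import math
from collections import Counter


def _entropy(counts: Counter) -> float:
    """Shannon entropy (base 2) of a frequency table; 0.0 if the table is empty."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def context_entropy(corpus: str, candidate: str) -> tuple[float, float]:
    """Return (left_entropy, right_entropy) of `candidate` within `corpus`.

    Hypothetical helper for illustration only: it scans the raw text,
    counting the character adjacent to each (possibly overlapping) occurrence.
    """
    left, right = Counter(), Counter()
    start = corpus.find(candidate)
    while start != -1:
        if start > 0:
            left[corpus[start - 1]] += 1   # character just before the match
        end = start + len(candidate)
        if end < len(corpus):
            right[corpus[end]] += 1        # character just after the match
        start = corpus.find(candidate, start + 1)
    return _entropy(left), _entropy(right)


if __name__ == "__main__":
    text = "abxcabxdabxe"
    print(context_entropy(text, "abx"))  # varied right context
    print(context_entropy(text, "ab"))   # right context is always "x"
```

In this toy input, the fragment "ab" is always followed by "x", so its right entropy is zero, which signals that "ab" is probably an incomplete piece of a longer unit. In the method described, a score like this would be one input among several to the classifier, not a decision rule on its own.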