JI Jun-zhong, BEI Fei, WU Chen-sheng, CHAI Ying, SONG Chen. Influence of Part-of-speeches on the Network Topic Detection of Chinese News and Micro-blog[J]. Journal of Beijing University of Technology, 2015, 41(4): 526-533. DOI: 10.11936/bjutxb2014090078
1. Beijing Municipal Key Laboratory of Multimedia and Intelligent Software Technology, College of Computer Science and Technology, Beijing University of Technology, Beijing 100124, China;
2. Beijing Institute of Science and Technology Information, Beijing 100048, China
Using two representative corpora, one of news and one of micro-blog posts, this paper reports an experimental study of how different parts of speech and their combinations affect network topic detection. The results show that when a single part of speech is used as the feature, nouns give the best results, and named entities greatly reduce the dimensionality of the clustering features while maintaining accuracy. When a combination of parts of speech is used, nouns or named entities together with numerals, time phrases, adjectives, and quantifiers improve the accuracy of topic detection on the news corpus, while nouns or named entities together with adjectives, quantifiers, numerals, and the combination of special symbols and sites achieve good results on the micro-blog corpus.
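
The idea of selecting clustering features by part of speech can be illustrated with a minimal sketch. It is not the authors' implementation: the toy documents, the tag names (`n`, `v`, `a`, `m`, `t`), and the helper functions `select_features`, `vocabulary`, and `cosine` are all assumptions made for illustration, and the input is taken to be already segmented and tagged. The sketch only shows how restricting features to one tag class shrinks the feature space that a clustering step would operate on.

```python
from collections import Counter

# Hypothetical pre-tagged documents as (token, tag) pairs. The tag names
# ("n" noun, "v" verb, "a" adjective, "m" numeral, "t" time word) are
# assumed for illustration and are not taken from the paper.
DOCS = [
    [("earthquake", "n"), ("hits", "v"), ("coastal", "a"), ("city", "n"), ("today", "t")],
    [("strong", "a"), ("earthquake", "n"), ("kills", "v"), ("12", "m"), ("people", "n")],
    [("team", "n"), ("wins", "v"), ("final", "n"), ("3", "m"), ("goals", "n")],
]


def select_features(tagged_doc, keep_tags):
    """Return term counts for tokens whose tag is in keep_tags."""
    return Counter(tok for tok, tag in tagged_doc if tag in keep_tags)


def vocabulary(docs, keep_tags):
    """Return the sorted list of distinct terms that survive the tag filter."""
    return sorted({tok for doc in docs for tok, tag in doc if tag in keep_tags})


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = sum(v * v for v in a.values()) ** 0.5
    norm_b = sum(v * v for v in b.values()) ** 0.5
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


ALL_TAGS = {"n", "v", "a", "m", "t"}
NOUN_ONLY = {"n"}

# Feature-space size with every tag versus nouns only.
full_vocab = vocabulary(DOCS, ALL_TAGS)
noun_vocab = vocabulary(DOCS, NOUN_ONLY)

# Noun-only vectors, as they might be passed to a clustering step.
noun_vectors = [select_features(doc, NOUN_ONLY) for doc in DOCS]
sim_same_topic = cosine(noun_vectors[0], noun_vectors[1])
sim_diff_topic = cosine(noun_vectors[0], noun_vectors[2])

print(len(full_vocab), len(noun_vocab))
print(sim_same_topic, sim_diff_topic)
```

Swapping `NOUN_ONLY` for another tag set, such as nouns plus numerals and time words, gives a simple way to compare tag combinations on the same documents.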