Abstract:
Using two representative corpora, one of news and one of micro-blog posts, this paper presents an experimental study of how different parts of speech and their combinations affect network topic detection. The results show that when a single part of speech is used as the clustering feature, nouns give the best results, and named entities greatly reduce the dimensionality of the clustering features while maintaining accuracy. When combinations of parts of speech are used as features, nouns or named entities together with numerals, time phrases, adjectives, and quantifiers improve the accuracy of topic detection on the news corpus, whereas nouns or named entities together with adjectives, quantifiers, numerals, and the combination of special symbols and sites achieve good results on the micro-blog corpus.
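
The idea of selecting clustering features by part of speech can be illustrated with a minimal sketch. It assumes the documents have already been segmented and tagged as (word, tag) pairs. The tag names (`n`, `v`, `a`, `m`, `q`, `t`), the sample sentences, and the function names are hypothetical and are not taken from the paper's experiments.

```python
from collections import Counter

# Hypothetical tag set (an assumption for illustration only):
# n = noun, v = verb, a = adjective, m = numeral, q = quantifier, t = time phrase


def select_features(tagged_doc, keep_tags):
    """Count only the words whose part-of-speech tag is in keep_tags."""
    return Counter(word for word, tag in tagged_doc if tag in keep_tags)


def vocabulary_size(tagged_docs, keep_tags):
    """Number of distinct feature words (the feature dimension) after filtering."""
    vocab = set()
    for doc in tagged_docs:
        vocab.update(select_features(doc, keep_tags))
    return len(vocab)


# Toy, made-up documents standing in for a tagged corpus.
docs = [
    [("earthquake", "n"), ("strikes", "v"), ("coastal", "a"),
     ("city", "n"), ("today", "t")],
    [("rescue", "n"), ("teams", "n"), ("reach", "v"),
     ("three", "m"), ("villages", "n")],
]

nouns_only = vocabulary_size(docs, {"n"})
news_combo = vocabulary_size(docs, {"n", "m", "t", "a", "q"})
all_tags = vocabulary_size(docs, {"n", "v", "a", "m", "q", "t"})

print(nouns_only, news_combo, all_tags)
```

The filtered counts returned by `select_features` would then be passed to whatever clustering method is used for topic detection. Comparing `vocabulary_size` across tag sets shows how the choice of parts of speech changes the feature dimension.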