Abstract:
Entity recognition in fine-grained domains faces two challenges: the limited number of domain entities and the relative lack of corpus samples. To address them, an unsupervised method for narrow-domain entity recognition is proposed that integrates word frequency and context information. First, a new term-corpus relevance hypothesis is designed by fusing word frequency and context information, and the probability of the hypothesis is calculated with the log-likelihood ratio to obtain the domain discrimination degree of each candidate entity. Second, a domain dependence function is constructed from the relative domain ratio of each candidate entity's head word in the corpus, in order to recognize the domain tendency of the candidate entities. Finally, the domain discrimination degree and the domain dependence are combined into a domain relevance measurement, and the candidate entities whose domain relevance measurement exceeds a threshold are selected as narrow-domain entities. The experimental results show that the proposed method improves the accuracy of narrow-domain entity recognition and reduces manual intervention in the recognition process.
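
The abstract does not give the formulas, so the following is only a minimal sketch of how such a scoring pipeline might look, not the authors' implementation. It assumes a standard Dunning-style log-likelihood ratio over a 2x2 term/corpus contingency table for the discrimination degree, a simple normalized head-word frequency ratio for the domain dependence, and a plain product to combine the two. All function names, the combination rule, the counts, and the threshold are hypothetical.

```python
import math


def log_likelihood_ratio(a, b, c, d):
    """Dunning-style G^2 statistic for a 2x2 contingency table.

    a: occurrences of the term in the domain corpus
    b: occurrences of the term in the background corpus
    c: all other tokens in the domain corpus
    d: all other tokens in the background corpus
    Note: G^2 is symmetric, so it is also large for terms that are
    unusually rare in the domain corpus.
    """
    n = a + b + c + d
    rows = (a + b, c + d)
    cols = (a + c, b + d)
    observed = ((a, b), (c, d))
    g2 = 0.0
    for i in range(2):
        for j in range(2):
            o = observed[i][j]
            if o > 0:  # a zero cell contributes nothing to the sum
                expected = rows[i] * cols[j] / n
                g2 += o * math.log(o / expected)
    return 2.0 * g2


def head_word_domain_ratio(head_domain, domain_size, head_background, background_size):
    """Assumed domain dependence: share of the head word's relative
    frequency that falls in the domain corpus (value in [0, 1])."""
    p_domain = head_domain / domain_size
    p_background = head_background / background_size
    total = p_domain + p_background
    return p_domain / total if total > 0 else 0.0


def domain_relevance(term_counts, head_counts, domain_size, background_size):
    """Assumed combination: discrimination degree times domain dependence."""
    a, b = term_counts
    llr = log_likelihood_ratio(a, b, domain_size - a, background_size - b)
    ratio = head_word_domain_ratio(
        head_counts[0], domain_size, head_counts[1], background_size
    )
    return llr * ratio


def select_entities(candidates, domain_size, background_size, threshold):
    """Keep candidates whose relevance score exceeds the threshold."""
    return [
        term
        for term, (term_counts, head_counts) in candidates.items()
        if domain_relevance(term_counts, head_counts, domain_size, background_size)
        > threshold
    ]


# Toy counts, invented for illustration only:
# term -> ((term freq in domain, in background), (head-word freq in domain, in background))
toy_candidates = {
    "gearbox fault": ((50, 5), (120, 10)),
    "last year": ((10, 10), (40, 40)),
}
selected = select_entities(
    toy_candidates, domain_size=1000, background_size=1000, threshold=10.0
)
```

In this toy setup a term that is equally frequent in both corpora gets a log-likelihood ratio of zero and is filtered out, while a term concentrated in the domain corpus, with a head word that is also domain-heavy, passes the threshold.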