Unsupervised Method for Narrow-domain Entity Recognition by Fusing Domain Relevance Measurement and Word Features of Context

    • Abstract: Entity recognition in fine-grained domains faces two challenges: the number of domain entities is limited and corpus samples are relatively scarce. To address them, an unsupervised method for narrow-domain entity recognition is proposed that fuses domain relevance with context information. First, a term-corpus relevance hypothesis is designed by fusing word frequency and context information, and the likelihood of the hypothesis is computed with the log-likelihood ratio to obtain the domain discrimination degree of each candidate entity. Next, a domain dependence function is constructed from the relative domain ratio of each candidate entity's head word in the corpus, to identify the domain tendency of the candidate. Finally, the domain discrimination degree and the domain dependence are combined to compute the domain relevance of each candidate entity, and candidates whose domain relevance exceeds a threshold are selected as the recognized narrow-domain entities. Experimental results show that the method effectively improves the accuracy of narrow-domain entity recognition while reducing manual intervention in the recognition process.
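
The three stages named in the abstract can be sketched roughly as follows. This is only an illustration under stated assumptions, not the paper's formulas: the abstract does not give its hypothesis design or its combination rule, so the sketch substitutes a standard 2x2 log-likelihood ratio (G²) that compares a term's frequency in a domain corpus against a background corpus, defines dependence as the head word's share of relative frequency in the domain corpus, and combines the two by multiplication. All function names, the count layout, and the sample candidates are hypothetical.

```python
import math


def domain_discrimination(k_dom, n_dom, k_bg, n_bg):
    """Signed 2x2 log-likelihood ratio (G^2), used here as a stand-in.

    k_dom / k_bg: occurrences of the candidate in the domain / background corpus.
    n_dom / n_bg: total token counts of the two corpora.
    The sign is positive when the candidate is relatively more frequent
    in the domain corpus, negative otherwise.
    """
    observed = [[k_dom, n_dom - k_dom], [k_bg, n_bg - k_bg]]
    rows = [n_dom, n_bg]
    cols = [k_dom + k_bg, (n_dom - k_dom) + (n_bg - k_bg)]
    total = n_dom + n_bg
    g2 = 0.0
    for i in range(2):
        for j in range(2):
            o = observed[i][j]
            if o > 0:  # convention: 0 * ln(0) contributes nothing
                expected = rows[i] * cols[j] / total
                g2 += o * math.log(o / expected)
    g2 *= 2.0
    sign = 1.0 if k_dom / n_dom > k_bg / n_bg else -1.0
    return sign * g2


def domain_dependence(head_dom, n_dom, head_bg, n_bg):
    """Assumed form: the head word's relative frequency in the domain corpus
    divided by the sum of its relative frequencies in both corpora."""
    rel_dom = head_dom / n_dom
    rel_bg = head_bg / n_bg
    if rel_dom + rel_bg == 0:
        return 0.0
    return rel_dom / (rel_dom + rel_bg)


def select_entities(candidates, n_dom, n_bg, threshold):
    """Keep candidates whose combined relevance exceeds the threshold.

    candidates maps a term to (term_dom, term_bg, head_dom, head_bg) counts.
    The product used as the combination rule is an assumption.
    """
    selected = []
    for term, (t_dom, t_bg, h_dom, h_bg) in candidates.items():
        relevance = (domain_discrimination(t_dom, n_dom, t_bg, n_bg)
                     * domain_dependence(h_dom, n_dom, h_bg, n_bg))
        if relevance > threshold:
            selected.append(term)
    return selected


# Made-up counts, purely for illustration.
sample_candidates = {
    "gas turbine blade": (50, 1, 80, 5),
    "time": (30, 3000, 30, 3000),
}
result = select_entities(sample_candidates, n_dom=10_000, n_bg=1_000_000,
                         threshold=10.0)
```

In this sketch a term whose relative frequency is the same in both corpora gets a discrimination score of zero and is filtered out, while a term concentrated in the domain corpus with a domain-specific head word scores high on both factors. The threshold value is arbitrary here and would need tuning on real data.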

       
