
A Semi-Markov CRF Model Approach to Encyclopedia Text Topic Segmentation

XU Yong (许勇), SONG Rou (宋柔)

Citation: XU Yong, SONG Rou. A Semi-Markov CRF Model Approach to Encyclopedia Text Topic Segmentation[J]. Journal of Beijing University of Technology, 2008, 34(2): 204-210.

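Funding:

National Natural Science Foundation of China (60272055)

National High Technology Research and Development Program of China ("863" Program) (2001AA114111)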

Details
    Author biography:

    XU Yong (born 1975), male, from Yanji, Jilin Province, Ph.D. candidate.

  • Chinese Library Classification (CLC) number: TP391


  • Abstract: This paper presents a method for segmenting Chinese encyclopedia text into topic passages, based on semi-Markov conditional random fields (semi-CRFs). To overcome the passage-type (topic) duplication problem of plain HMM and CRF models, the method takes the posterior distribution over HMM states, adjusted for each text instance, as its basic segmentation cue. It also uses passage-start features derived from a lexical-semantic ontology knowledge base (a domain thesaurus) and cue-phrase features specific to particular passage types, which adapt it further to the target text. Experimental results show that the method can combine these different kinds of information, suits the passage structure of encyclopedia text, and achieves better performance than plain HMM and CRF models.
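The segment-level decoding that a semi-Markov model relies on can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `semi_markov_viterbi`, the two passage labels, the per-sentence probabilities (standing in for HMM state posteriors), the per-segment penalty, and the rule that adjacent segments must carry different labels are all hypothetical choices made for illustration. The point is only that scores are assigned to whole segments rather than to single positions.

```python
import math
from typing import Callable, Dict, List, Optional, Tuple

Segment = Tuple[int, int, str]  # (start, end_exclusive, label)

def semi_markov_viterbi(
    n: int,
    labels: List[str],
    max_len: int,
    score: Callable[[int, int, str, Optional[str]], float],
) -> List[Segment]:
    """Return the highest-scoring segmentation of units 0..n-1."""
    neg = float("-inf")
    # best[i][y]: best score for the first i units when the last segment has label y.
    best = [{y: neg for y in labels} for _ in range(n + 1)]
    back: List[Dict[str, Optional[Tuple[int, Optional[str]]]]] = [
        {y: None for y in labels} for _ in range(n + 1)
    ]
    for end in range(1, n + 1):
        for start in range(max(0, end - max_len), end):
            for y in labels:
                # The first segment has no predecessor label.
                prevs: List[Optional[str]] = [None] if start == 0 else list(labels)
                for prev in prevs:
                    base = 0.0 if prev is None else best[start][prev]
                    if base == neg:
                        continue
                    cand = base + score(start, end, y, prev)
                    if cand > best[end][y]:
                        best[end][y] = cand
                        back[end][y] = (start, prev)
    if n == 0:
        return []
    y: Optional[str] = max(labels, key=lambda lab: best[n][lab])
    if best[n][y] == neg:
        raise ValueError("no feasible segmentation")
    # Follow back-pointers from the end of the text to the beginning.
    segments: List[Segment] = []
    end = n
    while end > 0:
        start, prev = back[end][y]
        segments.append((start, end, y))
        end, y = start, prev
    segments.reverse()
    return segments

# Hypothetical per-sentence label probabilities for a six-sentence entry.
posteriors = [
    {"definition": 0.9, "history": 0.1},
    {"definition": 0.8, "history": 0.2},
    {"definition": 0.4, "history": 0.6},  # a locally misleading sentence
    {"definition": 0.7, "history": 0.3},
    {"definition": 0.2, "history": 0.8},
    {"definition": 0.1, "history": 0.9},
]

def toy_score(start: int, end: int, label: str, prev: Optional[str]) -> float:
    """Segment score: summed log-probabilities minus a fixed per-segment penalty."""
    if prev == label:
        return float("-inf")  # assumption: adjacent segments must differ in label
    return sum(math.log(posteriors[i][label]) for i in range(start, end)) - 1.0

result = semi_markov_viterbi(len(posteriors), ["definition", "history"], 4, toy_score)
print(result)
```

In this toy setting the third sentence favors "history" on its own, but the per-segment penalty makes it cheaper to keep it inside the surrounding "definition" passage. Per-position labeling with the same probabilities would split the text into four pieces.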

Publication history
  • Received:  2006-11-09
  • Published online:  2023-01-10
