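• Comprehensive Chinese core journal of science and technology
    • Source journal for Chinese Scientific and Technical Papers and Citations statistics
    • Source journal of the Chinese Science Citation Database
    • Source journal of the Chinese Academic Journal Abstracts Database (core edition)
    • Source journal of the Chinese Academic Journal Comprehensive Evaluation Database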
Citation: JIANG Zong-li, LU Guo-xiang. MatchLink: A Focused Crawling Method[J]. Journal of Beijing University of Technology, 2007, 33(11): 1227-1232.

MatchLink: A Focused Crawling Method

More Information
  • Received Date: August 30, 2006
  • Available Online: December 29, 2022
  • Finding what a user wants in the tremendous amount of information on the Web is a great challenge for web search engines. By restricting downloads to web pages in a given domain, focused crawlers can save a great deal of work and improve the quality of the information they provide. We put forward a focused crawling method, MatchLink. It uses the document vector model to evaluate the topic relevance of an anchor, and uses the Naive Bayes algorithm and a multilayer classification method to compute the topic relevance of the web page containing the anchor. Based on these two relevance scores, topic-relevant web pages are given priority for download. Experiments show that the results are better than those of BestFirst and BreadthFirst.
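
The ordering idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract does not say how the two relevance values are combined, so the weighted sum and the `alpha` parameter are assumptions. The names `cosine`, `link_priority`, and `Frontier` are hypothetical. The page relevance is passed in as a number, assumed to come from a separately trained Naive Bayes classifier, and the topic vector uses plain term frequencies rather than any weighting the paper may use.

```python
import heapq
import math
from collections import Counter


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def link_priority(anchor_text: str, topic_terms: Counter,
                  page_relevance: float, alpha: float = 0.5) -> float:
    """Combine anchor relevance and containing-page relevance.

    The weighted sum is an assumption made for illustration only.
    page_relevance is assumed to be a classifier score in [0, 1].
    """
    anchor_vec = Counter(anchor_text.lower().split())
    return alpha * cosine(anchor_vec, topic_terms) + (1 - alpha) * page_relevance


class Frontier:
    """Crawl frontier that returns the highest-priority URL first."""

    def __init__(self) -> None:
        self._heap: list[tuple[float, str]] = []
        self._seen: set[str] = set()

    def push(self, url: str, priority: float) -> None:
        if url not in self._seen:
            self._seen.add(url)
            # heapq is a min-heap, so store the negated priority.
            heapq.heappush(self._heap, (-priority, url))

    def pop(self) -> str:
        return heapq.heappop(self._heap)[1]

    def __len__(self) -> int:
        return len(self._heap)


# Hypothetical topic vector and links, for illustration only.
topic = Counter({"crawler": 3, "search": 2, "web": 1})
frontier = Frontier()
frontier.push("http://example.com/a",
              link_priority("focused web crawler", topic, page_relevance=0.9))
frontier.push("http://example.com/b",
              link_priority("holiday photos", topic, page_relevance=0.2))
next_url = frontier.pop()
```

A BreadthFirst crawler would instead pop URLs in insertion order. Here the link whose anchor text and containing page both look on-topic is downloaded first.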