JIANG Zong-li, LU Guo-xiang. MatchLink: A Focused Crawling Method[J]. Journal of Beijing University of Technology, 2007, 33(11): 1227-1232.

    MatchLink: A Focused Crawling Method

    • Finding what a user wants in the tremendous amount of information on the Web is a great challenge for web search engines. By restricting the pages they download to a given domain, focused crawlers can save a great deal of work and improve the quality of the information they provide. We put forward a focused crawling method, MatchLink. It uses the document vector model to evaluate the topic relevance of an anchor, and uses the Naive Bayes algorithm together with a multilayer classification method to compute the topic relevance of the web page containing that anchor. Based on these two relevance scores, topic-relevant web pages are given priority for download. Experimental results show that MatchLink performs better than BestFirst and BreadthFirst.
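
    The prioritization idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the names `cosine`, `anchor_relevance`, and `Frontier` are hypothetical, the anchor score is a plain term-frequency cosine similarity standing in for the document vector model, the page score is assumed to come from some external classifier (for example Naive Bayes) and is passed in as a number, and the linear weight `alpha` used to combine the two scores is an assumption, since the abstract does not say how they are combined.

```python
import heapq
import math
from collections import Counter


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def anchor_relevance(anchor_text: str, topic_terms: list) -> float:
    """Score an anchor text against the topic (vector-model stand-in)."""
    return cosine(Counter(anchor_text.lower().split()), Counter(topic_terms))


class Frontier:
    """Crawl frontier that hands out the highest-scoring link first."""

    def __init__(self, topic_terms: list, alpha: float = 0.5):
        self.topic_terms = topic_terms
        self.alpha = alpha      # assumed weight of the anchor score
        self._heap = []         # entries: (-score, insertion order, url)
        self._seen = set()      # URLs already queued
        self._count = 0         # tie-breaker keeps insertion order

    def push(self, url: str, anchor_text: str, page_relevance: float) -> None:
        # page_relevance: relevance of the page containing the anchor,
        # assumed to be produced elsewhere by a classifier, in [0, 1].
        if url in self._seen:
            return
        self._seen.add(url)
        score = (self.alpha * anchor_relevance(anchor_text, self.topic_terms)
                 + (1 - self.alpha) * page_relevance)
        # heapq is a min-heap, so the score is negated.
        heapq.heappush(self._heap, (-score, self._count, url))
        self._count += 1

    def pop(self) -> str:
        return heapq.heappop(self._heap)[2]


if __name__ == "__main__":
    frontier = Frontier(["focused", "web", "crawler"])
    frontier.push("http://a.example/sports", "football scores", 0.1)
    frontier.push("http://b.example/crawl", "focused web crawler", 0.9)
    print(frontier.pop())
```

    In this sketch the second link is dequeued first because both its anchor text and its containing page score higher against the topic; a BreadthFirst crawler would instead return links in insertion order.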