MatchLink: A Focused Crawling Method
Abstract
Finding what a user wants in the tremendous amount of information on the Web is a great challenge for web search engines. By restricting downloads to web pages in a given domain, focused crawlers can save a great deal of work and improve the quality of the information they provide. We put forward a focused crawling method, MatchLink. It uses the document vector model to evaluate the topic relevance of an anchor, and uses the Naive Bayes algorithm and a multilayer classification method to compute the topic relevance of the web page containing the anchor. Based on these two relevance scores, topic-relevant web pages are given priority for download. Experiments show that MatchLink performs better than BestFirst and BreadthFirst.
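The prioritization idea can be illustrated with a minimal Python sketch. It is not the authors' implementation: the anchor score here is a plain cosine similarity over raw term-frequency vectors, the page score is assumed to be supplied by some external classifier (the Naive Bayes and multilayer classification steps are not reproduced), and the names `Frontier`, `cosine_similarity`, and the `anchor_weight` mixing parameter are hypothetical, since the abstract does not say how the two scores are combined.

```python
import heapq
import math
from collections import Counter


def cosine_similarity(text_a: str, text_b: str) -> float:
    """Cosine similarity between raw term-frequency vectors of two texts."""
    vec_a = Counter(text_a.lower().split())
    vec_b = Counter(text_b.lower().split())
    dot = sum(vec_a[term] * vec_b[term] for term in vec_a)
    norm_a = math.sqrt(sum(c * c for c in vec_a.values()))
    norm_b = math.sqrt(sum(c * c for c in vec_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


class Frontier:
    """Crawl frontier that pops the URL with the highest combined score."""

    def __init__(self, topic: str, anchor_weight: float = 0.5):
        self.topic = topic
        # Assumption: a linear mix of the two scores; the source does not
        # specify the combination rule.
        self.anchor_weight = anchor_weight
        self._heap = []
        self._seen = set()

    def push(self, url: str, anchor_text: str, page_relevance: float) -> None:
        """Queue a link found on a page whose relevance is page_relevance."""
        if url in self._seen:
            return  # each URL is queued at most once
        self._seen.add(url)
        anchor_relevance = cosine_similarity(anchor_text, self.topic)
        score = (self.anchor_weight * anchor_relevance
                 + (1 - self.anchor_weight) * page_relevance)
        # heapq is a min-heap, so store the negated score.
        heapq.heappush(self._heap, (-score, url))

    def pop(self):
        """Return (url, score) for the most promising queued link."""
        neg_score, url = heapq.heappop(self._heap)
        return url, -neg_score

    def __len__(self) -> int:
        return len(self._heap)
```

With a topic description such as `"focused web crawler"`, a link whose anchor text matches the topic and whose parent page scored highly is popped before a link with an unrelated anchor on a weakly relevant page.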