MA Zhi-qiang, ZHANG Ze-guang, YAN Rui, YANG Shuang-tao. Collecting Model of Focused Crawler for Mongolian Website[J]. Journal of Beijing University of Technology, 2015, 41(7): 1012-1019. DOI: 10.11936/bjutxb2014120001

    Collecting Model of Focused Crawler for Mongolian Website

    • Predicting which URLs to collect and discovering tunnels are two core issues for a focused crawler targeting Mongolian websites. A collecting model based on topic-group site clustering, URL ordering, and tunnel discovery was therefore proposed. First, topic identification of the page text was used to divide the URLs to be crawled into site links and non-site links. Second, a URL priority ordering algorithm was built on text similarity and hyperlink graph analysis, and an adaptive, website-based tunnel discovery algorithm was designed. Finally, a focused crawler system for Mongolian websites was constructed. Experimental results show that, compared with the baseline algorithm, the collection accuracy, the amount of information collected, and the collection rate are all significantly improved.
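The abstract names the ingredients of URL ordering (text similarity, hyperlink structure, a site/non-site split, and tunnel discovery) but not the formulas. As a rough illustration only, the Python sketch below shows a generic best-first crawl frontier of the kind focused crawlers commonly use. It is not the authors' algorithm. The linear blend weight `alpha`, the host-list test for "site links", `site_bonus`, `relevance_threshold`, and the fixed tunnel budget `max_tunnel` are all hypothetical. The parent page's relevance serves as a crude stand-in for hyperlink-graph analysis; a PageRank- or HITS-style score could be substituted.

```python
import heapq
import math
from collections import Counter
from urllib.parse import urlparse


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-frequency vectors (0.0 if either is empty)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


class FocusedFrontier:
    """Hypothetical best-first frontier with a simple tunnelling budget.

    alpha, relevance_threshold, max_tunnel and site_bonus are illustrative
    parameters, not values taken from the paper.
    """

    def __init__(self, topic_terms, topic_hosts, alpha=0.7,
                 relevance_threshold=0.2, max_tunnel=2, site_bonus=0.1):
        self.topic = Counter(topic_terms)
        self.topic_hosts = set(topic_hosts)
        self.alpha = alpha
        self.relevance_threshold = relevance_threshold
        self.max_tunnel = max_tunnel
        self.site_bonus = site_bonus
        self._heap = []  # entries: (-priority, insertion order, url, tunnel depth)
        self._seen = set()
        self._order = 0

    def is_site_link(self, url):
        # Assumption: a "site link" points to a host in a known on-topic site list.
        return urlparse(url).netloc in self.topic_hosts

    def add_links(self, parent_terms, parent_tunnel, links):
        """Score the outgoing links of a fetched page and push them onto the frontier.

        parent_terms: tokens of the parent page.
        parent_tunnel: consecutive off-topic pages on the path to the parent.
        links: list of (url, anchor_terms) pairs.
        """
        parent_rel = cosine_similarity(Counter(parent_terms), self.topic)
        # Off-topic parents consume tunnel budget; on-topic parents reset it.
        depth = 0 if parent_rel >= self.relevance_threshold else parent_tunnel + 1
        if depth > self.max_tunnel:
            return  # stop tunnelling through this off-topic region
        for url, anchor_terms in links:
            if url in self._seen:
                continue
            self._seen.add(url)
            anchor_rel = cosine_similarity(Counter(anchor_terms), self.topic)
            # Blend link-local text evidence with the parent page's relevance.
            priority = self.alpha * anchor_rel + (1 - self.alpha) * parent_rel
            if self.is_site_link(url):
                priority += self.site_bonus
            heapq.heappush(self._heap, (-priority, self._order, url, depth))
            self._order += 1

    def pop(self):
        """Return (url, priority, tunnel depth) for the highest-priority URL."""
        neg_priority, _, url, depth = heapq.heappop(self._heap)
        return url, -neg_priority, depth

    def __len__(self):
        return len(self._heap)


if __name__ == "__main__":
    frontier = FocusedFrontier(["mongolian", "culture", "news"], {"example.mn"})
    frontier.add_links(
        ["mongolian", "news", "today"],
        0,
        [("http://example.mn/a", ["mongolian", "culture"]),
         ("http://other.org/b", ["sports"])],
    )
    url, priority, depth = frontier.pop()
    print(url)  # http://example.mn/a
```

Resetting the tunnel counter whenever an on-topic page is reached is one common way to let a crawler cross a few irrelevant pages toward relevant ones without wandering indefinitely. The abstract describes an adaptive, per-website tunnel discovery algorithm; this sketch uses a single fixed budget instead.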
