Structured Vector Space Model and Its Application to Web Information Retrieval System
Abstract: Given the characteristics of Web information retrieval, this paper analyzes several problems that the traditional vector space model exhibits in Web retrieval and proposes an improved model, the structured vector space model. The basic idea is to represent a Web document as a vector with a logical structure, called a structured vector group. Each structured vector group consists of several sub-vectors, and each sub-vector corresponds to a relatively independent text segment of the Web document, such as the title, subtitles, plain text, or anchor text. Theoretical analysis and experiments show that the method improves the precision and recall of the vector space model in information retrieval.
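The idea of a structured vector group can be illustrated with a minimal sketch. The abstract names the segments (title, subtitle, plain text, anchor text) but gives no formulas, so everything else here is an assumption made for illustration: the whitespace tokenizer, the raw term-frequency weighting, the hypothetical `FIELD_WEIGHTS` values, and the choice to combine per-segment cosine similarities as a weighted sum. It should not be read as the paper's actual scoring method.

```python
import math
from collections import Counter

# Hypothetical per-segment weights; the abstract does not specify any values.
FIELD_WEIGHTS = {"title": 0.4, "subtitle": 0.2, "anchor": 0.2, "body": 0.2}


def term_vector(text):
    """Raw term-frequency vector for one text segment (naive whitespace tokenizer)."""
    return Counter(text.lower().split())


def cosine(u, v):
    """Cosine similarity between two sparse term vectors; 0.0 if either is empty."""
    dot = sum(u[t] * v[t] for t in u.keys() & v.keys())
    norm = math.sqrt(sum(x * x for x in u.values())) * math.sqrt(
        sum(x * x for x in v.values())
    )
    return dot / norm if norm else 0.0


def structured_vector(doc):
    """Build the structured vector group: one sub-vector per document segment."""
    return {field: term_vector(doc.get(field, "")) for field in FIELD_WEIGHTS}


def structured_similarity(query, doc):
    """Assumed combination rule: weighted sum of per-segment cosine similarities."""
    q = term_vector(query)
    group = structured_vector(doc)
    return sum(w * cosine(q, group[f]) for f, w in FIELD_WEIGHTS.items())


def flat_similarity(query, doc):
    """Traditional baseline: one vector over the concatenated text of all segments."""
    q = term_vector(query)
    flat = term_vector(" ".join(doc.get(f, "") for f in FIELD_WEIGHTS))
    return cosine(q, flat)


# Two toy documents with the same words placed in different segments.
doc_a = {"title": "vector space model", "body": "a survey of retrieval methods"}
doc_b = {"title": "a survey of retrieval methods", "body": "vector space model"}
query = "vector space model"
```

With these toy documents, the flat model cannot tell `doc_a` from `doc_b` because both contain the same bag of words, whereas the structured score ranks `doc_a` higher because the query terms occur in its title. This is one plausible reading of how separating segments could help ranking under the assumed weights.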