Topic Evolution Analysis Based on the LDA Model: A Case Study of Information Science Literature

    Evolution of Topic Using LDA Model: Evidence From Information Science Journals

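    • Abstract: To keep abreast of research developments and track research hotspots, we need to mine the topics in the literature and the patterns by which they change. To this end, a topic evolution analysis model based on latent Dirichlet allocation (LDA) is proposed. First, LDA is applied to the entire document collection to identify topics and their keywords, and the document-topic probability distribution is computed for each time window. Next, LDA is applied separately to the document collection of each time window to obtain topic-word probability distributions, and the similarity between topics in different time windows is computed, which yields the evolution trend of topic intensity. Finally, changes in topic content are derived from the word probability distributions of similar topics. The results show that in Chinese-language information science, attention to topics such as "semantic analysis" has been rising steadily.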
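The topic-intensity step can be sketched as follows. This is a minimal illustration, not the authors' code. It assumes that the document-topic distributions θ have already been inferred by an LDA model trained on the whole collection. It also assumes, because the abstract gives no formula, that the intensity of topic k in a time window is the mean of θ_{d,k} over the documents d published in that window. The function name `topic_intensity` and the input layout are hypothetical.

```python
from collections import defaultdict


def topic_intensity(doc_topic, doc_window):
    """Hypothetical helper: mean document-topic probability per time window.

    doc_topic:  list of per-document topic distributions (each a list of K floats),
                assumed to come from one LDA model fitted on the whole collection
    doc_window: list of time-window labels (e.g. publication year), one per document
    Returns a dict mapping each window (in sorted order) to a list of K intensities.
    """
    if len(doc_topic) != len(doc_window):
        raise ValueError("need exactly one window label per document")
    sums = {}                   # window -> running sum of theta_{d,k} for each topic k
    counts = defaultdict(int)   # window -> number of documents seen in that window
    for theta, window in zip(doc_topic, doc_window):
        if window not in sums:
            sums[window] = [0.0] * len(theta)
        for k, p in enumerate(theta):
            sums[window][k] += p
        counts[window] += 1
    # Average over the documents of each window (assumed intensity definition).
    return {w: [s / counts[w] for s in sums[w]] for w in sorted(sums)}


# Toy input: two topics, two documents in each of two years.
intensity = topic_intensity(
    [[0.7, 0.3], [0.5, 0.5], [0.2, 0.8], [0.4, 0.6]],
    [2008, 2008, 2009, 2009],
)
```

Reading one topic's intensity across the sorted windows gives its trend. In this toy input, topic 1 rises from 0.4 to 0.7, which is the kind of sustained increase the abstract reports for "semantic analysis".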

       

      Abstract: Mining how topics and their intensity evolve in the literature is important for following research trends and monitoring research hotspots. A model based on latent Dirichlet allocation (LDA) is proposed. First, LDA is applied to the whole collection of papers to identify topics and their keywords, and the document-topic probability distribution is obtained for each time window. Second, LDA is applied separately to the papers in each time window to obtain topic-word probability distributions, from which the similarity between topics in different time windows is computed. This yields the trend of topic intensity, and the word probabilities of similar topics reveal how topic content changes. The results show that the topic of semantic analysis attracts increasing attention in Chinese information science.
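LDA is run separately in each time window, so topic k in one window need not correspond to topic k in the next. Topics therefore have to be linked through the similarity of their topic-word distributions. The sketch below assumes cosine similarity over a shared vocabulary and a hypothetical linking threshold of 0.5. The abstract states neither the similarity measure nor any cutoff, and KL or Jensen-Shannon divergence are common alternatives. The helper `word_shift` illustrates the final step, in which content change is read from how word probabilities differ between linked topics. All function names are hypothetical.

```python
import math


def cosine(p, q):
    """Cosine similarity between two sparse topic-word distributions (dict word -> prob)."""
    dot = sum(pv * q.get(w, 0.0) for w, pv in p.items())
    norm_p = math.sqrt(sum(v * v for v in p.values()))
    norm_q = math.sqrt(sum(v * v for v in q.values()))
    if norm_p == 0 or norm_q == 0:
        return 0.0
    return dot / (norm_p * norm_q)


def link_topics(prev_topics, next_topics, threshold=0.5):
    """Link each topic in window t to its most similar topic in window t+1.

    threshold is a hypothetical cutoff; pairs below it are treated as unrelated.
    Returns a list of (i, j, similarity) tuples.
    """
    links = []
    for i, p in enumerate(prev_topics):
        best_j, best_sim = None, 0.0
        for j, q in enumerate(next_topics):
            s = cosine(p, q)
            if s > best_sim:
                best_j, best_sim = j, s
        if best_j is not None and best_sim >= threshold:
            links.append((i, best_j, best_sim))
    return links


def word_shift(p, q, top_n=3):
    """Words whose probability increased most from topic p to its linked successor q."""
    vocab = set(p) | set(q)
    deltas = {w: q.get(w, 0.0) - p.get(w, 0.0) for w in vocab}
    # Sort by descending change; ties broken alphabetically so the result is deterministic.
    return sorted(deltas.items(), key=lambda kv: (-kv[1], kv[0]))[:top_n]


# Toy topic-word distributions for two consecutive windows (illustrative only).
prev = [{"retrieval": 0.5, "index": 0.3, "query": 0.2},
        {"citation": 0.6, "journal": 0.4}]
nxt = [{"citation": 0.5, "journal": 0.3, "altmetrics": 0.2},
       {"retrieval": 0.4, "semantic": 0.4, "query": 0.2}]
links = link_topics(prev, nxt)
shift = word_shift(prev[0], nxt[1])
```

Linking each topic only to its single best match keeps the sketch short. To capture topics that split or merge, you would keep every pair whose similarity exceeds the threshold.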

       
