Semi-automated Construction of an Air Pollution Domain Ontology and Semantic Reasoning

    • Abstract: To clarify the relationships among air pollutants, pollution sources, influencing factors, evaluation indicators, and hazards, and to analyze air pollution transmission paths, a relatively clear and complete air pollution domain ontology was established. First, drawing on machine learning and natural language processing techniques, an attention-based sequence-labeling method was proposed for the joint extraction of entities and relations: an attention mechanism was added to a bidirectional long short-term memory (LSTM) network, and entities and relations were labeled jointly so that both could be extracted together. Second, this method was combined with a core-concept mining method based on term frequency-inverse document frequency (TF-IDF) to extract knowledge, and the resulting concepts, properties, relations, and instances were organized to construct the air pollution ontology model semi-automatically. Finally, on the basis of the ontology and its instances, conditional reasoning and visual reasoning were carried out with the SPARQL Query module and the HermiT reasoner in Protégé, respectively. The results show that the air pollution domain ontology built with the attention-based joint extraction method contains 68 core entities and 360 instances. Compared with existing ontologies in this domain, it performs well in comprehensiveness, validity, accuracy, and reusability, and the propagation paths of pollutant ions such as Ca²⁺ and K⁺ were inferred. The attention-based sequence-labeling method for joint entity-relation extraction can therefore support effective semi-automatic construction of an air pollution domain ontology and the inference of clear air pollution propagation paths.
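The TF-IDF core-concept mining step mentioned in the abstract can be illustrated with a minimal sketch. It assumes the common weighting tf × log(N/df); the abstract does not state which TF-IDF variant, tokenizer, or corpus the authors used, so the function names `tf_idf` and `top_terms` and the toy corpus below are hypothetical and only show the general idea of ranking candidate concept terms.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: score} dict per tokenized document.

    Assumed weighting: (count / doc length) * log(N / document frequency).
    """
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    scores = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        scores.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return scores


def top_terms(docs, k=3):
    """Rank terms by their highest TF-IDF score in any document."""
    best = {}
    for doc_scores in tf_idf(docs):
        for term, score in doc_scores.items():
            best[term] = max(best.get(term, 0.0), score)
    # Sort by descending score, breaking ties alphabetically.
    return sorted(best, key=lambda t: (-best[t], t))[:k]


# Hypothetical pre-tokenized corpus (not data from the paper).
corpus = [
    ["pm2.5", "source", "coal", "combustion", "pollution"],
    ["ozone", "pollution", "health", "harm"],
    ["pm2.5", "pollution", "index", "aqi"],
]

candidate_concepts = top_terms(corpus, k=3)
```

In this toy corpus the term "pollution" occurs in every document, so its inverse document frequency is log(1) = 0 and it is never ranked as a candidate concept. This is the property that lets TF-IDF favor terms that distinguish one document from the rest.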

       
