Publication - Patent

An Ontology Alignment Algorithm Based on Knowledge Graph Embedding

Ontology Alignment Algorithm

Application No.

    CN201710230135.0

Authors

    Xueqi Cheng, Yantao Jia, Manling Li, et al. (first student author)

Sponsored by

    Cooperation Research Program of Chinese Academy of Sciences and Huawei Inc. (one of the world's top 500 companies).

Problem:

  • Objective: Ontology alignment aims to match the entities in two different ontologies that refer to the same thing.
  • Problem: Traditional ontology alignment methods formulate logic rules as equalities and transform them into probability estimates of whether two entities can be aligned. These transformations either require considerable manual effort to design or rest on the assumption that attributes are mutually independent, which, as the authors of those methods themselves note, may not hold in practice. Moreover, adapting an established model to another domain takes great effort. For example, a model built for Video must be modified substantially before it can be used for Book, since the logic rules differ.
  • This patent builds a general alignment model based on knowledge graph embedding. It aligns entities without logic rules or domain-specific data assumptions, so it can be easily transferred to other domains.

My Work

  • Proposed jointly embedding the two knowledge graphs so that annotated aligned entities lie near each other in the embedding space. The alignment is thus learned implicitly, without designing complex logic rules, and the only input to prepare is triples, so the model can be easily transferred to other domains.
  • Came up with the idea of employing co-training to overcome the scarcity of alignment training data: in practice, the number of entities is huge and far exceeds the number of manually annotated training entity pairs. The triples are therefore divided into two sets according to the attributes they involve. The embedding is first learned from one set of triples (e.g., actor-related, director-related), then from the other set (e.g., name-related, tag-related, abstract-related), and the two sets then alternate in turn.
  • Designed a function to reduce semantic drift during co-training. The function is based on the edit distance between entities and a relative threshold.
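
The co-training loop and the drift filter described above can be illustrated with a minimal Python sketch. It is not the patented implementation. Several things here are assumptions made only for illustration: the names `split_views`, `passes_drift_filter`, and `co_train`; the reading of "relative threshold" as edit distance divided by the length of the longer name; the default value 0.3; and the `propose` callback, which is a hypothetical stand-in for "train the joint embedding on one set of triples and return newly predicted aligned pairs".

```python
from typing import Callable, Dict, Iterable, List, Set, Tuple

Triple = Tuple[str, str, str]   # (head, relation, tail)
Pair = Tuple[str, str]          # (entity in graph A, entity in graph B)


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance, computed with a rolling one-row table."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def passes_drift_filter(name_a: str, name_b: str,
                        rel_threshold: float = 0.3) -> bool:
    """Accept a pair only if its names are close relative to their length.

    Assumption: "relative" means distance / length of the longer name.
    """
    longest = max(len(name_a), len(name_b))
    if longest == 0:
        return True
    return edit_distance(name_a, name_b) / longest <= rel_threshold


def split_views(triples: Iterable[Triple],
                first_view_relations: Set[str]) -> Tuple[List[Triple], List[Triple]]:
    """Divide triples into two sets according to their relation/attribute."""
    view_a: List[Triple] = []
    view_b: List[Triple] = []
    for triple in triples:
        (view_a if triple[1] in first_view_relations else view_b).append(triple)
    return view_a, view_b


def co_train(views: Tuple[List[Triple], List[Triple]],
             seeds: Set[Pair],
             names_a: Dict[str, str],
             names_b: Dict[str, str],
             propose: Callable[[List[Triple], Set[Pair]], Iterable[Pair]],
             rounds: int = 2,
             rel_threshold: float = 0.3) -> Set[Pair]:
    """Alternate between the two views, growing the aligned set each turn.

    `propose` is hypothetical: it should learn embeddings from one view plus
    the current aligned pairs, and return candidate pairs to add.
    """
    aligned = set(seeds)
    for _ in range(rounds):
        for view in views:
            for ent_a, ent_b in propose(view, aligned):
                # Drop candidates whose names differ too much (semantic drift).
                if passes_drift_filter(names_a[ent_a], names_b[ent_b], rel_threshold):
                    aligned.add((ent_a, ent_b))
    return aligned
```

In this sketch a candidate such as ("Inception", "Inception.") would pass the filter, while ("Inception", "Interstellar") would be rejected before it can be used as training signal in the next turn.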