Project

OpenKN at TAC KBP 2016

Date

    05/2016 - 08/2016

For

    TAC KBP 2016 Cold Start Track (English)

Team

    Manling Li, Zixuan Li, Juan Yao, Fan Yang, Yunqi Qiu

Advisor

    Xinlei Chen, Yantao Jia, Yuanzhuo Wang, etc.

Paper

Project Objective

    This project aims to construct a knowledge graph from a given corpus. It consists of three parts: Entity Discovery, Entity Linking, and Slot Filling. Entity Discovery recognizes named entities and nominal entities. Entity Linking resolves each entity mention to an entity, which mainly involves linking mentions to the given knowledge base (Freebase) and clustering the remaining mentions that cannot be linked. Slot Filling fills 41 properties of entities (i.e., relation extraction).

Result

    Cold Start Entity Discovery and Linking: ranked 2nd out of 7 teams; in Entity Discovery, ranked 1st out of 7 teams, with 4 of the 6 measures ranked 1st. Cold Start Slot Filling: ranked 9th out of 19 teams.

My Work

  • Led a 5-member team (ICTCAS_OKN) consisting of 4 undergraduates (Zixuan Li, Juan Yao, Fan Yang, Yunqi Qiu) and me. Responsible for supervising the undergraduates, assigning tasks, and tracking progress.
  • Designed the architecture of OpenKN, including the Entity Discovery module, the Entity Linking module, and the Slot Filling module.
  • Extracted named entities of 5 types (PER, ORG, GPE, FAC, and LOC) and nominal entities by training nominal entity recognizers and using rules. To better extract the three location-related types, a named entity recognizer was trained to detect GPE, FAC, and LOC.
  • Participated in implementing the entity linking module, including Freebase indexing, Wikipedia indexing, query expansion, and feature selection. Clustered NIL entities with a one-pass clustering algorithm. We also attempted to link entities with an LSTM, but it suffered from a dearth of training data and was therefore not used in 2016. We have since improved it, and the model played an important role in 2017.
  • Extracted relations of 41 types using 4 relation extraction modules (i.e., an OpenIE-based extractor, a bootstrapping-based extractor, an implicit relation extractor, and a rule-based method) and a bagging module. We also tried a distant-supervision-based model using knowledge graph embedding and word embedding, which did not work well in 2016. It was nonetheless a meaningful effort: building on it, we implemented the relation extractor in the subsequent collaborative project with Huawei Inc., which Huawei Inc. regards as one of the main innovations.
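
The one-pass clustering of NIL entity mentions listed above can be illustrated with a minimal sketch. It is not the OpenKN implementation. The token-level Jaccard similarity, the 0.5 threshold, and the names `jaccard` and `one_pass_cluster` are assumptions chosen for illustration. Each mention is visited exactly once and either joins the most similar existing cluster or starts a new one.

```python
from typing import List, Set


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Token-overlap similarity between two sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def one_pass_cluster(mentions: List[str], threshold: float = 0.5) -> List[List[str]]:
    """Visit each mention once: join the most similar cluster or open a new one.

    The similarity function and threshold are illustrative assumptions.
    """
    clusters: List[List[str]] = []
    representatives: List[Set[str]] = []  # union of tokens seen in each cluster

    for mention in mentions:
        tokens = set(mention.lower().split())

        # Find the existing cluster whose token set overlaps most with this mention.
        best_idx, best_sim = -1, 0.0
        for idx, rep in enumerate(representatives):
            sim = jaccard(tokens, rep)
            if sim > best_sim:
                best_idx, best_sim = idx, sim

        if best_idx >= 0 and best_sim >= threshold:
            clusters[best_idx].append(mention)
            representatives[best_idx] |= tokens
        else:
            # No cluster is similar enough: this mention starts a new NIL entity.
            clusters.append([mention])
            representatives.append(tokens)

    return clusters


if __name__ == "__main__":
    print(one_pass_cluster(["John Smith", "Smith", "Acme Corp", "John Smith"]))
```

Because each mention is processed once and never reassigned, the result depends on input order. That keeps the method cheap on large corpora, at the cost of some sensitivity to which mentions arrive first.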