Project

Entity Typing for Chinese Wikipedia, Baidu Baike and Hudong Baike

The semi-structured text and free text in Baidu Baike

Date

    09/2014-01/2015

Sponsored by

    National Natural Science Foundation of China (NSFC) No. 61402442

Advisors:

    Yantao Jia, Yuanzhuo Wang

Project Objectives:

    Entity typing aims to assign types to entities. It covers coarse-grained entity typing with coarse types (e.g., person, organization, location) and fine-grained entity typing with fine-grained types (e.g., politician, artist).

My Work 1

  • Participated in designing the fine-grained entity typing module.
  • Implemented the entity linking module by using context similarity.
  • Selected features to define context similarities, including word frequency, abbreviation, alias, and so on.
  • Implemented the random walk module to generate types. A random walk starts from every node in the graph and moves to another node according to a transition probability.
  • Co-authored a patent (CN201510033050.4), which was accepted in China.
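
The random walk step described above can be illustrated with a minimal sketch. It is not the project's actual module: the graph layout (a dict mapping each node to its neighbours and their transition probabilities), the function names, the toy node names, and the idea of ranking nodes by visit count are all assumptions made for illustration.

```python
import random
from collections import Counter


def random_walk(graph, start, steps, rng):
    """Walk up to `steps` hops from `start`.

    `graph` maps node -> {neighbour: transition probability} (assumed layout).
    Returns the list of nodes visited after the start node.
    """
    node = start
    visited = []
    for _ in range(steps):
        neighbours = graph.get(node)
        if not neighbours:
            break  # dead end: no outgoing edges
        targets = list(neighbours)
        weights = [neighbours[t] for t in targets]
        # Pick the next node in proportion to its transition probability.
        node = rng.choices(targets, weights=weights, k=1)[0]
        visited.append(node)
    return visited


def visit_counts(graph, steps=10, walks_per_node=100, seed=0):
    """Start walks from every node and count how often each node is reached."""
    rng = random.Random(seed)
    counts = Counter()
    for start in graph:
        for _ in range(walks_per_node):
            counts.update(random_walk(graph, start, steps, rng))
    return counts


if __name__ == "__main__":
    # Hypothetical toy graph mixing entity and type nodes.
    toy_graph = {
        "entity:A": {"type:politician": 0.7, "type:artist": 0.3},
        "entity:B": {"type:artist": 1.0},
        "type:politician": {"entity:A": 1.0},
        "type:artist": {"entity:A": 0.5, "entity:B": 0.5},
    }
    for node, count in visit_counts(toy_graph).most_common():
        print(node, count)
```

In a setup like this, frequently visited type nodes could serve as candidate types for nearby entities; how the original module turned walks into types is not stated here.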

My Work 2

  • Designed and implemented the coarse-grained entity typing model.
  • Extracted features of entities in Wikipedia, Baidu Baike and Hudong Baike, including constructing semantic features from free text and generating attribute features from semi-structured text.
  • Implemented a classification module using a Support Vector Machine (SVM).
  • Improved the F1 measure from 85% to 95%.
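
As a rough illustration of SVM-based classification over entity features, the sketch below trains a binary linear SVM by subgradient descent on the regularised hinge loss. It is an assumption-laden toy, not the project's classifier: the feature columns, the hyperparameters, and the function names are hypothetical, and a multi-class setting such as person / organization / location would need an extension such as one-vs-rest.

```python
def train_linear_svm(samples, labels, epochs=200, lr=0.1, reg=0.01):
    """Fit weights w and bias b on the L2-regularised hinge loss.

    `samples` is a list of equal-length feature vectors; `labels` are +1 or -1.
    """
    w = [0.0] * len(samples[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            if margin < 1:
                # Inside the margin or misclassified: hinge-loss subgradient step.
                w = [wi + lr * (y * xi - reg * wi) for wi, xi in zip(w, x)]
                b += lr * y
            else:
                # Correct with enough margin: only apply regularisation shrinkage.
                w = [wi - lr * reg * wi for wi in w]
    return w, b


def predict(w, b, x):
    """Return +1 or -1 depending on the side of the separating hyperplane."""
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1 if score >= 0 else -1


if __name__ == "__main__":
    # Hypothetical attribute features: [has_birth_date, has_headquarters].
    samples = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
    labels = [1, 1, -1, -1]  # +1 = person, -1 = organization (toy labels)
    w, b = train_linear_svm(samples, labels)
    print(predict(w, b, [1.0, 0.0]), predict(w, b, [0.0, 1.0]))
```

A production system would more likely rely on an established SVM library; the hand-written trainer is used here only to keep the example self-contained.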