Publication - Patent

An Attribute Extraction Method for Web Pages

Rule-based extraction with a cascaded finite state automaton

Application No.

    CN201510071993.6

Authors

    Xueqi Cheng, Yantao Jia, Zeya Zhao, Manling Li, et al. (second student author)

Sponsored by

    Chinese Academy of Sciences

Abstract

  • Objective: Attribute Extraction aims to extract attributes of entities from free text.
  • Problem: Existing attribute extraction methods are either rule-based or supervised. Rule-based methods are inefficient at matching rules during extraction, while supervised methods suffer from a lack of training data.
  • Method: Proposed constructing a cascaded finite state automaton and applying a Directed Acyclic Graph (DAG) traversal method to speed up rule matching. Also proposed generating training data automatically through distant supervision.
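
To make the cascade idea concrete, here is a minimal sketch, not the patented method itself. It assumes a two-stage cascade: a first stage that maps tokens to coarse symbols, and a second stage that runs a small automaton over those symbols. The symbol names, the transition table, and the functions tag_tokens and extract_years are all hypothetical, and the DAG traversal used for speed-up is not shown.

```python
import re

# Stage 1 (hypothetical): map each token to a coarse symbol.
def tag_tokens(tokens):
    tags = []
    for tok in tokens:
        if re.fullmatch(r"\d{4}", tok):
            tags.append("YEAR")
        elif tok.lower() in {"born", "founded"}:
            tags.append("TRIGGER")
        elif tok.lower() == "in":
            tags.append("IN")
        else:
            tags.append("O")
    return tags

# Stage 2 (hypothetical): an automaton over the stage-1 symbols.
# States: 0 = start, 1 = saw TRIGGER, 2 = saw TRIGGER IN, 3 = accept.
TRANSITIONS = {
    (0, "TRIGGER"): 1,
    (1, "IN"): 2,
    (2, "YEAR"): 3,
}
ACCEPT = 3

def extract_years(tokens):
    """Return (trigger, year) pairs matched by the pattern TRIGGER IN YEAR."""
    tags = tag_tokens(tokens)
    results = []
    state = 0
    trigger = None
    for tok, tag in zip(tokens, tags):
        nxt = TRANSITIONS.get((state, tag))
        if nxt is None:
            # No transition: restart and re-read the symbol from state 0.
            nxt = TRANSITIONS.get((0, tag), 0)
        if nxt == 1:
            trigger = tok.lower()
        if nxt == ACCEPT:
            results.append((trigger, tok))
            nxt = 0
        state = nxt
    return results
```

Calling extract_years("Alan Turing was born in 1912 .".split()) yields one ("born", "1912") pair. Because the second stage only sees coarse symbols, a new trigger word can be added in stage 1 without touching the transition table.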

My Work

  • Implemented the module that generates training data automatically by distant supervision using Wikipedia infoboxes.
  • Extracted features from the training data to train a supervised attribute extractor based on Conditional Random Fields (CRF).
  • Assisted in writing the rules that the cascaded finite state automaton uses to extract attributes.
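
The first two items can be illustrated with a rough sketch, under assumptions of my own rather than the project's actual code. It assumes an infobox is available as a plain dictionary from attribute name to value string, labels sentence tokens with BIO tags by exact, case-insensitive matching, and builds a per-token feature dictionary of the kind CRF toolkits commonly accept. The functions label_sentence and token_features and the feature names are hypothetical.

```python
def label_sentence(tokens, infobox):
    """Assign BIO tags by matching infobox values in the sentence.

    Distant supervision assumption: a token span equal to an infobox
    value is taken to express that attribute.
    """
    labels = ["O"] * len(tokens)
    lowered = [t.lower() for t in tokens]
    for attribute, value in infobox.items():
        value_tokens = value.lower().split()
        n = len(value_tokens)
        if n == 0:
            continue
        for i in range(len(tokens) - n + 1):
            # Skip spans that overlap an earlier match.
            span_free = all(lab == "O" for lab in labels[i:i + n])
            if lowered[i:i + n] == value_tokens and span_free:
                labels[i] = "B-" + attribute
                for j in range(i + 1, i + n):
                    labels[j] = "I-" + attribute
    return labels

def token_features(tokens, i):
    """Build a simple feature dictionary for the token at position i."""
    tok = tokens[i]
    return {
        "lower": tok.lower(),
        "is_digit": tok.isdigit(),
        "is_title": tok.istitle(),
        "prev": tokens[i - 1].lower() if i > 0 else "<BOS>",
        "next": tokens[i + 1].lower() if i < len(tokens) - 1 else "<EOS>",
    }

# Example with made-up infobox content.
tokens = "Alan Turing was born in London in 1912".split()
infobox = {"birth_place": "London", "birth_date": "1912"}
labels = label_sentence(tokens, infobox)
features = [token_features(tokens, i) for i in range(len(tokens))]
```

Exact matching keeps the sketch short, but it also shows the usual weakness of distant supervision: any sentence that happens to mention "London" is labeled as a birth place, so the generated labels are noisy.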