慕苏亭: Surdeanu_ACL11_customizing an information extraction system to a new domain

Friday, November 23, 2012

Surdeanu_ACL11_customizing an information extraction system to a new domain

The paper implements a pipeline system for relation extraction: first named entity (NE) recognition, then relation extraction (RE). The features of the RE system were selected on the ACE corpus. The test domain is sports news.

They showed that a gazetteer does not help much.
Identifying the head of each argument is important. They proposed several heuristics for this:

1. Prepend "It was" to the entity, then parse the result with the Stanford parser.
2. Remove dashes, as they are not common in the Penn Treebank.
3. Guide the Stanford parser so that the final tree contains a constituent with the same span as the entity. (Need to check how to do this.)
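
To make heuristics 1 and 2 concrete for myself, here is a rough sketch of the string preparation that would happen before the text is handed to a parser. The function name is my own, and replacing each dash with a space (rather than deleting it outright) is an assumption; the paper may do it differently. The parser call itself is left out.

```python
import re


def prepare_mention_for_parsing(mention: str) -> str:
    """Turn an entity mention into a short sentence a treebank parser handles better.

    Hypothetical helper, not taken from the paper's code.
    """
    # Heuristic 2: drop dashes. Assumption: each dash becomes a single space,
    # so "third-quarter" is read as two tokens.
    no_dashes = re.sub(r"\s*-+\s*", " ", mention).strip()
    # Heuristic 1: put "It was" in front so the mention sits inside a full clause.
    return f"It was {no_dashes}"


if __name__ == "__main__":
    print(prepare_mention_for_parsing("third-quarter touchdown"))
```

The output string would then be parsed, and the head would be read off the constituent that follows "It was".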

They show that domain-specific deterministic rules also help, such as "teamFinalScore(G, S) :- teamInGame(T,G), teamScoringAll(T,S)".
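
The rule above reads like a Datalog join on the shared team variable T. A minimal sketch of how I would apply it, assuming the extracted relations are available as sets of tuples (the function name and the sample facts are made up for illustration, not taken from the paper):

```python
def infer_team_final_score(team_in_game, team_scoring_all):
    """Apply teamFinalScore(G, S) :- teamInGame(T, G), teamScoringAll(T, S).

    team_in_game: set of (team, game) pairs
    team_scoring_all: set of (team, score) pairs
    Returns the derived set of (game, score) pairs.
    """
    derived = set()
    for team, game in team_in_game:
        for scoring_team, score in team_scoring_all:
            # Join condition: both body atoms must share the same team T.
            if team == scoring_team:
                derived.add((game, score))
    return derived


if __name__ == "__main__":
    # Invented example facts.
    in_game = {("Giants", "game1")}
    scoring = {("Giants", "24"), ("Jets", "10")}
    print(infer_team_final_score(in_game, scoring))
```

The Jets fact is ignored because no teamInGame fact mentions that team, so only the Giants score is attached to the game.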
