The paper implements a pipeline system for relation extraction: first named entity (NE) recognition, then relation extraction (RE). The features of the RE system were selected on the ACE corpus. The test domain is sports news.
They showed that a gazetteer doesn't help much.
Identifying the head of each argument is important. They proposed several heuristics for this:
1. Prepend "It was" to the entity text so that it forms a sentence, then parse it with the Stanford parser.
2. Remove dashes, as they are not common in the Penn Treebank.
3. Guide the Stanford parser so that the final tree contains a constituent with the same span as the entity. (I need to check how this is done.)
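Heuristics 1 and 2 amount to a small preprocessing step before the parser is called. Below is a minimal sketch of that step, not the authors' code. The function name is hypothetical, and replacing each dash with a space (rather than deleting it outright) is my assumption. The parser call itself is left out.

```python
import re


def prepare_mention_for_parsing(mention: str) -> str:
    """Turn an entity mention into a sentence-like string for a parser.

    Hypothetical helper illustrating heuristics 1 and 2 from the notes.
    """
    # Heuristic 2: remove dashes. Assumption: each dash becomes a space,
    # so "third-quarter" turns into "third quarter".
    no_dashes = re.sub(r"-+", " ", mention)
    # Collapse any repeated whitespace left behind by the substitution.
    cleaned = " ".join(no_dashes.split())
    # Heuristic 1: put "It was" in front so the mention reads as a sentence.
    return f"It was {cleaned}"


if __name__ == "__main__":
    print(prepare_mention_for_parsing("third-quarter touchdown"))
```

The returned string would then be handed to the parser, and the head would be read off the noun phrase that follows "It was".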
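They also show that domain-specific deterministic rules help, for example "teamFinalScore(G, S) :- teamInGame(T,G), teamScoringAll(T,S)". That is, if team T played in game G and T's total score is S, then S is a final score of game G.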
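To make the rule concrete, here is a minimal sketch that treats each relation as a set of tuples and applies the rule as a join on the team variable T. The function name, the tuple layout and the sample facts are made up for illustration. How the paper actually executes its rules is not covered in these notes.

```python
def team_final_scores(team_in_game, team_scoring_all):
    """Apply teamFinalScore(G, S) :- teamInGame(T, G), teamScoringAll(T, S).

    team_in_game: iterable of (team, game) facts.
    team_scoring_all: iterable of (team, score) facts.
    Returns the set of derived (game, score) facts.
    """
    # Index the scores by team so the join on T is a dictionary lookup.
    scores_by_team = {}
    for team, score in team_scoring_all:
        scores_by_team.setdefault(team, set()).add(score)

    derived = set()
    for team, game in team_in_game:
        # Every score known for this team becomes a final score of the game.
        for score in scores_by_team.get(team, ()):
            derived.add((game, score))
    return derived


if __name__ == "__main__":
    # Made-up facts, for illustration only.
    in_game = {("team_a", "game_1"), ("team_b", "game_1")}
    scoring = {("team_a", 24), ("team_b", 17)}
    print(sorted(team_final_scores(in_game, scoring)))
```

A team with no extracted score simply contributes no derived fact, so the rule only adds relations and never removes any.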