Description
Lack of progress in automatically producing semantic representations
constitutes a major obstacle for natural language processing. Our proposal
addresses this issue by creating a Unified Linguistic Annotation (ULA),
exemplified by the first large (550K words), balanced, semantically
annotated corpus. This corpus will have most of the basic types of semantic
information annotated according to high-quality schemes using
state-of-the-art annotation technology. Crucially, all individual
annotations, although unified, will be kept separate so that alternative
annotations of a specific type of semantic information (word senses,
anaphora, etc.) can be produced without modifying the annotation at other
levels. Our ULA framework will be easily extensible to incorporate new
annotation schemes as they become available. We will create an
infrastructure including both multiply annotated corpora and guidelines for
merging so that the ULA will continue to grow after this project is
complete.
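
The idea of keeping annotation levels separate yet unified can be
illustrated with a minimal stand-off sketch. It is not the project's actual
data format: the class names (Span, AnnotatedText), the layer names, and the
labels are all hypothetical, chosen only to show how one layer can be
replaced while the others stay untouched.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class Span:
    """One annotation over tokens[start:end] (hypothetical structure)."""
    start: int  # token index, inclusive
    end: int    # token index, exclusive
    label: str


@dataclass
class AnnotatedText:
    """Tokens plus independent, named annotation layers (illustrative only)."""
    tokens: list[str]
    layers: dict[str, list[Span]] = field(default_factory=dict)

    def set_layer(self, name: str, spans: list[Span]) -> None:
        """Add or replace a single layer; no other layer is modified."""
        for s in spans:
            if not (0 <= s.start < s.end <= len(self.tokens)):
                raise ValueError(f"span {s} is outside the token range")
        self.layers[name] = list(spans)

    def merged(self) -> list[dict[str, list[str]]]:
        """Unified view: for each token, the labels from every layer covering it."""
        view: list[dict[str, list[str]]] = [{} for _ in self.tokens]
        for name, spans in self.layers.items():
            for s in spans:
                for i in range(s.start, s.end):
                    view[i].setdefault(name, []).append(s.label)
        return view


doc = AnnotatedText("The bank raised its rates".split())
# Layer names and labels below are made up for illustration.
doc.set_layer("word_sense", [Span(1, 2, "bank.financial_institution")])
doc.set_layer("anaphora", [Span(3, 4, "antecedent:tokens[0:2]")])

# Swap in an alternative word-sense annotation; the anaphora layer is unchanged.
doc.set_layer("word_sense", [Span(1, 2, "bank.alternative_sense_inventory")])
```

Because each layer refers to the shared tokens only by index, an alternative
scheme for one type of information can be stored under the same or a new
layer name, and a merged view can be recomputed at any time.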
This project is funded by the National Science Foundation Computing
Research Infrastructure Program.