Anders Björkelund

Researcher

Towards Robust Linguistic Analysis using OntoNotes

Authors

  • Sameer Pradhan
  • Alessandro Moschitti
  • Nianwen Xue
  • Hwee Tou Ng
  • Anders Björkelund
  • Olga Uryupina
  • Yuchen Zhang
  • Zhi Zhong

Summary, in English

Large-scale linguistically annotated corpora have played a crucial role in advancing the state of the art of key natural language technologies such as syntactic, semantic and discourse analyzers, and they serve as training data as well as evaluation benchmarks. Until now, however, most of the evaluation has been done on monolithic corpora such as the Penn Treebank and the Proposition Bank. As a result, it is still unclear how state-of-the-art analyzers perform in general on data from a variety of genres or domains. The completion of the OntoNotes corpus, a large-scale, multi-genre, multilingual corpus manually annotated with syntactic, semantic and discourse information, makes it possible to perform such an evaluation. This paper presents an analysis of the performance of publicly available, state-of-the-art tools on all layers and languages in the OntoNotes v5.0 corpus. This should set the benchmark for future development of various NLP components in syntax and semantics, and possibly encourage research towards an integrated system that makes use of the various layers jointly to improve overall performance.

Publishing year

2013-08-01

Language

English

Pages

143-152

Publication/Series

Proceedings of the Seventeenth Conference on Computational Natural Language Learning

Document type

Conference paper

Publisher

Association for Computational Linguistics

Topic

  • Language Technology (Computational Linguistics)

Status

Published

ISBN/ISSN/Other

  • ISBN: 978-1-937284-70-1