|
Description:
|
While supervised corpus-based methods are highly accurate for many NLP
tasks, including morphological tagging, they are difficult to port to other
languages because they require resources that are expensive to create. As a
result, many languages have no realistic prospect of morpho-syntactic
annotation in the foreseeable future. The method presented in this book
aims to overcome this problem by significantly limiting the necessary data
and instead extrapolating the relevant information from another, related
language. The approach has been tested on Catalan, Portuguese, and Russian.
Although these languages are only relatively resource-poor, the same method
can in principle be applied to any inflected language, as long as an
annotated corpus of a related language is available. The time needed to
adapt the system to a new language is a fraction of the time needed for
systems with extensive, manually created resources: days instead of years.
This book touches upon a number of topics: typology, morphology, corpus
linguistics, contrastive linguistics, linguistic annotation, computational
linguistics, and Natural Language Processing (NLP). Researchers and students
interested in these areas, as well as in cross-lingual studies and
applications, will benefit from this work. Its prospective readers are
scholars and practitioners in computer science and linguistics.
|