ELRA is happy to announce that 1 new Written Corpus is now available in its catalogue:
ELRA-W0050 The CINTIL Corpus – International Corpus of Portuguese
CINTIL (Corpus Internacional do Português) is a linguistically interpreted corpus of written and spoken European Portuguese. It is composed of one million annotated tokens, each of which has been verified by human expert annotators. The annotation comprises information on part of speech, open-class lemma and inflection, multi-word expressions belonging to the class of adverbs and to the closed POS classes, and multi-word proper names (for named entity recognition). The corpus was built from raw textual materials of several types, of which 30% are spoken materials.