LINGUIST List 17.905

Fri Mar 24 2006

FYI: The IPI PAN Corpus of Polish (2nd. Edition)

Editor for this issue: Svetlana Aksenova <svetlana@linguistlist.org>


Directory
1. Adam Przepiórkowski, The IPI PAN Corpus of Polish (2nd. Edition)


Message 1: The IPI PAN Corpus of Polish (2nd. Edition)
Date: 23-Mar-2006
From: Adam Przepiórkowski <adamp@ipipan.waw.pl>
Subject: The IPI PAN Corpus of Polish (2nd. Edition)


The 2nd edition of the IPI PAN Corpus of Polish, developed at the Institute of Computer Science of the Polish Academy of Sciences (PAS), is available at the web pages of:

- the Institute of Computer Science PAS: http://korpus.pl/en/
- the Institute of Polish Language PAS: http://corpus.ijp-pan.krakow.pl/en/

To the best of our knowledge, this is currently the largest searchable morphosyntactically annotated corpus of Polish available to the public.

The whole corpus consists of over 250 million segments (about 200 million orthographic words) and it is not balanced, but a balanced sample of over 30 million segments is also available. These corpora can be directly searched at the above addresses (do read the query syntax cheatsheet at http://korpus.pl/en/cheatsheet/index.html) or downloaded in a binary form to be used with a standalone version of the corpus search engine Poliqarp. Consult the above URLs for more details.

Adam P.

Linguistic Field(s): Text/Corpus Linguistics
Subject Language(s): Polish (pol)