LINGUIST List 32.1938
Fri Jun 04 2021
FYI: Danish Gigaword Corpus v1.0 Released
Editor for this issue: Everett Green <everettlinguistlist.org>
Leon Derczynski <leod
Danish Gigaword Corpus v1.0 Released
This week marks the release of Danish Gigaword v1.0, with over 1,000,000,000 words of Danish, spanning centuries, dialects, registers, modalities, and domains. It is the largest single collection of openly licensed documents in Danish, and we hope it helps bring the language up from an underprivileged status to a well-resourced one.
* The DAGW homepage, https://gigaword.dk/, where there's a download link and license information;
* The paper in the ACL anthology, https://www.aclweb.org/anthology/2021.nodalida-main.46/
Thank you for your interest.
Leon Derczynski (IT University of Copenhagen)
Manuel R. Ciosici (University of Southern California / IT University of Copenhagen)
Linguistic Field(s): Computational Linguistics; Text/Corpus Linguistics
Subject Language(s): Danish (dan)
Page Updated: 04-Jun-2021