LINGUIST List 32.1938

Fri Jun 04 2021

FYI: Danish Gigaword Corpus v1.0 Released

Editor for this issue: Everett Green <everett@linguistlist.org>



Date: 03-Jun-2021
From: Leon Derczynski <leod@itu.dk>
Subject: Danish Gigaword Corpus v1.0 Released

This week marks the release of Danish Gigaword v1.0, with over 1,000,000,000 words of Danish spanning centuries, dialects, registers, modalities, and domains. It is the largest single collection of openly licensed documents in Danish, and we hope it helps bring the language up from an underprivileged to a well-resourced one.

Links:
* The DAGW homepage, https://gigaword.dk/ , which has a download link and license information;
* The paper in the ACL anthology, https://www.aclweb.org/anthology/2021.nodalida-main.46/

Thank you for your interest.

Faithfully,

Leon Derczynski (IT University of Copenhagen)
Manuel R. Ciosici (University of Southern California / IT University of Copenhagen)

Linguistic Field(s): Computational Linguistics; Text/Corpus Linguistics

Subject Language(s): Danish (dan)


Page Updated: 04-Jun-2021