LINGUIST List 33.2653

Wed Aug 31 2022

Software: Large corpus of German YouTube language now available

Editor for this issue: Everett Green <everett@linguistlist.org>



Date: 18-Aug-2022
From: Louis Cotgrove <cotgrove@ids-mannheim.de>
Subject: Large corpus of German YouTube language now available

Dear Linguist-Listers,

The Nottingham Corpus of German YouTube Language (Nottinghamer Korpus Deutscher YouTube-Sprache, or NottDeuYTSch) is now available for analysis in a variety of formats, including TSV, R object, JSON, Sketch Engine, and CorpusExplorer.

https://lindat.mff.cuni.cz/repository/xmlui/handle/11372/LRT-4806

The NottDeuYTSch corpus contains over 33 million words taken from approximately 3 million YouTube comments on videos published between 2008 and 2018 and targeted at a young, German-speaking demographic. It represents an authentic snapshot of the language of young German speakers. For optimal representativeness and balance, the corpus was proportionally sampled by video category and year from a database of 112 popular German-language YouTube channels in the DACH region. Each comment comes with a considerable amount of associated metadata that enables further longitudinal and cross-sectional analyses.

If you have any questions about the corpus, please feel free to email me at cotgrove@ids-mannheim.de.

Kind Regards

Louis Cotgrove
Abteilung: Lexik

Leibniz-Institut für Deutsche Sprache
R5, 6-13
D-68161 Mannheim

Linguistic Field(s): Text/Corpus Linguistics

Subject Language(s): German (deu)


Page Updated: 31-Aug-2022