LINGUIST List 28.825

Mon Feb 13 2017

FYI: Over 95 Million Wikipedia Discussion Comments

Editor for this issue: Yue Chen

Date: 08-Feb-2017
From: Melody Kramer
Subject: Over 95 Million Wikipedia Discussion Comments

Yesterday, Wikipedia released two data sets: the largest annotated dataset of
online personal attacks, and a corpus of over 95 million user and article talk
comments made on Wikipedia between 2001 and 2015.

More information at:

Both data sets are available on FigShare, a research repository where users
can share data, to support further research:

If you’re interested in collaborating with the Wikimedia Foundation on
research in this area, you can find documentation on formal collaborations

You can also follow WikiResearch on Twitter for news and updates on research,
datasets and APIs from Wikimedia projects or contact the Wikimedia Research
team at

Linguistic Field(s): Computational Linguistics; Text/Corpus Linguistics

Page Updated: 13-Feb-2017