LINGUIST List 27.937

Mon Feb 22 2016

Disc: Significance testing for corpus comparison

Editor for this issue: Anna White

Date: 20-Feb-2016
From: Bettina Eiber
Subject: Significance testing for corpus comparison

Dear linguists,

I am working on a corpus of Wikipedia articles and articles from printed encyclopedias, and I would like to study differences in style between computer-mediated discourse and written discourse. The corpus covers four disciplines and is thematically comparable, because I always chose the same lemmas in both sources.

I have also calculated relative frequencies, and I am now wondering how to identify the words most typical of each subcorpus (Wikipedia vs. printed encyclopedias). Could statistical methods such as significance testing help here? I have read about the log-likelihood (LL) test, the chi-square test, and non-parametric tests.
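To make the comparison concrete, here is a minimal sketch of how the log-likelihood (G2) and Pearson chi-square statistics could be computed for a single word from its frequency in each subcorpus. The function name `keyness` and all counts in the usage note are hypothetical, chosen only for illustration; this shows what the two statistics measure and does not settle which of them suits the research question.

```python
import math


def keyness(freq_a, size_a, freq_b, size_b):
    """Return (G2, chi-square) for one word compared across two subcorpora.

    freq_a, freq_b: occurrences of the word in subcorpus A and B.
    size_a, size_b: total token counts of subcorpus A and B.
    All names and inputs here are illustrative assumptions.
    """
    if freq_a + freq_b == 0:
        raise ValueError("the word must occur at least once")
    if freq_a + freq_b == size_a + size_b:
        raise ValueError("the corpora must contain other words as well")

    # Observed 2x2 table: rows = (this word, all other tokens),
    # columns = (subcorpus A, subcorpus B).
    observed = [
        [freq_a, freq_b],
        [size_a - freq_a, size_b - freq_b],
    ]
    total = size_a + size_b
    row_totals = [sum(row) for row in observed]
    col_totals = [size_a, size_b]

    g2 = 0.0
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            # Expected count if the word were equally frequent in both.
            expected = row_totals[i] * col_totals[j] / total
            o = observed[i][j]
            if o > 0:  # a zero cell contributes nothing to G2
                g2 += 2 * o * math.log(o / expected)
            chi2 += (o - expected) ** 2 / expected
    return g2, chi2
```

With hypothetical counts, `keyness(120, 100_000, 60, 100_000)` compares a word seen 120 times in 100,000 Wikipedia tokens against 60 times in 100,000 print tokens. Either statistic is conventionally compared with the chi-square critical value for one degree of freedom (3.84 at p < 0.05), and ranking all words by the statistic gives a candidate list of typical words per subcorpus.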

Which test should I apply to my research question, or should I rely on other measures?

Thank you for your answers,
Bettina Eiber

Linguistic Field(s): Text/Corpus Linguistics

Page Updated: 22-Feb-2016