LINGUIST List 15.2343

Thu Aug 19 2004

Disc: Sum: Linguist 15.2332: Survey Results, Hibino

Editor for this issue: Naomi Fox


  1. Mike Maxwell, Disc: Re: 15.2332, Sum: 'Who' & 'What' in Subject-verb Concord

Message 1: Disc: Re: 15.2332, Sum: 'Who' & 'What' in Subject-verb Concord

Date: Thu, 19 Aug 2004 09:52:06 -0400
From: Mike Maxwell
Subject: Disc: Re: 15.2332, Sum: 'Who' & 'What' in Subject-verb Concord

In some recent issues of Linguist List (most recently Linguist
15.2332), Hideo Hibino posted the results of a survey of agreement
done through LL. I couldn't help but notice two of the responses:

> I (AmE)(No judgements given) Try using a large database of spoken
> and written English and find out how language is really used.


> ...(4) sounds less awful than the others. Go to some electronic
> corpora. That is more reliable than people's judgements.

This is a common dispute, and there's a lot of water under this 
particular bridge. Nevertheless, I feel compelled to comment.

What you get when you look at corpora of _written_ language is, by
definition, how the written language is _used_. Whether corpora of
_spoken_ English represent how the spoken language is used depends on
how they were transcribed--hesitations, for example, are often left
untranscribed. It depends on why the transcript was made,
how much time was invested, who did the transcription, etc.

But asking how language is _used_ is akin to asking how cars are used. 
If you go to the junkyard, you'll find some of the ways cars are used. 
That may not help you, though, if you want to know how cars work. Or 
you could look at how cars get into accidents--again, that may be a nice 
way to find out about airbags, and maybe about how drunks drive, but it 
may not be the most enlightening way to find out how cars work.

Similarly, if you want to know how language works, looking at corpora is 
one way. The problem is, it's a mixed bag. You'll get dialect mixtures 
that you can't always sort out, whereas a survey of the sort Hideo did 
can give you that dialect information (and Hideo's survey did indeed 
reveal an interesting pattern). (Whether you can sort it out in a 
corpus of course depends on how the corpus was collected.)

Written, and even more so spoken, corpora will also give you mistakes. 
Sometimes that's exactly what you want: there are collections of speech 
errors, for example, that presumably show something of the way the brain 
processes language. And of course there's the question of just what a 
'mistake' is: is it just that the user would have, if given time, come 
up with a better wording? Or is it an attempt to conform to what they 
remember their 5th grade English teacher taught them? Or on the other 
hand, is it a genuine error, which happened because the author's finger 
slipped, or someone came into the office in the middle of a sentence, or 
a speaker choked on their lunch, or they were distracted by music they 
were listening to, or a later editor changed something they didn't like, 
or...? Many of these genuine errors would be corrected if the speaker 
were given a chance, and this is precisely what a survey (or other sorts 
of introspective evidence) allows.

In sum, I would claim that there is room for corpus evidence, but there 
is also plenty of room for introspection and surveys. Asking which one 
is more 'reliable' is like asking whether beef or oranges 
are 'better food': it depends.

Mike Maxwell
Linguistic Data Consortium