Editor for this issue: Naomi Fox <fox linguistlist.org>
In some recent issues of Linguist List (most recently Linguist 15.2322), Hideo Hibino posted the results of a survey of agreement done through LL. I couldn't help but notice two of the responses:

> I (AmE)(No judgements given) Try using a large database of spoken
> and written English and find out how language is really used.

and

> ...(4) sounds less awful than the others. Go to some electronic
> corpora. That is more reliable than people's judgements.

This is a common dispute, and there's a lot of water under this particular bridge. Nevertheless, I feel compelled to comment.

What you get when you look at corpora of _written_ language is, by definition, how the written language is _used_. Whether corpora of _spoken_ English represent how the spoken language is used depends on how they were transcribed--it is not uncommon to leave hesitations untranscribed, for example. It depends on why the transcript was made, how much time was invested, who did the transcription, etc.

But asking how language is _used_ is akin to asking how cars are used. If you go to the junkyard, you'll find some of the ways cars are used. That may not help you, though, if you want to know how cars work. Or you could look at how cars get into accidents--again, that may be a nice way to find out about airbags, and maybe about how drunks drive, but it may not be the most enlightening way to find out how cars work.

Similarly, if you want to know how language works, looking at corpora is one way. The problem is, it's a mixed bag. You'll get dialect mixtures that you can't always sort out, whereas a survey of the sort Hideo did can give you that dialect information (and in Hideo's survey, indeed revealed an interesting pattern). (Whether you can sort it out in a corpus of course depends on how the corpus was collected.) Written, and even more so spoken, corpora will also give you mistakes.
Sometimes that's exactly what you want: there are collections of speech errors, for example, that presumably show something of the way the brain processes language. And of course there's the question of just what a 'mistake' is: is it just that the user would have, if given time, come up with a better wording? Or is it an attempt to conform to what they remember their 5th grade English teacher taught them? Or on the other hand, is it a genuine error, which happened because the author's finger slipped, or someone came into the office in the middle of a sentence, or a speaker choked on their lunch, or they were distracted by music they were listening to, or a later editor changed something they didn't like, or... Many of these genuine errors would be corrected if the speaker were given a chance, and this is precisely what a survey (or other sorts of introspective evidence) allows.

In sum, I would claim that there is room for corpus evidence, but there is also plenty of room for introspection and surveys. Saying that one is more 'reliable' than the other is like asking whether beef or oranges are 'better food': it depends.

Mike Maxwell
Linguistic Data Consortium
maxwell ldc.upenn.edu