Editor for this issue: Karen Milligan <karen
linguistlist.org>
Mark Jones has done a good job of laying out two of the difficulties in corpus phonetics/phonology. However, that's only one side of the story. First, the problems he raises are solvable with a proper statistical approach and proper corpus design. Second, "classic" approaches have problems of their own.

His first objection is that reading a text may not provide enough control of the environment in which a word is spoken. That is certainly true if one randomly chooses a text and just asks people to read it. Different people will often interpret it in different ways, and speak it in different ways. Shakespeare's plays provide a good example here: the same text is interpreted by different actors, and the acoustic results can be dramatically different.

However, without pretending to solve all the problems of experimental design, I can point out some possible solutions:

1) Carefully design texts that have only one reasonable interpretation.
2) Ask listeners (listeners who aren't among the authors) to evaluate the resulting speech: "Was he putting the sentence focus on 'George'?"

Then, once the data is acquired, you can ask listeners to mark prosodic or other factors that can influence the pronunciation. For instance, listeners can mark pauses. Later stages of the analysis can then differentiate /a/ preceding a pause from /a/ that doesn't precede one.

You need to build models that can survive incomplete data. The model has to be statistically correct, so that its results will show where the data is missing. For instance, if the data contains only one example of /a/ at the end of a sentence, the results should not describe how /a/'s formants change under these conditions. Hopefully, the model also generalizes to some extent. For instance, one might have a model that claims that all vowels have similar formant shifts in sentence-final position. If so, one could still measure an effect averaged over all vowels, even if sentence-final /a/ were missing, so long as the model's assumptions were clearly stated.
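
As a rough illustration of the last point, here is a minimal Python sketch of an analysis that declines to report a per-vowel result when a cell is nearly empty, yet still estimates a shift pooled over all vowels. Everything in it is an assumption made for the example: the formant values are invented, the names per_vowel_shift and pooled_shift are hypothetical, the cutoff min_n is arbitrary, and the pooled estimate is only meaningful if one accepts the stated assumption that all vowels shift alike.

```python
from collections import defaultdict
from math import sqrt
from statistics import mean, variance

# Hypothetical tokens: (vowel, sentence_final, F1 in Hz). Values are invented.
TOKENS = [
    ("a", False, 700.0), ("a", False, 720.0), ("a", False, 710.0),
    ("a", True, 680.0),  # only one sentence-final /a/
    ("i", False, 300.0), ("i", False, 310.0),
    ("i", True, 280.0), ("i", True, 290.0),
    ("u", False, 320.0), ("u", False, 330.0),
    ("u", True, 305.0), ("u", True, 295.0),
]


def group(tokens):
    """Collect F1 values per vowel, split into final / non-final lists."""
    cells = defaultdict(lambda: {True: [], False: []})
    for vowel, final, f1 in tokens:
        cells[vowel][final].append(f1)
    return cells


def per_vowel_shift(tokens, min_n=2):
    """Shift (final minus non-final mean F1) and its standard error, per vowel.

    Returns None for a vowel when either cell has fewer than min_n tokens,
    so the output shows where the data is missing. min_n must be at least 2.
    """
    out = {}
    for vowel, cell in group(tokens).items():
        fin, non = cell[True], cell[False]
        if len(fin) < min_n or len(non) < min_n:
            out[vowel] = None
            continue
        shift = mean(fin) - mean(non)
        se = sqrt(variance(fin) / len(fin) + variance(non) / len(non))
        out[vowel] = (shift, se)
    return out


def pooled_shift(tokens):
    """One shift shared by all vowels (this is the stated model assumption).

    Each final token is compared with its own vowel's non-final mean, so
    every vowel with a final token must also have non-final tokens.
    The standard error ignores uncertainty in those means (a simplification).
    """
    cells = group(tokens)
    devs = [f1 - mean(cells[v][False]) for v, final, f1 in tokens if final]
    return mean(devs), sqrt(variance(devs) / len(devs))


PER_VOWEL = per_vowel_shift(TOKENS)
POOLED = pooled_shift(TOKENS)

if __name__ == "__main__":
    for vowel, result in sorted(PER_VOWEL.items()):
        print(vowel, "insufficient data" if result is None else result)
    print("pooled (assumes equal shift for all vowels):", POOLED)
```

With this toy data the sketch reports nothing for /a/ on its own, because there is only one sentence-final token, while the pooled figure still uses that token. A real study would more likely fit a mixed-effects model, but the division of labor is the same.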
And, clearly, one needs a large enough corpus to study improbable combinations. You may also need to 'seed' the corpus with words that you want to study. Again, a proper statistical analysis will tell you what you do and don't know as a result of the experiment.

On the other hand, the problem with "classic" approaches is that they tend to yield precise results about a language called "Laboratory English", which is not quite the same language as is spoken on the street.

So, my general attitude is that _if_ you can answer a question with a corpus approach, you should, because that lets you study real languages. Doing the job with a corpus-based approach may involve designing your own corpus, and it may involve subsidiary experiments to select relevant parts of the corpus. It's generally a lot of work.

If you can't design a corpus experiment, you have to fall back on a formal, laboratory experiment. There, you have more control of the conditions, but you always have to be aware that your subjects will not speak quite normally. Also, their answers to questions like "is X contrastive" may not reflect how they would interpret such speech in the real world.

So, different questions get answered in different ways.