The LINGUIST List is dedicated to providing information on language and language analysis, and to providing the discipline of linguistics with the infrastructure necessary to function in the digital world. LINGUIST is a free resource, run by linguistics students and faculty, and supported by your donations.
Inherent Vowel Quality and Perception of Stress
I have a question about how the perception of syllable stress may be affected by inherent vowel quality. I would really appreciate it if anyone could share their thoughts on my questions and/or direct me to some good literature on this topic! I am a graduate student in Communication Disorders, and there is no one in my department with advanced knowledge of phonetics whom I can consult. First, I will say what I found in my experiment, and then give some more description.
My experiment was on word segmentation by English-speaking adults. Adults were asked to listen to a nonsense language and then judge which of two trisyllabic “words” sounded more like a word from the language. One of the trisyllabic words was a “real word” in the language and the other word was a possible misparsing of the speech stream. I used natural language stimuli.
I found that, after a 20-minute exposure to a nonsense language, adults were more likely to judge that a set of three syllables was a “word” if it began with a syllable containing a high vowel (e.g. “pi”) than if it began with a syllable containing a low or mid vowel (e.g. “ta” or “bo”). These results were statistically significant.
So, I am wondering why this might be the case. Here is some background:
For this experiment, I needed to create a “monotonous” speech stream for an artificial language in which rhythm could not be a cue to word boundaries.
I wanted to use natural syllables for this experiment rather than synthetic ones. My natural syllables were all CV. The speech stream went something like this: “ta-di-ke # bo-du-ka # to-pi-ga # etc.” (where # indicates a word boundary). There were no pauses between “words” in the speech stream. The syllables were read individually by a female speaker and then strung together in MATLAB to create the speech stream.
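To make the construction concrete, here is a minimal sketch of the concatenation step, written in Python rather than MATLAB. It is only an illustration: the sample rate, the function names, and the sine-tone stand-ins for the recorded syllables are assumptions, not the actual stimuli or script.

```python
import math

SAMPLE_RATE = 16000  # Hz; an assumed value, not the rate of the real recordings


def placeholder_token(freq_hz, dur_s=0.30, rate=SAMPLE_RATE):
    """Hypothetical stand-in for one recorded CV syllable: a plain sine tone."""
    n = round(dur_s * rate)
    return [math.sin(2 * math.pi * freq_hz * i / rate) for i in range(n)]


def build_stream(words, tokens):
    """Concatenate syllable waveforms word by word, adding no silence anywhere."""
    stream = []
    for word in words:
        for syllable in word.split("-"):
            stream.extend(tokens[syllable])
    return stream


# The three example words from the description above.
words = ["ta-di-ke", "bo-du-ka", "to-pi-ga"]
syllables = sorted({s for w in words for s in w.split("-")})

# Every placeholder gets the same pitch; real tokens would be loaded from audio files.
tokens = {s: placeholder_token(180.0) for s in syllables}

stream = build_stream(words, tokens)
```

Because nothing is inserted between words, the word boundaries are not marked by pauses in the resulting sample sequence.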
I had a very hard time creating this language using natural tokens. To my ear, and to the ears of my pilot subjects, it sounded as if word boundaries fell before syllables containing high vowels (e.g. “di” sounded like the beginning of a word, as did “du,” “pi,” etc.).
I thought that this perception of these syllables as being the beginning of words might be due to the inherently higher pitch of these syllables, relative to the other syllables (such as “ta” or “bo”). Because of this, I re-selected tokens from my recordings and tried to match them as closely as possible in terms of pitch.
As in other experiments in this area (word segmentation), I then made small adjustments to volume and duration (again, in order to create “monotonous” speech). I mostly equalized the intensity of each vowel (64-66 dB) and the duration (0.30 s), and I lowered the pitch of the high vowels and raised the pitch of the low vowels. After modification, the pitch range for the vowels was 177 Hz (for “to”) to 184 Hz (for “pu”).
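For readers who want to see what intensity equalization amounts to, here is a rough Python sketch that scales each token to a common RMS level and reports the gain applied in dB. The amplitudes are made-up placeholders, not measurements from my recordings, and the dB values are relative digital gains, not the calibrated 64-66 dB figures above. It does not reproduce the tool or settings I actually used.

```python
import math

SAMPLE_RATE = 16000  # Hz; assumed for illustration


def sine_token(amplitude, freq_hz=180.0, dur_s=0.30, rate=SAMPLE_RATE):
    """Hypothetical stand-in for a recorded vowel: a sine tone of given amplitude."""
    n = round(dur_s * rate)
    return [amplitude * math.sin(2 * math.pi * freq_hz * i / rate) for i in range(n)]


def rms(samples):
    """Root-mean-square level of a waveform."""
    return math.sqrt(sum(x * x for x in samples) / len(samples))


def equalize_rms(samples, target_rms):
    """Scale a waveform to the target RMS; also return the gain applied in dB."""
    gain = target_rms / rms(samples)
    return [x * gain for x in samples], 20 * math.log10(gain)


# Placeholder amplitudes only: one token recorded at half the amplitude of the other.
quiet = sine_token(0.5)
loud = sine_token(1.0)

target = rms(loud)
quiet_eq, quiet_gain_db = equalize_rms(quiet, target)
loud_eq, loud_gain_db = equalize_rms(loud, target)
```

In this toy case the quieter token is boosted by about 6 dB and the louder one is left unchanged, so equalizing changes each token's level relative to how it was recorded by a different amount.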
I then ran more subjects using this language. My subjects were still significantly more likely to judge words beginning with a high-vowel syllable as more like a “word” from the language they were exposed to than words beginning with a mid- or low-vowel syllable. Specifically, if asked to judge either “pi-ga-to” or “ga-to-pi” as a word, subjects were statistically more likely to choose the first. These results were quite robust.
Additionally, from my own perspective, I always heard the high vowel syllables as “more stressed” when I listened to the speech stream. What is perplexing to me is that, even though the pitch range was quite small between the high and low vowels (after manipulation), the high vowels still sounded more prominent in the speech stream.
My preliminary idea regarding these results is that, by equalizing the volume of the vowels, I may have made the high vowels sound more prominent, as they should be inherently less loud than the low vowels.
So here are my questions:
I am wondering if anyone could point me to some work on the perception of stress in English, particularly work that addresses:
1) Whether, in the absence of other cues, inherent pitch will make a syllable sound more prominent
2) Whether the interplay between acoustic correlates of stress (pitch, duration, loudness) is such that, by bringing vowels closer together in acoustic space in terms of both pitch and volume, high vowels may have been perceived to be more “stressed” because they are inherently quieter than low vowels.
I've tried to explore the literature on my own and have found myself a little bit lost, since I am not a phonetician. I would very much appreciate any advice on this matter!
Thank you very much!