LINGUIST List 14.991

Thu Apr 3 2003

Qs: Voice Quality Perception

Editor for this issue: Naomi Fox <foxlinguistlist.org>



Directory

  1. Christel de Bruijn, intrarater reliability - anchoring stimuli

Message 1: intrarater reliability - anchoring stimuli

Date: Wed, 2 Apr 2003 18:59:03 +0100 (BST)
From: Christel de Bruijn <christellarynx.shef.ac.uk>
Subject: intrarater reliability - anchoring stimuli




Dear list,

First of all, apologies for the long message, but I would really
appreciate any comments or thoughts on my two questions below.

I was wondering if anybody could give me some advice on two problems,
the first one related to the amount of data duplication needed to
achieve a good estimate of intrarater reliability in a perceptual
experiment, and the second one related to the number of training
stimuli needed in the anchoring/training phase of the experiment.

I am about to start some tests on the perception of voice quality. A
panel of 6 expert listeners (i.e. voice therapists) will be asked to
rate the voice quality of a number of speech fragments on 12-15
perceptual parameters (e.g. roughness, breathiness etc.). The
parameters are rated on a 5-point equal-appearing interval scale.

The speech fragments consist of 156 sustained vowels (divided into 3
groups of different vowels), 52 fragments of conversational speech and
52 fragments of the Rainbow passage. The 3 different types of speech
fragments will be presented in separate listening sessions.

In order to calculate intrarater reliability (i.e. the
self-consistency of the listener), I need to duplicate some of the
stimuli. The best way to do this is to duplicate all the
stimuli. However, given the large amount of speech material in the
tests, this would be very impractical. (The listeners will not be
prepared to sit through 12 hours or so of testing.)

The literature provides little guidance as to the minimum number of
stimuli that should be duplicated in order to achieve an accurate
reliability coefficient. Some studies report a duplication of 10% or
less, some 30%, a few 50% and the very odd study duplicates 100%. But
no justification is ever given for the chosen percentage.

(I must admit I haven't decided yet on which statistic to use for the
reliability, but Pearson's r and intraclass correlation coefficients
seem to be widely used)

Therefore, my first question is:

Given the large amount of speech material, what should be the minimal
amount of data to be duplicated?

(A complicating factor is also the use of conversation fragments.
Listeners will probably be aware of the duplication, if only because
of conversation content, and may remember their scores for that
particular fragment)
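
One way to make this question concrete is to look at how the
uncertainty of a reliability coefficient shrinks as more stimuli are
duplicated. The sketch below uses the standard Fisher z-transformation
to get an approximate 95% confidence interval for Pearson's r. It is
only an illustration under stated assumptions: the "true" reliability
of 0.8 is a hypothetical value, the percentages are applied to the 156
vowels only, and the approximation assumes roughly bivariate-normal
data, which 5-point ratings satisfy only loosely.

```python
from math import atanh, sqrt, tanh


def pearson_ci(r, n, z_crit=1.96):
    """Approximate confidence interval for Pearson's r via Fisher's z.

    r      -- observed (or assumed) correlation, strictly between -1 and 1
    n      -- number of duplicated stimuli (pairs of ratings), must be > 3
    z_crit -- normal critical value (1.96 for a 95% interval)
    """
    if n <= 3:
        raise ValueError("need more than 3 pairs")
    if not -1.0 < r < 1.0:
        raise ValueError("r must lie strictly between -1 and 1")
    z = atanh(r)                 # Fisher z-transform of r
    se = 1.0 / sqrt(n - 3)       # standard error of z
    return tanh(z - z_crit * se), tanh(z + z_crit * se)


# Hypothetical scenario: assumed reliability r = 0.8, 156 vowels,
# duplicating 10%, 30%, 50% or 100% of them.
ASSUMED_R = 0.8
N_VOWELS = 156
for pct in (10, 30, 50, 100):
    n_dup = round(N_VOWELS * pct / 100)
    low, high = pearson_ci(ASSUMED_R, n_dup)
    print(f"{pct:3d}% duplicated (n={n_dup:3d}): "
          f"95% CI for r is about [{low:.2f}, {high:.2f}]")
```

The interval width, rather than a fixed percentage, could then serve
as the criterion: pick the smallest number of duplicates for which the
interval is acceptably narrow. A similar exercise would be needed for
an intraclass correlation, which this sketch does not cover.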

------------------------------------

The second question is related to the anchoring phase of the
experiment.

It is common practice to provide listeners with anchoring stimuli
before the actual listening test. Usually the listeners are provided
with explicit anchors, i.e. the speech fragment is presented together
with the perceptual rating for a particular parameter.

In my experiment, however, I have decided against the use of explicit
anchors, in order to avoid introducing a bias. (This is done
because the perceptual labels will become the baseline for acoustic
correlates.) Instead, listeners will be presented with a random
selection of stimuli (which should include all values of the scale,
including the extremes), and are supposed to create their own anchors
on the basis of these stimuli.

Again, very little information is available on how people reach the
decision on the number of anchoring samples.

It's actually more a stats problem. My question is:

If I have a set of 156 vowels and each vowel is rated on 12 parameters
on a scale from 0 to 4, how many vowels should be in my training set,
so that I can say with 95% probability that the begin and end values
of the scale (i.e. 0 and 4) for each parameter are included in that set?
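
As a rough sketch of the calculation this question implies: if the
training set is drawn at random without replacement, the chance that
it contains at least one vowel rated 0 and at least one rated 4 on a
given parameter follows from hypergeometric counting with
inclusion-exclusion. The answer depends on how many of the 156 vowels
actually sit at each extreme, which is not known in advance; the
counts n_low and n_high below are therefore hypothetical inputs, and
the function names are made up for illustration.

```python
from math import comb


def prob_both_extremes(n_total, n_low, n_high, sample_size):
    """Probability that a random sample (without replacement) contains
    at least one of the n_low items AND at least one of the n_high items.

    n_low and n_high count disjoint groups (e.g. vowels rated 0 and
    vowels rated 4 on one parameter); both are assumed, not known.
    """
    if n_low + n_high > n_total:
        raise ValueError("extreme groups cannot exceed the total")
    total = comb(n_total, sample_size)
    miss_low = comb(n_total - n_low, sample_size)    # no rating-0 item
    miss_high = comb(n_total - n_high, sample_size)  # no rating-4 item
    miss_both = comb(n_total - n_low - n_high, sample_size)
    # Inclusion-exclusion on the "missing an extreme" events.
    return 1 - (miss_low + miss_high - miss_both) / total


def min_training_size(n_total, n_low, n_high, target=0.95):
    """Smallest sample size reaching the target probability, or None."""
    for size in range(1, n_total + 1):
        if prob_both_extremes(n_total, n_low, n_high, size) >= target:
            return size
    return None


# Hypothetical example: 156 vowels, 10 rated 0 and 10 rated 4
# on one parameter (these counts are assumptions).
print(min_training_size(156, 10, 10))
```

This covers a single parameter. For all 12 parameters at once, one
conservative option is to tighten the per-parameter target, for
example to 1 - 0.05/12, since the parameters are unlikely to be
independent of one another.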

Apologies once again for the lengthy e-mail, and sincere thanks to
those who took the trouble to read until the end.

Any thoughts and comments will be very gratefully received!


Christel de Bruijn


Christel de Bruijn - PhD student
University of Sheffield
Department of Human Communication Sciences
31 Claremont Crescent
Sheffield S10 2TA
United Kingdom