LINGUIST List 9.1461

Tue Oct 20 1998

Sum: GoldVarb (addendum)

Editor for this issue: Brett Churchill


  1. Mario Cal Varela, Sum: GoldVarb (addendum)

Message 1: Sum: GoldVarb (addendum)

Date: Tue, 20 Oct 1998 17:15:09 +0200
From: Mario Cal Varela
Subject: Sum: GoldVarb (addendum)

Dear all,
After having posted my summary of responses to a query on "GoldVarb" a week
ago, I received a message from Robert Sigley commenting on some of the
points raised there. I thought I might just as well forward his message to
linguist as something to be appended to the summary posted on October 13th.

>1) Preston is absolutely right about the procedure, and about the need for
>linguistic motivation when collapsing factors. You test significance of
>collapsing factors, or of removing entire factor groups, by comparing the
>log-likelihood values, and calculating a value for chi-squared (actually,
>G2) equal to twice the difference in log-likelihood. (The
>'step-up/step-down' runs automatically perform the series of tests required
>to add/remove entire factor groups; but you need to do the comparison
>yourself when trying to simplify factor groups by collapsing factors, or
>when trying to test significance of interaction effects.)
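The comparison described in point 1 can be sketched in a few lines of Python. This is only an illustration, not part of GoldVarb: the function names and the log-likelihood values are made up for the example, and the chi-squared tail probability is computed from the standard closed-form series for integer degrees of freedom so that nothing beyond the standard library is needed.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability P(X >= x) for chi-squared with integer df >= 1."""
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{i < df/2} (x/2)^i / i!
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= (x / 2) / i
            total += term
        return math.exp(-x / 2) * total
    # Odd df: normal tail plus a finite correction series.
    term, total = 1.0, 0.0
    for r in range(1, (df - 1) // 2 + 1):
        if r > 1:
            term *= x / (2 * r - 1)
        total += term
    return (math.erfc(math.sqrt(x / 2))
            + math.sqrt(2 * x / math.pi) * math.exp(-x / 2) * total)

def likelihood_ratio_test(ll_larger, ll_smaller, df_larger, df_smaller):
    """Compare two nested models; return (G2, df difference, p-value)."""
    g2 = 2 * (ll_larger - ll_smaller)
    df = df_larger - df_smaller
    return g2, df, chi2_sf(g2, df)

# Hypothetical values: a model before and after collapsing factors.
g2, df, p = likelihood_ratio_test(-512.30, -515.95, 9, 7)
print(f"G2 = {g2:.2f}, df = {df}, p = {p:.4f}")
```

A small p-value here would suggest that the collapsed (smaller) model fits significantly worse, so the collapsed distinction should be kept.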
>2) Avila is also correct in her description of the procedure. My own
>account differs in one minor detail -- I have assumed that the "input
>weight" counts as an estimated parameter, so that I list the number of
>degrees of freedom for each model as "(number of factors) - (number of
>factor groups) + 1". I may be wrong (and if somebody *knows* I'm wrong,
>please tell me!). Still, it doesn't matter, as you take the *difference* in
>degrees of freedom between the two models you compare, so the "+1"s cancel out.
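The point about the "+1" can be checked with a small sketch. The function name, the `count_input` switch, and the factor-group sizes below are assumptions for illustration only; the formula itself is the one quoted above.

```python
def model_df(factors_per_group, count_input=True):
    """(number of factors) - (number of factor groups), plus 1 if the
    input weight is counted as an estimated parameter."""
    extra = 1 if count_input else 0
    return sum(factors_per_group) - len(factors_per_group) + extra

# Hypothetical design: three factor groups with 3, 4 and 2 factors;
# the second group is then collapsed from 4 factors to 2.
full = [3, 4, 2]
collapsed = [3, 2, 2]

diff_with_input = model_df(full) - model_df(collapsed)
diff_without_input = (model_df(full, count_input=False)
                      - model_df(collapsed, count_input=False))
print(diff_with_input, diff_without_input)
```

Whichever convention is used, the difference in degrees of freedom between the two models is the same, which is all the test needs.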
>>Finally, Ron Smyth calls my attention to two limitations of the variable
>>rule applications that could perhaps be commented on by more
>>statistically-oriented researchers than myself. The first one has to do with
>>the fact that when the design has several factors, the output of the
>>program does not give any information about some of the interactions. The
>>other is that the program seems to handle nicely data with very few
>>subjects per cell, where other applications would not give out anything
>>significant. That is, GoldVarb does not keep track of subjects and seems to
>>disregard individual differences.
>Let's take this a point at a time:
>GoldVarb does not freely give information about *any* interaction effects.
>However, you can still use the GoldVarb output to test for their presence
>-- and my PhD chapter describes two such methods (quite apart from the
>tedious and often unilluminating procedure of looking for high values of G2
>in individual cells).
>First: if you construct a model containing just 2 factor groups, and this
>model provides a poor fit to the data (choose "Show model fit" under the
>Cells menu before doing the analysis), then this means that a model
>treating these factor groups as independent provides a poor fit -- and
>therefore there is some dependence between them. However, this could result
>from several sources --
>a) some third significant factor (which you may or may not have encoded!)
>is inequitably distributed across the cells of the two-group model you're
>b) there really is an interaction effect.
>so this is only a rough indication of whether you should look for an
>interaction effect. What we *can* say is that if the model is a good fit,
>there's probably no interaction effect.
>It's very easy to carry out the entire set of such 2-group comparisons by
>running them as part of a step-up/step-down analysis. And if you do this
>first, then you only need to apply the second, more difficult test to the
>much smaller set of 2-group comparisons that give significant results.
>Second, and more definitive (but more difficult):
>* Take the model containing all the groups you've encoded (the "full-groups
>* Note the log-likelihood, and the number of degrees of freedom.
>* Then replace two factor groups with a new single factor group containing
>their crossproduct (ie, every possible combination of factors is
>* Note the new log-likelihood, and number of degrees of freedom.
>* Conduct the chi-squared test as Avila describes.
>If this test result is significant, that tells you that the crossproduct of
>factors is more informative than treating the factor groups independently
>-- from which you can infer that there is a significant interaction effect.
>(Assuming that you have encoded all significant influences!!!)
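The recoding step in the second method can be sketched as follows. This is a rough illustration rather than anything GoldVarb does for you: the token representation (one dict of factor group to factor per token, dependent variable omitted), the group names, and the helper names are all hypothetical. Degrees of freedom are counted as in Sigley's formula, including the input weight.

```python
from itertools import product

def group_sizes(tokens):
    """Map each factor group to the number of distinct factors attested in it."""
    seen = {}
    for tok in tokens:
        for group, factor in tok.items():
            seen.setdefault(group, set()).add(factor)
    return {group: len(factors) for group, factors in seen.items()}

def model_df(tokens):
    """(number of factors) - (number of factor groups) + 1."""
    sizes = group_sizes(tokens)
    return sum(sizes.values()) - len(sizes) + 1

def cross_groups(tokens, a, b):
    """Replace factor groups a and b with a single crossproduct group."""
    crossed = []
    for tok in tokens:
        new = {g: f for g, f in tok.items() if g not in (a, b)}
        new[f"{a}*{b}"] = f"{tok[a]}*{tok[b]}"
        crossed.append(new)
    return crossed

# Hypothetical coding: every combination of factors is attested once.
tokens = [
    {"style": s, "sex": x, "position": p}
    for s, x, p in product(["formal", "casual", "reading"],
                           ["f", "m"],
                           ["initial", "final"])
]
crossed = cross_groups(tokens, "style", "sex")

df_independent = model_df(tokens)        # 3 + 2 + 2 - 3 + 1
df_crossed = model_df(crossed)           # 6 + 2 - 2 + 1
extra_df = df_crossed - df_independent   # (3 - 1) * (2 - 1)
print(df_independent, df_crossed, extra_df)
```

The recoded tokens would then be analysed again, and the two log-likelihoods compared using `extra_df` degrees of freedom. If some combinations of factors never occur in the data, the crossproduct group has fewer factors and the difference in degrees of freedom shrinks accordingly.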
>It is possible to use this method to incorporate several interaction
>effects into the model -- but it quickly becomes rather cumbersome, as you
>will often have to collapse distinctions in order to include the
>crossproduct factor group, and things get really messy when you need to
>consider several interactions involving the same factor group. (I think the
>best way to treat these is stepwise: if the most significant interaction is
>between groups 1 and 2, and you suspect there's also an interaction between
>groups 1 and 3, you can only approach it indirectly by comparing models
>containing 1*2, 3, 4,...n and 1*2*3, 4,...n. By contrast, if you try
>constructing a model containing 1*2, 1*3, 4,...n then you've effectively
>encoded the distinctions from group 1 twice, which means your model has
>redundant parameters and could produce unreliable results.)
>Smyth's second point actually combines two problems:
>i) GoldVarb cannot be expected to take into account any factors (such as
>behaviour of individual respondents) which are not encoded in the model. If
>you want to test for significance of individual behaviour, you have to have
>this as a factor group. (This may be impractical if you've got a large
>number of individuals in your dataset.)
>ii) chi-squared tests (whether performed by GoldVarb or any other
>application) assume that every data point is independent. In other words,
>we assume that a speaker's choice on one occasion is not influenced by
>their choice on other occasions. Hence the 'unit of variation' is the
>single token. In datasets where many tokens have been drawn from the speech
>of one individual in one interaction, this assumption may be false, which
>will lead to significance being exaggerated. In extreme cases, it may be
>better to treat the 'unit of variation' as being each *speaker*, or even
>each *conversation*. Provided that you have a large number of tokens (at
>least 10, and preferably at least 30) per speaker, you can use
>nonparametric tests such as Mann-Whitney U or Kruskal-Wallis H to test for
>significant differences among speakers from different social groups. Such
>tests provide a useful 'sanity check' for token-based significance tests.
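A speaker-based check of the kind described in (ii) might look like the sketch below. It is an assumption-laden illustration: the data layout (pairs of speaker and whether the rule applied), the function names, and the example rates are invented. The p-value uses the large-sample normal approximation without tie or continuity correction, so for small groups of speakers an exact table or a statistics package would be a better guide.

```python
import math

def speaker_rates(tokens, min_tokens=10):
    """tokens: iterable of (speaker, applied) pairs.
    Returns {speaker: proportion applied} for speakers with enough tokens."""
    counts = {}
    for speaker, applied in tokens:
        n, k = counts.get(speaker, (0, 0))
        counts[speaker] = (n + 1, k + (1 if applied else 0))
    return {s: k / n for s, (n, k) in counts.items() if n >= min_tokens}

def mann_whitney_u(xs, ys):
    """Return (U, two-sided p) using average ranks for ties and a
    normal approximation for the p-value."""
    pooled = sorted((v, i) for i, v in enumerate(list(xs) + list(ys)))
    ranks = [0.0] * len(pooled)
    pos = 0
    while pos < len(pooled):
        end = pos
        while end + 1 < len(pooled) and pooled[end + 1][0] == pooled[pos][0]:
            end += 1
        avg = (pos + end) / 2 + 1  # mean of the 1-based ranks pos+1 .. end+1
        for j in range(pos, end + 1):
            ranks[pooled[j][1]] = avg
        pos = end + 1
    n1, n2 = len(xs), len(ys)
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    u = min(u1, n1 * n2 - u1)
    mu = n1 * n2 / 2
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mu) / sigma
    return u, math.erfc(abs(z) / math.sqrt(2))

# Hypothetical per-speaker application rates for two social groups.
group_a = [0.10, 0.20, 0.30, 0.25]
group_b = [0.60, 0.70, 0.80, 0.55]
u, p = mann_whitney_u(group_a, group_b)
print(f"U = {u}, approximate p = {p:.3f}")
```

Here each speaker contributes a single number, so the test no longer relies on the tokens within a speaker being independent of one another.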
Mario Cal Varela
Departamento de Filoloxia Inglesa e Alemana, despacho 307
Facultade de Filoloxia
Universidade de Santiago de Compostela
c/ Burgo das Nacions s/n
Santiago 15705
tlf (981) 563100 ext. 11858
fax (981) 574646