AUTHOR: Tagliamonte, Sali A.
TITLE: Analysing Sociolinguistic Variation
SERIES: Key Topics in Sociolinguistics
Tara Sanchez, Department of Anthropology, University of Virginia
The purpose of this book is to document the methodology, first developed by
Labov in the 1960s, underlying the formal (quantitative) analysis of
sociolinguistic variation. Up to now, it has been propagated orally (e.g. from
Labov to his student, Shana Poplack, then to her student and the author of this
volume, Sali Tagliamonte). The book is a ''manual of best practice'' (p. x), and
includes insider 'notes' and 'tips' throughout, with exercises at the end of
Chapter 1, Introduction, begins by situating sociolinguistics and variationism
within the field of linguistics. It continues with a description of three
essential tenets of variationism (ordered heterogeneity, the ubiquity of
language change, and the relationship between language and social identity), as
well as other key concepts (e.g. vernacular, speech community, form/function
asymmetry, principle of accountability).
Chapter 2, Data collection, outlines various sampling techniques, giving the
pros and cons of each. It also gives advice on sample design and fieldwork
ethics. Chapter 3, The sociolinguistic interview, describes the interview
modules developed by Labov and colleagues for use in Philadelphia, and discusses
how to adapt them to other communities. There are lots of 'how-to' tips here.
Chapter 4, Data, data and more data, talks the reader through creating a corpus
- keeping records of tapes and interviewees, transcription, and searching the
Chapter 5 defines the linguistic variable, the key element of sociolinguistic
analysis, as it exists at the various levels of the grammar of language, from
phonetics to discourse-pragmatics. It covers everything one needs to know about
working with variables: how to recognize them, how to select one for analysis,
how to find variable contexts, etc. Chapter 6 explains the process of coding
data, again with examples from each level of grammar. Great detail is given
regarding selecting and operationalizing linguistic factor groups and factors,
but social factors are also discussed, as well as larger issues, such as in what
order to code the speakers in one's sample.
Chapter 7, The variable rule program: theory and practice, introduces variable
rule analysis via computer program. The author situates Goldvarb (the latest
instantiation; originally called Varbrul) within the theoretical climate from
which it originated, and then explains various aspects of the program and its
output. This is not a how-to chapter; rather, it explains the statistical
information one must understand before attempting to use the program. The
chapter contains a list of FAQs for easy reference.
Chapter 8, The how-to's of a variationist analysis, introduces the reader to the
practical workings of variable rule analysis. Since several versions of the
computer program are available, the focus is on elements common to all, such as
token, condition, cell, and results files, the relationships between them, and
how to read and/or create each. Tagliamonte also explains how to check for
(cross-tabulate) and deal with (recode) interaction among (external) factors and
factor groups in the data, and the importance of recordkeeping. Chapter 9,
Distributional analysis, takes the reader through the right way to obtain the
overall and factor-by-factor distributions of the dependent variable, since
errors are common here, especially among beginners. Cross-tabulations of
internal factor groups are also covered. Chapter 10, Multivariate analysis,
describes the binomial one-step and step-up/step-down analyses, and explains how
to interpret measures of model fit, how to spot interaction, and how to deal
with statistical 'error'.
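The distributional steps summarized for Chapters 8 and 9 can be sketched in a few lines of Python. This is only an illustration of the general idea, not the variable rule program itself: the token list, the factor group names, and the helper functions are all hypothetical and invented for this example.

```python
from collections import Counter, defaultdict

# Hypothetical coded tokens: (variant, following_segment, speaker_sex).
# The data are invented purely for illustration.
tokens = [
    ("deleted", "consonant", "F"),
    ("deleted", "consonant", "M"),
    ("retained", "vowel", "F"),
    ("deleted", "consonant", "M"),
    ("retained", "consonant", "F"),
    ("retained", "vowel", "M"),
    ("deleted", "vowel", "M"),
    ("retained", "vowel", "F"),
]


def overall_distribution(tokens):
    """Return {variant: (count, proportion)} for the dependent variable."""
    counts = Counter(t[0] for t in tokens)
    total = sum(counts.values())
    return {variant: (n, n / total) for variant, n in counts.items()}


def distribution_by_factor(tokens, index, application="deleted"):
    """Return {factor: (applications, total, rate)} for one factor group."""
    totals = Counter()
    applications = Counter()
    for t in tokens:
        totals[t[index]] += 1
        if t[0] == application:
            applications[t[index]] += 1
    return {f: (applications[f], totals[f], applications[f] / totals[f])
            for f in totals}


def cross_tabulate(tokens, i, j, application="deleted"):
    """Return {(factor_i, factor_j): (applications, total)} for two groups."""
    cells = defaultdict(lambda: [0, 0])
    for t in tokens:
        cell = cells[(t[i], t[j])]
        cell[1] += 1
        if t[0] == application:
            cell[0] += 1
    return {key: tuple(value) for key, value in cells.items()}


if __name__ == "__main__":
    print(overall_distribution(tokens))
    print(distribution_by_factor(tokens, 1))
    print(cross_tabulate(tokens, 1, 2))
```

Empty or badly skewed cells in the cross-tabulation are the kind of pattern that would prompt a closer look for interaction, and possibly recoding, before any multivariate run.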
Chapter 11, Interpreting your results, shows how three aspects of the
statistical analysis may be employed to explain the data: statistical
significance of factors, relative strength of factor groups, and the overall
constraint hierarchy. Also discussed are comparing the distribution of variants
of a particular variable in different dialects via statistics, and how to report
results (e.g. in tabular format). The final chapter, Chapter 12, Finding the
story, provides suggestions for communicating one's work orally, as at a
conference, and in written form, as in a journal article.
There are 66 terms defined in the glossary and 154 references. The companion website
contains a scattergram and 5 appendices: Appendix A - project information letter
(for community members) and informed consent form, B - interview schedule, C -
transcription protocol, D - verb coding, and E - step-up/step-down results for
This is an important and long-overdue contribution to the field. Most of this
knowledge has indeed been passed down by word of mouth, and the time has come
for it to be collected and documented. To illustrate, I found in this book a
number of tips that were passed on to me directly from Labov, as well as some
unfamiliar ones which must have originated in some branch of the variationist
family tree to which I was not, until now, privy. The book contains a wealth of
practical information and advice that could only come from an insider with lots
Though it is not specifically presented as a textbook - more of a ''teach
yourself'' or user's guide - this book is entirely appropriate for use in a
graduate-level methodology course. I used it as such with advanced
undergraduates in ''Introduction to (Variationist) Sociolinguistics'' (Linguistics
Program, Williams College) with great success. The style is straightforward and
accessible, statistical procedures are exemplified by the author's own studies,
and the exercises offered throughout can be incorporated into class activities.
Especially useful and appropriate along these lines is the introduction: in a
mere 14 pages, variation theory is succinctly contextualized within the rest of
linguistics, important concepts are introduced (the principle of accountability,
circumscribing the variable context), and there are even examples of
sociolinguistic variables from all levels of language structure, phonology to
discourse/pragmatics, from the author's own York English Corpus (Tagliamonte
1998). This is the best introduction to the field I have come across.
The only thing preventing this from being a true teach-yourself book is the lack
of a basic overview of the multivariate statistical process, and of any
reference to one. For the uninitiated, it is difficult to follow Chapters 8-10
because Tagliamonte goes into such detail about how to get everything right at
each step (for example, performing cross-tabulations to check for interaction,
recoding, etc.) without giving the reader an idea of what the ''big picture'' or
major steps of the ultimate analysis will entail. This was the only point where
my students struggled with the text. Once I gave them an overview of the general
steps - something like: check tokens, load cells to memory and get marginals,
perform a step-up/step-down, perform a binomial analysis - they devoured the
otherwise very well-written chapters on how to do multivariate analysis.
Some of the information that might be found in such an overview is presented in
the beginning of Chapter 8, in the context of explaining parts of the
statistical program (e.g. token file, condition file). Unfortunately, this
topical presentation did not have the intended effect for my students. For
future editions, I recommend an additional section after Chapter 7 (or perhaps
in the middle of Chapter 8, just before ''How to write condition files'') that
outlines chronologically the procedures to be introduced in 8-10. Alternatively,
the reader could be referred to the latest online Goldvarb manual specifically
for this purpose. (The
author makes reference to the online manual, but not for this specific purpose.)
This book is remarkably thorough, especially considering its brevity, but
brevity sometimes has a cost. At some points, I felt that not enough explanation
1. Several definitions of 'vernacular' are given (p. 8), but there is not a single
definition of 'speech community'. The relevant section says only, ''...variation
analysis requires that the analyst immerse herself in the speech community,
entering it both as an observer and a participant'' (pp. 8-9). Even if space does
not permit mention of multiple leading views on what a 'speech community' is,
there could at least be reference to an existing discussion for further reading
(e.g. Patrick 2002).
2. Tagliamonte says that it is ''important to include'' (p. 43) the
''tried-and-true'' questions from other studies, such as the 'danger-of-death'
question, in interviews. Presumably this is because such questions are likely to continue to
elicit good data, but no rationale is actually articulated. Perhaps she believes
that including these questions leads to greater comparability between studies.
Perhaps there is some other reason.
3. Similarly, she discusses how she assigns pseudonyms to speakers, but not why
this step is necessary. Why not refer to speakers as numbers, or initials, or
nonsense names like 'Mickey Mouse'? I completely agree that the use of
pseudonyms is preferable; I just would have liked to see an argument for
it, especially since more than a few researchers use only numbers or initials.
4. The author advocates transcribing all interviews before proceeding with any
analysis of data, including even the selection of a variable for study. Complete
transcription is not always practical, however. And for whatever does get
transcribed, there is no discussion of coding for speech style or separating
narrative speech from other styles. Future editions (or perhaps a discussion
group on the website?) might benefit from at least a listing of the pros and
cons of making a well-documented database vs. proceeding, at least initially,
without complete transcription.
5. In explaining how the step-up/step-down works, Tagliamonte states that a
certain factor group is selected because ''it results in the log likelihood
closest to zero'' (p. 142), but she does not explain the relevance of log
likelihood here (and according to the index, this is the first mention of log
likelihood in the book). Researchers would benefit from understanding what a log
likelihood closer to zero means, whether an explanation is provided in the text
or the reader is referred elsewhere.
6. Chapter 8 describes how to include or exclude specific tokens from an
analysis, but not why one might want to do so. This could be taken care of with
the procedural overview that I mentioned earlier.
7. With regard to the one-step analysis, Tagliamonte advises the reader to
''Simply find the error values that are noticeably higher than the others'' (p.
220). However, the critical value of chi-squared (df = 1, p < 0.05) is 3.84,
and any error per cell greater than this is significant and therefore must be
dealt with (Paolillo 2002).
8. Recording devices are not addressed. It's true that technology changes fast,
but as is done in other parts of the book, it would be helpful to provide
suggestions for where to seek further information (e.g. Plichta (2002),
9. The inclusion of a companion website is a great idea, and readers will surely
find this supplemental information useful. However, the only mention of the
location of this website is on the back cover. At first mention in the text (p.
55), it would be extremely useful to include the URL, or at least refer the
reader to the back cover for this information, so that my own experience is not
repeated: after searching through the book for the URL multiple times to no
avail and not thinking to check the back cover, I was forced to locate the
website myself through Cambridge.
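Points 5 and 7 above both turn on simple arithmetic that a short example can make concrete. The following Python sketch is hedged accordingly: the cell counts and predicted probabilities are invented, the function names are hypothetical, and the per-cell 'error' is assumed here to be the cell's chi-squared contribution (observed vs. expected applications and non-applications). It is not a reimplementation of Goldvarb.

```python
import math

# Critical value of chi-squared for df = 1 at p < 0.05, as cited in point 7.
CHI_SQUARE_CRITICAL_DF1 = 3.84


def log_likelihood(cells):
    """Binomial log likelihood of a model over a list of cells.

    Each cell is (applications, total, predicted_probability), with the
    predicted probability strictly between 0 and 1. The constant binomial
    coefficient is omitted. The result is never positive; values closer
    to zero indicate a model that fits the observed data better.
    """
    ll = 0.0
    for applications, total, p in cells:
        ll += applications * math.log(p)
        ll += (total - applications) * math.log(1 - p)
    return ll


def cell_error(applications, total, p):
    """Chi-squared contribution of one cell (assumed definition of 'error')."""
    expected_apps = total * p
    expected_non = total * (1 - p)
    observed_non = total - applications
    return ((applications - expected_apps) ** 2 / expected_apps
            + (observed_non - expected_non) ** 2 / expected_non)


if __name__ == "__main__":
    # Invented cells: same observed counts, two competing sets of predictions.
    good_model = [(30, 40, 0.75), (10, 40, 0.25)]
    poor_model = [(30, 40, 0.50), (10, 40, 0.50)]
    print(log_likelihood(good_model), log_likelihood(poor_model))

    # Flag cells whose error exceeds the df = 1 critical value.
    for applications, total, p in poor_model:
        error = cell_error(applications, total, p)
        print(applications, total, p, error, error > CHI_SQUARE_CRITICAL_DF1)
```

On this reading, the step-up procedure prefers the factor group whose addition moves the log likelihood closest to zero because that model accounts best for the observed counts, and a cell whose error exceeds 3.84 is poorly predicted rather than merely 'noticeably higher than the others'.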
The only other work to which this book can be compared is Paolillo's (2002)
_Analyzing Linguistic Variation_. Despite the similar titles, they are quite
different. Paolillo, a computational linguist, describes the statistical theory
behind VARBRUL (and other quantitative tests such as chi-squared and logistic
regression) in great detail; one of his goals is to extend variationist
methodology to other areas of linguistics. Because of this, he does not address
how to conduct a complete variationist project. His discussion of sampling, for
example, is centered on statistical appropriateness without addressing
social/practical aspects of soliciting interviews. Tagliamonte, on the other
hand, writes specifically from the sociolinguistic, rather than computational,
perspective. She covers every aspect of variationist sociolinguistic studies,
from entering the community to writing up findings. I would hope that all
variationists have both the statistical knowledge covered by Paolillo and the
practical information presented by Tagliamonte. If I had to choose, though, I
would suggest (to sociolinguists) the Tagliamonte book, because of her focus on
sociolinguistics and the ''how-to'' of the entire project and analysis.
Finally, I found the following typos:
page 4: ''Unfortunately, because it is such a expansive field of research...'' -->
page 75: [first line in example box is not indented like the other lines]
page 86: ''In fact, the contrast between categorical variable contexts are
diagnostic of structural differences in language.'' --> ''...the contrast between
categorical and variable contexts is ...''
page 145: ''In the normal case, the best step-up and step-down stops discarding
the groups that were added in the step-up analysis.'' --> ''...the best step-down
pages 241-2: ''Conversely, if two varieties do not share the same constraint
hierarchies, then such kinship may been ruled out, at least for the linguistic
variable under analysis.'' --> ''...may be ruled...''
To reiterate, this is a much-needed contribution to the variationist literature,
useful as a guide for individual researchers and as a text in methodology
classes. Now that this wisdom has been collected and published, others working
in this paradigm will no doubt have their own tips to add. Perhaps these can be
incorporated into the companion website and/or later editions of the book. There
are a handful of typos and some places where further explanation is desirable,
but these do not detract from the overall value of this volume. In fact, my
requests for more information should be viewed as high praise - there were
certainly no larger themes to take issue with. Everyone who uses variationist
methodology should have a copy, and theirs will no doubt quickly become as
dog-eared and worn out as my own.
Paolillo, John. 2002. _Analyzing Linguistic Variation: Statistical Models and
Methods_. Stanford, CA: Center for the Study of Language and Information.
Patrick, Peter. 2002. The speech community. In Jack Chambers, Peter Trudgill,
and Natalie Schilling-Estes (eds.), _The Handbook of Language Variation and
Change_. Malden, MA: Blackwell. 573-597.
Plichta, Bartek. 2002. Best practices in the acquisition, processing, and
analysis of acoustic speech signals. In Daniel Ezra Johnson and Tara Sanchez
(eds.), _U. Penn Working Papers in Linguistics_ 8 (3): 209-222.
Tagliamonte, Sali. 1998. York English Corpus.
ABOUT THE REVIEWER
Tara Sanchez is a Lecturer in Anthropology at the University of Virginia. She is
a graduate of the University of Pennsylvania, and a former student of William
Labov and Gillian Sankoff. Her research brings a variationist perspective to the
field of language contact.