AUTHOR: Rasinger, Sebastian M.
TITLE: Quantitative Research in Linguistics
SUBTITLE: An Introduction
SERIES: Research Methods in Linguistics
PUBLISHER: Continuum International Publishing Group Ltd
Thomas Hoffmann, English Linguistics, University of Regensburg (Germany)
In many areas of modern linguistics, quantitative data play an increasingly
important role, a fact which obviously leads to a demand for textbooks
introducing students and junior researchers to the topic. Sebastian Rasinger's
''Quantitative Research in Linguistics'' explicitly tries to meet this demand by
providing ''an introduction to quantitative research methods [...] aimed at those
with a minimum of prior knowledge'' (p. 1). The book consists of ten chapters,
grouped into two parts: part I (chapters 2-4) gives a first overview of basic
issues in quantitative research as well as research and questionnaire design. Part
II (chapters 5-9) then deals with descriptive and exploratory statistical data
analysis using Microsoft Excel. Most chapters contain exercises, the answers to
which can be found in chapter 10 ''Appendix and Solutions'', which also provides a
list of the Excel functions used in the book and several statistical
The book opens with a brief introductory chapter (pp. 1-5), in which Rasinger
stresses the importance of a basic knowledge of quantitative research methods
for students and researchers and outlines the structure of the book.
The first chapter of part I (ch. 2 ''Quantitative Research: Some Basic Issues'';
pp. 7-34) discusses the difference between qualitative and quantitative data.
While Rasinger notes that the former data give rise to questions of ''how
something is'' (p. 11), the latter are said to be investigated by questions such
as ''how much or how many there is/are of whatever we are interested in'' (p. 10).
On top of that, qualitative research is characterised as inductive and
hypothesis-generating, while quantitative approaches are claimed to be deductive
and hypothesis-testing (pp. 11-2). Next, various issues concerning ''variables''
are addressed (pp. 18-27), including amongst others their measurement,
definition, operationalisation and levels of measurement (using the widely-used
categorical, ordinal, interval, ratio scale distinction). The chapter closes
with a discussion of reliability/validity (pp. 28-31) and the relationship
between hypotheses, laws and theories (pp. 31-4).
In chapter 3, Rasinger then turns to ''Research Design and Sampling'' (pp. 35-55).
In essence he divides the various research designs into those which structure
research in terms of temporal order (longitudinal vs. cross-sectional designs)
and those which allow for ''explicit and deliberate manipulation of variables''
(p. 36; i.e. experimental and quasi-experimental designs). Giving the example of
vocabulary growth in first language acquisition, he points out that longitudinal
studies require measurements ''on several (at least two) occasions'' (p. 38). In
contrast to this, cross-sectional designs such as Labov's (1972) rhoticity study
in New York entail data collection at one point in time (pp. 36-8). Furthermore,
he shows how the sociolinguistic notion of apparent time can be used to
interpret the results from synchronic cross-sectional studies as evidence for
linguistic change (p. 41). Following a review of central issues of experimental
(experimental vs. control group / between-subject vs. within-subject design) and
quasi-experimental design (pp. 42-5), Rasinger then turns to the question of
sampling: he first of all sketches the relationship between population and
sample, and after that discusses the pros and cons of several sampling
techniques (random/probabilistic vs. non-random/non-probabilistic samples, the
latter being differentiated into opportunity and convenience samples; pp.
45-52). The final section of the chapter deals with ethical guidelines in
research (such as seeking participants' consent or allowing subjects to withdraw
at any time; pp. 52-5).
''Questionnaire Design and Coding'' (ch. 4; pp. 56-83) is the topic of the next
section. As the title suggests, in addition to ''general guidelines on how to
design a questionnaire'' (p. 57) the chapter also gives information on ''how to
prepare questionnaire-based data for [statistical] analysis'' (p. 57). First,
Rasinger points out that the basis for any good questionnaire is a clear and
precise research question. He then goes on to describe multiple choice/item
questions (pp. 59-61), the measurement of attitudes and beliefs (focusing on
elicitation instruments such as semantic differentials and Likert scales; pp.
61-3), pitfalls and problems in question phrasing (pp. 63-7), and the role of
piloting (pp. 67-9), layout (pp. 69-71) and the number and sequence of questions
(pp. 70-1). Finally, a sample questionnaire (pp. 76-82) is provided and discussed.
Part II of the book moves on to statistical data analysis, with chapter 5 (''A
First Glimpse at Data''; pp. 87-109) dealing with descriptive statistical concepts
such as absolute and relative frequencies (pp. 89-93) and ''classes, width and
cumulative frequencies'' (pp. 93-8). In addition, particular emphasis is placed on
the visualisation of data by graphs (pp. 98-109). For this, Rasinger draws on
data from various linguistic studies to illustrate the use of bar and pie charts
(namely Labov's 1972 study on cluster simplification as well as Wolfram's 1969
and Trudgill's 1974 studies on non-standard features in Detroit and Norwich,
respectively; pp. 100-3), line graphs (based on data from Hirsh-Pasek and
Golinkoff's 1996 fixation time study investigating children's processing of verb
argument structure; pp. 105-7) and scatter plots (this time using a ''fictive''
data set; pp. 107-9). (In fact, it should be pointed out that most data sets
discussed in the book are from actual linguistic studies.)
After this, Rasinger turns to measures of central tendency and dispersion (ch. 6
''Describing Data Properly -- Central Location and Dispersion''; pp. 110-36). He
explains crucial notions such as mean, median and mode together with quartiles,
quintiles and percentiles (pp. 113-23). Subsequently, he moves on to ''measures
of dispersion'' (p. 123), introducing range, variance, standard deviation,
standard error and z-scores (pp. 123-9, 133-6). The normal distribution with its
special properties is discussed in subsection 6.4 (pp. 129-32).
Despite its simple-sounding title ''Analysing Data -- A Few Steps Further''
(pp. 137-74), chapter 7 takes the reader with a minimum of prior knowledge on
quite a statistical ride from probability theory to multiple regression. The first
two sections explore probability issues, i.e. simple, conditional and joint
probabilities (pp. 138-44). Following this, test statistical concepts such as
chi-square tests, Pearson correlation, partial correlation, causality,
significance, simple and multiple regression and correlation and reliability are
introduced (pp. 144-74).
Finally, chapters 8 (''Testing Hypotheses''; pp. 175-94) and 9 (''Analysing Dodgy
Data: When Things Are Not Quite Normal''; pp. 195-205) complete the discussion of
statistical tests. Chapter 8 mainly focuses on the various types of t-tests (for
dependent and independent samples; pp. 178-91), but also illustrates the use of
chi-square tests for hypothesis testing (pp. 191-4). In contrast to this,
chapter 9 presents non-parametric tests for data which do not follow a normal
distribution, namely the Spearman correlation test (pp. 196-9), Kendall's tau
(p. 200), the Wilcoxon signed-rank test (pp. 200-3) and the Mann-Whitney U test
(pp. 203-5). (As mentioned above, the last chapter of the book, chapter 10 (pp.
206-23) actually only includes the Appendix.)
Since the majority of my own students have a strong humanities but limited
mathematical background, I know how difficult it is to find accessible
introductory texts on quantitative linguistics for an audience that is easily
intimidated by mathematical formulae, let alone statistical tests. Therefore,
Rasinger's ''Quantitative Research in Linguistics'' with its reader-friendly and
hands-on approach is a welcome contribution to the field. Unfortunately,
however, due to reasons mainly relating to the statistics part of the book (for
some of which Rasinger can't be held responsible at all), I would be more than
hesitant to adopt it as the textbook for any of my classes.
With any textbook an author has to make difficult decisions as to which aspects
should be focussed upon and which ignored. I think it is fair to say that in
this respect Rasinger has done a good job. Part I is an extremely accessible
introduction to the basics of quantitative research in linguistics and covers
most of the central concepts. (Though considering the prominence of quantitative
research in experimental psycholinguistics, I personally would have liked the
book to have included a chapter on experiment design, covering issues such as
stimuli and filler design, randomisation of stimuli, etc.; e.g. Cowart 1997.
Furthermore, a section on quantitative corpus linguistic research would
definitely also have been an asset; cf. Gries 2009: 173-217.) The same applies
to part II of the book. It surveys most of the basic (and even some of the more
advanced) statistical tests that any beginning researcher might need. Moreover,
throughout the book, all these topics are presented in a way that should be
easily accessible for the intended readership.
Strange as it may sound, the book's biggest problem is its date of publication.
Rasinger wrote ''Quantitative Research in Linguistics'' at a time when no hands-on
statistical textbook for linguists was available. In the same year as it was
published though, three excellent statistical textbooks appeared on the market
(Baayen 2008; Gries 2008; Johnson 2008), all of which work with the free, open
source R software (http://www.r-project.org/). This makes Rasinger's use of
Excel for statistical analysis a somewhat anachronistic choice, for several reasons.
Before going into details of these reasons, let me note that I strongly believe
that it doesn't matter which software researchers perform their statistical
analyses with, as long as the analysis is carried out in a sound and careful
way. For an introductory textbook to statistics for students, however, I feel
that Excel is now an unfortunate choice because:
1) It does not allow all kinds of statistical analyses: for multiple regression
e.g. even Rasinger himself suggests ''changing to a different software package''
(p. 169) and for some non-parametric tests like Kendall's tau he admits that
''there is no simple way of calculating it in Excel'' (p. 200). In R, however, all
of these (and many more) tests can easily be carried out (e.g. Baayen 2008:
165-240; Gries 2008: 150). Since I would not recommend confusing students by
first teaching them how to do statistics in Excel and then in R, I think it
makes much more sense to start with R straightaway (especially since Excel's
syntax is not really that much simpler than R's).
2) Moreover, all three textbooks using R (Baayen 2008; Gries 2008; Johnson 2008)
are also written in a very accessible style (though Baayen's book is a somewhat
more demanding read and an English version of Gries's book will not appear until
later this year) and provide an even more thorough statistical introduction than
Rasinger's book (which is the only linguistic introduction to statistics using
Excel that I am aware of).
3) Another reason why I would prefer R over Excel (or SPSS) is the fact that the
data doesn't have to be recoded. Unlike Excel (or SPSS; cf. pp. 71-5), factors
such as ''gender'' with levels ''male'' and ''female'' do not have to be recoded into
numbers (such as ''1'' and ''2''). This minimises the danger of beginners
erroneously treating factors as numerical variables (which would invalidate
their statistical analysis).
4) Finally, Excel and SPSS are commercial software packages, while students and
researchers can download R for free (for further advantages of R, cf. e.g.
Baayen 2008: x-xiii).
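The danger mentioned under 3) can be illustrated with a minimal sketch. It is written in Python purely for illustration (none of the textbooks discussed here uses it), and the five responses as well as all variable names are hypothetical. Once the labels have been recoded as numbers, nothing stops a beginner from computing a mean that has no interpretation, whereas the labelled version only invites frequency counts:

```python
from collections import Counter
from statistics import mean

# Hypothetical responses for the categorical variable "gender", kept as labels
gender = ["female", "male", "female", "female", "male"]

# The same responses recoded as numbers (1 = male, 2 = female),
# as they might be stored in a spreadsheet
codes = {"male": 1, "female": 2}
gender_coded = [codes[g] for g in gender]

# Meaningful summary of a factor: frequencies per level
label_counts = Counter(gender)

# Computable, but meaningless: the "average gender" of the sample
spurious_mean = mean(gender_coded)

print(label_counts)   # frequencies of "female" and "male"
print(spurious_mean)  # a number with no sensible interpretation
```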
As pointed out above, since none of the R textbooks were available to Rasinger,
he can obviously not be blamed for his choice of Excel as his statistical
software package. However, from an instructor's point of view, these reasons
would lead me not to adopt ''Quantitative Research in Linguistics'' as a textbook
for any of my courses (since I wouldn't be able to use half of the book).
Besides the software issue, however, the statistical section of the book also
contains a couple of slips and mistakes, which would make me question its use as
a textbook:
a) In his discussion of data coding in section 4.8, Rasinger suggests, for
example, filling in ''999'' for a missing ''age'' value (pp. 73-4). This is not
only unorthodox, but simply wrong (since it changes the mean age of the sample
from 25 to 219.8).
b) The presentation of the chi-square test (pp. 144-9) is flawed in several
respects: first, Rasinger claims that ''the chi-square test only works reliably
when the minimum count in each cell is 5'' (p. 148). Yet, it is not the observed
frequencies but the expected ones that must meet this criterion (Gries 2008:
157). On top of that, he does not mention that the 2x2 table data he presents
actually require a Yates-corrected version of the chi-square test (something
that R automatically adjusts for) and he fails to point out that the
significance p-value of such tests crucially depends on sample size (i.e. that
larger data sets automatically yield more significant results, so that the
p-value cannot be interpreted as the size of an effect; cf. Baayen 2008: 114-6;
Gries 2008: 178).
c) In the section on multiple regression, coefficients and probability values
are again presented as indicators of effect size (pp. 170-1), with no indication
that only z-scaled coefficients (Gries 2008: 260-1) allow a comparison of effect
size (since the size of a coefficient crucially depends on the scale of the
independent variable in question) and that the p-values are dependent on sample
size (cf. above).
d) While it is mentioned that data which do not follow a normal distribution
require nonparametric tests (p. 195), the author doesn't really explain how one
can test data with respect to this criterion. Again, however, the required
tests, i.e. the Shapiro-Wilk test for normality or Kolmogorov-Smirnov one-sample
test, can easily be performed in R (using the functions shapiro.test() and
ks.test(), respectively; cf. Baayen 2008: 73). The omission of these tests is
particularly unfortunate since the validity of multiple regression analysis
crucially depends on the fact that the residuals and their variances follow a
normal distribution -- again something that is not mentioned by Rasinger.
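Points a) and b) can be checked by hand. The following is a minimal sketch in Python (chosen only for illustration; it uses nothing beyond the standard library). All data are hypothetical: the four ages are simply one set of values with a mean of 25, and the 2x2 table is invented to show that observed counts of at least 5 do not guarantee expected counts of at least 5. The function names are likewise my own, and the Yates correction is implemented in its usual textbook form (subtracting 0.5 from each absolute difference).

```python
from statistics import mean

# --- Point a): a placeholder code treated as a real value ---
known_ages = [22, 24, 26, 28]     # hypothetical ages with a mean of 25
MISSING_CODE = 999                # placeholder for one missing age
coded_ages = known_ages + [MISSING_CODE]

mean_known = mean(known_ages)       # missing value left out: 25
mean_with_code = mean(coded_ages)   # placeholder counted as an age: 219.8


# --- Point b): expected frequencies and the Yates correction ---
def expected_counts(table):
    """Expected cell frequencies under independence (table = list of rows)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    return [[r * c / total for c in col_totals] for r in row_totals]


def chi_square(table, yates=False):
    """Pearson chi-square statistic, optionally with the Yates correction."""
    correction = 0.5 if yates else 0.0
    stat = 0.0
    for obs_row, exp_row in zip(table, expected_counts(table)):
        for obs, exp in zip(obs_row, exp_row):
            diff = max(abs(obs - exp) - correction, 0.0)
            stat += diff ** 2 / exp
    return stat


# Hypothetical table: every observed count is >= 5,
# yet the smallest expected count is far below 5.
table = [[5, 5], [5, 200]]
expected = expected_counts(table)

chi_plain = chi_square(table)
chi_yates = chi_square(table, yates=True)

# Same proportions, ten times the sample size: the uncorrected
# statistic grows tenfold although the effect is unchanged.
bigger = [[10 * x for x in row] for row in table]
chi_big = chi_square(bigger)

print(mean_known, mean_with_code)
print(min(min(row) for row in expected))
print(chi_plain, chi_yates, chi_big)
```

The last comparison illustrates why a test statistic (and hence its p-value) should not be read as a measure of effect size: the proportions in both tables are identical, and only the sample size differs.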
Let me conclude by pointing out again that despite the largely negative tone of
the above comments, ''Quantitative Research in Linguistics'' is in fact a solid,
easy-to-read introduction to quantitative linguistic research. However, mainly
because of the recent publication of so many excellent statistics textbooks, I
do not think this is going to become one of the main textbooks in the field.
Baayen, R. H. 2008. Analyzing Linguistic Variation: A Practical Introduction to
Statistics Using R. Cambridge: Cambridge University Press.
Cowart, W. 1997. Experimental Syntax: Applying Objective Methods to Sentence
Judgements. Thousand Oaks: Sage.
Gries, St. Th. 2008. Statistik fuer Sprachwissenschaftler. (Studienbuch zur
Linguistik 13). Goettingen: Vandenhoeck & Ruprecht.
Gries, St. Th. 2009. Quantitative Corpus Linguistics with R: A Practical
Introduction. New York: Routledge.
Hirsh-Pasek, K. and R. M. Golinkoff. 1996. The Origins of Grammar: Evidence from
Early Language Comprehension. Cambridge, MA: MIT Press.
Johnson, K. 2008. Quantitative Methods in Linguistics. Malden, MA and Oxford:
Blackwell.
Labov, W. 1972. Sociolinguistic Patterns. Philadelphia: University of
Pennsylvania Press.
Trudgill, P. 1974. The Social Differentiation of English in Norwich. London:
Cambridge University Press.
Wolfram, W. 1969. A Sociolinguistic Description of Detroit Negro Speech.
Washington, DC: Center for Applied Linguistics.