Date: 28-Mar-2017
From: Luciana Forti
Subject: The Idiom Principle and L1 Influence
AUTHOR: Ying Wang
TITLE: The Idiom Principle and L1 Influence
SUBTITLE: A contrastive learner-corpus study of delexical verb + noun collocations
SERIES TITLE: Studies in Corpus Linguistics 77
PUBLISHER: John Benjamins
YEAR: 2016

REVIEWER: Luciana Forti, Università per Stranieri di Perugia



“The Idiom Principle and L1 Influence. A contrastive learner-corpus study of delexical verb + noun collocations” by Ying Wang is an extremely valuable study set at the interface between Learner Corpus Research (LCR) and Second Language Acquisition (SLA). To the best of my knowledge, it is one of the few attempts to study L1 influence in L2 production solely on the basis of a systematic comparison among relevant corpora, in the context of the acquisition of English as a foreign language.

The study analyses the delexical uses of six high-frequency verbs: have, get, give, take, make and do. These verbs are central to the earlier stages of foreign language learning; they are highly frequent but, at the same time, display a certain degree of idiomaticity, which makes them not always entirely transparent on a semantic level. In combinations such as have trouble, make a decision, take turns (p. 1), the meaning of the verb is not entirely literal, and most of the meaning arising from the combination is determined by the noun. Furthermore, nouns in different languages often select different verbs, and this may make acquisition more difficult by encouraging learners to rely on their L1, thus producing more instances of L1 influence (Gilquin, 2007).

The analysis is conducted on five corpora: two learner corpora (L2), one containing texts from Swedish learners and the other containing texts from Chinese learners; one target language corpus (TL), created by merging a British English corpus and an American English corpus; and two native language corpora (L1), one for Mandarin Chinese and the other for Swedish.

The author investigates three main research questions:

What is the frequency of verb+noun collocation use in the L2 corpora compared to the TL and L1 corpora?

How do noun collocates and morphosyntactic patterns change when comparing L2 corpora to the TL and L1 corpora?

What kinds of collocational errors are produced in learner texts, and how can they be interpreted?

Reflecting these three research questions, and following a chapter detailing the characteristics of the corpora and the methodology used in the study, the book can be seen as divided into three sections: a quantitative analysis section, coinciding with Chapter 3; a qualitative analysis section, dealing with noun collocates (Chapter 4) and morphosyntactic features (Chapter 5); and a second qualitative analysis section, dealing with error analysis (Chapter 6).

A brief introduction places the study at the intersection between studies on L1 influence and the so-called Integrated Contrastive Model, which integrates Contrastive Interlanguage Analysis (CIA) and Contrastive Analysis (CA) (Gilquin, 2008; Granger, 1996).

Chapter 2, “Data and methodology”, discusses how the corpora were selected and how the verb+noun collocations were identified. In the first case, each corpus is described in terms of its size and the elicitation tasks used to create it; in the second case, a set of four identifying criteria is defined and discussed in light of examples extracted from the corpora.

Chapter 3, “Frequency of occurrence”, analyses the overall frequency of delexical verb+noun collocations across the five corpora. The highest observed frequencies are in the learner corpora, followed by the L1 corpora, and finally by the TL corpus. These results are interpreted in terms of developmental features, such as over-reliance on known items (i.e. the “teddy bear principle”), L1 influence, and the typically spoken style of learner writing. The second part of the chapter analyses the frequencies of the six groups of verb+noun collocations, providing a more fine-grained picture of the ways in which Chinese and Swedish learners differ in terms of preferences. Compared to native speakers, target language learners seem to employ a less varied spectrum of verb+noun collocations; for example, Chinese learners combine have and take with a greater variety of noun collocates, while Swedish learners do so with get, make and give.

Chapter 4, “Noun collocates: Lexical patterns of the six verbs”, explores the collocational profiles of the verbs under study in terms of their noun collocates. To this end, data is extracted from the British National Corpus (BNC), which is able to reflect, more closely than the TL corpus, the range of input that the learners are likely to have been exposed to. The noun collocates of each verb are classified according to the following semantic fields: cognition (e.g. have a feeling), communication (e.g. have a chat), specific act/activity (e.g. have a drink), event (e.g. have an accident), manner (e.g. make an effort), transfer of possession/transaction (e.g. make a donation), science/education (e.g. do research), illness (e.g. have a stroke), outcome (e.g. make a success of), reasoning/information (e.g. have evidence), attributes of people and objects (e.g. have a right), and state of affairs (e.g. have a chance). Only the first 100 noun collocates emerging from the log-likelihood and MI lists are considered, and unclassifiable combinations are included only if salient.
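
To make the ranking criterion more concrete, the following is a minimal Python sketch of how mutual information (MI) and log-likelihood scores are commonly computed for a verb+noun pair from a 2×2 contingency table. The function name, the frequency figures and the exact formulations are assumptions made for illustration only; the book's own settings and software are not described here.

```python
import math


def collocation_scores(pair_freq, verb_freq, noun_freq, corpus_size):
    """Return (MI, log-likelihood G2) for one verb+noun pair.

    Hypothetical helper: uses the common pointwise MI formula
    log2(observed / expected) and the G2 statistic over the
    2x2 contingency table of the pair.
    """
    if pair_freq <= 0:
        raise ValueError("MI is undefined when the pair never co-occurs")

    # Observed cells: verb with noun, verb without noun,
    # noun without verb, neither verb nor noun.
    observed = [
        pair_freq,
        verb_freq - pair_freq,
        noun_freq - pair_freq,
        corpus_size - verb_freq - noun_freq + pair_freq,
    ]

    # Expected cells under independence: row total * column total / N.
    rows = [verb_freq, corpus_size - verb_freq]
    cols = [noun_freq, corpus_size - noun_freq]
    expected = [r * c / corpus_size for r in rows for c in cols]

    mi = math.log2(observed[0] / expected[0])
    # Cells with an observed count of zero contribute nothing to G2.
    g2 = 2 * sum(o * math.log(o / e) for o, e in zip(observed, expected) if o > 0)
    return mi, g2


# Illustrative counts only (not taken from the BNC or from the book).
mi, g2 = collocation_scores(pair_freq=40, verb_freq=100, noun_freq=100, corpus_size=1000)
print(f"MI = {mi:.2f}, G2 = {g2:.2f}")
```

Ranking the candidate nouns of a verb by each score and keeping the top 100 of each list would approximate the selection step described above. MI tends to favour rare but exclusive pairings, whereas log-likelihood favours frequent ones, which is one plausible reason for consulting both lists.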

The author concludes that the verbs observed in the BNC are not entirely delexical, as they display clear patterns of semantic preferences. In this respect, certain semantic fields such as nouns denoting an act (e.g. have a look) are underrepresented in the learner corpora.

Chapter 5, “Morphosyntactic features of delexical verb + noun collocations”, provides an overview of the morphosyntactic features of the collocations under study through a comparison among the five corpora, considering three perspectives: the morphology of the noun, the characteristics of the determiner, and the nature of postmodification devices. In the first case, noun collocates are classified into the categories of zero-derived (e.g. make a change > change), derived (e.g. make a choice > choose) and other (e.g. to make an effort > Ø; i.e. the noun is not morphologically related to a verb). The observed distribution of the categories is largely similar across the five corpora.

A similar kind of distributional analysis is conducted with respect to the different types of determiners that may appear in a verb+noun collocation: a/an, the, Ø, quantifier, possessive, no, some/any, other. This analysis is particularly useful if we consider the impact that different degrees of lexicalisation may have on learning processes. Several factors influencing the use of determiners are detected: L1 influence, the presence of premodifiers or postmodifiers, the frequency of some TL expressions, and the degree of lexicalisation or frozenness of the collocation. The last of these seems to affect the degree of L1 influence, in the sense that a more fixed combination will be less susceptible to it (Ježek, 2016). Postmodification is analysed according to whether it involves a phrase or a clause. The main observed differences between the two learner corpora are related to finite clauses.

Chapter 6, “Errors and unidiomatic usage”, provides an error analysis of the verb+noun collocations extracted from the two learner corpora according to two main error categories: grammatical errors, pertaining to the determiner, the number, the affixation and the postmodifier, and lexical errors, pertaining to the verb, the noun, the premodifier, and awkward combinations.

A first brief overview of errors in general displays some similarities and differences between the two learner corpora. In both of them, get + noun collocations appear to be the most problematic. Some differences follow: have + noun collocations are significantly more difficult for Chinese learners, who in turn have fewer issues with do + noun collocations than Swedish learners do. The subsequent paragraphs deal with the specific error categories in relation to the six verbs under study. For each verb, the largest categories of error are discussed and illustrated with examples. Overall, a scale of collocation difficulty emerges: have and get + noun collocations are the most challenging; take and give + noun collocations are the least challenging; and make and do + noun collocations sit somewhere in the middle.

The final chapter briefly offers a summary of the results by providing answers to the research questions that initiated the study. In relation to RQ1, both learner groups seem to overuse a limited range of collocations in comparison to native speakers. In relation to RQ2, the semantic patterning of the verbs in the reference corpus is only partly reflected in the learner corpora, which often display culture-specific preferences. In terms of syntactic features, both learner corpora appear to be closer to the TL corpus than to the L1 corpora, indicating a low degree of L1 influence. In relation to RQ3, similar kinds of errors were detected in both learner groups, though Swedish learners produced a slightly lower proportion of errors than their Chinese counterparts (16% vs. 23%).

The final paragraphs underline the importance, in teaching, of focusing on high-frequency verbs even at intermediate to advanced proficiency levels, contrary to the assumption that these verbs are learned early and are relatively easy to use. The range of collocational features they display, together with the varied error typology observed in the learner corpora, shows that they do indeed merit further attention in teaching practices and pedagogical materials design. All differences between pairs of corpora are evaluated statistically through p values.


This is an excellent contribution to the intersecting fields of Learner Corpus Research and Second Language Acquisition. The results of the analysis are clearly described and systematically discussed in light of the most relevant and recent theories. This clarity and systematicity make the reading of the book particularly enjoyable. The absence of a lengthy theoretical introduction is also a definite plus: the choice of using the theory to discuss the actual results of the data analysis, instead of meticulously setting the scene before presenting any of the data, is certainly to be appreciated.

The book provides a total of 268 examples, progressively numbered so as to unify the different sections of the book. The plethora of examples effectively supports the description and discussion of the individual phenomena, making the reader constantly aware of what lies behind the numbers displayed in the various tables. This aspect is key to the overall clarity of the book.

The limitations of the study are clearly pointed out by the author. One of these concerns the nature of the learner corpora used in the study. Both include texts elicited as take-home essay assignments, with learners having access to reference tools. Furthermore, while the author recognises a high degree of internal variability with regard to the proficiency level of the learners, no standardised proficiency test is carried out to differentiate them, partly because of the difficulty of finding one that would be suitable for both learner groups. Ultimately, however, what all learners have in common is having studied English as a foreign language for a similar amount of time.

Despite the title of the book, the issue of L1 influence is addressed only intermittently throughout the discussion of the results. There does not seem to be an overall quantitative and qualitative account of the phenomenon. We do not know what proportion of errors and unidiomatic uses of the collocations is attributable to L1 influence, with respect to the different perspectives adopted (semantic preferences, noun collocates, morphosyntactic features, errors). A more explicit account of this aspect is probably what the reader would have expected on reading the title of the book, although the actual research questions that open the book shift the focus and effectively clarify the scope of the study.

Another inevitable limitation of the study, which does not affect the quality of the book, lies in the consideration that L1 transfer, in its essence, is a psychological phenomenon. Learner corpora provide us with production data that is influenced by many different individual factors. As argued by Walter Belardi, one needs to reflect on where the linguistic influence takes place; production data, that is execution data, is one way in which the influence manifests itself, while taking place in the speaker’s individual competence (Belardi, 1990). This consideration carries a number of implications for corpus-based studies dealing with language influence.

In this study, the corpora being compared were produced by different groups of speakers, which reflects the scarcity of resources currently available.

In conclusion, the study is excellent and extremely valuable both for scholars interested in corpus-based error analysis and contrastive interlanguage analysis and for those working in the field of cross-linguistic influence and wanting to explore different methodological avenues. In this respect, the book does a superb job of showing what kind of knowledge corpora can contribute to the understanding of such classic themes in SLA as L1 influence.


Belardi, W. (1990). Il luogo dell’interferenza linguistica. In Linguistica generale, filologia e critica dell’espressione (pp. 57–68). Roma: Bonacci.

Gilquin, G. (2007). To err is not all. What corpus and elicitation can reveal about the use of collocations by learners. Zeitschrift für Anglistik und Amerikanistik, 55(3), 273–291.

Gilquin, G. (2008). Combining contrastive and interlanguage analysis to apprehend transfer: detection, explanation, evaluation. Language and Computers, 66(1), 3–33.

Granger, S. (1996). From CA to CIA and back: An integrated approach to computerized bilingual and learner corpora. In K. Aijmer, B. Altenberg, & M. Johansson (Eds.), Languages in Contrast: Papers from a symposium on text-based cross-linguistic studies, Lund 4-5 March 1994 (pp. 37–51). Lund: Lund University Press.

Ježek, E. (2016). The Lexicon: an introduction. New York, NY: Oxford University Press.


Luciana Forti is a PhD candidate in Applied Linguistics at the University for Foreigners of Perugia, Italy. She holds a BA and an MA in Linguistics and Applied Linguistics, both earned at Sapienza University of Rome, Italy. She has worked on the corpus-based analysis of non-native language, in relation to adversative discourse markers and L1 influence. Her doctoral project deals with the use of corpora in the learning and teaching of Italian as a second language, with a focus on the acquisition of verb+noun collocations. She is a member of the Learner Corpus Association. She is also a CELTA-qualified EFL teacher.

