LINGUIST List 17.1540: Ling Theories/Methodology: Kepser & Reis (2005)

LINGUIST List 17.1540

Thu May 18 2006

Review: Ling Theories/Methodology: Kepser & Reis (2005)

Editor for this issue: Lindsay Butler <lindsay@linguistlist.org>

Directory
1. Elke Gehweiler, Linguistic Evidence: Empirical, Theoretical and Computational Perspectives

Message 1: Linguistic Evidence: Empirical, Theoretical and Computational Perspectives
Date: 12-May-2006
From: Elke Gehweiler <elkegehw@zedat.fu-berlin.de>
Subject: Linguistic Evidence: Empirical, Theoretical and Computational Perspectives

Announced at http://linguistlist.org/issues/17/17-97.html

EDITORS: Kepser, Stephan; Reis, Marga
TITLE: Linguistic Evidence
SUBTITLE: Empirical, Theoretical and Computational Perspectives
SERIES: Studies in Generative Grammar 85
PUBLISHER: Mouton de Gruyter
YEAR: 2005

Elke Gehweiler, Freie Universität Berlin and Berlin-Brandenburgische Akademie der Wissenschaften

GENERAL DESCRIPTION

The volume 'Linguistic Evidence', edited by Stephan Kepser and Marga Reis, is based on the conference 'Linguistic Evidence. Empirical, Theoretical, and Computational Perspectives', which took place in Tübingen from January 29 to February 1, 2004. It contains a short introduction by the editors and 26 papers.

SUMMARY

The introduction discusses several issues related to linguistic evidence. As the central objects of linguistic enquiry -- ''language, languages, and the factors/mechanisms systematically (co-) governing language acquisition, language processing, language use, and language change'' (1) -- cannot be accessed directly, they have to be reconstructed from manifestations of linguistic behaviour. Since there are many possible types of data (e.g. introspection, corpus data, data from (psycho-)linguistic experiments, synchronic vs. diachronic data, typological data, neurolinguistic data, data from first and second language learning, and data from language disorders), gaining linguistic evidence from the potentially available data is no trivial matter. Linguistic evidence is a fairly new topic of linguistic discussion. Until the mid-1990s there were essentially two ways of gathering data: generativists relied largely on introspective data, whereas non-generative linguists relied on informally gathered corpus data. This has begun to change. The editors attribute the turning point to Schütze (1996), who called for a systematic approach to speaker judgements. Since then, many scholars have shown that the numerous factors influencing speaker judgements must be controlled in order to obtain more reliable data. Furthermore, the size and availability of corpora have grown since the mid-1990s, and with them the importance of corpora as a source of evidence. Both developments, Kepser/Reis claim, have paved the way for a rapprochement between introspective and corpus linguistics, and ''[i]t is one of the main aims of this volume to overcome the corpus data versus introspective data opposition and to argue for a view that values and employs different types of linguistic evidence each in their own right. Evidence involving different domains of data will shed different, but altogether more, light on the issues under investigation, be it that the various findings support each other, help with the correct interpretation, or by contradicting each other, lead to factors or influence so far overlooked. This ties in naturally with the fact ... that there are more domains and sources of evidence that should be taken into account than just corpus data and introspective data.'' (3).

In the first article, 'Gradedness and Consistency in Grammaticality', Aria Adli argues for graded grammaticality judgements. Adli criticises the practice of quoting questionable introspective judgements in theoretical studies without prior empirical verification. One of the examples Adli discusses in detail is the 'que' --> 'qui' rule in French, which is much cited in syntactic theorising. It essentially states that ''an ECP [Empty Category Principle-EG] violation can be avoided in French if 'qui' is used instead of the usual complementizer 'que' in sentences where a wh-phrase has been extracted from the subject position'' (7), and that there are clear differences in grammaticality between such sentences with 'qui' and with 'que'. Using data from a controlled experiment based on a graded concept of grammaticality, Adli shows that the 'que' --> 'qui' rule is largely a myth, and suggests that psycholinguistic factors are instead responsible for the differences in (un)grammaticality between sentence types containing these forms.

Katrin Axel's paper 'Null Subjects and Verb Placement in Old High German' deals with Old High German (OHG) time and weather expressions without the quasi-argument 'iz' ('it') and with constructions in which a referential subject is not overtly realised. Using three major prose texts as her empirical basis, she shows that earlier OHG (8th and 9th centuries) allowed genuine pro-drop and should therefore not be classified as a semi-pro-drop language. Her data show that in early OHG null subjects are (largely) restricted to root clauses, which are distinguished from subordinate clauses by the position of the finite verb (verb-first/verb-second vs. sentence-final/sentence-late). She claims that this main/subordinate asymmetry can be accounted for if we assume that null subjects are only licensed in post-finite position, i.e. ''it is highly plausible that null subjects are only licensed in configuration [sic] in which they are c-commanded by a leftward moved finite verb: [V+AGR]k [pro ... tk]]. In OHG, the only way to obtain the required configuration for null-subject licensing is verb movement to C0'' (34). Axel further suggests that the distribution of null subjects is influenced by morphological factors. In OHG there were two alternative verb endings in the 1st person plural: a short '-m' and a long '-mês'. With the short ending, subject pronouns are virtually always overt; with the long ending they are frequently omitted, but only in post-finite position. Axel concedes that the Latinised writing tradition may have had a certain impact, but argues that the widely held assumption that the omission of referential subject pronouns in earlier OHG is a foreign feature cannot be upheld, as it fails to explain why null subjects were largely banned from pre-finite environments and from contexts with 1st person plural endings in '-m'. Modern Standard German no longer allows referential pro-drop, despite its comparatively 'rich' verbal inflection. Referring to Sprouse and Vance (1999), Axel argues that the replacement of null subjects by overt pronouns need not be related to any grammar-internal change, but rather to differences in parsing success, based on the assumption that utterances with null pronouns are more difficult to parse. Finally, Axel argues that the case of OHG null subjects casts doubt on the assumed incompatibility of referential pro-drop and verb second. Nor does it confirm the assumed correlation between morphological richness and null subjects.

The authors of 'Beauty and the Beast: What Running a Broad-Coverage Precision Grammar over the BNC Taught Us about the Grammar - and the Corpus' (Timothy Baldwin, John Beavers, Emily M. Bender, Dan Flickinger, Ara Kim, Stephan Oepen) argue for a hybrid approach to grammar engineering (referring to Fillmore 1992). After reviewing some of the arguments for and against corpus data and introspective data, they present their methodology for building a broad-coverage precision grammar. In a first step they apply the English Resource Grammar (ERG) to a sample of the British National Corpus (BNC). The grammar was able to generate at least one parse for 57% of the sentences. The 43% that did not receive a parse were diagnosed and classified manually. The authors distinguish seven categories of parsing failure, which either represent gaps in the grammar (''missing lexical entry'', ''missing construction'', ''fragment''), are due to preprocessing errors or parser resource limitations, or represent noise (''ungrammatical string'', ''extragrammatical string''). They then discuss these categories in more detail and explain why the respective sentences could not be parsed. Missing lexical entries, for example, fall into two basic categories: missing lexical types for a given word token (e.g. the grammar contains the noun 'table', but not the verb) and missing multiword expressions. The authors argue that combining the two sources of linguistic evidence, i.e. using corpora as the primary source of data and enhancing and expanding that data with native speaker judgments, can be of great use to grammar developers. The corpus provides linguistic variety and authenticity, revealing new syntactic constructions, which can then be analysed with the grammar. Here, insisting on a notion of grammaticality helps to recognise and categorise the noise in the corpus. According to Baldwin et al., ''precision grammar engineering serves both as a means of linguistic hypothesis testing and as an effective way to bring new data into the arena of syntactic theory'' (64).

In 'Seemingly Indefinite Definites' Greg Carlson and Rachel Shirley Sussman use experimental and non-experimental methods to show that there is a sub-class of English definites whose interpretation resembles that of indefinites, such as 'the store' in ''Mary went to the store'', where the identity of the store is not especially important, in contrast to 'the desk' in ''Mary went to the desk''. First, the authors show that these 'weak definites' have the same distributional properties as bare singular count nouns (''He was in bed''): they are lexically restricted, i.e. it is a lexical feature of the noun itself that determines whether it can function as a bare singular/weak definite; they do not allow any modification; they receive a certain degree of semantic enrichment; they only co-occur with lexical items of certain classes; and their distributional properties preclude application of the usual tests for definiteness/indefiniteness. In the second part of the paper Carlson/Sussman show that experimental evidence supports the existence of a separate class of weak definites. For their experiment they selected six nouns that often function as indefinite definites and matched them with comparable regular definite nouns (e.g. ''After she finishes her breakfast, Lydia will read the newspaper'' vs. ''the book''). Each noun was put into a sentence containing a verb known to support the indefinite definite reading. For each sentence pair a visual context was created which showed the scene just before the action described in the sentence is carried out. Participants saw this scene on a computer screen while they heard a spoken version of the sentence, and then had to choose the item on display that they thought most likely to be involved in the upcoming action. In addition, their eye movements were monitored while they were listening to the sentence. Both target choice and eye movements supported the existence of two separate classes of definites.

Sonia Cyrino and Ruth Lopes ('Animacy as a Driving Cue in Change and Acquisition in Brazilian Portuguese') use both diachronic data and data from language acquisition to show that a feature that was relevant for a change in Brazilian Portuguese is still operative in language acquisition. Looking at historical data, they first discuss the change in object constructions whereby the 3rd person neuter clitic 'o' was gradually replaced by a null element, leading to a change in the grammar. They then examine the present-day acquisition of the null category, arguing that this shift became critical for language acquisition, cuing a new grammar, and that the semantic features of the antecedent were the driving cue in the acquisition of the object pronominal paradigm in Brazilian Portuguese. The more general theoretical conclusions they draw are, first, that ''we may take cue-based theories seriously and try to show how a cue can be operative after a change occurred in a language, explaining the change itself'' (102), and second, that this ''places some questions about acquisition proper within the generative framework'' (102).

In 'Aspectual Coercion and On-line Processing: The Case of Iteration' Sacha DeVelle discusses the phenomenon of iteration, a prime example of aspectual coercion. Iteration ''describes the encoding of a series of repetitions within a given situation'' (106). The iterative interpretation is enhanced by the punctual semantic feature of point action verbs ('jump'), which can denote a single act ('dive') or an iterative act ('knock'). Two studies (Piñango, Zurif, and Jackendoff (1999), using a cross-modal lexical decision (CMLD) interference task, and Todorova, Straub, Badecker, and Frank (2000), using a reading time task) have shown that if a point action verb is combined with a durational adverbial headed by 'for' or 'until' (e.g. ''The girl dived in the pool for five minutes''), processing load increases, as shown by longer reaction times emerging at or just after the durational adverbial. The authors of both studies argue that this is evidence for an enriched compositional operation. DeVelle, however, argues that the processing differences between activity verbs and point action verbs may also be due to the sentence stimuli used in the two studies. A replication of Piñango et al.'s (1999) study showed one significant difference from the original study: the point action/durational adverbial sentences were overall rated as more difficult to understand and less plausible than their activity verb counterparts. DeVelle claims that this may have influenced Piñango et al.'s findings.

Studies on child language acquisition have argued that the acquisition of epistemic expressions begins between two and a half and three years of age, but that epistemic expressions remain very rare until 4;5 (years;months) or later. Experiments have, however, shown that the linguistic epistemic system is not fully understood until the age of 8;0 or later, and that weak epistemic expressions like 'können' ('can/may') or 'vielleicht' ('perhaps') are still not understood by 6- and 7-year-olds. These findings suggest that children understand (weak) epistemic terms much later than they begin to use them. In 'Why Do Children Fail to Understand Weak Epistemic Terms? An Experimental Study' Serge Doitchinov presents the results of two experiments he conducted in order to find out whether children's late understanding of epistemic terms is related to the development of their ability to understand epistemic uncertainty (inference-based hypothesis) or to their ability to recognise scalar implicatures (implicature-based hypothesis). His first experiment consisted of three tasks: (i) the 'modal expression task', which investigated children's ability to understand weak epistemic expressions correctly; (ii) the 'implicature task', which assessed the children's understanding of scalar implicatures; and (iii) the 'inference task', which examined their ability to deal with epistemic uncertainty. The second experiment was conducted to further assess the children's ability to recognise scalar implicatures. The results of the two experiments suggest that the acquisition of epistemic terms depends on the development of children's ability to understand epistemic uncertainty, an ability that does not yet seem to be fully mastered by the age of eight. Doitchinov argues that younger children's capacity to use weak epistemic terms is limited: they probably first use weak epistemic terms only in very familiar situations, which does not contradict previous claims. According to Doitchinov, however, the results also suggest that they have difficulties in inferring epistemic possibility, and that they occasionally overgeneralise the use of strong epistemic terms in their speech.

Linguistic descriptions of negative polarity items agree that the occurrence of polarity items is licensed by semantic and/or pragmatic properties. It has furthermore been argued that a negative polarity item is only licensed if it occurs in the scope of a negator (cf. e.g. Haegeman 1995):

(1) a.  Kein Mann, der einen Bart hatte, war jemals glücklich.
        'No man who had a beard was ever happy.'
    b. *Ein Mann, der einen Bart hatte, war jemals glücklich.
        'A man who had a beard was ever happy.'
    c. *Ein Mann, der keinen Bart hatte, war jemals glücklich.
        'A man who had no beard was ever happy.'

The paper 'Processing Negative Polarity Items: When Negation Comes Through the Backdoor' by Heiner Drenhaus, Stefan Frisch and Douglas Saddy presents the results of two psycholinguistic studies, a speeded acceptability judgment task and an event-related brain potential (ERP) study. The authors used structures such as those in (1) to examine the specific lexical properties of negative polarity items like 'jemals' ('ever') and the licensing conditions that are due to hierarchical constituency. Both experiments confirmed that there are two licensing conditions for negative polarity items: a semantic/pragmatic condition and a structural/syntactic condition. However, both experiments also showed that violations with an inaccessible negator (1c) were more often accepted as correct than violations without any negator (1b), indicating that the negator is (wrongly) used to license the polarity item even if it is not in a c-commanding position. Drenhaus et al. claim that this might be due to a ''competition between semantic/pragmatic information and hierarchical constituency'' (159), but that further systematic investigations of polarity constructions are needed.

Veronika Ehrich's paper 'Linguistic Constraints on the Acquisition of Epistemic Modal Verbs' discusses constraints on the acquisition of epistemic modal verbs (MVs) in German. Ehrich first gives a detailed description of the relevant semantic and syntactic properties of German MVs and reviews some of the main findings of MV-acquisition research. She then compares the results of her corpus study to different competing (psycho-) linguistic approaches to epistemicity in language and language development. Ehrich concludes that syntactic progress, semantic diversification and cognitive development are all necessary prerequisites for the rise of epistemicity, but none of them seems to be sufficient by itself.

In 'The Decathlon Model of Empirical Syntax' Sam Featherston describes a new model of grammar, the 'Decathlon Model'. Featherston has conducted studies on frequency (based on corpus data) and on grammaticality (based on native speakers' judgments, using a procedure which ''allowed informants to express all the differences in ''naturalness'' that they perceive, with no coercion to a given scale'' (189)). The grammaticality-judgment study yielded the following results: (i) judged well-formedness is a continuum, and no cut-off point between well-formed and ill-formed can be located; (ii) each linguistic factor has an effect on well-formedness, so that more violations cause a structure to be judged worse; and (iii) there are no 'hard' constraints, i.e. no violation excludes a structure from the grammar. The frequency data show a different picture. Of the 16 structures tested in the judgment study, the one judged best occurs 14 times in the corpus, the one judged second best occurs once, and the remaining 14 structures do not occur at all. According to Featherston, this shows that the two data types do not measure the same thing and that relative judgments say nothing about the probability of occurrence of a structure. Featherston then introduces the Decathlon Model, which is intended to be both ''an outline architecture of a grammar and at the same time an account of the differences between data types'' (196). The model's 'Constraint Application' module, which essentially corresponds to the grammar, ''applies constraints, assigns violation costs, and outputs form/meaning pairs, weighted with violation costs'' (197). These form/meaning pairs are then passed to the 'Output Selection' module, which selects the best candidate for output. The existence of these two modules explains the different results for the different data types: judgments return the output of the Constraint Application module, whereas frequencies reflect the output of the Output Selection module. Featherston goes on to discuss the advantages of the Decathlon Model over other theories of syntax, the notion and nature of well-formedness, and the implications of his findings for the choice of data types in syntax. Here he concludes that the data type for syntax must be relative judgments: ''Frequency measures give us the same information as relative judgments about the best (couple of) structural alternatives in each comparison set, but they give us no information about any of the others.'' (205) For syntactic theory this means that one has to choose what one wants to model, as output selection and the grammar are two separate processes.
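The division of labour between the two modules can be illustrated with a toy Python sketch. It is not Featherston's implementation: the constraint names, violation costs and candidate orders are hypothetical, and output selection is modelled simply as picking the lowest-cost candidate. The point is only that the full set of cost-weighted candidates (what judgments are said to tap) is gradient, while the selected output (what frequencies are said to tap) contains only the best alternative.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    form: str
    violations: tuple = ()  # names of the constraints this candidate violates


# Hypothetical violation costs; in practice these would be estimated from judgments.
COSTS = {"C1": 1.0, "C2": 1.0, "C3": 0.3}


def constraint_application(candidates):
    """Assign each candidate the summed cost of its violations.

    No constraint is 'hard': every candidate survives, it is only weighted.
    """
    return [(c.form, sum(COSTS[v] for v in c.violations)) for c in candidates]


def output_selection(weighted):
    """Keep only the lowest-cost candidate(s) as the produced output."""
    best = min(cost for _, cost in weighted)
    return [form for form, cost in weighted if cost == best]


candidates = [
    Candidate("order A"),
    Candidate("order B", ("C3",)),
    Candidate("order C", ("C1",)),
    Candidate("order D", ("C1", "C2")),
]

weighted = constraint_application(candidates)
# Judgments see the whole cost-weighted set: a gradient without a cut-off point.
for form, cost in sorted(weighted, key=lambda fc: fc[1]):
    print(f"{form}: cost {cost}")
# Frequencies see only what output selection lets through.
print("produced:", output_selection(weighted))
```

A deterministic minimum is the simplest possible selection rule. A noisy or probabilistic selection would let a near-best alternative surface occasionally, which is one way the single corpus occurrence of the second-best structure could be modelled.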

In her paper 'Examining the Constraints on the Benefactive Alternation by Using the World Wide Web as a Corpus' Christiane Fellbaum asks whether data gathered from the web can give us new insights into speakers' grammars and serve as evidence for linguistic theories. She contrasts the constraints on the Benefactive alternation (consisting of the PP alternant (''Chris bought a cake for Kim'') and the direct object (DO) alternant (''Chris bought Kim a cake'')) that were formulated on the basis of introspective data with the data found on the web. Her data show that the previously proposed constraints cannot fully account for the web data, although ''most data fall into the kinds of patterns that previous researchers have suggested'' (237). Fellbaum shows, for example, (i) that the DO alternant can occur with verbs of destruction, (ii) that it does not necessarily require a ''created/prepared/obtained entity that becomes the Beneficiary's possession'' (222), as had been claimed by other scholars, and (iii) that there is no ''Latinate Constraint'', i.e. there is ''no restriction on the Benefactive alternation that can be formulated in terms of etymology or morphophonological properties of the verb'' (225). She further shows (iv) that restrictions on the Benefactive cannot be formulated in terms of aspect, and (v) that the constraints formulated for the nominal arguments of the Benefactive do not appear to be ''hard'' constraints. Fellbaum argues that although web data do not permit us to formulate any hard constraints, two observations can be made: in the DO alternant the subject has to have control over the event, and, unlike in the PP alternant, ''a benefit is necessarily bestowed, resulting in a change of state of the affected entity, the Beneficiary'' (237). She concludes with the observation that constructed data often fail to capture the fuzzy nature of real constraints, and argues that all grammatical phenomena that could previously only be studied by means of intuition should now be re-examined using naturally occurring data, i.e. corpus data.

In 'A Quantitative Corpus Study of German Word Order Variation' Kris Heylen attempts to overcome the limitations of ''traditional'' data (introspection and ''encountered'' examples) by using a corpus-based approach to study word order variation in the German Mittelfeld. Heylen first discusses the problems of traditional data types for studying word order variation, arguing that they are unreliable and unable to deal with gradient and multifactorial phenomena. He then discusses the advantages of corpora over other data types and proposes a corpus-based approach, arguing that (i) corpus data are primary data in linguistics, (ii) corpora give easy access to large amounts of data, (iii) corpus data reflect gradient effects through relative frequencies, and (iv) multiple factors can be studied directly by looking at actual usage data. Heylen then presents the results of a corpus-based study on word order, in which he examined ''the variation that occurs when both a full NP-subject and a pronominally realised object are present in the Mittelfeld'' (244). He takes into account seven factors that might influence word order and, using various statistical models, examines the correlations between word order and these factors (for each factor separately and for multiple factors simultaneously). Although his analysis shows that the seven factors investigated can explain some of the variation (e.g. the strong effect of clause type: ''the 'marked' order subject-first is especially common in subordinate clauses'' (261)), Heylen argues that additional factors have to be tested in order to fully account for the variation. He concludes by arguing that the results of the study are ''not yet explanations'' (261) and that, in order to formulate an explanatory model of the variation, corpus data alone may not be sufficient, as they are only ''part of a whole set of data types that are necessary for sound empirical language research'' (261).

There are a number of statistical word similarity measures, which are based on fundamentally different assumptions. The paper 'Which Statistics Reflects Semantics? Rethinking Synonymy and Word Similarity' by Derrick Higgins presents yet another model, local context information retrieval (LC-IR), which ''is based on web search statistics regarding the frequency with which words appear adjacent to one another'' (280). Higgins shows that LC-IR outperforms other purely statistical models and ascribes this to two facts: because it uses web data, it does not suffer from data sparsity, and it relies on the parallelism assumption, i.e. it ''predicts that similar words will occur in grammatically parallel constructions'' (275). Other models, by contrast, are based either on the idea that similar words occur near the same set of other words (the topicality assumption) or on the idea that words occur near those words that are most similar to them (the proximity assumption). Higgins goes on to discuss the implications of his approach for a theory of lexical semantics and acquisition, arguing, for example, that grammatical parallelism is a cue used by language learners to identify words as semantically similar or synonymous.
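The general idea of an adjacency-based similarity score can be sketched as follows. This is not Higgins's LC-IR formula: the counts come from a toy token list instead of web search hits, and the normalisation (adjacency count in either order divided by the product of the two word frequencies, a PMI-like ratio without the logarithm) is an assumption made for illustration. The names `adjacency_score` and `best_synonym` are hypothetical.

```python
from collections import Counter


def count(tokens):
    """Unigram counts and counts of adjacent word pairs (bigrams)."""
    return Counter(tokens), Counter(zip(tokens, tokens[1:]))


def adjacency_score(x, y, unigrams, bigrams):
    """How often x and y occur next to each other, in either order,
    normalised by their individual frequencies (hypothetical normalisation)."""
    if unigrams[x] == 0 or unigrams[y] == 0:
        return 0.0
    joint = bigrams[(x, y)] + bigrams[(y, x)]
    return joint / (unigrams[x] * unigrams[y])


def best_synonym(target, options, unigrams, bigrams):
    """Answer a multiple-choice synonym question by the highest adjacency score."""
    return max(options, key=lambda o: adjacency_score(target, o, unigrams, bigrams))


# Toy text standing in for web counts.
tokens = ("a tired weary man sat down the weary tired horse stopped "
          "a tired old dog slept").split()
unigrams, bigrams = count(tokens)
print(best_synonym("tired", ["weary", "dog"], unigrams, bigrams))  # weary
```

With real web counts the two lookups would be replaced by hit counts for the single words and for the adjacent pairs; the toy text merely shows why a parallelism-based measure rewards words that occur side by side in the same slot.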

The paper 'Language Production Errors as Evidence for Language Production Processes - the Frankfurt Corpora' (Annette Hohenberger, Eva-Maria Waleschkowski) compares ''slips'' in German Sign Language (DGS) to ''slips'' in spoken German in order to answer the question ''which aspects of language production and monitoring are modality-dependent and which are not'' (287). Using data from a DGS corpus and a corpus of spoken German, as well as experimental data from what they call ''the slip experiment'' to supplement the corpus data, Hohenberger/Waleschkowski show that ''language processing is basically modality independent'' (300), i.e. the fact that there are identical types of slips in DGS and spoken German indicates that ''producing speech and sign proceeds through the same planning stages and involves the same computational vocabulary'' (300). The observed differences in slip-types are argued to be related to differences in information packaging strategies in DGS and spoken German.

The aim of Mary Aizawa Kato and Carlos Mioto's paper 'A Multi-Evidence Study of European and Brazilian Portuguese wh-Questions' is to compare contemporary European Portuguese (EP) and Brazilian Portuguese (BP) wh-questions using equivalent written corpora as well as speakers' intuitions. They then aim to provide a theoretical interpretation of the results, using Lightfoot's Principles and Parameters (PP) model of language change (Lightfoot 1999) as their framework. Their empirical research shows that the patterns licensed in EP and BP overlap, but that there are also differences. Compared to the findings of previous studies, their empirical study revealed two facts: (i) ''spoken EP does not exhibit VS [verb-subject - EG] order in non-cleft questions'' (316) and (ii) ''BP VS order in non-cleft questions is not restricted to unaccusative verbs'' (316). Kato/Mioto's most important theoretical conclusion is that the VS order in EP wh-questions reflects the derivation of thetic sentences in general.

Gerard Kempen and Karin Harbusch ('The Relationship between Grammaticality Ratings and Corpus Frequencies: A Case Study into Word Order Variability in the Midfield of German Clauses') compare the results of a graded grammaticality study on word order in the German Mittelfeld (Keller 2000) with data from two corpora. Keller had found that none of the constraints (C1) Pronominal < Nominal, (C2) Nominative < Non-nominative, and (C3) Dative < Accusative is ''absolute'' in the sense that its violation would give rise to extremely low grammaticality judgments (C1 and C2 were found to have equal strength, whereas C3 was very weak). If such constraints were ''psychologically real'', one might expect the differences in acceptability to be reflected in different corpus frequencies. Kempen/Harbusch, however, found that this is not the case: ''a systematic discrepancy emerged between the frequency counts and the grammaticality ratings'' (330). The argument orderings that were rated average or low were absent from the corpora, i.e. ''the grammaticality judgments tend to be more lenient than the corpus data'' (337). The authors claim that this discrepancy exists because what was rated in Keller's study was actually the discrepancy between the to-be-judged argument ordering and the order(s) licensed by a ''strict production-based linearization rule'', a mechanism which yields equivalent output, i.e. ''the grammaticality ratings appear sensitive to the number and seriousness of violations of the rule'' (342). There seems to be a critical value, the ''production threshold'', which divides the grammaticality continuum: structures with grammaticality values above this threshold occur in corpora with moderate-to-high frequencies, while all other structures have zero or very low frequencies.
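The interplay of weighted constraint violations and a production threshold can be sketched in Python. The sketch encodes Keller's three ordering constraints with hypothetical costs, chosen only to mirror the reported pattern (C1 and C2 equally strong, C3 weak), and a hypothetical threshold on the cost scale. It does not implement Kempen/Harbusch's strict linearization rule, and all numbers are invented.

```python
from dataclasses import dataclass
from itertools import combinations, permutations


@dataclass(frozen=True)
class Arg:
    name: str
    case: str         # "nom", "dat" or "acc"
    pronominal: bool


# Hypothetical violation costs: C1 and C2 equally strong, C3 weak.
WEIGHTS = {"C1": 1.0, "C2": 1.0, "C3": 0.2}
# Hypothetical production threshold on the cost scale.
THRESHOLD = 0.5


def violations(first, second):
    """Constraints violated when `first` precedes `second` in the midfield."""
    found = []
    if second.pronominal and not first.pronominal:
        found.append("C1")  # Pronominal < Nominal
    if second.case == "nom" and first.case != "nom":
        found.append("C2")  # Nominative < Non-nominative
    if first.case == "acc" and second.case == "dat":
        found.append("C3")  # Dative < Accusative
    return found


def cost(order):
    """Summed cost over all argument pairs; combinations() keeps linear order."""
    return sum(
        WEIGHTS[c] for a, b in combinations(order, 2) for c in violations(a, b)
    )


def predicted_in_corpus(order):
    """Below the threshold: expected to be attested; otherwise rare or absent."""
    return cost(order) < THRESHOLD


nom = Arg("der Vater", "nom", False)
dat = Arg("dem Sohn", "dat", False)
acc = Arg("den Ball", "acc", False)

for order in sorted(permutations((nom, dat, acc)), key=cost):
    names = " < ".join(arg.case for arg in order)
    print(f"{names}: cost {cost(order):.1f}, in corpus: {predicted_in_corpus(order)}")
```

With these invented values the six orders of three full NPs receive graded costs (the analogue of graded ratings), but only the two lowest-cost orders fall below the threshold, mirroring the claimed split between gradient judgments and an all-or-nothing corpus presence.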

In 'The Emergence of Productive Non-Medical '-itis': Corpus Evidence and Qualitative Analysis' Anke Lüdeling and Stefan Evert use the German suffix '-itis' to show that the problem of (morphological) productivity can only be understood when different types of evidence, quantitative and qualitative, are combined. Medical '-itis' is rule-based, or categorial, and therefore fully productive; it is used in medical contexts with the meaning 'inflammation (of)', and it is bound and combined with neoclassical elements denoting body parts (e.g. ''Arthritis'' 'inflammation of the joints'). Non-medical '-itis' is similarity-based and difficult to characterise in categorial terms. Its meaning can be described as 'doing too much of X'; Lüdeling/Evert argue that it probably developed from medical '-itis', whose meaning was generalised to 'illness'. Their qualitative analysis of '-itis' shows that there is evidence for two morphological processes with different properties. Lüdeling/Evert then use corpus data to find out (i) whether the two processes differ with respect to productivity (here one would expect the rule-based process to be more productive) and (ii) whether, and how, the productivity of each process changes over time (here one would expect that ''the established medical rule-based use of '-itis' does not change over time, but non-medical '-itis', which is similarity-based and therefore dependent on the stored examples, can show short-term qualitative changes as well as changes in productivity'' (356f)). They apply and discuss different statistical models to test the synchronic and diachronic productivity of both types of '-itis'. The quantitative properties of the two processes, however, do not confirm the two initial hypotheses, which leads Lüdeling/Evert to suggest that ''probably, morphological theory does not need to make a distinction between rule-based and similarity-based processes'' (366).
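The review does not say which statistical models Lüdeling/Evert apply. Purely as an illustration of how productivity can be quantified from corpus counts, the sketch below computes one widely used measure, Baayen's productivity index P: the number of hapax legomena formed with an affix divided by the total number of tokens formed with it. The token lists are invented toy input, not corpus data, and this is not claimed to be one of the authors' models.

```python
from collections import Counter


def baayen_p(tokens):
    """Baayen's P = V1 / N: hapax legomena over all tokens formed with the affix."""
    if not tokens:
        return 0.0
    freq = Counter(tokens)
    hapaxes = sum(1 for n in freq.values() if n == 1)
    return hapaxes / len(tokens)


# Toy token lists, invented for illustration (not corpus data).
medical = ["Arthritis"] * 3 + ["Bronchitis"] * 2 + ["Gastritis"]
non_medical = ["Telefonitis"] * 2 + ["Subventionitis", "Aufschieberitis"]

print(round(baayen_p(medical), 3), round(baayen_p(non_medical), 3))  # 0.167 0.5
```

For a diachronic comparison the same function can be applied to the tokens of each time slice. Because P depends on the sample size N, slices of very different sizes are not directly comparable, which is one reason why productivity studies typically compare several statistical models rather than a single index.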

Wiltrud Mihatsch's paper 'Experimental Data vs. Diachronic Typological Data: Two Types of Evidence for Linguistic Relativity' explores the interaction of perceptual and typological factors in lexical change, comparing diachronic data (from a database containing paths of lexical change in the domain of body parts in a sample of over 30 languages) with experimental data from the psycholinguistic literature. Lucy (1992) and Imai/Gentner (1997) had found that ''the number marking system may influence the categorisation of entities that are ambiguous between a classification according to shape and one according to substance with respect to their shape'' (373): speakers of languages with obligatory number marking (e.g. English) tend to classify such entities according to shape, whereas speakers of languages without obligatory number marking (e.g. Japanese) tend to classify them according to material. Presupposing that ''lexical change reflects fossilized categorization processes'' (375), i.e. that new concepts are always conceptualised via existing labels for other concepts and that some of these new concepts get lexicalised, Mihatsch examines whether the concepts EYEBALL, EYELID, EYEBROW, and EYELASH, whose names tend to be less stable and change over time (in contrast to e.g. HAIR, EYE, or SKIN), are conceptualised according to substance or according to shape in different languages. EYEBALL is virtually always named on the basis of round objects, whereas for EYELID, EYEBROW, and EYELASH there are different naming strategies. EYEBROW and EYELASH, for example, can be conceptualised on the basis of HAIR or WOOL, i.e. in terms of material (mostly in languages without obligatory plural marking), but also via their elongated, arc-like shape (in languages with obligatory plural marking).
The results indicate a very strong interaction between noun type and conceptualisation, and therefore, according to Mihatsch, point towards ''a moderate version of linguistic relativity'' (381).

In 'Reflexives and Pronouns in Picture Noun Phrases: Using Eye Movements as a Source of Linguistic Evidence' Jeffrey T. Runner, Rachel S. Sussman, and Michael K. Tanenhaus first show that native speaker judgments on binding in picture NPs, i.e. noun phrases headed by a ''representational'' noun such as 'photograph', 'picture', or 'film', are not reliable. Reflexives in picture NPs lacking a possessor may violate Binding Theory (BT) (e.g. ''John knows that there is a picture of himself in the morning paper''). These reflexives have been called logophors (cf. Reinhart/Reuland 1993), i.e. ''reflexive noun phrases which are not ... subject to structural Binding Theory, but rather are constrained at least in part by discourse variables'' (395). Picture NPs with possessors appear to show the complementary distribution predicted by BT, but two studies by Keller and Asudeh (2001) have shown that native speakers equally accepted reflexives and pronouns bound to the subject of the sentence in examples like ''Hanna found Peter's picture of herself/her''. The three authors then present the results of an experiment that investigated the use of reflexives and pronouns in possessed picture NPs. Participants worked with a display and three dolls, Ken, Harry, and Joe, each of which had three pictures: one of himself and one of each of the others. They were then presented with potentially ambiguous instructions like ''Have Joe touch Ken's picture of himself'', so that their choice of target amounted to a kind of judgment: ''If a participant choose a picture indicating a particular reading, this means that reading is acceptable or possible.'' (398) In addition to target choice, the participants' eye movements were monitored to determine which potential referents they were considering.
The authors found that ''pronouns in picture NPs with possessors are constrained by Binding Theory and that reflexives are not'' (403), and that ''instead these reflexives behave like logophors'' (404). Runner et al. furthermore show that BT ''cannot be viewed as an early filter that constrains the set of potential referents'' (408) as BT-inappropriate referents were considered early on in the processing for both reflexives and pronouns. They conclude with two more general implications of their study: (i) reflexives in picture NPs should all be treated as logophors, and (ii) their experiment could serve as an example for other studies that aim at complementing introspective data with psycholinguistic evidence.

Uli Sauerland, Jan Anderssen, and Kazuko Yatsushiro ('The Plural is Semantically Unmarked') first show that the 'Strong Theory' of the plural, according to which the plural implies a cardinality greater than one and is marked, does not hold, and that there are many cases where ''the plural does not mean the same as explicitly adding 'two or more''' (414) (consider for example ''You're welcome to bring your children'' vs. ''You're welcome to bring your two or more children''). Using evidence from adult competence and from adult and child performance, the authors instead argue for a 'Weak Theory' of the plural, which ''is characterized by the assumption that the plural is not subject to an inherent lexical restriction as the singular is'' (429). According to Sauerland et al., the plural is rather subject to pragmatic comparison with the singular, and therefore cannot be used in most cases where the singular is possible. Their findings, according to the authors, imply (i) that ''semantic and morphological markedness need to be distinguished'' (430), and (ii) ''that the interpretation of the plural always involves an implicit comparison'' (430).

Tanja Schmid, Markus Bader, and Joseph Bayer present the results of a questionnaire-based experiment that compared German infinitival non-coherent constructions, where the infinitival complement forms an independent constituent which may be extraposed (e.g. ''... dass Maria prahlt, alle Verwandten zu kennen'' 'that Maria boasts of knowing all the relatives'), with coherent constructions, where the infinitival complement does not form an independent constituent (e.g. ''*... dass Maria scheint, alle Verwandten zu kennen'' 'that Maria seems to know all the relatives'). Their paper 'Coherence - an Experimental Approach' addresses the questions (i) whether experimental evidence confirms the validity of the (non-)coherence tests and the verb class differences proposed in the literature, and (ii) which factors give rise to coherence. Four constructions - topicalisation of the verbal complex, 'long' scrambling of a pronoun, 'long-distance' passive, and wide scope of negation - were used as tests for coherence; two configurations - extraposition of the infinitival complement, and narrow scope of negation - were used to test non-coherence. The intraposed construction, which is assumed to be structurally ambiguous (''... dass Max mir [nur das Lexikon zu kaufen] empfohlen hat'' vs. ''... dass Max mir nur das Lexikon [zu kaufen empfohlen hat]'', both 'that Max recommended to me to buy only the dictionary'), was also tested. Schmid et al. report the following findings: (i) their coherence tests can be considered valid, as the results of the different tests correlate significantly, (ii) the ambiguous intraposed construction patterns with the coherence tests, and (iii) there is evidence that verbs within a given class behave similarly.

In his paper 'Thinking About What We Are Asking Speakers to Do' Carson T. Schütze argues that it is important to evaluate the status and quality of the various types of linguistic evidence. Specifically, he asks whether data obtained from ''naive'' speakers are reliable, i.e. ''whether we are asking them to do things that they can understand and are capable of doing, and whether we can be confident that they are actually doing what we have asked of them'' (457). Schütze examines a number of case studies in detail and finds that experiments that ''address our questions of interest ... directly'' (477), i.e. experiments in which the linguist has a particular hypothesis in mind, are especially prone to yielding questionable results. He shows that such ''bad'' results can have various causes: in one case the instructions to participants were unclear and inconsistent; in others the researchers did not take into account that certain ''scenarios'' evoked by their elicitation tests could influence the results, or failed to see that factors other than the ones being tested influenced the participants' answers. Schütze argues that these shortcomings can be overcome by sticking ''as closely as possible to the ways in which language is actually used for everyday purposes, rather than contriving artificial unfamiliar tasks'' (477), and that experiments designed to obtain direct information about underlying linguistic knowledge need to be improved.

The question Augustin Speyer pursues in his paper 'A Prosodic Factor for the Decline of Topicalisation in English' is whether there is a connection between the loss of the verb-second constraint (V2) and the decline of topicalisation, i.e. ''the movement of a non-subject constituent to the left edge of a sentence'' (487), as in ''Beans, John likes''. Both developments occurred at about the same time in the history of English (starting between 1150 and 1250). The fact that pronouns behave differently from full NPs (the use of pronouns in topicalised sentences remains stable after a sharp drop after 1250, whereas the use of full NPs declines gradually) suggests, according to Speyer, ''that the connection might have something to do with one of the properties that pronouns have, but not full noun phrases, or vice versa'' (490). Speyer then discusses the pragmatic and prosodic properties of topicalised sentences and introduces a constraint which he thinks might have caused the decline of topicalisation, the 'Trochaic Requirement' (TR), which states that ''some weak element ... between two accents is compulsory'' (494). In German topicalised constructions this constraint is naturally fulfilled due to V2 (''Bohnen hasst Maria'' 'Beans, Maria hates'), but speakers of Present Day English have to either (i) insert an empty timing slot (after 'beans' in ''Beans, John likes''), ''thus creating a dummy weak element'' (496), or (ii) avoid topicalised constructions. Speyer argues that the TR constraint also held in earlier stages of English. As V2 word order became increasingly marked during the Middle English period and was therefore used less and less, speakers avoided ''accent clash'' by avoiding topicalised constructions, and the rate of topicalisation decreased. This account is supported by the fact that pronouns, which are naturally weak elements, do not seem to be affected by the avoidance of topicalisation.

Three different analyses of coordination have been proposed. The ''deletion analysis'' (cf. e.g. Chomsky 1957) assumes that conjuncts are derived via a deletion mechanism, e.g. ''[The man is carrying the ladder] and [THE MAN IS CARRYING the bucket]'' (capitals indicate deleted material). In the ''phrasal analysis'', ''coordinate phrases ... are base-generated directly by phrase structure rules'' (507), which results either in multi-headed constructions (cf. e.g. Jackendoff 1977) or in analyses that treat conjunctions as heads (cf. e.g. Kayne 1994). The ''node-sharing analysis'' allows for three-dimensional syntactic structures in which single nodes are shared by more than one phrase marker (cf. e.g. Moltmann 1992). Using data from two comprehension studies in agrammatism and from reading-time experiments, Ilona Steiner ('On the Syntax of DP Coordination: Combining Evidence from Reading-Time Studies and Agrammatic Comprehension') aims to determine which of the three analyses is most plausible. The results of the two comprehension studies in agrammatism allowed her to discard the deletion approach; the reading-time data provided evidence for the node-sharing analysis and allowed her to distinguish between the phrasal analysis and the node-sharing analysis. Taken together, both types of evidence indicate that the node-sharing analysis is the most plausible.

The paper 'Lexical Statistics and Lexical Processing: Semantic Density, Information Complexity, Sex, and Irregularity in Dutch' by Wieke M. Tabak, Robert Schreuder, and R. Harald Baayen combines a survey of the distributional properties of regular and irregular Dutch verbs with an experimental lexical decision study, which addressed whether these properties predict lexical processing in reading. The authors identified various factors that predict whether a verb is regular, e.g. lemma frequency, family size, neighbourhood density, argument structures, auxiliaries, inflectional entropy, noun-verb frequency ratio, and spoken-written frequency ratio. To test whether these systematic differences between regular and irregular verbs are reflected in on-line processing, the authors conducted a lexical decision study whose results challenge many previous hypotheses about regular vs. irregular verbs. Tabak et al. found, for example, that both error analysis and response latencies pointed to a processing advantage for regulars. In both analyses, this advantage was most prominent for past tense forms. This finding challenges Pinker's model (1991, 1997), which predicts that ''regulars should be more difficult to process than irregulars, because regulars would require decomposition into stem and affix in addition to lexical lookup, and therefore should elicit longer instead of shorter latencies'' (550). The more general picture that, according to Tabak et al., emerged from the study is ''that the distinction between regular and irregular verbs is not a simple one. Regulars and irregulars differ not only with respect to their formal properties, but also with respect to their semantic properties and the information structure of their inflectional paradigms'' (552). The authors conclude that ''the fascinating and enigmatic phenomenon of regularity and irregularity in the mental lexicon'' (552) requires further investigation.

In his paper 'The Double Competence Hypothesis: Diachronic Evidence' Helmut Weiß shows how the ''writing competence'' that underlies the production of historical texts (which are performance data) can be modelled by combining two independently developed approaches to theoretical and historical linguistics. The first is the double competence hypothesis (cf. e.g. Kroch 2001), which assumes that the competence underlying writing (''second order natural languages'' (N2)) is different from the competence underlying speaking (''first order natural languages'' (N1)), since (i) it is acquired later and independently of the latter, and (ii) it is functionally different. The second is the hypothesis that languages show different grades of naturalness (cf. e.g. Ferguson 1959), which assumes that in a monolingual speech community the low variety (often a dialect) is acquired as the native language and spoken in everyday communication, whereas the high variety is learned as a second, non-native language and used only in writing and formal communication. In the 14th and 15th centuries, when NHG (New High German) started to evolve, the distance between these two competences was still very great, whereas in the 19th and 20th centuries, when NHG first came to be spoken and was acquired as a native language, the distance began to decrease. Weiß shows that the ''mixed language'' characteristic of OHG (Old High German) texts is a consequence of a diglossic double competence, and ''that a historical syntactic pattern can be analysed in three ways: as the output of (i) the N1 competence, (ii) the N2 competence, or (iii) as a hybrid form'' (570). He concludes by claiming that, in modern historical linguistics, combining quantitative and theoretical tools is ''the right and only way to overcome the weaknesses of diachronic data in general and the consequences of double competence'' (571).

EVALUATION

Most papers in the volume 'Linguistic Evidence' address issues of linguistic evidence in relation to specific linguistic problems, using and combining various data types (experimental data and corpus data are perhaps the most frequently used here). The volume shows that the question of how to gain linguistic evidence is (or should be) important for all linguists, and that linguists can only benefit from using more than one data type. Evidence involving more than one type of data provides a different and more comprehensive perspective on a given linguistic phenomenon, whether it confirms one's hypothesis or contradicts it. Only a few papers explicitly address methodological and theoretical questions concerning linguistic evidence (e.g. Featherston, Kempen/Harbusch, and Schütze), but as linguistic evidence is quite a new topic of linguistic discussion, it may well be hoped that more work on the theory and methodology of linguistic evidence will appear in the near future.

REFERENCES

Chomsky, Noam (1957) Syntactic Structures. The Hague: Mouton.

Ferguson, Charles (1959) 'Diglossia'. In: Word 15, 325-340.

Fillmore, Charles J. (1992) '''Corpus Linguistics'' or ''computer-aided armchair linguistics'''. In: Svartvik, Jan (ed.) Directions in Corpus Linguistics: Proceedings of Nobel Symposium 82, Stockholm, 4-8 August 1991. Berlin: de Gruyter, 35-60.

Haegeman, Liliane (1995) The Syntax of Negation [=Cambridge Studies in Linguistics 75]. Cambridge: Cambridge University Press.

Imai, Mutsumi; Gentner, Deirdre (1997) 'A Cross-Linguistic Study of Early Word Meaning: Universal Ontology and Linguistic Influence'. In: Cognition 62, 169-200.

Jackendoff, Ray (1977) X' Syntax. Cambridge, MA: MIT Press.

Kayne, Richard (1994) The Antisymmetry of Syntax. Cambridge, MA: MIT Press.

Keller, Frank (2000) Gradience in grammar: Experimental and computational aspects of degrees of grammaticality. Ph.D. thesis. University of Edinburgh.

Keller, Frank; Asudeh, Ash (2001) 'Constraints on linguistic coreference: Structural vs. pragmatic factors'. In: Moore, J. D./Stenning, K. (eds.) Proceedings of the 23rd Annual Conference of the Cognitive Science Society. Mahwah, NJ: Lawrence Erlbaum.

Kroch, Anthony S. (2001) 'Syntactic Change'. In: Baltin, Mark/Collins, Chris (eds.) The Handbook of Contemporary Syntactic Theory. Oxford: Blackwell, 699-729.

Lightfoot, David (1999) The Development of Language: acquisition, change and evolution. Oxford: Blackwell.

Lucy, John A. (1992) Grammatical Categories and Cognition: A Case Study of the Linguistic Relativity Hypothesis. [Studies in the social and cultural foundations of language 13]. Cambridge: Cambridge University Press.

Moltmann, Friederike (1992) Coordination and Comparatives. Ph.D. thesis. MIT.

Piñango, Maria; Zurif, Edgar; Jackendoff, Ray (1999) 'Real-time processing implications at the syntax-semantics interface'. In: Journal of Psycholinguistic Research 28 (4), 395-414.

Pinker, Stephen (1991) 'Rules of language'. In: Science 253, 530-535.

Pinker, Stephen (1997) 'Words and rules in the human brain'. In: Nature 387, 547-548.

Reinhart, Tanya; Reuland, Eric (1993) 'Reflexivity'. In: Linguistic Inquiry 24, 657-720.

Schütze, Carson T. (1996) The Empirical Basis of Linguistics: Grammaticality Judgments and Linguistic Methodology. Chicago: University of Chicago Press.

Sprouse, Rex; Vance, Barbara (1999) 'An explanation for the decline of null pronouns in certain Germanic and Romance languages'. In: DeGraff, Michael (ed.). Language Creation and Language Change: Creolization, Diachrony and Development. Cambridge, MA: MIT Press, 257-284.

Todorova, Marina; Straub, Kathleen; Badecker, William; Frank, Robert (2000) 'Aspectual coercion and the on-line computation of sentential aspect'. In: Proceedings of the twenty-second annual conference of the Cognitive Science Society. Philadelphia, PA.

ABOUT THE REVIEWER

Elke Gehweiler is a research associate in the project Collocations in the German Language at the Berlin-Brandenburgische Akademie der Wissenschaften, Berlin, Germany, and in a project on grammaticalization at the Freie Universität Berlin, where she is currently preparing her Ph.D. thesis on the grammaticalization of adjectives in English and German.