Editor for this issue: Jody Huellmantel <jody@linguistlist.org>
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
*****************************************************************
CALL FOR PAPERS
*****************************************************************

NATURAL LANGUAGE PROCESSING AND CORPUS LINGUISTICS
New corpora, new practices and new concepts

Special issue of the French journal TAL
edited by Beatrice Daille (IRIN, Nantes) and Laurent Romary (LORIA, Nancy)
*****************************************************************

SCOPE

Because of its corpus-based, empirical approach and its post-hoc formalisation, corpus linguistics has often been viewed as the antithesis of computational linguistics. Over the years, it has developed its own tools and methods for identifying certain linguistic constructs that are central to many areas of applied linguistics, such as language learning and translation. Today the development of the Internet, the availability of many publications and much documentation in electronic format, and the increased power of computational tools promote a renewed use of corpora at all levels of linguistic theorising.

In this context, and going beyond a simple listing of technical results, it seems important to take stock of the methodological and conceptual progress made in corpus linguistics and to identify precisely the role NLP has played in that progress.

TOPICS (NON-EXHAUSTIVE)

In this special issue, we wish to publish either innovative papers or synthesis and prospective articles bearing on the following topics:

- Corpora and linguistic theorising: Which new theories have arisen or are arising from corpus work? How do formal linguistic theories account for the linguistic constructs found in corpora, e.g. collocations and idioms?
- Corpus building: criteria, selection constraints and organisation for which linguistic study? How can we measure the representativeness of a corpus with respect to a given linguistic construct?
- Methods and techniques: Which methods and techniques can we use for analysis (concordances, statistics, annotations)?
- Domains and applications of corpus linguistics: field linguistics, teaching, learning of stochastic models, semantic information retrieval, extraction of mono- or multilingual lexica, discourse analysis, translation.
- Infrastructure, tools and availability: Which representation standards, and for what purpose? Platforms providing access to large corpora. Analysis tools.

FORMAT

Authors are strongly encouraged to use LaTeX2e and the HERMES style files, available at <http://www.editions-hermes.fr/> (section "Auteurs").

LANGUAGE

Articles can be written in French or in English. Articles written in English are accepted only from non-French-speaking authors.

DEADLINES

The deadline for submission is 1 February 2001. A notification of intention to submit should be sent to Beatrice Daille (Beatrice.Daille@irin.univ-nantes.fr) or Laurent Romary (Laurent.Romary@loria.fr) before 1 December 2000.

The articles will be refereed by one member of the TAL editorial board and two members of the editorial committee created by the editors specifically for this special issue. The decision of the editorial board will be communicated to the authors before 1 April 2001. The final version of the accepted papers is due on 1 May, for publication scheduled for Autumn 2001.

SUBMISSIONS

The articles must be submitted either electronically to Isabelle.Blanchard@loria.fr or as hardcopy (three copies) to:

Isabelle Blanchard
Bâtiment LORIA-CNRS
B.P. 239
F-54506 Vandoeuvre Les Nancy Cedex
FRANCE

EDITORIAL COMMITTEE (Preliminary)

Claire Blanche-Benveniste, Didier Bourigault, Lynne Bowker, Etienne Brunet, Lou Burnard, Jean Carletta, Dan Cristea, Alexander Geyken, Gaston Gross, Benoit Habert, Nancy Ide, Judith Klavans, Martine Mazaudon, Elena Paskaleva, Jennifer Pearson, Marie-Paule Péry-Woodley, Jean-Marie Pierrel, François Rastier, André Salem, Anatole Shaikevich, Gary Simons, John Sinclair, Wolfgang Teubert, Jean Véronis, Dusko Vitas.

JOURNAL T.A.L. (http://www.atala.org/tal/)

The international journal Traitement Automatique des Langues (TAL) has been published since 1969 by the French Association pour le traitement automatique des langues (ATALA, http://www.atala.org) with the support of the Centre national de la recherche scientifique (CNRS). TAL covers all fields of computational linguistics; its aim is to provide mainly (but not only) French-speaking researchers and students with publications in all domains of the field. It appears three times a year and is distributed by HERMES.

T.A.L.
EDITORIAL BOARD

Pierrette Bouillon (ISSCO, Genève)
Philippe Blache (CNRS, Aix-en-Provence) -- Chief Editor
Danièle Clément (Bergische Universität Wuppertal)
Christophe d'Alessandro (LIMSI-CNRS, Orsay) -- Chief Editor
Anne Condamines (CNRS, Toulouse le Mirail)
Claire Gardent (Universität des Saarlandes) -- Chief Editor
Marc El-Bèze (Université d'Avignon)
Jean-Louis Lebrave (CNRS, Paris)
Piet Mertens (Katholieke Universiteit Leuven)
Évelyne Tzoukermann (Bell Labs)
Bernard Victorri (ENS, Paris)
Pierre Zweigenbaum (AP-HP, Paris 6)
..............................................
Call for Contributions
..............................................

Languages of the World (LW)

Languages of the World is a booklet series for STUDIES ON GRAMMATICAL ISSUES, LANGUAGE TYPOLOGY, and the results of LINGUISTIC FIELD RESEARCH. The first ten issues were published in journal form. From October 2000 on, each issue focuses on a single topic (32-150 pp.) and is available as a separate booklet.

Proposals should be sent to: Ulrich Lueders (ed.), LINCOM EUROPA, Freibadstr. 3, D-81543 Muenchen (FAX +49 89 62269404).

The following issues are available now:

LW12: A Conceptual Analysis of Tongan Spatial Nouns: From Grammar to Mind
Giovanni Bennardo
University of Illinois at Urbana-Champaign

In Churchward (1953) a set of Tongan nouns is labeled 'local', that is, "construed as if it were the proper name of a place" (p. 88). Some of these nouns reappear under another label, 'preposed' nouns (pp. 214-16), defined as nouns that can be "placed immediately before another noun instead of being connected with it by means of a preposition" (p. 214). Broschart (1993) exploited this peculiarity to argue that a subset of these nouns should be considered classifiers. In this work the author tries to clarify the borders of this fuzzy subset of Tongan nouns, which Churchward and Broschart treat differently. The analysis of this newly defined subset, 'spatial' nouns, is conceptual, that is, based on a set of primitive (and possibly universal) spatial concepts suggested by Lehman & Bennardo (1992) and Bennardo (1996). The conceptual apparatus is the result of extensive analyses of both English and Tongan spatial prepositions. Further analyses addressed representations of spatial relationships in other languages such as Burmese, Thai and Italian. Following Lucy's suggestion, grammatical features of the Tongan language represent the path along which the conceptual analysis moves.
In fact, five structural contexts in which the 'spatial' nouns appear are the starting point of the analysis. The analysis weaves through the grammatical and conceptual levels and ends by sorting the nouns into three separate groups according to a combination of their conceptual content and grammatical possibilities. Finally, the results of this analysis call for an interesting modification of the conceptual apparatus.

ISBN 3 89586 917 1. Languages of the World 12. 34 pp. USD 9.50 / DM 17 / £ 5.60.

LW13: The Lord's Prayer in Erromangan: Literacy and Translation in a Vanuatu Language
Terry Crowley
University of Waikato

Erromangan, an Oceanic language of southern Vanuatu, has a written literature that until recently was restricted exclusively to materials relating to recently introduced Christianity. This literature is entirely translated, with the materials written by European missionaries in the late nineteenth and early twentieth centuries. In many respects, these translations are structurally deviant to the point where intelligibility is sometimes impaired. According to some scenarios, massive population loss and major language shift on the island in the second half of the nineteenth century should have predisposed this language to massive simplification and homogenisation in the direction of English, especially where literacy and Christianisation are involved. However, the remaining Erromangan language has remained vital, structurally complex and largely intact, demonstrating that the linguistic disruption posed by missionary-inspired literacy is nothing like as powerful as some have suggested.

ISBN 3 89586 973 2. Languages of the World 13. 24 pp. USD 10 / DM 19 / £ 6.

LW15: Ket Prosodic Phonology
Edward J. Vajda
Western Washington University

The present study proposes a complete inventory of the segmental and suprasegmental phonemic units for the southern dialect of Ket, a language isolate spoken in Central Siberia.
It argues that Ket contains a contrastive system of tones operating within the domain of the phonological word rather than the syllable. This word tone system consists of four tonemes, two of which have disyllabic and monosyllabic allotones. Tone in Ket serves to delimit one word from another by marking the leftmost two syllables of each phonological word with one of four contrastive combinations of melodic features (height and contour) and non-melodic features (vowel length and glottalization). In addition, the four tonemes distinguish meaning by forming numerous minimal pairs. The article describes Ket segmental phonology as containing only 12 consonant and 7 vowel phonemes. Many contrasts which previous researchers treated as phonemic (such as the difference between tense and lax mid vowels, and between plosives and fricatives in word-final position) turn out to be allophonic when prosodic data are considered.

ISBN 3 89586 915 5. Languages of the World 15. 36 pp. USD 9.50 / DM 17 / £ 5.60.

LW17: Reduplication in Tiriyó (Cariban)
Sergio Meira
Max-Planck-Institut für Psycholinguistik

This study presents original data illustrating previously undescribed reduplicative patterns found in Tiriyó, a Cariban language spoken in Northern Brazil; this is the first time that reduplication in a Cariban language has been described in detail. One of the patterns is simpler, and its synchronic cases of variation suggest a certain path of historical evolution. For the other pattern, the complexity of the several subcases appears to indicate antiquity and makes formal accounts significantly more difficult.

ISBN 3 89586 914 7. Languages of the World 17. 26 pp. USD 10 / DM 20 / £ 7.

LW18: Basic Word Order and Sentence Types in Kari'ña
Andrés Romero-Figueroa
Universidad de Oriente, Cumaná, Venezuela

The purpose of this research is to study the basic syntactic order in Kari'ña through the analysis of an integrated corpus encompassing simple sentences taken from conversations and texts elicited from native speakers.
The fieldwork sessions for this work were carried out between January and September of 1996 in Cachama, a village located in the heart of the Kari'ña homeland in northeastern Venezuela. This study covers the primary syntactic elements, i.e. Subject, Verb and Object. Some consideration is also given to other sentential elements of this language, in particular obliques and adverbials. Finally, a survey of some sentence types in Kari'ña is included. In general, the study seeks to determine the prevailing syntactic order in Kari'ña, and to account for the most common arrangements for quotative, intransitive, transitive, ditransitive, copulative, imperative, interrogative and negative sentences.

ISBN 3 89586 686 5. Languages of the World 18. 30 pp. USD 11.00 / DM 20 / £ 6.50.

LW20: The Loss of German in Upper Silesia after 1945
Volkmar Engerer
Statsbiblioteket Aarhus

The first part of the study gives an overview of Upper Silesia and the numerous historical language shifts in this area. With at least five language shifts and three phases of complete language loss, Upper Silesia constitutes quite an illustrative case of language loss and maintenance in a region. Part two presents a conceptualisation of language shift. Two approaches to language shift are then developed, the processual and the correlative. The latter emphasises the competence dimension, divided into an analysis of one language only, German, and an analysis of languages as components of multilingual profiles. Part three presents examples of analyses of isolated German, using the correlative approach. The results in both domains show that German is tied to an urban milieu and has a dominant function as a professional language with high prestige. Part four demonstrates the use of multilingual profiles, now from a processual perspective. The analyses show a clear consolidation of Polish, with an as yet undecided competition between Upper Silesian and German as second languages.
The tendency in the direction of the trilingual profile German/Polish/Upper Silesian seems to have a future if the domains of use stabilise.

ISBN 3 89586 663 6. Languages of the World 20. Ca. 24 pp. USD 9 / DM 18 / £ 6.

In preparation:

LW21: The properties of certain classes of indirect verbs and passives of state in modern Georgian
Marcello Cherchi
The University of Chicago

Indirect constructions in Georgian have been discussed in the literature with respect to several types of verbs. When a particular construction is identified as "indirect" (or "inverse"), the investigator generally invokes a line of argumentation which relies upon comparison with a putatively similar predicate or predicate type in an Indo-European language. Our personal feeling is that for the purposes of linguistic analysis it is more productive to view the so-called "indirect" verbs as basic, rather than derived, structural types within Georgian grammar. However, in the present paper we would like to avoid becoming enmeshed in that dispute by starting from a different analytical perspective. Specifically, we will attempt to delimit a class of verbs based on a formal definition and examine the characteristics of the members of that class. It will turn out that the majority of the verbs involved have been classified as "indirect" by one investigator or another, but we would prefer to view that as a secondary, though certainly interesting, result. The more important result is the significance of this sort of analysis for classification within the Georgian verbal system. In particular, it supports positing a class that includes two types of verbs which other investigators have generally partitioned into two distinct classes.

ISBN 3 89586 919 8. Languages of the World 21. 24 pp. USD 9.50 / DM 17 / £ 5.60.

LW24: A Priori Artificial Languages
Alan Libert
University of Newcastle

The best known artificial language is Esperanto.
However, hundreds of other artificial languages have been proposed, although some have not progressed beyond the stage of sketches and few have seen much actual use. Those which are not consciously based on natural languages are called a priori languages. Such languages have been less successful than artificial languages built with elements of natural languages, such as Esperanto and Interlingua. However, a priori languages are of considerable theoretical interest, in particular from the point of view of language universals: if a universal property holds even of languages created "from scratch", then it can indeed be seen as a property of any (usable) human language. Therefore, in the description of the grammars of several a priori languages, particular attention will be given to whether their features are in accord with proposed universals, of both the Greenbergian and Chomskyan types. After an introduction, one chapter each will be devoted to phonetics/phonology, writing systems, lexicon, morphology, syntax, and semantics. The languages described include aUI, Babm, Fitusa, Loglan/Lojban, and Suma. Most of these languages have received very little attention, even from scholars studying artificial languages.

ISBN 3 89586 667 9. Languages of the World 24. DM 68 / USD 44 / £ 25. 2001/I.