Editor for this issue: Renee Galvis <renee@linguistlist.org>
********************** NEW DEADLINE: 25th of February 2002 **************

LREC 2002 Workshop on
LINGUISTIC KNOWLEDGE ACQUISITION AND REPRESENTATION:
BOOTSTRAPPING ANNOTATED LANGUAGE DATA

Las Palmas, Canary Islands, Spain
2nd June 2002

_____________________________

MOTIVATION AND AIMS

Provision of large-scale labelled language resources, such as tagged corpora or repositories of pre-classified text documents, is a crucial key to steady progress in an extremely wide spectrum of research, technological and business areas in the HLT sector. The continuously changing demands for language-specific and application-dependent annotated data (e.g. at the syntactic or at the semantic level), indispensable for design validation and efficient software prototyping, however, are daily confronted by the labelled-data bottleneck. Hand-crafted resources are often too costly and time-consuming to be produced at a sustainable pace, and, in some cases, they even exceed the limits of human conscious awareness and descriptive capability.

Possible ways to circumvent, or at least minimise, this problem come from the literature on automatic knowledge acquisition and, more generally, from the machine-learning community. Annotated data are bootstrapped by training a machine-learning classifier with a small sample of pre-annotated data and by using the induced classifier to annotate more data. Co-learning provides an alternative methodology, which essentially consists in iterative cooperation of two or more independent learning systems. Another promising route consists in automatically tracking down recurrent knowledge patterns in unstructured or implicit information sources (such as free texts or machine readable dictionaries) for this information to be moulded into explicit representation structures (e.g. subcategorisation frames, syntactic-semantic templates, ontology hierarchies etc.).
We believe that all these attempts at bootstrapping labelled data are not only of practical interest (for continuous updating, management and validation of dynamic resources), but also point to a number of germane theoretical issues. In particular, the workshop intends to focus on the issue of interaction between techniques for inducing structured knowledge from raw data and formal methods of linguistic knowledge representation. Gaining insights into this issue is an essential requirement for explaining the effective use of linguistic knowledge by cognitive agents. Although the cognitive and engineering views of the form and acquisition of linguistic knowledge need not be related, data from neuroscience and psychology are indeed relevant when evaluating different ways of representing information in artificial systems, and different models for linguistic knowledge acquisition. We encourage in-depth analysis of underlying assumptions of the proposed bootstrapping methods and discussion of possible relevant connections with existing annotation and representation schemes. This investigation is likely to have significant repercussions on the way linguistic resources will be designed, developed and used for applications in the years to come. As the two aspects of knowledge representation and acquisition are profoundly interrelated, progress on both fronts can only be achieved, in our view, through a full appreciation of this deep interdependency.
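
The bootstrapping loop mentioned in the motivation above (train a classifier on a small pre-annotated sample, then use it to annotate more data) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the workshop call: the suffix-vote classifier, the function names, the confidence threshold of 0.8 and the toy word list are all hypothetical.

```python
from collections import Counter, defaultdict

def features(word):
    """Hypothetical features: the last one, two and three characters."""
    return [word[-n:] for n in (1, 2, 3) if len(word) >= n]

def train(labelled):
    """Count how often each suffix feature co-occurs with each label."""
    counts = defaultdict(Counter)
    for word, label in labelled:
        for feat in features(word):
            counts[feat][label] += 1
    return counts

def predict(model, word):
    """Return (label, confidence); confidence is the winner's share of votes."""
    votes = Counter()
    for feat in features(word):
        votes.update(model.get(feat, {}))
    if not votes:
        return None, 0.0
    label, top = votes.most_common(1)[0]
    return label, top / sum(votes.values())

def bootstrap(seed, unlabelled, threshold=0.8, max_rounds=5):
    """Self-training: retrain, then move confidently labelled items to the seed."""
    labelled = list(seed)
    pool = list(unlabelled)
    for _ in range(max_rounds):
        model = train(labelled)
        confident, rest = [], []
        for word in pool:
            label, conf = predict(model, word)
            if conf >= threshold:
                confident.append((word, label))
            else:
                rest.append(word)
        if not confident:
            break  # nothing new can be annotated with enough confidence
        labelled.extend(confident)
        pool = rest
    return labelled, pool

# Toy seed sample and unannotated pool (illustrative data only).
seed = [("walking", "VERB"), ("running", "VERB"),
        ("quickly", "ADV"), ("slowly", "ADV")]
annotated, remaining = bootstrap(seed, ["jumping", "happily", "xyz"])
print(annotated)
print(remaining)
```

Items the classifier cannot label confidently stay in the pool, which is one simple way to keep early errors from being fed back into training; a real system would need a far richer model and an evaluation step.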
TOPICS OF INTEREST

Possible themes for contributions are:

* development of 'data-driven' annotation/representation schemes
* dynamic update, customisation and tuning of labelled resources through acquired data
* 'hybrid models' of linguistic knowledge extraction, whereby machine learning methods are integrated with formal structures of knowledge representation
* incremental linguistic knowledge-bases
* formal representation and structuring of information flow automatically acquired from texts
* knowledge acquisition and linguistic resources lifecycle
* linguistic knowledge acquisition and representation in cognitive tasks

NEW!!! IMPORTANT DATES NEW!!!

Deadline for workshop abstract submission: 25th of February 2002
Notification of acceptance: 20th of March 2002
Final version of paper for workshop proceedings: 20th of April 2002
Workshop: 2nd June 2002 (afternoon session)

SUBMISSIONS

The organizers welcome contributions describing existing research related to the topics of the workshop. Each presentation will be 25 minutes long (20 minutes for presentation and 5 minutes for questions and discussion).

Submissions should include: title; author(s); affiliation(s); and contact author's e-mail address, postal address, telephone and fax numbers. Abstracts (maximum 500 words, plain-text format) must be sent to:

simo@ilc.pi.cnr.it

The final version of the accepted papers should not be longer than 4,000 words or 10 A4 pages. Instructions for formatting and presentation of the final version will be sent to authors upon notification of acceptance.

ORGANISING COMMITTEE

Alessandro Lenci (Università di Pisa, Italy)
Simonetta Montemagni (Istituto di Linguistica Computazionale - CNR, Italy)
Vito Pirrelli (Istituto di Linguistica Computazionale - CNR, Italy)

PROGRAM COMMITTEE

Harald Baayen (Max Planck Institute for Psycholinguistics - Nijmegen)
Rens Bod (University of Amsterdam, Holland)
Michael R. Brent (Washington University, USA)
Nicoletta Calzolari (Istituto di Linguistica Computazionale - CNR, Italy)
Jean-Pierre Chanod (Xerox Research Centre Europe, Grenoble, France)
Walter Daelemans (University of Antwerp, Belgium)
Dekang Lin (University of Alberta, Edmonton, Canada)
Horacio Rodriguez (Universidad Politecnica de Catalunya)
Fabrizio Sebastiani (Istituto per l'Elaborazione dell'Informazione - CNR, Italy)
Lucy Vanderwende (Microsoft Research, Redmond, USA)
Francois Yvon (Ecole Nationale Superieure des Telecommunications, Paris, France)
Menno van Zaanen (University of Amsterdam, The Netherlands)

CONTACT PERSON

Simonetta Montemagni
Istituto di Linguistica Computazionale (ILC) - CNR
Area della Ricerca di Pisa
Via Moruzzi 1, 56124 Pisa, ITALY
e-mail: simo@ilc.pi.cnr.it
ESSLLI-2002 Workshop on
Machine Learning Approaches in Computational Linguistics
August 5-9, 2002

A workshop held as part of the 14th European Summer School in Logic, Language and Information
ESSLLI-2002
Trento, Italy
August 5-16, 2002

** Second CALL FOR PAPERS **

ORGANIZERS: Erhard Hinrichs, Sandra Kuebler (Universitaet Tuebingen)

DESCRIPTION:

Over the last decade, machine learning approaches have established themselves as an important subfield of computational linguistics. The resulting body of research is characterized by a wide range of techniques. These techniques have been successfully applied to a variety of natural language annotation tasks, such as part-of-speech tagging, shallow and deep parsing, word sense disambiguation, anaphora resolution, and PP attachment.

The purpose of this workshop is to provide a forum for junior researchers to present their work and discuss the relative merits of their machine learning approaches as they apply to natural language phenomena. Possible learning approaches are:

* inductive logic programming
* memory-based learning
* transformation-based learning
* decision trees
* genetic algorithms
* connectionist learning

SUBMISSION:

All researchers in the area, but especially Ph.D. students and young researchers, are invited to submit a paper. Electronic submissions are highly encouraged (preferably as plain ASCII or Postscript). Submissions should not exceed 10 (A4 or letter) pages, typeset in 10-12 points, with at least 2.5 cm / 1 inch margins. Submitted papers should be anonymous and be accompanied by an e-mail listing the following details:

- Title
- Authors' names and affiliation
- Address
- E-mail addresses

All submissions will be reviewed by an international program committee. The accepted papers will be made available in a summer school reader. If sufficiently many high-quality papers are submitted, we intend to publish them in an edited volume.
Submissions should be sent before March 15, 2002 to the following address:

Sandra Kuebler
Universitaet Tuebingen
Seminar fuer Sprachwissenschaft
Wilhelmstr. 113
D-72074 Tuebingen
Germany
kuebler@sfs.nphil.uni-tuebingen.de

If electronic submission is impossible, please send four copies of your paper to the above address. Informal enquiries by e-mail to the organizers are welcome.

IMPORTANT DATES:

Mar 15, 2002: Deadline for submissions
Apr 15, 2002: Notification of acceptance
May 15, 2002: Final version due
Aug 5, 2002: Start of workshop

PROGRAM COMMITTEE:

Steven Abney
Anja Belz
Rens Bod
Antal van den Bosch
Sabine Buchholz
Walter Daelemans
Ido Dagan
Herve Dejean
James Hammerton
Erhard Hinrichs, co-chair
Yuval Krymolowski
Sandra Kuebler, co-chair
Paola Merlo
John Nerbonne TBC
Miles Osborne
Erik Tjong Kim Sang
Jorn Veenstra TBC
Andreas Wagner

FURTHER INFORMATION:

To obtain further information about ESSLLI-2002 please visit http://www.esslli2002.it/

This workshop is held as part of the ESSLLI-2002 summer school. Therefore all workshop participants are required to register for ESSLLI-2002. Registration information will be announced in due time by the local organizers on the ESSLLI-2002 website.