Editor for this issue: Dina Kapetangianni <dina@linguistlist.org>
LREC 2002 Workshop on
LINGUISTIC KNOWLEDGE ACQUISITION AND REPRESENTATION: BOOTSTRAPPING ANNOTATED LANGUAGE DATA
Las Palmas, Canary Islands, Spain
2nd June 2002
_____________________________

MOTIVATION AND AIMS

Provision of large-scale labelled language resources, such as tagged corpora or repositories of pre-classified text documents, is crucial to steady progress across an extremely wide spectrum of research, technological and business areas in the HLT sector. However, the continuously changing demands for language-specific and application-dependent annotated data (e.g. at the syntactic or at the semantic level), indispensable for design validation and efficient software prototyping, run up daily against the labelled-data bottleneck. Hand-crafted resources are often too costly and time-consuming to be produced at a sustainable pace and, in some cases, they even exceed the limits of human conscious awareness and descriptive capability.

Possible ways to circumvent, or at least minimise, this problem come from the literature on automatic knowledge acquisition and, more generally, from the machine-learning community. Annotated data are bootstrapped by training a machine-learning classifier on a small sample of pre-annotated data and by using the induced classifier to annotate more data. Co-learning provides an alternative methodology, which essentially consists in the iterative cooperation of two or more independent learning systems. Another promising route consists in automatically tracking down recurrent knowledge patterns in unstructured or implicit information sources (such as free texts or machine-readable dictionaries), so that this information can be moulded into explicit representation structures (e.g. subcategorisation frames, syntactic-semantic templates, ontology hierarchies etc.).

We believe that all these attempts at bootstrapping labelled data are not only of practical interest (for continuous updating, management and validation of dynamic resources), but also point to a number of germane theoretical issues. In particular, the workshop intends to focus on the interaction between techniques for inducing structured knowledge from raw data and formal methods of linguistic knowledge representation. Gaining insights into this issue is an essential requirement for explaining the effective use of linguistic knowledge by cognitive agents. Although the cognitive and engineering views of the form and acquisition of linguistic knowledge need not be related, data from neuroscience and psychology are indeed relevant when evaluating different ways of representing information in artificial systems and different models of linguistic knowledge acquisition.

We encourage in-depth analysis of the underlying assumptions of the proposed bootstrapping methods and discussion of possible relevant connections with existing annotation and representation schemes. This investigation is likely to have significant repercussions on the way linguistic resources will be designed, developed and used for applications in the years to come. As the two aspects of knowledge representation and acquisition are profoundly interrelated, progress on both fronts can only be achieved, in our view, through a full appreciation of this deep interdependency.

TOPICS OF INTEREST

Possible themes for contributions are:

* development of 'data-driven' annotation/representation schemes
* dynamic update, customisation and tuning of labelled resources through acquired data
* 'hybrid models' of linguistic knowledge extraction, whereby machine learning methods are integrated with formal structures of knowledge representation
* incremental linguistic knowledge-bases
* formal representation and structuring of information flow automatically acquired from texts
* knowledge acquisition and linguistic resources lifecycle
* linguistic knowledge acquisition and representation in cognitive tasks

IMPORTANT DATES

Deadline for workshop abstract submission: 15th of February 2002
Notification of acceptance: 15th of March 2002
Final version of paper for workshop proceedings: 15th of April 2002
Workshop: 2nd June 2002 (afternoon session)

SUBMISSIONS

The organizers welcome contributions describing existing research related to the topics of the workshop. Each presentation will be 25 minutes long (20 minutes for presentation and 5 minutes for questions and discussion). Submissions should include: title; author(s); affiliation(s); and contact author's e-mail address, postal address, telephone and fax numbers. Abstracts (maximum 500 words, plain-text format) must be sent to: simo@ilc.pi.cnr.it

The final version of the accepted papers should not be longer than 4,000 words or 10 A4 pages. Instructions for formatting and presentation of the final version will be sent to authors upon notification of acceptance.

ORGANISING COMMITTEE

Alessandro Lenci (Università di Pisa, Italy)
Simonetta Montemagni (Istituto di Linguistica Computazionale - CNR, Italy)
Vito Pirrelli (Istituto di Linguistica Computazionale - CNR, Italy)

PROGRAM COMMITTEE

Harald Baayen (Max Planck Institute for Psycholinguistics - Nijmegen, The Netherlands)
Rens Bod (University of Amsterdam, Holland)
Michael R. Brent (Washington University, USA)
Nicoletta Calzolari (Istituto di Linguistica Computazionale - CNR, Italy)
Jean-Pierre Chanod (Xerox Research Centre Europe, Grenoble, France)
Walter Daelemans (University of Antwerp, Belgium)
Dekang Lin (University of Alberta, Edmonton, Canada)
Horacio Rodriguez (Universidad Politecnica de Catalunya)
Fabrizio Sebastiani (Istituto per l'Elaborazione dell'Informazione - CNR, Italy)
Lucy Vanderwende (Microsoft Research, Redmond, USA)
François Yvon (Ecole Nationale Superieure des Telecommunications, Paris, France)
Menno van Zaanen (University of Amsterdam, The Netherlands)

CONTACT PERSON

Simonetta Montemagni
Istituto di Linguistica Computazionale (ILC) - CNR
Area della Ricerca di Pisa
Via Moruzzi 1, 56124 Pisa, ITALY
e-mail: simo@ilc.pi.cnr.it

Workshop on Wordnet Structures and Standardization and how these affect Wordnet Applications and Evaluation

Workshop held in conjunction with the Third Language Resources and Evaluation Conference (LREC 2002) in Las Palmas, Spain
May 28, 2002

CALL FOR PAPERS

Wordnets, which are structured along the lines of the Princeton WordNet, have become popular lexical-semantic resources in the field of language technology. Various initiatives for monolingual and multilingual wordnet construction have been launched (EuroWordNet, BalkaNet, Portuguese Wordnet etc.), and numerous language processing tasks rely on wordnet resources and their implicit knowledge structures. Existing wordnets vary with respect to their stage of development, coverage of concepts, encoding principles of linguistic contents and semantic relations, and thus their applicability in different NLP tasks. Furthermore, language-specific peculiarities of wordnets have to be considered in the field of cross-lingual applications. Recently, attempts have been made towards the construction of wordnets for the less-studied languages, which are in need of reliable standards and at the same time yield new perspectives on wordnet construction.

This one-day workshop emphasizes two major topics: wordnet structures for less-studied languages on the one hand, and wordnet standardization, evaluation and application on the other. The workshop aims to bring together wordnet builders and wordnet appliers from academia and industry in order to integrate the efforts being made at different sites.

One major topic focuses on wordnets for less-studied languages, i.e. Eastern European and Scandinavian languages, for which the development of semantic networks has recently started. The aim is to exchange new approaches to linguistic structures and architectures of semantic networks and to communicate preliminary results to a wider research community.

The other major topic covers standardization issues for wordnets and wordnet-related tools, evaluation of wordnet resources and the information encoded in them, and experiences with wordnet applications in the areas of information retrieval and sense tagging.

Conference topics:

- guidelines and methodologies for building wordnets;
- new approaches to wordnet construction;
- building of wordnets for less-studied languages;
- architecture of semantic networks and its relationship to the language type;
- semantic relations of less-studied languages and their representations;
- structure as language-independent module;
- applicability of WordNet assumptions to other language types;
- standardization of wordnet specifications including the Interlingual Index as a universal index of meaning;
- standardization of wordnet representations with respect to metalanguages (XML, etc.);
- compatibility issues with regard to different formal representations;
- criteria and methods for verifying the content encoded in wordnets;
- consistency checking, comparison and evaluation of wordnet modules;
- evaluation of the value being added by integrating wordnets in natural language processing tasks;
- experiences from sense-tagging with wordnets.

Submissions

Papers are invited that describe existing research connected to the topics of the workshop. Each presentation will be 20 minutes long (15 minutes for presentation and 5 minutes for discussion). Each submission should indicate: title; author(s); affiliation(s); and contact author's e-mail address, postal address, telephone and fax numbers.
Abstracts (maximum 1,500 words, plain-text format) should be sent to the respective contact persons:

Papers related to Wordnet Structures and Applications for the Less-Studied Languages should be submitted to: mathiou@ceid.upatras.gr
Papers related to Wordnet Applications, Standardization & Evaluation should be submitted to: kunze@sfs.nphil.uni-tuebingen.de

All submissions will be reviewed by an international programme committee. Accepted papers will be published in the Workshop Proceedings. The final version of the accepted papers should be no longer than 4,000 words or 10 A4 pages. Instructions for formatting and presentation of the final version will be sent to authors upon notification of acceptance.

Important Dates

Deadline for abstract submission: 10th of February 2002
Notification of acceptance: 10th of March 2002
Final version of paper: 5th of April 2002
Pre-conference Workshop: 28th of May 2002

Organizing Committee

Dimitris N. Christodoulakis (Patras University, Greece)
Claudia Kunze / Lothar Lemnitzer (University of Tuebingen, Germany)
Karel Pala (Masaryk University Brno, Czech Republic)

Contact Persons

Prof. Dimitris N. Christodoulakis
Databases Laboratory of Computer Engineering & Informatics Department
Patras University
GR 26500 Greece
Phone: +30 61 960 385
Fax: +30 61 960 438
Email: dxri@cti.gr

Claudia Kunze
Seminar fuer Sprachwissenschaft
Universitaet Tuebingen
Wilhelmstr. 113
D-72074 Tuebingen
Germany
Phone: +49 7071 29 77474
Fax: +49 7071 551335
Email: kunze@sfs.uni-tuebingen.de

Programme Committee

Christiane Fellbaum (Princeton University, USA)
Piek Vossen (Irion Technology Delft, The Netherlands)
Kemal Oflazer (Sabanci University Istanbul, Turkey)
Sofia Stamou (CTI Patras, Greece)
Jeroen Hoppenbrouwers (Tilburg University, The Netherlands)
Randee Tengi (Princeton University, USA)
Wim Peters (Sheffield University, GB)
Kadri Vider (University of Tartu, Estonia)
Julio Gonzales (UNED Madrid, Spain)
Palmira Marrafa (University of Lisboa, Portugal)
Paul Buitelaar (DFKI Saarbruecken, Germany)
Andreas Wagner (University of Tuebingen, Germany)
Erhard Hinrichs (University of Tuebingen, Germany)
Simonetta Montemagni (University of Pisa, Italy)
R.J.H.M Ermers (Almaty, Kazakhstan)

Workshop Fee

for Conference participants: 90 EURO
for others: 140 EURO

To obtain further information about the workshop, please visit http://www.lrec-conf.org/lrec2002/index.html or http://www.cti.gr/nlp/