Editor for this issue: Marie Klopfenstein <marie@linguistlist.org>
Ideas and Strategies for Multilingual Grammar Development

Location: Vienna, Austria
Date: 25-AUG-03 - 29-AUG-03
Call Deadline: 14-Mar-2003
Contact Person: Melanie Siegel
Meeting Email: siegel@dfki.de
Linguistic Subfield(s): Computational Linguistics

This is a session of the following conference: 15th European Summer School in Logic, Language and Information

Meeting Description:
In this workshop at the 2003 European Summer School in Logic, Language, and Information (ESSLLI2003) in Vienna, participants will address the issue of building a methodology for parallel grammar development in linguistically rich frameworks. This methodology should guide the definitions of common formats, procedures, development tools, grammar components, and documentation practice, as well as standardized evaluation methods.

Final Call for Papers

ESSLLI Workshop
25 to 29 August 2003
Ideas and Strategies for Multilingual Grammar Development

Taking place during ESSLLI 2003 (18-29 August), Vienna
http://www.logic.at/esslli03/

Workshop Website: http://www.dfki.uni-sb.de/~siegel/esslli/

Topics of the workshop include:

- Methodology for multilingual, broad-coverage, deep grammar development within linguistically rich frameworks such as those using unification-based grammars. Such approaches may include grammar templates, external specifications, or other tactics.
- Guidelines for grammar writers either in the initial stages of development or as long-term best practices.
- Organization of a grammar in layered, reusable structures.
- Strategies for adapting existing resources such as taggers or morphological analyzers.

The workshop will be held during the second week of ESSLLI2003, 25-29 August 2003, with each of the five sessions allowing for the presentation of three 20-minute papers followed by discussion.

Submission details:
Abstracts of not more than four pages (with a minimum font size of 11pt and margins of 2.5cm) on any of the above topics are due by Friday, 14 March 2003 with electronic submission in either PostScript or PDF format to multigram@coli.uni-sb.de. Reviewing will be done anonymously, and the final program will be determined by the workshop organizers based on these reviews. Authors will be advised of the results by Monday, 5 May 2003. Full papers to be included in the workshop proceedings will be due by Saturday, 24 May 2003.
**************************************************************
ACL2003 News Letter No.3 (4th of March, 2003)
**************************************************************
Hitoshi Isahara (Publicity Chair, CRL) and Masaki Murata (CRL)
- ---------------------------------------------------------------
Venue: Convention Center of Sapporo, Sapporo, JAPAN
Dates:
  Tutorials and Pre-conference Workshops: July 7, 2003
  Main Conference: July 8-10, 2003
  Post-conference Workshops: July 11-12, 2003
(For details, see the Web site http://www.ec-inc.co.jp/ACL2003/)

This newsletter includes:
1) News from Program Committee of Main Conference
2) Extended Deadline of Student Research Workshop
3) Life Time Achievement Award
4) Abstracts of Tutorials
  4-1) Finite State Language Processing
  4-2) Maximum Entropy Models, Conditional Estimation, and Optimization without the Magic
  4-3) Knowledge Discovery from Text
  4-4) Spoken Language Processing: Separating Science Fact from Science Fiction
5) Deadlines and Web Sites
  5-1) Student Research Workshop
  5-2) Interactive Poster/Demo Sessions
  5-3) Associated Conferences (EMNLP2003 and IRAL2003)
  5-4) ACL Workshops
  5-5) Exhibits and Sponsorship
6) Important Announcements from Several Associated Conferences and Workshops
- ---------------------------------------------------------------
1) News from Program Committee of Main Conference

376 papers were submitted to the main conference. This is far more than we expected. Thank you for your interest in ACL2003.
- ---------------------------------------------------------------
2) Extended Deadline of Student Research Workshop

The paper submission deadline for the Student Research Workshop has been extended:
Paper submission deadline: March 15, 2003 (extended)
(Note that we NO LONGER require early registration of papers.)
Web site: http://tangra.si.umich.edu/clair/acl03-student/

We would appreciate it if you could inform your students that the deadline has been extended.
- ---------------------------------------------------------------
3) Life Time Achievement Award

A ceremony for the second Life Time Achievement Award will be held during ACL 2003. The LTA was established at the 40th anniversary conference of ACL last year. The first winner of the LTA was Prof. Aravind Joshi of the University of Pennsylvania.
- ---------------------------------------------------------------
4) Abstracts of Tutorials

There will be four tutorials, to be given by leading experts in language and speech processing. The tutorials will take place on July 7. The abstracts of the tutorials and the profiles of the speakers will be described on the ACL-03 web site. For details, see the Web site http://www.ec-inc.co.jp/ACL2003/Tutorials.html.
- -----
4-1) Finite State Language Processing
Gertjan van Noord (University of Groningen, The Netherlands)

Finite state automata are well-understood, and inherently compact and efficient models of simple languages. In addition, finite state automata can be combined in various interesting ways, with the guarantee that the result again is a finite state automaton.

In the introductory part of the tutorial, finite state acceptors and finite state transducers (both weighted and unweighted) are introduced, and we briefly review their formal and computational properties.

In the second part of the tutorial, we illustrate the use of finite state methods in dictionary construction. In particular, we present an application of perfect hash automata in tuple dictionaries. Tuple dictionaries provide a very compact representation of huge language models of the kind typically used in NLP applications (including Ngram language models).

In the third part of the tutorial we focus on regular expressions for NLP. The type of regular expressions used in modern NLP applications has evolved dramatically from the regular expressions found in standard Computer Science textbooks.
In recent years, various high level regular expression operators have been introduced (such as contexted replacement operators). The availability of more and more abstract operators makes the regular expression notation more and more attractive. The tutorial provides an introduction to the regular expression calculus. The examples use the notation of the Fsa Utilities toolkit: a freely available implementation of the regular expression calculus. We introduce various regular expression operators for acceptors and transducers. We then continue to show how new regular expression operators can be defined.

In the last part of the tutorial, we focus in more detail on regular expression operators that turned out to be useful for the description of certain aspects of phonology using ideas from Optimality Theory. This part of the tutorial describes the lenient composition operator of Karttunen, and the optimality operator of Gerdemann and van Noord, as well as a number of alternatives (Eisner, Jaeger).
- -----
4-2) Maximum Entropy Models, Conditional Estimation, and Optimization without the Magic
Dan Klein and Christopher D. Manning (Stanford University, USA)

This tutorial presents the foundations of maximum entropy models, optimization methods to learn them, and various issues in the use of graphical models more complex than simple naive-Bayes (NB) or HMM models. The focus is on intuition and understanding, using visual illustrations and simple examples rather than detailed derivations whenever possible.

Maximum Entropy Models: What maximum entropy models are, from first principles, what they can and cannot do, and how they behave. Lots of examples. The equivalence of maxent models and maximum-likelihood exponential models. The relationship between maxent models and other classifiers. Smoothing methods for maxent models.

Basic Optimization: Unconstrained optimization: convexity, gradient methods (both simple descent and more practical conjugate methods).
Constrained optimization: Lagrange multipliers and several ways of turning them into a concrete optimization system. Other fun things to do with optimization. Specialized iterative scaling methods vs. general optimization.

Model Structures: Conditional independence in graphical models (focusing on NB, HMMs, and PCFGs). Practical ramifications of various independence assumptions. Label and observation biases in conditional structures. Survey of sequence models (HMMs, MEMMs, CRFs, and dependency networks).

Prerequisites: Familiarity with basic calculus and a working knowledge of NB and HMMs are required. Existent but possibly vague knowledge of general Bayes' nets or basic information theory is a plus. Most importantly: a low tolerance for conceptual black boxes labeled "magic here".
- -----
4-3) Knowledge Discovery from Text
Dan Moldovan (University of Texas at Dallas, USA)
Roxana Girju (Baylor University, USA)

Knowledge Discovery is a fast growing area of research and commercial interest. While knowledge may be discovered from many sources of information, this tutorial focuses on the discovery of knowledge from open texts, the largest source of knowledge. The problem of Knowledge Discovery from Text (KDT) is to extract explicit and implicit concepts and semantic relations between concepts using Natural Language Processing techniques. The discovery process is guided by the notion of context specified either by seed concepts or in some other more formal way. KDT, while deeply rooted in NLP, actually draws on methods from statistics, machine learning, reasoning, information extraction, knowledge management, cognitive science and others for its discovery process. The emphasis here is on the automatic discovery of new concepts and on the large number of semantic relations that link them. This tutorial presents recent results from KDT research and system implementations.
Since the goal of KDT is to get insights into large quantities of text data and bring to bear text semantics, it plays an increasingly significant role in emerging applications, such as Question Answering, Summarization, Text Understanding and Ontology Development. This tutorial is aimed at researchers, practitioners, educators, and research planners who want to keep in sync with the newly emerging KDT technology.
- -----
4-4) Spoken Language Processing: Separating Science Fact from Science Fiction
Roger K. Moore (20/20 Speech Ltd, UK)

The advent of talking and listening machines has long been hailed as "the next big thing" in human-machine interaction. Indeed only recently, the IEEE Spectrum magazine (September 2002) named speech as one of five technologies likely to reap big market rewards in the next five years. Certainly, the frequency with which members of the general public come across speech-enabled applications in their everyday lives does seem to be on the increase, and the marketplace is currently able to support a number of sizeable commercial companies who are supplying speech-based products and services - as well as a growing academic community of speech scientists and engineers. This apparent progress has been fuelled by a number of key developments: the relentless increase in available computing power, the introduction of 'data-driven' techniques for speech pattern modelling, and the institution of public system evaluations.

This tutorial will chart the main advances that have been made in spoken language processing algorithms and applications over the past few years. The key enabling technologies of 'automatic speech recognition', 'text-to-speech synthesis' and 'spoken language dialogue' will be explained in some detail, with emphasis being placed on how the technology works and, perhaps more importantly, why it sometimes doesn't.
Insight will also be given into the linguistic/paralinguistic properties of speech signals and human spoken language, and comparisons will be drawn between the capabilities of 'automatic' and 'natural' spoken language processing systems.

The tutorial is aimed at both specialists and non-specialists in the language processing field, and will be of great interest to anyone who is keen to develop a greater understanding of the main issues involved in spoken language processing. Prof. Moore will cover theoretical and practical aspects of the inner workings of state-of-the-art spoken language systems, as well as providing a balanced overview of their capabilities in relation to other modes of human-machine interaction. The tutorial will incorporate question-and-answer opportunities, and will conclude with a survey of open research issues and some predictions for the future.
- ---------------------------------------------------------------
5) Deadlines and Web Sites

The student research workshop, the interactive poster/demo sessions, the associated conferences (EMNLP2003 and IRAL2003) and the workshops have their own submission deadlines and sites. Please see the web sites for the details.
- -----
5-1) Student Research Workshop
Paper submission deadline: March 15, 2003 (extended)
Web site: http://tangra.si.umich.edu/clair/acl03-student/
- -----
5-2) Interactive Poster/Demo Sessions
Paper submission deadline: May 1, 2003
Web site: http://cl.aist-nara.ac.jp/staff/matsu/poster.html
- -----
5-3) Associated Conferences (EMNLP2003 and IRAL2003)

AC1 The Eighth Conference on Empirical Methods in Natural Language Processing (EMNLP2003)
Submission deadline: April 4, 2003
Conference date: July 11-12, 2003
Web site: http://www.ai.mit.edu/people/mcollins/emnlp03.html

AC2 The Sixth International Workshop on Information Retrieval with Asian Languages (IRAL2003)
Submission deadline: April 15, 2003
Conference date: July 7, 2003
Web site: http://research.nii.ac.jp/IRAL2003/
- -----
5-4) ACL Workshops

WS1 Multilingual Summarization and Question Answering - Machine Learning and Beyond
Submission deadline: April 21, 2003
Workshop date: July 11-12, 2003
Web site: http://www.isi.edu/~cyl/msqa-ml-acl2003/

WS2 Natural Language Processing in Biomedicine
Submission deadline: April 10, 2003
Workshop date: July 11, 2003
Web site: http://www-tsujii.is.s.u-tokyo.ac.jp/ACL03/bionlp.htm

WS3 The Lexicon and Figurative Language
Submission deadline: April 13, 2003
Workshop date: July 11, 2003
Web site: http://www.cs.bham.ac.uk/~amw/ACLWorkshop.html

WS4 Multilingual and Mixed-language Named Entity Recognition: Combining Statistical and Symbolic Models
Submission deadline: April 4, 2003
Workshop date: July 12, 2003
Web site: http://research.microsoft.com/conferences/mulner-acl03/

WS5 The Second International Workshop on Paraphrasing: Paraphrase Acquisition and Applications
Submission deadline: April 21, 2003
Workshop date: July 11, 2003
Web site: http://nlp.nagaokaut.ac.jp/IWP2003/

WS6 Second SIGHAN Workshop on Chinese Language Processing
Deadline: the workshop submission deadline: March 10, 2003
Deadline: the word segmentation bakeoff: April 22-25, 2003
Workshop date: July 11-12, 2003
URL: the workshop: http://www.sighan.org/swclp2/
URL: the bakeoff: http://www.sighan.org/bakeoff2003/

WS7 Multiword Expressions: Analysis, Acquisition and Treatment
Submission deadline: April 5, 2003
Workshop date: July 12, 2003
Web site: http://www.cl.cam.ac.uk/users/alk23/mwe/mwe.html

WS8 Linguistic Annotation: Getting the Model Right
Submission deadline: April 5, 2003
Workshop date: July 11, 2003
Web site: http://www.cs.vassar.edu/~ide/events/ACL2003-LR/

WS9 Workshop on Patent Corpus Processing
Submission deadline: April 10, 2003
Workshop date: July 12, 2003
Web site: http://www.slis.tsukuba.ac.jp/~fujii/acl2003ws.html

WS10 Towards a Resources Information Infrastructure
Submission deadline: April 13, 2003
Workshop date: July 11-12, 2003
Web site: http://www.elsnet.org/acl2003-workshop/
- -----
5-5) Exhibits and Sponsorship
Application Deadline for both: April 1, 2003
For details, see Exhibits and Sponsorship at http://www.ec-inc.co.jp/ACL2003/.
- ---------------------------------------------------------------
6) Important Announcements from Several Associated Conferences and Workshops
- -----
AC1 The Eighth Conference on Empirical Methods in Natural Language Processing (EMNLP2003)

Abstract: SIGDAT, the Association for Computational Linguistics' special interest group on linguistic data and corpus-based approaches to NLP, invites submissions to EMNLP 2003. The conference will be held on July 11-12 in Sapporo, Japan, immediately following the 41st meeting of the ACL (ACL 2003).
URL: http://www.ai.mit.edu/people/mcollins/emnlp03
Deadline: 4 April 2003
- -----
WS1 Multilingual Summarization and Question Answering - Machine Learning and Beyond

Abstract: Automatic summarization and question answering (QA) aim at producing a concise representation of the key information content.
Rule-based or statistical approaches to summarization and QA systems have shown promising results; it is, however, very difficult to find good evaluation functions or rules that work well across domains. In consequence, various machine learning (ML) techniques have recently been applied to summarization and QA systems. The purpose of this workshop is to provide a forum for exploring the commonality underlying this diversity of problem domains and approaches.
Deadline: 21 April 2003
- -----
WS2 Natural Language Processing in Biomedicine

Invited speaker: Prof. Carol Friedman, CUNY/Columbia University
'Opportunities and Challenges for NLP in Biomedicine'

The aim of this workshop is to bring together NLP researchers in biomedicine and to discuss recent advances in the computational analysis of text, which go beyond traditional keyword-based indexing methods and begin to offer content-based analysis. Knowledge discovery in the rapidly growing area of biomedicine is of paramount importance. Processing biomedical texts is a challenge, especially in the areas of terminology, ontology building, information extraction, annotation tools, sharing and integration of knowledge from factual and textual data bases, and evaluation of biomedical applications, among others. One of the aims of the workshop is to create SIGs in areas of common interest such as annotation standards in biology, evaluation metrics, standardisation of terminological resources, etc.
Submission deadline: April 10, 2003
Workshop date: July 11, 2003
Web site: http://www-tsujii.is.s.u-tokyo.ac.jp/ACL03/bionlp.htm
- -----
WS3 The Lexicon and Figurative Language

Abstract: The lexicon has variously been treated as a list of word senses, a list of hierarchically related senses (e.g. WordNet), and as a structured entity containing rich lexical representations and means to generate novel uses of words.
Figurative language poses problems for all these approaches, and a common claim is that metaphor is a cognitive not a linguistic phenomenon; instead, word senses are related in terms of their underlying conceptual domains. The major theme of this SIGLEX-endorsed workshop is to explore and attempt to reconcile these different approaches to figurative language and the lexicon - although papers exploring other aspects of figurative language will also be welcome.
Deadline: 13 April 2003
Web site: http://www.cs.bham.ac.uk/~amw/ACLWorkshop.html
- -----
WS4 Multilingual and Mixed-language Named Entity Recognition: Combining Statistical and Symbolic Models

Invited speaker: David Yarowsky

Named Entity (NE) Recognition systems vary widely, from high-speed bulk methods optimized for indexing, to deep semantic parsers tuned for specific domains. Optimal ways to combine statistical and symbolic models also vary, depending on applications and tasks. Is it possible to:
- maximize use of knowledge-rich resources (e.g. lexicons, NE grammars, parsing) while permitting corpus-based training for domain or language?
- acquire and share resources (including lexicons and grammars) across languages?
- balance performance speed with reasonable accuracy?
- use specific language patterns while permitting rapid transfer to another language?
- minimize variability in results across language types?

We welcome research on combined models, in which these tradeoffs are calculated in particular ways. Demonstrations of implemented NE systems are also welcome.

Submit papers by April 4 electronically in Word, PDF or PostScript format.
Assign a filename based on the paper's title, transfer it to ftp://ftp.research.microsoft.com/incoming/josephp, then email an identification page with title, author(s), contact details, and filename to molsen@microsoft.com
URL: http://research.microsoft.com/conferences/mulner-acl03/
- -----
WS5 Second International Workshop on Paraphrasing: Paraphrase Acquisition and Applications

Abstract: Paraphrases, variant ways of conveying the same information, are of interest because they present challenges for many NLP tasks, such as MT, IR, QA, etc. This workshop is open to investigation of all aspects of paraphrase, with a particular focus on the automatic acquisition of paraphrases from corpora, and on the development of a standardized paraphrase framework or resource for use in applications.
URL: http://nlp.nagaokaut.ac.jp/IWP2003/
Deadline: 21 April 2003
- -----
WS6 Second Sighan Workshop on Chinese Language Processing (July 11-12, 2003)

Abstract: As more resources for Chinese NLP have become available to the public recently, it is crucial to set up a platform that allows easy comparison of different approaches to various NLP tasks. Sighan is conducting a word-segmentation bakeoff before the workshop. Researchers all over the world are welcome to participate. As a part of this Sighan workshop, we are going to release the bakeoff results, followed by presentations by bakeoff participants and a general discussion of future evaluations. A second part of the workshop will consist of presentations of papers on all aspects of Chinese language processing.
URL: the workshop: http://www.sighan.org/swclp2/
URL: the bakeoff: http://www.sighan.org/bakeoff2003/
Deadline: the workshop submission deadline: March 10, 2003
Deadline: the word segmentation bakeoff: April 22-25, 2003
- -----
WS7 Multiword Expressions: Analysis, Acquisition and Treatment

The workshop will concentrate on the analysis, acquisition and treatment of multiword expressions (MWEs), such as phrasal verbs (e.g. "add up"), nominal compounds (e.g. "radar footprint"), and institutionalized phrases (e.g. "salt and pepper"). In particular we focus on addressing the problems that MWEs pose for natural language processing applications.
URL: http://www.cl.cam.ac.uk/users/alk23/mwe/mwe.html
Submission Deadline: 05 April 2003
- -----
WS9 Workshop on Patent Corpus Processing

Abstract: The goal of this workshop is to foster research and development of the technology for patent corpus processing, by providing a forum in which researchers and practitioners can exchange and share their ideas, approaches, perspectives, and experiences from their work in progress. We invite both research papers and project papers associated with, but not limited to, the rudiments of patent corpus processing. We also invite papers addressing applications and user studies.
Deadline: 10 April 2003
============================================================