Editor for this issue: T. Daniel Seely <seely@linguistlist.org>
**********************************************************************
Call for Submissions
Please Distribute Widely
**********************************************************************

ENVGRAM

COMPUTATIONAL ENVIRONMENTS FOR PRACTICAL GRAMMAR DEVELOPMENT, PROCESSING AND INTEGRATION WITH OTHER NLP MODULES

Madrid, Spain, July 11 or 12, 1997
(in conjunction with ACL-97/EACL-97)

WORKSHOP DESCRIPTION

With a growing number of NLP applications going beyond the status of simple research systems, there is a more evident need for better methods, tools and environments to support the development and reuse of large-scale linguistic resources and efficient processors. This new area of research, often referred to as Linguistic Engineering, is rapidly gaining interest alongside the more traditional areas concerned with the study and development of formalisms and algorithms.

Aspects of linguistic engineering range from grammar development environments, through the construction and maintenance of large-scale linguistic resources, to methodologies for quality assurance and evaluation. Some of the most prominent examples of sophisticated development platforms comprising tracers, debuggers and a range of important visualization tools are ALEP (funded by the European Union), GATE (common infrastructure for building LE architectures using pre-existing components), GWB (LFG workbench developed at Xerox PARC), PAGE (grammar development platform based on typed feature logics, developed at DFKI), and many others. There have been a number of projects on the development of large-scale computational lexicons (e.g. Acquilex), as well as projects concerned with the development of standards and reference data for diagnostics and evaluation (e.g. TSNLP).
However, while these platforms and components typically provide fairly clean formalisms, processing components and data, it is not yet clear to what extent current results and approaches fit the requirements for large-scale development and deployment of real NLP applications. In this connection, a number of pending issues need to be addressed, the relevance of which becomes particularly clear when the focus is shifted from linguistic formalism to usability and user/application requirements. The following points are examples of relevant topics:

- What is the state of the art in Grammar Development Environments? There are a number of systems on the market already. Given the enormous cost of developing such environments, it is unlikely that many others will be developed from scratch. Up to what point do the existing systems meet actual user requirements? What experiences are there in tailoring such systems to specific applications?

- How can we meet the demands arising from distributed grammar development? Even if in the past the biggest systems have been based on the work of one individual, it is unwise and impractical to have one large grammar developed by a single writer. Thus, the development and maintenance of large grammars tends to be more and more a joint effort involving many computational linguists. What specific requirements and prerequisites have to be met in a development environment to ensure smooth cooperation between different authors, leading to the necessary modularity, consistency and integrability of grammar fragments?

- How can we meet the demands of multi-lingual grammar development? For many applications (even outside machine translation itself) multi-linguality is becoming an indispensable standard feature. The parallel development of several grammars in different languages will require some synchronization of linguistic knowledge bases and sharing of processing components. Can different language-specific grammars share a common core grammar? Is it useful to build on modern formalisms which allow an object-oriented design (such as typed feature logics) or even on theories of a putative "universal grammar"?

- What is the appropriate division of labour in a large-scale development environment? Sophisticated applications may require a whole range of knowledge sources and processors, addressing, e.g., computational morphology, syntax, semantics, lexicography, corpus analysis, parsing and generation, to name but a few. What approaches and methods can be devised, and which tools and facilities should be employed, to facilitate and support the integration of different levels of linguistic abstraction and of different processing modules, and the cooperation between grammar writing and processor design?

- How can we facilitate the shift from reusability to usability? Grammar development in academic and research-oriented environments has often concentrated on the maximum generality and reusability of the linguistic resources developed. However, for building actual applications and for applying systems to specific domains, this generality can turn out to be a drawback rather than an asset. Thus, the question is how one can support the specialization and customization to more constrained domains without sacrificing the advantages of a more general and reusable design.

- What are the necessary ingredients for quality assurance in grammar development? The incremental construction of large grammars, in particular in a distributed environment, makes it necessary to maintain sufficient control over different versions. Coverage and speed are expected to increase over the development cycles. Quality assurance, testing and diagnostics cannot be carried out properly if they are based on an odd collection of test items or some arbitrarily chosen corpus fragment. Evaluation of a system, which goes even further, will require a minimum degree of standardization of reference material.
What, then, are the appropriate methods and data to be applied for these purposes? How can they be constructed, collected and customized to specific applications and domains?

The workshop will be the occasion to discuss the results achieved and the most promising directions and to highlight pending problems. Contributions are solicited from institutions (both research-oriented and industrial) involved in the production of NLP applications.

INVITED SPEAKER

Hans Uszkoreit (DFKI)
"Reference Data and Grammar Development Environments"

ORGANIZING COMMITTEE

Fabio Pianesi (Primary Contact), IRST, Italy (pianesi@irst.itc.it)
Dominique Estival, University of Melbourne, Australia (D.Estival@linguistics.unimelb.edu.au)
Alberto Lavelli, IRST, Italy (lavelli@irst.itc.it)
Klaus Netter, DFKI, Germany (netter@dfki.uni-sb.de)

PROGRAMME COMMITTEE

Harry Bunt, Tilburg University, The Netherlands
Bob Carpenter, Lucent Technologies Bell Labs, USA
Jochen Dorre, University of Stuttgart, Germany
Dominique Estival, University of Melbourne, Australia
Dan Flickinger, CSLI Stanford, USA
Klaus Netter, DFKI, Germany
Fabio Pianesi, IRST, Italy
Steven Pulman, SRI Cambridge, UK
Antonio Sanfilippo, Sharp, UK

PROGRAMME CHAIRS

Klaus Netter, DFKI, Germany
Fabio Pianesi, IRST, Italy

SUBMISSIONS

Authors are asked to submit previously unpublished papers; ALL SUBMISSIONS SHOULD BE SENT TO FABIO PIANESI. A limited number of position papers may also be considered. Each submission will undergo multiple reviews.

The papers should be full length (not exceeding 3200 words, exclusive of references) and should include a descriptive abstract of about 200 words. Electronic submissions are strongly preferred, either in self-contained LaTeX format (using the ACL-97 submission style; see ftp://ftp.cs.columbia.edu/acl-l/, as well as the submission guidelines for the main conference at http://www.ieec.uned.es/cl97/) or as a PostScript file. In exceptional circumstances, Microsoft Word files will also be accepted as electronic submissions, provided they follow the same formatting guidelines. Hard copy submissions should include eight copies of the paper.

A separate title page should include the title of the paper and the names, addresses (postal and e-mail), and telephone and fax numbers of all authors. Any correspondence will be addressed to the first author (unless otherwise specified).

Authors will be responsible for the preparation of camera-ready copies of the final versions of accepted papers, conforming to a uniform format, with guidelines and a style file to be supplied by the organisers.

REQUIREMENTS

A paper accepted for presentation cannot be presented or have been presented at any other meeting. Please indicate in your submission if you have submitted your paper to another conference.

ORGANISATION OF SESSIONS

Presentations will be allocated 25-minute slots each, plus an extra five minutes for discussion, distributed over morning and afternoon sessions, including an invited talk and a (closing) general discussion.

WORKSHOP PARTICIPATION

Workshop attendance will be limited to a maximum of 40 people; persons without a submission should contact the organizers as soon as possible. According to the ACL/EACL workshop guidelines, all workshop participants must register for the ACL/EACL main conference.

DEMOS

Depending on the availability of time and appropriate computing facilities, a demo session will be organised.

SCHEDULE

Submission deadline: 10 March 1997
Notification of acceptance: 4 April 1997
Camera-ready versions of accepted papers due: 27 April 1997
Workshop: 11 or 12 July 1997

ADDRESS FOR SUBMISSIONS AND FURTHER INFORMATION

Fabio Pianesi
IRST - Istituto per la Ricerca Scientifica e Tecnologica
38050, Povo Trento, Italy
tel: +461-314327
fax: +461-302040
e-mail: pianesi@irst.itc.it