Editor for this issue: Jody Huellmantel <jody@linguistlist.org>
PLEASE NOTE THAT THE DEADLINE FOR ROMAND2000 HAS BEEN EXTENDED TO AUGUST 12th

First Call for Papers

ROMAND 2000
1st workshop on RObust Methods in Analysis of Natural language Data

*** EXTENDED DEADLINE ***

Department of Computer Science
Swiss Federal Institute of Technology - Lausanne
October 19-20, 2000
http://lithwww.epfl.ch/romand2000/

ROMAND 2000 is the first of a series of workshops that aims to bring together researchers working on robust methods in natural language processing. The term "natural language" is intended to cover all possible modalities of human communication and is not restricted to written or spoken language. The main goal of the workshop is to bring together researchers in fields such as artificial intelligence, computational linguistics, human-computer interaction, and cognitive science who face the problem of implementing feasible and reliable systems. Contributions on theoretical aspects of robustness in NLP are welcome, as are reports of engineering and industrial experience.

The workshop will be held in collaboration with the TALN 2000 conference (le Traitement Automatique des Langues Naturelles - Automatic Natural Language Processing), which will be held in Lausanne from October 16th to 18th. The ROMAND workshop will be held just afterwards, on October 19th and 20th.
We invite abstracts on all topics related to robustness in natural language processing, including, but not limited to:

  Robust Text Analysis
  Information Extraction
  Spoken Dialogue systems
  Multimodal human-computer interfaces
  Natural Language Architectures
  NLP and Soft Computing
  Robust Semantics
  Underspecification
  Multimedia document analysis
  Robust Parsing
  Complexity of linguistic analysis
  Hybrid methods in computational linguistics
  Text Mining

SUBMISSION PROCEDURE:

Authors should submit an anonymous extended abstract of at most 6 single-column pages (including references) in a 10 pt body font, together with a separate page specifying the authors' names, affiliations, addresses, and e-mail addresses. Abstracts are for talks of 20 minutes plus 10 minutes of discussion. The abstracts should be submitted electronically (in PostScript or PDF format) to: romand@epfl.ch.

IMPORTANT DATES:

  Papers due:         August 12th
  Acceptance notice:  August 28th
  Final version due:  September 28th
  Conference:         October 19-20

WORKSHOP COMMITTEE:

Program chairs

  Afzal Ballim       Afzal.Ballim@epfl.ch
  Vincenzo Pallotta  Vincenzo.Pallotta@epfl.ch
  Hatem Ghorbel      Hatem.Ghorbel@epfl.ch

Program committee

  Steve Abney
  Wolfgang Menzel
  Jean-Pierre Chanod
  Alberto Lavelli
  Rens Bod
  Giorgio Satta
  Joachim Niehren
  Roberto Basili
  Maria Teresa Pazienza
  Manuela Boros
  Diego Mollá Aliod
  Hervé Bourlard
  B. Srinivas
  C.J. Rupp
  Peter Asveld

ORGANIZATION:

This year's workshop is organized in collaboration with the TALN 7eme conférence annuelle sur LE TRAITEMENT AUTOMATIQUE DES LANGUES NATURELLES ( http://liawww.epfl.ch/taln2000/ ). The workshop will take place at the Swiss Federal Institute of Technology, Lausanne. The workshop is endorsed by ATALA (Association pour le Traitement Automatique des LAngues).

REGISTRATION:

Details about the registration procedure will be posted later on the official web site. The registration fee will be:

  Normal registration:        150.- CHF
  Registered TALN attendees:  100.- CHF

FURTHER INFORMATION:

For any information related to the organization, please contact:

  Vincenzo Pallotta
  DI-LITH EPFL
  IN F Ecublens
  1015 Lausanne, Switzerland
  tel. +41-21-693 52 97
  fax. +41-21-693 52 78
  Vincenzo.Pallotta@epfl.ch

News about the conference will be posted on the workshop's Web page at http://lithwww.epfl.ch/romand2000/

Pour le comité d'organisation de TALN 2000,
For the organising committee of TALN 2000,
Cristian Ciressan
CALL FOR PARTICIPATION

Web-Based Language Documentation and Description
Philadelphia USA, 12-15 December 2000
http://www.ldc.upenn.edu/exploration/

Institute for Research in Cognitive Science
University of Pennsylvania

Organizers: Steven Bird (U Penn) and Gary Simons (SIL International)

[The full version of this abridged CFP is available from the above page.]

This workshop will lay the foundation of an open, web-based infrastructure for collecting, storing and disseminating the primary materials which document and describe human languages, including wordlists, lexicons, annotated signals, interlinear texts, paradigms, field notes, and linguistic descriptions, as well as the metadata which indexes and classifies these materials. The infrastructure will support the modeling, creation, archiving and access of these materials, using centralized repositories of metadata, data, best practice guidelines, and open software tools.

BACKGROUND

Recent years have witnessed dramatic advances in mass storage and web delivery technologies, making it possible to house virtually unlimited quantities of speech data online, and to disseminate this data over the web. The development of XML and Unicode greatly facilitates the interchange and reuse of structured multimodal and multilingual data and the development of interoperating software tools. These developments are having a pervasive influence on the way primary linguistic data are gathered, stored, analyzed and disseminated, as demonstrated by the initiatives surveyed on the linguistic exploration page (http://www.ldc.upenn.edu/exploration/ ), and the papers presented at the Linguistic Exploration Workshop at the Chicago LSA Meeting (http://www.ldc.upenn.edu/exploration/LSA/ ).

CHALLENGES

With these new technological opportunities come concomitant needs and challenges for modeling, creating, archiving and accessing data:

I Data Models.
A diverse range of data types is required in language documentation and linguistic fieldwork, including word lists, lexicons, annotated signals, writing system documentation, interlinear texts, paradigms, field notes, and linguistic descriptions. We need flexible and general models for these data types (including links between them), and good ways to represent information which is partial, uncertain, evolving, or disputed. We need to develop a consensus in the community regarding best practice for modeling these kinds of data, to ensure maximal reusability of data and software.

II Data Archives.
Whether it is just the private collection of a single researcher or a large, centralized repository, language data needs to be stored and reused. To support this, we need durable and open storage and interchange formats that embody the best practice consensus. We need to convert (parochial) 8-bit character codings to Unicode, using a general tool for character conversion along with a host of conversion tables for specific character sets. We also need to convert markup into the best practice formats we have defined. We need a mechanism to support durable citation of data, so that document authors do not need to duplicate all the data they reference just to be sure that the links will not break. More generally, we need a metadata standard for indexing the resources, regardless of format and availability, and a wide-coverage index conforming to the standard, so that someone interested in a particular language or region can find all the electronic resources that are pertinent to it, without having to determine how each of several different archives has named and classified its holdings.

III Data Creation.
Now that mass storage is so inexpensive, researchers are creating large amounts of digital data covering the types listed above. Both the number and scale of these collection efforts are growing rapidly.
We need software tools that support data creation and conform with best practice. They should cover primary collection of textual data (wordlists, texts) and recordings (audio, video, physiological), along with transcription and annotation of the primary materials according to a broad range of descriptive and analytical practices.

IV Data Access.
Once data has been created and archived, it can be accessed in a variety of modes. A region of data is identified by browsing, by launching a query, or by following a reference. The selection is displayed according to appropriate conventions and styles, or converted into some other form (e.g. for statistical analysis and visualization). The selection may be corrected, imported into a document, analyzed, and annotated, leading to the creation of secondary data and/or the elicitation of new primary data. We need to develop suitable delivery mechanisms, including stylesheets, conversion tools, indexing methods, and query languages, which address the needs for security and privacy. We need standard application programming interfaces and a library of reusable components, to support the development of software for new modes of access.

Many of the activities listed above are already underway; the lure of the technology is great despite the lack of infrastructure. However, it is beyond the capacity of any single individual or institution to develop this infrastructure of standards and tools on their own. There is a pressing need for close cooperation between these initiatives, so that scarce human, software and data resources are used optimally.

WORKSHOP OBJECTIVES

This workshop will lay the foundation of an open, web-based infrastructure for collecting, storing and disseminating the primary materials which document and describe human languages.
The infrastructure will support the modeling, creation, archiving and access of these materials, using centralized repositories of metadata, data, best practice guidelines, and open software tools. To meet this goal, we have identified three main objectives which can be substantially achieved at the present time:

Objective 1: to develop a comprehensive framework which identifies all the infrastructural needs, designates appropriate roles for existing results as pieces of an overall solution, and sets out a coordinated response to the remaining challenges.

Objective 2: to found centralized repositories (and nominate existing ones) for housing components of the infrastructure, so that data, tools, formats and standards can be collected, indexed, and made available to the community.

Objective 3: to begin construction of the repositories, by identifying the contribution of past and present activities by the participants and by other individuals and institutions, and by gathering the results and their documentation.

CALL FOR PARTICIPATION

The workshop will include paper presentations and working sessions to develop the infrastructure. Interested members of the community are invited to participate in the workshop. Places are limited, and participants will be selected on the basis of submitted abstracts. Funding is available for authors of accepted papers.

Abstracts. One-page abstracts are invited which describe substantive contributions to the repositories, or which discuss concrete problems for web-based language documentation and description and describe possible solutions.

Papers. Authors of accepted abstracts will be asked to prepare a 2,000-3,000 word paper plus associated materials.

Address submissions to: Steven.Bird@ldc.upenn.edu, Gary_Simons@sil.org

Timetable.

  Friday 1 September    Abstract deadline
  Friday 29 September   Acceptance notification
  Friday 24 November    Paper deadline
  12-15 December        Workshop

IMPORTANT: FOR FURTHER INFORMATION

Intending authors should consult the EXTENDED CFP, available from the linguistic exploration page (http://www.ldc.upenn.edu/exploration/ ). To be sure of receiving future announcements, please subscribe to the LINGUISTIC-EXPLORATION mailing list, referenced from that page.

Steven Bird
University of Pennsylvania
Steven.Bird@ldc.upenn.edu
http://www.ldc.upenn.edu/sb

Gary Simons
SIL International
Gary_Simons@sil.org
http://www.sil.org/SIL/roster/simons.htm