Editor for this issue: Jody Huellmantel <jody@linguistlist.org>
FINAL CALL FOR PAPERS

ACL Workshop: COMPARING CORPORA
October 2000
Hong Kong University of Science and Technology

THEME
=====

Anyone who has worked with corpora will be all too aware of differences between them. Depending on the differences, it may, or may not, be reasonable to expect results based on one corpus to also be valid for another. It may, or may not, be appropriate for a grammar, or parser, based on one to perform well on another. It may, or may not, be straightforward to port an application from a domain of the first text type to a domain of the second.

Currently, characterisations of corpora are mostly textual and at different levels of generality. A corpus is described as ``Wall Street Journal'' or ``transcripts of business meetings'' or ``foreign learners' essays (intermediate grade)''. It would be desirable to be able to place a new corpus in relation to existing ones, and to be able to quantify similarities and differences.

Allied to corpus-similarity is corpus-homogeneity. An understanding of homogeneity is a prerequisite to a measure of similarity -- it makes little sense to compare a corpus sampled across many genres, like the Brown, with a corpus of weather forecasts, without first accounting for the one being broad, the other narrow.

Given the centrality of corpora to contemporary language engineering, it is remarkable how little research there has been to date on the question. Biber's work, coming from sociolinguistics, has made a considerable impact, with various researchers in computational linguistics taking forward the model (Biber 1989, 1995). Studies in text classification, genre and sublanguage are also salient, but it is rarely evident how well the technologies developed in these fields are suited to measuring corpus similarity or homogeneity.

The workshop will welcome contributions concerned with measuring and comparing corpora using quantitative methods, from any field.

Where and when
==============

The workshop will last half a day and will be on either 7th or 8th Oct, the main ACL conference being 3rd-6th October. The venue will be the same.

Submissions:
============

Submissions are limited to original, unpublished work. Papers may not exceed 3200 words (exclusive of title page and references). They must be received by July 8, 2000, in hard copy (4 copies) OR postscript OR rtf format. Electronic delivery is to compcorp@itri.brighton.ac.uk and hard copies are to be mailed to

    Compcorp submission
    ITRI
    University of Brighton
    Lewes Road
    Brighton BN2 4GJ
    United Kingdom

Important Dates:

    July 8, 2000        Submission (of full-length paper)
    August 17, 2000     Acceptance notice
    September 1, 2000   Camera-ready paper received
    October 7 or 8      Workshop date

Co-ordinators
=============

Adam Kilgarriff - University of Brighton, UK
Tony Berber Sardinha - Catholic University of Sao Paulo, Brazil

Programme committee
===================

Douglas Biber          Northern Arizona University
Jeremy Clear           University of Birmingham
Ted Dunning            MusicMatch Software, Inc.
Tomaz Erjavec          Jozef Stefan Institute, Slovenia
Pascale Fung           University of Science and Technology, Hong Kong
Greg Grefenstette      Xerox Research Centre Europe
Benoit Habert          LIMSI, France
Przemek Kaszubski      Adam Mickiewicz University, Poland
Adam Kilgarriff        University of Brighton
David Lee              University of Lancaster
Oliver Mason           University of Birmingham
Doug Oard              University of Maryland
Tony Rose              Canon Research
Tony Berber Sardinha   Catholic University of Sao Paulo, Brazil
George Tambouratzis    ILSP, Athens
Christopher Tribble    King's College, London University

Website
=======

http://www.itri.bton.ac.uk/events/compcorp

------------------------------------------------------------------
SECOND CALL FOR PAPERS

ACL'2000 Workshop on
Recent Advances in Natural Language Processing and Information Retrieval

October 7/8, 2000
Hong Kong University of Science and Technology
------------------------------------------------------------------

Aims and scope
--------------

This workshop aims at fostering the interaction between researchers in the areas of Natural Language Processing (NLP) and Information Retrieval (IR), and furthermore, promoting discussions on the current and potential benefits of common approaches to related research challenges. The central topic is the application of Language Technologies to Information Retrieval, including (but not limited to):

* the role of lexical-syntactic information in mono- and multilingual IR, including morphology, phrase detection and treatment, word sense disambiguation adapted to IR needs, acquisition and use of lexical resources, etc.

* empirical evidence regarding the use of NL techniques in different retrieval scenarios, typification of such scenarios, and the discussion of evaluation measures beyond precision/recall variants.

* interaction between NLP and IR techniques in topics related to both areas such as Cross-Language and Interactive Text Retrieval, Question Answering, Information Extraction, Text Summarization, Text Data Mining, etc.

The growing research and application possibilities provided by the increased amount of networked information have motivated new attempts to explore the relationship between NLP and IR. For researchers in IR, a compelling challenge is to move from (monolingual) document retrieval within controlled text collections, to actually retrieving information, rather than individual documents, from multilingual, heterogeneous and dynamic webs of interlinked documents and online services.
The reciprocal challenge for NLP research is to scale up, adapt and possibly reshape techniques and resources to help bridge the gap between document and information retrieval in practical applications.

Papers describing pragmatic, empirically tested approaches facing these issues are especially welcome.

Instructions for submissions
----------------------------

The format of submissions is identical to the one used for the main conference, which can be found at http://www.cs.ust.hk/acl2000/fcfp.html. Authors should fill in the "paper ID" field with: "IR&NLP workshop". The "Topic Area" and "session" fields should be left blank.

Papers must be submitted electronically, in postscript or pdf format, to both Program Chairs:

    Judith Klavans, Columbia University
    klavans@cs.columbia.edu

    Julio Gonzalo, UNED
    julio@ieec.uned.es

No hardcopy submission is required.

Program Committee
-----------------

Judith Klavans, Columbia University (Co-Chair)
Julio Gonzalo, UNED (Co-Chair)
Jamie Callan (CMU)
Bruce Croft (CIIR)
Eric Gaussier (Rank Xerox Grenoble)
Eduard Hovy (ISI/USC)
Christian Jacquemin (LIMSI)
Noriko Kando (NII Tokio)
Bob Krovetz (NEC Princeton)
Mun-Kew Leong (Kent Ridge Digital Labs)
Carol Peters (IEI-CNR)
Mark Sanderson (Univ. of Sheffield)
Tomek Strzalkowski (GE)
Evelyne Tzoukermann (Lucent Technologies)
Felisa Verdejo (UNED)
Nina Wacholder (Columbia University)

Important dates
---------------

    Deadline for submissions:     July 15, 2000
    Notification of acceptance:   August 7, 2000
    Camera-ready version:         September 1, 2000
    Workshop:                     October 7 or 8, 2000

Further Information
-------------------

Updated information about the workshop can be found at http://sensei.ieec.uned.es/IRNLP-2000