Editor for this issue: Jody Huellmantel <jody@linguistlist.org>
CALL FOR PAPERS

Workshop during the LSA Summer Institute
"Perspectives on Clitic and Agreement Affix Combinations"
University of Illinois, Champaign-Urbana
July 28, 1999

We cordially invite abstracts for a one-day workshop on syntactic approaches to clitic and agreement combinations. Please submit FIVE copies of a one-page anonymous abstract (maximum 500 words plus bibliography/figures) in English. Include with your submission the following information on a 4 x 6 index card: name(s), affiliation(s), title of your paper, mailing address, e-mail address, phone and fax numbers, and the name.

E-mail submissions will be accepted as attached Word documents. Include your name, affiliation, and the title of your paper in the body of the e-mail; please do not place the abstract itself in the body of the e-mail. Send e-mail submissions to lheggie@ilstu.edu.

TOPICS TO BE EXPLORED

The aspects of clitic/agreement affix combinations to be explored are the following:

1) Are there universals in the ordering of clitic and agreement affixes in natural languages? Why are some orders more common than others? Do some orders imply other orders?

2) Should clitics/agreement be analyzed entirely within the morphology component (Bonet 1991, Halle and Marantz 1993, Harris 1995), or can syntax still illuminate certain aspects of the restrictions on their combinations (Terzi forthcoming, Ormazabal and Romero 1998, Franks 1998)?

3) Recently, this topic has been a focus of work in Optimality Theory (Gerlach 1997, Grimshaw 1997, Heap 1996). Does Optimality Theory capture all the aspects of clitic/affix combinations we want to account for?

Presentation time for papers will be limited to 20 minutes plus 10 minutes for discussion.

THE DEADLINE FOR RECEIPT OF ABSTRACTS IS MAY 17, 1999. Presenters will be notified by JUNE 1, 1999.
Submissions should be sent to:

Francisco Ordonez
Department of Spanish, Italian and Portuguese
4080 FLB
University of Illinois at Urbana-Champaign
Urbana, IL 61801
e-mail: fordanez@uiuc.edu
Below are 1) a new ACL'99 workshop announcement on Unsupervised Learning in NLP, and 2) a slightly revised announcement for the joint EMNLP and WVLC ACL'99 workshop. The two are separated by a line of asterisks (*).

--------------------------------------------------------------------

ACL-99 Workshop
Unsupervised Learning in Natural Language Processing
University of Maryland, College Park, MD, USA
June 21st, 1999
http://www.ai.sri.com/~kehler/unsup-acl-99.html

Endorsed by the Association for Computational Linguistics (ACL) Special Interest Group on Natural Language Learning (SIGNLL)

WORKSHOP DESCRIPTION

Many of the successes achieved by applying learning techniques to natural language processing (NLP) have relied on the supervised paradigm, in which models are trained from data annotated with the target concepts to be learned. For instance, the target concepts in language modeling for speech recognition are words, and thus raw text corpora suffice. The first successful part-of-speech taggers were made possible by the existence of the Brown corpus (Francis, 1964), a million-word data set that had been laboriously hand-tagged a quarter of a century earlier. Finally, progress in statistical parsing required the development of the Penn Treebank data set (Marcus et al. 1993), the result of many staff-years of effort.

While it is worthwhile to use annotated data when it is available, the future success of learning for natural language systems cannot depend on a paradigm that requires a large annotated data set to be created for each new problem or application. Annotation is prohibitively demanding of time and expertise, and the resulting corpora are too easily tied to a particular domain, application, or genre. Thus, long-term progress in NLP is likely to depend on unsupervised and weakly supervised learning techniques, which do not require large annotated data sets.
Unsupervised learning uses raw, unannotated data to discover the underlying structure that gives rise to emergent patterns and principles. Weakly supervised learning applies supervised learning to small annotated data sets in order to seed unsupervised learning on much larger unannotated data sets. Because these techniques can identify new and unanticipated correlations in data, they have the additional advantage of being able to feed new insights back into more traditional lines of basic research. Unsupervised and weakly supervised methods have been used successfully in several areas of NLP, including acquiring verb subcategorization frames (Brent, 1993; Manning, 1993), part-of-speech tagging (Brill, 1997), word sense disambiguation (Yarowsky, 1995), and prepositional phrase attachment (Ratnaparkhi, 1998).

The goal of this workshop is to discuss, promote, and present new research results (positive and negative) on the use of such methods in NLP. We encourage submissions on work applying learning to any area of language interpretation or production in which the training data does not come fully annotated with the target concepts to be learned, including:

* Fully unsupervised algorithms
* 'Weakly supervised' learning, bootstrapping models from small sets of annotated data
* 'Indirectly supervised' learning, in which end-to-end task evaluation drives learning in an embedded language interpretation module
* Exploratory data analysis techniques applied to linguistic data
* Unsupervised adaptation of existing models in changing environments
* Quantitative and qualitative comparisons of results obtained with supervised and unsupervised learning approaches

Position papers on the pros and cons of supervised vs. unsupervised learning will also be considered.

FORMAT FOR SUBMISSION

Paper submissions can take the form of extended abstracts or full papers, not to exceed six (6) pages.
Authors of extended abstracts should note the short time span between notification of acceptance and the final paper deadline. Up to two additional pages may be allocated for the final paper, depending on space constraints.

Authors are requested to submit either one electronic version of their papers *or* four hard copies. Please submit hard copies only if electronic submission is impossible. Submissions in PostScript or PDF format are strongly preferred. If possible, please conform to the traditional two-column ACL Proceedings format. Style files can be downloaded from ftp://ftp.cs.columbia.edu/acl-l/Styfiles/Proceedings/.

E-mail submissions should be sent to: kehler@ai.sri.com

Hard copy submissions should be sent to:

Andrew Kehler
SRI International
333 Ravenswood Avenue
EK272
Menlo Park, CA 94025

TIMETABLE

Paper submission deadline: March 26
Notification of acceptance: April 16
Camera-ready papers due: April 30

ORGANIZERS

Andrew Kehler (SRI International)
Andreas Stolcke (SRI International)

PROGRAM COMMITTEE

Michael Brent (Johns Hopkins University)
Eric Brill (Johns Hopkins University)
Eugene Charniak (Brown University)
Michael Collins (AT&T Laboratories)
Moises Goldszmidt (SRI International)
Andrew Kehler (SRI International)
Andrew McCallum (Carnegie-Mellon University and Just Research)
Ray Mooney (University of Texas, Austin)
Srini Narayanan (ICSI, Berkeley)
Fernando Pereira (AT&T Laboratories)
David Powers (Flinders University of South Australia)
Adwait Ratnaparkhi (IBM Research)
Dan Roth (University of Illinois at Urbana-Champaign)
Andreas Stolcke (SRI International)
Dekai Wu (Hong Kong University of Science and Technology)
David Yarowsky (Johns Hopkins University)

***************************************************************************

Second Call For Papers (EMNLP/VLC-99)

JOINT SIGDAT CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING AND VERY LARGE CORPORA

Sponsored by SIGDAT
(ACL's Special Interest Group for Linguistic Data and Corpus-based Approaches to NLP)

June 21-22, 1999
University of Maryland
In conjunction with ACL'99: the 37th Annual Meeting of the Association for Computational Linguistics

This SIGDAT-sponsored joint conference will continue to provide a forum for new research in corpus-based and/or empirical methods in NLP. In addition to serving as a general forum, the conference has a theme this year: "Corpus-based and/or Empirical Methods in NLP for Speech, MT, IR, and other Applied Systems".

A large number of systems for automatic speech recognition (ASR) and synthesis, machine translation (MT), information retrieval (IR), optical character recognition (OCR), and handwriting recognition have become commercially available in the last decade. Many of these systems use NLP technologies as an important component. Corpus-based and empirical methods have been a major trend in NLP in recent years. How useful are these techniques when applied to real systems, especially when compared to rule-based methods? What new EMNLP and VLC techniques need to be developed to improve the state of the art in ASR, MT, IR, OCR, and other applied systems? Are there new ways to combine corpus-based and empirical methods with rule-based systems?

This two-day conference aims to bring together academic researchers and industrial practitioners to discuss these issues through technical paper sessions, invited talks, and panel discussions. The goal of the conference is to raise awareness of the new EMNLP techniques that need to be developed in order to bring about the next breakthrough in speech recognition and synthesis, machine translation, information retrieval, and other applied systems.
Scope

The conference solicits paper submissions in (but not limited to) the following areas:

1) Original work in one of the following technologies and its relevance to speech, MT, or IR:
   (a) word sense disambiguation
   (b) word and term segmentation and extraction
   (c) alignment
   (d) bilingual lexicon extraction
   (e) POS tagging
   (f) statistical parsing
   (g) dialog models
   (h) others (please specify)
2) Proposals of new EMNLP technologies for speech, MT, IR, OCR, or other applied systems (please specify).
3) Comparative evaluation of the performance of an EMNLP technology in one of the areas in (1) against that of its rule-based or knowledge-based counterpart in a speech, MT, IR, OCR, or other applied system.

Submission Requirements

Submissions should be limited to original, evaluated work. All papers should include a background survey and/or references to previous work. If a paper includes no evaluation, the authors should explicitly explain why.

We encourage paper submissions related to the conference theme. In particular, we encourage authors to include in their papers proposals for, and discussion of, the relevance of their work to the theme. However, there will also be a special session at the conference for corpus-based and/or empirical work in all areas of natural language processing.

Submission Format

Only hard-copy submissions will be accepted. Reviewing of papers will not be blind. The submission format and word limit are the same as those for ACL this year. We strongly recommend using the ACL-standard LaTeX (plus bibstyle and trivial example) or Word style files to prepare submissions. Six copies of the full-length paper (not to exceed 3200 words exclusive of references) should be received at the following address on or before March 31, 1999.
EMNLP/VLC-99 Program Committee
c/o Pascale Fung
Department of Electrical and Electronic Engineering
University of Science and Technology (HKUST)
Clear Water Bay, Kowloon
Hong Kong

Important Dates

March 31: Submission of full-length paper
April 30: Acceptance notice
May 20: Camera-ready paper due
June 21-22: Conference dates

Program Chair

Pascale Fung
Human Language Technology Center
Department of Electrical and Electronic Engineering
University of Science and Technology (HKUST)
Clear Water Bay, Kowloon
Hong Kong
Tel: (+852) 2358 8537
Fax: (+852) 2358 1485
Email: pascale@ee.ust.hk

Program Co-Chair

Joe Zhou
LEXIS-NEXIS, a Division of Reed Elsevier
9555 Springboro Pike
Dayton, OH 45342
USA
Email: joez@lexis-nexis.com

Program Committee (partial list)

Jiang-Shin Chang (Behavior Design Corp.)
Ken Church (AT&T Labs-Research)
Ido Dagan (Bar-Ilan University)
Marti Hearst (UC-Berkeley)
Huang, Changning (Tsinghua University)
Pierre Isabelle (Xerox Research Europe)
Lillian Lee (Cornell University)
David Lewis (AT&T Research)
Dan Melamed (West Law Research)
Masaaki Nagata (NTT)
Steve Richardson (Microsoft Research)
Richard Sproat (AT&T Labs-Research)
Andreas Stolcke (SRI)
Ralph Weischedel (BBN)
Dekai Wu (Hong Kong University of Science & Technology)
David Yarowsky (Johns Hopkins University)