Editor for this issue: Karen Milligan <karen@linguistlist.org>
**************************************************************
FINAL CALL FOR PAPERS
DEADLINE MARCH 26, 1999
**************************************************************

ACL-99 Workshop
Unsupervised Learning in Natural Language Processing

University of Maryland, College Park, MD, USA
June 21st, 1999

http://www.ai.sri.com/~kehler/unsup-acl-99.html

Endorsed by the Association for Computational Linguistics (ACL) Special Interest Group on Natural Language Learning (SIGNLL)

WORKSHOP DESCRIPTION

Many of the successes achieved from using learning techniques in natural language processing (NLP) have utilized the supervised paradigm, in which models are trained from data annotated with the target concepts to be learned. For instance, the target concepts in language modeling for speech recognition are words, and thus raw text corpora suffice. The first successful part-of-speech taggers were made possible by the existence of the Brown corpus (Francis, 1964), a million-word data set which was laboriously hand-tagged a quarter of a century prior. Finally, progress in statistical parsing required the development of the Penn Treebank data set (Marcus et al. 1993), the result of many staff years of effort.

While it is worthwhile to utilize annotated data when it is available, the future success of learning for natural language systems cannot depend on a paradigm requiring that large, annotated data sets be created for each new problem or application. The costs of annotation are prohibitively time and expertise intensive, and the resulting corpora are too susceptible to restriction to a particular domain, application, or genre.

Thus, long-term progress in NLP is likely to be dependent on the use of unsupervised and weakly supervised learning techniques, which do not require large annotated data sets. Unsupervised learning utilizes raw, unannotated data to discover underlying structure giving rise to emergent patterns and principles.
Weakly supervised learning uses supervised learning on small, annotated data sets to seed unsupervised learning using much larger, unannotated data sets. Because these techniques are capable of identifying new and unanticipated correlations in data, they have the additional advantage of being able to feed new insights back into more traditional lines of basic research.

Unsupervised and weakly supervised methods have been used successfully in several areas of NLP, including acquiring verb subcategorization frames (Brent, 1993; Manning, 1993), part-of-speech tagging (Brill, 1997), word sense disambiguation (Yarowsky, 1995), and prepositional phrase attachment (Ratnaparkhi, 1998). The goal of this workshop is to discuss, promote, and present new research results (positive and negative) in the use of such methods in NLP. We encourage submissions on work applying learning to any area of language interpretation or production in which the training data does not come fully annotated with the target concepts to be learned, including:

* Fully unsupervised algorithms
* `Weakly supervised' learning, bootstrapping models from small sets of annotated data
* `Indirectly supervised' learning, in which end-to-end task evaluation drives learning in an embedded language interpretation module
* Exploratory data analysis techniques applied to linguistic data
* Unsupervised adaptation of existing models in changing environments
* Quantitative and qualitative comparisons of results obtained with supervised and unsupervised learning approaches

Position papers on the pros and cons of supervised vs. unsupervised learning will also be considered.

FORMAT FOR SUBMISSION

Paper submissions can take the form of extended abstracts or full papers, not to exceed six (6) pages. Authors of extended abstracts should note the short timespan between notification of acceptance and the final paper deadline. Up to two more pages may be allocated for the final paper depending on space constraints.
Authors are requested to submit one electronic version of their papers *or* four hardcopies. Please submit hardcopies only if electronic submission is impossible. Submissions in Postscript or PDF format are strongly preferred. If possible, please conform with the traditional two-column ACL Proceedings format. Style files can be downloaded from ftp://ftp.cs.columbia.edu/acl-l/Styfiles/Proceedings/.

Email submissions should be sent to: kehler@ai.sri.com

Hard copy submissions should be sent to:

Andrew Kehler
SRI International
333 Ravenswood Avenue
EK272
Menlo Park, CA 94025

TIMETABLE

Paper submission deadline: March 26
Notification of acceptance: April 16
Camera ready papers due: April 30

ORGANIZERS

Andrew Kehler (SRI International)
Andreas Stolcke (SRI International)

PROGRAM COMMITTEE

Michael Brent (Johns Hopkins University)
Eric Brill (Johns Hopkins University)
Rebecca Bruce (University of North Carolina at Asheville)
Eugene Charniak (Brown University)
Michael Collins (AT&T Laboratories)
Marie desJardins (SRI International)
Moises Goldszmidt (SRI International)
Andrew Kehler (SRI International)
John Lafferty (Carnegie-Mellon University)
Lillian Lee (Cornell University)
Chris Manning (University of Sydney)
Andrew McCallum (Carnegie-Mellon University and Just Research)
Ray Mooney (University of Texas, Austin)
Srini Narayanan (ICSI, Berkeley)
Fernando Pereira (AT&T Laboratories)
David Powers (Flinders University of South Australia)
Adwait Ratnaparkhi (IBM Research)
Dan Roth (University of Illinois at Urbana-Champaign)
Andreas Stolcke (SRI International)
Janyce Wiebe (New Mexico State University)
Dekai Wu (Hong Kong University of Science and Technology)
David Yarowsky (Johns Hopkins University)

CHRONOS
A Conference on Tense, Aspect, and Mood

The 1999 Thermi International Summer School in Linguistics, sponsored by GLOW, will be hosting a conference on the syntax and semantics of tense, aspect, and mood, to be held on July 16-17, 1999. Thermi is located approximately 10km from Mitilini, on the island of Lesbos, in Greece.

The conference will last for only two days, and each speaker will be allotted approximately one hour so as to allow for a substantive presentation and discussion period; consequently, we anticipate that it will only be possible to accommodate around 15 speakers. All of these slots are open; there will be no invited lectures. It is possible that funding will be available to subsidize speakers' ground expenses in Thermi, though we expect that speakers will have to find alternative funding sources to cover the cost of air travel. (Further information about accommodation and funding will be made available prior to the announcement of the program.)

The selection of papers will be based on reviews of anonymous abstracts, which are hereby solicited. Abstracts should be from one to two pages long (double-spaced, no smaller than 10 pt font), including cited examples. Authors are strongly encouraged to submit their abstracts by e-mail to Sabine Iatridou at the following address: iatridou@MIT.EDU

Abstracts may also be submitted in hard-copy format, by mailing them to:

CHRONOS, c/o Sabine Iatridou
Linguistics and Philosophy
MIT E39-236
Cambridge, MA 02139
USA

The deadline for submission of abstracts is May 3, 1999. We expect to be able to complete the reviews and announce the program by May 17.

In addition to the lectures, the conference will include a business meeting to discuss the possibility of establishing an international scholarly organization devoted to the promotion of scholarship on tense, aspect, mood, and related issues, which would be responsible (among other things) for organizing a conference to be held at regular intervals (either yearly or every other year). In recent years, there have been several conferences dealing with these themes, including those held at Cortona (1993), Tel Aviv (1993), Lake Arrowhead (1998), and Bergamo (1998). In the most recent meetings, a number of participants have expressed a desire to have a conference held at regular intervals, under the auspices of a scholarly society, rather than being sponsored solely by individual universities on an ad hoc basis. We hope that the business meeting at Thermi will set the stage for bringing such plans to fruition.

Following the tradition established at Cortona, it is our hope that a wide range of perspectives will be represented. However, given that the number of papers presented at Thermi will be rather small, and given the relatively short lead time, we recognize that many important research scholars working in these areas may not be able to attend. For this reason, we invite all researchers with an interest in such an organization to contact us in advance, both to indicate their willingness to participate in such an organization (in any capacity), and to provide written suggestions about the form that it should take.

Questions about the Thermi conference, including the abstract guidelines, as well as suggestions about the proposed scholarly organization, should be sent to Tim Stowell at the following e-mail address: stowell@ucla.edu

or at the following mailing address:

CHRONOS, c/o Tim Stowell
UCLA Linguistics Dept.
405 Hilgard Ave.
Los Angeles, CA, 90095-1543

Tim Stowell
Dept. of Linguistics, UCLA, Los Angeles, CA 90095-1543
Telephone: 1-310-825-0634; Fax: 1-310-206-5743
E-mail: stowell@ucla.edu
http://www.humnet.ucla.edu/humnet/linguistics/people/stowell/stowell.htm