LINGUIST List 10.354

Fri Mar 5 1999

Calls: LSA Workshop, Natural Language Processing

Editor for this issue: Jody Huellmantel <>

As a matter of policy, LINGUIST discourages the use of abbreviations or acronyms in conference announcements unless they are explained in the text.


  1. Francisco Ordonez, Clitic/agreement affix combinations
  2. Priscilla Rasmussen, ACL'99 new and revised workshop calls

Message 1: Clitic/agreement affix combinations

Date: Fri, 5 Mar 1999 17:09:43 -0600 (CST)
From: Francisco Ordonez <>
Subject: Clitic/agreement affix combinations


Workshop during the LSA Summer Institute

"Perspectives on Clitic and Agreement Affix Combinations"

University of Illinois, Champaign-Urbana
July 28, 1999

We cordially invite abstracts for a one-day workshop on syntactic
approaches to clitic and agreement combinations. Please submit FIVE copies
of a one-page anonymous abstract (maximum 500 words plus
bibliography/figures) in English. Include with your submission the
following information on a 4 x 6 index card: name(s), affiliation(s), title
of your paper, mailing address, e-mail address, and phone and fax
numbers. E-mail submissions will be accepted as attached Word documents.
Include your name, affiliation, and the title of your paper in the body of
the e-mail. Please do not place the abstract in the body of the e-mail.
Send e-mail submissions to

The aspects of clitic/agreement affix combinations to be explored are the
following:

1) Are there universals in the ordering of clitic and agreement affixes in
natural languages? Why are some orders more common than others? Do some
orders imply other orders?

2) Should clitic/agreement combinations be analyzed entirely within the
morphology component (Bonet 1991, Halle and Marantz 1993, Harris 1995),
or can syntax still illuminate certain aspects of the restrictions on
their combinations (Terzi forthcoming, Ormazabal and Romero 1998,
Franks 1998)?

3) Recently, this topic has been the focus of optimality theory (Gerlach
1997, Grimshaw 1997, Heap 1996). Does optimality theory capture all the
aspects of clitic/affix combinations we want to account for?

Presentation time for papers will be limited to 20 minutes
plus 10 minutes for discussion.

Presenters will be notified by JUNE 1, 1999.

Submissions should be sent to:

Francisco Ordonez
Department of Spanish, Italian and Portuguese
4080 FLB
University of Illinois at Urbana-Champaign
Urbana, IL 61801


Message 2: ACL'99 new and revised workshop calls

Date: Fri, 5 Mar 99 15:54:42 EST
From: Priscilla Rasmussen <>
Subject: ACL'99 new and revised workshop calls

Below are 1) a new ACL'99 workshop announcement on Unsupervised Learning 
in NLP, and 2) a slightly revised announcement for the joint EMNLP and
WVLC ACL'99 workshop. These are separated by asterisks (*).
--------------------------------------------------------------------

			ACL-99 Workshop
 Unsupervised Learning in Natural Language Processing

 University of Maryland, College Park, MD, USA
		 June 21st, 1999

 Endorsed by the Association for Computational Linguistics (ACL)
 Special Interest Group on Natural Language Learning (SIGNLL)


Many of the successes achieved from using learning techniques in
natural language processing (NLP) have utilized the supervised
paradigm, in which models are trained from data annotated with the
target concepts to be learned. For instance, the target concepts in
language modeling for speech recognition are words, and thus raw text
corpora suffice. The first successful part-of-speech taggers were
made possible by the existence of the Brown corpus (Francis, 1964), a
million-word data set which was laboriously hand-tagged a quarter of a
century prior. Finally, progress in statistical parsing required the
development of the Penn Treebank data set (Marcus et al. 1993), the
result of many staff years of effort. While it is worthwhile to
utilize annotated data when it is available, the future success of
learning for natural language systems cannot depend on a paradigm
requiring that large, annotated data sets be created for each new
problem or application. The costs of annotation are prohibitively
time- and expertise-intensive, and the resulting corpora are too
susceptible to restriction to a particular domain, application, or genre.
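
As a deliberately tiny illustration of the supervised paradigm described
above, the sketch below trains a most-frequent-tag baseline tagger from
hand-annotated data; the miniature corpus and tag names are invented
stand-ins, not Brown corpus material.

```python
from collections import Counter, defaultdict

# Toy annotated corpus of (word, tag) pairs, standing in for a resource
# like the hand-tagged Brown corpus mentioned above.
tagged_corpus = [
    ("the", "DET"), ("dog", "NOUN"), ("runs", "VERB"),
    ("the", "DET"), ("cat", "NOUN"), ("runs", "VERB"),
    ("a", "DET"), ("dog", "NOUN"), ("barks", "VERB"),
]

def train_baseline(pairs):
    """Learn each word's most frequent tag from annotated data."""
    counts = defaultdict(Counter)
    for word, tag in pairs:
        counts[word][tag] += 1
    return {w: c.most_common(1)[0][0] for w, c in counts.items()}

def tag(model, sentence, default="NOUN"):
    """Tag a sentence, backing off to a default tag for unknown words."""
    return [model.get(w, default) for w in sentence]

model = train_baseline(tagged_corpus)
print(tag(model, ["the", "dog", "barks"]))  # ['DET', 'NOUN', 'VERB']
```

The point of the baseline is exactly the one made in the paragraph:
every target concept the model knows about had to be annotated by hand
first.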

Thus, long-term progress in NLP is likely to be dependent on the use
of unsupervised and weakly supervised learning techniques, which do
not require large annotated data sets. Unsupervised learning utilizes
raw, unannotated data to discover underlying structure giving rise to
emergent patterns and principles. Weakly supervised learning uses
supervised learning on small, annotated data sets to seed unsupervised
learning using much larger, unannotated data sets. Because these
techniques are capable of identifying new and unanticipated
correlations in data, they have the additional advantage of being able
to feed new insights back into more traditional lines of basic research.
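
The weakly supervised bootstrapping idea above can be sketched as a
self-training loop: train on a small labeled seed set, label the
unannotated pool, and fold high-confidence predictions back into the
training data. The suffix feature, labels, and word lists below are
invented toy stand-ins, not a method from any particular paper.

```python
from collections import Counter, defaultdict

def suffix(word, n=2):
    """Crude feature: the word's final characters."""
    return word[-n:]

def train(labeled):
    """Map each suffix to a distribution over labels."""
    dist = defaultdict(Counter)
    for word, label in labeled:
        dist[suffix(word)][label] += 1
    return dist

def predict(model, word):
    """Return (label, confidence); (None, 0.0) for unseen suffixes."""
    counts = model.get(suffix(word))
    if not counts:
        return None, 0.0
    label, n = counts.most_common(1)[0]
    return label, n / sum(counts.values())

def self_train(seed, unlabeled, threshold=0.9, rounds=5):
    """Grow the labeled set from high-confidence labels on raw data."""
    labeled, pool = list(seed), list(unlabeled)
    for _ in range(rounds):
        model = train(labeled)
        scored = [(w,) + predict(model, w) for w in pool]
        confident = [(w, lab) for w, lab, conf in scored
                     if conf >= threshold]
        if not confident:
            break
        labeled.extend(confident)
        accepted = {w for w, _ in confident}
        pool = [w for w in pool if w not in accepted]
    return train(labeled)

# Small annotated seed plus a larger unannotated pool.
seed = [("running", "VERB"), ("jumping", "VERB"),
        ("quickly", "ADV"), ("slowly", "ADV")]
pool = ["walking", "sadly", "talking", "badly"]
model = self_train(seed, pool)
print(predict(model, "sleeping")[0])  # VERB
```

After one round the pool words are absorbed into the training set, so
the final model generalizes to unseen words sharing their suffixes.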

Unsupervised and weakly supervised methods have been used successfully
in several areas of NLP, including acquiring verb subcategorization
frames (Brent, 1993; Manning, 1993), part-of-speech tagging (Brill,
1997), word sense disambiguation (Yarowsky, 1995), and prepositional
phrase attachment (Ratnaparkhi, 1998). The goal of this workshop is
to discuss, promote, and present new research results (positive and
negative) in the use of such methods in NLP. We encourage submissions
on work applying learning to any area of language interpretation or
production in which the training data does not come fully annotated
with the target concepts to be learned, including:

 * Fully unsupervised algorithms
 * `Weakly supervised' learning, bootstrapping models from small sets
 of annotated data 
 * `Indirectly supervised' learning, in which end-to-end task
 evaluation drives learning in an embedded language interpretation
 component
 * Exploratory data analysis techniques applied to linguistic data
 * Unsupervised adaptation of existing models in changing environments
 * Quantitative and qualitative comparisons of results obtained with
 supervised and unsupervised learning approaches 
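
On the fully unsupervised side, one classic way to discover underlying
structure from raw text alone is distributional similarity: words that
occur in similar contexts receive similar co-occurrence vectors. The toy
corpus and window size below are invented for illustration only.

```python
import math
from collections import Counter, defaultdict

# Raw, unannotated text: the only input an unsupervised learner sees.
corpus = ("the cat sat on the mat . the dog sat on the rug . "
          "a cat ate the fish . a dog ate the bone .").split()

def context_vectors(tokens, window=2):
    """Count neighbouring words within a fixed window for each word."""
    vecs = defaultdict(Counter)
    for i, w in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                vecs[w][tokens[j]] += 1
    return vecs

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * \
           math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

vecs = context_vectors(corpus)
# Nouns that share contexts end up close together, with no annotation:
print(cosine(vecs["cat"], vecs["dog"]) > cosine(vecs["cat"], vecs["on"]))  # True
```

The emergent pattern (cat and dog cluster together) was never annotated
anywhere, which is the sense in which such methods can surface new and
unanticipated correlations in the data.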

Position papers on the pros and cons of supervised vs. unsupervised
learning will also be considered.


Paper submissions can take the form of extended abstracts or full
papers, not to exceed six (6) pages. Authors of extended abstracts
should note the short timespan between notification of acceptance and
the final paper deadline. Up to two more pages may be allocated for
the final paper depending on space constraints. 

Authors are requested to submit one electronic version of their papers
*or* four hardcopies. Please submit hardcopies only if electronic
submission is impossible. Submissions in Postscript or PDF format are
strongly preferred.

If possible, please conform with the traditional two-column ACL
Proceedings format. Style files can be downloaded from

Email submissions should be sent to:

Hard copy submissions should be sent to:

 Andrew Kehler
 SRI International
 333 Ravenswood Avenue
 Menlo Park, CA 94025


Important Dates:

Paper submission deadline: March 26
Notification of acceptance: April 16
Camera ready papers due: April 30


Organizers:

Andrew Kehler (SRI International)
Andreas Stolcke (SRI International)


Program Committee:

Michael Brent (Johns Hopkins University)
Eric Brill (Johns Hopkins University)
Eugene Charniak (Brown University)
Michael Collins (AT&T Laboratories)
Moises Goldszmidt (SRI International)
Andrew Kehler (SRI International)
Andrew McCallum (Carnegie-Mellon University and Just Research)
Ray Mooney (University of Texas, Austin)
Srini Narayanan (ICSI, Berkeley)
Fernando Pereira (AT&T Laboratories)
David Powers (Flinders University of South Australia)
Adwait Ratnaparkhi (IBM Research)
Dan Roth (University of Illinois at Urbana-Champaign)
Andreas Stolcke (SRI International)
Dekai Wu (Hong Kong University of Science and Technology)
David Yarowsky (Johns Hopkins University)


			Second Call For Papers

	Joint SIGDAT Conference on Empirical Methods in Natural
	Language Processing and Very Large Corpora (EMNLP/VLC-99)

 Sponsored by SIGDAT (ACL's Special Interest Group for Linguistic
 Data and Corpus-based Approaches to NLP)

 June 21-22, 1999
 University of Maryland

 In conjunction with
 ACL'99: the 37th Annual Meeting of the
 Association for Computational Linguistics

This SIGDAT-sponsored joint conference will continue to provide a forum
for new research in corpus-based and/or empirical methods in NLP. In
addition to providing a general forum, the theme for this year is
"Corpus-based and/or Empirical Methods in NLP for Speech, MT, IR, and
Other Applied Systems."

A large number of systems in automatic speech recognition (ASR) and
synthesis, machine translation (MT), information retrieval (IR), optical
character recognition (OCR), and handwriting recognition have become
commercially available in the last decade. Many of these systems use
NLP technologies as an important component. Corpus-based and
empirical methods in NLP have been a major trend in recent years. How
useful are these techniques when applied to real systems, especially when
compared to rule-based methods? Are there new techniques to be developed
in EMNLP and VLC research in order to improve the state of the art in
ASR, MT, IR, OCR, and other applied systems? Are there new ways to
combine corpus-based and empirical methods with rule-based systems?

This two-day conference aims to bring together academic researchers and
industrial practitioners to discuss the above issues, through technical 
paper sessions, invited talks, and panel discussions. The goal of the 
conference is to raise awareness of what kinds of new EMNLP techniques 
need to be developed in order to bring about the next breakthrough in 
speech recognition and synthesis, machine translation, information
retrieval and other applied systems. 


The conference solicits paper submissions in (and not limited to) the
following areas: 

1) Original work in one of the following technologies and its relevance
to speech, MT, or IR: 
 (a) word sense disambiguation 
 (b) word and term segmentation and extraction 
 (c) alignment 
 (d) bilingual lexicon extraction 
 (e) POS tagging 
 (f) statistical parsing 
 (g) dialog models 
 (h) others (please specify) 

2) Proposals of new EMNLP technologies for speech, MT, IR, OCR, or other
applied systems (please specify). 

3) Comparative evaluation of the performance of EMNLP technologies in 
one of the areas in (1) and that of its rule-based or knowledge-based 
counterpart in a speech, MT, IR, OCR or other applied system. 

Submission Requirements 

Submissions should be limited to original, evaluated work. All papers
should include a background survey and/or references to previous work.
Authors should provide an explicit explanation when there is no
evaluation in their work. We encourage submissions related to the
conference theme; in particular, we encourage authors to include in
their papers proposals and discussions of the relevance of their work to
the theme. However, there will be a special session at the conference
for corpus-based and/or empirical work in all areas of natural language
processing.

Submission Format 

Only hard-copy submissions will be accepted. Reviewing of papers will
not be blind. The submission format and word limit are the same as those 
for ACL this year. We strongly recommend the use of ACL-standard LaTeX 
(plus bibstyle and trivial example) or Word style files for the 
preparation of submissions. Six copies of the full-length paper (not to
exceed 3200 words exclusive of references) should be received at the
following address on or before March 31, 1999. 

EMNLP/VLC-99 Program Committee 
c/o Pascale Fung 
Department of Electrical and Electronic Engineering 
Hong Kong University of Science and Technology (HKUST) 
Clear Water Bay, Kowloon 
Hong Kong 

Important Dates 

March 31 Submission of full-length paper 
April 30 Acceptance notice 
May 20 Camera-ready paper due 
June 21-22 Conference date 

Program Chair

Pascale Fung 
Human Language Technology Center 
Department of Electrical and Electronic Engineering 
Hong Kong University of Science and Technology (HKUST) 
Clear Water Bay, Kowloon 
Hong Kong 
Tel: (+852) 2358 8537 
Fax: (+852) 2358 1485 

Program Co-Chair 
Joe Zhou 
LEXIS-NEXIS, a Division of Reed Elsevier 
9555 Springboro Pike 
Dayton, OH 45342 

Program Committee (partial list) 

Jiang-Shin Chang (Behavior Design Corp.) 
Ken Church (AT&T Labs--Research) 
Ido Dagan (Bar-Ilan University) 
Marti Hearst (UC-Berkeley) 
Huang, Changning (Tsinghua University) 
Pierre Isabelle (Xerox Research Europe) 
Lillian Lee (Cornell University) 
David Lewis (AT&T Research) 
Dan Melamed (West Law Research) 
Masaaki Nagata (NTT) 
Steve Richardson (Microsoft Research) 
Richard Sproat (AT&T Labs--Research) 
Andreas Stolcke (SRI) 
Ralph Weischedel (BBN) 
Dekai Wu (Hong Kong University of Science & Technology) 
David Yarowsky (Johns Hopkins University) 