LINGUIST List 3.950

Fri 04 Dec 1992

Confs: Message Understanding



Directory

  1. Beth M. Sundheim, 5th Message Understanding Conference--Call for Participation

Message 1: 5th Message Understanding Conference--Call for Participation

Date: Tue, 1 Dec 92 16:09:40 -08
From: Beth M. Sundheim <sundheim@cod.nosc.mil>
Subject: 5th Message Understanding Conference--Call for Participation

-------
 * * * CALL FOR PARTICIPATION * * *
 FIFTH MESSAGE UNDERSTANDING SYSTEM EVALUATION
 AND MESSAGE UNDERSTANDING CONFERENCE (MUC-5)

 1 MARCH - 27 AUGUST, 1993
 Preparation: 1 March - 23 May
              29 May - 25 July
 Evaluations: 24-28 May (dry run)
              26-30 July (formal run)
 Conference:  25-27 August

 Sponsored by:
 Defense Advanced Research Projects Agency
 Software and Intelligent Systems Technology Office
 (DARPA/SISTO)

 The Message Understanding Conferences have provided an ongoing
forum for assessing the state of the art and practice in text analysis
technology and for exchanging information on innovative computational
techniques. They have also encouraged experimentation in the context
of fully implemented systems that perform the realistic task of
extracting factual information from free text. The first two
conferences focused on short naval messages; the two most recent
conferences challenged the systems with longer and stylistically
varied terrorism news stories. The four conferences have seen the
application of a wide variety of approaches to the information
extraction task.
 There is a growing appreciation of the potential utility of the
technologies. At the same time, performance constraints attributed to
inadequate computational methods are becoming serious issues for the
more highly developed systems. The Fifth Message Understanding
Conference (MUC-5) will continue the technology assessment cycle, with
new information extraction tasks in new domains. MUC-5 will also
continue the effort to define an insightful, objective set of
performance evaluation criteria.
 DARPA sponsors the Message Understanding Conferences as part of
the TIPSTER Text program. Participation in MUC-5 is actively sought
from both new and veteran organizations. Veteran evaluation
participants will be able to measure their progress in designing
robust, end-to-end information extraction systems and to continue the
fruitful interchange of ideas about systems and evaluation. New
participants will also contribute to and benefit from such
interactions, while learning to manage the challenges posed by the
evaluation task. In this process, all organizations enjoy some
advantages and suffer from some disadvantages in the evaluation.
These differing circumstances are recognized by the evaluators and
should not deter organizations from participating.
 The conference itself will consist primarily of presentations and
discussions of test results, system design, and innovative techniques.
Attendance at the conference is limited to evaluation participants and
to guests invited by DARPA. A conference proceedings, including all
test results, will be published.
 Modest amounts of financial support will be made available to
selected participants in an effort to maximize the number of
participants and to attract the widest possible variety of technical
approaches and system architectures. This funding is intended only as
a supplement to other support. Both U.S. and non-U.S. participants
are eligible for this funding.

SCHEDULE:
 3 January 1993      Deadline for applications that include funding
                     requests
 15 January 1993     Final application deadline (no funding requests)
 1 February 1993     Notification of acceptance and funding
 1 March 1993        Release of system development corpus and
                     evaluation software
 24-28 May 1993      Performance evaluation (dry run) on test corpus
 26-30 July 1993     Performance evaluation (formal run) on new test
                     corpus
 25-27 August 1993   Fifth Message Understanding Conference

DATA AND TASK DESCRIPTION:
 Subject to successful completion of negotiations to obtain proper
permissions concerning the data, the data and task to be used for
MUC-5 will be the same as those already in use for the data extraction
portion of the DARPA/SISTO TIPSTER Text program. There are two
languages, English and Japanese, and two domains, joint ventures and
microelectronic chip fabrication. These form four separate corpora.
The texts are newswire articles selected to produce the desired mix of
relevant and nonrelevant texts, and they were blindly divided into
pools of development (training) and test data.
 The task is to extract information about the nature and status of
activities in the domain, the entities involved, etc. Analysts have
been doing software-assisted manual generation of the "key" templates
against which the system-generated templates will be evaluated. The
template design is object oriented, and each slot in the template has
its own fill specifications for data type, valency, etc. The fill
specifications in each domain vary slightly between English and
Japanese, reflecting differences in language usage; however, the
general design of the template is the same for both languages.
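
 As a rough illustration of what a per-slot fill specification
could look like, the sketch below models a slot with a data type and a
valency and checks a list of fills against it. Everything in it is
hypothetical: the names SlotSpec and check_slot, the example slot
names, and the reading of "valency" as a minimum and maximum number of
fills are assumptions made for illustration and are not taken from the
TIPSTER template definition.

```python
from dataclasses import dataclass
from typing import Any, List, Optional


@dataclass(frozen=True)
class SlotSpec:
    """Hypothetical fill specification for one template slot."""
    name: str
    fill_type: type                  # assumed: data type each fill must have
    min_fills: int = 0               # assumed: lower bound of the valency
    max_fills: Optional[int] = None  # assumed: upper bound; None = unbounded


def check_slot(spec: SlotSpec, fills: List[Any]) -> List[str]:
    """Return a list of human-readable problems; empty if the fills conform."""
    problems = []
    if len(fills) < spec.min_fills:
        problems.append(
            f"{spec.name}: expected at least {spec.min_fills} fill(s), "
            f"got {len(fills)}"
        )
    if spec.max_fills is not None and len(fills) > spec.max_fills:
        problems.append(
            f"{spec.name}: expected at most {spec.max_fills} fill(s), "
            f"got {len(fills)}"
        )
    for fill in fills:
        if not isinstance(fill, spec.fill_type):
            problems.append(
                f"{spec.name}: fill {fill!r} is not of type "
                f"{spec.fill_type.__name__}"
            )
    return problems


# Illustrative slot names only; they are not the actual MUC-5 slots.
entity_name = SlotSpec("ENTITY_NAME", str, min_fills=1)
status = SlotSpec("STATUS", str, min_fills=0, max_fills=1)
```

 Under these assumptions, check_slot(entity_name, []) reports one
problem (a required fill is absent), while
check_slot(status, ["EXISTING"]) reports none.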
 An English and a Japanese sample text and corresponding template
in the joint ventures domain are available from the program chair
(address at end of this announcement). Please specify which
language(s) you are interested in. A microelectronics example may be
available shortly. The total amount of data that will be available in
March to support system development is expected to be between 200 and
1,000 templates and corresponding texts. This number will vary
according to the corpus and the data rights that are obtained. To
receive the data, participants will be required to acknowledge its
copyright status by signing agreements to safeguard the data and to
use it for research purposes only.

TEST PROTOCOL AND EVALUATION CRITERIA:
 MUC-5 participants may elect to do either language or both
languages; they are limited to selecting just one domain.
Participants will have access to TIPSTER Government-Furnished
Information and shared resources such as the training texts and
templates, task documentation, gazetteers, and evaluation software.
TIPSTER data extraction contractors will be participating in MUC-5,
for which previously unseen test data will be used.
 Each test set will consist of 100-300 texts, depending on
language and domain. A dry-run test will be conducted about three
months after the release of the training data; the formal test will be
conducted about two and one-half months after the dry run. Each test
will be carried out by the participants at their own sites in
accordance with a prepared test procedure and the results submitted to
NRaD for official scoring by domain analysts.
 Systems will be evaluated using the criteria applied to the
TIPSTER Text data extraction systems. These criteria, which are still
under development, are likely to use the scoring categories (correct,
partially correct, incorrect, spurious, missing, and noncommittal) to
support not only the measures used for MUC-4 (recall, precision,
overgeneration, fallout, and F-measure) but also new measures
(probability of detection, probability of false alarm, and a measure
that combines them). MUC-5 participants will be able to familiarize
themselves with the evaluation criteria through usage of the
evaluation software, which will be released along with the training
data.
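
 For readers unfamiliar with the MUC-4 measures named above, the
following sketch shows how recall, precision, overgeneration, and
F-measure can be derived from the scoring-category counts. It assumes
the usual MUC-4 conventions: a partially correct fill earns half
credit, "possible" is the number of fills in the key, and "actual" is
the number of fills the system produced. The class and function names
are illustrative and are not those of the official evaluation
software. Fallout and the new detection/false-alarm measures are
omitted because they need information beyond these counts, and the
noncommittal category does not enter these four measures.

```python
from dataclasses import dataclass


@dataclass
class SlotCounts:
    """Tallies of the scoring categories for one system run (illustrative)."""
    correct: int
    partial: int
    incorrect: int
    spurious: int
    missing: int

    @property
    def possible(self) -> int:
        # Fills present in the key templates.
        return self.correct + self.partial + self.incorrect + self.missing

    @property
    def actual(self) -> int:
        # Fills present in the system-generated templates.
        return self.correct + self.partial + self.incorrect + self.spurious


def _ratio(numerator: float, denominator: float) -> float:
    # Return 0.0 instead of dividing by zero when nothing was scored.
    return numerator / denominator if denominator else 0.0


def recall(c: SlotCounts) -> float:
    return _ratio(c.correct + 0.5 * c.partial, c.possible)


def precision(c: SlotCounts) -> float:
    return _ratio(c.correct + 0.5 * c.partial, c.actual)


def overgeneration(c: SlotCounts) -> float:
    return _ratio(c.spurious, c.actual)


def f_measure(c: SlotCounts, beta: float = 1.0) -> float:
    # beta > 1 weights recall more heavily; beta < 1 favors precision.
    p, r = precision(c), recall(c)
    return _ratio((beta ** 2 + 1) * p * r, beta ** 2 * p + r)


# Made-up counts, purely to exercise the functions.
example = SlotCounts(correct=6, partial=2, incorrect=1, spurious=1, missing=1)
```

 With the made-up counts above, both "possible" and "actual" come
to 10 fills, so recall and precision are each 0.7 and overgeneration
is 0.1.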

INSTRUCTIONS FOR RESPONDING TO THE CALL FOR PARTICIPATION:
 Organizations within and outside the U.S. are invited to respond
to this call for participation. Minimal requirements include
development before the dry-run test of a system that can accept texts
without manual preprocessing, process them without human intervention,
and output templates in the expected format. Organizations should
plan on allocating at least three person-months of effort for
participation in the evaluation and conference; a substantially greater
level of effort is likely to be needed in order to achieve relatively
high performance. It is understood that organizations will vary with
respect to experience with information extraction, domain
expertise/engineering, resources, contractual demands/expectations,
etc. Recognition of such factors will be made in any analyses of the
results.
 Organizations wishing to participate in the evaluation and
conference must respond by submitting a summary of their text analysis
approach and a system architecture description, not to exceed five
pages in total. The summary should include the strengths of
the approach and highlight its innovative aspects. Acceptance or
rejection of each application will be determined on the basis of a
technical assessment by the program committee. The body of the
application will serve as the basis for an article in the conference
proceedings. Participants will have the opportunity to make revisions
prior to publication.
 The application must also include the following information:
 1. Domain (choose only one)
    a. Joint ventures
    b. Microelectronics
 2. Language (choose one or two)
    a. English
    b. Japanese
 3. An estimate of the degree of coverage and/or length of time
    under development of existing software to be applied to the
    MUC-5 task in the selected language(s) and domain.
 4. Primary point of contact for notification of
    acceptance/rejection of application. Please include name,
    surface and email addresses, and phone and fax numbers.
 Those organizations wishing to request funding to supplement
their own resources must provide a second statement, not to exceed two
pages. This statement should include an estimate of the amount of
funding available from other sources to support participation in this
work and a specification of the amount of funding desired and the
minimal acceptable amount. In addition, it should describe any
software to be used for MUC-5 that the organization is willing to
deliver to NRaD and MUC participants for possible redistribution.
Please indicate clearly whether the organization is interested in
participating in MUC-5 even if no funding is available. Evaluators of
funding requests will not include any MUC system developers.
 RESPONSES THAT INCLUDE FUNDING REQUESTS MUST BE SUBMITTED BY
JANUARY 3, 1993. THE DEADLINE FOR OTHER RESPONSES IS JANUARY 15,
1993. All participants are expected to have Internet access and to
be able to do electronic file transfer via anonymous FTP. All
responses should be submitted to the program chair via email to
sundheim@nosc.mil. If Internet access is currently unavailable,
responses may be sent via surface mail to Beth Sundheim, NCCOSC/NRaD,
Code 444, San Diego, CA 92152-5000. If a quick reply to questions is
needed, the program chair may be reached by phone at 619/553-4145.

PROGRAM COMMITTEE:
 Beth Sundheim, NCCOSC/NRaD, program chair
 Sean Boisen, BBN Systems and Technologies
 Lynn Carlson, U.S. Department of Defense
 Nancy Chinchor, Science Applications International
 Jim Cowie, New Mexico State University
 Ralph Grishman, New York University
 Jerry Hobbs, SRI International
 Joe McCarthy, University of Massachusetts, Amherst
 Mary Ellen Okurowski, U.S. Department of Defense
 Boyan Onyshkevych, U.S. Department of Defense
 Lisa Rau, General Electric R&D Center
 Carl Weir, Paramax Systems Corporation

REFERENCE: _Proceedings_of_the_Fourth_Message_Understanding_Conference_(MUC-4)_,
 Morgan Kaufmann, June 1992. To order, call
 (800)745-7323 (toll free in North America) or (415)578-9928
 (direct), send fax to (415)578-0672 or email to
 morgan@unix.sri.com. Please refer to ISBN 1-55860-273-9.