Editor for this issue: Marie Klopfenstein <marie@linguistlist.org>
!!SUBMISSION DEADLINE EXTENDED TO MARCH 7, 2003!!

HLT-NAACL Text Summarization Workshop and Document Understanding Conference (DUC 2003)
May 31 and June 1, 2003
Edmonton, AB, Canada
http://www.umich.edu/cl/hlt-naacl-duc03/

Given that the ACL'03 deadline is tomorrow and that most other HLT-NAACL'03 workshop deadlines are not until early March, the submission deadline for the HLT-NAACL'03 Text Summarization Workshop has been extended by a week to March 7.

REVISED SCHEDULE
- March 7, 2003 - submissions due
- March 28, 2003 - authors notified
- April 10, 2003 - camera-ready papers due

Please visit the workshop site for submission details and additional information.

- The co-chairs,
Dragomir Radev (radev@umich.edu)
Simone Teufel (Simone.Teufel@cl.cam.ac.uk)

Second Sighan Workshop on Chinese Language Processing
July 11-12, 2003
Sapporo Convention Center
Sapporo, Japan

Second CALL FOR WORKSHOP PAPERS

SIGHAN, a Special Interest Group of the Association for Computational Linguistics, invites the submission of papers for its second workshop, to be held in conjunction with ACL-03 in Sapporo, Japan.

Papers are invited on substantial, original, and unpublished research on all aspects of Chinese language processing, including, but not limited to:
- word segmentation, POS tagging, and parsing
- discourse, dialogue, and natural language interfaces
- lexical semantics, word sense disambiguation, and lexicon acquisition
- generation and summarization
- cross-lingual information retrieval and machine translation

Papers should describe original work; they should emphasize completed work rather than intended work, and should indicate clearly the state of completion of the reported results. Wherever appropriate, concrete evaluation results should be included.

The reviewing of the papers will be blind. Submissions will be judged on correctness, originality, technical strength, significance and relevance to the workshop, and interest to the attendees.

A paper accepted for presentation at this workshop cannot be presented or have been presented at any other meeting with publicly available published proceedings. We allow simultaneous paper submission to the workshop and the ACL main conference. If a paper is accepted by both the conference and the workshop, the paper will be presented at the conference, rather than at the workshop. The author(s) should notify the workshop chairs by May 1 so that proper arrangements can be made.

Submissions should follow the same style as regular ACL papers. For details about formatting, go to http://www.ec-inc.co.jp/ACL2003/ and click on "Call for Papers". Submissions should not exceed 8 pages, including references.
Submissions should be made online at http://www.sighan.org/swclp2/submit. If you have trouble submitting your paper online, please email the pdf and/or postscript version of the paper to qma@crl.go.jp AND feixia@us.ibm.com. Note that the pdf/ps file should NOT include authors' names and affiliations, as the reviewing process will be blind.

IMPORTANT DATES:
- Paper submission deadline: March 10, 2003
- Notification of acceptance: April 20, 2003
- Camera-ready paper deadline: May 25, 2003
- Workshop: July 11-12, 2003

FURTHER INFORMATION:
Please watch the web site http://www.sighan.org/swclp2 for developments. You may also contact Qing Ma (qma@crl.go.jp) or Fei Xia (feixia@us.ibm.com) with questions regarding the workshop.

For people who need visas to come to Japan, please go to ACL-03's website (http://www.ec-inc.co.jp/ACL2003/) and click on "Applying for visas to Japan" on the left side of the page for more information.

PROGRAM COMMITTEE:
Qing Ma - Communications Research Lab, Japan (co-chair)
Fei Xia - IBM, USA (co-chair)
Joyce Chai - Michigan State Univ, USA
Keh-Jiann Chen - Academia Sinica, Taiwan
Zhendong Dong - Hownet designer, China
Tom Emerson - Basis Technology Corp, USA
Changning Huang - Microsoft, China
Chu-ren Huang - Academia Sinica, Taiwan
K.L.Kwok - Queens College, USA
Tom Lai - City Univ. of Hong Kong
Dekang Lin - Univ of Alberta, Canada
Kim-Teng Lua - National University of Singapore
Masaki Murata - Communications Research Laboratory, Japan
Martha Palmer - Univ. of Pennsylvania, USA
Shimei Pan - IBM, USA
Fuji Ren - Tokushima University, Japan
Bangalore Srinivas - ATT, USA
Keh-Yih Su - Behavior Design Corporation, Taiwan
Maosong Sun - Tsinghua University, China
Bing Swen - Peking University, China
Tan Chew Lim - National University of Singapore
Benjamin Tsou - City Univ. of Hong Kong
Amy Weinberg - Univ of Maryland, USA
Andi Wu - Microsoft, USA
Dekai Wu - Hong Kong Science and Technology University
Nianwen Xue - Univ. of Pennsylvania, USA
Jin Yang - Systran, USA
Shiwen Yu - Peking University, China
Qiang Zhou - Tsinghua University, China

--------------------------------------------------------------------------

ANNOUNCING THE FIRST INTERNATIONAL CHINESE WORD SEGMENTATION BAKEOFF

to be held as part of the Second Meeting of SIGHAN (the ACL Special Interest Group on Chinese Language Processing), July 11-12, 2003 (in conjunction with ACL 2003) in Sapporo, Japan.

MOTIVATION

There is a large literature on the topic of segmenting Chinese text into words, and many approaches have been proposed.
However, one problem has been that it is very difficult to compare the results of different approaches, since researchers have not been testing their systems on common test corpora. While it is recognized that there is no single correct segmentation, and different applications may require different segmentations, it is nonetheless desirable to be able to compare different segmentation algorithms on common datasets so that one can understand which algorithms are most promising, independent of a particular application.

We aim to address this issue by inviting researchers who work on Chinese word segmentation to put their systems to the test on a common set of training and test corpora. The results of this competition will be published and it is hoped that the results will provide fodder for future work in this area.

DETAILS

Training and test corpora will come from four sources:
1) The Academia Sinica (Taiwan) treebank (Taiwan Big5 encoding).
2) The Beijing University Institute of Computational Linguistics Corpus (GB encoding).
3) The Penn Chinese treebank (GB encoding).
4) Hong Kong City University corpus (HK Big5 encoding).

Each of these corpora has been hand-segmented according to its own standard. Sizes of training and test corpora are to be determined and will depend upon the amounts available from the four sources.

Participants will be able to elect to be tested on any or all of the corpora, except that participants from the institutions providing the corpora will not be allowed to test on their own corpus.

In addition to electing one or more corpora, participants will also be able to participate in either or both of a Restricted Track or an Unrestricted Track.

For the Restricted Track, the participant will be allowed to use ONLY the materials from the training corpus corresponding to each elected test corpus.
For the Unrestricted Track, the participants may use any resources they choose, including proprietary dictionaries; however, participants will be required, in their summaries (see below), to provide documentation on which of their segmentation decisions were based on material other than the training corpus or what their systems inferred algorithmically from the training corpus.

The training and testing materials will be made available according to a strict schedule as outlined below. Specific instructions on the format of the segmented test data will be provided, and these instructions must be followed exactly.

After the results are reported back to the participants, the participants will be asked to provide a two-page summary of their system for inclusion in the SIGHAN Workshop proceedings.

The bakeoff instructions (in both English and Chinese) can be found at
http://www.sighan.org/bakeoff2003/bakeoff_instr.html

More details of the process will be posted in due course to the web page listed at the end of this message.

IMPORTANT DATES

MARCH 15, 2003: Training materials and complete instructions available at the website (see below), along with information on and references to the various segmentation standards.
APRIL 22, 2003: Testing materials available at the website.
APRIL 25, 2003: Segmented test materials due back to ftp site by 5:00 PM, U.S. Eastern Daylight Time. The format of the returned segmentations must adhere to the guidelines that will be posted March 15.
MAY 10, 2003: Bakeoff results announced privately to participants.
MAY 25, 2003: Two-page system descriptions due.
JULY 11, 2003: Full results published at the SIGHAN Workshop.

FURTHER INFORMATION

Please watch the web site http://www.sighan.org/bakeoff2003 for developments. You may also contact Richard Sproat (rws@research.att.com) with questions regarding the contest.