No. 1 (93-05-28)    SEALLIG    28 May 1993

SEALLIG: South East Asian Languages & Linguistics Interest Group

Moderator: Brian Migliazza <brian@ipied.tu.ac.th>
           Linguistics Department
           Thammasat University
           Bangkok, Thailand
Asst. Editors: to be announced

*******************************************

FIRST NOTICE OF EMAIL INTEREST GROUP

Welcome to a new email forum for the South East Asian Languages and Linguistics Interest Group. There has been a lot of response from various people around the world for a language interest group focused on the languages of SEA -- so I will send out this notice and see what y'all think. If you like this idea please let me know. Let me know also your ideas on how to organize this list.

Thailand has been on the email nets now for a while and there are quite a few universities here that are being added to the Internet. What this means for us is that many of the Thai linguists and other academics are now accessible via email. Other countries in SEA are also coming online with Internet, so fruitful academic interaction is now possible between those of you working outside of SEA and the local scholars within SEA. This SEALLIG list is designed to facilitate quick interaction between all of us around the world who are interested in the languages and linguistics of this region.

My idea would be to "LOOSELY" define the SEA region -- both in terms of the geography and in terms of the languages. Thus, I would consider SEA to run from Southern China to Indonesia, and from Eastern India to the Philippines -- including all the languages in between.

I am willing to serve as moderator -- meaning that I would compile and collate all messages sent to me and then send them to you all. As in the LINGUIST file, it would be preferable for people to respond directly to the person making the query.
Then that person should compile the responses and send them to me for distribution to the entire group.

As for topic areas, it would probably be good to organize the comments/messages by major language families, such as TB (Tibeto-Burman), AN (Austronesian), AA (Austroasiatic), MY (Mien-Yao), and TK (Tai-Kadai). If messages overlap these areas, we can put them in a general area. We can also have a "Notices" section for information on programs in these languages, books published, journals available, and upcoming conferences. If you are interested, we could also maintain a directory of scholars and the languages they study.

Send in any feedback that you may have. I am open to suggestions. Hopefully we will soon have assistant editors for this list from the universities here in Thailand (Chulalongkorn, Thammasat, and Mahidol).

Brian Migliazza <brian@ipied.tu.ac.th>

RELEASE NOTE CONCERNING THE ENGLISH CONSTRAINT GRAMMAR PARSER DEVELOPED AT THE UNIVERSITY OF HELSINKI

As of June 1, 1993, the English Constraint Grammar Parser ENGCG, developed at the Department of General Linguistics and the Research Unit for Computational Linguistics, University of Helsinki, is released for non-commercial academic use. ENGCG is released in collaboration with Helsinki University Licensing Ltd.

The various parts of the system were written by the following persons:

* ENGTWOL lexicon (c) Atro Voutilainen, Juha Heikkila
* Grammar for morphological disambiguation (c) Atro Voutilainen
* Grammar for syntactic functions (c) Arto Anttila
* Two-level program (c) Kimmo Koskenniemi and Lingsoft, Inc.
* Constraint Grammar parser, academic version: Bart Jongejan ((c) CRI A/S, Denmark)
* Constraint Grammar parser, production version (c) Pasi Tapanainen

The system is shipped as a fully compiled run-time version for Sun SparcStations (2 or 10). (Depending on customer requirements, it may become available for other machines as well.)

ENGCG is based on the Constraint Grammar framework originally proposed by Fred Karlsson. The theoretical background as well as the English description is documented in a book to appear under the title:

Karlsson, F., Voutilainen, A., Heikkila, J. & Anttila, A. (forthcoming). ``Constraint Grammar: a Language--Independent System for Parsing Unrestricted Text''. To be published by Mouton de Gruyter.

A short description of the main modules of ENGCG:

Preprocessor
* sentence boundary determination
* normalisation of typographical conventions
* detection of fixed expressions, e.g. multiword prepositions and compounds

Morphological description
-- ENGTWOL, a TWOL-style morphological description
* 56,000 entries
* accounts for all inflected and central derived forms

Morphological heuristics
* a heuristic module that assigns ENGTWOL-style descriptions to those words not recognised by ENGTWOL.

English Constraint Grammar

(i) grammar for morphological (e.g. part-of-speech) disambiguation
* 1,100 `grammar-based' constraints
* 99.7--100% of all words retain the appropriate morphological reading
* 3--6% of all words remain (partly) ambiguous
* 200 `heuristic' constraints
* resolves some 50% of remaining ambiguities
* after heuristic disambiguation, 99.5% or more retain the appropriate morphological reading

(ii) grammar for determining syntactic functions
* 250 syntactic constraints for syntactic ambiguity resolution
* some 75--85% of all words become syntactically unambiguous
* some 95.5--98% of all words retain the appropriate syntactic-function tag

Speed of analysis on a Sun SparcStation 2:
-- There are two C implementations of the Constraint Grammar Parser. With the `academic' version, written by Bart Jongejan, CRI A/S, analysis speed is:
* preprocessing, morphological analysis, morphological disambiguation: 35--55 words per second
* preprocessing, morphological analysis, morphological disambiguation, syntactic analysis: 15--25 words per second

This offer concerns non-commercial academic research purposes. ENGCG is distributed on a sublicence basis to academic departments. If your department wants to obtain the right to use ENGCG, please request a copy of the requisite Licence Agreement by sending the name and address of your department and the responsible person to

Atro Voutilainen
Dept. of General Linguistics
P.O. Box 4, University of Helsinki
FIN-00014 University of Helsinki
Finland
e-mail: avoutila@ling.Helsinki.FI
fax: +358 0 191 3598

A Licence Agreement form will be sent to you promptly. When the form has been properly completed and returned, and the fee of 1,500 US dollars paid, the software will be shipped immediately. The package contains the following items:

- ENGCG on a 3.5 inch HD diskette
- book manuscript
- a short User's Manual

Contact Atro Voutilainen (avoutila@ling.helsinki.fi) or Fred Karlsson (fkarlsso@ling.helsinki.fi) for further details.

There is also a production version of the parser, written by Pasi Tapanainen. It is some 20--25 times faster than the academic version. (Those interested in non-academic use of ENGCG should contact Mr. Krister Linden (klinden@ling.helsinki.fi).)

For the time being, texts of up to 300 words can be analysed with ENGCG, free of charge, for testing purposes, by sending the text as an e-mail message to engcg@ling.helsinki.fi. The analysis is sent as return mail.
-- More specific instructions about testing ENGCG can be obtained by sending a mail message to engcg-info@ling.helsinki.fi.
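
The speed and ambiguity figures quoted in the release note can be turned into rough expectations. The Python sketch below is only back-of-the-envelope arithmetic on those numbers; it is not part of ENGCG. It assumes that the 20--25 times speed-up of the production version applies equally to both pipelines, and that the heuristic constraints resolve exactly half of the remaining ambiguity ("some 50%" in the note). All names in it are hypothetical.

```python
# Figures quoted in the release note (academic version, Sun SparcStation 2).
ACADEMIC_WPS = {
    "morphology + disambiguation": (35, 55),
    "morphology + disambiguation + syntax": (15, 25),
}
PRODUCTION_SPEEDUP = (20, 25)      # "some 20--25 times faster"
TEST_TEXT_WORDS = 300              # upper limit for the free e-mail test
AMBIGUOUS_AFTER_GRAMMAR = (3, 6)   # percent of words still (partly) ambiguous
HEURISTIC_RESOLUTION = 0.5         # assumption: "some 50%" taken as exactly half


def seconds_for(words, wps_range):
    """Return (best, worst) analysis time in seconds for a words/second range."""
    slow, fast = wps_range
    return words / fast, words / slow


def production_range(wps_range, speedup=PRODUCTION_SPEEDUP):
    """Scale an academic words/second range by the quoted speed-up range.

    Assumption: the speed-up applies uniformly to both pipelines.
    """
    return wps_range[0] * speedup[0], wps_range[1] * speedup[1]


def residual_ambiguity(percent_range=AMBIGUOUS_AFTER_GRAMMAR,
                       resolved=HEURISTIC_RESOLUTION):
    """Percent of words left ambiguous after the heuristic constraints."""
    return tuple(p * (1 - resolved) for p in percent_range)


if __name__ == "__main__":
    for name, wps in ACADEMIC_WPS.items():
        best, worst = seconds_for(TEST_TEXT_WORDS, wps)
        low, high = production_range(wps)
        print(f"{name}: {best:.1f}-{worst:.1f} s for {TEST_TEXT_WORDS} words "
              f"(academic); production approx. {low}-{high} words/s")
    low, high = residual_ambiguity()
    print(f"ambiguous after heuristics: approx. {low}-{high}% of words")
```

On these assumptions, a full 300-word test text with syntactic analysis would take about 12 to 20 seconds with the academic version, and roughly 1.5 to 3% of words would remain ambiguous after the heuristic constraints.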