Editor for this issue: Martin Jacobsen <marty@linguistlist.org>
EUROPEAN LANGUAGE RESOURCES ASSOCIATION

ELRA Focus

=====================================
MLCC Multilingual Corpora for Co-operation

A collection of newspaper articles from financial newspapers in 6 languages (Dutch, English, French, German, Italian and Spanish) and a set of parallel texts in the 9 European Union official languages (as of 1993)
=====================================

The current catalogue of ELRA consists of more than 500 language resources (!) available for speech, written or terminology work. This message is a reminder of the availability of one of them, namely the MLCC Multilingual Corpora for Co-operation.

The MLCC text corpus has two main components: one set to allow comparable studies to be carried out in different languages, and one set as the basis for translation studies.

The first set is referred to as the Polylingual Document Collection (ELRA-W0006), a collection of newspaper articles from financial newspapers in 6 languages (Dutch, English, French, German, Italian and Spanish). It consists of the following sub-corpora:

Dutch - "Het Financieele Dagblad" - 1992-1993
The corpus contains articles from the Dutch financial newspaper "Het Financieele Dagblad", editions of 2nd January 1992 through to 24th December 1993. It contains around 8.5 million words of text.

English - "The Financial Times" - 1993
The corpus contains articles from the British financial newspaper "The Financial Times", editions from the year 1993. The corpus contains around 30 million words.

French - "Le Monde" - 1992-1993
A corpus of articles from the French newspaper "Le Monde", consisting of two years' worth (1992-1993) of articles on financial subjects, approximately 10 million words.

German - "Handelsblatt" - 1986-1988
This subcorpus consists of articles from the period 02.01.1986 to 15.06.1988. It contains some 33 million words. It may be possible to obtain more recent articles from "Handelsblatt".

Italian - "Il Sole 24 Ore" - 1992-1993
The corpus described here contains articles from the Italian financial newspaper "Il Sole 24 Ore" from the year 1992. This corpus contains some 1.88 million words. The SGML markup was done by the University of Edinburgh.

Spanish - "Expansion" - 1994
This subcorpus contains articles from the Spanish financial newspaper "Expansion", editions from 21.10.1991 to 24.10.1991 and 14.05.1994 to 27.12.1994. It contains some 10 million words.

Price for ELRA members:
  for research use: 360 ECU
  for commercial use: 1500 ECU
Price for non-members:
  for research use: 750 ECU
  for commercial use: 3200 ECU

The second set is a Multilingual Parallel Corpus (ELRA-W0007) consisting of translated data in nine European languages: Danish, Dutch, English, French, German, Greek, Italian, Portuguese and Spanish. The parallel data, provided by the European Commission, comprise two sub-corpora from the Official Journal of the European Communities:

Official Journal of the European Commission, C Series: Written Questions 1993
Records of questions and answers regarding European Community matters. The data are regularly published as one section of the C Series of the Official Journal of the European Community in all official languages (previously nine). This corpus contains written questions asked by members of the European Parliament and the corresponding answers from the European Commission in 9 parallel versions. The total size of the corpus is approximately 10.2 million words (ca. 1.1 million words per language).

Official Journal of the European Commission, Annex: Debates of the European Parliament 1992-1994
This parallel corpus consists of the records of Parliamentary sittings published as an annex to the Official Journal of the European Community, "Debates of the European Parliament". The Parliamentary Debates are a record of what was said by members of the meeting as well as written input provided to the meeting.
The original data from which the translations are produced consist of a transcript of the sittings, each member speaking in the language of his choice. The final version consists of nine parallel versions of the material. The texts delivered comprise the Debates of Parliament from January 1992 to July 1994. This sub-corpus contains some 5 to 8 million words per language.

Price for ELRA members:
  for research use: 120 ECU
  for commercial use: 480 ECU
Price for non-members:
  for research use: 200 ECU
  for commercial use: 800 ECU

********************************************
For more information, please contact:

ELRA/ELDA
55-57 rue Brillat Savarin
75013 PARIS
Tel: +33 1 43 13 33 33
Fax: +33 1 43 13 33 30
E-mail: info-elra@calva.net
http://www.icp.grenet.fr/ELRA/home.html
********************************************

We would like to draw your attention to the release of the new version of the Link Grammar Parser, version 3.0.

The Link Grammar Parser is a syntactic parser of English, based on link grammar, an original theory of English syntax. Given a sequence of words, the system assigns to it a syntactic structure, composed of a set of arcs or "links" of different kinds, connecting pairs of words.

The parser has a dictionary of about 60,000 word-forms; it covers a wide variety of syntactic constructions, many idioms, and capitalization and punctuation phenomena. It is able to make guesses about the syntactic categories of unknown words based on context. It is also robust, and can assign structure to sentences even when it cannot parse them completely. The system is written in C, and runs under Unix and Windows.

Since our last version (version 2.0, in Fall 1995), we have made a number of improvements to the parser. Its speed is greatly enhanced; its coverage is significantly improved. We have also incorporated a "panic mode", which allows the parser to recover some structure on long sentences in a short amount of time. We have also developed an API for the system, which allows the parser to be easily integrated into your own applications.

At the Link Parser website (http://www.link.cs.cmu.edu/link/) you can try the parser out for yourself. This website also contains more information and detailed documentation of the parser. You are welcome to download the system from the website and use it for personal or academic purposes. If you intend to use it for commercial purposes, please contact us. Contact information, and information on the Link Group at Carnegie Mellon, can be found on the Link Group home page at http://www.link.cs.cmu.edu/

Davy Temperley <dt3@columbia.edu>
Daniel Sleator <sleator@cs.cmu.edu>
John Lafferty <lafferty@cs.cmu.edu>

The Oxford Text Archive is launching a state-of-the-art web service later in the year, reflecting our new status as a Service Provider for the UK's national Arts and Humanities Data Service.

Before this web site goes live, we need feedback from all types of user. So whether you are new to electronic text or an expert in the field, we invite you to visit our site and use our feedback form to tell us what you think.

As always, the OTA's homepage remains http://ota.ahds.ac.uk/ but throughout this period of testing, users will have the option to visit either our current site or our new experimental service.

NB: in order to fully appreciate this service, we recommend that you use either Netscape Navigator 4 or IE 3 (or better).

Features of the new OTA site include:

- an online catalogue of all our texts, whether online or offline
- a facility to create a corpus of texts
- a download facility for TEI-encoded texts that allows you to choose from a variety of different formats
- online tools to help you prepare your texts in SGML
- a listing of future events, as well as papers from previous workshops and conferences
- a FAQ, based on the OTA's 22 years of operation
- a search tool and site map to help you find your way around the site
- an SGML software repository
- "Guides to Good Practice" on the creation and documentation of electronic texts (in preparation)

-----
Oxford Text Archive
http://ota.ahds.ac.uk
info@ota.ahds.ac.uk
+44-1865-273 238
13 Banbury Road, Oxford, OX2 6NN, UK