Postal address:
School of English
University of Birmingham
Birmingham B15 2TT
UK
fax: (int +) 44 21 414 3600
e-mail: goutsosd@uk.ac.birmingham

27 November 1992

Dear Colleague,

We have recently become aware of the lack of communication between researchers on Modern Greek and the need for exchange of information, and so we are taking the initiative to distribute this survey of machine-readable corpora of Modern Greek. Its aim is to collect information about the nature and structure of collections of text in machine-readable form and the specifications of hardware and software tools. This information will be available to interested researchers and is intended to provide a basis for discussion and exchange of information on the future of Modern Greek corpora.

By corpus, we mean broadly a text collection, comprising texts to be studied individually, not linked in any coordinated way, collected works of an author, texts selected to study a particular author, textbanks, databases or bibliographies.

If you are not personally involved in the compilation of such a machine-readable corpus, could you pass the survey to others or suggest their names to us?

We would hope to complete the results of the survey by March 1993; depending on the extent of the response we may come back to you for more detail.

We would like to thank you in advance for your help and we'd be happy to hear any suggestions from you.

Dionysis Goutsos
Rania Hatzidaki
Philip King
Modern Greek Corpus Initiative

Survey of machine-readable corpora of Modern Greek

A. CORPUS PROFILE

A1. By what name is the corpus known?
A2. Who compiled the corpus?
A3. Where was it compiled? (Institution)
A4. Contact
    Address
    Telephone
    Fax
    E-mail
A5. When did the compilation start?
A6. What was the incentive for starting the compilation?

B. COMPUTER FACILITIES AND SOFTWARE

B1. How are texts entered?
    (word-processor, text-editor, typesetting tapes, optical scanning, other)
B2. How is the corpus stored and in what format?
B2.1. What computer facilities do you use?
    (IBM Personal Computer or compatible, Apple Macintosh - workstation - mainframe)
B2.2. What software do you use for corpus processing?
    (please specify item and function: word frequency, concordancing of selected items etc.)
B2.3. Do you use ready-made or customized software?
B2.4. If you use your own software, which programming language do you use?
B3. How do you handle the special problem of Greek characters?
    - in input processing
    - in screen output
    - in printing
B4. Do you have software for linguistic annotation (tagging, parsing, lemmatization)? If yes, specify

C. TEXT DETAILS

C1. How was the text acquired?
C2. How is the corpus organized?
C3. Can you give some details of the content?
C3.1. Written texts:
C3.1.1. What genres are included in your collection?
C3.1.2. What are the media of the original texts?
    (printed book, periodical, manuscript, ephemera, other)
C3.1.3. Do you encode typographic and layout information? If so, specify
C3.2. Spoken texts (transcriptions):
C3.2.1. What genres are included in your collection?
C3.2.2. What is the medium of the original source?
    (TV, radio, telephone, direct: talk, conversation, other)
C3.2.3. Is the material spontaneous or not, surreptitious or not?
C3.2.4. Do you encode information about speakers (e.g. age, sex) or about the recording?
C3.2.5. What transcription system do you use?
    (phonetic, phonological, enhanced orthographical, orthographical)
C4. What period do the texts in the corpus represent?
    from _____________ to ____________
C5. What is the total amount of data stored in your collection?
    - in bytes
    - in words
    - in minutes of spoken text recording
C6. What use is made of the corpus?
    (specify, where appropriate)
    - to build up a multifunctional linguistic corpus
    - for lexicographic purposes
    - for literary research
    - for stylistic research
    - for preparation of a scholarly edition
    - for research in linguistics
    - for research in language learning/teaching
    - for commercial applications
    - for natural language processing applications
    - other
C7. Is it available to other interested parties? If so, under what conditions?

D. VIEWS AND PERSPECTIVES:

D1. Do you plan any changes in the composition of your corpus?
D2. Are you planning to develop new text-handling software?
D3. Are there any specialized areas of Modern Greek for which a corpus approach would be particularly useful?
D4.1. What are your views on the development of a general corpus of Modern Greek (such as the Brown Corpus of English or the Birmingham English Corpus)?
D4.2. What would you consider to be the optimal size of it?
D5. Do you prefer a 'clean text' strategy (i.e. plain orthographic files) as opposed to annotated, phonologically coded, parsed etc. text?
D6. Do you think that multilingual corpora or corpora containing 'parallel texts' are needed?
D7. Do you have any other views on the development of Modern Greek corpora and software for processing them?

E. PUBLICATIONS:

Please list any publications that you are aware of that were based on the electronic text you describe