Editor for this issue: Andrew Carnie <carnie@linguistlist.org>
Simon Philip Botley, Anthony Mark McEnery and Andrew Wilson (2000) Multilingual Corpora in Teaching and Research. Rodopi: Amsterdam - Atlanta. Binding: Paperback. ISBN: 90-420-0551-3 (bound). Pages: 220. Price - ???

Reviewed by Niladri Sekhar Dash, Indian Statistical Institute, Calcutta, India.

Synopsis

The publication of this volume signals the maturity of this sub-area of linguistics and focuses on the multiple applications of corpora in research and education. It is an important addition to other volumes on corpus linguistics by Aarts and Meijs (1984, 1986), Sinclair (1991), Svartvik (ed.) (1992), Barnbrook (1996), McEnery and Wilson (1996), Kennedy (1998), Biber et al. (1998), Ooi (1998) and many others. Most books on corpus linguistics deal with aspects of corpus design and development, types of corpora, annotation schemes, corpus tools and general applications of corpora, but some are narrowly focused on a particular objective or application. Among these, Boguraev and Pustejovsky (1996) deals with corpus processing for lexical acquisition, Ooi (1998) investigates corpus-based lexicography, and Oakes (1998) explores statistical techniques and corpus applications. This volume is also designed with a specific purpose: to show how multilingual parallel corpora can be used in teaching and research. However, almost equal emphasis is given to the design and application of alignment technology, an essential tool for extracting appropriate information from parallel corpora.

In the introductory chapter (pp. 1-37) Michael Oakes and Tony McEnery of Lancaster University, UK, sum up the core methods used for bilingual text alignment, referring in some detail to the various statistical and linguistic techniques used by different scholars engaged in the task.
The chapter presents a general, up-to-date overview of current work on alignment technology, and provides a thematic classification of the contents of the following chapters.

In Chapter 2 (pp. 38-64) Michel Simard, George Foster, Marie-Louise Hannan, Elliott Macklovitch and Pierre Plamondon of Centre d'Innovation en Technologies de l'Information, Canada, deal with bilingual text alignment as a part of translation analysis (TA), which refers to the reconstruction of correspondences between segments of a source text and segments of its translation. In the introduction they note the recent upsurge of text alignment technology and locate its application areas in both the academic and commercial sectors. After differentiating among the methods used by Brown et al. (1991), Gale and Church (1991, 1993), Dagan et al. (1993), Melamed (1996) and others, they describe JACAL (Just Another Cognate Alignment Program), which they developed to look for similar patterns of characters in order to produce a reliable bi-text map independent of the texts' logical divisions. However, it maps only a fraction of the characters (the first four) of cognate words in a pair of texts. Its robustness as a stand-alone program is evaluated by running it on a collection of bilingual texts, with satisfactory results. The next part describes SALIGN, a program for sentence alignment based on the model of Brown et al. (1993). The search engine (called TransSearch) allows a user to search large corpora of bilingual texts for specific expressions in one or both languages. It offers two advantages: "first, it guarantees that the system's bi-text output contains both the query and its translation; second, it provides a coherent context for the presentation of results, enabling the user to evaluate the relevance of each resulting translation in terms of the problem at hand" (p. 48).
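The idea of pairing cognates by their first four characters, mentioned above for JACAL, can be illustrated with a minimal sketch. This is not the authors' program: the function names, the accent folding and the exhaustive pairing of every candidate are assumptions made only for illustration, and a real bi-text mapper would go on to filter these candidate points.

```python
import re
import unicodedata


def fold(word):
    """Lower-case a word and strip accents (an assumption, not from the review)."""
    decomposed = unicodedata.normalize("NFD", word.lower())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def tokens(text):
    """Return (folded word, character offset) pairs; assumes precomposed (NFC) input."""
    return [(fold(m.group()), m.start()) for m in re.finditer(r"[^\W\d_]+", text)]


def cognate_points(source, target, prefix_len=4):
    """Pair the offsets of source and target words that share their first characters."""
    target_index = {}
    for word, pos in tokens(target):
        if len(word) >= prefix_len:
            target_index.setdefault(word[:prefix_len], []).append(pos)
    points = []
    for word, pos in tokens(source):
        if len(word) >= prefix_len:
            for target_pos in target_index.get(word[:prefix_len], []):
                points.append((pos, target_pos))
    return points


# Hypothetical sentence pair, invented for this example.
english = "The government presented the financial report."
french = "Le gouvernement a présenté le rapport financier."
for point in cognate_points(english, french):
    print(point)
```

Each printed pair is a candidate anchor point (source offset, target offset). Note that 'government' and 'gouvernement' are missed because their first four letters differ, which shows the limit of looking at a fixed prefix.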
SALIGN is tested on its own and with JACAL on a collection of bilingual texts, and the authors find that "SALIGN consistently records much higher recall (a traditional evaluation measure for information retrieval systems) figures than the other programs" (p. 54). Despite this encouraging robustness, the overall result of the program is quite disappointing, pointing to its inability to account for translation omissions, insertions or segmentation errors. The next part describes TMALIGN, a word alignment program modelled on the stochastic translation model of Brown et al. (1993), which calculates bi-textual correspondences at the word level. It is run over a bilingual corpus drawn from the Canadian Hansards. The result is significantly better, probably because the input data is large and clean and the program operates on corpora already aligned at the sentence level. Thus they describe their effort to align linguistic units in bilingual texts at the character, sentence and word levels. Finally, they point to future work on the proper segmentation of input texts into paragraphs, sentences and words; on a number of translation support tools (e.g. translation memories, bilingual concordances, translation checkers and translation dictation machines); and on more elaborate and robust models that could support high-quality machine translation.

In Chapter 3 (pp. 65-69) Pernilla Danielsson and Daniel Ridings of Göteborgs Universitet, Sweden, briefly describe how they run a course that prepares students to select appropriate terminology for translation purposes. The corpus they use includes both translated and non-translated texts in Swedish and English covering the same domains and genres. They align the texts at sentence level using Gale and Church's (1993) alignment algorithm, store them in TEI format, and use the Normalised SGML Library for alignment annotation.
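Sentence alignment of the kind referred to above rests on the observation that a sentence and its translation tend to have similar lengths. The sketch below shows only that idea as a small dynamic program. It is not the Gale and Church (1993) algorithm itself: the cost (absolute difference in character length) and the fixed penalties are simplifying assumptions chosen for illustration, whereas the published method uses a probabilistic distance.

```python
def align_by_length(src, tgt):
    """Align two lists of sentences by minimising differences in character length.

    Bead shapes are (source sentences, target sentences, penalty). The penalty
    values are illustrative assumptions, not taken from any published model.
    """
    beads = [(1, 1, 0), (1, 0, 30), (0, 1, 30), (2, 1, 10), (1, 2, 10)]
    n, m = len(src), len(tgt)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    back = [[None] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0
    for i in range(n + 1):
        for j in range(m + 1):
            if cost[i][j] == inf:
                continue
            for di, dj, penalty in beads:
                ni, nj = i + di, j + dj
                if ni > n or nj > m:
                    continue
                src_len = sum(len(s) for s in src[i:ni])
                tgt_len = sum(len(t) for t in tgt[j:nj])
                candidate = cost[i][j] + abs(src_len - tgt_len) + penalty
                if candidate < cost[ni][nj]:
                    cost[ni][nj] = candidate
                    back[ni][nj] = (i, j)
    # Walk the back-pointers from the end to recover the chosen beads.
    pairs = []
    i, j = n, m
    while (i, j) != (0, 0):
        pi, pj = back[i][j]
        pairs.append((src[pi:i], tgt[pj:j]))
        i, j = pi, pj
    pairs.reverse()
    return pairs


# Hypothetical English-Swedish example, invented for this sketch.
english = ["Good morning.", "I have read the report and I agree with it."]
swedish = ["God morgon.", "Jag har läst rapporten.", "Jag håller med om den."]
for source_group, target_group in align_by_length(english, swedish):
    print(source_group, "<->", target_group)
```

With these sentences the second English sentence is paired with the two Swedish sentences that together translate it, because their combined length is the closest match.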
As their aim is to train the students to find appropriate terms in the source and target texts for translation, they decide that in both texts "words or phrases having a special meaning or frequent use in a specific domain/genre/area" (p. 68) should be searched. However, their experience shows that (i) a term in the source text may not have an appropriate corresponding term in the target text, (ii) translation 'per se' is probably impossible since a word in one culture may not correspond to a single word in another culture, (iii) a translated term may be a hyponym of the term in the source text, and (iv) culturally specific terms in the source language may have no correspondence in the target language. The students are therefore trained to identify the correct domain of use of each term and to locate it in Swedish contexts in parallel with its translation in English contexts. For problematic terms they are advised to browse the record of previous translations.

In Chapter 4 (pp. 73-85) Carol Peters, Eugenio Picchi and Lisa Biagini of Istituto di Linguistica Computazionale, Pisa, Italy, describe how a Bilingual Corpus System is implemented for processing and searching both parallel and comparable text archives for language teaching and learning. Though they use both parallel text corpora (consisting of sets of translationally equivalent texts) and comparable text corpora (comprising sets of texts from pairs of languages suitable for contrastive and comparative study), they feel that "parallel text corpus is most likely to provide useful data for the average second language student, who is mainly looking for information on ways in which a given word or phrase can be translated acceptably in another language" (p. 74). To achieve their goal they use an Italian/English bilingual lexical database together with morphological analysers and generators to process and search parallel corpora.
Their system operates in two distinct stages: in the first stage "a bilingual electronic dictionary and morphological components are used to link pairs of English/Italian texts on the basis of L1/L2 translation equivalents" (p. 76), and in the second stage the "L1/L2 links are used by the bilingual text query system to construct parallel contexts for any form or co-occurrence of forms searched in either of the two sets of texts" (p. 76). Once the parallel bilingual text archives have been processed by the synchronisation procedure, all the links are obtained and stored for the parallel query system. For each source-text word searched, the parallel concordances for the target text are constructed and the associated links, if any, are looked up. With these, the user can search for a single word, search for all the word forms of a given lemma by using the morphological generator, and extract relevant information on the translation of idioms and collocations. In the next part of the chapter they describe a different approach to processing comparable corpora. For this they take Italian and English texts from the same domain or on the same topic, focusing mainly on nouns in the conviction that "in domain-specific corpora it is mainly the nouns that bear the weight of topic-specificity, i.e. technical message, the verbs tend to have a more general meaning" (p. 80). Given a particular term or set of terms found in the source-language texts, the aim is to identify contexts that treat the same topic in the target-language texts. To do this they isolate the vocabulary related to that term in the source corpus, on the assumption that the word will be surrounded by a similar vocabulary in the target language. Next, using their lexical tools (morphological analysers and generators, a bilingual lexical database etc.), they construct equivalent vocabulary for both languages and create sets of translation equivalents.
Finally, the system searches the target corpus in order to identify words and expressions that can be considered in some way lexically equivalent to the selected term in the source language. They are also trying to implement a function that will allow them "to query on the combinations of more than one term (collocates and compounds), essential for studies on terminology as terms generally appear in the form of multiword units" (p. 83). They further intend to refine the search criteria; to make the algorithm more efficient so as to improve performance and increase the precision of retrieval by eliminating noise; and to test alternatives to the MI (Mutual Information) index to see whether the results change substantially.

In Chapter 5 (pp. 86-91) Renée Meyer, Mary Ellen Okurowski and Thérèse Hand of New Mexico State University, USA, describe how authentic corpora and language tools make their adult-centred approach to language training effective. For this purpose they use OLEADA (meaning 'tidal wave' in Spanish), a multilingual software environment developed in-house that integrates on-line multilingual text corpora, information retrieval and language analysis tools. A single user interface allows smooth access to the texts and tools in ten languages. Users can study the language of retrieved texts with OLEADA's language analysis tools, such as on-line dictionaries and references, XConcord, parallel text alignment, a segmentor, frequency counts and user-generated annotations. Further tools such as lexical collocation identification, part-of-speech tagging, parsing and entity identification are expected to be incorporated in the system in future. The system has strong application potential for language trainers, classroom instructors and independent learners as it provides "just-in-time training with texts on demand, tasks that parallel professional needs, and comparative feedback for self evaluation on the part of the learner" (p. 87).
Classroom instructors can use it to respond to learner needs, to answer queries, to retrieve topical information, to find references, and for many other teaching tasks. Trainers can use frequency counts to assess which elements should be emphasised and taught first, and to compare the frequency of words within a document with the frequency of the same words within a corpus or a user-specified sub-corpus. With KWIC they can locate and display terms and phrases with all their contexts of use in the corpora. With XConcord they can present, compare and discuss the contexts of important items, improving students' expectancy skills through the study of the actual content domains of items. The parallel search tools give them access to foreign language texts and the corresponding English translations, together with all the other tools built into OLEADA. Moreover, the annotation facility provides clues, definitions, grammatical explanations and similar information about the text to learners while they work through particular tasks. For independent learners OLEADA serves in the same way as it does for the other two types of users. It also enables them to identify work-related texts, prepare relevant assignments, and discover and study new language phenomena. On-line dictionaries provide them with language-specific linguistic resources; segmentors help them with the automatic segmentation of written texts into paragraphs, sentences or words; and XConcord helps them to examine context-specific samples and grammatical structures. Thus the OLEADA resources support learners through all stages of the learning process, including the final feedback. The OLEADA corpora also contain written speech read aloud as well as transcribed colloquial speech with accompanying audio to enhance listening comprehension. Together these utilities enable learners to study as they work and work as they study.

In Chapter 6 (pp.
92-105) Jennifer Pearson of Dublin City University, Ireland, describes an approach to the teaching of terminology using electronic resources. For terminological research the resources available to students are of three types: a large collection of full texts (articles from newspapers, journals and encyclopaedias), a large collection of abstracts (of articles from science, computing and business) and single texts (in more than one language, with a similar communicative function and dealing with a particular topic: mostly lectures used as an aid to specialised translation). A number of criteria are considered in selecting resources: the author-reader relationship (expert-expert or expert-naive communication), publication (published texts ensure some degree of acceptance of the terminology within a particular field), text origin (product of an individual or of a collaborative venture), constitution of texts (single or composite), factuality (the text must be factual), technicality (the text may be technical or semi-technical), and the intended outcome of the text (informative, as in a newspaper article; didactic, as in the teaching of a subject; or stipulative, as in standard or regulatory texts prescribing and defining the terms used in particular subject domains). When searching for a term, students look for a number of different categories of information. To situate the term in a conceptual hierarchy they look for evidence of genus-species relations (the term is preceded or followed by its superordinate term or general language equivalent), part-whole relations (the term is either a part or the whole of the preceding or following term), quasi-synonymous relations (the term is explained using an equivalent term or phrase) and other similar relations. Moreover, they need to know the characteristics of the term, such as its purpose, origin, function, inputs, outputs and properties.
Once they are equipped with all these clues they are asked to identify the meaning of a term (or two culture-specific terms), to establish equivalence across languages, and to identify appropriate collocations and related terms. Their experience in identifying the meaning of culture-specific terms reveals that terms "which occur quite frequently in financial, political and economic texts are particularly problematic for translators" (p. 99). For them the best solution is to "use the document itself to find clues to the meaning of the term and, if this fails, to consult other documents on the subject" (p. 99). To establish equivalence across languages they propose (as they did for English and French) identifying the meaning of a term in the source language and the meaning of a similar term in the target language with the help of a dictionary. Once potential equivalents are identified, their appropriateness can be confirmed using parallel corpora (bilingual texts dealing with the same subject area and with a similar communicative function). In conclusion they underline the importance, for retrieving terminological definitions from texts, of finding related terms (as they may provide a basis for additional glossary entries) and collocations (as this information is rarely found in dictionaries).

In Chapter 7 (pp. 106-115) Michael Barlow of Rice University, USA, describes how parallel texts can be used in language teaching. In his opinion, students using corpora and a text analysis program can learn a language better than with a dictionary, thesaurus or grammar, because corpora provide the learner with "a rich and adaptable research environment in which the data are selected examples of language use, embedded in their linguistic context" (p. 106).
He cites some case studies on the treatment of reflexive forms and the use of certain lexical items in English corpora to show that corpus-based investigations are better able to reveal the complexities and fine-grained patterns of use of lexical items in a language. He postulates that "it is likely that the bulk of language acquisition is the result of inductive rather than deductive learning mechanisms, a fact which, if true, has far-reaching consequences for the teaching of languages" (p. 109). In the second part of the chapter he describes research based on the analysis of parallel texts, some uses of parallel texts in the language classroom, and ParaConc, a simple parallel text concordance program for searching words and phrases in parallel corpora. Searching through parallel texts he finds that some reflexives in English are not translated with a reflexive in French. Similarly, the collocations and polysemy structure of particular English lexical items contrast strongly with those of their French correlates. He is, however, able to locate the areas of divergence and identify the reasons for such differences. Finally, he argues that students can use parallel corpora in the classroom 'for the feel of a second language'; to obtain concrete knowledge of correspondences; to explore the richness of context of a particular lexical item that is not available in a bilingual dictionary; to gather information on the relative frequency of different constructions and collocations; to understand the distinctions of meaning expressed by particular terms in both source and target language; and to learn how context, in terms of discourse and genre, can provide clues to the appropriate meanings.

In Chapter 8 (pp. 116-133) David Woolls of Birmingham University, UK, describes the development of a user-driven multilingual parallel concordancer as a tool for use in the classroom.
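What a parallel concordancer does can be shown with a minimal sketch. It is not ParaConc or any program described in the volume: the function name, the fixed context width and the three sentence pairs are invented for illustration, and the input is assumed to be already aligned at sentence level.

```python
import re


def parallel_concordance(pairs, query, width=30):
    """Find a word in the source sentences and return each hit with its context
    and the aligned target sentence.

    pairs: list of (source sentence, target sentence) tuples, assumed aligned.
    """
    pattern = re.compile(r"\b" + re.escape(query) + r"\b", re.IGNORECASE)
    hits = []
    for source, target in pairs:
        for match in pattern.finditer(source):
            left = source[max(0, match.start() - width):match.start()]
            right = source[match.end():match.end() + width]
            hits.append((left, match.group(), right, target))
    return hits


# Hypothetical aligned English-French sentences, invented for this example.
pairs = [
    ("She hurt herself.", "Elle s'est blessée."),
    ("He sat down.", "Il s'est assis."),
    ("They enjoyed themselves.", "Ils se sont amusés."),
]
for left, keyword, right, target in parallel_concordance(pairs, "herself"):
    print(f"{left:>30} [{keyword}] {right:<30} | {target}")
```

Each hit is shown in keyword-in-context form with the aligned target sentence beside it, so a learner can check how the searched form was rendered in translation.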
The system, developed as a part of the Lingua project, works with Danish, English, French, German, Greek and Italian. The corpus includes a set of texts covering children's literature, fiction, non-fiction and general scientific writing, and conforms to the guidelines of the TEI (Text Encoding Initiative). It uses Minmark, a highly reduced mark-up scheme modelled on SGML (Standard Generalised Mark-up Language), because corpora with detailed mark-up pose problems in handling the texts and translations. The alignment algorithm of Gale and Church (1993) is simplified to make it user-friendly and to work by reference to paragraph and sentence boundaries. The advantage of the method is that, whatever linear or length relationship exists between the languages, the algorithm can be considered language independent. It is considerably simpler and yet produces results comparable to those of the Gale and Church algorithm. The program affects how users search, sort and test the data. They are able to move between any pair of languages, study contrastive translations, and select languages and files quite easily. When searching for the contexts of particular items, it is assumed that the contexts should appear within up to six words to the left, to the right or on either side of the search item, or anywhere in the same sentence, or anywhere in the same paragraph. The proximity option is as useful as in other concordancers, the sentence option is useful "where examples of linguistic features prone to extremely distribution are sought" (p. 130), and the paragraph option is useful "to identify paragraphs where two characters in a text are interacting" (p. 130). For quick and practical classroom operation the system needs good search speed and reasonable accuracy, which has led to a rethinking of the standard concepts of alignment and encoding.

In Chapter 9 (pp.
134-147) Stig Johansson of the University of Oslo, Norway, and Knut Hofland of the University of Bergen, Norway, describe their current work on contrastive analysis and translation studies with the English-Norwegian parallel corpus, and focus on new directions of research. The corpus contains approximately 2.5 million words, consisting of comparable original texts in each language and their translations into the other language. It is encoded according to the TEI guidelines and aligned at sentence level. At present the corpus is being used to study the "presentative constructions in English and Norwegian, word order in English and Norwegian, expressions of possibility in English and Norwegian, Norwegian discourse particles and their English correspondences" (p. 134). However, the chapter presents only some comparative studies of the occurrence of certain linguistic items in the parallel corpora and their respective translations, and "the expansion of the corpus to other languages for use in multilingual research" (p. 135). Navigating through the parallel corpus they find that the Norwegian modal auxiliary 'skal' is far more widely used than the etymologically related English 'shall', while the Norwegian modal particle 'nok' may correspond to a wide range of forms (adverbs, verbs and clauses) in English. A plausible interpretation of these results is the availability or non-availability of appropriate terms in translation, or some language-specific or culture-specific factors. The second part of the chapter deals with the analysis and contrastive study of multilingual corpora comprising six English fiction texts aligned with their translations into German and Norwegian. Using the Dice similarity measure (McEnery and Oakes 1995) they extract cognates from the English original texts and the translations to show that "while Norwegian shares a lot of vocabulary both with English and German, the latter have far less in common" (p. 141). They also study specific text-based constructions (e.g.
equative structures, cleft constructions, analogous one-clause constructions, non-analogous one-clause constructions, initial 'which' etc.) across the three languages to show that "English reversed pseudo-clefts are almost always conveyed by other types of constructions in German and Norwegian translations" (p. 146). In conclusion they suggest that the emergence of parallel multilingual corpora will give new insight into language research; supply important input for the production of teaching materials and the writing of contrastive grammars and bilingual dictionaries; and provide a bridge between language description and language use.

In Chapter 10 (pp. 148-156) Raphael Salkie of the University of Brighton, UK, describes how small and medium-sized multilingual corpora (SMEMUCs, a new acronym coined by the author) are compiled in the INTERSECT project and used as a source for language research and teaching. For this task the comparable corpora (texts in English, French and German) are put into computer-readable form; decisions are taken on editing the texts for easy handling, storage and retrieval; the texts are aligned at sentence and paragraph level; typographical (spelling) errors are corrected in the aligned texts; and the files are saved in text-only (ASCII) format. The corpora thus prepared are used for contrastive linguistic research (e.g. the use of epistemic modality in French and English, English 'but' vs. French 'mais', or English 'allege' vs. German 'sollen'); for studying different aspects of grammar and vocabulary; for "comparing corpus data with the entries in bilingual dictionaries" (p. 154); and for teaching translation.

In Chapter 11 (pp. 157-176) Josef Schmied and Barbara Fink of the University of Chemnitz, Germany, describe a contrastive lexicological study based on English-German parallel texts and translations.
The chapter highlights the use of English 'with' and its German translation equivalents in a sub-corpus comprising texts from tourist brochures, publications by the European Union, scientific textbooks and literary texts. In the first half they identify the prepositional and prototypical uses of 'with' in the corpus texts, note its semantic diversity, and examine its syntactic categories. They also observe the distribution of 'with' across text types and syntactic functions in the English corpus to show that "whereas adnominal and clausal 'with' are particularly frequent in tourist texts, literature uses more adverbial 'with'" (p. 163). For the English preposition 'with', German has many translation equivalents, among which 'mit' is the most frequent, followed by other prepositions like 'bei', adjectives with an adnominal function like 'beschämt', and zero-translations. In some cases the entire sense element is omitted in German, or other solutions are found to express the sense of 'with'. Most of these changes are due to the content of the translations, the text types and the translator's language choices, besides other language-specific grammatical or syntactic factors. They argue that the prototypical equation 'with' = 'mit' is unsatisfactory in many respects, both quantitatively and qualitatively. It is therefore better to try a functional grammar approach as the primary classification of 'with', because meaning-based categorisation leaves too many intermediate cases. In conclusion they hope that contrastive corpus linguistics can (i) show that the simple word-class based equivalents found in bilingual dictionaries are not sufficient for translation; (ii) expose how different innovations spread across text types until they permeate the entire language structure; or (iii) provide a more detailed empirical description of a language in a typological perspective.

So far the volume concentrates on Western European languages, which are mostly genetically related.
It gives little attention to the possibility of developing parallel corpora or text alignment algorithms for genetically or typologically unrelated languages. This is, however, taken up in the concluding chapter (pp. 177-191), where Tony McEnery, Scott Piao and Xu Xin of Lancaster University, UK, present some work on experimental corpus building in two 'un-related' languages, English and Chinese, with texts from general science, letters, poetry, fiction and social service leaflets. They also try to design an annotation scheme for part-of-speech encoding, and develop an algorithm based upon bi-variate distributions to align the sentences of parallel texts at word level. By modifying existing techniques where necessary and devising some new ones for the problem at hand, they succeed in demonstrating that their new alignment technique, based on the correlation between English and Chinese pairs, is effective and that the results are quite stable across the corpora. However, their attempt calls for more such work on various language pairs so that existing "alignment technology can be tested and refined, enabling a wide range of work between large number of languages" (p. 189).

A critical evaluation

It is a rare experience to come across a volume in which all the chapters are so well written, with ample scope for laymen to get acquainted with this new area of language research and training. Such a difficult area could not have been handled so well had the authors not been well versed in their respective fields. The book shows how corpora (whether monolingual, bilingual, parallel, comparable or aligned) can be used for teaching as well as for new research into and understanding of language.
As a reviewer I was delighted to read this book, and I believe anyone interested in applying corpora in language teaching and research can gather from it many novel and exciting ideas for exploiting corpora: a treasure house of linguistic properties. However, I would like to offer a few observations that may be considered for the next edition of the book. (i) Barring the last chapter, the discussions and experiments in all the chapters are centred on pairs of Western European languages. Because of their genealogical, typological, orthographic and many other similarities, the corpora of these languages are probably easier to align, which may not be so for corpora of languages that differ strongly in their respective features. It would be interesting to see what new approaches would have to be employed to align English-Hindi, English-Japanese, Bangla-Chinese or Arabic-Japanese corpora. (ii) The application of multilingual corpora is not confined to teaching and research, as the title of the book suggests. In fact, almost all writers have identified many more application areas of corpora. The multi-functional utility of corpora is probably best perceived by Svartvik (1986), who envisages that corpora can be used in "lexicography, lexicology, syntax, semantics, word-formation, parsing, question-answer synthesis, software development, spelling checkers, speech synthesis and recognition, text-to-speech conversion, pragmatics, text linguistics, language teaching and learning, stylistics, machine translation, child language, psycholinguistics, sociolinguistics, theoretical linguistics, corpus clones in other languages such as Arabic and Spanish - well, even language and sex". The application scope of corpora is further expanded in the observations of Atkins et al. (1992), Leech and Fligelstone (1992), McEnery and Wilson (1996), Rundell (1996), Barlow (1996), Biber et al. (1998), Kennedy (1998), Teubert (2000) and many other experts in this area.
(iii) There are a few minor errors: on page 62, line 39, 'tem' should be 'term'; on page 179, line 22, 'in noted' should be deleted; and 'Section 3.1.1', mentioned on page 179, line 23, is not found in Chapter 3. (iv) In some cases the full forms of abbreviated terms such as LL (page 14, line 10), LDB (page 76, line 17), OLEADA (page 86, line 3) and INTERSECT (page 149, line 6) are not given in the text. (v) A glossary of the terms (mostly new) used in the book would have been a fine addition to the volume as well as a good help to readers. Nevertheless, the book, with its comprehensive introduction to the subject, is a good reference work for all corpus users as well as for language researchers, instructors and teachers. General readers with a liking for language and linguistics will also find this book interesting. The quality of paper, printing and binding is of international standard.

Bibliography

Aarts, J. and Meijs, W. (eds.) (1984) Corpus Linguistics. Amsterdam: Rodopi.
Aarts, J. and Meijs, W. (eds.) (1986) Corpus Linguistics II. Amsterdam: Rodopi.
Atkins, S., Clear, J. and Ostler, N. (1992) "Corpus Design Criteria", Literary and Linguistic Computing, 7(1): 1-16.
Barlow, M. (1996) "Corpora for Theory and Practice", International Journal of Corpus Linguistics, 1(1): 1-38.
Barnbrook, G. (1996) Language and Computers. Edinburgh: Edinburgh University Press.
Biber, D., Conrad, S. and Reppen, R. (1998) Corpus Linguistics: Investigating Language Structure and Use. Cambridge: Cambridge University Press.
Boguraev, B. and Pustejovsky, J. (1996) Corpus Processing for Lexical Acquisition. Cambridge, Mass.: MIT Press.
Brown, P. F., Lai, J. and Mercer, R. (1991) "Aligning Sentences in Parallel Corpora", in Proceedings of ACL-91, Berkeley.
Brown, P. F., Della Pietra, S. A., Della Pietra, V. J. and Mercer, R. L. (1993) "The Mathematics of Statistical Machine Translation: Parameter Estimation", Computational Linguistics, 19(2): 263-312.
Dagan, I., Church, K. W. and Gale, W. A. (1993) "Robust Bilingual Word Alignment for Machine Aided Translation", in Proceedings of the Workshop on Very Large Corpora: Academic and Industrial Perspectives, Columbus, Ohio.
Gale, W. A. and Church, K. W. (1991) "A Program for Aligning Sentences in Bilingual Corpora", in Proceedings of ACL-91, Berkeley.
Gale, W. A. and Church, K. W. (1993) "A Program for Aligning Sentences in Bilingual Corpora", Computational Linguistics, 19(1): 75-102.
Kennedy, G. (1998) An Introduction to Corpus Linguistics. London: Addison-Wesley Longman.
Leech, G. and Fligelstone, S. (1992) "Computers and Corpus Analysis", in C. S. Butler (ed.) Computers and Written Texts. Oxford: Blackwell Publishers. 115-140.
McEnery, A. M. and Oakes, M. P. (1995) "Sentence and word alignment in the CRATER project: Methods and assessment", in S. Armstrong-Warwick and E. Tzoukermann (eds.) Proceedings of the EACL-SIGDAT Workshop, Dublin, pp. 77-86.
McEnery, T. and Wilson, A. (1996) Corpus Linguistics. Edinburgh: Edinburgh University Press.
Melamed, I. D. (1996) "A Geometric Approach to Mapping Bi-text Correspondence", in Proceedings of the Conference on Empirical Methods in Natural Language Processing, Philadelphia.
Oakes, M. P. (1998) Statistics for Corpus Linguistics. Edinburgh: Edinburgh University Press.
Ooi, V. B. Y. (1998) Computer Corpus Lexicography. Edinburgh: Edinburgh University Press.
Rundell, M. (1996) "The Corpus of the Future and the Future of the Corpus". Talk at a special conference on New Trends in Reference Science at Exeter, UK (handout).
Sinclair, J. (1991) Corpus, Concordance, Collocation. Oxford: Oxford University Press.
Svartvik, J. (1986) "For Nelson Francis", ICAME News, No. 10: 8-9.
Svartvik, J. (ed.) (1992) Directions in Corpus Linguistics: Proceedings of Nobel Symposium 82. Berlin: Mouton de Gruyter.
Teubert, W. (2000) "Corpus Linguistics - A Partisan view", International Journal of Corpus Linguistics, 4(1): 1-16.
A short biography of the reviewer
==================================

Niladri Sekhar Dash received his MA in Linguistics from Calcutta University in 1991. In 1994 he completed ANLP at the Indian Institute of Technology, Kanpur. From 1992 to 1995 he worked as a Language Analyst in the TDIL (Text Development in Indian Languages) project of the Ministry of Information and Technology, Govt. of India. From 1995 to 1997 he worked as a Technical Assistant in Computational Linguistics and NLP at the Computer Vision and Pattern Recognition Unit of the Indian Statistical Institute, Calcutta. Since 1997 he has worked as a Scientific Assistant in the same institute. He has submitted his Ph.D. thesis on corpus design and development for language processing to Calcutta University. His present areas of research are corpus design and development, word processing, parts-of-speech tagging, morphological processing, word sense disambiguation etc. His contact address is: Computer Vision and Pattern Recognition Unit, Indian Statistical Institute, 203 B. T. Road, Calcutta 700035, India. Emails: <niladri@isical.ac.in> (Off.), <niladrisekhar@hotmail.com> (Res.).