LINGUIST List 8.1670

Sat Nov 22 1997

FYI: SEMCOM: Rav-Milim Project, NorFa Summer School

Editor for this issue: Martin Jacobsen <marty@linguistlist.org>


Directory

  • alan harris, SEMCOM: Rav-Milim Project (fwd)
  • Juhani Jarvikivi, NorFa Summer School: Languages, Minds and Brains

    Message 1: SEMCOM: Rav-Milim Project (fwd)

    Date: Fri, 14 Nov 1997 10:04:20 -0800 (PST)
    From: alan harris <vcspc005@email.csun.edu>
    Subject: SEMCOM: Rav-Milim Project (fwd)


    SEMCOM

    Online bulletin of the Commission on Semiotics and Communication, National Communication Association. [If you would like to be included in the SEMCOM list, please reply or send a note to alan.harris@csun.edu with the command "add SEMCOM" in the body.]

    ==============================================================
    Alan C. Harris, Ph. D.
    Professor, Communication/Linguistics
    Speech Communication Department
    California State University, Northridge
    SPCH CSUN
    Northridge, CA 91330-8257
    TELNOS: main off: 818-677-2853
            direct off: 818-677-2874
            home: 818-366-3165
            FAX: 818-677-2663
    INTERNET email: ALAN.HARRIS@CSUN.EDU
    WWW homepage: http://www.csun.edu/~vcspc005
    ===============================================================
    From: Humanist Discussion Group <humanist@kcl.ac.uk>

    "Rav-Milim" (Multi-Words)

    A Computerized Infrastructure for Intelligent Processing of Modern Hebrew

    Principal Investigator: Yaacov Choueka

    (Highlights)

    "Rav-Milim" is a broad, comprehensive, robust and integrated computerized infrastructure for the intelligent processing of modern Hebrew, developed in the years 1989-1996 at the Center for Educational Technology in Tel-Aviv. Large teams of programmers, linguists, computational linguists, lexicographers and editors were involved in this project, which was initiated, directed and supervised by Prof. Y. Choueka from Bar-Ilan University. Yoni Ne'eman was in charge of the linguistic algorithms and was also chief programmer of the project. The names of some of the other major team members are given at the end. A few papers on the system and its various components are now in preparation.

    The basic modules of the system, from which scores of products and applications (both computerized and printed) have been derived, are as follows:

    - "Milim": A complete, accurate, comprehensive and portable morphological analyzer and lemmatizer for modern Hebrew (there are an estimated 70 million word-forms in Hebrew). The program takes as input any word (string of characters) in Hebrew and outputs the set of all its (linguistically correct) grammatical analyses, including: root, dictionary entry, part-of-speech, gender-number for nouns and adjectives, mode-tense-person-gender-number for verbs, attached prepositions, attached pronouns (including person-gender-number of the pronoun), and more. Milim recognizes all common modes of Hebrew spelling (defective - "hasser" and plene - "male") and also some extra-linguistic units such as acronyms (abundant in Hebrew), abbreviations, and frequent proper nouns (of persons, places, products). The program, a library of subroutines in C, occupies a few hundred kilobytes and can analyze about 1,000 words per second on a Pentium PC.

    - "Katvan" (spelling checker): Unlike one for English, an adequate spelling checker for Hebrew cannot consist of long lists of words with some rudimentary suffix stripping, and has to be based on a morphological analyzer. Katvan is an accurate and comprehensive Hebrew spelling-checker based on Milim, which recognizes both the "defective" and "plene" spellings and can correctly convert from one mode to the other (it also suggests corrections to flawed strings). Katvan was chosen by Microsoft and WordPerfect to be the standard spelling-checker for their Hebrew word-processors.

    - "Nakdan" (Vocalizer): A program that, given a word-form and its grammatical analysis, will output its (unique) vocalization (including long and short vowels, stresses, etc.) according to the rules of grammatical Hebrew vocalization. Given any word in Hebrew (without context), the program will activate "Milim" to get all its possible morphological analyses, and will attach to each of them the appropriate vocalization, thus producing as output the set of all (linguistically correct, context-free) possible vocalizations of that word.

    - "Nakdan-Text" (Text Vocalizer): Given a sentence in Hebrew, this program will vocalize it, by first activating "Nakdan" to find all possible morphological analyses and attached vocalizations of every word in the sentence, then choosing, for every such word, the "correct" context-dependent one, using short-context syntactical rules as well as some probabilistic and statistical modules. The program works with 95% accuracy, and is available, e.g., as an off-the-shelf add-on to Microsoft Hebrew Word. After installation, any Word document (or even book) can be vocalized by just marking it and clicking on the pertinent icon; the vocalization is done online, and the document can be printed with the diacritic vocalization points on any (Word-supported) printer. Proofreading and correcting the erroneous vocalizations are very easy and do not require a professional linguist (as is generally the case with manual vocalization). Nakdan-Text is an essential step for Text-to-Speech applications in Hebrew; without such vocalization, computerized "reading" is obviously impossible.

    - "Hamilon" (The Dictionary): A new dictionary of Hebrew, built a priori on modern lexicographical principles and with an architecture that is easy to use and embed in computerized processing contexts. Radically different in philosophy and approach from the available classical dictionaries of Hebrew, the Rav-Milim dictionary is synchronic (rather than historical), descriptive (rather than normative, although bad usage is clearly tagged as such), comprehensive - covering all registers of the language (from the literary to the slang and vulgar) and all strata (from the biblical to the modern) - but not exhaustive (omitting historical curiosities, discarded inventions, etc.), and user-oriented. Following the new sensitivity to meaning-in-context acquired through the extensive processing of large corpora, the full and rich spectrum of the different meanings of an entry is deployed, and usage examples for every (non-encyclopedic) entry, carefully designed to highlight its appropriate sociolinguistic context, are given. For each entry, the family of its related terms (words with the same root and the same semantic field) is detailed. Special attention is given to collocations (a generic term used here loosely for compound nouns, verbal attachments, fixed phrases, idioms, etc., that deserve a special dictionary heading and explanation): every collocation appears under each of its pertinent entries, and some 8,000 new collocations (out of a total of 20,000), never recorded before, are explained. The printed version of the dictionary was published in April 1997 (by C.E.T., Steimatzky and Miskal) as a 6-volume set, and the computerized version appeared at about the same time, as part of "The Hebrew Language CD", described below.

    - "The Hebrew Language CD": All of the grammatical and lexicographic modules described above, and more, are integrated in this CD-ROM, which is in fact a complete "laboratory" of Hebrew processing (on the word level). Keying in any word, the user can spell-check it or ask for its (correct) spelling in the different modes, see its vocalization(s) and its decomposition into meaningful components, look at its complete morphological analysis (or analyses), see the full family (in the sense defined above) of related terms, review all collocations that contain it (there may be hundreds of them) and read the explanation of each one marked, ask for the full conjugation table of the corresponding base-form (in both vocalized and non-vocalized forms and spellings), ask for all entries that have the same vocalization pattern, and, of course, ask to see the full dictionary record of the appropriate entry. It should be noted here that looking for a word in a printed Hebrew dictionary can be a frustrating experience even for experienced users, since one first has to reduce the word, in the form encountered, to its base-form (or its root), a task that is not needed here. The user enters the word in any variant encountered, and the program will automatically display the pertinent entry (or, sometimes, entries). This feature also allows the user to mark any string in an explanation or a usage-example, and the appropriate entry and explanations will be displayed, ad infinitum.

    - "Young Rav-Milim - The Dictionary": A dictionary of modern Hebrew (2 vols., 1,000 pp., same publishers as above) for the young (ages 7-16), with 1,000 color illustrations (the first of its kind ever in Hebrew). All of the dictionary contents (entries and subentries, collocations, explanations, usage examples, etc.) reflect the young reader's world of knowledge and associations. A unique feature of the dictionary is the thousands of annotations scattered through it, giving the reader a wealth of additional interesting information on morphological, grammatical, semantic, historical and cultural aspects of the entry. The page layout is reminiscent of a Talmudic page: a rectangular box of basic text, surrounded by related glossaries, commentaries and notes. The dictionary thus functions as an attractive book to read and browse through, in addition to its basic function as a reference book.

    - "Young Rav-Milim - The Multimedia CD-ROM": A multimedia version of the dictionary that reflects the whole contents of the printed one and, in addition, offers pre-recorded pronunciation of the entries, typical sounds for appropriate entries (animals, musical instruments, special verbs, etc.), linguistic and "dictionary" games, etc.
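
    [Illustrative note: the chain described above (Milim analyses, then Nakdan vocalizations, then Nakdan-Text choosing one reading per word from short context plus statistics) can be sketched roughly as follows. This is a toy sketch, not the Rav-Milim code, which is a C library whose interfaces are not given here. Every name below, the transliterated three-entry lexicon, the prior weights and the two part-of-speech rules are assumptions made up for illustration.]

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Analysis:
    """One hypothetical morphological reading of an unvocalized word-form."""
    lemma: str        # dictionary entry
    pos: str          # part of speech
    vocalized: str    # vocalization attached to this reading
    prior: float      # made-up context-free likelihood of this reading


# Toy lexicon keyed by transliterated consonantal spelling (assumed data).
# "spr" stands for a Hebrew form that can be read as sefer 'book',
# sappar 'barber' or safar 'he counted'.
TOY_LEXICON = {
    "spr": [
        Analysis("sefer", "noun", "sefer", 0.6),
        Analysis("sappar", "noun", "sappar", 0.1),
        Analysis("safar", "verb", "safar", 0.3),
    ],
    "hwa": [Analysis("hu", "pronoun", "hu", 1.0)],
    "twb": [Analysis("tov", "adjective", "tov", 1.0)],
}

# Assumed short-context rules: part-of-speech pairs that are preferred.
PREFERRED_BIGRAMS = {("pronoun", "verb"), ("noun", "adjective")}


def analyze(word):
    """Milim-like step: return every known analysis of a word-form."""
    return list(TOY_LEXICON.get(word, []))


def is_spelled_correctly(word):
    """Katvan-like step: a form is accepted if it has at least one analysis."""
    return bool(analyze(word))


def vocalize_word(word):
    """Nakdan-like step: all context-free vocalizations of a single word."""
    return [a.vocalized for a in analyze(word)]


def vocalize_text(words):
    """Nakdan-Text-like step: pick one vocalization per word, left to right."""
    result = []
    prev_pos = None
    for word in words:
        candidates = analyze(word)
        if not candidates:
            # Unknown form: leave it unvocalized and reset the context.
            result.append(word)
            prev_pos = None
            continue
        # Apply the context rule first, then fall back to the prior alone.
        preferred = [a for a in candidates
                     if (prev_pos, a.pos) in PREFERRED_BIGRAMS]
        best = max(preferred or candidates, key=lambda a: a.prior)
        result.append(best.vocalized)
        prev_pos = best.pos
    return result


if __name__ == "__main__":
    print(vocalize_word("spr"))
    print(vocalize_text(["hwa", "spr"]))
    print(vocalize_text(["spr", "twb"]))
```

    [In this sketch the same form "spr" is vocalized as a verb after a pronoun and as the most likely noun at the start of a sentence. Real disambiguation would need far richer rules and statistics than a two-entry rule table.]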

    Rav-Milim Team (major participants):
    ------------------------------------

    Yaacov Choueka, PI and Director
    Yoni Ne'eman, Chief programmer and in charge of linguistic algorithms

    Programmers: Avi Danon, Yosi Sarousi

    Linguistics: Rahel Finkel, Hagit Avioz

    The Dictionary:

    Steering Committee: Prof. Yaacov Choueka, Prof. M.Z. Kaddari (Vice-President, Academy of Hebrew Language), Prof. R. Nir (Hebrew University), Prof. R. Mirkin (Academy of Hebrew Language), Prof. O. Schwarzwald (Bar-Ilan University), M. Zinger.

    Editor-in-Chief: Uzzi Freidkin
    Senior Editors: Dr. Haym Cohen, Yael Zachi-Yannai
    Science and Technology Editor: Yakhin Unna
    Assistant Editors: Rahel Finkel, Hagit Avioz, Sara Choueka

    Dictionary for the Young:

    Steering Committee:

    Prof. R. Berman (Tel-Aviv University), Dr. Zvia Walden (Berl College), Prof. R. Nir, Dr. Dorit Ravid, Prof. Maya Fruchtman, Prof. O. Schwarzwald

    Editor: Yael Zachi-Yannai
    Assistant Editors: Hagit Avioz, Sara Choueka
    Consultants: Uzzi Freidkin (lexicography), Dr. Haym Cohen (linguistics), Dr. Zvia Walden (educational approach and design).

    Multimedia version:

    Design and supervision: Ofra Razel

    ------------------------------------------------------------------
    Humanist Discussion Group
    Information at <http://www.kcl.ac.uk/humanities/cch/humanist/>
    <http://www.princeton.edu/~mccarty/humanist/>

    Message 2: NorFa Summer School: Languages, Minds and Brains

    Date: Thu, 20 Nov 1997 15:30:52
    From: Juhani Jarvikivi <Juhani.Jarvikivi@joensuu.fi>
    Subject: NorFa Summer School: Languages, Minds and Brains


    First Circular November 1997

    The Department of Linguistics of the University of Joensuu and the Nordic Neurolinguistic Network are pleased to announce that a Nordic Research Course, sponsored by the Nordic Academy for Advanced Study (NorFA), called

    Languages, Minds, and Brains

    will be held at the Mekrijarvi Research Station, University of Joensuu, Ilomantsi, Finland, June 22-29, 1998.

    The Course will consist of the following three components. The components are planned as joint sessions involving all the participants of the Research Course, students as well as teachers. This policy is intended to maximize the multidisciplinary flow of ideas between the participants.

    (a) Four-hour survey lectures by internationally well-known experts

    Dr. Harald Baayen (Max Planck Institute for Psycholinguistics, Nijmegen): Morphological and Lexical Processes and Representations
    Prof. Kenneth Hugdahl (Biological and Medical Psychology, Bergen): Neuroimaging and the Brain
    Prof. Lise Menn (Linguistics, Boulder): Methodological Issues in the Case Study Approach
    Prof. Michel Paradis (Linguistics, McGill): Grammar, Pragmatics, and the Brain

    (b) Seminars with 30-minute individual presentations by the students and 30-minute post-paper discussions. The seminars will be attended by all the teachers.

    (c) Discussion sessions towards the end of a topic area, highlighting the methodological and theoretical issues shared by the papers presented.

    Criteria for student selection, in addition to those defined by NorFA (with regard to country of origin, etc.):

    (a) The participants should have a strong background in one or several of the following disciplines or related areas: linguistics, psychology, neurology, cognitive science, phonetics, logopaedics and special education.

    (b) The topic of the Course (language, mind, and brain) should occupy a significant position in the PhD or post-doctorate studies or study plans of the participants.

    The number of student participants will be restricted to 25.

    Pre-course Requirements in Addition to the General NorFA Requirements:

    (a) The applicants should send, together with their application, a 3-5 page abstract of their work (planned or ongoing) in the topic area(s) of the Course. The texts of the accepted students will be mailed well in advance to the teachers as well as to the other student participants.

    (b) It is expected that the invited teachers or the organizers will require a set of pre-course readings. A list of the required pre-reading material will be sent to the participants well in advance.

    NorFA will pay for the tuition as well as for board and lodging during the course and for travel as follows. For students originating from Denmark, Iceland and Norway, NorFA will cover the (APEX-type) return flight tickets from the port of exit (e.g. Copenhagen) to Helsinki. The Swedish students will receive the boat fares between Stockholm and Helsinki/Turku from the organizers. Within Finland, only general-public surface travel (i.e., train, bus) tickets will be paid for by the organizers.

    Accommodation at the Research Station will be in double rooms.

    Our web site (Linguistics under http://www.joensuu.fi/fld) will contain, among other things, the program. Please visit us there for more information or contact the responsible organizer directly.

    Application procedure: Send a free-form application to Jussi Niemi (below) by March 1, 1998. Please enclose a brief CV and a 3 to 5 page summary of your research interests.

    Those accepted will be notified by April 1.

    Responsible Organizer: Jussi Niemi, Associate Prof., Linguistics, University of Joensuu, FIN-80101 Joensuu, Finland, jussi.niemi@joensuu.fi, fax +358-13-251 4211, phone +358-13-251 4306