LINGUIST List 8.1670

Sat Nov 22 1997

FYI: SEMCOM: Rav-Milim Project, NorFa Summer School

Editor for this issue: Martin Jacobsen <marty@linguistlist.org>


Directory

  1. alan harris, SEMCOM: Rav-Milim Project (fwd)
  2. Juhani Jarvikivi, NorFa Summer School: Languages, Minds and Brains

Message 1: SEMCOM: Rav-Milim Project (fwd)

Date: Fri, 14 Nov 1997 10:04:20 -0800 (PST)
From: alan harris <vcspc005@email.csun.edu>
Subject: SEMCOM: Rav-Milim Project (fwd)


 SEMCOM 

Online bulletin of the Commission on Semiotics and Communication,
National Communication Association. If you would like to be included
in the SEMCOM list, please reply or send a note to
alan.harris@csun.edu with the command "add SEMCOM" in the body.

==============================================================
Alan C. Harris, Ph. D.                   TELNOS: main off: 818-677-2853
Professor, Communication/Linguistics     direct off: 818-677-2874
Speech Communication Department
California State University, Northridge  home: 818-366-3165
SPCH CSUN                                FAX: 818-677-2663
Northridge, CA 91330-8257                INTERNET email: ALAN.HARRIS@CSUN.EDU
                                         WWW homepage: http://www.csun.edu/~vcspc005
===============================================================
From: Humanist Discussion Group <humanist@kcl.ac.uk>

		"Rav-Milim" (Multi-Words)

	 A Computerized Infrastructure for 
	Intelligent Processing of Modern Hebrew

	Principal Investigator: Yaacov Choueka

		 (Highlights)

"Rav-Milim" is a broad, comprehensive, robust and integrated
computerized infrastructure for the intelligent processing of modern
Hebrew, developed in the years 1989-1996 at the Center for Educational
Technology in Tel-Aviv. Large teams of programmers, linguists,
computational linguists, lexicographers and editors were involved in
this project, which was initiated, directed and supervised by
Prof. Y. Choueka from Bar-Ilan University. Yoni Ne'eman was in charge
of the linguistic algorithms as well as chief programmer of the
project. The names of some of the other major team members are given
at the end. A few papers on the system and its various components are
now under preparation.

The basic modules of the system, from which scores of products and
applications (both computerized and printed) have been derived, are as
follows:

 - "Milim": A complete, accurate, comprehensive and portable
 morphological analyzer and lemmatizer for modern Hebrew (there are an
 estimated 70 million word-forms in Hebrew). The program takes as 
 input any word (string of characters) in Hebrew and outputs the 
 set of all its (linguistically correct) grammatical analyses, 
 including: root, dictionary entry, part-of-speech, gender-number for
 nouns and adjectives, mode-tense-person-gender-number for verbs,
 attached prepositions, attached pronouns (including
 person-gender-number of the pronoun), and more. 
 Milim recognizes all common modes of Hebrew spelling 
 (defective - "hasser" and plene - "male") and also 
 some extra-linguistic units such as acronyms (abundant in Hebrew), 
 abbreviations, and frequent proper nouns (of persons, places, products).
 The program, a library of subroutines in C, occupies a few hundred kilobytes,
 and can analyze about 1,000 words per second on a Pentium PC. 

 - "Katvan" (spelling checker): Unlike English, an adequate
 spelling checker for Hebrew cannot consist of long lists of
 words with some rudimentary suffix stripping, and has to be based on a
 morphological analyzer. Katvan is an accurate and
 comprehensive Hebrew spelling-checker based on Milim, that recognizes 
 both the "defective" and "plene" spellings, and can correctly convert
 from one mode to the other (it also suggests corrections to flawed
 strings). Katvan was chosen by Microsoft and Word Perfect to be 
 the standard spelling-checker for their Hebrew word-processors. 

 - "Nakdan" (Vocalizer): A program that, given a word-form and its 
 grammatical analysis, will output its (unique) vocalization 
 (including long and short vowels, stresses, etc.) according to
 the rules of grammatical Hebrew vocalization. Given any word in Hebrew 
 (without context), the program will activate "Milim" to get all its 
 possible morphological analyses, and will attach to each of
 them the appropriate vocalization, thus producing as output the set of
 all (linguistically correct, context-free) possible vocalizations of 
 that word.


 - "Nakdan-Text" (Text Vocalizer): Given a sentence in Hebrew, 
 this program will vocalize it, by first activating "Nakdan" to find 
 all possible morphological analyses and attached vocalizations of
 every word in the sentence, then choosing, for every such word, 
 the "correct" context-dependent one, using short-context syntactical 
 rules as well as some probabilistic and statistical modules. 
 The program works with 95% accuracy, and is available, e.g., as an 
 off-the-shelf add-on to Microsoft Hebrew Word. 
 After installation, any Word document (or even book), can be 
 vocalized by just marking it and clicking on the pertinent icon;
 the vocalization is done online, and the document can be printed
 with the diacritic vocalization points on any (Word-supported)
 printer. Proofreading and correcting the erroneous vocalizations
 are very easy and do not require a professional linguist (as is the
 case generally with manual vocalization).
 Nakdan-Text is an essential step for Text-to-Speech applications in
 Hebrew; without such vocalization, computerized "reading" is
 obviously impossible.

 - "Hamilon" (The Dictionary): A new dictionary of Hebrew,
 built a-priori on modern lexicographical principles and with an 
 architecture that is easy to use and embed in computerized processing 
 contexts. Radically different in philosophy and approach from the
 available classical dictionaries of Hebrew, the Rav-Milim dictionary is
 synchronic (rather than historical), descriptive (rather than
 normative, although bad usage is clearly tagged as such), 
 comprehensive - covering all registers of the language (from the 
 literary to the slang and vulgar) and all strata (from the biblical 
 to the modern) - but not exhaustive (omitting historical curiosities,
 discarded inventions, etc) and user-oriented. Following the new
 sensitivity to meaning-in-context acquired by the extensive
 processing of large corpora, the full and rich spectrum of the 
 different meanings of an entry is deployed, and usage examples
 for every (non-encyclopedic) entry, carefully designed to highlight
 its appropriate sociolinguistic context, are given. For each entry,
 the family of its related terms (words with the same root and the
 same semantic field) is detailed. Special attention is given to 
 collocations (a generic term used here loosely for compound nouns, 
 verbal attachments, fixed phrases, idioms, etc, that deserve a
 special dictionary heading and explanation): every collocation
 appears under each of its pertinent entries, and some 8,000
 new collocations (out of a total of 20,000), never recorded before, 
 are explained.
 The printed version of the dictionary was published in April 1997 
 (by C.E.T., Steimatzky and Miskal) as a 6-volume set, and the
 computerized version appeared at about the same time, as part of 
 "The Hebrew Language CD", described below.

- "The Hebrew Language CD": All of the grammatical and lexicographic
modules described above, and more, are integrated in this CD-ROM,
which is in fact a complete "laboratory" of Hebrew processing (on the
word level). Keying any word, the user can spell-check it or ask for
its (correct) spelling in the different modes, see its vocalization(s)
and its decomposition into meaningful components, look at its complete
morphological analysis (or analyses), see the full family (in the
sense defined above) of related terms, review all collocations that
contain it (there may be hundreds of them) - and for each one that he
marks, read its explanation - , ask for the full conjugation table of
the corresponding base-form (in both vocalized and non-vocalized forms
and spellings), ask for all entries that have the same vocalization
pattern, and, of course, ask to see the full dictionary record of the
appropriate entry. It should be noted here that looking for a word in
a printed Hebrew dictionary can be a frustrating experience even for
experienced users, since one has first to reduce the word, in the form
encountered, to its base-form (or its root), a task that is not needed
here. The user enters the word in any variant encountered, and the
program will automatically display the pertinent entry (or, sometimes,
entries). This feature also allows the user to mark any string in an
explanation or a usage-example, and the appropriate entry and
explanations will be displayed, ad infinitum.

- "Young Rav-Milim - The Dictionary": A dictionary of modern Hebrew (2
vols, 1,000 pgs, same publishers as above) for the young (ages 7-16),
with 1,000 color illustrations (the first of its kind ever in
Hebrew). All of the dictionary contents (entries and subentries,
collocations, explanations, usage examples, etc) reflect the young
world of knowledge and associations. A unique feature of the
dictionary is the thousands of annotations scattered in it, giving the
reader a wealth of additional interesting information on
morphological, grammatical, semantical, historical and cultural
aspects of the entry. The page layout is reminiscent of a Talmudic
page: a rectangular box of basic text, surrounded by related
glossaries, commentaries and notes. The dictionary thus functions as
an attractive book to read and browse, in addition to its basic
function as a reference book.

- "Young Rav-Milim - The Multimedia CD-ROM": A multimedia version of
the dictionary, that reflects the whole contents of the printed one,
and, in addition, offers pre-taped pronunciation of the entries, typical
sounds for appropriate entries (animals, musical instruments, special
verbs, etc), linguistic and "dictionary" games, etc.
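
As a rough illustration of how the Milim, Nakdan and Nakdan-Text modules
described above fit together, here is a minimal sketch in Python. It is
not the Rav-Milim code (which, per the announcement, is a C library):
the names Analysis, TOY_LEXICON, analyze, vocalizations and
vocalize_sentence, the transliterated four-way reading of the consonant
string "spr", the single "prefer a verb after a pronoun" rule and the
prior values are all assumptions made up for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Analysis:
    lemma: str      # dictionary entry (transliterated)
    pos: str        # part of speech
    vocalized: str  # vocalization attached to this analysis
    prior: float    # made-up relative frequency, used to rank candidates


# Toy lexicon keyed by unvocalized consonant strings. Illustrative only;
# none of this is Rav-Milim data.
TOY_LEXICON = {
    "spr": [
        Analysis("sefer", "noun", "sefer", 0.50),  # "book"
        Analysis("sapar", "noun", "sapar", 0.10),  # "barber"
        Analysis("safar", "verb", "safar", 0.15),  # "he counted"
        Analysis("siper", "verb", "siper", 0.25),  # "he told"
    ],
    "hw": [Analysis("hu", "pronoun", "hu", 1.0)],  # "he"
}


def analyze(word):
    """Milim-like step: all analyses of a word taken out of context."""
    return list(TOY_LEXICON.get(word, []))


def vocalizations(word):
    """Nakdan-like step: all context-free vocalizations of one word."""
    return sorted({a.vocalized for a in analyze(word)})


def vocalize_sentence(words):
    """Nakdan-Text-like step: choose one vocalization per word in context."""
    chosen = []
    prev_pos = None
    for word in words:
        candidates = analyze(word)
        if not candidates:
            # Unknown word: pass it through unchanged.
            chosen.append(word)
            prev_pos = None
            continue
        # Hypothetical short-context rule: after a pronoun, prefer verbs.
        if prev_pos == "pronoun":
            verbs = [a for a in candidates if a.pos == "verb"]
            candidates = verbs or candidates
        # Statistical step: take the most frequent remaining candidate.
        best = max(candidates, key=lambda a: a.prior)
        chosen.append(best.vocalized)
        prev_pos = best.pos
    return chosen
```

Called on the isolated string "spr", vocalizations returns all four toy
readings; vocalize_sentence(["hw", "spr"]) narrows the choice to a verb
reading because of the preceding pronoun. A real system would of course
need far richer rules and statistics than this one-rule toy.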


Rav-Milim Team (major participants): 
------------------------------------

Yaacov Choueka, PI and Director
Yoni Ne'eman, Chief programmer and in charge of linguistic
 algorithms

Programmers: Avi Danon, Yosi Sarousi

Linguistics: Rahel Finkel, Hagit Avioz

The Dictionary:

Steering Committee:
Prof. Yaacov Choueka, Prof. M.Z. Kaddari (Vice-President, Academy of
Hebrew Language), Prof. R. Nir (Hebrew University), Prof. R. Mirkin
(Academy of Hebrew Language), Prof. O.Schwarzwald (Bar-Ilan
University), M. Zinger.

Editor-in-Chief: Uzzi Freidkin
Senior Editors: Dr Haym Cohen, Yael Zachi-Yannai
Science and Technology Editor: Yakhin Unna
Assistant Editors: Rahel Finkel, Hagit Avioz, Sara Choueka

Dictionary for the Young:

Steering Committee:

Prof. R. Berman (Tel-Aviv University), Dr. Zvia Walden (Berl College),
Prof R. Nir, Dr. Dorit Ravid, Prof. Maya Fruchtman,
Prof. O. Schwarzwald

Editor: Yael Zachi-Yannai
Assistant Editors: Hagit Avioz, Sara Choueka
Consultants: Uzzi Freidkin (lexicography), Dr. Haym Cohen
(linguistics), Dr Zvia Walden (Educational approach and design).

Multimedia version:

Design and supervision: Ofra Razel

------------------------------------------------------------------
 Humanist Discussion Group 
Information at <http://www.kcl.ac.uk/humanities/cch/humanist/>
 <http://www.princeton.edu/~mccarty/humanist/>

Message 2: NorFa Summer School: Languages, Minds and Brains

Date: Thu, 20 Nov 1997 15:30:52
From: Juhani Jarvikivi <Juhani.Jarvikivi@joensuu.fi>
Subject: NorFa Summer School: Languages, Minds and Brains

 	First Circular
	November 1997

The Department of Linguistics of the University of Joensuu and the
Nordic Neurolinguistic Network are pleased to announce that a Nordic
Research Course, sponsored by the Nordic Academy for Advanced Study
(NorFA), called

Languages, Minds, and Brains

will be held at the Mekrijarvi Research Station, University of
Joensuu, Ilomantsi, Finland, June 22-29, 1998.

The Course will consist of the following three components. The
components are planned to be joint sessions involving all the
participants, students as well as teachers, of the Research
Course. This policy is taken in order to maximize the
multidisciplinary flow of ideas between the participants.


(a) Four-hour survey lectures by internationally well-known experts 

	Dr. Harald Baayen (Max Planck Institute for Psycholinguistics,
Nijmegen): Morphological and Lexical Processes and Representations
	Prof. Kenneth Hugdahl (Biological and Medical Psychology,
Bergen): Neuroimaging and the Brain
	Prof. Lise Menn (Linguistics, Boulder): Methodological Issues
in the Case Study Approach
	Prof. Michel Paradis (Linguistics, McGill): Grammar,
Pragmatics, and the Brain

(b) Seminars with 30 minute individual presentations by the students
and 30 minute post-paper discussions. The seminars will be attended by
all the teachers.

(c) Discussion sessions towards the end of a topic area highlighting
the methodological and theoretical issues shared by the papers
presented.

The criteria for student selection in addition to those defined by
NorFA (with regard to country of origin, etc.):

(a) The participants should have a strong background in one or several
of the following disciplines or related areas: linguistics,
psychology, neurology, cognitive science, phonetics, logopaedics and
special education.

(b) The topic of the Course (language, mind, and brain) should occupy
a significant position in the PhD or post-doctorate studies or study
plans of the participants.

The number of student participants will be restricted to 25.


Pre-course Requirements in Addition to the General NorFA Requirements:

	(a) The applicants should send, together with their
application, a 3-5 page long abstract of their work (planned or
ongoing) in the topic area(s) of the Course. The texts of the accepted
students will eventually be mailed well in advance to the teachers as
well as to the other student participants.

	(b) It is expected that the invited teachers or the organizers
will require a set of pre-course readings. A list of the required
pre-reading material will be sent to the participants well in advance.


NorFA will pay for the tuition as well as for board and lodging during
the course and for travel as follows. For students originating from
Denmark, Iceland and Norway, NorFA will cover the (APEX-type) return
flight tickets from the port of exit (e.g. Copenhagen) to
Helsinki. The Swedish students will receive the boat fares between
Stockholm and Helsinki/Turku from the organizers. Within Finland, only
general-public surface travel (i.e., train, bus) tickets will be paid
for by the organizers.

Accommodation at the Research Station will be in double rooms.

Our web site (Linguistics under http://www.joensuu.fi/fld) will
contain, among other things, the program. Please visit us there for more information
or contact the responsible organizer directly.

Application procedure: Send a free-form application to Jussi Niemi
(below) by March 1, 1998. Please enclose a brief CV and a 3 to 5 page
summary of your research interests.

Those accepted will be notified by April 1.

	Responsible Organizer: Jussi Niemi, Associate Prof.,
Linguistics, University of Joensuu, FIN-80101 Joensuu, Finland,
jussi.niemi@joensuu.fi, fax +358-13-251 4211, phone +358-13-251 4306
