LINGUIST List 13.1537

Tue May 28 2002

FYI: Scripts/Unicode, Summer School, Corpora Courses

Editor for this issue: Marie Klopfenstein


  1. dwanders, Update on Computer Support for Languages/Scripts
  2. Natasha, International Summer Language School in Siberia 'Mission of Peace'
  3. John Sinclair, June Courses at The Tuscan Word Centre

Message 1: Update on Computer Support for Languages/Scripts

Date: Thu, 23 May 2002 06:48:30 -0700 (PDT)
From: dwanders <dwanderssocrates.Berkeley.EDU>
Subject: Update on Computer Support for Languages/Scripts

Dear Linguistlist,
This is a report, for readers of this list, on the effort to cover all
scripts of the world (past and present) in the international character
encoding standard, together with a call for participation. It should
be noted that the LinguistList website is Unicode-compliant, as are the
texts included at the large Indo-European website TITUS. A similar letter
has been sent to the Linguistic Society of America.

As a liaison for the Department of Linguistics at Berkeley to the Unicode
Consortium, I would like to keep linguists informed of ongoing
developments as they relate to coverage of scripts in the character
encoding standard Unicode, why this is important for linguists, and how
linguists might be able to help. Unicode is the international standard,
now the default character set for XML and HTML 4.0. Because it is a
standard, it will preserve the integrity of linguistic data and allow easy
interchange for online teaching materials, texts, online corpora, etc.
Unicode is an open standard and is widely supported by the computer
industry.

While 52 scripts are currently included in Unicode (and its ISO sister,
10646), 96 remain outside the standard. These scripts include (a) scripts
used by minorities (e.g., Cham, New Tai Lu, and Vai) and (b) historic
scripts (e.g., Egyptian Hieroglyphics, Avestan, and Tangut).
Unfortunately, these two groups are not of primary interest to
corporations (and some governments), so the main push to cover the
remaining scripts must now come from the universities and professional
societies, and in particular, linguists. Scripts needing proposals (or
scholarly input) are found listed in red and green on "Roadmap to the BMP
(Plane 0)" and "Roadmap to the SMP (Plane 1)", accessible from

The standards process involves the creation of a proposal that lists the
characters of a particular script (if not already found in Unicode), a
glyph (visual representation) for each character, any comments on the
character's properties, a name, and an introduction to the script for
general users and font and software implementers. Both historic scripts
and those scripts used by minorities can entail significant research and
usually require contact with the user community (be it native speakers
and/or academic researchers).

In order to provide a centralized effort to cover the remaining scripts,
we have begun a Script Encoding Initiative at UC Berkeley so that script
proposals can be prepared and moved along the standards process. The SEI
is run in collaboration with the Unicode Vice President. We currently
seek linguists willing to work on such proposals and funds for those
experts needing support in order to proceed.

In order to support this effort, linguists could be of assistance by:

- endorsing the effort to cover the remaining scripts;

- publicizing the need for expertise from linguists to either prepare
Unicode proposals, review them, or write letters of support on proposals
to the two standardizing bodies (Unicode and ISO WG2);

- suggesting that those linguists with NSF, NEH, or other governmentally
or institutionally supported projects that include an electronic
representation of a language in its native script use Unicode for the
character encoding. For projects using Unicode but which run into
problems -- e.g., find missing characters or errors in the Unicode
Standard, etc. -- linguists should report them by visiting the on-line
error reporting page, linked from the Unicode contact page. If a script
is missing from Unicode, it would be very useful to include a line-item
in the grant application requesting money for a Unicode encoding
proposal. The Script Encoding Initiative can provide guidance (e.g.,
suggest the approximate cost for the proposal, what is needed for a
successful proposal, etc.). The SEI can also receive funding, help to
locate an encoding expert, and coordinate work on a proposal. (For those
who sit on the review panels for these organizations, recommending that
such projects similarly use Unicode or include a line-item for a Unicode
proposal would also be extremely helpful.)
For further information on the Script Encoding Initiative, please see: The Unicode website is:

Any support from linguists, their host institutions, or professional
societies would be greatly appreciated, and would serve to draw attention
to the importance of the project. I would be more than happy to answer any
questions or receive any comments.

With best regards,
Deborah Anderson

Researcher, Dept. of Linguistics
UC Berkeley

Message 2: International Summer Language School in Siberia 'Mission of Peace'

Date: Thu, 23 May 2002 21:54:19 +0600
From: Natasha
Subject: International Summer Language School in Siberia 'Mission of Peace'

Dear Colleagues,

I am writing to bring to your attention, and to ask you to forward to
colleagues, teachers and students who might be interested, information
on the International Summer Language School "A MISSION OF
PEACE", an educational not-for-profit multicultural program that
our International Language School "Cosmopolitan" runs in Siberia, Russia,
during the summer of 2002, with the participation of volunteer teachers and
international students from all over the world.

This is an annual program, so those who think that it's too late for them to
commit this summer can participate next year.

The "MISSION OF PEACE" summer school will take place in four two-week
sessions during the summer of 2002 in a picturesque wooded area outside the
city of Novosibirsk, Siberia, Russia.

The DATES of the summer school sessions for the year 2002 are as follows:

- First Session: June 18th until July 2nd
- Second Session: July 3rd until July 17th
- Third Session: July 18th until August 1st
- Fourth Session: August 2nd until August 16th

There is no limit on the number of sessions attended. Of course, participation in
two or more sessions rather than just one gives a person more opportunities
to experience the culture, learn the language, interact with more
people, and enjoy a larger excursion, cultural and social program.

For the past few years volunteer teachers from the United States, Great
Britain, Canada, Australia, France, Finland, Holland, Denmark, Argentina,
Singapore, Malaysia, and university students and school children from the
USA, Great Britain and Germany have participated in our program.

The International Summer Language School 2002 is themed "A Mission of Peace".
The program is aimed at creating opportunities in a diverse multicultural
environment for the youngsters to improve their foreign language knowledge,
learn about the ways to live their lives as peacemakers and acquire
peacemaking skills, find creative expression and build relationships based
on honesty, understanding and respect.

English, French, German and Russian classes are scheduled within the
educational program of the Summer School with a variety of techniques
applied, such as conversations, grammar in use, language games, discussions,
drama, etc.

The program is also a great opportunity to learn the Russian language and
gain first-hand experience of Russian culture. The Russian course is
provided for all international participants and covers language studies as
well as Russian culture, history and society.

The program of the Summer School integrates language and peace education
into daily activities, which include daily language classes, creativity
workshops, music and drama, games and contests, art and journaling, creative
writing and drawing, a cultural fair and the Summer School Olympics, and
interactive projects using Internet technologies. We plan to
complete the Peace Register and submit it to the United Nations. The program
is a great learning experience that encourages people to see with new eyes,
as they explore artistic ways of knowing and ways of learning about
peacemaking, expand their creative potential, and prepare to deal with the
complexities and challenges of life in an interrelated world.

We seek participation from as wide a geographic distribution of cultures and
nations as possible, so that representatives of a broad range of nations
could have an opportunity to share their hopes, dreams and concerns, express
their thoughts and feelings, and make a lot of good friends to achieve a
greater international understanding and promote a safer, saner world.

We are looking for VOLUNTEER TEACHERS (TEACHERS of English, French,
German and other subjects, at levels from elementary school through
university; SPECIALISTS in other fields such as computing, business,
journalism, music, arts, etc.; MEDIA and TECHNOLOGY professionals;
UNIVERSITY STUDENTS) who are energetic, enthusiastic, enjoy summer camp
experiences and working with teenagers, and possess a love for children
and the desire to foster a peaceful environment.

The new millennium needs bold, creative men and women who can turn their
dreams into reality. We encourage teachers of your institute to join the
international volunteer teachers' team of our program, and contribute to
filling young people's heads with positive images of peace, preparing our
children for their leadership roles for the next century, promoting positive
We also seek people worldwide (middle school through university STUDENTS,
and ADULTS) to join this international forum as students, gain in-depth
understanding of international issues and bring a new depth of appreciation
and knowledge home to their friends and communities.

- Roundtrip airfare to Novosibirsk.
- Obtaining the Russian visa and visa fees.
- Medical insurance.
- Participation fee, which covers accommodation, meals, local
transportation, excursions, etc.

One can also join the "PEACE THROUGH TOURISM" program of the Summer School
by ordering one of the offered excursion packages that include trips to
Moscow, St. Petersburg, Lake Baikal, the Altai Mountains, Novosibirsk,
Krasnoyarsk, and the Trans-Siberian Railroad.

We encourage you to invest in a vision of a better world by
joining people around the globe willing to support ongoing peace through
this worthwhile effort. It's a wonderful way to show that you value
peacemaking and to inspire the world to quest for peace. It is a powerful
way of showing children that we are all one world.

I would very much appreciate it if you would take an interest in the program
and assist in forwarding the information on the International Summer
School in Siberia to people who might also be interested.

I look forward to hearing from you and remain hopeful that we could
establish a worthwhile co-operation.


Natasha Bodrova,
Director of International Language School "Cosmopolitan",
Novosibirsk, Russia


Message 3: June Courses at The Tuscan Word Centre

Date: Tue, 28 May 2002 12:27:36 +0200
From: John Sinclair
Subject: June Courses at The Tuscan Word Centre

Courses at The Tuscan Word Centre 2002



Still a chance to come on one or both of these courses.
If you are interested, please contact John Sinclair
as soon as possible.

17th-20th June inclusive (arrive 16th, depart 21st)

TWC runs such a course every year; it is open to advanced students and 
researchers, and workers in the language industries, and it is relevant to 
all who are interested in the present state of corpus work and the 
potential for the future.

Topic Leaders:

Prof Göran Kjellmer, University of Gothenburg
Mr Frank Mueller, University of Tuebingen
Dr Pernilla Danielsson, University of Birmingham
Prof Elena Tognini Bonelli, University of Lecce
Prof John Sinclair, The Tuscan Word Centre

Topics include:

Syntax studies with annotated and unannotated corpora
Syntactic class differences as mirrored in corpora.
Corpora in a historical perspective
Functionally-complete Units in a contextual theory of meaning
Corpus-based and corpus-driven linguistics
Shallow parsing: how it's done and what you can do with it
Linguistic data structures in language corpora
Software for corpus access and analysis
Annotation and corpus interrogation
Multilingual and parallel corpora
The lexical item
Lexicogrammar and Phraseology

For further information about the course, about TWC,
its superb location and facilities, refer to the website
or send an e-mail to

The overall cost of this course, including participation, accommodation, 
all meals and local transport is 1500 euros.
There is a strict limit on the number of places, so please apply early.

24th-27th June inclusive (arrive 23rd, depart 28th)
The Tuscan Word Centre

Intensive Course June 17th-20th 2002 inclusive

Corpora are already an important source of ideas, models, examples and 
descriptions for language teaching, and will gradually affect the work of 
all the teachers, and the managers, researchers and materials developers in 
the language teaching industry. Language teachers and their colleagues 
should be aware of the range of activities that are taking place all over 
the world, should be in command of the routines of access to corpora and 
the retrieval of useful results, and should be able to plan uses of corpora 
in their own practice and in their own institutions.
A very successful course on this theme was held at TWC in October 2001, and 
a book will shortly be published recording the main presentations that were 

Topic Leaders:

Prof. Sylviane Granger, University of Louvain
Dr. Susan Hunston, University of Birmingham
Dott.ssa Silvia Bernardini, University of Forlì
Dr. Pernilla Danielsson, University of Birmingham
Prof.ssa Elena Tognini Bonelli, University of Lecce
Prof. John Sinclair, The Tuscan Word Centre

Topics include:

Learner Corpora: design, analysis and applications
Annotation, part-of-speech tagging and error tagging
Pedagogical and NLP applications

The use of corpus evidence
Phraseology, pattern, and meaning
Corpus evidence applied to a text
Paradigm and syntagm in language
Technology serves pedagogic needs?

Corpora in the classroom:
- in the acquisition of new languages
- in the training of translators
data-driven learning
meaning-focused learning tasks

Corpus, text and discourse in LSP
Discourse of subject areas:
Monolingual economics
Bilingual - tourism.

Software for corpus access and analysis
Parallel corpora
Annotation, phrase building
Text-oriented programming

New priorities in theory
Facts and observation
The sheltered classroom and the big wide world

For further information about the course, about TWC,
its superb location and facilities, refer to the website
or send an e-mail to

The overall cost of this course, including participation, accommodation,
all meals and local transport is 1500 euros.
There is a strict limit on the number of places, so please apply early.

John Sinclair
The Tuscan Word Centre
Vellano 409
51010 Pescia (PT)

Telephone: +39 (0)572 409251
Fax:                  409253
Office:               409900
