LINGUIST List 10.1572

Wed Oct 20 1999

Qs: History of Corpora, Terminology/Y2K & Beyond

Editor for this issue: Karen Milligan

We'd like to remind readers that the responses to queries are usually best posted to the individual asking the question. That individual is then strongly encouraged to post a summary to the list. This policy was instituted to help control the huge volume of mail on LINGUIST; so we would appreciate your cooperating with it whenever it seems appropriate.


  1. Smiley, History of Corpora
  2. James Giangola, Terminology for Y2K and beyond

Message 1: History of Corpora

Date: Tue, 19 Oct 1999 21:57:8 +0800
From: Smiley
Subject: History of Corpora

Dear all,

Does anyone have or know of sources of information on the history of
corpora, whether for dictionary-making or for linguistic research?


Gao Yongwei
Fudan University,
Shanghai, China

Message 2: Terminology for Y2K and beyond

Date: Tue, 19 Oct 1999 11:21:24 -0700
From: James Giangola <jamesgNuance.COM>
Subject: Terminology for Y2K and beyond

Has anyone out there done any research on how people will say (more exactly,
will expect to hear) years such as 2001, 2005, 2010, 2015, 2020, 2037, etc.?

I am working on a speech recognition/synthesis application that needs to
"speak back" to the user certain years beyond 2000. Here are the top

(1) "two thousand" not followed by "and", e.g. two thousand five, two
thousand thirty-seven
(2) "two thousand and...", e.g. two thousand AND five, two thousand AND
thirty-seven
(3) "twenty...", e.g. twenty oh five, twenty thirty-seven
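
For anyone who wants to hear the three renderings side by side, they could be generated with something like the Python sketch below. This is only an illustration, not the application's actual code: the function name `year_to_words`, the numeric `style` argument, and the restriction to 2001 through 2099 are all assumptions made for the example.

```python
# Illustrative sketch only; names and the 2001-2099 range are assumptions.
ONES = ["", "one", "two", "three", "four", "five", "six", "seven", "eight",
        "nine", "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen",
        "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy",
        "eighty", "ninety"]


def two_digits(n):
    """Spell out an integer from 1 to 99, e.g. 37 -> 'thirty-seven'."""
    if n < 20:
        return ONES[n]
    tens, ones = divmod(n, 10)
    return TENS[tens] + ("-" + ONES[ones] if ones else "")


def year_to_words(year, style):
    """Render a year from 2001 to 2099 in style 1, 2, or 3 from the list above."""
    if not 2001 <= year <= 2099:
        raise ValueError("this sketch only covers 2001 through 2099")
    rest = year - 2000
    if style == 1:   # "two thousand" with no "and"
        return "two thousand " + two_digits(rest)
    if style == 2:   # "two thousand and ..."
        return "two thousand and " + two_digits(rest)
    if style == 3:   # "twenty ...", with "oh" before a single digit
        if rest < 10:
            return "twenty oh " + ONES[rest]
        return "twenty " + two_digits(rest)
    raise ValueError("style must be 1, 2, or 3")


for style in (1, 2, 3):
    print(style, [year_to_words(y, style) for y in (2001, 2005, 2037)])
```

Which style to use as the default is exactly the open question; the sketch only makes it easy to compare the candidates.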

To my ear, it seems that the further into the future the year is, the better
way (3) sounds, e.g. "twenty thirty-seven", instead of "two thousand (and)
thirty-seven".

What about 2001? Will people want to say this date as in the movie title?
Should it be "two thousand one" or "two thousand AND one"? Although I'm a
native speaker of English, I can't make up my mind, and folks here at work
don't agree on this issue.

My own hunch is that people will resort to the shortest way possible, way
(3), but this isn't based on any serious study.

If anyone has done any sort of survey on this topic, your help would be much
appreciated.


James Giangola
Software Engineer, Dialog Research & Design
Nuance Communications
1380 Willow Rd.
Menlo Park, CA 94025