LINGUIST List 7.843

Fri Jun 7 1996

FYI: LDC release, Corpus tools, Ph.D. thesis, Kittredge lecture

  1. "Antoine Ogonowski", New Release from the LDC
  2. Christian Ebert, Survey on Corpus Access Tools
  3. Torbjoern Lager, Announcement: Ph.D. Thesis: Comp. Corpus Linguistics
  4. Bruno Tersago, Lecture Richard Kittredge

Message 1: New Release from the LDC

Date: 05 Jun 1996 10:43:30 +0200
From: "Antoine Ogonowski" <>
Subject: New Release from the LDC

From: LDC Office, Fri 31 May 1996 6:57 pm
Subject: New Release from the LDC

 Announcing a NEW RELEASE from the

 Acoustic-Phonetic Continuous Speech Corpus
 Far Field Microphone Recordings


The FFMTIMIT corpus contains the previously-unreleased secondary
microphone waveforms for the TIMIT Acoustic-Phonetic Continuous Speech
corpus. The primary microphone waveforms, which were recorded using a
close-talking noise-cancelling head-mounted Sennheiser microphone
(model HMD-414), are available from the LDC on NIST Speech Disc 1-1.1
(LDC93S1). The secondary microphone used in the recording of the
TIMIT corpus was a Bruel & Kjaer 1/2" free-field microphone (model

While the Sennheiser microphone recordings are relatively "clean" with
respect to non-speech noise, the FFMTIMIT recordings include
significant low frequency noise, which was due to the HVAC system and
mechanical vibration transmitted through the floor of the
double-walled sound booth used in recording. Because it is noisier
than its TIMIT counterpart, the data of FFMTIMIT may be used in the
development of more noise-robust speech recognition systems. In
addition, this data may be of value to researchers involved in vocal
tract modeling because the B&K microphone has extremely flat
free-field frequency response and calibration tones are provided.

Note that the B&K TIMIT data contained in this release has not been
processed through any highpass filter (e.g., the 1581-point filter
described in the paper "The DARPA Speech Recognition Research
Database" by Fisher, Doddington and Goudie-Marshall in "DARPA TIMIT
Acoustic-Phonetic Continuous Speech Corpus CD-ROM," NISTIR 4930 / NTIS
Order No. PB93-173938).
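
For readers who want to remove the low frequency noise themselves, the
following is a minimal sketch of one way to build and apply a 1581-point
highpass FIR filter in Python. It is not the filter described in the
cited paper: the windowed-sinc design, the Hamming window, the 70 Hz
cutoff, and the function names highpass_fir and apply_fir are all
assumptions made for illustration. The 16 kHz sample rate is that of
the TIMIT waveforms.

```python
import math


def highpass_fir(num_taps, cutoff_hz, sample_rate_hz):
    """Design a linear-phase highpass FIR filter (hypothetical helper).

    Builds a Hamming-windowed sinc lowpass, normalizes it to unity gain
    at DC, then turns it into a highpass by spectral inversion.
    num_taps must be odd so the filter has a single center tap.
    """
    if num_taps % 2 == 0:
        raise ValueError("num_taps must be odd")
    fc = cutoff_hz / sample_rate_hz  # cutoff in cycles per sample
    mid = (num_taps - 1) // 2
    lowpass = []
    for n in range(num_taps):
        k = n - mid
        if k == 0:
            ideal = 2.0 * fc
        else:
            ideal = math.sin(2.0 * math.pi * fc * k) / (math.pi * k)
        window = 0.54 - 0.46 * math.cos(2.0 * math.pi * n / (num_taps - 1))
        lowpass.append(ideal * window)
    total = sum(lowpass)
    lowpass = [c / total for c in lowpass]
    # Spectral inversion: delta[n - mid] minus the lowpass response.
    highpass = [-c for c in lowpass]
    highpass[mid] += 1.0
    return highpass


def apply_fir(samples, taps):
    """Filter samples with taps, returning an output of the same length.

    The output is aligned with the input (the filter delay is removed),
    and samples beyond either end of the input are treated as zero.
    """
    mid = (len(taps) - 1) // 2
    count = len(samples)
    out = []
    for i in range(count):
        acc = 0.0
        for j, tap in enumerate(taps):
            idx = i + mid - j
            if 0 <= idx < count:
                acc += tap * samples[idx]
        out.append(acc)
    return out


if __name__ == "__main__":
    # Illustrative values only: the cutoff is an assumption, not taken
    # from the corpus documentation.
    taps = highpass_fir(1581, 70.0, 16000.0)
    print(len(taps), "taps designed")
```

The pure-Python convolution is slow on full-length waveforms; it is
written this way only to keep the sketch free of dependencies.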

Institutions that have membership in the LDC during the 1996
Membership Year will be able to receive FFMTIMIT at no additional
charge, in the same manner as all other text and speech corpora
published by the LDC.

Nonmembers can receive a copy of FFMTIMIT for research purposes only
for a fee of $100. If you would like to order a copy of this corpus,
please email your request to If you need
additional information before placing your order, or would like to
inquire about membership in the LDC, please send email or call (215)

Further information about the LDC and its available corpora can be
accessed on the Linguistic Data Consortium WWW Home Page at URL Information is also available via ftp
at under pub/ldc; for ftp access, please use
"anonymous" as your login name, and give your email address when asked
for password.

Message 2: Survey on Corpus Access Tools

Date: Mon, 03 Jun 1996 18:48:26 +0200
From: Christian Ebert <>
Subject: Survey on Corpus Access Tools

As part of a two-semester software project at the Department of
Computational Linguistics, University of Heidelberg, Germany, we
intend to design and implement a general access tool for large text
corpora. To investigate users' needs and wishes concerning such a
tool, we provide the following questionnaire. It is addressed to
anyone doing linguistic work or research using text corpora. Your
future work may benefit from our development, so we kindly ask you
to help us in the design of such a tool by filling out our
questionnaire. Feel free to add any comments you regard as useful or
important to the subject (including the questionnaire itself).

Our questionnaire is located at

If you have any further questions, don't hesitate to send us a mail:

Thank you in advance for your cooperation!

 Department of Computational Linguistics
 University of Heidelberg, Germany
 Karlstr. 2
 69125 Heidelberg

Message 3: Announcement: Ph.D. Thesis: Comp. Corpus Linguistics

Date: Wed, 05 Jun 1996 18:16:06 +0200
From: Torbjoern Lager <>
Subject: Announcement: Ph.D. Thesis: Comp. Corpus Linguistics

KEY WORDS: Corpus linguistics, Corpus tools, Grammar, Grammar

Ph.D. Thesis Announcement

Torbjörn Lager

This is to announce the availability of my Ph.D. thesis: "A Logical
Approach to Computational Corpus Linguistics". I have prepared a WWW
page dedicated to the approach described in the thesis, from which
machine readable versions of the thesis may be downloaded, and hard
copies ordered. The relevant URL is:

You may also send mail directly to me:


The purpose of this thesis is to build a *corpus theory development
environment* -- to discuss its design, use, and implementation. The
proposed system is based on a logical approach to computational corpus
linguistics where sentences of logic are used to express statements
about texts and logical inference is used to manipulate these
sentences in order to analyse the texts.
 The thesis demonstrates the remarkable ease with which the
functionalities needed in a corpus system can be implemented when
based upon adequate means of representing, querying, and
reasoning. The proposed system implements hand coding, searching,
concordancing, parsing, counting, tabling, collocating, automatic
part-of-speech tagging, lemmatizing, excerpting, interpreting,
treebanking, explanation, and various kinds of learning.
 By linking all this functionality into a common representational
framework characterised by high expressive power, declarativity, and
explicit reasoning strategies, and by embedding the whole concept in a
particular philosophical and methodological context, including an
ontology of text, an analysis of the notion of theory, an explication
of the notion of truth, and other foundational issues, we arrive at an
interactive system which is multi-functional and general, yet simple,
consistent, and highly usable.
 Apart from being interesting from a practical point of view, the
development of such a system raises intriguing philosophical and
methodological questions: What is a corpus text? What is a corpus
theory? What does it mean to develop a corpus theory? What does it
mean for a corpus theory to be true about a corpus text? What is the
link between the truth of such a theory and its usefulness for natural
language processing purposes? These and related questions are discussed
in the thesis.
 The system exists in a prototype implementation and the thesis
contains numerous examples from this implementation in action.
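
As a rough illustration of the general idea of treating a text as a set
of statements that can be queried, the sketch below stores each token
as a (position, word) fact and derives concordance lines from those
facts. It is written in Python and is not taken from the thesis: the
representation and the names tokenize_facts and concordance are
assumptions made for this example only.

```python
def tokenize_facts(text):
    """Turn a text into (position, word) facts, one per token.

    Whitespace tokenization is a simplifying assumption.
    """
    return [(pos, word) for pos, word in enumerate(text.split())]


def concordance(token_facts, keyword, width=3):
    """Return (left context, keyword, right context) for every match.

    A match is any token equal to keyword, ignoring case. The context
    is up to `width` tokens on each side.
    """
    words = [word for _, word in token_facts]
    lines = []
    for pos, word in token_facts:
        if word.lower() == keyword.lower():
            left = " ".join(words[max(0, pos - width):pos])
            right = " ".join(words[pos + 1:pos + 1 + width])
            lines.append((left, word, right))
    return lines


if __name__ == "__main__":
    facts = tokenize_facts("the cat sat on the mat")
    for left, word, right in concordance(facts, "the", width=2):
        print(f"{left:>10} [{word}] {right}")
```

Other functions listed above, such as counting or collocating, could be
expressed as further queries over the same facts.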

Torbjoern Lager
Department of Linguistics
University of Gothenburg
412 98 Gothenburg
E-mail:
Phone: +46 31 7731175
Fax: +46 31 7734853

Message 4: Lecture Richard Kittredge

Date: Fri, 07 Jun 1996 10:29:31 +0200
From: Bruno Tersago <>
Subject: Lecture Richard Kittredge
