LINGUIST List 12.1373

Fri May 18 2001

FYI: ELRA News, Syntactic Database/Spanish

Editor for this issue: Lydia Grebenyova <lydia@linguistlist.org>


Directory

  1. Magali Duclaux, European Language Resources Association (ELRA) News
  2. Paula Santalla del Rio, Syntactic Database of Current Spanish (SDB, release 3.5.1)

Message 1: European Language Resources Association (ELRA) News

Date: Mon, 14 May 2001 10:23:22 +0200
From: Magali Duclaux <duclaux@elda.fr>
Subject: European Language Resources Association (ELRA) News

***************************************************************************
ELRA
European Language Resources Association
ELRA News
****************************************************************************
We are happy to announce a new resource available via ELRA:

ELRA S0106 Dutch SpeechDat(II) MDB-250

A description of this database is given below.

The Dutch SpeechDat(II) MDB-250 comprises 250 Dutch speakers (125 males, 
125 females) recorded over the Dutch mobile telephone network. The 
recordings were made at SPEX, the Netherlands, and the recording 
application was developed and run with Show 'N Tel. This database is 
partitioned into 5 CDs. The speech databases made within the SpeechDat(II) 
project were validated by SPEX to assess their compliance with the 
SpeechDat format and content specifications.
Speech samples are stored as sequences of 8-bit, 8 kHz A-law samples. Each prompted 
utterance is stored in a separate file. Each signal file is accompanied by 
an ASCII SAM label file which contains the relevant descriptive information.
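
For readers who want to inspect the signal files, the following is a minimal sketch of expanding 8-bit A-law codes into 16-bit linear samples, following the standard G.711 A-law expansion. It assumes the signal files are raw, headerless byte streams with one byte per sample, which the announcement does not state explicitly; the function names are illustrative and not part of any ELRA tool.

```python
SAMPLE_RATE_HZ = 8000  # sampling rate quoted in the announcement


def alaw_to_linear(code: int) -> int:
    """Expand one 8-bit G.711 A-law code to a signed 16-bit linear sample."""
    code ^= 0x55                     # undo the bit inversion applied by the encoder
    mantissa = (code & 0x0F) << 4    # 4-bit mantissa
    segment = (code & 0x70) >> 4     # 3-bit segment (exponent)
    if segment == 0:
        magnitude = mantissa + 8
    else:
        magnitude = (mantissa + 0x108) << (segment - 1)
    # After the inversion, a set top bit marks a positive sample.
    return magnitude if code & 0x80 else -magnitude


def decode_alaw(raw: bytes) -> list[int]:
    """Decode a raw A-law byte stream (assumed headerless) to linear samples."""
    return [alaw_to_linear(b) for b in raw]


def duration_seconds(raw: bytes) -> float:
    """Duration of a raw A-law recording, at one byte per sample."""
    return len(raw) / SAMPLE_RATE_HZ
```

The bytes of a signal file could be passed directly to `decode_alaw`; the accompanying SAM label file would still need to be consulted for the descriptive information.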

The following items were recorded:

8 application words (2 optional);
2 isolated digits;
1 sequence of 10 isolated digits;
3 connected digits: 1 telephone number (1-10 digits), 1 credit card
  number (1-16 digits), 1 PIN code (6 digits);
3 dates: 1 spontaneous date, 1 date, 1 relative date expression;
1 embedded application word;
3 spelled words: 1 forename (spontaneous), 1 city name, 1 word;
1 currency money amount;
1 natural number;
6 directory assistance names: 1 forename (spontaneous), 1 city of
  birth, 1 most frequent city, 1 city name, 1 company name, 1
  forename surname;
2 yes/no questions: 1 predominantly "yes" question, 1 predominantly
  "no" question;
9 phonetically rich sentences;
2 time phrases: 1 time of day (spontaneous), 1 time phrase;
4 phonetically rich words.

The following age distribution has been obtained: 5 speakers are under 16, 
90 are between 16 and 30, 89 between 31 and 45, 56 between 46 and 60, and 
10 are over 60. The lexicon was created following the guidelines in 
SD1.3.1 v4.3.
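
As a quick consistency check, the age bands quoted above can be summed and turned into percentages. The short sketch below uses only the figures given in this announcement; the band labels are informal shorthand.

```python
# Age bands and speaker counts as quoted in the announcement.
age_bands = {
    "under 16": 5,
    "16-30": 90,
    "31-45": 89,
    "46-60": 56,
    "over 60": 10,
}

total = sum(age_bands.values())
# Share of each band as a percentage of all speakers, to one decimal place.
shares = {band: round(100 * n / total, 1) for band, n in age_bands.items()}

for band, n in age_bands.items():
    print(f"{band:>8}: {n:3d} speakers ({shares[band]:.1f}%)")
print(f"   total: {total}")
```

The bands add up to the 250 speakers stated in the database description.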

=====================================
For further information, please contact:
ELRA/ELDA
55-57 rue Brillat-Savarin
F-75013 Paris, France
Tel: +33 01 43 13 33 33
Fax: +33 01 43 13 33 30
E-mail: mapelli@elda.fr
or visit the online catalogue on our Web site:
http://www.icp.grenet.fr/ELRA/home.html
or http://www.elda.fr
=====================================

Message 2: Syntactic Database of Current Spanish (SDB, release 3.5.1)

Date: Tue, 15 May 2001 09:21:22 +0200 (MET DST)
From: Paula Santalla del Rio <fempsr@usc.es>
Subject: Syntactic Database of Current Spanish (SDB, release 3.5.1)

_________________________________________________________

Syntactic Database of Current Spanish (SDB, release 3.5.1)
__________________________________________________________

The research group on Spanish Syntax of the University of
Santiago de Compostela ( http://www.sintx.usc.es ) is making
the Syntactic Database of Current Spanish (SDB) available to
interested researchers. The database is the result of the
work carried out by the group over the last ten years. The
data can be consulted at

	http://www.bds.usc.es/busquedas.html

Developed with the financial support of the Dirección Xeral
de Educación y Ordenación Universitaria of the Xunta de
Galicia and of the Dirección General de Investigación
Científica y Técnica of the Ministerio de Educación, SDB is
the result of a manual analysis, guided by the principles of
constituent and functional analysis, of the syntactic
characteristics of the almost 160,000 clauses contained in
the contemporary part of the Hispanic Texts Archive of the
University of Santiago. This part of the archive comprises
approximately one and a half million words of text taken
from all the Hispanic countries, including oral samples as
well as novels, press and theater, all of them published
between 1980 and 1990.

In SDB the primary unit of description is the clause, and
the analysis has been encoded so as to make explicit the
organization of the syntactic functions around the verb form
functioning as the predicate of the clause. For every clause
in the database, the syntactic characteristics considered
relevant in this first stage of development have been
included. First, we have recorded general information about
the clauses: clause type, clause function, voice, modality,
verb inflection, the syntactic functions found in the clause
and their order. Next, we have recorded detailed information
about each of the syntactic functions found in the clause:
type of structural unit, determination, animacy,
countability, preposition introducing the syntactic
function, etc. Current search possibilities (as well as
others not yet fully available but already planned) are
conceived from an internal point of view, that is, they are
devised so as to show the internal structure of clauses
organized around verb forms in the corpus: syntactic schemes
and subschemes of each verb documented in it, verbs
documented with a given scheme or subscheme, verbs requiring
a given preposition, etc.
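
To make the description above more concrete, here is a minimal sketch of how one clause record and two of the searches mentioned might be modelled. It is purely illustrative and does not reflect the actual SDB schema or interface: all class names, field names, function labels (such as "SUBJ") and the two sample clauses are assumptions invented for this example and are not taken from the corpus.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class SyntacticFunction:
    """One syntactic function within a clause (hypothetical field names)."""
    name: str                           # illustrative label, e.g. "SUBJ"
    unit_type: str                      # type of structural unit
    determination: Optional[str] = None
    animacy: Optional[str] = None
    countability: Optional[str] = None
    preposition: Optional[str] = None   # preposition introducing the function


@dataclass
class Clause:
    """General information about a clause, plus its functions in order."""
    verb: str
    clause_type: str
    clause_function: str
    voice: str
    modality: str
    verb_inflection: str
    functions: list[SyntacticFunction] = field(default_factory=list)

    def scheme(self) -> tuple[str, ...]:
        """The ordered function labels, used here as the 'syntactic scheme'."""
        return tuple(f.name for f in self.functions)


def schemes_by_verb(clauses: list[Clause]) -> dict[str, set[tuple[str, ...]]]:
    """Collect the distinct schemes documented for each verb."""
    result: dict[str, set[tuple[str, ...]]] = defaultdict(set)
    for clause in clauses:
        result[clause.verb].add(clause.scheme())
    return dict(result)


def verbs_with_preposition(clauses: list[Clause], prep: str) -> set[str]:
    """Verbs documented with at least one function introduced by `prep`."""
    return {
        c.verb for c in clauses
        if any(f.preposition == prep for f in c.functions)
    }


# Two invented sample clauses, for illustration only.
sample = [
    Clause("comer", "main", "independent", "active", "declarative",
           "3sg present",
           [SyntacticFunction("SUBJ", "noun phrase", animacy="animate"),
            SyntacticFunction("DOBJ", "noun phrase", countability="mass")]),
    Clause("confiar", "main", "independent", "active", "declarative",
           "3sg present",
           [SyntacticFunction("SUBJ", "noun phrase", animacy="animate"),
            SyntacticFunction("PREP_OBJ", "prepositional phrase",
                              preposition="en")]),
]
```

With this sample, `schemes_by_verb(sample)` groups the ordered function labels under each verb, and `verbs_with_preposition(sample, "en")` returns the verbs documented with a function introduced by "en".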

Both the web page and the search interface are, however,
still under development. For this reason, on the one hand,
we apologize for the fact that certain search options on the
menu are not yet ready, and on the other, we will be very
grateful for any suggestions about search possibilities that
you find missing in the system and to which we have not
given priority in this first phase of distribution of
results.

Currently, with the financial support of the Secretaría
Xeral de Investigación e Desenvolvemento of the Xunta de
Galicia, SDB has started a second phase of development in
which we will analyse in depth the syntactic and semantic
characteristics of the approximately 160,000 clauses
constituting the corpus under study.

----------------------------------------------------------------------
 María Paula Santalla del Río

 Dpto. de Lingua Española
 Facultade de Filoloxía, Universidade de Santiago de Compostela
 Avda. Burgo das Nacións, s/n,
 Santiago de Compostela 15782

 Tfno: (+34) 981 575340/563100, ext. 11908
 Fax: (+34) 981 574646
----------------------------------------------------------------------





