Summary Details
| Query: | Spanish Frequency Counts | |
| Author: | Harriet Bowden | |
| Linguistic Field(s): | Applied Linguistics; Computational Linguistics; Language Documentation; Text/Corpus Linguistics | |
| Summary: |
Regarding query http://www.linguistlist.org/issues/15/15-3168.html#1
We posted a query in our search for a frequency dictionary/list for Spanish with the following preferred criteria: (a) on-line or searchable database, (b) counts that distinguish different parts of speech, (c) lemma (root) counts as well as surface frequency counts, and (d) based on a large corpus with a variety of international sources, not just from one region.

Harriet Wood Bowden
Michael Ullman
Georgetown University

****

First, we'd like to thank everyone who answered our query, posted on CHILDES and LinguistList (please forgive any omissions):

Donna Jackson-Maldonado
Adelina Estévez
Padraic Monaghan
Maria R. Brea-Spahn, M.S., CCC-SLP
Miquel Serra i Raventos
Ana Codesido
David Eddington
Carolina Iribarren
Adam Albright
Sarah Callahan

Second, we provide the link to the summary of responses to a similar question, asked in December 2003:
http://listserv.linguistlist.org/cgi-bin/wa?A2=ind0312C&L=linguist&P=R13243

Finally, here is a summary of the responses we received.

1. The LEXESP corpus* from the University of Barcelona adheres to some of the listed requirements. It contains approximately 120,000 words. Syllable frequency is available in the software program (CD-ROM), which, if you are interested, you must order directly from Barcelona. For LEXESP you can go to: http://clic.fil.ub.es/ (once there, go to 'demos' > corpus textuales > consulta corpus). The complete reference: Sebastián, N., Martí, M. A., Carreiras, M., & Cuetos, F. (2000). LEXESP: Una base de datos informatizada del español. Barcelona: Servicio de Publicaciones de la Universitat de Barcelona. http://www.elda.org/catalogue/en/text/L0042.html

2. The Alameda and Cuetos corpus*. You can acquire this program by e-mailing Dr. Alameda. For Cuetos & Alameda you can go to: http://www.psico.uniovi.es/REMA/content.html (just to read something about it; the CD costs ~28 EUR).

*Both of these databases involve Castilian Spanish. In using the databases, you must also be aware that lemmas and their derived forms are included on the same list. Therefore, if your intent is to use these databases to compute the probabilities of sub-syllabic components, you must clean the database first; otherwise your calculations will be inflated.

3. The Diccionario del Español de México, directed by Luis Fernando Lara at El Colegio de México, has such a database. You can find him through their web page: http://mezcal.colmex.mx/dem/

4. L0042 : PAROLE Spanish lexicon: The PAROLE Spanish lexicon follows the standard PAROLE architecture, which includes morphological and syntactic layers. It includes the most frequent words found in a 1-million-word corpus, coded according to the PAROLE specifications. The lexicon contains about 22,000 morphological units, of which 12,209 are common nouns, 3,367 verbs, and 4,996 adjectives. Closed-class categories are fully covered. The information associated with each morphological unit concerns part of speech and subtype, inflection paradigm (with morphosyntactic information for the endings organised in about 132 models), possible stems in relation to the relevant endings, and linking with the syntactic layer. In the syntactic layer, information regarding subcategorisation for verbs and insertion context for nouns is encoded following the PAROLE model. http://www.elda.org/catalogue/en/text/L0042.html

5. The LDC website might be useful, but I haven't found anything matching your needs there from a quick browse: http://www.ldc.upenn.edu/

6. A new frequency dictionary of Spanish is in press at Routledge. The author is Mark Davies (Mark_davies@byu.edu). It is lemmatized and tagged. This appears to be the link: www.corpusdelespanol.org

7. An old but good reference is A. Juilland and E. Chang-Rodríguez (1964), Frequency Dictionary of Spanish Words. It gives indices of frequency and of use for tokens and types. It also lists the most common form of each word studied. It is old and not in electronic form, but it might be useful. |
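
As a rough illustration of the caveat that lemmas and their derived forms share one list, here is a minimal Python sketch. The row layout (entry, kind, frequency), the example counts, and the function name `initial_bigram_counts` are all hypothetical and are not taken from LEXESP or the Alameda and Cuetos database. The sketch also assumes that each lemma row aggregates the counts of its forms, and it uses the word-initial two-letter sequence as a crude stand-in for a sub-syllabic unit.

```python
from collections import Counter

# Hypothetical rows: (entry, kind, frequency). This layout is an assumption
# for illustration only, not the actual format of either database.
ROWS = [
    ("cantar", "lemma", 60),  # assumed to aggregate the three forms below
    ("canto", "form", 30),
    ("cantas", "form", 10),
    ("cantar", "form", 20),
    ("casa", "lemma", 50),    # assumed to aggregate the two forms below
    ("casa", "form", 35),
    ("casas", "form", 15),
]


def initial_bigram_counts(rows, skip_lemmas):
    """Sum frequencies by word-initial two-letter sequence."""
    counts = Counter()
    for entry, kind, freq in rows:
        if skip_lemmas and kind == "lemma":
            continue  # drop aggregate rows so no token is counted twice
        counts[entry[:2]] += freq
    return counts


raw = initial_bigram_counts(ROWS, skip_lemmas=False)
cleaned = initial_bigram_counts(ROWS, skip_lemmas=True)
print(raw["ca"], cleaned["ca"])  # prints: 220 110
```

Under these assumptions, the uncleaned tally is exactly double the cleaned one, which is the kind of inflation the respondent warns about. A real database would need its own rule for telling lemma rows from form rows.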
|
| LL Issue: | 15.3326 | |
| Date Posted: | 29-Nov-2004 | |


