Summary Details


Query:   Corpora
Author:  Royle Phaedra
Linguistic Field(s):   Text/Corpus Linguistics

Summary:   I RECENTLY MADE A QUERY ON LINGUIST LIST ABOUT CORPORA FOR
WORD LISTS WITH FREQUENCY COUNTS IN BULGARIAN, POLISH, GREEK, TURKISH
AND ENGLISH (EXCLUDING KUCERA AND FRANCIS). MANY PEOPLE RESPONDED WITH
HELPFUL COMMENTS, WHICH ARE SUMMARISED BELOW. UNFORTUNATELY, NOTHING
WAS FOUND ON GREEK. IF ANY ADDITIONS SEEM NECESSARY, PLEASE WRITE BACK
TO ME.

THANKS,

PHAEDRA
PHD STUDENT
UNIVERSITE DE MONTREAL
CENTRE DE RECHERCHE THEOPHILE ALAJOUANINE

ON ENGLISH:

GAN WEE KEONG

THE BRITISH NATIONAL CORPUS WORD FREQUENCY LISTS GENERATED BY ADAM
KILGARRIFF. THE LISTS ARE ORGANISED INTO SEVERAL CATEGORIES, SO READ
THE README FILE BEFORE DOWNLOADING.

TO GET THE LISTS, CONNECT BY FTP TO:

FTP.ITRI.BTON.AC.UK/PUB/BNC
-------------------------------------------------------------------

RICHARD PIEPENBROCK

THE CELEX CD-ROM PRODUCED BY THE DUTCH CENTRE FOR LEXICAL INFORMATION
IN COLLABORATION WITH THE LINGUISTIC DATA CONSORTIUM

THE SECOND RELEASE OF THE CD-ROM, WHICH CONTAINS THE CELEX LEXICAL
DATABASES OF ENGLISH (VERSION 2.5), DUTCH (VERSION 3.1) AND GERMAN
(VERSION 2.5), IS NOW AVAILABLE FOR RESEARCH PURPOSES FROM THE
LINGUISTIC DATA CONSORTIUM FOR $150. FOR EACH LANGUAGE, THE CD-ROM
CONTAINS DETAILED INFORMATION ON THE ORTHOGRAPHY (VARIATIONS IN
SPELLING, HYPHENATION), THE PHONOLOGY (PHONETIC TRANSCRIPTIONS,
VARIATIONS IN PRONUNCIATION, SYLLABLE STRUCTURE, PRIMARY STRESS), THE
MORPHOLOGY (DERIVATIONAL AND COMPOSITIONAL STRUCTURE, INFLECTIONAL
PARADIGMS), THE SYNTAX (WORD CLASS, WORD-CLASS SPECIFIC
SUBCATEGORISATIONS, ARGUMENT STRUCTURES), AND WORD FREQUENCY (SUMMED
WORD AND LEMMA COUNTS, BASED ON RECENT AND REPRESENTATIVE TEXT
CORPORA) OF BOTH WORDFORMS AND LEMMAS (ENGLISH: 52446 LEMMAS, 160594
WORDFORMS; GERMAN: 51728 LEMMAS, 365530 WORDFORMS; DUTCH: 124136
LEMMAS, 381292 WORDFORMS).
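
The distinction between wordform counts and summed lemma counts can be illustrated with a minimal sketch. The counts and the wordform-to-lemma mapping below are invented for illustration only; they are not taken from CELEX and do not reflect its file format. The sketch also assumes each wordform belongs to exactly one lemma, which real data does not guarantee (for example, WALKS as noun or verb).

```python
from collections import Counter

# Hypothetical wordform counts, invented for this example (not CELEX data).
wordform_counts = Counter(
    {"walk": 40, "walks": 25, "walked": 35, "house": 60, "houses": 15}
)

# Hypothetical wordform-to-lemma mapping; assumes one lemma per wordform.
lemma_of = {
    "walk": "walk",
    "walks": "walk",
    "walked": "walk",
    "house": "house",
    "houses": "house",
}


def lemma_counts_from_wordforms(wordform_counts, lemma_of):
    """Sum the counts of all wordforms that share a lemma."""
    totals = Counter()
    for wordform, count in wordform_counts.items():
        # Wordforms missing from the mapping are treated as their own lemma.
        totals[lemma_of.get(wordform, wordform)] += count
    return totals


lemma_counts = lemma_counts_from_wordforms(wordform_counts, lemma_of)
```

In this toy data the lemma count for "walk" is the sum of the counts for "walk", "walks" and "walked".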

-------------------------------------------------------------------
LLUÍS PADRÓ

I HAVE MADE AVAILABLE BY FTP AN ENGLISH FREQUENCY LIST EXTRACTED FROM
1.1 MILLION WORDS OF WSJ.

FTP ANONYMOUS TO FTP-LSI.UPC.ES
CD PUB/LLUISP
GET WSJ.FREQ
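
The three steps above (anonymous login, change directory, get the file) could be scripted roughly as follows, using Python's standard ftplib module. This is only a sketch under assumptions: the function name and the ftp_factory parameter are hypothetical, the lower-case host and path spellings in the usage comment are guesses (the post gives them in capitals only), and a server announced in 1997 may no longer be reachable.

```python
import ftplib
import io


def fetch_frequency_list(host, directory, filename, ftp_factory=ftplib.FTP):
    """Retrieve one file over anonymous FTP and return its contents as bytes.

    ftp_factory is a hypothetical hook so the function can be exercised
    without a network connection; by default it is ftplib.FTP.
    """
    buffer = io.BytesIO()
    with ftp_factory(host) as ftp:
        ftp.login()          # no arguments: anonymous login
        ftp.cwd(directory)   # equivalent of the CD step
        # Equivalent of the GET step: stream the file into the buffer.
        ftp.retrbinary("RETR " + filename, buffer.write)
    return buffer.getvalue()


# Possible usage (host and path casing are assumptions, not confirmed):
# data = fetch_frequency_list("ftp-lsi.upc.es", "pub/lluisp", "wsj.freq")
# print(len(data), "bytes retrieved")
```

The file is returned as raw bytes because the post does not describe the layout or encoding of WSJ.FREQ, so any parsing is left to the reader.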
-------------------------------------------------------------------

LL Issue: 8.363
Date Posted: 16-Mar-1997