LINGUIST List 2.456

Tue 03 Sep 1991

Sum: French Word Frequencies

Editor for this issue: <>


Directory

  1. Dominique Estival, French Word Frequencies

Message 1: French Word Frequencies

Date: Fri, 30 Aug 1991 14:58:01 +0200
From: Dominique Estival <estival%divsun.unige.chRICEVM1.RICE.EDU>
Subject: French Word Frequencies
Following a request from Dan Kahn <dkthumper.bellcore.com>, I posted an
inquiry on the Langage Naturel list some time ago about word frequency
data in French.
A number of people replied to give references and pointers to databases,
and I hope I have thanked them all individually. I passed the information
they sent me on to Dan, but as it could prove useful to many others, here
is a digest of all the answers I got.
==================
Bibliography
BRUNET, Etienne - Le vocabulaire francais de 1789 a nos
jours d'apres les donnees du TLF (Tresor de la langue francaise)
vol. I - IV, Slatkine, Geneve, 1981
CATACH, Nina "Les listes orthographiques de base du francais (LOB)"
[sous-titre: Les mots les plus frequents et leurs formes flechies les
plus frequentes], ed. Nathan Recherche.
GOUGENHEIM, G., G.R. MICHEA, P. RIVENC, A. SAUVAGEOT (1964)
L'elaboration du francais fondamental (1er degre) : Etude sur
l'etablissement d'un vocabulaire et d'une grammaire de base,
Didier, Paris, 302 p.
Tine Greidanus. 1990. _Les constructions verbales en francais parle. Etude
quantitative et descriptive de la syntaxe des 250 verbes les plus frequents_
(_Linguistische Arbeiten_, 243). Tubingen: Niemeyer.
[inclut des indications tirees de cinq listes de frequence differentes]
LACOUTURE, R et G. LAPALME, Une implantation informatique du
francais fondamental. TSI, Vol 7, no 5, 1988, p 465-475
LEBART, Ludovic; SALEM, Andre (1988), Analyse statistique des
donnees textuelles, Bordas, Paris, 209 p.
MULLER, Charles (1973), Initiation aux methodes de la statistique
linguistique, Hachette, Paris, 187 p.
MULLER, Charles (1977), Principes et methodes de statistique
lexicale, Hachette, Paris, 205 p.
PHAL, Andre (1971), Vocabulaire general d'orientation scientifique
et technique : Part du lexique commun dans l'expression
scientifique, CREDIF, Paris, 128 p.
SAVARD et RICHARDS, Les indices d'utilite du vocabulaire fondamental
francais (Quebec: L'universite Laval, 1970)
==================
==================
Databases
ARTFL, a Cooperative Project between the Centre National de la Recherche
Scientifique (TLF, Nancy) and The University of Chicago, is a Textual
Database of 2000 Texts from 17th-20th Centuries in Literature, Philosophy,
Arts, Sciences...
It can provide frequency lists based on the Tresor de la Langue Francaise
(115 million tokens) for a number of authors and periods.
Contact: Mark Olsen <markgide.uchicago.edu>
The TLF ("Tresor de la langue francaise") database contains a series of
word frequency statistics. The frequency of a word according to the
database is given at the end of each entry.
The ARTFL researchers have established lists giving the frequencies of
words in their texts (literature) by 50-year period.
The lists are in alphabetical order and list everything: typos and nonce
words as well as everyday words.
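
Since these lists are alphabetical and unfiltered, a user will usually want
to re-rank them by frequency and set aside the one-off forms. The sketch
below is only an illustration: the "word<TAB>count" input format, the sample
entries, and the min_count threshold are all assumptions, since the actual
ARTFL list format is not described here.

```python
# Hypothetical input: one "word<TAB>count" pair per line, in alphabetical order.
# This layout is an assumption, not the documented ARTFL format.
raw = "aimer\t120\naimmer\t1\nmaison\t340\nzinzolin\t1\n"

def parse(text):
    """Turn 'word<TAB>count' lines into a list of (word, count) tuples."""
    entries = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        word, count = line.split("\t")
        entries.append((word, int(count)))
    return entries

def by_frequency(entries, min_count=2):
    """Drop entries rarer than min_count, then sort by descending count.

    Ties are broken alphabetically so the output order is deterministic.
    """
    kept = [(w, c) for w, c in entries if c >= min_count]
    return sorted(kept, key=lambda e: (-e[1], e[0]))

ranked = by_frequency(parse(raw))
for word, count in ranked:
    print(word, count)
```

A count threshold is a blunt filter: it removes typos and nonce words that
occur once, but it also removes genuine rare words, so the cutoff should be
chosen with the intended use in mind.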
==================
Works published by the "Francais Fondamental" project contain word
statistics (which still require morphological analysis afterwards).
The idea behind the FF project was to delimit the smallest subset of French
allowing one to understand and be understood. The goal was to teach
this subset of French to foreigners.
Gougenheim et al. contains the frequency of everyday conversation words,
established from recorded conversations.
The dictionary contains the 1000 most frequent French words.
Savard et Richards contains a list of more than 1000 words, taken
from a "basic vocabulary", each word given with its frequency.
==================
The Hansards (bilingual transcripts of the Canadian Parliament debates)
corpus is currently available through the ACL/DCI.
Counting word frequencies in it is pretty straightforward (says Ken Church).
Contact: ACL/DCI
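
As a rough illustration of how simple such counting can be, here is a
minimal sketch. It assumes the text is already available as a plain string,
and it uses a deliberately naive tokenizer (runs of letters, with internal
hyphens kept and apostrophes treated as separators). The sample sentence is
invented and is not taken from the Hansards.

```python
import re
from collections import Counter

# Assumption: a token is a run of letters (accented letters included),
# optionally joined by hyphens. Apostrophes split tokens, so "d'apres"
# yields "d" and "apres". Real corpora need more careful tokenization.
TOKEN_RE = re.compile(r"[^\W\d_]+(?:-[^\W\d_]+)*")

def word_frequencies(text):
    """Return a Counter mapping lowercased tokens to their counts."""
    return Counter(m.group(0).lower() for m in TOKEN_RE.finditer(text))

# Hypothetical sample text for demonstration only.
sample = "Le president a dit que le debat est clos. Le debat reprendra demain."
freqs = word_frequencies(sample)

# Print the three most frequent word forms with their counts.
for word, count in freqs.most_common(3):
    print(word, count)
```

Note that this counts inflected word forms, not lemmas; grouping forms
under a headword would need a separate morphological step.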
==================
==================
Thanks to:
Mark Olsen <markgide.uchicago.edu>
Annie Zaenen <Annie_Zaenen.parcxerox.com>
Ken Church <churchISI.edu>
Evelyne Tzoukermann <evelyneklee.research.att.com>
Patrick Drouin <padrouinlavalvm1.bitnet>
Frederique Molines <molinesirit.fr>
Dusko Vitas <xpmfl02yubgss21.bitnet>
Tony Chadwick <chadantkean.ucs.mun.ca>
Guy Lapalme <lapalmeiro.umontreal.ca>
Gary A. Coen <gcoenemx.utexas.edu>
Bert Peeters <peeterstasman.cc.utas.edu.au>
 Dominique Estival
 ISSCO, Universite de Geneve
 54 rte des Acacias
 CH-1227 Geneve
 tel: +41-22-705-7116
 fax: +41-22-300-1086
 <estivaldivsun.unige.ch>