LINGUIST List 30.21

Thu Jan 03 2019

Software: A Toolbox for Phonologizing French Speech Corpora

Editor for this issue: Everett Green

Date: 03-Jan-2019
From: Sharon Peperkamp
Subject: A Toolbox for Phonologizing French Speech Corpora

A toolbox containing an automatic French text phonologizer, which transforms orthographic transcriptions of speech into approximate phonological transcriptions and takes four phonological rules into account, has been made available on GitHub:

The toolbox is tailored for use with CHILDES corpora. Specifically, it contains a module that takes speech transcriptions in CHAT format as input and returns plain orthographic transcriptions of the utterances produced by adults, stripped of all annotations. This output in turn serves as the input to the phonologizer module.
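To make the preprocessing step more concrete, here is a minimal Python 3 sketch of how adult utterances might be pulled out of a CHAT file. It is not the toolbox's own code (which is in Python 2). It assumes that main tiers start with `*` plus a speaker code, that headers (`@`) and dependent tiers (`%`) can be ignored, that tab-initial lines continue the previous tier, and that the target child is coded `CHI`. The names `clean_tier` and `adult_utterances` are hypothetical, and the annotation stripping covers only a handful of common CHAT conventions; retraced words, for instance, are kept rather than removed.

```python
import re
from typing import Collection, List


def clean_tier(text: str) -> str:
    """Strip a few common CHAT annotations from one main tier (simplified, hypothetical)."""
    text = re.sub(r"\x15[^\x15]*\x15", " ", text)               # media time bullets
    text = re.sub(r"\[[^\]]*\]", " ", text)                     # bracketed codes: [/], [//], [= ...], [*]
    text = re.sub(r"[<>]", " ", text)                           # scope markers (retraced words are kept)
    text = re.sub(r"(?<!\S)[&0]\S*", " ", text)                 # &-fillers/fragments and 0-omitted words
    text = re.sub(r"(?<!\S)(?:xxx|yyy|www)(?!\S)", " ", text)   # unintelligible or untranscribed speech
    text = re.sub(r"@\S+", "", text)                            # special-form markers, e.g. "mot@f"
    text = re.sub(r"[()]", "", text)                            # parentheses around unpronounced sounds
    text = re.sub(r"(?<!\S)\+\S*", " ", text)                   # token-initial linkers/terminators, e.g. "+..."
    text = re.sub(r"[+_]", " ", text)                           # compound joiners: "pomme+de+terre"
    text = re.sub(r"[.?!,;:\"]", " ", text)                     # terminators and punctuation
    return " ".join(text.split()).lower()


def adult_utterances(chat_text: str, child_codes: Collection[str] = ("CHI",)) -> List[str]:
    """Return cleaned main-tier utterances of every speaker not listed in child_codes."""
    # Re-attach tab-initial continuation lines to the tier they continue.
    lines: List[str] = []
    for line in chat_text.splitlines():
        if line.startswith("\t") and lines:
            lines[-1] += " " + line.strip()
        else:
            lines.append(line)
    utterances = []
    for line in lines:
        # Main tiers look like "*MOT:\t..."; "@" headers and "%" dependent tiers never match.
        match = re.match(r"\*([A-Z0-9]+):\s*(.*)", line)
        if match and match.group(1) not in child_codes:
            cleaned = clean_tier(match.group(2))
            if cleaned:  # skip utterances left empty, e.g. "xxx ."
                utterances.append(cleaned)
    return utterances
```

Passing a different `child_codes` collection (for example, one that also lists codes used for siblings) changes which speakers are excluded.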

All scripts are written in Python 2 and are easy to modify. In particular, phonological rules may be added, removed or modified, according to the user’s needs and insights.
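One way to keep rules easy to add, remove or modify is to represent each rule as a plain function in an ordered list. The Python 3 sketch below assumes a toy orthography-to-phonology lexicon and a single, simplified /z/-liaison rule. The lexicon, the trigger list and all function names are hypothetical; the announcement does not say which four rules the toolbox implements or how it represents them internally.

```python
from typing import Callable, List

# Hypothetical toy lexicon mapping orthographic words to IPA forms.
LEXICON = {"les": "le", "amis": "ami", "petits": "pəti", "ours": "uʁs", "chats": "ʃa"}

VOWELS = set("aeiouyɛɔøœəɑ")

# A rule takes the orthographic words and their current phonological forms
# and returns updated phonological forms.
Rule = Callable[[List[str], List[str]], List[str]]

# Hypothetical, incomplete list of words that trigger /z/ liaison.
Z_LIAISON_WORDS = {"les", "des", "mes", "petits"}


def z_liaison(orth: List[str], phono: List[str]) -> List[str]:
    """Append /z/ to a trigger word when the next word begins with a vowel."""
    out = list(phono)
    for i in range(len(orth) - 1):
        if orth[i] in Z_LIAISON_WORDS and phono[i + 1][:1] in VOWELS:
            out[i] = phono[i] + "z"
    return out


# Ordered rule list: add, remove or reorder entries to change the output.
RULES: List[Rule] = [z_liaison]


def phonologize(utterance: str, rules: List[Rule] = RULES) -> str:
    orth = utterance.split()
    # Dictionary lookup only; a real phonologizer also needs a fallback for unknown words.
    phono = [LEXICON[word] for word in orth]
    for rule in rules:
        phono = rule(orth, phono)
    return " ".join(phono)
```

With this design, disabling a rule amounts to leaving it out of the list: `phonologize("les amis")` gives `lez ami`, while `phonologize("les amis", rules=[])` gives `le ami`.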

The GitHub repository also contains a short paper describing the toolbox, as well as an example of a phonologized corpus of infant-directed speech. If you use the toolbox and/or the sample corpus, please cite the paper.

Linguistic Field(s): Phonology; Text/Corpus Linguistics

Subject Language(s): French (fra)

Page Updated: 03-Jan-2019