LINGUIST List 32.718: Media: Publication of the CorpAGEst Corpus

Thu Feb 25 2021

Media: Publication of the CorpAGEst Corpus - Free Access on Ortolang

Editor for this issue: Everett Green <everettlinguistlist.org>

Date: 16-Feb-2021
From: Catherine Bolly <catherinebollyhotmail.com>
Subject: Publication of the CorpAGEst Corpus - Free Access on Ortolang

The CorpAGEst project - ‘A corpus-based multimodal approach to the pragmatic competence of the elderly’ - aims to establish the verbal and gestural profile of very old people, looking at their pragmatic competence in real-world settings (Bolly & Boutet, 2018). The corpus data consist of semi-directed, face-to-face interviews between an adult and a very old subject (aged 75 and over) that were audio- and video-recorded, transcribed and aligned to the sound signal. All participants are native speakers of French and are healthy, that is, with no major injury or cognitive impairment identified a priori.

The originality of the method, compared to existing multimodal models, lies in its integrative and comprehensive approach, which aims for maximal exhaustiveness, systematicity and interoperability between modes and languages. It also adopts an extended view of pragmatics by pushing the boundaries of the so-called ‘pragmatic units’ to their lower limit in speech (including, among others, filled pauses and breath intakes) and gesture (including, among others, adaptors and beats). The two-step annotation procedure was developed to avoid interpretative bias at every level of analysis. It starts from a mono-modal approach to spoken and gestural data, respectively, based on the description of linguistic or physiological parameters (e.g. syntactic position of discourse markers, physical configuration of the hand); the analysis then moves to a multimodal functional annotation that takes the overall context of interaction into account. The annotation procedure required selecting and sampling the primary audio and video sources.

The corpus design includes contextual independent variables, such as the environment type (private vs. residential home), the social tie between the participants (familiar vs. unknown interviewer), and the task type (focusing on past events vs. present-day life). Metadata also provide information about the interaction situation (e.g., date, place, duration, quality of the recordings) and about the interviewer and the interviewee (e.g., sex, education, profession, mother tongue, geographic origin, living environment, social tie between interlocutors, subjective scale of life quality and health, scores from clinical testing).

The corpus data (audio, video and transcription), the annotation guidelines, the annotated files and the documentation are freely accessible on the Ortolang platform: https://www.ortolang.fr/market/corpora/corpagest

Linguistic Field(s): Applied Linguistics
                            Discourse Analysis
                            Language Acquisition
                            Neurolinguistics
                            Pragmatics
                            Psycholinguistics
                            Semantics
                            Sociolinguistics
                            Syntax
                            Text/Corpus Linguistics

Subject Language(s): French (fra)
Language Family(ies): Indo-European

Page Updated: 25-Feb-2021