LINGUIST List 32.992

Thu Mar 18 2021

Confs: French; Text/Corpus Linguistics/Online

Editor for this issue: Lauren Perkins <laurenlinguistlist.org>



Date: 17-Mar-2021
From: Michela Russo <mrussouniv-paris8.fr>
Subject: Le Nouveau Corpus d’Amsterdam (NCA) et la Base de Français Médiéval (BFM) : états et perspectives philologiques et linguistiques/The New Amsterdam Corpus (NCA) and the Base de Français Médiéval (BFM): philological and linguistic status and perspectives


Date: 09-Apr-2021 - 09-Apr-2021
Location: Paris/Lyon (Virtual conference), France
Contact: Michela Russo
Meeting URL: https://www.sfl.cnrs.fr/journee-detudes-vendredi-9-avril-2021-10h-17h

Linguistic Field(s): Text/Corpus Linguistics

Subject Language(s): French

Meeting Description:

This workshop focuses on two medieval French corpora. The New Amsterdam Corpus (NCA; 299 literary texts and text excerpts, including 57 prose texts) can be searched online (TWIC online search: https://sites.google.com/site/achimstein/research/resources/nca ) or through a local TXM installation. The Base de Français Médiéval (BFM; 170 texts) is accessible on the BFM-TXM textometric analysis portal (http://txm.bfm-corpus.org) and can also be used through a local TXM installation.

The New Amsterdam Corpus (NCA), edited (revised and lemmatized) by Pierre Kunstmann and Achim Stein, is the new version of the Amsterdam Corpus, a corpus of Old French literary texts created in the early 1980s by Anthonij Dees (Vrije Universiteit Amsterdam) and his collaborators (Piet van Reenen and others). The original corpus resulted in the Atlas of the linguistic forms of literary texts of Old French (Dees et al. 1987).

The forms of these texts were manually annotated by Dees’ team with a set of numerical tags encoding parts of speech and other morphological categories. Some texts are electronic versions of existing editions, others are transcriptions of manuscripts made especially for this corpus.

The aim of this workshop is to introduce the New Amsterdam Corpus (NCA), a digital corpus of literary texts based on the electronic versions provided by Piet van Reenen (Free University of Amsterdam), together with its syntactic annotation and its morphological tagging. The corpus contains about 200 different texts written between the beginning of the twelfth and the end of the fourteenth century; some of them are included in several manuscript versions, giving a total of 299 texts.

Dees’ team also had a corpus of 3300 local, dated original charters (collected mainly by Anthonij Dees and Piet van Reenen). The result of this work was the Atlas of Forms and Constructions of French Charters of the 13th Century (Dees et al. 1980). Thanks to the Vrije Universiteit Amsterdam, a large part of these charters has been digitized (for their grammatical parts: nominal groups, pronominal groups, etc.).

During this workshop, we will focus on the description of these 13th-century charters, the Parisian and Anglo-Norman charters, and the Aube charters (made available thanks to Piet van Reenen), and on their morphological annotation (320,000 words, annotated with POS and numerical codes).

The BFM (Base de français médiéval) has been located at the ENS de Lyon since its inception. Founded in 1989 by Christiane Marchello-Nizia, it is currently managed by Céline Guillot-Barbance, scientific director, and Alexei Lavrentiev, director of digital philology. It contains several digital corpora of French texts written between the 9th and the end of the 15th century. The texts are morphosyntactically annotated and lemmatized, and direct speech passages are encoded. Access to the BFM is open and is provided through the TXM textometric analysis platform, which offers search and analysis functions such as word concordances and searches for textual patterns.

The NCA and the BFM constitute two valuable resources for medieval French.

The French version is available here: https://www.sfl.cnrs.fr/journee-detudes-vendredi-9-avril-2021-10h-17h

Program Information:

See the scheduled program at: https://www.sfl.cnrs.fr/journee-detudes-vendredi-9-avril-2021-10h-17h

Organizers: Michela Russo / Clémence Jaime / Céline Guillot-Barbance / Alexei Lavrentiev

Invited speakers: Achim Stein (Institut für Linguistik/Romanistik, Universität Stuttgart) & Alexei Lavrentiev (ENS de Lyon)

This workshop includes two sessions on medieval French and its digital sources, open to master's and doctoral students. All colleagues and students are cordially invited to participate upon registration.

Contact michela.russocnrs.fr & celine.guillotens-lyon.fr

Abstracts:
Talk by Achim Stein (Institut für Linguistik/Romanistik, Universität Stuttgart)
The New Amsterdam Corpus (NCA): origins, annotation and perspectives
In the first part of this talk, I will present the genesis of the oldest digital corpus of medieval French, from the files established by Anthonij Dees’ team at the Free University of Amsterdam in the 1980s to its re-edition 25 years later. The second part will be devoted to the conversion of the original data and to the attempts at, and challenges of, lemmatization. In the final part, I will discuss the position that the NCA occupies today in the landscape of historical corpora and its usefulness from a philological and technical point of view.

Talk by Alexei Lavrentiev & Céline Guillot-Barbance (IHRIM - CNRS & ENS de Lyon)
The Medieval French Database (BFM = Base de français médiéval) in 2021: current status and ongoing developments
This talk/demonstration will focus on the lesser-known features of the Medieval French Database (BFM = Base de français médiéval). It will deal with morphosyntactic tagging (Cattex and UD) and lemmatization (automatic and verified), as well as quantitative analysis tools (progression, specificities, factorial correspondence analysis, co-occurrences) provided by the TXM application and not yet available on the online portal. The talk will conclude with a presentation of what is new in the BFM 2021 corpus, scheduled for publication in June-July.

Session/Atelier 1
NCA under linguistic analysis. The example of partitivity in Old French (resp. Achim Stein/Michela Russo/Clémence Jaime)
In this group, students will work with the corpus features in the local NCA/TXM installation, from syntactic queries of the kind offered by the TigerSearch interface (implemented online for GRAAL on the BFM/TXM portal) to the diatopic indications (area code, location used in the atlas) and the original annotation of the Amsterdam Corpus. Achim Stein will show students the differences between the results of manual analysis (with reference to the SRCMF, the Syntactic Reference Corpus of Medieval French, http://srcmf.org/) and automatic analysis (of the NCA). He will also introduce the students to automatic (dependency) syntactic analysis of Old French, for example by showing a treebank and applying it to the NCA.
Clémence Jaime (M2 student in "Linguistics and dialectology" at UJM Lyon 3) will use the online and local BFM/NCA/TXM interface (including regular expressions) to illustrate "The example of partitivity in Old French", the subject of her master's thesis.
[Students are advised to install the TXM software: http://textometrie.ens-lyon.fr/spip.php?rubrique61; the NCA: https://sites.google.com/site/achimstein/research/resources/nca; and the TIGERSearch zip archive: nca3-for-tiger.zip]

Session/Atelier 2
(resp. Alexei Lavrentiev/Zeina Tmart & Céline Guillot-Barbance)
In this Session/Atelier, Zeina Tmart (PhD student at ENS de Lyon) will present her research project on the evolution of coordination in French between the 12th and 16th centuries. The presentation will go from the design of the corpus to its annotation with TXM and the exploitation of the results. The workshop will allow students to work on the annotation of concordances with the TXM software. This feature makes it possible to correct errors in automatic tagging and annotation and to add further annotations to the words of the corpus.

Join Zoom Meeting: https://zoom.us/j/94018653392?pwd=TTZzbVNJQTlndk5DaTVYcU8wOGFnZz09
Meeting ID: 940 1865 3392
Passcode: SFL (One tap mobile: Passcode 396522)
Find your local number: https://zoom.us/u/aDZbg0NKx




Page Updated: 18-Mar-2021