LINGUIST List 35.3069

Tue Nov 05 2024

FYI: Updated languages: Looking for linguist native speakers of various languages

Editor for this issue: Joel Jenkins <joel@linguistlist.org>



Date: 28-Oct-2024
From: Loretta Gasparini <lgasparini@student.unimelb.edu.au>
Subject: Updated languages: Looking for linguist native speakers of various languages

Dear colleagues,

Apologies for reposting – I have some new languages we could include that I previously did not mention, which I think could be of interest.

A team at The University of Melbourne and our industry partner Redenlab (https://redenlab.com/) is working on a pipeline for automated part-of-speech (POS) tagging across different languages. We are looking for linguists who are native speakers of various languages.

UPDATED LIST OF LANGUAGES FOR WHICH WE ARE SEEKING NATIVE SPEAKERS: Afrikaans, Amharic, Asturian, Belarusian, Bengali, Bulgarian, Danish, Estonian, Finnish, French (Canada), Hebrew, Hungarian, Icelandic, Kazakh, Korean, Kyrgyz, Latvian, Macedonian, Malayalam, Maltese, Marathi, Romanian, Slovak, Slovenian, Spanish (Argentina), Spanish (Cuba), Spanish (Mexico), Spanish (Spain), Swedish, Tagalog, Tamil, Telugu, Thai, Urdu, Uyghur, Welsh, Wolof, Yoruba

I have received enough interest for the following languages (thank you to everyone who has already reached out): Arabic, Armenian, Basque, Catalan, Chinese (Traditional), Chinese (Simplified), Croatian, Czech, Dutch, Farsi/Persian, French (France), Galician, German, Greek, Hindi, Indonesian, Italian, Japanese, Lithuanian, Norwegian, Polish, Portuguese (Brazil), Portuguese (Portugal), Russian, Serbian, Spanish (Chile), Turkish, Ukrainian, Vietnamese.

We could additionally include other languages I have not mentioned if:

(1) There is a Natural Language Processing (NLP) library with part-of-speech tagging capability that supports that language. For example, see the languages currently supported by Stanza (https://lindat.mff.cuni.cz/repository/xmlui/handle/11234/1-5150).

(2) We can source a translation of the North Wind and the Sun fable in that language. Many ‘Illustrations of the IPA’ papers contain one (https://scholars.sil.org/kenneth_s_olson/ipa_illustrations). Alternatively, you may be able to provide the translation yourself; I can send the English passage and translations in other languages if helpful.

Apologies that this excludes many languages. Here is information on how to add new languages to Stanza (https://stanfordnlp.github.io/stanza/new_language.html). We would welcome replications of our current study with new languages as they become available in NLP libraries in future.

THE WORK: We're looking for 1-2 linguists to label a 120-word passage for its parts of speech in their native language (estimated max 2 hours). The workload may be longer if we also need to source a translation. We are aiming for the translation (if needed) and POS tagging to be completed by the end of November 2024.

THE PROJECT: We would then compare the manually labelled parts of speech with the output of available automated methods. This work will be unpaid, but we will write it up as a journal article and will include everyone who does any part-of-speech tagging as a co-author, as part of a consortium. Once the tagging is complete, we aim to have a paper ready to submit in early 2025.

If you are a linguist (Bachelor's or higher degree in Linguistics) who is a native speaker of any language fulfilling the above criteria, feel free to email me ([email protected]) with 1-2 sentences about your degree and experience in Linguistics and any questions, and I will get back to you with more info and next steps.

Regards,
Loretta (Lottie) Gasparini
PhD Candidate
The University of Melbourne
Email: [email protected]; [email protected]

Linguistic Field(s): Computational Linguistics




Page Updated: 05-Nov-2024

