The LINGUIST List is dedicated to providing information on language and language analysis, and to providing the discipline of linguistics with the infrastructure necessary to function in the digital world. LINGUIST is a free resource, run by linguistics students and faculty, and supported primarily by your donations. Please support LINGUIST List during the 2016 Fund Drive.
Get Involved at LINGUIST List
Graduate Assistantships and Student Employment
There are various projects, presentations, meetings, and other activities at The LINGUIST List. Let us know if you would like to participate in any of these. See the online calendar for more details.
The LINGUIST List, with the support of donations from subscribers and the publishing community and the hosting institution Indiana University (College of Arts and Sciences), currently provides support for 4 graduate students, who serve as LINGUIST List editors and participate in the ongoing research projects and activities. Graduate assistants are usually enrolled in one of the Indiana University linguistics programs, although some past GAs have been pursuing degrees in computer or cognitive science, informatics, library science, or a related discipline.
If you are interested in pursuing graduate studies at Indiana University in any of the linguistics programs, please consult the Department of Linguistics website and ask the program coordinators for advice and support. If you are interested in getting involved and working at The LINGUIST List, contact us to discuss options and opportunities.
The LINGUIST List offers summer internships for undergraduate students and even for high school seniors. If you are interested in working with The LINGUIST List over the summer, consider applying for our summer internship program!
Interns at The LINGUIST List have the opportunity to participate in its daily operations. They also get involved in interdisciplinary research projects and activities under the supervision of faculty based at Indiana University or visiting The LINGUIST List.
Internship Program at the LINGUIST List during summer 2016!
The program runs for three months, preferably between mid-May and mid-August 2016. In specific cases, interns can also join us earlier or later in the summer. Please indicate your preferences in the application.
Detailed application instructions can be found in the LINGUIST List call issue.
During the internship you will work on LINGUIST List tasks, including editing submissions to the LINGUIST List and corresponding with linguists, and you will have the opportunity to work on concrete research projects related to language and STEM sub-disciplines. These projects deal with language documentation, speech and language data, the engineering of software solutions and algorithms, mathematical concepts and methods, and technologies.
Depending on their individual interests and skills, interns can get involved in the following projects for a certain proportion of their work time:
Language and Location: A Map Accessibility Project (LL-MAP), with Damir Cavar, Malgorzata Cavar, Lwin Moe: This project began as a joint NSF-sponsored project of Eastern Michigan University and Stockholm University. In LL-MAP, language information is integrated with data from the physical and social sciences by means of a Geographical Information System (GIS). This project is being redesigned and restarted at Indiana University. We are working on new components and extensions of the GIS system, as well as the integration of the maps and geo-linguistic data in other language and linguistic information systems.
GeoLing: a GIS-based information service for linguistic events, jobs, and institutions worldwide, with Damir Cavar and Lwin Moe: This service maps all events and posts on LINGUIST List that have geo-coordinates onto a GIS system. A modern backend infrastructure handles the mapping. We are extending this service with various NLP and HLT components for extracting information and content from postings on mailing lists, as well as for smart classification or clustering of the content of unstructured text.
MultiTree, with Damir Cavar, Malgorzata Cavar, Lwin Moe: The MultiTree project is a digital library of scholarly hypotheses about language relationships and subgroupings. This information is organized in a searchable database with a web interface, and each hypothesis is presented graphically as a diagram of a family tree. At the moment we are not only extending the data content, but also working on the integration of the data and system in other language and linguistic information systems, and designing the new interface.
Interns may also work on the General Ontology for Linguistic Description (GOLD), the Lexical Enhancement via the GOLD Ontology (LEGO) project, or language resource and technology projects related to under-resourced and endangered languages, in addition to assisting with the LINGUIST List mailing list and website. These environments and systems are being reactivated at the new hosting institution. Interns are welcome not only to help us extend and develop the content of these systems and information resources, but also to provide technological assistance and support in redeveloping them and in engineering new technologies for these and for new language data sets.
Picard speech corpus with Julie Auger and help from Damir Cavar, Malgorzata Cavar. Interns may participate in the preparation of a corpus of oral and written Picard, an endangered Gallo-Romance language spoken in Northern France. The project will involve the transcription of audio and video recordings using ELAN and the preparation of modules for the development of an automatic speech recognition system and forced aligner for Picard, as well as possible participation in original research projects.
Crow Language Documentation and Revitalization and corpus project with The Language Conservancy. This is a project related to the job description on the LINGUIST List page. The intern can get involved in preparing the language material collected during fieldwork, developing wordlists and lexicons, building text and speech corpora, and many more related activities. For more details, see the Crow Language Consortium website.
Arikara, Hidatsa, Mandan Language Documentation and Revitalization and corpus project with The Language Conservancy. This is a project related to the job description on the LINGUIST List page. The intern can get involved in preparing the language material collected during fieldwork, developing wordlists and lexicons, building text and speech corpora, and many more related activities. For more details, see the MHA Language Project Mandan website, the MHA Language Project Hidatsa website, and the MHA Language Project Arikara website.
A pipeline for annotating Latin with Kalani Craig and Sandra Kuebler: This project focuses on creating a pipeline of tools that people without a technical background can use to annotate Latin with POS tags and dependencies. The first step in such a pipeline would be spelling normalization, potentially followed by morphological analysis, and then POS tagging and dependency parsing.
The DAPS (Detecting Anomalous Parse Structures) project with Markus Dickinson and Amber Smith. This project seeks to develop ways of detecting errors in dependency parse structures, especially when the parsers are of poor quality, as in the case of lower-resourced languages and new domains. Our ultimate goal is to assist in building such new annotated corpora quickly and in a way that ensures the best annotation quality. Depending upon interest, interns may: 1. develop a clean, intuitive, and flexible interface for annotators to easily investigate error suggestions and explore the data, as part of developing a workflow to incorporate suggestions for annotators; 2. linguistically investigate properties of the technology and output across a range of data types and languages; 3. improve the current code by making it more efficient, in particular by parallelizing some of the procedures; or 4. develop technology to suggest corrections automatically and incorporate it into the pipeline.
Yiddish speech corpus and ASR with Dov-Ber Kerler (Indiana University), Damir Cavar, Malgorzata Cavar, Lwin Moe. Interns may participate in the preparation of Yiddish speech and text corpora, language data and modules for an automatic speech recognition system, part-of-speech taggers and syntactic parsers, or training of machine translation models from parallel corpora.
Chatino speech corpus and ASR with Hilaria Cruz (UT Austin, AILLA), Damir Cavar, Malgorzata Cavar, Lwin Moe. Interns may participate in the preparation of a speech corpus and language data and modules for the development of an automatic speech recognition system and forced aligner for Chatino.
Phonation-Type Contrast Perception, Typology, and Resourcing with Kelly Harper Berkson. In the Phonetics and Phonology lab at IU we investigate the production and perception of phonation type contrasts. Current work focuses on breathy voiced sonorants in Marathi, breathy vowels in Gujarati, and a nascent investigation of phonation contrasts in Khams Tibetan. Students interested in Marathi or Gujarati are particularly encouraged to apply, as are speakers of languages featuring uncommon laryngeal or phonation-type contrasts. In addition to annotation projects and the development of speech corpora, summer interns attached to this project will have the opportunity to participate in experimental work. Based on their individual interests, they may choose to gain experience using experiment design and presentation software, pursue high-quality audio recording projects, and/or conduct perception experiments.
Chimpoto corpus with Robert Botne (Indiana University) and the LINGUIST List crew. Interns will be able to work with Chimpoto, analyze and transcribe audio files, and potentially enter the material into an ELAN format.
German speech corpus for Texas-German with Hans Boas (UT Austin), Damir Cavar, Malgorzata Cavar, Lwin Moe. Interns may participate in the preparation of a speech corpus and the language data for the development of an automatic speech recognition system and a forced aligner for the Texas-German corpus (by Hans Boas), and for related German variants spoken in the USA.
Grammar Engineering (Morphology and Syntax, Finite State and LFG-based) for different languages, including Slavic and Asia-Pacific languages, with Damir Cavar, Malgorzata Cavar, Lwin Moe, and the LINGUIST List crew. Interns work on formal morphologies using Lexc and similar frameworks to develop finite-state morphologies in environments such as XFST, Foma, and OpenFST. These morphologies also serve as the lexical base for formal grammars and parsers.
The projects are related to language data encoding in various formats and are based on standard technologies (e.g. XML and various encoding standards and strategies). Interns will have the opportunity to learn how to use common language technologies for the annotation of language corpora, the creation of digital dictionaries and lexicons, or audio and video recordings for language documentation. The projects also involve the use of common Natural Language Processing (NLP) algorithms and Human Language Technology (HLT) applications. The tasks may also include the development (programming or coding) of new algorithms, or of online tools and interfaces.
We will work with many languages this summer. The languages and technologies currently in focus include, for example:
- Yiddish: speech-corpus development, acoustic and language models, Automatic Speech Recognition for the transcription of audio language data.
- Turkic: computational morphology
- Slavic: dialectal corpora and data, morphological analysis, grammar engineering, syntax and parsing
- Chatino: speech-corpus development, acoustic and language models, Automatic Speech Recognition for the transcription of audio language data.
- Chimpoto: analysis, transcription, ELAN file format.
- Asia-Pacific languages: e.g. Burmese, Thai, Lao, Khmer, Vietnamese
- German: corpora of dialects and variants.
- Data encoding standards: XML (TEI, LIFT, PLS)
- Data storage: XML, SQL in PostgreSQL and MySQL, Unicode text
- Web development: Apache, PostgreSQL, MySQL, CGI, Node, Django, Ruby on Rails, ColdFusion
- NLP tools: CoreNLP, XFST, Foma, OpenFST, XLE, FLE, HTK, Sphinx, many more
- Development environments: Eclipse, NetBeans, PyCharm, VIM, etc.
Internships are generally available for a three-month period between May and August, 20 to 29 hours per week, and interns can receive a modest stipend. Housing is not provided, although The LINGUIST List team and the local linguistics students and faculty at Indiana University in Bloomington can provide some assistance in locating accommodations. International applicants are encouraged, but must have or obtain an appropriate visa, i.e. one that permits them to work, study, or intern in the US.
The LINGUIST List welcomes volunteers with an aptitude in linguistics and technology. If you are in the Bloomington area and able to commit to at least 10 hours per week, contact us about volunteer opportunities during the entire year!
The LINGUIST Team
Learn more about the current LINGUIST crew!
LINGUIST positions attract students from around the world; past and present LINGUIST team members have joined us from many countries. Read more about LINGUIST Listers' experiences or like the LINGUIST List on Facebook or Google+, and follow us on Twitter!
Email Address: linguist@linguistlist.org
The LINGUIST List
Department of Linguistics
Memorial Hall 322
1021 E. 3rd Street
Bloomington, IN 47405-7005
Telephone Number (with voice box and text): +1 812 855-4617