Get Involved at LINGUIST List
Graduate Assistantships and Student Employment
The LINGUIST List, with the support of donations from subscribers and the publishing community and the hosting institution Indiana University (College of Arts and Sciences), currently provides support for 4 graduate students, who serve as LINGUIST List editors and participate in the ongoing research projects and activities. Graduate assistants are usually enrolled in one of the Indiana University linguistics programs, although some past GAs have been pursuing degrees in computer or cognitive science, informatics, library science, or a related discipline.
If you are interested in pursuing graduate studies at Indiana University in any of the linguistics programs, please consult the Department of Linguistics website and ask the program coordinators for advice and support. If you are interested in getting involved and working at The LINGUIST List, contact us to discuss options and opportunities.
The LINGUIST List offers summer internships for undergraduate students, and even for high school seniors. If you are interested in working with The LINGUIST List over the summer, consider applying for our summer internship program!
Interns at LINGUIST List have the opportunity to participate in the daily operations of the LINGUIST List. They also get involved in interdisciplinary research projects and activities under the supervision of local Indiana University faculty or faculty visiting The LINGUIST List.
Internship Program at the LINGUIST List during summer 2015!
The internship program has a core period between mid-May and mid-August 2015. It is also possible to join us from mid-June or mid-July and stay until mid-September or mid-October. Please indicate your preferences in the application.
Detailed application instructions can be found in the LINGUIST List call issue.
During the internship you will work on LINGUIST List tasks, including editing submissions to the LINGUIST List and correspondence with linguists, and you will have the opportunity to work on concrete research projects related to language and STEM sub-disciplines: dealing with language documentation, speech and language data, as well as engineering of software solutions and algorithms, mathematical concepts and methods, and technologies.
Depending on their individual interests and skills, interns can get involved in the following projects for a certain proportion of their work time:
Language and Location: A Map Accessibility Project (LL-MAP), with Damir Cavar, Malgorzata Cavar, Lwin Moe: This project began as a joint NSF-sponsored project of Eastern Michigan University and Stockholm University. In LL-MAP, language information is integrated with data from the physical and social sciences by means of a Geographical Information System (GIS). This project is being redesigned and restarted at Indiana University. We are working on new components and extensions of the GIS system, as well as the integration of the maps and geo-linguistic data in other language and linguistic information systems.
MultiTree, with Damir Cavar, Malgorzata Cavar, Lwin Moe: The MultiTree project is a digital library of scholarly hypotheses about language relationships and subgroupings. This information is organized in a searchable database with a web interface, and each hypothesis is presented graphically as a diagram of a family tree. At the moment we are not only extending the data content, but also working on the integration of the data and system in other language and linguistic information systems, and designing the new interface.
Interns may also work on the General Ontology for Linguistic Description (GOLD), the Lexical Enhancement via the GOLD Ontology (LEGO) project, or language resource and technology projects related to under-resourced and endangered languages, in addition to assisting with the LINGUIST List mailing list and website. These environments and systems are being reactivated at the new hosting institution. Interns are welcome not only to help us extend and develop the content of these systems and information resources, but also to provide technological assistance and support in redeveloping and engineering new technologies for these and new language data sets.
A pipeline for annotating Latin, with Kalani Craig and Sandra Kuebler: This project focuses on creating a pipeline of tools that people without a technical background can use to annotate Latin with part-of-speech (POS) tags and dependencies. The first step in such a pipeline would be spelling normalization, potentially followed by morphological analysis, and then POS tagging and dependency parsing.
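To give applicants a rough idea of what "a pipeline of tools" can mean in practice, here is a minimal Python sketch, not the project's actual code. It chains two of the stages named above (spelling normalization and POS tagging) as plain functions. The spelling rules, the toy lexicon, and all function names are hypothetical and chosen only for illustration; morphological analysis and dependency parsing would be added as further stages in the same way.

```python
import re
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Token:
    form: str        # surface form as found in the source text
    norm: str = ""   # normalized spelling, filled in by normalize()
    pos: str = ""    # part-of-speech tag, filled in by tag()


# Hypothetical normalization rules (assumption): map j to i and v to u.
SPELLING_RULES = [(r"j", "i"), (r"v", "u")]

# Hypothetical toy lexicon (assumption): normalized form -> POS tag.
TOY_LEXICON = {"puella": "NOUN", "uidet": "VERB"}


def normalize(tokens: List[Token]) -> List[Token]:
    """Lowercase each form and apply the spelling rules in order."""
    for t in tokens:
        norm = t.form.lower()
        for pattern, repl in SPELLING_RULES:
            norm = re.sub(pattern, repl, norm)
        t.norm = norm
    return tokens


def tag(tokens: List[Token]) -> List[Token]:
    """Look up each normalized form; unknown words get the tag 'X'."""
    for t in tokens:
        t.pos = TOY_LEXICON.get(t.norm, "X")
    return tokens


def run_pipeline(text: str, stages: List[Callable]) -> List[Token]:
    """Split on whitespace, then pass the tokens through every stage."""
    tokens = [Token(form=w) for w in text.split()]
    for stage in stages:
        tokens = stage(tokens)
    return tokens


result = run_pipeline("Puella videt", [normalize, tag])
for t in result:
    print(t.form, t.norm, t.pos)
```

Because every stage takes and returns the same list of tokens, a real normalizer or tagger could replace a toy stage without changing the rest of the chain.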
The DAPS (Detecting Anomalous Parse Structures) project with Markus Dickinson and Amber Smith. This project seeks to develop ways of detecting errors in dependency parse structures, especially when the parsers are of poor quality, as in the case of lower-resourced languages and new domains. Our ultimate goal is to assist in building such new annotated corpora quickly and in a way that ensures the best annotation quality. As part of developing a workflow to incorporate suggestions for annotators, we are looking to develop a clean, intuitive, and flexible annotation interface that allows annotators to easily investigate error suggestions and explore the data.
Yiddish speech corpus and ASR with Dov-Ber Kerler (Indiana University), Damir Cavar, Malgorzata Cavar, Lwin Moe. Interns may participate in the preparation of the language data and modules for an automatic speech recognition system.
Chatino speech corpus and ASR with Hilaria Cruz (UT Austin, AILLA), Damir Cavar, Malgorzata Cavar, Lwin Moe. Interns may participate in the preparation of a speech corpus and language data and modules for the development of an automatic speech recognition system and forced aligner for Chatino.
Phonation-Type Contrast Typology and Resourcing with Kelly Harper Berkson. This project focuses on developing corpora of languages that feature typologically uncommon phonation-type contrasts such as the phonemic use of breathy voice. The current language of investigation is Marathi, with emphasis on its breathy voiced sonorants, and students interested in Marathi are particularly encouraged to apply. Speakers of languages featuring uncommon laryngeal or phonation-type contrasts with a desire to develop corpora or materials related to those languages are also encouraged to apply. In addition to annotation projects and the development of speech corpora, summer interns attached to this project will have the opportunity to participate in experimental work. Based on their individual interests, they may choose to gain experience using experiment design and presentation software, pursue high quality audio recording projects, and/or conduct perception experiments.
Chimpoto corpus with Robert Botne (Indiana University) and the LINGUIST List crew. Interns will be able to work with Chimpoto, analyze and transcribe audio files, and potentially enter the material into the ELAN format.
German speech corpus for Texas-German with Hans Boas (UT Austin), Damir Cavar, Malgorzata Cavar, Lwin Moe. Interns may participate in the preparation of a speech corpus and the language data for the development of an automatic speech recognition system and a forced aligner for the Texas-German corpus (by Hans Boas), and related German variants spoken in the USA.
Grammar Engineering (Morphology and Syntax, Finite State and LFG-based) for different languages, including Slavic and Asia-Pacific languages, with Damir Cavar, Malgorzata Cavar, Lwin Moe, and the LINGUIST List crew. Interns work on formal morphologies using Lexc and similar frameworks to develop Finite State morphologies in environments like XFST, Foma, and OpenFST. These morphologies also serve as the lexical base for formal grammars and parsers.
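For readers unfamiliar with finite-state morphology, the following Python sketch illustrates the general idea behind a lexicon with continuation classes. It does not use XFST, Foma, or OpenFST, and it is not code from the project: the lexicon names, the English toy entries, and the tag strings are assumptions made purely for illustration. Each entry pairs an analysis string with a surface string and names the sub-lexicon that may follow, so the same description can be used for both analysis and generation.

```python
# Each lexicon holds (analysis, surface, next_lexicon) entries, loosely
# mirroring continuation classes. next_lexicon = None ends the word.
# All entries are hypothetical toy data (assumption).
LEXICONS = {
    "Root": [
        ("cat", "cat", "NounInfl"),
        ("dog", "dog", "NounInfl"),
        ("walk", "walk", "VerbInfl"),
    ],
    "NounInfl": [("+N+Sg", "", None), ("+N+Pl", "s", None)],
    "VerbInfl": [("+V+Inf", "", None), ("+V+Past", "ed", None)],
}


def expand(lexicon="Root", analysis="", surface=""):
    """Yield every (analysis, surface) pair licensed by the toy lexicon."""
    for upper, lower, nxt in LEXICONS[lexicon]:
        if nxt is None:
            yield analysis + upper, surface + lower
        else:
            yield from expand(nxt, analysis + upper, surface + lower)


def analyze(surface):
    """Map a surface form to all of its analyses."""
    return sorted(a for a, s in expand() if s == surface)


def generate(analysis):
    """Map an analysis string to all of its surface forms."""
    return sorted(s for a, s in expand() if a == analysis)


print(analyze("cats"))
print(generate("walk+V+Past"))
```

Enumerating all pairs only works for a tiny, finite lexicon like this one; finite-state toolkits compile such descriptions into transducers instead, which is what makes them practical for full-size morphologies.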
The projects are related to language data encoding in various formats and are based on standard technologies (e.g. XML and various encoding standards and strategies). Interns will have the opportunity to learn how to use common language technologies for the annotation of language corpora, the creation of digital dictionaries and lexicons, or audio and video recordings for language documentation. The projects also involve the use of common Natural Language Processing (NLP) algorithms and Human Language Technology (HLT) applications. The tasks may also include the development (programming or coding) of new algorithms, or online tools and interfaces.
This summer we will work with many languages. The languages currently in focus include:
- Yiddish: speech-corpus development, acoustic and language models, Automatic Speech Recognition for the transcription of audio language data.
- Turkic: computational morphology
- Slavic: dialectal corpora and data, morphological analysis, grammar engineering, syntax and parsing
- Chatino: speech-corpus development, acoustic and language models, Automatic Speech Recognition for the transcription of audio language data.
- Chimpoto: analysis, transcription, ELAN file format.
- Asia-Pacific languages: e.g. Burmese, Thai, Lao, Khmer, Vietnamese
- German: corpora of dialects and variants.
The technologies and tools involved include:
- Data encoding standards: XML - TEI, Lift, PLS
- Data storage: XML, SQL in PostgreSQL and MySQL, Unicode text
- Web development: Apache, PostgreSQL, MySQL, CGI, Node, Django, Ruby on Rails, ColdFusion
- NLP tools: CoreNLP, XFST, Foma, OpenFST, XLE, HTK, Sphinx, many more
- Development environments: Eclipse, NetBeans, PyCharm, VIM, etc.
Internships are generally available for a three-month period between May and August, 30 to 40 hours per week, and interns receive a modest stipend. Housing is not provided, although The LINGUIST List team and the local linguistics students and faculty at Indiana University in Bloomington can provide some assistance in locating accommodations. International applicants are encouraged to apply, but must have or obtain an appropriate visa, i.e. one that permits them to work, study, or intern in the US.
The LINGUIST List welcomes volunteers with an aptitude in linguistics and technology. If you are in the Bloomington area and able to commit to at least 10 hours per week, contact us about volunteer opportunities during the entire year!
The LINGUIST Team
Learn more about the current LINGUIST crew!
LINGUIST positions attract students from around the world; past and present LINGUIST team members have joined us from many countries. Read more about LINGUIST Listers' experiences or like the LINGUIST List on Facebook or Google+, and follow us on Twitter!
Email Address: linguist@linguistlist.org
The LINGUIST List
Department of Linguistics
Memorial Hall 322
1021 E. 3rd Street
Bloomington, IN 47405-7005
Telephone Number (with voice box and text): +1 812 391-3602
Fax Number: +1 888 908-2629