Editor for this issue: Erin Steitz <ensteitz@linguistlist.org>
The 2025 BabyLM Workshop
Date: 05-Nov-2025 - 09-Nov-2025
Location: Suzhou, China
Meeting URL: https://babylm.github.io
Linguistic Field(s): Cognitive Science; Computational Linguistics; Language Acquisition; Psycholinguistics
BabyLM aims to bring together multiple disciplines to answer an enduring question: how can a computational system learn language from limited inputs? Cognitive scientists investigate this question by trying to understand how humans learn their native language during childhood. Computer scientists tackle this question by attempting to build efficient machine-learning systems to accomplish this task. BabyLM brings these two communities together, asking how insights from cognitive science can be used to assemble more sample-efficient language models and how language modeling architectures can inspire research in cognitive science.
Previously, BabyLM has been organized as a competition, challenging participants to train a language model on a human-sized amount of data, up to 100 million words. This year, we expand the scope of BabyLM by presenting it as a workshop. While we will still run the competition, we also invite original research papers at the intersection of cognitive science and language modeling without entry into any competition track (see suggested topics below).
*Competition Tracks*
We are keeping the strict, strict-small, and multimodal tracks from previous years. These allow participants to train on 100M words, 10M words, or 100M words plus unlimited visual data, respectively.
This year, we introduce the **interaction** track. This track facilitates the exploration of feedback and interaction with LLM agents during pretraining. It allows pretrained language models to serve as teacher models, generating textual supervision that student models can use as a training signal; however, student models must still be trained on 100M words or fewer.
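To make the shared word cap concrete, here is a minimal sketch of the budget accounting the tracks imply: every word the student model sees, whether from the static corpus or from teacher-generated supervision in the interaction track, counts against the same cap. All names here are illustrative and not part of any official BabyLM tooling.

```python
BUDGET = 100_000_000  # 100M-word cap (strict and interaction tracks)


def count_words(text: str) -> int:
    """Whitespace word count, the simplest proxy for the budget."""
    return len(text.split())


class WordBudget:
    """Tracks cumulative words consumed by the student model."""

    def __init__(self, budget: int = BUDGET):
        self.budget = budget
        self.used = 0

    def charge(self, text: str) -> bool:
        """Charge a batch of text (corpus data or teacher output) against
        the budget. Returns False if the batch would exceed the cap."""
        n = count_words(text)
        if self.used + n > self.budget:
            return False
        self.used += n
        return True
```

For example, with a toy five-word budget, a three-word batch is accepted but a subsequent four-word batch is refused, leaving the usage counter unchanged.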
*Workshop Topics*
The BabyLM workshop encourages interdisciplinary submissions at the interface of language modeling, cognitive science, language acquisition, and/or evaluation. To this end, we will accept papers on a variety of topics, including but not limited to the following:
* Data-efficient architectures and training methods
* Data curation for efficient training
* Cognitively and linguistically inspired language modeling and evaluation
* Scaling laws; large and small model comparisons
* Cognitively inspired multimodal modeling or evaluation
*Submission and Key Dates*
We will accept submissions through ACL Rolling Review (ARR) or directly through OpenReview. Paper submissions to the workshop can ignore competition entry deadlines. Exact dates will be determined based on official EMNLP guidelines as they become available.
* Early February: Call for papers released
* End of February: Training data released
* End of April: Evaluation pipeline released
* May 19: ARR submission deadline
* Mid-late July: Direct submission deadline
* Mid-August: Direct submission reviews due; ARR commitment deadline
* Early September: Decisions released
* Mid-September: Camera-ready due
* 5-9 November: Workshop at EMNLP in Suzhou
Submissions will be made through OpenReview. Submissions can be full archival papers (or non-archival upon request) and may be up to eight pages in length. Formatting requirements, including length and anonymity requirements at submission time, will follow the standards for EMNLP 2025 workshops. Reviewing will be double-blind. As before, we allow dual submission; however, we do not allow dual publication.
Papers submitted to the workshop will be evaluated on merit and relevance. For competition participants, acceptance is lenient; we plan only to reject competition submissions that make incorrect or unjustified claims, that have significant technical issues, that do not reveal enough methodological details for replication, or that demonstrate only minimal time investment. Feedback will largely be directed toward improving submissions.
See the BabyLM website for more details: https://babylm.github.io
*Contact*
For questions and discussions related to the competition or workshop, please join the BabyLM Slack channel. A link is provided on the BabyLM website.
Page Updated: 05-Mar-2025