LINGUIST List 29.2133
Thu May 17 2018
Diss: English; Applied Linguistics: Valeriia Bogorevich: ''Native and Non-Native Raters of L2 Speaking Performance: Accent Familiarity and Cognitive Processes''
Editor for this issue: Sarah Robinson <srobinson@linguistlist.org>
Date: 16-May-2018
From: Valeriia Bogorevich <vb283@nau.edu>
Subject: Native and Non-Native Raters of L2 Speaking Performance: Accent Familiarity and Cognitive Processes
Institution: Northern Arizona University
Program: Applied Linguistics
Dissertation Status: Completed
Degree Date: 2018
Author: Valeriia Bogorevich
Dissertation Title: Native and Non-Native Raters of L2 Speaking Performance: Accent Familiarity and Cognitive Processes
Linguistic Field(s): Applied Linguistics
Subject Language(s): English (eng)
Dissertation Director(s):
Soo Jung Youn
Okim Kang
Dissertation Abstract:
The present study used a mixed methods approach (Tashakkori & Teddlie, 1998; Greene, Caracelli, & Graham, 1989) to investigate potential differences between native English-speaking and non-native English-speaking raters in how they assess L2 students’ speaking performance. Kane’s (2006) argument-based approach to validity served as the theoretical framework. The study challenged the plausibility of the assumptions underlying the evaluation inference, which links observed performance to the observed score and rests on the assumption that raters apply the scoring rubric accurately and consistently.
The study analyzed raters’ scoring patterns when using a TOEFL iBT speaking rubric analytically. The raters provided scores for each rubric criterion (i.e., Overall, Delivery, Language Use, and Topic Development). Each rater received individual training, practice, and calibration experience. All the raters filled out a background questionnaire asking about their teaching experiences, language learning history, the background of students in their classrooms, and their exposure to and familiarity with the non-native accents used in the study.
For the quantitative analysis, two groups of raters, 23 native (North American) and 23 non-native (Russian), graded and commented on speech samples from speakers of Arabic (n = 25), Chinese (n = 25), and Russian (n = 25) L1 backgrounds. The samples were responses to two independent speaking tasks and ranged from low to high proficiency levels. For the qualitative part, 16 raters (7 native and 9 non-native) shared their scoring behavior through think-aloud protocols and interviews. The speech samples graded during the think-aloud sessions came from Arabic (n = 4), Chinese (n = 4), and Russian (n = 4) L1 speakers.
Raters’ scores were examined with Multi-Faceted Rasch Measurement using the FACETS (Linacre, 2014) software to test group differences between native and non-native raters, as well as between raters who were familiar and unfamiliar with the accents of the students in the study. In addition, raters’ comments were coded and used to explore rater group differences. The qualitative analyses involved thematic coding of transcribed think-aloud and interview sessions using content analysis (Strauss & Corbin, 1998) to investigate raters’ cognitive processes and their perceptions of their own rating processes. The coding captured themes such as decision-making and re-listening patterns, perceived severity, criteria importance, and non-rubric criteria (e.g., accent familiarity, L1 match). Afterward, the quantitative and qualitative results were analyzed together to describe potential sources of rater variability, employing a side-by-side comparison of the qualitative and quantitative data (Onwuegbuzie & Teddlie, 2003).
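For readers unfamiliar with FACETS, the abstract does not spell out the model specification, but the standard many-facet Rasch formulation estimated by the software, sketched here with examinee, criterion, and rater facets assumed for illustration, is:

\log\left(\frac{P_{nijk}}{P_{nij(k-1)}}\right) = B_n - D_i - C_j - F_k

where B_n is the ability of examinee n, D_i the difficulty of rubric criterion i, C_j the severity of rater j, and F_k the threshold for awarding score category k rather than k-1. Group contrasts such as native versus non-native, or accent-familiar versus unfamiliar, can then be examined on the estimated rater severities C_j.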
The results revealed no radical differences between native and non-native raters; however, some differing patterns were observed. For example, non-native raters showed more lenient grading toward students whose L1 matched their own. In addition, all raters, regardless of group, demonstrated several rating patterns depending on their focus while listening to examinees’ performances and on how they interpreted the rating criteria during decision-making. The findings can motivate professionals who oversee and train raters at testing companies and intensive English programs to study their raters’ scoring behaviors and individualize training, helping to make exam ratings fair and raters interchangeable.
Page Updated: 17-May-2018