Moving to continuous classifications of bilingualism through machine learning trained on language production

Moreno I. Coco, Guiditta Smith, Roberta Spelorzi, Maria Garraffa

Research output: Contribution to journal › Article › peer-review

Abstract

Recent conceptualisations of bilingualism are moving away from strict categorisations towards continuous approaches. This study supports this trend by combining empirical psycholinguistic data with machine learning classification modelling. Support vector classifiers were trained on two datasets of coded productions by Italian speakers to predict the class each speaker belonged to (“monolingual”, “attriters” and “heritage”). All classes can be predicted above chance (>33%), although the classifier's performance varies substantially: monolinguals are identified much better (f-score >70%) than attriters (f-score <50%), who are the most confusable class. Further analyses of the classification errors expressed in the confusion matrices show that attriters are identified as heritage speakers nearly as often as they are correctly classified. Cluster clitics are the most identifying features for the classification performance. Overall, this study supports a conceptualisation of bilingualism as a continuum of linguistic behaviours rather than sets of a priori established classes.
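
The per-class f-scores and the chance level mentioned in the abstract can be read off a confusion matrix. The sketch below is a minimal illustration of that arithmetic, not the authors' pipeline: the counts in `CONFUSION` are invented for the example and are not the study's results, and the names `CLASSES`, `CONFUSION`, `per_class_f1` and `accuracy` are hypothetical. The counts were chosen only to mirror the qualitative pattern described above, with attriters labelled as heritage speakers nearly as often as they are labelled correctly.

```python
from typing import Dict, List

# Class labels taken from the abstract.
CLASSES: List[str] = ["monolingual", "attriter", "heritage"]

# HYPOTHETICAL counts, invented for illustration only; NOT the study's data.
# Rows are true classes, columns are predicted classes, both in CLASSES order.
CONFUSION: List[List[int]] = [
    [40, 5, 5],    # true monolinguals
    [8, 22, 20],   # true attriters: 22 correct, 20 predicted as heritage
    [6, 14, 30],   # true heritage speakers
]


def per_class_f1(matrix: List[List[int]], labels: List[str]) -> Dict[str, float]:
    """Return the F1 score of each class from a square confusion matrix."""
    scores: Dict[str, float] = {}
    for i, label in enumerate(labels):
        tp = matrix[i][i]                          # correctly predicted as class i
        fn = sum(matrix[i]) - tp                   # class i predicted as something else
        fp = sum(row[i] for row in matrix) - tp    # other classes predicted as class i
        denom = 2 * tp + fp + fn
        scores[label] = 2 * tp / denom if denom else 0.0
    return scores


def accuracy(matrix: List[List[int]]) -> float:
    """Return the overall proportion of correct predictions."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total


# With three classes, guessing uniformly at random is right about 1/3 of the time.
chance = 1 / len(CLASSES)
f1 = per_class_f1(CONFUSION, CLASSES)
overall = accuracy(CONFUSION)

if __name__ == "__main__":
    print(f"chance level: {chance:.2f}, overall accuracy: {overall:.2f}")
    for label in CLASSES:
        print(f"{label}: F1 = {f1[label]:.2f}")
```

Reporting a per-class score rather than overall accuracy alone is what makes the asymmetry visible: a classifier can beat chance overall while one class remains hard to separate from its neighbour.
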
Original language: English
Journal: Bilingualism-Language and Cognition
Early online date: 24 May 2024
Publication status: E-pub ahead of print - 24 May 2024

Keywords

  • attrition
  • bilingualism
  • classification
  • heritage speakers
  • support vector machine
