Abstract
Recent conceptualisations of bilingualism are moving away from strict categorisations towards continuous approaches. This study supports this trend by combining empirical psycholinguistic data with machine learning classification modelling. Support vector classifiers were trained on two datasets of coded productions by Italian speakers to predict the class each speaker belonged to (“monolingual”, “attriters” and “heritage”). All classes can be predicted above chance (>33%), although the classifier's performance varies substantially: monolinguals are identified much better (f-score >70%) than attriters (f-score <50%), who are the most confusable class. Further analyses of the classification errors in the confusion matrices show that attriters are identified as heritage speakers nearly as often as they are correctly classified. Cluster clitics are the features that contribute most to classification performance. Overall, this study supports a conceptualisation of bilingualism as a continuum of linguistic behaviours rather than as sets of a priori established classes.
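The quantities named in the abstract (chance level, per-class f-score, confusion matrix) can be illustrated with a minimal sketch. The counts below are invented purely for illustration and are not the study's results; they are chosen only to mirror the qualitative pattern described above. The names `LABELS`, `CONFUSION`, `per_class_f1` and `accuracy` are hypothetical, and the 1/3 chance level assumes three equally likely classes.

```python
# Hypothetical confusion matrix: rows = true class, columns = predicted class.
# These counts are made up for illustration; they are NOT taken from the study.
LABELS = ["monolingual", "attriter", "heritage"]
CONFUSION = [
    [40, 5, 5],    # true monolingual
    [10, 20, 20],  # true attriter: predicted "heritage" as often as "attriter"
    [5, 15, 30],   # true heritage
]


def per_class_f1(matrix):
    """Return one F1 score per class from a square confusion matrix."""
    n = len(matrix)
    scores = []
    for k in range(n):
        true_pos = matrix[k][k]
        predicted = sum(matrix[i][k] for i in range(n))  # column total
        actual = sum(matrix[k])                          # row total
        precision = true_pos / predicted if predicted else 0.0
        recall = true_pos / actual if actual else 0.0
        denom = precision + recall
        scores.append(2 * precision * recall / denom if denom else 0.0)
    return scores


def accuracy(matrix):
    """Return the share of items on the diagonal (correct predictions)."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[k][k] for k in range(len(matrix)))
    return correct / total


# Chance level for uniform guessing over three classes (assumes balanced classes).
chance = 1 / len(LABELS)
f1 = dict(zip(LABELS, per_class_f1(CONFUSION)))

if __name__ == "__main__":
    print(f"chance level: {chance:.2f}, accuracy: {accuracy(CONFUSION):.2f}")
    for label, score in f1.items():
        print(f"{label}: F1 = {score:.2f}")
```

Reading the middle row of such a matrix is what shows how often attriters are labelled as heritage speakers relative to how often they are correctly classified.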
Original language | English |
---|---|
Pages (from-to) | 248-256 |
Number of pages | 9 |
Journal | Bilingualism-Language and Cognition |
Volume | 28 |
Issue number | 1 |
Early online date | 24 May 2024 |
Publication status | Published - Jan 2025 |
Keywords
- attrition
- bilingualism
- classification
- heritage speakers
- support vector machine
Datasets
- Moving to continuous classifications of bilingualism through machine learning trained on language production
  Coco, M. I. (Creator), Smith, G. (Creator), Spelorzi, R. (Creator) & Garraffa, M. (Creator), Open Science Framework, 26 Mar 2024
  Dataset