The recent turn to "big data" from social media corpora has enabled sociolinguists to investigate patterns of language variation and change at unprecedented scales. However, research in this paradigm has been slow to address variable phenomena in minority languages, where data scarcity and the absence of computational tools (e.g., taggers, parsers) often present significant barriers to entry. This article analyzes socio-syntactic variation in one minority language variety, Hasidic Yiddish, focusing on a variable for which tokens can be identified in raw text using purely morphological criteria. In non-finite particle verbs, the overt tense marker tsu (cf. English to, German zu) is variably realized either between the preverbal particle and verb (e.g., oyf-tsu-es-n up-to-eat-INF 'to eat up'; the conservative variant) or before both elements (tsu oyf-es-n to up-eat-INF; the innovative variant). Nearly 38,000 tokens of non-finite particle verbs were extracted from the popular Hasidic Yiddish discussion forum Kave Shtiebel (the 'coffee room'; kaveshtiebel.com). A mixed-effects regression analysis reveals that despite a forum-wide favoring effect for the innovative variant, users favor the conservative variant the longer their accounts remain open and active. This process of rapid implicit standardization is supported by ethnographic evidence highlighting the spread of language norms among Hasidic writers on the internet, most of whom did not have the opportunity to express themselves in written Yiddish prior to the advent of social media.
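As a rough illustration of how tokens of this variable might be identified in raw text using only morphological criteria, the following Python sketch counts the two variants with regular expressions. It is not the article's extraction procedure. The particle list is a short hypothetical sample, the names `PARTICLES`, `CONSERVATIVE`, `INNOVATIVE`, and `count_variants` are invented for this example, and romanized input is assumed purely for readability (an assumption, since the forum text itself may use a different script).

```python
import re
from collections import Counter

# Hypothetical, abbreviated particle list in romanization (an assumption,
# not the inventory used in the study).
PARTICLES = ["arayn", "aroys", "avek", "oyf", "oys", "op"]

# Longest particles first so the alternation prefers the longest match.
_P = "|".join(sorted(PARTICLES, key=len, reverse=True))

# Conservative variant: particle + tsu + verb stem + infinitive -n, one word
# (e.g., "oyftsuesn").
CONSERVATIVE = re.compile(rf"\b(?:{_P})tsu\w+n\b")

# Innovative variant: free-standing tsu, then particle + verb stem + -n
# (e.g., "tsu oyfesn").
INNOVATIVE = re.compile(rf"\btsu\s+(?:{_P})\w+n\b")


def count_variants(text: str) -> Counter:
    """Count candidate tokens of each variant in a romanized string."""
    text = text.lower()
    return Counter(
        conservative=len(CONSERVATIVE.findall(text)),
        innovative=len(INNOVATIVE.findall(text)),
    )


if __name__ == "__main__":
    # The sample simply juxtaposes the two example forms from the abstract.
    print(count_variants("oyftsuesn; tsu oyfesn"))
```

String matching of this kind can return false positives (any word that happens to fit the pattern), so the matches are best treated as candidate tokens that still need checking before statistical analysis.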
May 29, 2020
Bleaman, I. L. (2020). Implicit standardization in a minority language community: Real-time syntactic change among Hasidic Yiddish writers. Frontiers in Artificial Intelligence, 3, Article 35. https://doi.org/10.3389/frai.2020.00035