Computational and Experimental Methods

Implicit Standardization in a Minority Language Community: Real-Time Syntactic Change among Hasidic Yiddish Writers

Isaac L. Bleaman
2020

The recent turn to "big data" from social media corpora has enabled sociolinguists to investigate patterns of language variation and change at unprecedented scales. However, research in this paradigm has been slow to address variable phenomena in minority languages, where data scarcity and the absence of computational tools (e.g., taggers, parsers) often present significant barriers to entry. This article analyzes socio-syntactic variation in one minority language variety, Hasidic Yiddish, focusing on a variable for which tokens can be identified in raw text using purely morphological...

The Gettysburg Corpus: Testing the Proposition That All Tense /æ/s Are Created Equal

Isaac L. Bleaman
Daniel Duncan
2021

Corpus studies of regional variation using raw language data from the internet focus predominantly on lexical variables in writing. However, online repositories such as YouTube offer the possibility of investigating regional differences using phonological variables, as well. This article demonstrates the viability of constructing a naturalistic speech corpus for sociophonetic research by analyzing hundreds of recitations of Abraham Lincoln's Gettysburg Address. We first replicate a known result of phonetic research, namely, that English vowels are longer in duration before voiced...

Beguš and Zhou published in ICASSP 2022

April 28, 2022

Congrats to Gašper Beguš and Alan Zhou (an undergraduate student in the Berkeley Speech & Computation Lab), whose article was published at ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing. The article, titled "Interpreting Intermediate Convolutional Layers In Unsupervised Acoustic Word Classification," is freely available for a month here: https://ieeexplore.ieee.org/document/9746849

Beguš talks this week (including one tomorrow!)

March 31, 2022

Gašper Beguš gave two invited talks this week, and a third is coming up on Friday (in person in Berkeley):

1. At the Harvard School of Engineering (Soft Math Lab): "Approaching unknown communication systems with unsupervised deep neural networks trained on speech."

2. At École Normale Supérieure (Cognitive Machine Learning Team): "Deep Learning, Language Acquisition, Auditory Brainstem Response, and Phonology"

This Friday's talk:

3. At the UC Berkeley NLP seminar (hybrid): "Cognitive modeling, neural network interpretability, and GANs"
Friday, April 1, 11–12 Pacific. This talk will be held in person in South Hall 202.

Twenty-eight years of vowels

Gahl, Susanne
Baayen, Harald
2019

Research on age-related changes in speech has primarily focused on comparing “young” vs. “elderly” adults. Yet, listeners are able to guess talker age more accurately than a binary distinction would imply, suggesting that acoustic characteristics of speech change continually and gradually throughout adulthood. We describe acoustic properties of vowels produced by eleven talkers based on naturalistic speech samples spanning a period of 28 years, from ages 21 to 49. We find that the position of vowels in F1/F2 space shifts towards the periphery with increasing talker age. Based on...
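The peripheral shift described above can be made concrete as distance from the talker's vowel-space centroid, one common index of vowel-space expansion. The formant values below are invented for illustration and are not taken from the study; this is a minimal sketch, not the authors' analysis.

```python
import numpy as np

# Invented example: mean (F1, F2) in Hz for one talker's vowels
# at two ages (values are illustrative, not from the study).
vowels_age21 = {"i": (300, 2300), "a": (700, 1200), "u": (320, 900)}
vowels_age49 = {"i": (280, 2400), "a": (750, 1150), "u": (300, 850)}

def peripherality(vowel_means):
    """Mean Euclidean distance of each vowel from the talker's
    F1/F2 centroid -- larger values mean a more peripheral
    (expanded) vowel space."""
    pts = np.array(list(vowel_means.values()), dtype=float)
    centroid = pts.mean(axis=0)
    return float(np.linalg.norm(pts - centroid, axis=1).mean())

# A shift toward the periphery with age shows up as a larger
# mean distance at the later age.
print(peripherality(vowels_age21), peripherality(vowels_age49))
```

With these invented values the later measurement yields the larger index, which is the signature of the peripheral shift the abstract reports.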

The processing of pseudoword form and meaning in production and comprehension: A computational modeling approach using linear discriminative learning

Chuang, Y. Y.
Vollmer, M. L.
Shafaei-Bajestan, E.
Gahl, S.
Hendrix, P.
Baayen, R. H.
2020

Pseudowords have long served as key tools in psycholinguistic investigations of the lexicon. A common assumption underlying the use of pseudowords is that they are devoid of meaning: Comparing words and pseudowords may then shed light on how meaningful linguistic elements are processed differently from meaningless sound strings. However, pseudowords may in fact carry meaning. On the basis of a computational model of lexical processing, linear discriminative learning (LDL Baayen et al., Complexity, 2019, 1–39,...
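The core idea of linear discriminative learning can be illustrated with a toy example: forms and meanings are represented as numeric vectors, and comprehension is a linear transformation from form space to meaning space, estimated by least squares. The words, feature vectors, and values below are all invented for illustration; this is a minimal sketch of the general technique, not the authors' implementation.

```python
import numpy as np

# Toy form matrix C: one row per word, with invented binary
# form features (e.g., presence of particular letter bigrams).
C = np.array([
    [1, 0, 1, 0],   # "cat"
    [0, 1, 1, 0],   # "bat"
    [1, 1, 0, 1],   # "dog"
], dtype=float)

# Toy semantic matrix S: one invented meaning vector per word.
S = np.array([
    [0.9, 0.1],
    [0.8, 0.2],
    [0.1, 0.9],
])

# Comprehension mapping F: solve C @ F ~= S by least squares.
F, *_ = np.linalg.lstsq(C, S, rcond=None)

# A pseudoword's form vector, passed through F, lands somewhere
# in semantic space -- so even a "meaningless" string receives
# a predicted meaning.
pseudo = np.array([1, 1, 1, 0], dtype=float)  # invented form vector
print(pseudo @ F)
```

Because the mapping is defined for any form vector, pseudowords are assigned semantic vectors too, which is the sense in which they may "carry meaning."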

Beguš speaks at ICON 2021

December 15, 2021

Gašper Beguš will give an invited lecture at ICON 2021: 18th International Conference on Natural Language Processing during a special session on the "Representation of speech, articulatory dynamics, prosody and language in layers." The talk is titled "Interpreting internal representations of deep convolutional neural networks trained on raw speech." Gašper can provide the link to anyone who would like to attend.

Beguš gives two invited talks

November 29, 2021

Gašper Beguš recently gave two invited talks—one at SRPP at Sorbonne Nouvelle (Paris III) and the other at Kuhl Lab Forum, University of Washington—both titled "Interpretable comparison between auditory brainstem response and intermediate convolutional layers in deep neural networks."

Beguš publishes in TACL

November 9, 2021

Gašper Beguš's paper "Identity-Based Patterns in Deep Convolutional Networks: Generative Adversarial Phonology and Reduplication" has just been published in Transactions of the Association for Computational Linguistics (TACL). It is available as an Open Access download here.

The paper was also presented at EMNLP 2021, and a recording of the talk is available.

Congrats, Gašper!

Modeling unsupervised phonetic and phonological learning in Generative Adversarial Phonology

Gašper Beguš
2019

This paper models phonetic and phonological learning as a dependency between random space and generated speech data in the Generative Adversarial Network architecture and proposes a methodology to uncover the network's internal representation that corresponds to phonetic and phonological features. A Generative Adversarial Network (Goodfellow et al. 2014; implemented as WaveGAN for acoustic data by Donahue et al. 2019) was trained on an allophonic distribution in English, where voiceless stops surface as aspirated word-initially before stressed vowels except if preceded by a sibilant...
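The allophonic pattern the network was trained on can be stated as a simple categorical rule. The sketch below encodes that rule for illustration only; the segment inventory and the uppercase-for-stress transcription convention are invented here and are not taken from the paper's training data.

```python
def aspirated(word, i):
    """Return True if the voiceless stop at index i surfaces as
    aspirated: in word-onset position, before a stressed vowel
    (written uppercase here), and not preceded by a sibilant."""
    VOICELESS_STOPS = {"p", "t", "k"}
    SIBILANTS = {"s"}
    STRESSED_VOWELS = {"A", "E", "I", "O", "U"}
    if word[i] not in VOICELESS_STOPS:
        return False
    # Onset position covers both #T and the stop in an initial #sT cluster.
    at_onset = i == 0 or (i == 1 and word[0] in SIBILANTS)
    before_stressed = i + 1 < len(word) and word[i + 1] in STRESSED_VOWELS
    after_sibilant = i > 0 and word[i - 1] in SIBILANTS
    return at_onset and before_stressed and not after_sibilant

# /t/ in "tA" (cf. "top") is aspirated; in "stA" (cf. "stop") it is not.
print(aspirated("tA", 0), aspirated("stA", 1))
```

The interest of the paper is that a GAN trained only on raw acoustic outputs of such a distribution comes to encode the conditioning environment internally; the rule itself, as above, is trivially statable in symbolic form.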