Computational and Experimental Methods

Toward understanding the communication in sperm whales

J. Andreas
Gašper Beguš
M. Bronstein
R. Diamant
D. Delaney
S. Gero
S. Goldwasser
D. Gruber
S. de Haas
P. Malkin
N. Pavlov
R. Payne
G. Petri
D. Rus
P. Sharma
D. Tchernov
P. Tønnesen
A. Torralba
D. Vogt
R. Wood
2022

Machine learning has been advancing dramatically over the past decade. Most strides are human-based applications due to the availability of large-scale datasets; however, opportunities are ripe to apply this technology to more deeply understand non-human communication. We detail a scientific roadmap for advancing the understanding of communication of whales that can be built further upon as a template to decipher other forms of animal and non-human communication. Sperm whales, with their highly developed neuroanatomical features, cognitive abilities, social structures, and discrete...

Distinguishing cognitive from historical influences in phonology

Gašper Beguš
2022

Distinguishing cognitive influences from historical influences on human behavior has long been a disputed topic in behavioral sciences, including linguistics. The discussion is often complicated due to empirical evidence being consistent with both the cognitive and the historical approach. This article argues that phonology offers a unique test case for distinguishing historical and cognitive influences on grammar, and it proposes an experimental technique for testing the cognitive factor which controls for the historical factor. The article outlines a model called catalysis for...

Interpreting Intermediate Convolutional Layers In Unsupervised Acoustic Word Classification

Gašper Beguš
Alan Zhou
2022

Understanding how deep convolutional neural networks classify data has been subject to extensive research. This paper proposes a technique to visualize and interpret intermediate layers of unsupervised deep convolutional networks by averaging over individual feature maps in each convolutional layer and inferring underlying distributions of words with non-linear regression techniques. A GAN-based architecture (ciwGAN [1]) that includes a Generator, a Discriminator, and a classifier was trained on unlabeled sliced lexical items from TIMIT. The training process results in a deep...
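The averaging step named in this abstract can be illustrated with a minimal sketch. It assumes a convolutional layer's output is available as a list of feature maps (one per channel), each a list of activations over time. The function name `average_feature_maps` and the toy values are hypothetical and are not taken from the paper; this is not the authors' implementation.

```python
def average_feature_maps(layer_output):
    """Average a layer's feature maps element-wise into one time series.

    layer_output: list of feature maps (one per channel), each a list of
    activations over time; all maps must have the same length.
    (Hypothetical helper for illustration only.)
    """
    if not layer_output:
        raise ValueError("layer_output must contain at least one feature map")
    length = len(layer_output[0])
    if any(len(fm) != length for fm in layer_output):
        raise ValueError("all feature maps must have the same length")
    n_maps = len(layer_output)
    # zip(*...) groups the activations of all channels at each time step.
    return [sum(step) / n_maps for step in zip(*layer_output)]


# Toy layer with 3 feature maps over 4 time steps (made-up values).
toy_layer = [
    [0.0, 1.0, 2.0, 3.0],
    [2.0, 1.0, 0.0, 3.0],
    [4.0, 1.0, 1.0, 3.0],
]
averaged = average_feature_maps(toy_layer)
print(averaged)  # [2.0, 1.0, 1.0, 3.0]
```

Collapsing the channel dimension this way yields one time series per layer, which can then be inspected or compared across layers.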

Terry Regier

Chair of Linguistics, Professor of Linguistics and Cognitive Science

PhD, UC Berkeley

Language and cognition; semantic variation and universals; computational linguistics

Implicit Standardization in a Minority Language Community: Real-Time Syntactic Change among Hasidic Yiddish Writers

Isaac L. Bleaman
2020

The recent turn to "big data" from social media corpora has enabled sociolinguists to investigate patterns of language variation and change at unprecedented scales. However, research in this paradigm has been slow to address variable phenomena in minority languages, where data scarcity and the absence of computational tools (e.g., taggers, parsers) often present significant barriers to entry. This article analyzes socio-syntactic variation in one minority language variety, Hasidic Yiddish, focusing on a variable for which tokens can be identified in raw text using purely morphological...

The Gettysburg Corpus: Testing the Proposition That All Tense /æ/s Are Created Equal

Isaac L. Bleaman
Daniel Duncan
2021

Corpus studies of regional variation using raw language data from the internet focus predominantly on lexical variables in writing. However, online repositories such as YouTube offer the possibility of investigating regional differences using phonological variables, as well. This article demonstrates the viability of constructing a naturalistic speech corpus for sociophonetic research by analyzing hundreds of recitations of Abraham Lincoln's Gettysburg Address. We first replicate a known result of phonetic research, namely, that English vowels are longer in duration before voiced...

Beguš and Zhou published in ICASSP 2022

April 28, 2022

Congrats to Gašper Beguš and Alan Zhou (undergraduate student in the Berkeley Speech & Computation Lab), whose article has been published at ICASSP 2022, the 2022 IEEE International Conference on Acoustics, Speech and Signal Processing. The article, titled "Interpreting Intermediate Convolutional Layers In Unsupervised Acoustic Word Classification," is freely available for a month here: https://ieeexplore.ieee.org/document/9746849

Beguš talks this week (including one tomorrow!)

March 31, 2022

Gašper Beguš gave two invited talks this week, and a third is coming up on Friday (in person in Berkeley):

1. At the Harvard School of Engineering (Soft Math Lab): "Approaching unknown communication systems with unsupervised deep neural networks trained on speech."

2. At École Normale Supérieure (Cognitive Machine Learning Team): "Deep Learning, Language Acquisition, Auditory Brainstem Response, and Phonology"

This Friday's talk:

3. At the UC Berkeley NLP seminar (hybrid): "Cognitive modeling, neural network interpretability, and GANs"
Friday, April 1, from 11 to 12 Pacific. This talk will be held in person in South Hall 202.

Twenty-eight years of vowels

Gahl, Susanne
Baayen, Harald
2019

Research on age-related changes in speech has primarily focused on comparing “young” vs. “elderly” adults. Yet, listeners are able to guess talker age more accurately than a binary distinction would imply, suggesting that acoustic characteristics of speech change continually and gradually throughout adulthood. We describe acoustic properties of vowels produced by eleven talkers based on naturalistic speech samples spanning a period of 28 years, from ages 21 to 49. We find that the position of vowels in F1/F2 space shifts towards the periphery with increasing talker age. Based on...

The processing of pseudoword form and meaning in production and comprehension: A computational modeling approach using linear discriminative learning

Chuang, Y. Y., Vollmer, M. L., Shafaei-Bajestan, E., Gahl, S., Hendrix, P., & Baayen, R. H.
2020

Pseudowords have long served as key tools in psycholinguistic investigations of the lexicon. A common assumption underlying the use of pseudowords is that they are devoid of meaning: Comparing words and pseudowords may then shed light on how meaningful linguistic elements are processed differently from meaningless sound strings. However, pseudowords may in fact carry meaning. On the basis of a computational model of lexical processing, linear discriminative learning (LDL; Baayen et al., Complexity, 2019, 1–39,...