Language and Cognition

Twenty-eight years of vowels

Gahl, Susanne
Baayen, Harald

Research on age-related changes in speech has primarily focused on comparing “young” vs. “elderly” adults. Yet, listeners are able to guess talker age more accurately than a binary distinction would imply, suggesting that acoustic characteristics of speech change continually and gradually throughout adulthood. We describe acoustic properties of vowels produced by eleven talkers based on naturalistic speech samples spanning a period of 28 years, from ages 21 to 49. We find that the position of vowels in F1/F2 space shifts towards the periphery with increasing talker age. Based on...

Didn't hear that coming: Effects of withholding phonetic cues to code-switching

Shen, Alice
Gahl, Susanne
Johnson, Keith

Code-switching has been found to incur a processing cost in auditory comprehension. However, listeners may have access to anticipatory phonetic cues to code-switches (Piccinini & Garellek, 2014; Fricke et al., 2016), thus mitigating switch cost. We investigated effects of withholding anticipatory phonetic cues on code-switched word recognition by splicing English-to-Mandarin code-switches into unilingual English sentences. In a concept monitoring experiment, Mandarin–English bilinguals took longer to recognize code-switches, suggesting a switch cost. In an eye tracking experiment, the...

The processing of pseudoword form and meaning in production and comprehension: A computational modeling approach using linear discriminative learning

Chuang, Y. Y., Vollmer, M. L., Shafaei-Bajestan, E., Gahl, S., Hendrix, P., & Baayen, R. H.

Pseudowords have long served as key tools in psycholinguistic investigations of the lexicon. A common assumption underlying the use of pseudowords is that they are devoid of meaning: Comparing words and pseudowords may then shed light on how meaningful linguistic elements are processed differently from meaningless sound strings. However, pseudowords may in fact carry meaning. On the basis of a computational model of lexical processing, linear discriminative learning (LDL; Baayen et al., Complexity, 2019, 1–39,...

Berkeley linguists published in PNAS

December 8, 2021

A new article has been published in Proceedings of the National Academy of Sciences, co-authored by four current and former Berkeley linguists (the middle four authors). Congrats, all!

Francis Mollica, Geoff Bacon (PhD 2020), Noga Zaslavsky, Yang Xu, Terry Regier, and Charles Kemp. (2021). The forms and meanings of grammatical markers support efficient communication. Proceedings of the National Academy of Sciences, 118, e2025993118.

Identity-Based Patterns in Deep Convolutional Networks: Generative Adversarial Phonology and Reduplication

Gašper Beguš

This paper models unsupervised learning of an identity-based pattern (or copying) in speech called reduplication from raw continuous data with deep convolutional neural networks. We use the ciwGAN architecture (Beguš, 2021a) in which learning of meaningful representations in speech emerges from a requirement that the CNNs generate informative data. We propose a technique to wug-test CNNs trained on speech and, based on four generative tests, argue that the network learns to represent an identity-based pattern in its latent space. By manipulating only two...

Generative Adversarial Phonology: Modeling Unsupervised Phonetic and Phonological Learning With Neural Networks

Gašper Beguš

Training deep neural networks on well-understood dependencies in speech data can provide new insights into how they learn internal representations. This paper argues that acquisition of speech can be modeled as a dependency between random space and generated speech data in the Generative Adversarial Network architecture and proposes a methodology to uncover the network's internal representations that correspond to phonetic and phonological properties. The Generative Adversarial architecture is uniquely appropriate for modeling phonetic and phonological learning because the network is...

Dąbkowski speaks at CUNY 2021

February 16, 2021

Congrats to Maksymilian Dąbkowski, who will be presenting at the 34th Annual CUNY Conference on Human Sentence Processing (Thursday, March 4, at 3:45pm ET). The title of his talk is "Evidence of accurate logical reasoning in online sentence comprehension" and it is a collaboration with Roman Feiman.

Nichols colloquium

October 8, 2020

The 2020-2021 colloquium series kicks off this coming Monday, October 12, with a talk by Johanna Nichols (UC Berkeley), held via Zoom. The talk is entitled "Proper measurement of linguistic complexity (and why it matters)," and the abstract is as follows:

Hypotheses involving linguistic complexity generate interesting research in a variety of subfields – typology, historical linguistics, sociolinguistics, language acquisition, cognition, neurolinguistics, language processing, and others. Good measures of complexity in various linguistic domains are essential, then, but we have very few and those are mostly single-feature (chiefly size of phoneme inventory and morphemes per word in text).
In other ways as well what we have is not up to the task. The kind of complexity that is favored by certain sociolinguistic factors is not what is usually surveyed in studies invoking the sociolinguistic work. Phonological and morphological complexity are very strongly inversely correlated and form opposite worldwide frequency clines, yet surveys of just one or the other, or both lumped together, are used to support cross-linguistic generalizations about the distribution of complexity writ large. Complexity of derivation, syntax, and lexicon is largely unexplored. Measuring the complexity of polysynthetic languages in the right terms has not been seriously addressed.
This paper proposes a tripartite metric (enumerative, transparency-based, and relational), using a set of different assays across different parts of the grammar and lexicon, that addresses these problems and should help increase the grammatical sophistication of complexity-based hypotheses and choice of targets for computational extraction of complexity levels from corpora. Meeting current expectations of sustainability and replicability, the set is reusable, revealing, reasonably granular, and (at least mostly) amenable to computational implementation. I demonstrate its usefulness to typology and historical linguistics with some cross-linguistic and within-family surveys.