Computational and Experimental Methods

Beguš speaks at ICBS

February 23, 2021

Gašper Beguš will be giving a seminar talk at UC Berkeley's Institute of Cognitive and Brain Sciences on Friday, March 5, from 11:10am to 12pm. The title of his talk is "Modeling Language with Generative Adversarial Networks," and the abstract is below. Congrats, Gašper!

Can we build models of language acquisition from raw acoustic data in an unsupervised manner? Can deep convolutional neural networks learn to generate speech using linguistically meaningful representations? In this talk, I will argue that language acquisition can be modeled with Generative Adversarial Networks (GANs) and that such modeling has implications both for the understanding of language acquisition and for the understanding of how neural networks learn internal representations. I propose a technique that allows us to wug-test neural networks trained on raw speech. I further propose an extension of the GAN architecture in which learning of meaningful linguistic units emerges from a requirement that the networks output informative data. With this model, we can test what the networks can and cannot learn, how their biases match human learning biases (by comparing behavioral data with networks' outputs), how they represent linguistic structure internally, and what GANs' innovative outputs can teach us about productivity in human language. This talk also makes a more general case for probing deep neural networks with raw speech data, as dependencies in speech are often better understood than those in the visual domain and because behavioral data on speech acquisition are relatively easily accessible.

Beguš speaks at UC Davis PhonLab

November 4, 2020

Gašper Beguš will be speaking at the UC Davis PhonLab on Friday, November 6, at 10am, on the topic "Encoding linguistic meaning into raw audio data with deep neural networks."

Bacon joins Google

October 8, 2020

Congrats to Geoff Bacon, who recently filed his dissertation Evaluating linguistic knowledge in neural networks and has just taken up a position as a computational linguist at Google!

Nichols colloquium

October 8, 2020

The 2020-2021 colloquium series kicks off this coming Monday, October 12, with a talk by Johanna Nichols (UC Berkeley), held via Zoom. The talk is entitled Proper measurement of linguistic complexity (and why it matters), and the abstract is as follows:

Hypotheses involving linguistic complexity generate interesting research in a variety of subfields – typology, historical linguistics, sociolinguistics, language acquisition, cognition, neurolinguistics, language processing, and others. Good measures of complexity in various linguistic domains are essential, then, but we have very few and those are mostly single-feature (chiefly size of phoneme inventory and morphemes per word in text).
What we have is also not up to the task in other ways. The kind of complexity that is favored by certain sociolinguistic factors is not what is usually surveyed in studies invoking the sociolinguistic work. Phonological and morphological complexity are very strongly inversely correlated and form opposite worldwide frequency clines, yet surveys of just one or the other, or both lumped together, are used to support cross-linguistic generalizations about the distribution of complexity writ large. Complexity of derivation, syntax, and lexicon is largely unexplored. Measuring the complexity of polysynthetic languages in the right terms has not been seriously addressed.
This paper proposes a tripartite metric (enumerative, transparency-based, and relational) using a set of different assays across different parts of the grammar and lexicon that addresses these problems and should help increase the grammatical sophistication of complexity-based hypotheses and the choice of targets for computational extraction of complexity levels from corpora. Meeting current expectations of sustainability and replicability, the set is reusable, revealing, reasonably granular, and (at least mostly) amenable to computational implementation. I demonstrate its usefulness to typology and historical linguistics with some cross-linguistic and within-family surveys.

Beguš speaks at MIT

September 15, 2020

Gašper Beguš will be giving a talk at the CompLang group at MIT on Tuesday, September 22, at 5pm EDT (2pm Pacific) over Zoom (p/w "Language"). Here is the title and abstract:

Modeling Language with Generative Adversarial Networks

In this talk, I argue that speech acquisition can be modeled with deep convolutional networks within the Generative Adversarial Networks framework. A proposed technique for retrieving internal representations that are phonetically or phonologically meaningful (Beguš 2020) allows us to model several processes in speech and compare outputs of the models both behaviorally and in terms of representation learning. The networks not only represent phonetic units with discretized representations (resembling the phonemic level), but also learn to encode phonological processes (resembling rule-like computation). I further propose an extension of the GAN architecture in which learning of meaningful linguistic units emerges from a requirement that the networks output informative data. I briefly present five case studies (allophonic learning, lexical learning, reduplication, iterative learning, and artificial grammar experiments) and argue that a correspondence between single latent variables and meaningful linguistic content emerges. The key strategy to elicit the underlying linguistic values of latent variables is to manipulate them well outside of the training range; this allows us to actively force desired features in the output and test what types of dependencies deep convolutional networks can and cannot learn.

The advantage of this proposal is that speech acquisition is modeled in an unsupervised manner from raw acoustic data and that deep convolutional networks output not replicated, but innovative data. These innovative outputs are structured, linguistically interpretable, and highly informative. Training networks on speech data thus not only informs models of language acquisition, but also provides insights into how deep convolutional networks learn internal representations. I will also make a case that higher levels of representation such as morphology, syntax, and lexical semantics can be modeled from raw acoustic data with this approach and outline directions for further experiments.
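The latent-manipulation strategy described in the abstract can be illustrated with a rough sketch. This is a hypothetical toy example, not Beguš's actual code: a random numpy matrix stands in for a trained WaveGAN-style convolutional generator, and a single latent dimension is pushed far outside the interval the "generator" saw during training to see how strongly it shapes the output.

```python
# Hypothetical sketch of probing a single latent variable of a GAN
# generator by setting it outside the training range. A toy numpy
# "generator" stands in for a trained convolutional network.
import numpy as np

rng = np.random.default_rng(0)

# Stand-in generator: maps a 100-dim latent vector z to a 1-D "waveform".
W = rng.normal(size=(100, 16384))

def generate(z):
    # tanh keeps outputs bounded, like audio samples in [-1, 1]
    return np.tanh(z @ W)

# During training, latent values would be drawn from U(-1, 1).
z = rng.uniform(-1, 1, size=100)

# To probe what latent dimension 5 encodes, push it well beyond that range.
z_probe = z.copy()
z_probe[5] = 15.0

baseline = generate(z)
probed = generate(z_probe)

# The manipulated dimension now dominates the output, so the change it
# causes is large relative to the baseline sample.
print(np.abs(probed - baseline).mean())
```

In the actual experiments the generator is a deep convolutional network trained on raw speech, and the "forced" feature in the output is a linguistically interpretable property (e.g. presence of a reduplicant or an allophone) rather than an arbitrary change in a toy waveform.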

Bleaman to appear in Frontiers in Artificial Intelligence

April 23, 2020

Congrats to Isaac Bleaman, whose article "Implicit standardization in a minority language community: Real-time syntactic change among Hasidic Yiddish writers" has been accepted for publication in Frontiers in Artificial Intelligence. The article will appear in the section Language and Computation, as part of the research topic Computational Sociolinguistics.

Berkeley @ Hispanic Linguistics Symposium

October 31, 2019

Ernesto Gutiérrez Topete writes to share news of a number of Berkeley talks recently presented at the Hispanic Linguistics Symposium, University of Texas at El Paso, on Oct. 24-26, 2019:

  • Ben Papadopoulos: Morphological gender innovations in Spanish of non-binary speakers
  • Justin Davidson: La [v]ariabilidad sociofonética en el español de California: Social and Linguistic Underpinnings of the Labiodentalization of /b/
  • Ernesto Gutiérrez Topete: Influence from English on the production of the /tl/ cluster by Mexican Spanish-English bilinguals
  • Gabriella Licata, Annie Helms, Rachel Weiher: Merger in production and perception? Bilingual discrimination of Spanish [β] and [v]
  • Justin Davidson: Navigating the Statistical Tides: An R Tutorial for the Non-Coding-Inclined [workshop]

Congrats all!

Research group meetings & talk series this semester

September 5, 2019
Calques has been made aware of the following research groups and talk series meeting this semester:
  • Experimental Phonology Working Group -- meeting on Mondays, 10:30-11:30am, in Dwinelle 1226. The first meeting will be Monday, September 9. Contact Jesse Zymet for more information.
  • Fieldwork Forum -- meeting on Thursdays, 3:40-5:00pm, in Dwinelle 1303. Organized by Edwin Ko, Emily Drummond, and Wesley dos Santos. More info on the website: Fieldwork Forum
  • Gesture and Multimodality Group -- meeting certain Fridays, 9-11am. Contact Eve Sweetser for more information.
  • Group in American Indian Languages -- meeting dates and times TBD; contact Zach O'Hagan for more information.
  • Language Revitalization Working Group -- meeting Thursdays, 1-2pm, in Dwinelle 3401. More info on the website: Language Revitalization Working Group
  • Metaphor Group -- meeting times TBD; contact Eve Sweetser for more information.
  • Phorum -- meeting Mondays, 12-1pm, in Dwinelle 1229. Organized by Emily Grabowski and Yevgeniy Melguy. More info on the website: Phorum
  • Society of Linguistic Undergraduate Students (SLUgS) -- meeting certain Thursdays, 5pm.
  • Sociolinguistics lab -- meeting on certain Tuesdays, 3:30-5pm, in Dwinelle 1229. The first meeting will be Tuesday, September 10. Contact Isaac Bleaman for more information.
  • Syntax & Semantics Circle -- meeting on Fridays, 3-4:30pm, in Dwinelle 1303. Organized by Tessa Scott & Schuyler Laparle. More info on the website: Syntax and Semantics Circle