Computational and Experimental Methods

Bacon joins Google

October 8, 2020

Congrats to Geoff Bacon, who recently filed his dissertation Evaluating linguistic knowledge in neural networks and has just taken up a position as a computational linguist at Google!

Nichols colloquium

October 8, 2020

The 2020-2021 colloquium series kicks off this coming Monday, October 12, with a talk by Johanna Nichols (UC Berkeley), held via Zoom. The talk is entitled Proper measurement of linguistic complexity (and why it matters), and the abstract is as follows:

Hypotheses involving linguistic complexity generate interesting research in a variety of subfields – typology, historical linguistics, sociolinguistics, language acquisition, cognition, neurolinguistics, language processing, and others. Good measures of complexity in various linguistic domains are essential, then, but we have very few and those are mostly single-feature (chiefly size of phoneme inventory and morphemes per word in text).
In other ways as well what we have is not up to the task. The kind of complexity that is favored by certain sociolinguistic factors is not what is usually surveyed in studies invoking the sociolinguistic work. Phonological and morphological complexity are very strongly inversely correlated and form opposite worldwide frequency clines, yet surveys of just one or the other, or both lumped together, are used to support cross-linguistic generalizations about the distribution of complexity writ large. Complexity of derivation, syntax, and lexicon is largely unexplored. Measuring the complexity of polysynthetic languages in the right terms has not been seriously addressed.
This paper proposes a tripartite metric---enumerative, transparency-based, and relational---using a set of different assays across different parts of the grammar and lexicon, that addresses these problems and should help increase the grammatical sophistication of complexity-based hypotheses and choice of targets for computational extraction of complexity levels from corpora. Meeting current expectations of sustainability and replicability, the set is reusable, revealing, reasonably granular, and (at least mostly) amenable to computational implementation. I demonstrate its usefulness to typology and historical linguistics with some cross-linguistic and within-family surveys.
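For readers curious what a "single-feature" measure of the kind the abstract critiques looks like in practice, here is a minimal sketch of morphemes-per-word, one of the two measures Nichols names. It is purely illustrative and not part of Nichols's proposed metric; the function name and the assumption that input words are pre-segmented with hyphens are ours.

```python
# Illustrative sketch (not Nichols's metric): mean morphemes per word,
# one of the traditional single-feature complexity measures.
# Assumes words are pre-segmented, with morphemes joined by "-".

def morphemes_per_word(segmented_text: str) -> float:
    """Average morpheme count per word in a hyphen-segmented text."""
    words = segmented_text.split()
    if not words:
        return 0.0
    morphemes = sum(len(w.split("-")) for w in words)
    return morphemes / len(words)

# A toy English-like sentence, segmented by hand:
sample = "the dog-s bark-ed loud-ly"
print(morphemes_per_word(sample))  # 1.75
```

A measure this coarse collapses all of derivation, inflection, and compounding into one number, which is exactly the limitation the abstract's tripartite proposal is meant to address.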

Beguš speaks at MIT

September 15, 2020

Gašper Beguš will be giving a talk at the CompLang group at MIT on Tuesday, September 22, at 5pm EDT (2pm Pacific) over Zoom (p/w "Language"). Here is the title and abstract:

Modeling Language with Generative Adversarial Networks

In this talk, I argue that speech acquisition can be modeled with deep convolutional networks within the Generative Adversarial Networks framework. A proposed technique for retrieving internal representations that are phonetically or phonologically meaningful (Beguš 2020) allows us to model several processes in speech and compare outputs of the models both behaviorally and in terms of representation learning. The networks not only represent phonetic units with discretized representations (resembling the phonemic level), but also learn to encode phonological processes (resembling rule-like computation). I further propose an extension of the GAN architecture in which learning of meaningful linguistic units emerges from a requirement that the networks output informative data. I briefly present five case studies (allophonic learning, lexical learning, reduplication, iterative learning, and artificial grammar experiments) and argue that correspondence between single latent variables and meaningful linguistic content emerges. The key strategy to elicit the underlying linguistic values of latent variables is to manipulate them well outside of the training range; this allows us to actively force desired features in the output and test what types of dependencies deep convolutional networks can and cannot learn.

The advantage of this proposal is that speech acquisition is modeled in an unsupervised manner from raw acoustic data and that deep convolutional networks output not replicated, but innovative data. These innovative outputs are structured, linguistically interpretable, and highly informative. Training networks on speech data thus not only informs models of language acquisition, but also provides insights into how deep convolutional networks learn internal representations. I will also make a case that higher levels of representation such as morphology, syntax and lexical semantics can be modeled from raw acoustic data with this approach and outline directions for further experiments.
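The latent-manipulation strategy described in the abstract can be illustrated with a deliberately simplified toy. The sketch below is not Beguš's model: a fixed linear map stands in for a trained GAN generator, and all names are ours. It shows only the probing logic itself, pushing one latent dimension far beyond its training range and checking that the output changes along a single consistent direction.

```python
import random

random.seed(0)
# Stand-in "learned" generator weights: 8 output dims x 4 latent dims.
# In the actual work this would be a trained deep convolutional generator.
W = [[random.gauss(0, 1) for _ in range(4)] for _ in range(8)]

def generate(z):
    """Linear stand-in for a generator mapping latents (trained on ~[-1, 1]) to outputs."""
    return [sum(w * zi for w, zi in zip(row, z)) for row in W]

baseline = generate([0.0, 0.0, 0.0, 0.0])
out = generate([10.0, 0.0, 0.0, 0.0])  # dim 0 pushed far past the training range

# In this linear toy, the change in output is exactly 10 * (column 0 of W):
# the manipulated latent variable surfaces as one consistent output direction.
delta = [o - b for o, b in zip(out, baseline)]
expected = [10.0 * row[0] for row in W]
print(all(abs(d - e) < 1e-9 for d, e in zip(delta, expected)))  # True
```

In a real GAN the generator is nonlinear, so the interest of the method lies in whether extreme latent values nonetheless force a coherent, linguistically interpretable feature in the output rather than noise.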

Bleaman to appear in Frontiers in Artificial Intelligence

April 23, 2020

Congrats to Isaac Bleaman, whose article "Implicit standardization in a minority language community: Real-time syntactic change among Hasidic Yiddish writers" has been accepted for publication at Frontiers in Artificial Intelligence. The article will appear in the section Language and Computation as part of the research topic in Computational Sociolinguistics. Read the abstract here!

Berkeley @ Hispanic Linguistics Symposium

October 31, 2019

Ernesto Gutiérrez Topete writes to share news of a number of Berkeley talks recently presented at the Hispanic Linguistics Symposium, University of Texas at El Paso, on Oct. 24-26, 2019:

Ben Papadopoulos: Morphological gender innovations in Spanish of non-binary speakers
Justin Davidson: La [v]ariabilidad sociofonética en el español de California: Social and Linguistic Underpinnings of the Labiodentalization of /b/
Ernesto Gutiérrez Topete: Influence from English on the production of the /tl/ cluster by Mexican Spanish-English bilinguals
Gabriella Licata, Annie Helms, Rachel Weiher: Merger in production and perception? Bilingual discrimination of Spanish [β] and [v]
Justin Davidson: Navigating the Statistical Tides: An R Tutorial for the Non-Coding-Inclined [workshop]

Congrats all!

Research group meetings & talk series this semester

September 5, 2019

Calques has been made aware of the following research groups and talk series meeting this semester:

Experimental Phonology Working Group -- meeting on Mondays, 10:30-11:30am, in Dwinelle 1226. The first meeting will be Monday, September 9. Contact Jesse Zymet for more information.
Fieldwork Forum -- meeting on Thursdays, 3:40-5:00pm, in Dwinelle 1303. Organized by Edwin Ko, Emily Drummond, and Wesley dos Santos. More info on the website: Fieldwork Forum
Gesture and Multimodality Group -- meeting certain Fridays, 9-11am. Contact Eve Sweetser for more information.
Group in American Indian Languages -- meeting dates and times TBD; contact Zach O'Hagan for more information.
Language Revitalization Working Group -- meeting Thursdays, 1-2pm, in Dwinelle 3401. More info on the website: Language Revitalization Working Group
Metaphor Group -- meeting times TBD; contact Eve Sweetser for more information.
Phorum -- meeting Mondays, 12-1pm, in Dwinelle 1229. Organized by Emily Grabowski and Yevgeniy Melguy. More info on the website: Phorum
Society of Linguistics Undergraduate Students (SLUgS) -- meeting certain Thursdays, 5pm.
Sociolinguistics Lab -- meeting on certain Tuesdays, 3:30-5pm, in Dwinelle 1229. The first meeting will be Tuesday, September 10. Contact Isaac Bleaman for more information.
Syntax & Semantics Circle -- meeting on Fridays, 3-4:30pm, in Dwinelle 1303. Organized by Tessa Scott and Schuyler Laparle. More info on the website: Syntax and Semantics Circle

Goldrick colloquium

April 10, 2019

The 2018-2019 colloquium series continues this coming Monday, April 15, with a talk by Matt Goldrick (Northwestern). Same time as always, same place as always: 3:10-5 p.m., 370 Dwinelle Hall. The talk is entitled Integration and Segregation in Bilingual Sound Structure Processing, and the abstract is as follows:

A key question in theories of language structure and processing is the degree to which different aspects of linguistic knowledge are processed independently or interactively. I'll discuss ongoing work that has examined these issues in the context of bilingual sound structure processing. When producing tongue twisters, bilinguals produce more overt, sound-category-changing speech errors than monolinguals, specifically within nonsense words consisting of language-unique sound structures (e.g., for Spanish-English bilinguals, nonce forms with initial /s/-stop clusters, which are found only in English). However, while 'shared' speech sound categories (e.g., initial stops) are less susceptible to overt errors, they are the locus of within-category deviations in phonetic properties -- an effect which may be magnified in cognate forms (e.g., teléfono/telephone for Spanish-English bilinguals). This suggests a model incorporating integration as well as segregation of sound structure and lexical knowledge, both within and across languages.

Open house colloquium

February 27, 2019

This Monday we will have a series of presentations by current graduate students in the colloquium spot -- 3:10-5pm, 370 Dwinelle:

Alice Shen: Pitch cues in the perception of code switching
Amalia Skilton: Speaker and addressee in spatial deixis: Experimental evidence from Ticuna and Dutch
Emily Clem: The cyclic nature of Agree: Maximal projections as probes
Myriam Lapierre: Two types of [NT]s in Panãra: Evidence from production and perception