News

All News

November 4, 2020

Gašper Beguš will be speaking at the UC Davis PhonLab on Friday, Nov 6 at 10AM on the topic "Encoding linguistic meaning into raw audio data with deep neural networks."

November 2, 2020

The 2020-2021 colloquium series continues on Monday, Nov 9, with a talk by David Goldstein (UCLA), held via Zoom from 3:10-5pm. The talk is entitled "Correlated grammaticalization: The rise of articles in Indo-European," and the abstract is as follows:

One of the central empirical goals of historical linguistics is to distinguish probable from improbable changes. This includes not only singleton developments, but also interactions among multiple changes. That is, does one linguistic change become more (or less) likely given the occurrence of some other change? Investigations of this question have been hampered by methodological issues, not the least of which is how exactly correlations between changes should be measured. In this talk, I take up the question of the relationship between the grammaticalization of definite and indefinite articles in Indo-European. Did the emergence of one type of article facilitate (or inhibit) the rise of the other? Using methods developed for the study of correlated evolution in biology (Pagel 1994, 2006), I argue that indefinite articles became more likely to emerge in the wake of the grammaticalization of definite articles. The history of articles in Indo-European is thus an example of correlated grammaticalization. More generally, my results provide further evidence for the view that grammaticalization is not solely a matter of universal principles (e.g., van Gelderen 2011, 2019), but is also crucially conditioned by pre-existing linguistic structure (e.g., Kiparsky 2012, Goldstein 2019).
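
For readers unfamiliar with Pagel's method, the heart of the test is a likelihood-ratio comparison between a model in which two binary traits (here, presence of a definite article and presence of an indefinite article) evolve independently and one in which the transition rates for each trait depend on the state of the other. The sketch below is purely illustrative and not part of the talk: the log-likelihoods are placeholder numbers, and in practice both models are fit to the traits on a phylogeny with dedicated software (e.g. BayesTraits or phytools).

    from scipy.stats import chi2

    # Illustrative only: placeholder log-likelihoods standing in for values
    # obtained by fitting Pagel's (1994) independent and dependent models of
    # two binary traits (definite article present/absent, indefinite article
    # present/absent) on a phylogeny of Indo-European languages.
    loglik_independent = -142.7   # independent model: 4 transition-rate parameters
    loglik_dependent = -135.1     # dependent model: 8 transition-rate parameters

    lrt_statistic = 2 * (loglik_dependent - loglik_independent)
    degrees_of_freedom = 8 - 4
    p_value = chi2.sf(lrt_statistic, degrees_of_freedom)

    print(f"LRT = {lrt_statistic:.2f}, p = {p_value:.4f}")
    # A small p-value favors the dependent model, i.e. correlated change
    # between the two article types.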

October 30, 2020

In and around the linguistics department in the next week:

  • Syntax and Semantics Circle - Friday Oct 30 - Zoom - 3-4:30PM
    NELS practice talks:
    Amy Rose Deal (UC Berkeley): 3-on-3 restrictions and PCC typology
    Khanin Chaipet (Stony Brook) and Peter Jenks (UC Berkeley): Names as complex indices: On apparent Condition C violations in Thai
    Edwin Ko (UC Berkeley): Feeding agreement: Anti-locality in Crow applicatives of unaccusatives
  • Fieldwork Forum - Wednesday Nov 4 - Zoom - 3:10-4pm.
    Ignacio Montoya (University of Nevada): Reflections on Numu language (Northern Paiute) classes at the university level: Decolonial strategies within a colonial context and implications for language revitalization theory
  • Zoom Phonology - Thursday Nov 5 - Zoom - 9:00-10:00am PDT
    Larry Hyman (Berkeley): Tone in Runyankore Verb Stem Reduplication
    For the zoom link or to be added to the Zoom Phonology Mailing List, contact karee_garvin@berkeley.edu
  • Phorum - Friday Nov 6 - 3-4pm
    Ana Lívia Agostinho (Universidade Federal de Santa Catarina, Brazil): Word-prosody in Lung’Ie: One system or two? (Collaboration with Larry Hyman)
    Email Anna Björklund or Dakota Robinson for the Zoom link and/or to be added to the mailing list.

October 28, 2020

The 2020-2021 colloquium series continues on Monday, Nov 2, with a talk by Kristen Syrett (Rutgers), held via Zoom from 3:10-4:30. The talk is entitled "What partial objects tell us about context in nominal semantics," and the abstract is as follows:

Becoming a proficient speaker requires recognizing that the context in which we deliver our utterances affects meaning, even at the lexical level. This influence of context is well known for indexicals like now, pronouns like I or you, gradable adjectives such as big, or predicates of personal taste such as fun, which encode context directly into their semantic representation. Chierchia 2010, Landman 2011, and Rothstein 2010 have proposed that context also plays a key role in the interpretation of count nouns. These proposals have implications not only for lexical representations, but also for the process of language acquisition: what does it mean for children to know that nouns like cup and ball—which are among the earliest words a child comprehends and produces—are context-dependent, and what are the observable consequences? To date, no systematic experimental work has targeted this position on the semantics of nouns or its developmental implications, despite a growing body of work on these other context-dependent expressions.

I present a set of studies from an ongoing collaboration with Athulya Aravind (MIT) targeting children’s and adults’ treatment of partial and whole objects as a means of probing nominal semantics. We take as a starting point two separate lines of research, investigating and extending them in parallel. The first is a well-known and oft-replicated finding from Shipley & Shepperson (1990): when young children are presented with a set of partial and whole objects (like forks) and are asked to count or quantify them, they count the partial objects as if they were wholes. The second is experimental research on gradability in the adjectival domain (Syrett, Kennedy, & Lidz, 2010). Integrating our results, we argue that while some researchers have taken children’s non-adult-like counting and quantifying behavior with discrete partial objects to signal either a shift in conceptual development or a lack of knowledge of lexical alternatives that implicates pragmatics, the results are consistent with children’s developing understanding of how nominal semantics shrinks or expands the domain of application; where children diverge from adults is in the ability to identify speaker intentions in a discourse context. Moreover, while nouns may depend on the context, they do so in a way that is distinctly different from relative gradable adjectives—which encode a context-dependent standard—aligning instead with absolute gradable adjectives. Taken together, the findings indicate that even the most basic count nouns depend on the discourse context for interpretation. While adults seem to know this, it is something that children gradually come to recognize as they become increasingly sensitive to the goals of communication.

October 21, 2020

In and around the linguistics department in the next week:

October 19, 2020

The 2020-2021 colloquium series continues on Monday, October 26, with a talk by Juliet Stanton (NYU), held via Zoom from 3:10-4:30. The talk is entitled "Rhythm is gradient: evidence from -ative and -ization," and the abstract is as follows:

The rhythmic constraints *Clash and *Lapse are commonly assumed to evaluate syllable-sized constituents: a sequence of two adjacent stressed syllables (óó) violates *Clash, while a sequence of two stressed syllables, separated by two stressless syllables (óooó), violates *Lapse (see e.g. Prince 1983, Gordon 2002 for *Clash; Green & Kenstowicz 1995, Gordon 2002 for *Lapse). In this talk, I propose that *Clash and *Lapse can be evaluated gradiently: speakers calculate violations from a phonetically realized output representation. The closer the two stressed syllables, the greater the violation of gradient *Clash; the further away the two stressed syllables, the greater the violation of gradient *Lapse. Evidence for this claim comes from patterns of secondary stress in Am. English -ative and -ization: in both classes of forms, the inner suffix (-ate and -ize) is more likely to bear stress the further away it is from the rightmost stem stress. Time permitting, we will discuss other sources of evidence for gradient rhythm, including Am. English post-tonic syncope (Hooper 1978), the rhythm rule (e.g. Hayes 1984), and secondary stress in Russian compounds (Gouskova & Roon 2013).
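
As a purely illustrative aside (the abstract does not spell out the violation functions), gradient evaluation can be pictured as each constraint assigning a real-valued violation that scales with the phonetic distance between the two stressed syllables. The inverse and linear forms in the sketch below are assumptions made for illustration, not Stanton's proposal.

    # Toy sketch of gradient rhythmic constraints: violations are computed from
    # a phonetically realized output (here, the distance in ms between two
    # stressed vowels) rather than from counts of intervening syllables. The
    # specific functional forms are assumptions made for this illustration.

    def gradient_clash(distance_ms: float, scale: float = 100.0) -> float:
        """Violation grows as the two stressed syllables get closer together."""
        return scale / distance_ms

    def gradient_lapse(distance_ms: float, scale: float = 100.0) -> float:
        """Violation grows as the two stressed syllables get farther apart."""
        return distance_ms / scale

    # A stress on -ize realized close to the rightmost stem stress incurs a
    # large *Clash violation; one realized far from it incurs a large *Lapse
    # violation.
    for distance in (150.0, 400.0, 800.0):
        print(distance, round(gradient_clash(distance), 3), round(gradient_lapse(distance), 3))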

October 16, 2020

We are delighted to announce that Hannah Sande will be joining Berkeley Linguistics in January 2021! Hannah will begin teaching classes in Fall 2021, and will spend the spring semester advising (remotely) and doing research.

In and around the linguistics department in the next week:

October 15, 2020

Congrats to researcher Bernat Bardagil, whose article Number morphology in Panará has just appeared in Linguistic Variation 20:2!

Here's the latest from the Survey of California and Other Indian Languages:

October 9, 2020

In and around the linguistics department in the next week:

October 8, 2020

Congrats to Geoff Bacon, who recently filed his dissertation Evaluating linguistic knowledge in neural networks and has just taken up a position as a computational linguist at Google!

The program for the 51st annual meeting of the North East Linguistic Society (to be hosted virtually by the Université du Québec à Montréal) has just been released, promising the following presentations by current department members and recent alumni:

  • Amy Rose Deal: 3-on-3 restrictions and PCC typology
  • Peter Jenks: Names as complex indices: On apparent Condition C violations in Thai
  • Laura Kalin and Nicholas Rolle (PhD '18): Deconstructing subcategorization: Conditions on insertion vs. position
  • Edwin Ko: Feeding agreement: Anti-locality in Crow applicatives of unaccusatives

Congrats all!

The 2020-2021 colloquium series kicks off this coming Monday, October 12, with a talk by Johanna Nichols (UC Berkeley), held via Zoom. The talk is entitled "Proper measurement of linguistic complexity (and why it matters)," and the abstract is as follows:

Hypotheses involving linguistic complexity generate interesting research in a variety of subfields – typology, historical linguistics, sociolinguistics, language acquisition, cognition, neurolinguistics, language processing, and others. Good measures of complexity in various linguistic domains are essential, then, but we have very few and those are mostly single-feature (chiefly size of phoneme inventory and morphemes per word in text).
In other ways as well, what we have is not up to the task. The kind of complexity that is favored by certain sociolinguistic factors is not what is usually surveyed in studies invoking the sociolinguistic work. Phonological and morphological complexity are very strongly inversely correlated and form opposite worldwide frequency clines, yet surveys of just one or the other, or both lumped together, are used to support cross-linguistic generalizations about the distribution of complexity writ large. Complexity of derivation, syntax, and lexicon is largely unexplored. Measuring the complexity of polysynthetic languages in the right terms has not been seriously addressed.
This paper proposes a tripartite metric – enumerative, transparency-based, and relational – using a set of different assays across different parts of the grammar and lexicon. The metric addresses these problems and should help increase the grammatical sophistication of complexity-based hypotheses and the choice of targets for computational extraction of complexity levels from corpora. Meeting current expectations of sustainability and replicability, the set is reusable, revealing, reasonably granular, and (at least mostly) amenable to computational implementation. I demonstrate its usefulness to typology and historical linguistics with some cross-linguistic and within-family surveys.
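
As a side note on the single-feature measures mentioned in the abstract, morphemes per word in text is the kind of quantity that lends itself to straightforward computational extraction. The sketch below is a toy illustration under the assumption that the text is already morpheme-segmented with hyphens; it is not part of Nichols's proposed metric.

    # Toy illustration of a single-feature complexity measure: average morphemes
    # per word in running text. Assumes morpheme boundaries are already marked
    # with hyphens (an assumption made for this sketch).

    def morphemes_per_word(segmented_text: str) -> float:
        words = segmented_text.split()
        if not words:
            return 0.0
        morpheme_count = sum(len(word.split("-")) for word in words)
        return morpheme_count / len(words)

    # e.g. a hyphen-segmented line: 8 morphemes over 3 words
    print(morphemes_per_word("xi-mo-ka-wa dog-PL run-PAST"))  # 2.666...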

October 2, 2020

Updates from the Survey of California and Other Indian Languages:

  • Isabel Lazo Martínez, Efraín Lazo Pérez, Trinidad Martínez Soza, Julia Nee and Celine Rezvani publish Beniit kon xpejigan: Te libr ka didxza kon dixtil le’enin te rului’in dnumbr ('Benita and Her Balloons: A Book Written in Zapotec and Spanish that Teaches Numbers'), the second in our new series Publications in Language Maintenance and Reclamation. (Interested in contributing? Write to us at scoil-ling@berkeley.edu.)
  • New materials from the winter-spring 1971 and 1979 graduate field methods classes on Cochabamba Quechua are available (here and here). The first was taught by James Matisoff, with consultant Jaime Daza; the second was taught by Leanne Hinton, with consultant Ditri Daza. (See here for a summary history of field methods instruction in our department.)
  • We've digitized two manuscripts by Joseph Davidson, Jr. (PhD 1977), his 'special field statement' (1974) and On the Genetic Relationship of Aymara and Inka, indigenous language families of the Andes. Dr. Davidson's dissertation was A Contrastive Study of the Grammatical Structures of Aymara and Cuzco Kechua.
  • Monica Macaulay (PhD 1987) has archived over 1,150 pages of original field notes and 43 cassettes of sound recordings (from 1992) of Chalcatongo and other varieties of Mixtec (Oto-Manguean, Mexico). We added most of the notes to her paper collection, where they join more than 500 pages of typed versions of some of the same notes (everything now scanned and available online), and the recordings to her audio collection, where they join earlier ones done on reel-to-reel tape (from 1982). The remainder of the field notes, which span the period 1981-1992, we added to a new collection documenting the Berkeley field methods classes on the language in 1981 and 1982, with speaker Nicolás Cortés and instructor Leanne Hinton, which were the impetus for Prof. Macaulay's fieldwork in Oaxaca in 1982, 1985, and 1992, primarily with speakers Margarita Cuevas Cortés and Crescenciano Ruiz Ramírez. Sound recordings from the 1985 field trip, done with Prof. Hinton, are in this collection. Macaulay's dissertation was titled Morphology and Cliticization in Chalcatongo Mixtec. The students in the first field methods class were Mariscela Amador-Hernández (PhD 1988), Claudia Brugman (PhD 1988), Nicholas Faraclas (PhD 1989), Gerd Fischer, and Martha Macri (PhD 1988).

October 1, 2020

In and around the linguistics department in the next week:

September 25, 2020

In and around the linguistics department in the next week:

September 18, 2020

In and around the linguistics department in the next week:

September 15, 2020

Gašper Beguš will be giving a talk at the CompLang group at MIT on Tuesday, September 22, at 5pm EDT (2pm Pacific) over Zoom (p/w "Language"). Here is the title and abstract:

Modeling Language with Generative Adversarial Networks

In this talk, I argue that speech acquisition can be modeled with deep convolutional networks within the Generative Adversarial Networks framework. A proposed technique for retrieving internal representations that are phonetically or phonologically meaningful (Beguš 2020) allows us to model several processes in speech and compare outputs of the models both behaviorally and in terms of representation learning. The networks not only represent phonetic units with discretized representations (resembling the phonemic level), but also learn to encode phonological processes (resembling rule-like computation). I further propose an extension of the GAN architecture in which learning of meaningful linguistic units emerges from a requirement that the networks output informative data. I briefly present five case studies (allophonic learning, lexical learning, reduplication, iterative learning, and artificial grammar experiments) and argue that correspondence between single latent variables and meaningful linguistic content emerges. The key strategy to elicit the underlying linguistic values of latent variables is to manipulate them well outside of the training range; this allows us to actively force desired features in the output and test what types of dependencies deep convolutional networks can and cannot learn.

The advantage of this proposal is that speech acquisition is modeled in an unsupervised manner from raw acoustic data and that deep convolutional networks output not replicated, but innovative data. These innovative outputs are structured, linguistically interpretable, and highly informative. Training networks on speech data thus not only informs models of language acquisition, but also provides insights into how deep convolutional networks learn internal representations. I will also make a case that higher levels of representation such as morphology, syntax and lexical semantics can be modeled from raw acoustic data with this approach and outline directions for further experiments.
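
To make the probing strategy described above concrete: once a generator has been trained on raw audio, one can hold the latent vector fixed, push a single latent variable well beyond the interval it was sampled from during training, and inspect what changes in the output. The sketch below is a hypothetical PyTorch illustration, not code from Beguš's models; the file name, latent size, and variable index are all placeholders.

    import torch

    LATENT_DIM = 100    # assumed size of the latent vector z
    PROBE_INDEX = 7     # the single latent variable being manipulated
    # During training, z is typically sampled uniformly from (-1, 1); probing
    # uses values well outside that range.

    # Assumes a complete generator module was saved with torch.save().
    generator = torch.load("trained_generator.pt")
    generator.eval()

    base_z = torch.zeros(1, LATENT_DIM)   # hold all other latent variables fixed

    with torch.no_grad():
        for value in (-5.0, -2.0, -1.0, 0.0, 1.0, 2.0, 5.0):
            z = base_z.clone()
            z[0, PROBE_INDEX] = value
            waveform = generator(z)       # raw audio output from the network
            # Save or inspect `waveform`: if this variable encodes a linguistic
            # feature (e.g. presence of a reduplicant), extreme values should
            # force that feature in the output.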