Sexuality, gender, and the voice in (Bay Area) English

Abstract: 

The acoustic signal is the carrier of not only linguistic (as in, what the interlocutor intends to communicate) but also socio-cultural meaning (as in, who a speaker-listener is). The field of sociophonetics hinges on the inextricability of these two facets of speech sounds. It has been established that speakers can both intentionally and unintentionally communicate aspects of social identity, including gender and sexuality, in the subtle distinctions present in their production of sounds in a given language system. In turn, listeners may draw upon the biases engendered by previous encounters with speech to make assumptions about the speaker that contextualize how linguistic information is received and understood. However, particularly in the realm of sexual identity, with its close relationship to gender, findings are not yet robust, in either production or perception, in terms of how particular phonetic variables pattern with identity types. The most supported finding has been a relationship between /s/-fronting and perceptions of sexual orientation, particularly in cisgender men, but notably most work on this topic draws on outgroup perceptions of LGBTQIA+ speech. Additionally, until recently, most work on gender and sexuality conflates sex and gender, gender presentation and gender identity, and more, and relies on forced-choice self- (or researcher-driven) identifications. Crucially, it also centers a narrow demographic where other identities are concerned; namely, most participants are cisgender, monosexual, white, middle class, college-aged, able-bodied, English-speaking individuals, and most studies center men. Bi+ (bisexual, pansexual, etc.), trans, and non-binary groups are particularly underrepresented.
Further, where methodology is concerned, most work on sibilants in particular relies heavily on spectral moments, a treatment of the acoustic signal that has been thoroughly criticized in phonetics. Finally, work in sociophonetics by and large hinges on the use of linear mixed-effects modeling, a useful but limited approach to statistical analysis. This dissertation addresses some of these gaps and shortcomings in two studies.

The first content chapter, a production study, revisits the relationship between sexuality & gender and the production of sibilants and the vowel space, in spontaneous speech from 48 individuals living in the San Francisco Bay Area. These speakers were all college-aged, but came from a range of racio-ethnic, linguistic, geographic, and socioeconomic backgrounds; they self-identified in each of these areas via open-ended responses and also responded to additional Likert-scale questions related to gender and sexuality. I analyzed their sibilant production using both spectral moments and Major Peaks Analysis (MPA) as a newer alternative, as well as how the sexuality-gender groups (treated together throughout) differed across the vowel space, using all available social information about the speakers to select the best statistical models. Results for /s/ production suggest a robust difference between sapphic non-binary and bi+ feminine-identifying groups only, but several pairwise differences emerged for F2. Bi+ speakers across genders also most frequently differ from other groups. Results also favor a more intentional use of spectral moments and/or MPA in future work.

The second content chapter, a perception study, tackles the question of how voice quality impacts judgments of gender and sexual identity from the acoustic signal, in LGBTQIA+ listeners living in the Bay Area, for two non-binary speakers whose voices were described as relatively “ambiguous” in these areas. 146 individuals provided both forced-choice and open-ended judgments of the speakers’ gender and sexual orientation labels, across three sentences, each with five synthetic voice quality manipulations. In analyzing the results, the same open-ended demographic information drawn upon in the production study, here collected for listeners, was incorporated. I also demonstrated the use of random forests and Bayesian multinomial regression as useful alternatives to linear modeling for this kind of data. Results suggest that, while voice quality is one factor influencing these judgments, other phonetic information in the sentence; the listener’s own gender and sexuality identity; information provided about the speakers’ race and education level; and the listener’s degree of interaction with trans people in daily life may all have an even greater impact. Listeners were also generally biased in their responses to cues to the speakers’ assigned sex at birth. Results also suggest that open-ended responses yield better models than forced-choice responses.

Together, these studies affirm the idea that we should take care to avoid generalizing findings in sociophonetics outside the specific groups considered in our work. Before asserting a relationship between a phonetic variable and an aspect of identity, it behooves us to ensure we have replicated that study within the given demographic, and that we have compared the results of different statistical modeling techniques. Then, we ought to carry out similar studies within a range of other demographic contexts, ideally tailoring the complexity of the social intersections modeled to the scale of the study we can support, to maintain statistical rigor. However, the work presented in this dissertation shows the value of tackling high-complexity social information within a medium-sized participant pool in challenging existing claims and highlighting areas for future work. This kind of work is worth replicating on an even larger scale for better-understood demographics, and later across many demographics once smaller (ideally qualitatively couched) studies have revealed clearer hypotheses across socio-cultural contexts.

Author: 
Publication date: 
May 15, 2026
Publication type: 
Dissertation