Corpus studies of regional variation using raw language data from the internet focus predominantly on lexical variables in writing. However, online repositories such as YouTube also offer the possibility of investigating regional differences using phonological variables. This article demonstrates the viability of constructing a naturalistic speech corpus for sociophonetic research by analyzing hundreds of recitations of Abraham Lincoln's Gettysburg Address. We first replicate a known result of phonetic research, namely that English vowels are longer in duration before voiced obstruents than before voiceless ones. We then compare /æ/-tensing in recitations from the Inland North and New York City dialect regions. Results indicate that there are significant regional differences in the formant trajectory of the vowel, even in identical phonetic environments (e.g., before nasal codas). This finding calls into question the uniformity of "/æ/-tensing" as a cross-dialectal phenomenon in American English. We contend that the analysis of spoken data from social media can and should supplement traditional methods in dialectology and variationist analysis to generate new hypotheses about socially conditioned speech patterns.
May 1, 2021
Bleaman, I. L., & Duncan, D. (2021). The Gettysburg Corpus: Testing the Proposition That All Tense /æ/s Are Created Equal. American Speech, 96(2), 161–191. https://doi.org/10.1215/00031283-8620511