Gašper Beguš gave a virtual invited talk titled "Modeling language from raw speech with GANs" at the CHAI: Chat about AI colloquium at the School of Data Science and AI, Indian Institute of Technology (IIT Guwahati) on September 13, 2023.
Gašper Beguš published a paper titled "Articulation GAN: Unsupervised modeling of articulatory learning" in the proceedings of ICASSP 2023 (IEEE International Conference on Acoustics, Speech and Signal Processing) with Alan Zhou, Peter Wu, and Gopala K. Anumanchipalli. The paper is available here.
A video of the presentation, scheduled to be given at the conference in Rhodes, Greece on June 9, is available here.
This paper presents a technique to interpret and visualize intermediate layers in generative CNNs trained on raw speech data in an unsupervised manner. We argue that averaging over feature maps after ReLU activation in each transpose convolutional layer yields interpretable time-series data. This technique allows for acoustic analysis of intermediate layers that parallels the acoustic analysis of human speech data: we can extract F0, intensity, duration, formants, and other acoustic properties from intermediate layers in order to test where and how CNNs encode various types of...
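The core averaging step can be sketched in a few lines of NumPy. This is a minimal illustration, not the paper's implementation: random numbers stand in for a real generator's intermediate activations, and the shapes (32 feature maps, 1024 time steps) are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical pre-activation output of one transpose-convolutional
# layer of a generator, shaped (feature maps, time steps).
pre_activation = rng.normal(size=(32, 1024))

# Apply ReLU, then average over the feature-map (channel) axis.
relu_maps = np.maximum(pre_activation, 0.0)
time_series = relu_maps.mean(axis=0)  # one value per time step

# The result is a single non-negative time series that can be analyzed
# with standard acoustic tools (F0, intensity, duration, formants).
print(time_series.shape)  # (1024,)
```

Because the ReLU zeroes negative values before averaging, the resulting series is non-negative and can be treated like an intensity-style signal over the layer's time axis.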
Comparing artificial neural networks with the outputs of neuroimaging techniques has recently seen substantial advances in (computer) vision and text-based language models. Here, we propose a framework to compare biological and artificial neural computations of spoken language representations and propose several new challenges to this paradigm. The proposed technique is based on a principle similar to the one underlying electroencephalography (EEG): averaging of neural (artificial or biological) activity across neurons in the time domain, which allows us to compare the encoding of any acoustic property in the...
Isaac Bleaman and Ronald Sprouse have published a tutorial on speaker diarization at the Linguistics Methods Hub. The process allows researchers to automatically generate ELAN or Praat files for audio recordings with speech segments marked off on the appropriate speaker tiers — an important first step in the transcription workflow.
Gašper Beguš and Alan Zhou (Berkeley Speech and Computation lab alum) published a paper titled "Interpreting Intermediate Convolutional Layers of Generative CNNs Trained on Waveforms" in IEEE/ACM Transactions on Audio, Speech, and Language Processing. The paper is available through Open Access here: https://doi.org/10.1109/TASLP.2022.3209938