Prediction and Enhancement of Speech Intelligibility in Challenging Acoustic Environments
Abstract
Speech intelligibility is the proportion of a spoken message that a listener correctly identifies, which makes it a critical perceptual attribute in challenging acoustic environments, such as those with low signal-to-noise ratios (SNRs). This work addresses two key problems in speech intelligibility: prediction and enhancement. In the Speech Intelligibility Prediction (SIP) part, we introduce two reference-based SIP algorithms built on Spectro-Temporal Modulation (STM) analysis of the input speech. We present a data-driven, interpretable STM weighting function that assigns different importance to different STM frequencies, and we interpret the learned weights in relation to psychoacoustic modulation transfer functions. Our algorithms achieved state-of-the-art performance on multiple unseen test datasets and under diverse distortion and processing conditions. Additionally, we propose a probabilistic linguistic augmentation method that modifies existing SIP algorithms to account for the linguistic predictability of sentences. Using next-word probabilities from a pre-trained language model, we estimate the contextual predictability of each sentence, which improves intelligibility prediction across datasets with varying levels of linguistic predictability.
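As an illustration of how next-word probabilities can be reduced to a single predictability value, the following minimal sketch takes the geometric mean of the conditional word probabilities P(w_i | w_1, ..., w_{i-1}), which equals the reciprocal of the sentence's perplexity. This is an assumed formulation for illustration, not necessarily the one used in this thesis; the function name `sentence_predictability` is hypothetical, and the probabilities are taken as given rather than computed by any particular language model.

```python
import math
from typing import Sequence


def sentence_predictability(next_word_probs: Sequence[float]) -> float:
    """Hypothetical sentence-level predictability score in (0, 1].

    Assumption: next_word_probs[i] is P(w_i | w_1..w_{i-1}) as produced by
    some pre-trained language model (no specific model is implied here).
    The score is the geometric mean of these probabilities, i.e. 1 / perplexity.
    """
    if not next_word_probs:
        raise ValueError("need at least one word probability")
    if any(not (0.0 < p <= 1.0) for p in next_word_probs):
        raise ValueError("probabilities must lie in (0, 1]")
    # Average in the log domain to avoid underflow on long sentences.
    mean_log_p = sum(math.log(p) for p in next_word_probs) / len(next_word_probs)
    return math.exp(mean_log_p)
```

For example, `sentence_predictability([0.2, 0.5, 0.8])` is the cube root of 0.08, about 0.43. Averaging log-probabilities rather than multiplying raw probabilities keeps the score independent of sentence length, so short and long sentences can be compared on the same scale.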
For speech enhancement under challenging acoustic conditions, we investigate bone-conducted speech. Bone-conducted speech is captured from vibrations transmitted through bone rather than through the air, which makes it a resilient alternative in noisy environments. This thesis explores personalized enhancement approaches that use deep neural networks to improve the intelligibility and quality of bone-conducted speech, with a focus on speaker adaptation.

