Research intern talk: Unified speech enhancement approach for speech degradations & noise suppression

August 18, 2022
Khandokar Md. Nayem, Indiana University, Bloomington

Speech enhancement approaches generally focus on removing additive noise and reverberation, which adversely affect overall speech quality and intelligibility. A second group of signal degradations, such as clipping, bandwidth limitation, and codec degradation, can arise from poor recording hardware, network transmission, and other pre-processing. These degradations also substantially impact intelligibility and speech quality. In this work, we deploy a convolutional recurrent network to remove these speech degradations in conjunction with the noise suppression task, and we propose cascaded and end-to-end approaches. We compare complex mask and direct spectrum estimation approaches for this task using a small, real-time capable DNN. Overall, we propose a cascaded processing approach that addresses the distortion types differently and enables task-tailored, modular processing.
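To make the terms in the abstract concrete, the following is a minimal NumPy sketch, not the talk's implementation. It simulates one of the degradations mentioned (clipping) and shows how a complex mask is applied to a noisy spectrum. The function names, the 16 kHz sample rate, the single 512-sample frame, and the oracle mask are all illustrative assumptions. In the direct spectrum estimation alternative, a network would output the enhanced spectrum itself instead of a mask.

```python
import numpy as np

def clip_signal(x, threshold):
    """Simulate hard clipping by limiting the waveform to [-threshold, threshold]."""
    return np.clip(x, -threshold, threshold)

def oracle_complex_mask(clean_spec, noisy_spec, eps=1e-8):
    """Hypothetical oracle mask M with M * noisy ~= clean, for illustration only.

    In a real system a DNN would estimate the mask from the noisy input.
    """
    return clean_spec * np.conj(noisy_spec) / (np.abs(noisy_spec) ** 2 + eps)

def apply_complex_mask(noisy_spec, mask):
    """Complex-mask enhancement: element-wise complex product per frequency bin."""
    return noisy_spec * mask

# Assumed toy setup: one 512-sample frame of a 440 Hz tone at 16 kHz.
sample_rate = 16000
t = np.arange(512) / sample_rate
clean = np.sin(2 * np.pi * 440 * t)

# Degradation example: hard clipping at half of full scale.
clipped = clip_signal(clean, 0.5)

# Noise suppression example: additive Gaussian noise, then oracle masking.
rng = np.random.default_rng(0)
noisy = clean + 0.1 * rng.standard_normal(clean.shape)

clean_spec = np.fft.rfft(clean)
noisy_spec = np.fft.rfft(noisy)

mask = oracle_complex_mask(clean_spec, noisy_spec)
enhanced_spec = apply_complex_mask(noisy_spec, mask)
enhanced = np.fft.irfft(enhanced_spec, n=len(clean))
```

Because the mask is complex-valued, it can correct both the magnitude and the phase of each bin, which a real-valued magnitude mask cannot do.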

- Khandokar Md. Nayem, PhD student, Indiana University, Bloomington
Research Area
- Audio and Acoustics
Research Lab
- Microsoft Research Lab - Redmond
Group
- Audio and Acoustics Research Group
