Audio and acoustics

Publication

Sci-Phi: A Large Language Model Spatial Audio Descriptor

Xilin Jiang, Sebastian Braun, Hannes Gamper

IEEE Open Journal of Signal Processing | January 2026

Project

Publication

Towards Real-Time Generative Speech Restoration with Flow-Matching

Tsun-An Hsieh, Sebastian Braun

2026 International Conference on Acoustics, Speech, and Signal Processing | January 2026

Project

Publication

SALAD-VAE: Semantic Audio Compression with Language-Audio Distillation

Sebastian Braun, Hannes Gamper, Dimitra Emmanouilidou

2026 International Conference on Acoustics, Speech, and Signal Processing | January 2026

Video

Spatial Audio Rendering for Speech Live Translation

November 24, 2025 | Margarita Geleta

Language barriers in virtual meetings remain a persistent challenge to global collaboration. While real-time translation technologies offer a promising solution, their integration into conversational interfaces often neglects key perceptual cues. This study explores how spatial…

01:04:38

Publication

Train Short, Infer Long: Speech-LLM Enables Zero-Shot Streamable Joint ASR and Diarization on Long Audio

Mohan Shi, Xiong Xiao, Ruchao Fan, Shaoshi Ling, Jinyu Li

November 2025

Publication

RiTTA: Modeling Event Relations in Text-to-Audio Generation

Yuhang He, Yash Jain, Xubo Liu, Andrew Markham, Vibhav Vineet

2025 Empirical Methods in Natural Language Processing | November 2025

Video

Distant conversational speech recognition: Challenges and Opportunities

October 17, 2025 | Dr. Samuele Cornell, Sunit Sivasankaran

State-of-the-art ASR systems excel on close-talk benchmarks but struggle with far-field conversational speech, where error rates remain above 20%. Current benchmark datasets inadequately assess generalization across domains and real-world conditions, often relying on oracle segmentation…

01:28:41

Video

FOA Tokenizer: Learning Discrete Representations of Spatial Audio with Multichannel VQ-GAN

October 17, 2025 | Parthasaarathy Sudarsanam, Hannes Gamper

Spatial audio captures the directional and environmental characteristics of sound, enabling immersive listening experiences. First-Order Ambisonics (FOA) provides a compact representation of spatial audio by encoding the sound field’s directional components across four channels, allowing…

54:08

Publication

Distributed Asynchronous Device Speech Enhancement via Windowed Cross-Attention

Gene-Ping Yang, Sebastian Braun

Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA) | October 2025

Project

Publication

FlexiCodec: A Dynamic Neural Audio Codec for Low Frame Rates

Jiaqi Li, Yao Qian, Yuxuan Hu, Leying Zhang, Xiaofei Wang, Heng Lu, Manthan Thakker, Jinyu Li, Sheng Zhao, Zhizheng Wu

ICLR 2026 | September 2025