Audio and acoustics

Microsoft Research Blog

Breaking cross-modal boundaries in multimodal AI: Introducing CoDi, composable diffusion for any-to-any generation

June 29, 2023 | Zineng Tang, Ziyi Yang, Chenguang Zhu, Michael Zeng, Mohit Bansal

Imagine an AI model that can seamlessly generate high-quality content across text, images, video, and audio, all at once. Such a model would more accurately capture the multimodal nature of the world and human comprehension,…

CoDi can generate any combination of modalities from any, all at once.

Project

VALL-E

Neural codec language models for speech synthesis We introduce a language modeling approach for text-to-speech synthesis (TTS). Specifically, we train a neural codec language model (called VALL-E) using discrete codes derived from an off-the-shelf neural…

Publication

Target Speaker Voice Activity Detection with Transformers and Its Integration with End-to-End Neural Diarization

Dongmei Wang, Xiong Xiao, Naoyuki Kanda, Takuya Yoshioka, Jian Wu

ICASSP 2023 | June 2023

Publication

DasFormer: Deep Alternating Spectrogram Transformer for Multi/Single-Channel Speech Separation

Shuo Wang, Xiangyu Kong, Xiulian Peng, Hesam Movassagh, Vinod Prakash, Yan Lu

ICASSP 2023 | June 2023

Publication

Real-Time Audio-Visual End-To-End Speech Enhancement

Zirun Zhu, Hemin Yang, Min Tang, Ziyi Yang, Sefik Emre Eskimez, Huaming Wang

2023 IEEE International Conference on Acoustics, Speech and Signal Processing | June 2023

Publication

Speech MOS multi-task learning and rater bias correction

Haleh Akrami, Hannes Gamper

IEEE ICASSP | June 2023

Publication

MuseCoco: Generating Symbolic Music from Text

Peiling Lu, Xin Xu, Chenfei Kang, Botao Yu, Chengyi Xing, Xu Tan, Jiang Bian

June 2023

Publication

BEATs: Audio Pre-Training with Acoustic Tokenizers

Sanyuan Chen, Yu Wu, Chengyi Wang, Shujie Liu, Daniel Tompkins, Zhuo Chen, Furu Wei

ICML 2023 | June 2023

Publication

Leveraging Pretrained Representations with Task-related Keywords for Alzheimer’s Disease Detection

Jinchao Li, Kaitao Song, Junan Li, Bo Zheng, Dongsheng Li, Xixin Wu, Xunying Liu, Helen Meng

ICASSP 2023 | June 2023

Publication

Towards Real-Time Single-Channel Speech Separation in Noisy and Reverberant Environments

Julian Neri, Sebastian Braun

2023 International Conference on Acoustics, Speech, and Signal Processing | June 2023

Project