Insights into the Challenges and Opportunities of Large Multi-Modal Models for Blind and Low Vision Users: CLIP
PARIKSHA: A Scalable, Democratic, Transparent Evaluation Platform for Assessing Indic Large Language Models
Streaming Speaker-Attributed ASR with Token-Level Speaker Embeddings. Naoyuki Kanda, Jian Wu, Yu Wu, Xiong Xiao, Zhong Meng, Xiaofei Wang, Yashesh Gaur, Zhuo Chen, Jinyu Li, Takuya Yoshioka. Interspeech 2022 | September 2022
Streaming Multi-Talker ASR with Token-Level Serialized Output Training. Naoyuki Kanda, Jian Wu, Yu Wu, Xiong Xiao, Zhong Meng, Xiaofei Wang, Yashesh Gaur, Zhuo Chen, Jinyu Li, Takuya Yoshioka. Interspeech 2022 | September 2022
Supervision-Guided Codebooks for Masked Prediction in Speech Pre-training. Chengyi Wang, Yiming Wang, Yu Wu, Sanyuan Chen, Jinyu Li, Shujie Liu, Furu Wei. Interspeech | September 2022
Separating Long-Form Speech with Group-Wise Permutation Invariant Training. Wangyou Zhang, Zhuo Chen, Naoyuki Kanda, Shujie Liu, Jinyu Li, Sefik Emre Eskimez, Takuya Yoshioka, Xiong Xiao, Zhong Meng, Yanmin Qian, Furu Wei. Interspeech 2022 | September 2022
Why does Self-Supervised Learning for Speech Recognition Benefit Speaker Recognition? Sanyuan Chen, Yu Wu, Chengyi Wang, Shujie Liu, Zhuo Chen, Gang Liu, Jinyu Li, Jian Wu, Xiangzhan Yu, Furu Wei. Interspeech | September 2022
Pre-Training Transformer Decoder for End-to-End ASR Model with Unpaired Speech Data. Junyi Ao, Ziqiang Zhang, Long Zhou, Shujie Liu, Haizhou Li, Tom Ko, Lirong Dai, Jinyu Li, Yao Qian, Furu Wei. Interspeech | September 2022
Internal Language Model Adaptation with Text-Only Data for End-to-End Speech Recognition. Zhong Meng, Yashesh Gaur, Naoyuki Kanda, Jinyu Li, Xie Chen, Yu Wu, Yifan Gong. Interspeech 2022 | September 2022
VarArray Meets t-SOT: Advancing the State of the Art of Streaming Distant Conversational Speech Recognition. Naoyuki Kanda, Jian Wu, Xiaofei Wang, Zhuo Chen, Jinyu Li, Takuya Yoshioka. arXiv:2209.04974 | September 2022
Towards Contextual Spelling Correction for Customization of End-to-End Speech Recognition Systems. Xiaoqiang Wang, Yanqing Liu, Jinyu Li, Veljko Miljanic, Sheng Zhao, Hosam Khalil. IEEE/ACM Transactions on Audio, Speech, and Language Processing | September 2022, Vol 30: pp. 3089-3097
Leveraging Real Conversational Data for Multi-Channel Continuous Speech Separation. Xiaofei Wang, Dongmei Wang, Naoyuki Kanda, Sefik Emre Eskimez, Takuya Yoshioka. Interspeech 2022 | September 2022