{"id":485208,"date":"2018-05-14T11:29:12","date_gmt":"2018-05-14T18:29:12","guid":{"rendered":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/?p=485208"},"modified":"2018-05-16T09:27:35","modified_gmt":"2018-05-16T16:27:35","slug":"sounding-future-microsoft-research-brings-best-icassp-2018-calgary","status":"publish","type":"post","link":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/blog\/sounding-future-microsoft-research-brings-best-icassp-2018-calgary\/","title":{"rendered":"Sounding the Future: Microsoft Research brings its best to ICASSP 2018 in Calgary"},"content":{"rendered":"<h3><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-485820\" src=\"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-content\/uploads\/2018\/05\/ICASSP_HLT_BHeader_05_2018_1000x400.jpg\" alt=\"ICASSP 2018 Microsoft Research\" width=\"1000\" height=\"400\" srcset=\"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-content\/uploads\/2018\/05\/ICASSP_HLT_BHeader_05_2018_1000x400.jpg 1000w, https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-content\/uploads\/2018\/05\/ICASSP_HLT_BHeader_05_2018_1000x400-300x120.jpg 300w, https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-content\/uploads\/2018\/05\/ICASSP_HLT_BHeader_05_2018_1000x400-768x307.jpg 768w\" sizes=\"auto, (max-width: 1000px) 100vw, 1000px\" \/><\/h3>\n<h3>Introduction<\/h3>\n<p>Speech technology has come a long way since Alexander Graham Bell&#8217;s famous \u201cMr. Watson \u2013 Come here \u2013 I want to see you\u201d became the first speech to be heard over the telephone in 1876. Today, speech technology has moved into realms such as VoIP, teleconferencing systems, home automation, and so on. 
Its importance has grown exponentially with the emergence of mobile and wearable devices, and many existing and upcoming Microsoft services, devices and algorithms depend on these voice-based interfaces.<\/p>\n<p>As far as things have come, much inefficiency remains, and the importance of high-performing speech-processing technologies has never been more apparent. Traditional signal processing algorithms that used to be the state of the art \u2013 especially in speech recognition and computer vision \u2013 are facing performance plateaus. Meanwhile, a new class of algorithms has emerged that can learn directly from data and remain robust in diverse and adverse application environments. The development of speech technologies has exploded due to advances in machine learning and AI. These advances have made voice interfaces more practical and useful, leading to easier and more efficient communication with the machines around us. Experts believe that speech applications are approaching a level of reliability at which everyday use will become second nature.<\/p>\n<h3>ICASSP<\/h3>\n<p>The 2018 International Conference on Acoustics, Speech and Signal Processing in Calgary, Canada, is the world&#8217;s largest and most comprehensive technical conference focused on signal processing and its applications; ICASSP is the global event for presenting important developments in speech technology. The conference is sponsored by the IEEE Signal Processing Society and has been held annually since 1976. It features world-class speakers, tutorials, exhibits, a show-and-tell event and over 120 presentation and poster sessions. Microsoft\u2019s presence was significant, with researchers presenting over 25 papers on ground-breaking, novel machine-learning methods for speech processing. 
This work improves the odds of advancing the quality of speech technology in many backend services and devices.<\/p>\n<p>At ICASSP, Microsoft offered a glimpse of future speech services \u2013 a world of lightly supervised training, enhanced robustness and more intuitive interaction with machines. Far-field automatic speech recognition (ASR) and voice control have become a lot more practical, now working reliably in noisy environments: for example, when interacting across a room or handling multiple speakers even when they speak simultaneously. Virtual assistants such as Microsoft Cortana offer a simpler way of accessing information, cueing up songs and building shopping lists, all using just your voice. As part of these applications, multimodal speech processing is gaining more attention. Several of the conference sessions were dedicated to such areas. Microsoft is well placed here, especially considering the size of the team dedicated to advancing the accuracy of speech recognition and improving conversational interfaces overall.<\/p>\n<p>It\u2019s also worth noting that more and more research teams are moving away from doing only core ASR, broadening their focus to include areas such as multi-speaker ASR, language ID, and diarization, all of which are required to build end-to-end applications.<\/p>\n<h3>Sounding the Future<\/h3>\n<p>Natural language understanding and dialogue systems are two of the next challenges in AI. The use of speech and image recognition to analyze inflections and facial expressions as part of a dialogue system will make machines interact more naturally with their human users. Although many researchers expect voice interfaces to become more natural, this remains a big challenge for AI: language interfaces are complex, and responding well requires domain-specific intelligence together with knowledge about effective human-machine interaction. 
A number of significant Microsoft papers presented at ICASSP advance the conversation in these areas, including \u201c<a href=\"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/publication\/improving-end-turn-detection-spoken-dialogues-detectin-speaker-intentions-secondary-task-z-aldeneh-d-dimitriadis-e-mower-provost\/\">Improving End-of-Turn Detection in Spoken Dialogues by Detecting Speaker Intentions as a Secondary Task<\/a>\u201d, \u201c<a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" rel=\"noopener noreferrer\" target=\"_blank\" href=\"https:\/\/arxiv.org\/abs\/1708.06073\">The Microsoft 2017 Conversational Speech Recognition System<span class=\"sr-only\"> (opens in new tab)<\/span><\/a>\u201d, \u201c<a href=\"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-content\/uploads\/2018\/04\/ICASSP2018_CortanaAdapt.pdf\">Domain and Speaker Adaptation for Cortana Speech Recognition<\/a>\u201d, \u201c<a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" rel=\"noopener noreferrer\" target=\"_blank\" href=\"https:\/\/arxiv.org\/abs\/1707.07048\">Sequence Modeling in Unsupervised Single-Channel Overlapped Speech Recognition<span class=\"sr-only\"> (opens in new tab)<\/span><\/a>\u201d and \u201c<a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" rel=\"noopener noreferrer\" target=\"_blank\" href=\"https:\/\/arxiv.org\/abs\/1711.02207\">Towards Language-Universal End-to-End Speech Recognition<span class=\"sr-only\"> (opens in new tab)<\/span><\/a>\u201d.<\/p>\n<p>One of the hottest trends in machine learning is Generative Adversarial Networks (GANs). These systems consist of one neural network generating artificial data and another network trained to distinguish fake from real data. When combined, these two networks have the power to create realistic synthetic data that can be indistinguishable from real data. 
Papers like \u201c<a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" rel=\"noopener noreferrer\" target=\"_blank\" href=\"https:\/\/arxiv.org\/abs\/1804.00644\">Adversarial Teacher-Student Learning for Unsupervised Domain Adaptation<span class=\"sr-only\"> (opens in new tab)<\/span><\/a>\u201d, \u201c<a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" rel=\"noopener noreferrer\" target=\"_blank\" href=\"https:\/\/arxiv.org\/abs\/1804.00732\">Speaker-Invariant Training via Adversarial Learning<span class=\"sr-only\"> (opens in new tab)<\/span><\/a>\u201d, and \u201c<a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" rel=\"noopener noreferrer\" target=\"_blank\" href=\"https:\/\/arxiv.org\/abs\/1710.11277\">Adversarial Advantage Actor-critic Model for Task-Completion Dialogue Policy Learning<span class=\"sr-only\"> (opens in new tab)<\/span><\/a>\u201d attest to Microsoft\u2019s pioneering efforts in GANs as applied to AI.<\/p>\n<p>As previously noted, ICASSP covers a wide range of technologies reflecting the broader trends in machine learning. A large fraction of the ASR-related papers is dedicated to attention mechanisms, end-to-end modeling and sequence-to-sequence models. Microsoft has been using sequence-to-sequence systems for machine translation; in the case of ASR, there are still important problems to iron out. 
Nevertheless, Microsoft is advancing the field in these areas with papers like \u201c<a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" rel=\"noopener noreferrer\" target=\"_blank\" href=\"https:\/\/arxiv.org\/abs\/1803.05563\">Advancing Connectionist Temporal Classification with Attention Modeling<span class=\"sr-only\"> (opens in new tab)<\/span><\/a>\u201d, \u201c<a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" rel=\"noopener noreferrer\" target=\"_blank\" href=\"https:\/\/arxiv.org\/abs\/1803.05566\">Advancing Acoustic-to-Word CTC Model<span class=\"sr-only\"> (opens in new tab)<\/span><\/a>\u201d, and \u201c<a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" rel=\"noopener noreferrer\" target=\"_blank\" href=\"https:\/\/2018.ieeeicassp.org\/Papers\/ViewPapers.asp?PaperNum=3951\">Neural Sequential Malware Detection with Parameters<span class=\"sr-only\"> (opens in new tab)<\/span><\/a>\u201d.<\/p>\n<h3>What\u2019s Next?<\/h3>\n<p>Clearly, core areas of speech technology like automatic speech recognition and text-to-speech synthesis have reached an impressive level of maturity. But there remain significant open questions around how to use the voice modality to create more natural user interfaces. Much attention was devoted during the ICASSP sessions to far-field speech processing, diarization, speech separation and similar technical challenges. 
Microsoft\u2019s interest in these areas is strong, as reflected by the multiple papers it presented, including \u201c<a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" rel=\"noopener noreferrer\" target=\"_blank\" href=\"https:\/\/arxiv.org\/abs\/1804.05166\">Developing Far-field Speaker System via Teacher-Student Learning<span class=\"sr-only\"> (opens in new tab)<\/span><\/a>\u201d, \u201c<a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" rel=\"noopener noreferrer\" target=\"_blank\" href=\"https:\/\/www.researchgate.net\/publication\/322887483_Exploring_sequential_characteristics_in_speaker_bottleneck_feature_for_text-dependent_speaker_verification\">Exploring sequential characteristics in speaker bottleneck feature for text-dependent speaker verification<span class=\"sr-only\"> (opens in new tab)<\/span><\/a>\u201d, and \u201c<a href=\"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/publication\/efficient-integration-fixed-beamformers-speech-separation-networks-multi-channel-far-field-speech-separation-2\/\">Efficient Integration of Fixed Beamformers and Speech Separation Networks for Multi-channel Far-Field Speech Separation<\/a>\u201d.<\/p>\n<p>Challenges remain across the cognitive and behavioral sciences on how to design truly effective and efficient human-computer interaction scenarios. As part of these challenges, it is very likely that affective computing (such as emotion processing) will continue to gain momentum and that most of the prominent problems will be solved. The challenge will ultimately be to combine such increasingly accurate sensing capabilities to improve and elevate human-machine communication in both home and work environments.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Introduction Speech technology has come a long way since Alexander Graham Bell&#8217;s famous Mr. 
Watson \u2013 Come here \u2013 I want to see you became the first speech to be heard over the telephone in 1876. Today, speech technology has moved into realms such as VoIP, teleconferencing systems, home automation, and so on. Its importance [&hellip;]<\/p>\n","protected":false},"author":37074,"featured_media":485823,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"msr-url-field":"","msr-podcast-episode":"","msrModifiedDate":"","msrModifiedDateEnabled":false,"ep_exclude_from_search":false,"_classifai_error":"","msr-author-ordering":[{"type":"user_nicename","value":"Dimitrios Dimitriadis","user_id":"37521"}],"msr_hide_image_in_river":0,"footnotes":""},"categories":[194456,194460],"tags":[],"research-area":[13561,13556,13545,13554],"msr-region":[],"msr-event-type":[],"msr-locale":[268875],"msr-post-option":[],"msr-impact-theme":[],"msr-promo-type":[],"msr-podcast-series":[],"class_list":["post-485208","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-natural-language-processing","category-search-and-information-retrieval","msr-research-area-algorithms","msr-research-area-artificial-intelligence","msr-research-area-human-language-technologies","msr-research-area-human-computer-interaction","msr-locale-en_us"],"msr_event_details":{"start":"","end":"","location":""},"podcast_url":"","podcast_episode":"","msr_research_lab":[],"msr_impact_theme":[],"related-publications":[],"related-downloads":[],"related-videos":[],"related-academic-programs":[],"related-groups":[664548],"related-projects":[],"related-events":[474855],"related-researchers":[],"msr_type":"Post","featured_image_thumbnail":"<img width=\"480\" height=\"280\" src=\"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-content\/uploads\/2018\/05\/ICASSP_HLT_Carosel_05_2018_480x280.jpg\" class=\"img-object-cover\" alt=\"\" decoding=\"async\" loading=\"lazy\" 
srcset=\"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-content\/uploads\/2018\/05\/ICASSP_HLT_Carosel_05_2018_480x280.jpg 480w, https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-content\/uploads\/2018\/05\/ICASSP_HLT_Carosel_05_2018_480x280-300x175.jpg 300w\" sizes=\"auto, (max-width: 480px) 100vw, 480px\" \/>","byline":"Dimitrios Dimitriadis","formattedDate":"May 14, 2018","formattedExcerpt":"Introduction Speech technology has come a long way since Alexander Graham Bell&#039;s famous Mr. Watson \u2013 Come here \u2013 I want to see you became the first speech to be heard over the telephone in 1876. Today, speech technology has moved into realms such as&hellip;","locale":{"slug":"en_us","name":"English","native":"","english":"English"},"_links":{"self":[{"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/posts\/485208","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/users\/37074"}],"replies":[{"embeddable":true,"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/comments?post=485208"}],"version-history":[{"count":4,"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/posts\/485208\/revisions"}],"predecessor-version":[{"id":486153,"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/posts\/485208\/revisions\/486153"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/media\/485823"}],"wp:attachment":[{"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/media?parent=485208"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/categories?
post=485208"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/tags?post=485208"},{"taxonomy":"msr-research-area","embeddable":true,"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/research-area?post=485208"},{"taxonomy":"msr-region","embeddable":true,"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/msr-region?post=485208"},{"taxonomy":"msr-event-type","embeddable":true,"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/msr-event-type?post=485208"},{"taxonomy":"msr-locale","embeddable":true,"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/msr-locale?post=485208"},{"taxonomy":"msr-post-option","embeddable":true,"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/msr-post-option?post=485208"},{"taxonomy":"msr-impact-theme","embeddable":true,"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/msr-impact-theme?post=485208"},{"taxonomy":"msr-promo-type","embeddable":true,"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/msr-promo-type?post=485208"},{"taxonomy":"msr-podcast-series","embeddable":true,"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/msr-podcast-series?post=485208"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}