{"id":466398,"date":"2018-02-13T18:20:06","date_gmt":"2018-02-14T02:20:06","guid":{"rendered":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/?post_type=msr-research-item&#038;p=466398"},"modified":"2020-06-04T17:26:30","modified_gmt":"2020-06-05T00:26:30","slug":"a-hybrid-approach-to-combining-conventional-and-deep-learning-techniques-for-single-channel-speech-enhancement-and-recognition","status":"publish","type":"msr-research-item","link":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/publication\/a-hybrid-approach-to-combining-conventional-and-deep-learning-techniques-for-single-channel-speech-enhancement-and-recognition\/","title":{"rendered":"A Hybrid Approach to Combining Conventional and Deep Learning Techniques for Single-channel Speech Enhancement and Recognition"},"content":{"rendered":"<p>Conventional speech-enhancement techniques employ statistical signal-processing algorithms. They are computationally efficient and improve speech quality even under unknown noise conditions. For these reasons, they are preferred for deployment in unpredictable environments. One limitation of these algorithms is that they fail to suppress non-stationary noise. This hinders their broad usage. Emerging algorithms based on deep-learning promise to overcome this limitation of conventional methods. However, these algorithms under-perform when presented with noise conditions that were not captured in the training data. In this paper, we propose a single-channel speech-enhancement technique that combines the benefits of both worlds to achieve the best listening-quality and recognition accuracy under conditions of noise that are both unknown and non-stationary. Our method utilizes a conventional speech-enhancement algorithm to produce an intermediate representation of the input data by multiplying noisy input spectrogram features with gain vectors (known as the suppression rule). We process this intermediate representation through a recurrent neural-network based on long short-term memory (LSTM) units. Furthermore, we train this network to jointly learn two targets: a direct estimate of clean-speech features and a noise-reduction mask. Based on this LSTM multistyle training (LSTM-MT) architecture, we demonstrate a PESQ improvement of 0.76 (from 2.65 to 3.41) and a relative word-error rate reduction of 47.7% (from 15.9% to 8.3%).<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Conventional speech-enhancement techniques employ statistical signal-processing algorithms. They are computationally efficient and improve speech quality even under unknown noise conditions. For these reasons, they are preferred for deployment in unpredictable environments. One limitation of these algorithms is that they fail to suppress non-stationary noise. This hinders their broad usage. Emerging algorithms based on deep-learning promise [&hellip;]<\/p>\n","protected":false},"featured_media":0,"template":"","meta":{"msr-url-field":"","msr-podcast-episode":"","msrModifiedDate":"","msrModifiedDateEnabled":false,"ep_exclude_from_search":false,"_classifai_error":"","msr-author-ordering":null,"msr_publishername":"","msr_publisher_other":"","msr_booktitle":"","msr_chapter":"","msr_edition":"","msr_editors":"","msr_how_published":"","msr_isbn":"","msr_issue":"","msr_journal":"","msr_number":"","msr_organization":"","msr_pages_string":"","msr_page_range_start":"","msr_page_range_end":"","msr_series":"","msr_volume":"","msr_copyright":"\u00a9 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other users, including reprinting\/ republishing this material for advertising or promotional purposes, creating new collective works for resale or redistribution to servers or lists, or reuse of any copyrighted components of this work in other works.","msr_conference_name":"IEEE Int. Conf. Acoustics Speech and Signal Processing (ICASSP)","msr_doi":"","msr_arxiv_id":"","msr_s2_paper_id":"","msr_mag_id":"","msr_pubmed_id":"","msr_other_authors":"","msr_other_contributors":"","msr_speaker":"","msr_award":"","msr_affiliation":"","msr_institution":"","msr_host":"","msr_version":"","msr_duration":"","msr_original_fields_of_study":"","msr_release_tracker_id":"","msr_s2_match_type":"","msr_citation_count_updated":"","msr_published_date":"2018-4-16","msr_highlight_text":"","msr_notes":"","msr_longbiography":"","msr_publicationurl":"https:\/\/2018.ieeeicassp.org\/","msr_external_url":"","msr_secondary_video_url":"","msr_conference_url":"","msr_journal_url":"","msr_s2_pdf_url":"","msr_year":0,"msr_citation_count":0,"msr_influential_citations":0,"msr_reference_count":0,"msr_s2_match_confidence":0,"msr_microsoftintellectualproperty":true,"msr_s2_open_access":false,"msr_s2_author_ids":[],"msr_pub_ids":[],"msr_hide_image_in_river":0,"footnotes":""},"msr-research-highlight":[],"research-area":[13561,13556,243062,13545],"msr-publication-type":[193716],"msr-publisher":[],"msr-focus-area":[],"msr-locale":[268875],"msr-post-option":[],"msr-field-of-study":[],"msr-conference":[],"msr-journal":[],"msr-impact-theme":[],"msr-pillar":[],"class_list":["post-466398","msr-research-item","type-msr-research-item","status-publish","hentry","msr-research-area-algorithms","msr-research-area-artificial-intelligence","msr-research-area-audio-acoustics","msr-research-area-human-language-technologies","msr-locale-en_us"],"msr_publishername":"","msr_edition":"","msr_affiliation":"","msr_published_date":"2018-4-16","msr_host":"","msr_duration":"","msr_version":"","msr_speaker":"","msr_other_contributors":"","msr_booktitle":"","msr_pages_string":"","msr_chapter":"","msr_isbn":"","msr_journal":"","msr_volume":"","msr_number":"","msr_editors":"","msr_series":"","msr_issue":"","msr_organization":"","msr_how_published":"","msr_notes":"","msr_highlight_text":"","msr_release_tracker_id":"","msr_original_fields_of_study":"","msr_download_urls":"","msr_external_url":"","msr_secondary_video_url":"","msr_longbiography":"","msr_microsoftintellectualproperty":1,"msr_main_download":"489164","msr_publicationurl":"https:\/\/2018.ieeeicassp.org\/","msr_doi":"","msr_publication_uploader":[{"type":"file","viewUrl":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-content\/uploads\/2018\/02\/TuTashevLeeZarar_ICASP_2018.pdf","id":"489164","title":"TuTashevLeeZarar_ICASP_2018","label_id":"243109","label":0},{"type":"url","viewUrl":"false","id":"false","title":"https:\/\/2018.ieeeicassp.org\/","label_id":"243109","label":0}],"msr_related_uploader":"","msr_citation_count":0,"msr_citation_count_updated":"","msr_s2_paper_id":"","msr_influential_citations":0,"msr_reference_count":0,"msr_arxiv_id":"","msr_s2_author_ids":[],"msr_s2_open_access":false,"msr_s2_pdf_url":null,"msr_attachments":[{"id":0,"url":"https:\/\/2018.ieeeicassp.org\/"}],"msr-author-ordering":[{"type":"text","value":"Yan-Hui Tu","user_id":0,"rest_url":false},{"type":"user_nicename","value":"Ivan Tashev","user_id":32127,"rest_url":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/microsoft-research\/v1\/researchers?person=Ivan Tashev"},{"type":"user_nicename","value":"Shuayb Zarar","user_id":36563,"rest_url":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/microsoft-research\/v1\/researchers?person=Shuayb Zarar"},{"type":"text","value":"Chin-Hui Lee","user_id":0,"rest_url":false}],"msr_impact_theme":[],"msr_research_lab":[199565],"msr_event":[],"msr_group":[144923],"msr_project":[364265],"publication":[],"video":[],"msr-tool":[],"msr_publication_type":"inproceedings","related_content":{"projects":[{"ID":364265,"post_title":"Neural Networks-based Speech Enhancement","post_name":"nn-speech-enhancement","post_type":"msr-project","post_date":"2017-02-15 17:08:27","post_modified":"2022-01-21 13:15:43","post_status":"publish","permalink":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/project\/nn-speech-enhancement\/","post_excerpt":"The goal of speech enhancement is to take the audio signal from a microphone,\u00a0clean\u00a0it and forward clean audio to multiple clients such as speech-recognition software, archival databases and speakers.","_links":{"self":[{"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/msr-project\/364265"}]}}]},"_links":{"self":[{"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/msr-research-item\/466398","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/msr-research-item"}],"about":[{"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/types\/msr-research-item"}],"version-history":[{"count":5,"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/msr-research-item\/466398\/revisions"}],"predecessor-version":[{"id":467625,"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/msr-research-item\/466398\/revisions\/467625"}],"wp:attachment":[{"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/media?parent=466398"}],"wp:term":[{"taxonomy":"msr-research-highlight","embeddable":true,"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/msr-research-highlight?post=466398"},{"taxonomy":"msr-research-area","embeddable":true,"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/research-area?post=466398"},{"taxonomy":"msr-publication-type","embeddable":true,"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/msr-publication-type?post=466398"},{"taxonomy":"msr-publisher","embeddable":true,"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/msr-publisher?post=466398"},{"taxonomy":"msr-focus-area","embeddable":true,"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/msr-focus-area?post=466398"},{"taxonomy":"msr-locale","embeddable":true,"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/msr-locale?post=466398"},{"taxonomy":"msr-post-option","embeddable":true,"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/msr-post-option?post=466398"},{"taxonomy":"msr-field-of-study","embeddable":true,"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/msr-field-of-study?post=466398"},{"taxonomy":"msr-conference","embeddable":true,"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/msr-conference?post=466398"},{"taxonomy":"msr-journal","embeddable":true,"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/msr-journal?post=466398"},{"taxonomy":"msr-impact-theme","embeddable":true,"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/msr-impact-theme?post=466398"},{"taxonomy":"msr-pillar","embeddable":true,"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/msr-pillar?post=466398"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}