{"id":467610,"date":"2018-02-20T10:08:10","date_gmt":"2018-02-20T18:08:10","guid":{"rendered":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/?post_type=msr-research-item&#038;p=467610"},"modified":"2018-10-16T22:22:22","modified_gmt":"2018-10-17T05:22:22","slug":"https-link-springer-com-chapter-10-1007-978-3-319-64680-0_19","status":"publish","type":"msr-research-item","link":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/publication\/https-link-springer-com-chapter-10-1007-978-3-319-64680-0_19\/","title":{"rendered":"Challenges and Solutions to Deep Learning Network Acoustic Modeling in Speech Recognition Products at Microsoft"},"content":{"rendered":"<p>Deep Learning (DL) Network Acoustic Modeling has been widely deployed in real-world speech recognition products and services that benefit millions of users. In addition to the general modeling research that academia works on, there are special constraints and challenges that the industry has to face, e.g., the runtime constraint of system deployment, the robustness to variations such as acoustic environment and accent, and the lack of manual transcriptions. 
For large-scale ASR applications, this chapter briefly describes selected developments and investigations at Microsoft that make deep learning networks more effective in production environments, including:<\/p>\n<p style=\"padding-left: 30px;\">reducing run-time cost with SVD (singular value decomposition)-based training,<br \/>\nimproving the accuracy of small-size DNNs with teacher-student training,<br \/>\nusing a small number of parameters for speaker adaptation of acoustic models,<br \/>\nimproving the robustness to acoustic environment with variable component DNN modeling,<br \/>\nimproving the robustness to accent\/dialect with model adaptation and accent-dependent modeling,<br \/>\nintroducing time and frequency invariance with time-frequency long short-term memory recurrent neural networks,<br \/>\nexploring the generalization capability to unseen data with maximum margin sequence training,<br \/>\nusing unsupervised data to improve speech recognition accuracy,<br \/>\nincreasing language capability by reusing speech training material across languages.<\/p>\n<p>This work has enabled the deployment of DL acoustic models across Microsoft's server and client product lines, including Windows 10 desktop\/laptop\/phone, Xbox, and Skype speech-to-speech translation.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Deep Learning (DL) Network Acoustic Modeling has been widely deployed in real-world speech recognition products and services that benefit millions of users. 
In addition to the general modeling research that academia works on, there are special constraints and challenges that the industry has to face, e.g., the runtime constraint of system deployment, the robustness to [&hellip;]<\/p>\n","protected":false},"featured_media":0,"template":"","meta":{"msr-url-field":"","msr-podcast-episode":"","msrModifiedDate":"","msrModifiedDateEnabled":false,"ep_exclude_from_search":false,"_classifai_error":"","msr-author-ordering":[{"type":"user_nicename","value":"Yifan Gong","user_id":"34994"},{"type":"user_nicename","value":"Jinyu Li","user_id":"32312"}],"msr_publishername":"Springer","msr_publisher_other":"","msr_booktitle":"New Era for Robust Speech Recognition: Exploiting Deep Learning","msr_chapter":"","msr_edition":"","msr_editors":"","msr_how_published":"","msr_isbn":"","msr_issue":"","msr_journal":"","msr_number":"","msr_organization":"","msr_pages_string":"407-417","msr_page_range_start":"407","msr_page_range_end":"417","msr_series":"","msr_volume":"","msr_copyright":"","msr_conference_name":"","msr_doi":"10.1007\/978-3-319-64680-0_19","msr_arxiv_id":"","msr_s2_paper_id":"","msr_mag_id":"","msr_pubmed_id":"","msr_other_authors":"","msr_other_contributors":"","msr_speaker":"","msr_award":"","msr_affiliation":"","msr_institution":"","msr_host":"","msr_version":"","msr_duration":"","msr_original_fields_of_study":"","msr_release_tracker_id":"","msr_s2_match_type":"","msr_citation_count_updated":"","msr_published_date":"2017-07-26","msr_highlight_text":"","msr_notes":"","msr_longbiography":"","msr_publicationurl":"https:\/\/link.springer.com\/chapter\/10.1007\/978-3-319-64680-0_19","msr_external_url":"","msr_secondary_video_url":"","msr_conference_url":"","msr_journal_url":"","msr_s2_pdf_url":"","msr_year":0,"msr_citation_count":0,"msr_influential_citations":0,"msr_reference_count":0,"msr_s2_match_confidence":0,"msr_microsoftintellectualproperty":true,"msr_s2_open_access":false,"msr_s2_author_ids":[],"msr_pub_ids":[],"msr_hide_image
_in_river":0,"footnotes":""},"msr-research-highlight":[],"research-area":[13545],"msr-publication-type":[193721],"msr-publisher":[],"msr-focus-area":[],"msr-locale":[268875],"msr-post-option":[],"msr-field-of-study":[],"msr-conference":[],"msr-journal":[],"msr-impact-theme":[],"msr-pillar":[],"class_list":["post-467610","msr-research-item","type-msr-research-item","status-publish","hentry","msr-research-area-human-language-technologies","msr-locale-en_us"],"msr_publishername":"Springer","msr_edition":"","msr_affiliation":"","msr_published_date":"2017-07-26","msr_host":"","msr_duration":"","msr_version":"","msr_speaker":"","msr_other_contributors":"","msr_booktitle":"New Era for Robust Speech Recognition: Exploiting Deep Learning","msr_pages_string":"407-417","msr_chapter":"","msr_isbn":"","msr_journal":"","msr_volume":"","msr_number":"","msr_editors":"","msr_series":"","msr_issue":"","msr_organization":"","msr_how_published":"","msr_notes":"","msr_highlight_text":"","msr_release_tracker_id":"","msr_original_fields_of_study":"","msr_download_urls":"","msr_external_url":"","msr_secondary_video_url":"","msr_longbiography":"","msr_microsoftintellectualproperty":1,"msr_main_download":"","msr_publicationurl":"https:\/\/link.springer.com\/chapter\/10.1007\/978-3-319-64680-0_19","msr_doi":"10.1007\/978-3-319-64680-0_19","msr_publication_uploader":[{"type":"url","title":"https:\/\/link.springer.com\/chapter\/10.1007\/978-3-319-64680-0_19","viewUrl":false,"id":false,"label_id":0},{"type":"doi","title":"10.1007\/978-3-319-64680-0_19","viewUrl":false,"id":false,"label_id":0}],"msr_related_uploader":"","msr_citation_count":0,"msr_citation_count_updated":"","msr_s2_paper_id":"","msr_influential_citations":0,"msr_reference_count":0,"msr_arxiv_id":"","msr_s2_author_ids":[],"msr_s2_open_access":false,"msr_s2_pdf_url":null,"msr_attachments":[{"id":0,"url":"https:\/\/link.springer.com\/chapter\/10.1007\/978-3-319-64680-0_19"}],"msr-author-ordering":[{"type":"user_nicename","value":"Yi
fan Gong","user_id":34994,"rest_url":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/microsoft-research\/v1\/researchers?person=Yifan Gong"},{"type":"user_nicename","value":"Jinyu Li","user_id":32312,"rest_url":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/microsoft-research\/v1\/researchers?person=Jinyu Li"}],"msr_impact_theme":[],"msr_research_lab":[199565],"msr_event":[],"msr_group":[],"msr_project":[],"publication":[],"video":[],"msr-tool":[],"msr_publication_type":"inbook","related_content":[],"_links":{"self":[{"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/msr-research-item\/467610","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/msr-research-item"}],"about":[{"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/types\/msr-research-item"}],"version-history":[{"count":5,"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/msr-research-item\/467610\/revisions"}],"predecessor-version":[{"id":477363,"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/msr-research-item\/467610\/revisions\/477363"}],"wp:attachment":[{"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/media?parent=467610"}],"wp:term":[{"taxonomy":"msr-research-highlight","embeddable":true,"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/msr-research-highlight?post=467610"},{"taxonomy":"msr-research-area","embeddable":true,"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/research-area?post=467610"},{"taxonomy":"msr-publication-type","embeddable":true,"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/msr-publication-type?post=467610"},{"taxonomy":"msr-publisher","embeddable":true,"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/msr-publisher?post=467610"},{"taxonomy":"msr-focus-area","embeddable
":true,"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/msr-focus-area?post=467610"},{"taxonomy":"msr-locale","embeddable":true,"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/msr-locale?post=467610"},{"taxonomy":"msr-post-option","embeddable":true,"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/msr-post-option?post=467610"},{"taxonomy":"msr-field-of-study","embeddable":true,"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/msr-field-of-study?post=467610"},{"taxonomy":"msr-conference","embeddable":true,"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/msr-conference?post=467610"},{"taxonomy":"msr-journal","embeddable":true,"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/msr-journal?post=467610"},{"taxonomy":"msr-impact-theme","embeddable":true,"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/msr-impact-theme?post=467610"},{"taxonomy":"msr-pillar","embeddable":true,"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/msr-pillar?post=467610"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}