{"id":164889,"date":"2004-01-01T00:00:00","date_gmt":"2004-01-01T00:00:00","guid":{"rendered":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/msr-research-item\/automatic-speech-recognition-of-co-channel-speech-integrated-speaker-and-speech-recognition-approach\/"},"modified":"2018-10-16T21:10:06","modified_gmt":"2018-10-17T04:10:06","slug":"automatic-speech-recognition-of-co-channel-speech-integrated-speaker-and-speech-recognition-approach","status":"publish","type":"msr-research-item","link":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/publication\/automatic-speech-recognition-of-co-channel-speech-integrated-speaker-and-speech-recognition-approach\/","title":{"rendered":"Automatic Speech Recognition of Co-Channel Speech: Integrated Speaker and Speech Recognition Approach"},"content":{"rendered":"<div class=\"asset-content\">\n<p>This paper presents a novel Bayesian approach to the problem of speech recognition of co channel speech &#8211; also known as the &#8220;cocktail party&#8221; problem. The problem is formulated as the joint maximization of the a posteriori probability of the word sequence and the target speaker given the observed speech signal. It is shown that the joint probability can be expressed as the product of six terms: a likelihood score from a speaker-independent speech recognizer, the (normalized) likelihood score of a speaker recognizer, the likelihood of a sequence of prosodic events, the likelihood of a speaker-dependent statistical language model, a prior representing the channel usage patterns of a speaker, and the prior probability of the speaker. An efficient single-pass Viterbi search strategy is presented. Experimental results on over-the telephone recognition of co-channel speech show a 45% reduction in word error rate of a 10-digit telephone number task<\/p>\n<\/div>\n<p><!-- .asset-content --><\/p>\n","protected":false},"excerpt":{"rendered":"<p>This paper presents a novel Bayesian approach to the problem of speech recognition of co channel speech &#8211; also known as the &#8220;cocktail party&#8221; problem. The problem is formulated as the joint maximization of the a posteriori probability of the word sequence and the target speaker given the observed speech signal. It is shown that [&hellip;]<\/p>\n","protected":false},"featured_media":0,"template":"","meta":{"msr-url-field":"","msr-podcast-episode":"","msrModifiedDate":"","msrModifiedDateEnabled":false,"ep_exclude_from_search":false,"_classifai_error":"","msr-author-ordering":null,"msr_publishername":"","msr_publisher_other":"","msr_booktitle":"","msr_chapter":"","msr_edition":"","msr_editors":"","msr_how_published":"","msr_isbn":"","msr_issue":"","msr_journal":"","msr_number":"","msr_organization":"","msr_pages_string":"","msr_page_range_start":"","msr_page_range_end":"","msr_series":"","msr_volume":"","msr_copyright":"","msr_conference_name":"Interspeech","msr_doi":"","msr_arxiv_id":"","msr_s2_paper_id":"","msr_mag_id":"","msr_pubmed_id":"","msr_other_authors":"","msr_other_contributors":"","msr_speaker":"","msr_award":"","msr_affiliation":"","msr_institution":"","msr_host":"","msr_version":"","msr_duration":"","msr_original_fields_of_study":"","msr_release_tracker_id":"","msr_s2_match_type":"","msr_citation_count_updated":"","msr_published_date":"2004-01-01","msr_highlight_text":"","msr_notes":"","msr_longbiography":"","msr_publicationurl":"","msr_external_url":"","msr_secondary_video_url":"","msr_conference_url":"","msr_journal_url":"","msr_s2_pdf_url":"","msr_year":2004,"msr_citation_count":0,"msr_influential_citations":0,"msr_reference_count":0,"msr_s2_match_confidence":0,"msr_microsoftintellectualproperty":true,"msr_s2_open_access":false,"msr_s2_author_ids":[],"msr_pub_ids":[],"msr_hide_image_in_river":0,"footnotes":""},"msr-research-highlight":[],"research-area":[13554],"msr-publication-type":[193716],"msr-publisher":[],"msr-focus-area":[],"msr-locale":[268875],"msr-post-option":[],"msr-field-of-study":[],"msr-conference":[],"msr-journal":[],"msr-impact-theme":[],"msr-pillar":[],"class_list":["post-164889","msr-research-item","type-msr-research-item","status-publish","hentry","msr-research-area-human-computer-interaction","msr-locale-en_us"],"msr_publishername":"","msr_edition":"","msr_affiliation":"","msr_published_date":"2004-01-01","msr_host":"","msr_duration":"","msr_version":"","msr_speaker":"","msr_other_contributors":"","msr_booktitle":"","msr_pages_string":"","msr_chapter":"","msr_isbn":"","msr_journal":"","msr_volume":"","msr_number":"","msr_editors":"","msr_series":"","msr_issue":"","msr_organization":"","msr_how_published":"","msr_notes":"","msr_highlight_text":"","msr_release_tracker_id":"","msr_original_fields_of_study":"","msr_download_urls":"","msr_external_url":"","msr_secondary_video_url":"","msr_longbiography":"","msr_microsoftintellectualproperty":1,"msr_main_download":"228061","msr_publicationurl":"","msr_doi":"","msr_publication_uploader":[{"type":"file","title":"i04_0829_icslp2004_cochannelASR.pdf","viewUrl":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-content\/uploads\/2004\/01\/i04_0829_icslp2004_cochannelASR.pdf","id":228061,"label_id":0}],"msr_related_uploader":"","msr_citation_count":0,"msr_citation_count_updated":"","msr_s2_paper_id":"","msr_influential_citations":0,"msr_reference_count":0,"msr_arxiv_id":"","msr_s2_author_ids":[],"msr_s2_open_access":false,"msr_s2_pdf_url":null,"msr_attachments":[{"id":228061,"url":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-content\/uploads\/2004\/01\/i04_0829_icslp2004_cochannelASR.pdf"}],"msr-author-ordering":[{"type":"user_nicename","value":"lheck","user_id":32659,"rest_url":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/microsoft-research\/v1\/researchers?person=lheck"},{"type":"text","value":"Mark Z. Mao","user_id":0,"rest_url":false}],"msr_impact_theme":[],"msr_research_lab":[],"msr_event":[],"msr_group":[],"msr_project":[169434,169832,169715],"publication":[],"video":[],"msr-tool":[],"msr_publication_type":"inproceedings","related_content":{"projects":[{"ID":169434,"post_title":"Acoustic Modeling","post_name":"acoustic-modeling","post_type":"msr-project","post_date":"2004-01-29 16:42:42","post_modified":"2019-08-14 14:50:04","post_status":"publish","permalink":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/project\/acoustic-modeling\/","post_excerpt":"Acoustic modeling of speech typically refers to the process of\u00a0establishing statistical\u00a0representations for the feature vector sequences\u00a0computed from the speech waveform. Hidden Markov Model (HMM) is one most common type of acoustuc models. Other acosutic models include segmental models, super-segmental models (including hidden dynamic models), neural networks, maximum entropy models, and (hidden) conditional random fields, etc. Acoustic modeling also encompasses \"pronunciation modeling\", which describes how a sequence or multi-sequences of fundamental speech units\u00a0(such as phones or&hellip;","_links":{"self":[{"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/msr-project\/169434"}]}},{"ID":169832,"post_title":"Speaker Identification (WhisperID)","post_name":"speaker-identification-whisperid","post_type":"msr-project","post_date":"2004-01-29 16:52:18","post_modified":"2017-06-19 09:22:47","post_status":"publish","permalink":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/project\/speaker-identification-whisperid\/","post_excerpt":"When you speak to someone, they don't just recognize what you say: they recognize who you are. WhisperID will let computers do that, too, figuring out who you are by the way you sound. Home PC Security. In your home, Speaker Identification will make it easier for you to log into your computer, just by saying \"Log me in!\" Office PC Security. In your office, Speaker ID can add an extra level of protection to&hellip;","_links":{"self":[{"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/msr-project\/169832"}]}},{"ID":169715,"post_title":"Noise Robust Speech Recognition","post_name":"noise-robust-speech-recognition","post_type":"msr-project","post_date":"2002-02-19 14:36:52","post_modified":"2017-06-02 09:12:19","post_status":"publish","permalink":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/project\/noise-robust-speech-recognition\/","post_excerpt":"Techniques to improve the robustness of automatic speech recognition systems to noise and channel mismatches Robustness of ASR Technology to Background Noise You have probably seen that most people using a speech dictation software are wearing a close-talking microphone. So, why has senior researcher Li Deng been trying to get rid of close-talking microphones? Close-talking microphones pick up relatively little background noise and speech recognition systems can obtain decent accuracy with them. If you are&hellip;","_links":{"self":[{"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/msr-project\/169715"}]}}]},"_links":{"self":[{"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/msr-research-item\/164889","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/msr-research-item"}],"about":[{"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/types\/msr-research-item"}],"version-history":[{"count":2,"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/msr-research-item\/164889\/revisions"}],"predecessor-version":[{"id":533454,"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/msr-research-item\/164889\/revisions\/533454"}],"wp:attachment":[{"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/media?parent=164889"}],"wp:term":[{"taxonomy":"msr-research-highlight","embeddable":true,"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/msr-research-highlight?post=164889"},{"taxonomy":"msr-research-area","embeddable":true,"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/research-area?post=164889"},{"taxonomy":"msr-publication-type","embeddable":true,"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/msr-publication-type?post=164889"},{"taxonomy":"msr-publisher","embeddable":true,"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/msr-publisher?post=164889"},{"taxonomy":"msr-focus-area","embeddable":true,"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/msr-focus-area?post=164889"},{"taxonomy":"msr-locale","embeddable":true,"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/msr-locale?post=164889"},{"taxonomy":"msr-post-option","embeddable":true,"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/msr-post-option?post=164889"},{"taxonomy":"msr-field-of-study","embeddable":true,"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/msr-field-of-study?post=164889"},{"taxonomy":"msr-conference","embeddable":true,"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/msr-conference?post=164889"},{"taxonomy":"msr-journal","embeddable":true,"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/msr-journal?post=164889"},{"taxonomy":"msr-impact-theme","embeddable":true,"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/msr-impact-theme?post=164889"},{"taxonomy":"msr-pillar","embeddable":true,"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/msr-pillar?post=164889"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}