{"id":168609,"date":"2015-07-01T00:00:00","date_gmt":"2015-07-01T00:00:00","guid":{"rendered":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/msr-research-item\/a-study-of-multimodal-addressee-detection-in-human-human-computer-interaction\/"},"modified":"2018-10-16T21:24:11","modified_gmt":"2018-10-17T04:24:11","slug":"a-study-of-multimodal-addressee-detection-in-human-human-computer-interaction","status":"publish","type":"msr-research-item","link":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/publication\/a-study-of-multimodal-addressee-detection-in-human-human-computer-interaction\/","title":{"rendered":"A Study of Multimodal Addressee Detection in Human-Human-Computer Interaction"},"content":{"rendered":"<div class=\"asset-content\">\n<p>The goal of addressee detection is to answer the question , \u201cAre you talking to me?\u201d When a dialogue system interacts with multiple users, it is crucial to detect when a user is speaking to the system as opposed to another person. We study this problem in a multimodal scenario, using lexical, acoustic, visual, dialogue state, and beamforming information. Using data from a multiparty dialogue system, we quantify the benefits of using multiple modalities over using a single modality. We also assess the relative importance of the various modalities, as well as of key individual features, in estimating the addressee. We find that energy-based acoustic features are by far the most important, that information from speech recognition and system state is useful as well, and that visual and beamforming features provide little additional benefit. While we find that head pose is affected by whom the speaker is addressing, it yields little nonredundant information due to the system acting as a situational attractor. Our findings would be relevant to multiparty, open-world dialogue systems in which the agent plays an active, conversational role, such as an interactive assistant deployed in a public, open space. For these scenarios , our study suggests that acoustic, lexical, and system-state information is an effective and practical combination of modalities to use for addressee detection. We also consider how our analyses might be affected by the ongoing development of more realistic, natural dialogue systems<\/p>\n<\/div>\n<p><!-- .asset-content --><\/p>\n","protected":false},"excerpt":{"rendered":"<p>The goal of addressee detection is to answer the question , \u201cAre you talking to me?\u201d When a dialogue system interacts with multiple users, it is crucial to detect when a user is speaking to the system as opposed to another person. We study this problem in a multimodal scenario, using lexical, acoustic, visual, dialogue [&hellip;]<\/p>\n","protected":false},"featured_media":0,"template":"","meta":{"msr-url-field":"","msr-podcast-episode":"","msrModifiedDate":"","msrModifiedDateEnabled":false,"ep_exclude_from_search":false,"_classifai_error":"","msr-author-ordering":null,"msr_publishername":"IEEE - Institute of Electrical and Electronics Engineers","msr_publisher_other":"","msr_booktitle":"","msr_chapter":"","msr_edition":"","msr_editors":"","msr_how_published":"","msr_isbn":"","msr_issue":"","msr_journal":"IEEE Transactions on Multimedia","msr_number":"","msr_organization":"","msr_pages_string":"1550-1561","msr_page_range_start":"1550","msr_page_range_end":"1561","msr_series":"","msr_volume":"17","msr_copyright":"\u00a9 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other users, including reprinting\/ republishing this material for advertising or promotional purposes, creating new collective works for resale or redistribution to servers or lists, or reuse of any copyrighted components of this work in other works.","msr_conference_name":"","msr_doi":"10.1109\/TMM.2015.2454332","msr_arxiv_id":"","msr_s2_paper_id":"","msr_mag_id":"","msr_pubmed_id":"","msr_other_authors":"T. J. Tsai","msr_other_contributors":"","msr_speaker":"","msr_award":"","msr_affiliation":"","msr_institution":"","msr_host":"","msr_version":"","msr_duration":"","msr_original_fields_of_study":"","msr_release_tracker_id":"","msr_s2_match_type":"","msr_citation_count_updated":"","msr_published_date":"2015-07-01","msr_highlight_text":"","msr_notes":"","msr_longbiography":"","msr_publicationurl":"","msr_external_url":"","msr_secondary_video_url":"","msr_conference_url":"","msr_journal_url":"","msr_s2_pdf_url":"","msr_year":2015,"msr_citation_count":0,"msr_influential_citations":0,"msr_reference_count":0,"msr_s2_match_confidence":0,"msr_microsoftintellectualproperty":true,"msr_s2_open_access":false,"msr_s2_author_ids":[],"msr_pub_ids":[],"msr_hide_image_in_river":0,"footnotes":""},"msr-research-highlight":[],"research-area":[13554],"msr-publication-type":[193715],"msr-publisher":[],"msr-focus-area":[],"msr-locale":[268875],"msr-post-option":[],"msr-field-of-study":[],"msr-conference":[],"msr-journal":[],"msr-impact-theme":[],"msr-pillar":[],"class_list":["post-168609","msr-research-item","type-msr-research-item","status-publish","hentry","msr-research-area-human-computer-interaction","msr-locale-en_us"],"msr_publishername":"IEEE - Institute of Electrical and Electronics Engineers","msr_edition":"","msr_affiliation":"","msr_published_date":"2015-07-01","msr_host":"","msr_duration":"","msr_version":"","msr_speaker":"","msr_other_contributors":"","msr_booktitle":"","msr_pages_string":"1550-1561","msr_chapter":"","msr_isbn":"","msr_journal":"IEEE Transactions on Multimedia","msr_volume":"17","msr_number":"","msr_editors":"","msr_series":"","msr_issue":"","msr_organization":"","msr_how_published":"","msr_notes":"","msr_highlight_text":"","msr_release_tracker_id":"","msr_original_fields_of_study":"","msr_download_urls":"","msr_external_url":"","msr_secondary_video_url":"","msr_longbiography":"","msr_microsoftintellectualproperty":1,"msr_main_download":"204270","msr_publicationurl":"","msr_doi":"10.1109\/TMM.2015.2454332","msr_publication_uploader":[{"type":"file","title":"ieee-preprint-07153545.pdf","viewUrl":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-content\/uploads\/2016\/02\/ieee-preprint-07153545.pdf","id":204270,"label_id":0},{"type":"doi","title":"10.1109\/TMM.2015.2454332","viewUrl":false,"id":false,"label_id":0}],"msr_related_uploader":"","msr_citation_count":0,"msr_citation_count_updated":"","msr_s2_paper_id":"","msr_influential_citations":0,"msr_reference_count":0,"msr_arxiv_id":"","msr_s2_author_ids":[],"msr_s2_open_access":false,"msr_s2_pdf_url":null,"msr_attachments":[],"msr-author-ordering":[{"type":"text","value":"T. J. Tsai","user_id":0,"rest_url":false},{"type":"user_nicename","value":"anstolck","user_id":31054,"rest_url":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/microsoft-research\/v1\/researchers?person=anstolck"},{"type":"user_nicename","value":"mslaney","user_id":33020,"rest_url":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/microsoft-research\/v1\/researchers?person=mslaney"}],"msr_impact_theme":[],"msr_research_lab":[],"msr_event":[],"msr_group":[664548],"msr_project":[171313],"publication":[],"video":[],"msr-tool":[],"msr_publication_type":"article","related_content":{"projects":[{"ID":171313,"post_title":"Dialog and Conversational Systems Research","post_name":"dialog-and-conversational-systems-research","post_type":"msr-project","post_date":"2014-03-14 09:46:35","post_modified":"2017-07-11 15:34:26","post_status":"publish","permalink":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/project\/dialog-and-conversational-systems-research\/","post_excerpt":"Conversational systems interact with people through language to assist, enable, or entertain. Research at Microsoft spans dialogs that use language exclusively, or in conjunctions with additional modalities like gesture; where language is spoken or in text; and in a variety of settings, such as conversational systems in apps or devices, and situated interactions in the real world. Projects Spoken Language Understanding","_links":{"self":[{"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/msr-project\/171313"}]}}]},"_links":{"self":[{"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/msr-research-item\/168609","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/msr-research-item"}],"about":[{"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/types\/msr-research-item"}],"version-history":[{"count":1,"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/msr-research-item\/168609\/revisions"}],"predecessor-version":[{"id":434883,"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/msr-research-item\/168609\/revisions\/434883"}],"wp:attachment":[{"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/media?parent=168609"}],"wp:term":[{"taxonomy":"msr-research-highlight","embeddable":true,"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/msr-research-highlight?post=168609"},{"taxonomy":"msr-research-area","embeddable":true,"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/research-area?post=168609"},{"taxonomy":"msr-publication-type","embeddable":true,"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/msr-publication-type?post=168609"},{"taxonomy":"msr-publisher","embeddable":true,"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/msr-publisher?post=168609"},{"taxonomy":"msr-focus-area","embeddable":true,"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/msr-focus-area?post=168609"},{"taxonomy":"msr-locale","embeddable":true,"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/msr-locale?post=168609"},{"taxonomy":"msr-post-option","embeddable":true,"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/msr-post-option?post=168609"},{"taxonomy":"msr-field-of-study","embeddable":true,"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/msr-field-of-study?post=168609"},{"taxonomy":"msr-conference","embeddable":true,"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/msr-conference?post=168609"},{"taxonomy":"msr-journal","embeddable":true,"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/msr-journal?post=168609"},{"taxonomy":"msr-impact-theme","embeddable":true,"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/msr-impact-theme?post=168609"},{"taxonomy":"msr-pillar","embeddable":true,"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/msr-pillar?post=168609"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}