{"id":1162754,"date":"2026-02-23T22:18:02","date_gmt":"2026-02-24T06:18:02","guid":{"rendered":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/?post_type=msr-research-item&#038;p=1162754"},"modified":"2026-04-02T12:59:42","modified_gmt":"2026-04-02T19:59:42","slug":"interaction-augmented-instruction-modeling-the-synergy-of-prompts-and-interactions-in-human-genai-collaboration","status":"publish","type":"msr-research-item","link":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/publication\/interaction-augmented-instruction-modeling-the-synergy-of-prompts-and-interactions-in-human-genai-collaboration\/","title":{"rendered":"Interaction-Augmented Instruction: Modeling the Synergy of Prompts and Interactions in Human-GenAI Collaboration"},"content":{"rendered":"<p><img decoding=\"async\" class=\"aligncenter size-large wp-image-1162759\" src=\"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-content\/uploads\/2026\/02\/Teaser-1024x575.png\" alt=\"A four-part overview diagram. (A) illustrates the research problem by contrasting a user struggling with a vague text prompt against a user successfully using text combined with GUI controls. (B) visualizes the Interaction-Augmented Instruction model as a node-link diagram connecting Human, Interaction, Text Prompt, Augmented Instruction, GenAI, and Artifacts. (C) depicts a pyramid structure showing the hierarchy from the Model (top) to Paradigms (middle) and Interfaces (bottom). (D) displays four thumbnails representing different usage scenarios.\" width=\"1024\" srcset=\"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-content\/uploads\/2026\/02\/Teaser-1024x575.png 1024w, https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-content\/uploads\/2026\/02\/Teaser-300x169.png 300w, https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-content\/uploads\/2026\/02\/Teaser-768x431.png 768w, https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-content\/uploads\/2026\/02\/Teaser-1536x863.png 1536w, https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-content\/uploads\/2026\/02\/Teaser-2048x1151.png 2048w, https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-content\/uploads\/2026\/02\/Teaser-1066x600.png 1066w, https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-content\/uploads\/2026\/02\/Teaser-655x368.png 655w, https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-content\/uploads\/2026\/02\/Teaser-240x135.png 240w, https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-content\/uploads\/2026\/02\/Teaser-640x360.png 640w, https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-content\/uploads\/2026\/02\/Teaser-960x540.png 960w, https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-content\/uploads\/2026\/02\/Teaser-1280x720.png 1280w, https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-content\/uploads\/2026\/02\/Teaser-1920x1080.png 1920w\" sizes=\"(max-width: 1024px) 100vw, 1024px\" \/><\/p>\n<p>Text prompt is the most common way for human-generative AI (GenAI) communication. Though convenient, it is challenging to convey fine-grained and referential intent. One promising solution is to combine text prompts with precise GUI interactions, like brushing and clicking. However, there lacks a formal model to capture synergistic designs between prompts and interactions, hindering their comparison and innovation. To fill this gap, via an iterative and deductive process, we develop the Interaction-Augmented Instruction (IAI) model, a compact entity\u2013relation graph formalizing how the combination of interactions and text prompts enhances human-GenAI communication. With the model, we distill twelve recurring and composable atomic interaction paradigms from prior tools, verifying our model\u2019s capability to facilitate systematic design characterization and comparison. Four usage scenarios further demonstrate the model\u2019s utility in applying, refining, and innovating these paradigms. These results illustrate the IAI model\u2019s descriptive, discriminative, and generative power for shaping future GenAI systems.<\/p>\n<div style=\"width: 640px;\" class=\"wp-video\"><video class=\"wp-video-shortcode\" id=\"video-1162754-1\" width=\"640\" height=\"360\" preload=\"metadata\" controls=\"controls\"><source type=\"video\/mp4\" src=\"https:\/\/interaction-augmented-instruction.github.io\/video.mp4?_=1\" \/><a href=\"https:\/\/interaction-augmented-instruction.github.io\/video.mp4\">https:\/\/interaction-augmented-instruction.github.io\/video.mp4<\/a><\/video><\/div>\n","protected":false},"excerpt":{"rendered":"<p>Text prompt is the most common way for human-generative AI (GenAI) communication. Though convenient, it is challenging to convey fine-grained and referential intent. One promising solution is to combine text prompts with precise GUI interactions, like brushing and clicking. However, there lacks a formal model to capture synergistic designs between prompts and interactions, hindering their [&hellip;]<\/p>\n","protected":false},"featured_media":0,"template":"","meta":{"msr-url-field":"","msr-podcast-episode":"","msrModifiedDate":"","msrModifiedDateEnabled":false,"ep_exclude_from_search":false,"_classifai_error":"","msr-author-ordering":null,"msr_publishername":"","msr_publisher_other":"","msr_booktitle":"","msr_chapter":"","msr_edition":"","msr_editors":"","msr_how_published":"","msr_isbn":"","msr_issue":"","msr_journal":"","msr_number":"","msr_organization":"","msr_pages_string":"","msr_page_range_start":"","msr_page_range_end":"","msr_series":"","msr_volume":"","msr_copyright":"","msr_conference_name":"CHI 2026","msr_doi":"","msr_arxiv_id":"","msr_s2_paper_id":"","msr_mag_id":"","msr_pubmed_id":"","msr_other_authors":"","msr_other_contributors":"","msr_speaker":"","msr_award":"","msr_affiliation":"","msr_institution":"","msr_host":"","msr_version":"","msr_duration":"","msr_original_fields_of_study":"","msr_release_tracker_id":"","msr_s2_match_type":"","msr_citation_count_updated":"","msr_published_date":"2026-04-13","msr_highlight_text":"Honorable Mention Award","msr_notes":"","msr_longbiography":"","msr_publicationurl":"","msr_external_url":"","msr_secondary_video_url":"","msr_conference_url":"","msr_journal_url":"","msr_s2_pdf_url":"","msr_year":0,"msr_citation_count":0,"msr_influential_citations":0,"msr_reference_count":0,"msr_s2_match_confidence":0,"msr_microsoftintellectualproperty":true,"msr_s2_open_access":false,"msr_s2_author_ids":[],"msr_pub_ids":[],"msr_hide_image_in_river":null,"footnotes":""},"msr-research-highlight":[246574],"research-area":[13554],"msr-publication-type":[193716],"msr-publisher":[],"msr-focus-area":[],"msr-locale":[268875],"msr-post-option":[269148,269142],"msr-field-of-study":[246691],"msr-conference":[260644],"msr-journal":[],"msr-impact-theme":[],"msr-pillar":[],"class_list":["post-1162754","msr-research-item","type-msr-research-item","status-publish","hentry","msr-research-highlight-award","msr-research-area-human-computer-interaction","msr-locale-en_us","msr-post-option-approved-for-river","msr-post-option-include-in-river","msr-field-of-study-computer-science"],"msr_publishername":"","msr_edition":"","msr_affiliation":"","msr_published_date":"2026-04-13","msr_host":"","msr_duration":"","msr_version":"","msr_speaker":"","msr_other_contributors":"","msr_booktitle":"","msr_pages_string":"","msr_chapter":"","msr_isbn":"","msr_journal":"","msr_volume":"","msr_number":"","msr_editors":"","msr_series":"","msr_issue":"","msr_organization":"","msr_how_published":"","msr_notes":"","msr_highlight_text":"Honorable Mention Award","msr_release_tracker_id":"","msr_original_fields_of_study":"","msr_download_urls":"","msr_external_url":"","msr_secondary_video_url":"","msr_longbiography":"","msr_microsoftintellectualproperty":1,"msr_main_download":"","msr_publicationurl":"","msr_doi":"","msr_publication_uploader":[{"type":"url","viewUrl":"false","id":"false","title":"https:\/\/arxiv.org\/abs\/2510.26069","label_id":"252679","label":0},{"type":"doi","viewUrl":"false","id":"false","title":"https:\/\/doi.org\/10.1145\/3772318.3790505","label_id":"243106","label":0}],"msr_related_uploader":"","msr_citation_count":0,"msr_citation_count_updated":"","msr_s2_paper_id":"","msr_influential_citations":0,"msr_reference_count":0,"msr_arxiv_id":"","msr_s2_author_ids":[],"msr_s2_open_access":false,"msr_s2_pdf_url":null,"msr_attachments":[],"msr-author-ordering":[{"type":"text","value":"Leixian Shen","user_id":0,"rest_url":false},{"type":"text","value":"Yifang Wang","user_id":0,"rest_url":false},{"type":"text","value":"Huamin Qu","user_id":0,"rest_url":false},{"type":"user_nicename","value":"Xing Xie","user_id":34906,"rest_url":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/microsoft-research\/v1\/researchers?person=Xing Xie"},{"type":"user_nicename","value":"Haotian Li","user_id":43593,"rest_url":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/microsoft-research\/v1\/researchers?person=Haotian Li"}],"msr_impact_theme":[],"msr_research_lab":[199560],"msr_event":[1161977],"msr_group":[286754],"msr_project":[],"publication":[],"video":[],"msr-tool":[],"msr_publication_type":"inproceedings","related_content":[],"_links":{"self":[{"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/msr-research-item\/1162754","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/msr-research-item"}],"about":[{"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/types\/msr-research-item"}],"version-history":[{"count":7,"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/msr-research-item\/1162754\/revisions"}],"predecessor-version":[{"id":1162784,"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/msr-research-item\/1162754\/revisions\/1162784"}],"wp:attachment":[{"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/media?parent=1162754"}],"wp:term":[{"taxonomy":"msr-research-highlight","embeddable":true,"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/msr-research-highlight?post=1162754"},{"taxonomy":"msr-research-area","embeddable":true,"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/research-area?post=1162754"},{"taxonomy":"msr-publication-type","embeddable":true,"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/msr-publication-type?post=1162754"},{"taxonomy":"msr-publisher","embeddable":true,"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/msr-publisher?post=1162754"},{"taxonomy":"msr-focus-area","embeddable":true,"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/msr-focus-area?post=1162754"},{"taxonomy":"msr-locale","embeddable":true,"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/msr-locale?post=1162754"},{"taxonomy":"msr-post-option","embeddable":true,"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/msr-post-option?post=1162754"},{"taxonomy":"msr-field-of-study","embeddable":true,"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/msr-field-of-study?post=1162754"},{"taxonomy":"msr-conference","embeddable":true,"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/msr-conference?post=1162754"},{"taxonomy":"msr-journal","embeddable":true,"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/msr-journal?post=1162754"},{"taxonomy":"msr-impact-theme","embeddable":true,"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/msr-impact-theme?post=1162754"},{"taxonomy":"msr-pillar","embeddable":true,"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/msr-pillar?post=1162754"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}