{"id":921579,"date":"2023-02-22T09:05:19","date_gmt":"2023-02-22T17:05:19","guid":{"rendered":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/?p=921579"},"modified":"2023-02-22T13:09:41","modified_gmt":"2023-02-22T21:09:41","slug":"research-focus-week-of-february-20-2023","status":"publish","type":"post","link":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/blog\/research-focus-week-of-february-20-2023\/","title":{"rendered":"Research Focus: Week of February 20, 2023"},"content":{"rendered":"\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"1400\" height=\"264\" src=\"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-content\/uploads\/2023\/02\/RF10_blog_banner_1400x264.png\" alt=\"Microsoft Research Focus 10 edition, week of February 20, 2023\" class=\"wp-image-921600\" srcset=\"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-content\/uploads\/2023\/02\/RF10_blog_banner_1400x264.png 1400w, https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-content\/uploads\/2023\/02\/RF10_blog_banner_1400x264-300x57.png 300w, https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-content\/uploads\/2023\/02\/RF10_blog_banner_1400x264-1024x193.png 1024w, https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-content\/uploads\/2023\/02\/RF10_blog_banner_1400x264-768x145.png 768w, https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-content\/uploads\/2023\/02\/RF10_blog_banner_1400x264-240x45.png 240w\" sizes=\"auto, (max-width: 1400px) 100vw, 1400px\" \/><\/figure>\n\n\n\n<figure class=\"wp-block-pullquote\"><blockquote><p><em class=\"\">Welcome to Research Focus, a new series of blog posts that highlights notable publications, events, code\/datasets, new hires and other milestones from across the research community at Microsoft.<\/em><\/p><\/blockquote><\/figure>\n\n\n<aside id=accordion-71bf6f3e-828a-4d19-bd41-23fb8a1ff94f class=\"msr-table-of-contents-block accordion mb-5 pb-0\" data-bi-aN=\"table-of-contents\">\n\t<button 
class=\"btn btn-collapse bg-gray-100 mb-0 display-flex justify-content-between\" type=\"button\" data-mount=\"collapse\" data-target=\"#accordion-collapse-71bf6f3e-828a-4d19-bd41-23fb8a1ff94f\" aria-expanded=\"true\" aria-controls=\"accordion-collapse-71bf6f3e-828a-4d19-bd41-23fb8a1ff94f\">\n\t\t<span class=\"msr-table-of-contents-block__label subtitle\">In this article<\/span>\n\t\t<span class=\"msr-table-of-contents-block__current mr-4 text-gray-600 font-weight-normal\" aria-hidden=\"true\"><\/span>\n\t<\/button>\n\t<div id=\"accordion-collapse-71bf6f3e-828a-4d19-bd41-23fb8a1ff94f\" class=\"msr-table-of-contents-block__collapse-wrapper collapse show\" data-parent=\"#accordion-71bf6f3e-828a-4d19-bd41-23fb8a1ff94f\">\n\t\t<div class=\"accordion-body bg-gray-100 border-top pt-4\">\n\t\t\t<ol class=\"msr-table-of-contents-block__list\">\n\t\t\t\t\t\t\t\t\t<li class=\"msr-table-of-contents-block__list-item\">\n\t\t\t\t\t\t<a href=\"#self-supervised-multi-task-pretraining-with-control-transformers-smart\" class=\"msr-table-of-contents-block__list-item-link\">Self-supervised Multi-task pretrAining with contRol Transformers (SMART)<\/a>\n\t\t\t\t\t<\/li>\n\t\t\t\t\t\t\t\t\t<li class=\"msr-table-of-contents-block__list-item\">\n\t\t\t\t\t\t<a href=\"#a-ranking-game-for-imitation-learning\" class=\"msr-table-of-contents-block__list-item-link\">A Ranking Game for Imitation Learning<\/a>\n\t\t\t\t\t<\/li>\n\t\t\t\t\t\t\t\t\t<li class=\"msr-table-of-contents-block__list-item\">\n\t\t\t\t\t\t<a href=\"#microsoft-helps-goodleaf-farms-drive-agricultural-innovation-with-data\" class=\"msr-table-of-contents-block__list-item-link\">Microsoft helps GoodLeaf Farms drive agricultural innovation with data<\/a>\n\t\t\t\t\t<\/li>\n\t\t\t\t\t\t\t\t\t<li class=\"msr-table-of-contents-block__list-item\">\n\t\t\t\t\t\t<a href=\"#reinforcement-learning-open-source-fest\" class=\"msr-table-of-contents-block__list-item-link\">Reinforcement Learning Open Source 
Fest<\/a>\n\t\t\t\t\t<\/li>\n\t\t\t\t\t\t\t<\/ol>\n\t\t<\/div>\n\t<\/div>\n\t<span class=\"msr-table-of-contents-block__progress-bar\"><\/span>\n<\/aside>\n\n\n\n<h6 id=\"new-research\" class=\"has-blue-color has-text-color\">NEW RESEARCH<\/h6>\n\n\n\n<h2 id=\"self-supervised-multi-task-pretraining-with-control-transformers-smart\">Self-supervised Multi-task pretrAining with contRol Transformers (SMART)<\/h2>\n\n\n\n<p>Many real-world applications require sequential decision making, in which an agent interacts with a stochastic environment to perform a task. For example, a navigating robot is expected to control itself and move to a target using the sensory information it receives along the way. Learning the right control policy can be complicated by environmental uncertainty and high-dimensional perceptual input, such as raw pixels. More importantly, the learned strategy is specific to the task (e.g., which target to reach) and to the agent (e.g., a two-legged or a four-legged robot). As a result, a good strategy for one task does not necessarily transfer to a new task or a different agent.<\/p>\n\n\n\n<p>Pre-training a foundation model can help improve overall efficiency when facing a wide variety of control tasks and agents. However, although foundation models have achieved remarkable success in language domains, control tasks and agents can differ greatly from one another, making it challenging to find a universal foundation. This becomes even harder in real-world scenarios that lack supervision or high-quality behavior data.<\/p>\n\n\n\n<p>In a new paper, <a href=\"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/publication\/smart-self-supervised-multi-task-pretraining-with-control-transformers\/\" target=\"_blank\" rel=\"noreferrer noopener\">SMART: Self-supervised Multi-task pretrAining with contRol Transformers<\/a>, Microsoft researchers tackle these challenges and propose a generic pre-training framework for control problems. 
Their research demonstrates that a single pre-trained SMART model can be fine-tuned for various visual-control tasks and agents, whether seen or unseen during pre-training, with significantly improved performance and learning efficiency. SMART is also resilient to low-quality datasets and works well even when the pre-training data consists of random behaviors.<\/p>\n\n\n\n<div class=\"wp-block-buttons is-content-justification-center is-content-justification-center is-layout-flex wp-container-core-buttons-is-layout-16018d1d wp-block-buttons-is-layout-flex\">\n<div class=\"wp-block-button is-style-outline is-style-outline--1\"><a data-bi-type=\"button\" class=\"wp-block-button__link wp-element-button\" href=\"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/publication\/smart-self-supervised-multi-task-pretraining-with-control-transformers\/\" target=\"_blank\" rel=\"noreferrer noopener\">Read the paper<\/a><\/div>\n<\/div>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity is-style-dots\"\/>\n\n\n\n<h6 id=\"new-research\" class=\"has-blue-color has-text-color\">NEW RESEARCH<\/h6>\n\n\n\n<h2 id=\"a-ranking-game-for-imitation-learning\">A Ranking Game for Imitation Learning<\/h2>\n\n\n\n<p>Reinforcement learning relies on reward feedback from the environment to learn meaningful behaviors. Because specifying rewards is a hard problem, imitation learning (IL) can be used to bypass reward specification and learn from expert data, often via inverse reinforcement learning (IRL) techniques. In IL, near-optimal expert data is very informative but can be difficult to obtain. Moreover, even with infinite data, expert demonstrations cannot imply a total ordering over trajectories the way preferences can. On the other hand, learning from preferences alone is challenging: although preference data is typically much easier to collect than expert demonstrations, a large number of preferences is required to infer a high-dimensional reward function. 
The classical IRL formulation learns from expert demonstrations but provides no mechanism for incorporating offline preferences.<\/p>\n\n\n\n<p>In a new paper, <a href=\"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/publication\/a-ranking-game-for-imitation-learning\/\" target=\"_blank\" rel=\"noreferrer noopener\">A Ranking Game for Imitation Learning<\/a>, accepted at <a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" href=\"https:\/\/jmlr.org\/tmlr\/\" target=\"_blank\" rel=\"noopener noreferrer\">TMLR 2023<span class=\"sr-only\"> (opens in new tab)<\/span><\/a>, researchers from UT Austin, Microsoft Research, and UMass Amherst create a unified algorithmic framework for IRL that incorporates both expert and suboptimal information for imitation learning. The framework, called \u201crank-game,\u201d treats imitation as a two-player ranking-based game between a policy and a reward. In this game, the reward agent learns to satisfy pairwise performance rankings between behaviors, while the policy agent learns to maximize this reward. The researchers also propose a novel ranking loss function, yielding an algorithm that can learn from expert demonstrations and preferences simultaneously, gaining the advantages of both modalities. Experimental results in the paper show that the proposed method achieves state-of-the-art sample efficiency and can solve previously unsolvable tasks in the Learning from Observation (LfO) setting. 
The project video and code are available <a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" href=\"https:\/\/hari-sikchi.github.io\/rank-game\/\" target=\"_blank\" rel=\"noopener noreferrer\">on the project website<span class=\"sr-only\"> (opens in new tab)<\/span><\/a>.<\/p>\n\n\n\n<figure class=\"wp-block-image aligncenter size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"1100\" height=\"864\" src=\"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-content\/uploads\/2023\/02\/rank-game.png\" alt=\"rank-game: The Policy agent maximizes the reward function by interacting with the environment. The Reward agent satisfies a set of behavior rankings obtained from various sources: generated by the policy agent (vanilla), automatically generated (auto), or offline annotated rankings obtained from a human or offline dataset (pref). Treating this game in the Stackelberg framework leads to either Policy being a leader and Reward being a follower, or vice versa.\" class=\"wp-image-921615\" srcset=\"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-content\/uploads\/2023\/02\/rank-game.png 1100w, https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-content\/uploads\/2023\/02\/rank-game-300x236.png 300w, https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-content\/uploads\/2023\/02\/rank-game-1024x804.png 1024w, https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-content\/uploads\/2023\/02\/rank-game-768x603.png 768w, https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-content\/uploads\/2023\/02\/rank-game-229x180.png 229w\" sizes=\"auto, (max-width: 1100px) 100vw, 1100px\" \/><figcaption class=\"wp-element-caption\">Figure 1: <strong>Rank-game:<\/strong> The Policy agent maximizes the reward function by interacting with the environment. 
The Reward agent satisfies a set of behavior rankings obtained from various sources: generated by the policy agent (vanilla), automatically generated (auto), or offline annotated rankings obtained from a human or offline dataset (pref). Treating this game in the Stackelberg framework leads to either Policy being a leader and Reward being a follower, or vice versa.<\/figcaption><\/figure>\n\n\n\n<div class=\"wp-block-buttons is-content-justification-center is-content-justification-center is-layout-flex wp-container-core-buttons-is-layout-16018d1d wp-block-buttons-is-layout-flex\">\n<div class=\"wp-block-button is-style-outline is-style-outline--2\"><a data-bi-type=\"button\" class=\"wp-block-button__link wp-element-button\" href=\"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/publication\/a-ranking-game-for-imitation-learning\/\" target=\"_blank\" rel=\"noreferrer noopener\">Read the paper<\/a><\/div>\n<\/div>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity is-style-dots\"\/>\n\n\n\n<h6 id=\"news\" class=\"has-blue-color has-text-color\">NEWS<\/h6>\n\n\n\n<h2 id=\"microsoft-helps-goodleaf-farms-drive-agricultural-innovation-with-data\">Microsoft helps GoodLeaf Farms drive agricultural innovation with data<\/h2>\n\n\n\n<p>Vertical indoor farming uses extensive technology to manage production and optimize growing conditions. This includes movement of grow benches, lighting, irrigation, and air and temperature controls. 
Data and analytics can help vertical farms produce the highest possible yields and quality.<\/p>\n\n\n\n<p>Canadian vertical farm pioneer <a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" href=\"https:\/\/www.goodleaffarms.com\/\" target=\"_blank\" rel=\"noopener noreferrer\">GoodLeaf Farms<span class=\"sr-only\"> (opens in new tab)<\/span><\/a> has <a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" href=\"https:\/\/www.newswire.ca\/news-releases\/driving-agricultural-innovation-with-data-822282504.html\" target=\"_blank\" rel=\"noopener noreferrer\">announced a partnership<span class=\"sr-only\"> (opens in new tab)<\/span><\/a> with Microsoft and the data and analytics firm Adastra to optimize crop production and quality. GoodLeaf has deployed <a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" href=\"https:\/\/azure.microsoft.com\/en-us\/products\/synapse-analytics\/\" target=\"_blank\" rel=\"noopener noreferrer\">Microsoft Azure Synapse Analytics<span class=\"sr-only\"> (opens in new tab)<\/span><\/a> and <a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" href=\"https:\/\/powerplatform.microsoft.com\/\" target=\"_blank\" rel=\"noopener noreferrer\">Microsoft Power Platform<span class=\"sr-only\"> (opens in new tab)<\/span><\/a> to make use of the large volumes of data it collects.<\/p>\n\n\n\n<p>GoodLeaf is also collaborating with <a href=\"https:\/\/cm-edgetun.pages.dev\/research\/\" target=\"_blank\" rel=\"noreferrer noopener\">Microsoft Research<\/a> through <a href=\"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/project\/project-farmvibes\/\" target=\"_blank\" rel=\"noreferrer noopener\">Project FarmVibes<\/a>, using GoodLeaf\u2019s data to support research into controlled-environment agriculture.<\/p>\n\n\n\n<p>GoodLeaf\u2019s farm in Guelph, Ontario, and two currently under construction in Calgary 
and Montreal, use a connected system of cameras and sensors to manage plant seeding, growing media, germination, temperature, humidity, nutrients, lighting, and airflow. With the help of data science and analytics, the company grows microgreens and baby greens in Canada year-round, regardless of the weather, using a hydroponic system and specialized LED lights.<\/p>\n\n\n\n<div class=\"wp-block-buttons is-content-justification-center is-content-justification-center is-layout-flex wp-container-core-buttons-is-layout-16018d1d wp-block-buttons-is-layout-flex\">\n<div class=\"wp-block-button\"><a data-bi-type=\"button\" class=\"wp-block-button__link wp-element-button\" href=\"https:\/\/www.newswire.ca\/news-releases\/driving-agricultural-innovation-with-data-822282504.html\" target=\"_blank\" rel=\"noreferrer noopener\">Learn more<\/a><\/div>\n<\/div>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity is-style-dots\"\/>\n\n\n\n<h6 id=\"opportunity\" class=\"has-blue-color has-text-color\">OPPORTUNITY<\/h6>\n\n\n\n<h2 id=\"reinforcement-learning-open-source-fest\">Reinforcement Learning Open Source Fest<\/h2>\n\n\n\n<p>Proposals are now being accepted for Reinforcement Learning (RL) Open Source Fest 2023, a global online program that introduces students to open-source RL programs and software development. Our goal is to bring together a diverse group of students from around the world to help solve open-source RL problems and advance state-of-the-art research and development. The program produces open-source code that is released to benefit all.<\/p>\n\n\n\n<p>Accepted students will join a four-month research project from May to August 2023, working virtually alongside researchers, data scientists, and engineers on the Microsoft Research New York City <a href=\"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/project\/real-world-reinforcement-learning\/\" target=\"_blank\" rel=\"noreferrer noopener\">Real World Reinforcement Learning<\/a> team. 
Students will also receive a $10,000 USD stipend. At the end of the program, each student will present their project online to the Microsoft Research Real World Reinforcement Learning team.<\/p>\n\n\n\n<p>The proposal deadline is Monday, April 3, 2023, at 11:59 PM ET. <a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" href=\"http:\/\/aka.ms\/RLOSFest\" target=\"_blank\" rel=\"noopener noreferrer\">Learn more and submit your proposal<span class=\"sr-only\"> (opens in new tab)<\/span><\/a> today.<\/p>\n\n\n\n<div class=\"wp-block-buttons is-content-justification-center is-content-justification-center is-layout-flex wp-container-core-buttons-is-layout-16018d1d wp-block-buttons-is-layout-flex\">\n<div class=\"wp-block-button\"><a data-bi-type=\"button\" class=\"wp-block-button__link wp-element-button\" href=\"http:\/\/aka.ms\/RLOSFest\" target=\"_blank\" rel=\"noreferrer noopener\">Learn more<\/a><\/div>\n<\/div>\n","protected":false},"excerpt":{"rendered":"<p>Welcome to Research Focus, a new series of blog posts that highlights notable publications, events, code\/datasets, new hires and other milestones from across the research community at Microsoft. Many real-world applications require sequential decision making, where an agent interacts with a stochastic environment to perform a task. 
For example, a navigating robot is expected to [&hellip;]<\/p>\n","protected":false},"author":42183,"featured_media":921603,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"msr-url-field":"","msr-podcast-episode":"","msrModifiedDate":"","msrModifiedDateEnabled":false,"ep_exclude_from_search":false,"_classifai_error":"","msr-author-ordering":[],"msr_hide_image_in_river":0,"footnotes":""},"categories":[1],"tags":[],"research-area":[13556,13547],"msr-region":[],"msr-event-type":[],"msr-locale":[268875],"msr-post-option":[243984],"msr-impact-theme":[],"msr-promo-type":[],"msr-podcast-series":[],"class_list":["post-921579","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-research-blog","msr-research-area-artificial-intelligence","msr-research-area-systems-and-networking","msr-locale-en_us","msr-post-option-blog-homepage-featured"],"msr_event_details":{"start":"","end":"","location":""},"podcast_url":"","podcast_episode":"","msr_research_lab":[199565,199571],"msr_impact_theme":[],"related-publications":[],"related-downloads":[],"related-videos":[],"related-academic-programs":[633669],"related-groups":[714067],"related-projects":[881235,568491,239387],"related-events":[],"related-researchers":[],"msr_type":"Post","featured_image_thumbnail":"<img width=\"960\" height=\"540\" src=\"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-content\/uploads\/2023\/02\/RF10_blog_hero_1400x788-960x540.png\" class=\"img-object-cover\" alt=\"Microsoft Research Focus 10 edition, week of February 20, 2023\" decoding=\"async\" loading=\"lazy\" srcset=\"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-content\/uploads\/2023\/02\/RF10_blog_hero_1400x788-960x540.png 960w, https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-content\/uploads\/2023\/02\/RF10_blog_hero_1400x788-300x169.png 300w, 
https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-content\/uploads\/2023\/02\/RF10_blog_hero_1400x788-1024x576.png 1024w, https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-content\/uploads\/2023\/02\/RF10_blog_hero_1400x788-768x432.png 768w, https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-content\/uploads\/2023\/02\/RF10_blog_hero_1400x788-1066x600.png 1066w, https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-content\/uploads\/2023\/02\/RF10_blog_hero_1400x788-655x368.png 655w, https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-content\/uploads\/2023\/02\/RF10_blog_hero_1400x788-343x193.png 343w, https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-content\/uploads\/2023\/02\/RF10_blog_hero_1400x788-240x135.png 240w, https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-content\/uploads\/2023\/02\/RF10_blog_hero_1400x788-640x360.png 640w, https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-content\/uploads\/2023\/02\/RF10_blog_hero_1400x788-1280x720.png 1280w, https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-content\/uploads\/2023\/02\/RF10_blog_hero_1400x788.png 1400w\" sizes=\"auto, (max-width: 960px) 100vw, 960px\" \/>","byline":"","formattedDate":"February 22, 2023","formattedExcerpt":"Welcome to Research Focus, a new series of blog posts that highlights notable publications, events, code\/datasets, new hires and other milestones from across the research community at Microsoft. 
Many real-world applications require sequential decision making, where an agent interacts with a stochastic environment to perform&hellip;","locale":{"slug":"en_us","name":"English","native":"","english":"English"},"_links":{"self":[{"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/posts\/921579","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/users\/42183"}],"replies":[{"embeddable":true,"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/comments?post=921579"}],"version-history":[{"count":19,"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/posts\/921579\/revisions"}],"predecessor-version":[{"id":924324,"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/posts\/921579\/revisions\/924324"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/media\/921603"}],"wp:attachment":[{"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/media?parent=921579"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/categories?post=921579"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/tags?post=921579"},{"taxonomy":"msr-research-area","embeddable":true,"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/research-area?post=921579"},{"taxonomy":"msr-region","embeddable":true,"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/msr-region?post=921579"},{"taxonomy":"msr-event-type","embeddable":true,"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/re
search\/wp-json\/wp\/v2\/msr-event-type?post=921579"},{"taxonomy":"msr-locale","embeddable":true,"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/msr-locale?post=921579"},{"taxonomy":"msr-post-option","embeddable":true,"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/msr-post-option?post=921579"},{"taxonomy":"msr-impact-theme","embeddable":true,"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/msr-impact-theme?post=921579"},{"taxonomy":"msr-promo-type","embeddable":true,"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/msr-promo-type?post=921579"},{"taxonomy":"msr-podcast-series","embeddable":true,"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/msr-podcast-series?post=921579"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}