{"id":708913,"date":"2020-12-07T07:59:52","date_gmt":"2020-12-07T15:59:52","guid":{"rendered":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/?p=708913"},"modified":"2020-12-07T20:39:02","modified_gmt":"2020-12-08T04:39:02","slug":"neurips-2020-moving-toward-real-world-reinforcement-learning-via-batch-rl-strategic-exploration-and-representation-learning","status":"publish","type":"post","link":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/blog\/neurips-2020-moving-toward-real-world-reinforcement-learning-via-batch-rl-strategic-exploration-and-representation-learning\/","title":{"rendered":"NeurIPS 2020: Moving toward real-world reinforcement learning via batch RL, strategic exploration, and representation learning"},"content":{"rendered":"\n<figure class=\"wp-block-image alignwide size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"576\" src=\"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-content\/uploads\/2020\/12\/1400x788_RL_Comp_No_logo_still-1024x576.jpg\" alt=\"diagram, schematic\" class=\"wp-image-710134\" srcset=\"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-content\/uploads\/2020\/12\/1400x788_RL_Comp_No_logo_still-1024x576.jpg 1024w, https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-content\/uploads\/2020\/12\/1400x788_RL_Comp_No_logo_still-300x169.jpg 300w, https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-content\/uploads\/2020\/12\/1400x788_RL_Comp_No_logo_still-768x432.jpg 768w, https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-content\/uploads\/2020\/12\/1400x788_RL_Comp_No_logo_still-1536x864.jpg 1536w, https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-content\/uploads\/2020\/12\/1400x788_RL_Comp_No_logo_still-2048x1152.jpg 2048w, https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-content\/uploads\/2020\/12\/1400x788_RL_Comp_No_logo_still-16x9.jpg 16w, https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-content\/uploads\/2020\/12\/1400x788_RL_Comp_No_logo_still-1066x600.jpg 1066w, 
https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-content\/uploads\/2020\/12\/1400x788_RL_Comp_No_logo_still-655x368.jpg 655w, https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-content\/uploads\/2020\/12\/1400x788_RL_Comp_No_logo_still-343x193.jpg 343w, https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-content\/uploads\/2020\/12\/1400x788_RL_Comp_No_logo_still-640x360.jpg 640w, https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-content\/uploads\/2020\/12\/1400x788_RL_Comp_No_logo_still-960x540.jpg 960w, https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-content\/uploads\/2020\/12\/1400x788_RL_Comp_No_logo_still-1280x720.jpg 1280w, https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-content\/uploads\/2020\/12\/1400x788_RL_Comp_No_logo_still-1920x1080.jpg 1920w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/figure>\n\n\n\n<p>As human beings, we encounter unfamiliar situations all the time\u2014learning to drive, living on our own for the first time, starting a new job. And while we can anticipate what to expect based on what others have told us or what we\u2019ve picked up from books and depictions in movies and TV, it isn\u2019t until we\u2019re behind the wheel of a car, maintaining an apartment, or doing a job in a workplace that we\u2019re able to take advantage of one of the most important means of learning: by trying. We make deliberate decisions, see how they pan out, then make more choices and take note of those results, becoming\u2014we hope\u2014better drivers, renters and workers in the process. 
We learn by interacting with our environments.<\/p>\n\n\n\n<blockquote class=\"wp-block-quote has-text-align-left is-layout-flow wp-block-quote-is-layout-flow\"><p>\u201cHumans have an intuitive understanding of physics, and it\u2019s because&nbsp;when&nbsp;we\u2019re kids, we push things off of tables and stuff like that,\u201d says&nbsp;<a href=\"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/people\/akshaykr\/\" target=\"_blank\" rel=\"noreferrer noopener\">Principal Researcher&nbsp;Akshay&nbsp;Krishnamurthy<\/a>.&nbsp;\u201cBut if you only watch videos of things falling off tables, you will not actually know about this intuitive gravity business.&nbsp;So our ability to do experimentation in the world is very, very important for us to generalize.\u201d&nbsp;<\/p><\/blockquote>\n\n\n\n<p>For our AI&nbsp;to improve in the world in which we&nbsp;operate,&nbsp;it would stand to reason that our technology be able to do the same. To learn not just from the data it\u2019s been given, as has largely been the&nbsp;approach&nbsp;in machine learning, but to also learn to figure out what additional data it needs to get better.&nbsp;<\/p>\n\n\n\n<p>\u201cWe want AIs to make decisions,&nbsp;and&nbsp;reinforcement learning&nbsp;is the study of how to make decisions,\u201d says&nbsp;Krishnamurthy.&nbsp;&nbsp;&nbsp;<\/p>\n\n\n\n<div class=\"annotations \" data-bi-aN=\"margin-callout\">\n\t<article class=\"annotations__list card depth-16 bg-body p-4 annotations__list--right\">\n\t\t<div class=\"annotations__list-item\">\n\t\t\t\t\t\t\t<a href=\"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/event\/neurips-2020\/\" target=\"_self\" aria-label=\"Microsoft at NeurIPS 2020 \" data-bi-type=\"annotated-link\" data-bi-cN=\"Microsoft at NeurIPS 2020 \" class=\"annotations__list-thumbnail\" >\n\t\t\t\t\t<img loading=\"lazy\" decoding=\"async\" width=\"172\" height=\"96\" 
src=\"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-content\/uploads\/2020\/11\/1920x720_Event_Page_Banner-343x193.png\" class=\"mb-2\" alt=\"illustrated icons related to artificial intelligence for Microsoft's involvement at NeurIPS 2020\" srcset=\"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-content\/uploads\/2020\/11\/1920x720_Event_Page_Banner-343x193.png 343w, https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-content\/uploads\/2020\/11\/1920x720_Event_Page_Banner-1066x600.png 1066w, https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-content\/uploads\/2020\/11\/1920x720_Event_Page_Banner-655x368.png 655w, https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-content\/uploads\/2020\/11\/1920x720_Event_Page_Banner-640x360.png 640w, https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-content\/uploads\/2020\/11\/1920x720_Event_Page_Banner-960x540.png 960w, https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-content\/uploads\/2020\/11\/1920x720_Event_Page_Banner-1280x720.png 1280w, https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-content\/uploads\/2020\/11\/1920x720_Event_Page_Banner-1920x1080.png 1920w\" sizes=\"auto, (max-width: 172px) 100vw, 172px\" \/>\t\t\t\t<\/a>\n\t\t\t\t\t\t\t<span class=\"annotations__type d-block text-uppercase font-weight-semibold text-neutral-300 small\">EVENT<\/span>\n\t\t\t<a href=\"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/event\/neurips-2020\/\" data-bi-cN=\"Microsoft at NeurIPS 2020 \" data-external-link=\"false\" data-bi-aN=\"margin-callout\" data-bi-type=\"annotated-link\" class=\"annotations__link font-weight-semibold text-decoration-none\"><span>Microsoft at NeurIPS 2020 <\/span>&nbsp;<span class=\"glyph-in-link glyph-append glyph-append-chevron-right\" aria-hidden=\"true\"><\/span><\/a>\t\t\t\t\t\t\t<p class=\"annotations__caption text-neutral-400 mt-2\">Check out Microsoft at NeurIPS 2020, including all of our NeurIPS publications, the Microsoft session schedule, and open career 
opportunities<\/p>\n\t\t\t\t\t<\/div>\n\t<\/article>\n<\/div>\n\n\n\n<p>Krishnamurthy is a member of the reinforcement learning group at the Microsoft Research lab in New York City, one of several teams helping to steer the course of reinforcement learning at Microsoft. There are also dedicated groups in Redmond, Washington; Montreal; Cambridge, United Kingdom; and Asia; and they\u2019re working toward a collective goal: <a href=\"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/project\/real-world-reinforcement-learning\/\">RL for the real world<\/a>. They\u2019ve seen their efforts pay off. The teams have translated foundational research into the award-winning <a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" rel=\"noopener noreferrer\" target=\"_blank\" href=\"https:\/\/azure.microsoft.com\/en-us\/services\/cognitive-services\/personalizer\/\">Azure Personalizer<span class=\"sr-only\"> (opens in new tab)<\/span><\/a>, a reinforcement learning system that helps customers build applications that become increasingly customized to the user, which has been successfully deployed in many Microsoft products, such as Xbox.<\/p>\n\n\n\n<p>While reinforcement learning has been around almost as long as machine learning, there&#8217;s&nbsp;still much to explore and understand to support long-term progress with real-world implications and wide applicability, as underscored by the <em>17<\/em> RL-related papers being presented by Microsoft researchers at the <a href=\"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/event\/neurips-2020\/\">34th Conference on Neural Information Processing Systems (NeurIPS 2020)<\/a>. 
Here, we explore a selection of the work through the lens of three areas:<\/p>\n\n\n\n<ul class=\"wp-block-list\"><li><strong>Batch RL<\/strong>, a framework in which agents leverage past experiences, which is a vital capability for real-world applications, particularly in safety-critical scenarios<\/li><li><strong>Strategic exploration<\/strong>, mechanisms by which algorithms identify and collect relevant information, which is crucial for successfully optimizing performance<\/li><li><strong>Representation learning<\/strong>, through which agents summarize and compress inputs to enable more effective exploration, generalization, and optimization<\/li><\/ul>\n\n\n\n<h2 id=\"batch-rl-using-a-static-dataset-to-learn-a-policy\">Batch RL: Using a static dataset to learn a policy<\/h2>\n\n\n\n<p>In traditional RL problems, agents learn on the job. They\u2019re introduced into an environment, act in that environment, and note the outcomes, learning which behaviors get them closer to completing their task. Batch RL takes a different approach: an agent tries to learn a good policy from a static dataset of past experiences, collected\u2014for example\u2014in the regular operation of an existing system in which it will be deployed. While it\u2019s less intuitive than the direct trial-and-error nature of interactive RL, says <a href=\"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/people\/alekha\/\">Principal Research Manager Alekh Agarwal<\/a>, this framework has some crucial upsides.<\/p>\n\n\n\n<p>\u201cYou can take advantage of any and every available ounce of data that relates to your problem before your agent ever sees the light of day, and that means they can already start at a much higher performance point; they make fewer errors and generally learn much better,\u201d says Agarwal. 
This is especially important in safety-critical scenarios such as healthcare and autonomous systems.<\/p>\n\n\n\n<div class=\"annotations \" data-bi-aN=\"margin-callout\">\n\t<article class=\"annotations__list card depth-16 bg-body p-4 annotations__list--left\">\n\t\t<div class=\"annotations__list-item\">\n\t\t\t\t\t\t<span class=\"annotations__type d-block text-uppercase font-weight-semibold text-neutral-300 small\">Publication<\/span>\n\t\t\t<a href=\"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/publication\/provably-good-batch-off-policy-reinforcement-learning-without-great-exploration\/\" data-bi-cN=\"Provably Good Batch Reinforcement Learning Without Great Exploration\" data-external-link=\"false\" data-bi-aN=\"margin-callout\" data-bi-type=\"annotated-link\" class=\"annotations__link font-weight-semibold text-decoration-none\"><span>Provably Good Batch Reinforcement Learning Without Great Exploration<\/span>&nbsp;<span class=\"glyph-in-link glyph-append glyph-append-chevron-right\" aria-hidden=\"true\"><\/span><\/a>\t\t\t\t\t<\/div>\n\t<\/article>\n<\/div>\n\n\n\n<div class=\"annotations \" data-bi-aN=\"margin-callout\">\n\t<article class=\"annotations__list card depth-16 bg-body p-4 annotations__list--right\">\n\t\t<div class=\"annotations__list-item\">\n\t\t\t\t\t\t<span class=\"annotations__type d-block text-uppercase font-weight-semibold text-neutral-300 small\">Publication<\/span>\n\t\t\t<a href=\"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/publication\/morel-model-based-offline-reinforcement-learning\/\" data-bi-cN=\"MOReL: Model-Based Offline Reinforcement Learning\" data-external-link=\"false\" data-bi-aN=\"margin-callout\" data-bi-type=\"annotated-link\" class=\"annotations__link font-weight-semibold text-decoration-none\"><span>MOReL: Model-Based Offline Reinforcement Learning<\/span>&nbsp;<span class=\"glyph-in-link glyph-append glyph-append-chevron-right\" 
aria-hidden=\"true\"><\/span><\/a>\t\t\t\t\t<\/div>\n\t<\/article>\n<\/div>\n\n\n\n<p>The papers \u201c<a href=\"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/publication\/provably-good-batch-off-policy-reinforcement-learning-without-great-exploration\/\">Provably Good Batch Reinforcement Learning Without Great Exploration<\/a>\u201d and \u201c<a href=\"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/publication\/morel-model-based-offline-reinforcement-learning\/\">MOReL: Model-Based Offline Reinforcement Learning<\/a>\u201d tackle the same batch RL challenge. Static datasets can\u2019t possibly cover every situation an agent will encounter in deployment, potentially leading to an agent that performs well on observed data and poorly on unobserved data. This can make an agent susceptible to \u201ccascading failures,\u201d in which one wrong move leads to a series of other decisions that completely derails the agent. Oftentimes, researchers won\u2019t know until after deployment how effective a dataset was, explains Agarwal.<\/p>\n\n\n\n<p>The papers seek to optimize with the available dataset by preparing for the worst. While showing optimism in the face of uncertainty\u2014that is, treating even wrong moves as learning opportunities\u2014may work well when an agent can interact with its environment, batch RL doesn\u2019t afford an agent a chance to test its beliefs; it only has access to the dataset. So instead, researchers take a pessimistic approach, learning a policy based on the worst-case scenarios in the hypothetical world that could have produced the dataset they\u2019re working with. Performing well under the worst conditions helps ensure even better performance in deployment. So there are two questions at play, Agarwal says: how do you reason about a set of all the worlds that are consistent with a particular dataset and take worst case over them, and how do you find the best policy in this worst-case sense? 
\u201cProvably Good Batch Reinforcement Learning Without Great Exploration,\u201d which was coauthored by Agarwal, explores these questions in model-free settings, while \u201cMOReL: Model-Based Offline Reinforcement Learning\u201d explores them in a model-based framework.<\/p>\n\n\n\n<p>\u201cProvably Good Batch Reinforcement Learning Without Great Exploration\u201d provides strong theoretical guarantees for such pessimistic techniques, even when the agent perceives its environment through complex sensory observations, a first in the field. A key upshot of the algorithms and results is that when the dataset is sufficiently diverse, the agent provably learns the best possible behavior policy, with guarantees degrading gracefully with the quality of the dataset. MOReL provides convincing empirical demonstrations in physical systems such as robotics, where the underlying dynamics, based on the laws of physics, can often be learned well using a reasonable amount of data. In such settings, the researchers demonstrate that model-based approaches to pessimistic reasoning achieve state-of-the-art empirical performance.<\/p>\n\n\n\n<div class=\"annotations \" data-bi-aN=\"margin-callout\">\n\t<article class=\"annotations__list card depth-16 bg-body p-4 annotations__list--left\">\n\t\t<div class=\"annotations__list-item\">\n\t\t\t\t\t\t<span class=\"annotations__type d-block text-uppercase font-weight-semibold text-neutral-300 small\">Publication <\/span>\n\t\t\t<a href=\"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/publication\/empirical-likelihood-for-contextual-bandits\/\" data-bi-cN=\"Empirical Likelihood for Contextual Bandits\" data-external-link=\"false\" data-bi-aN=\"margin-callout\" data-bi-type=\"annotated-link\" class=\"annotations__link font-weight-semibold text-decoration-none\"><span>Empirical Likelihood for Contextual Bandits<\/span>&nbsp;<span class=\"glyph-in-link glyph-append glyph-append-chevron-right\" 
aria-hidden=\"true\"><\/span><\/a>\t\t\t\t\t<\/div>\n\t<\/article>\n<\/div>\n\n\n\n<p>A third paper, \u201c<a href=\"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/publication\/empirical-likelihood-for-contextual-bandits\/\">Empirical Likelihood for Contextual Bandits<\/a>,\u201d explores another important and practical question in the batch RL space: how much reward is expected when the policy created using a given dataset is run in the real world? Because the answer can\u2019t be truly known, researchers rely on confidence intervals, which provide bounds on future performance when the future is like the past. As applied in this paper, these bounds can be used to decide training details\u2014the types of learning, representation, or features employed.<\/p>\n\n\n\n<p>Confidence intervals are particularly challenging in RL because unbiased estimators of performance decompose into observations with wildly different scales, says <a href=\"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/people\/jcl\/\">Partner Research Manager John Langford<\/a>, a coauthor on the paper. In the work, researchers compare two crude ways to address this: by randomly rounding things to apply binomial confidence intervals, which are too loose, and by using the asymptotically Gaussian structure of sample averages, which is invalid for small numbers of samples. The researchers\u2019 approach, based on empirical likelihood techniques, manages to be tight like the asymptotic Gaussian approach while still being a valid confidence interval. 
These tighter and sharper confidence intervals are currently being deployed in Personalizer to help customers better design and assess the performance of applications.<\/p>\n\n\n\n<p><strong><em>Additio<\/em><\/strong><em><strong>nal reading<\/strong>: For more on batch RL, check out the NeurIPS paper \u201c<a href=\"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/publication\/multi-task-batch-reinforcement-learning-with-metric-learning\/\">Multi-task Batch Reinforcement Learning with Metric Learning<\/a>.\u201d<\/em><\/p>\n\n\n\n<h2 id=\"strategic-exploration-gathering-data-more-selectively\">Strategic exploration: Gathering data more selectively<\/h2>\n\n\n\n<p>In a learning framework in which knowledge comes by way of trial and error, interactions are a hot commodity, and the information they yield can vary significantly. So how an agent chooses to interact with an environment matters. Exploring without a sense of what will result in valuable information can, for example, negatively impact system performance and erode user faith, and even if an agent\u2019s actions aren\u2019t damaging, choices that provide less-than-useful information can slow the learning process. 
Meanwhile, avoiding parts of an environment in which it knows there is no good reward in favor of areas where it\u2019s likely to gain new insight will make for a smarter agent.<\/p>\n\n\n\n<blockquote class=\"wp-block-quote has-text-align-right is-layout-flow wp-block-quote-is-layout-flow\"><p>\u201cOnce you\u2019re deployed in the real world, if you want to learn from your experience in a very sample-efficient manner, then strategic exploration basically tells you how to collect the smallest amount of data, how to collect the smallest amount of experience, that is sufficient for doing good learning,\u201d says Agarwal.<\/p><\/blockquote>\n\n\n\n<div class=\"annotations \" data-bi-aN=\"margin-callout\">\n\t<article class=\"annotations__list card depth-16 bg-body p-4 annotations__list--left\">\n\t\t<div class=\"annotations__list-item\">\n\t\t\t\t\t\t<span class=\"annotations__type d-block text-uppercase font-weight-semibold text-neutral-300 small\">Publication<\/span>\n\t\t\t<a href=\"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/publication\/pc-pg-policy-cover-directed-exploration-for-provable-policy-gradient-learning\/\" data-bi-cN=\"PC-PG: Policy Cover Directed Exploration for Provable Policy Gradient Learning\" data-external-link=\"false\" data-bi-aN=\"margin-callout\" data-bi-type=\"annotated-link\" class=\"annotations__link font-weight-semibold text-decoration-none\"><span>PC-PG: Policy Cover Directed Exploration for Provable Policy Gradient Learning<\/span>&nbsp;<span class=\"glyph-in-link glyph-append glyph-append-chevron-right\" aria-hidden=\"true\"><\/span><\/a>\t\t\t\t\t<\/div>\n\t<\/article>\n<\/div>\n\n\n\n<p>In \u201c<a href=\"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/publication\/pc-pg-policy-cover-directed-exploration-for-provable-policy-gradient-learning\/\">PC-PG: Policy Cover Directed Exploration for Provable Policy Gradient Learning<\/a>,\u201d Agarwal and his coauthors explore gradient descent\u2013based approaches for RL, called 
policy gradient methods, which are popular because they\u2019re flexibly usable across a variety of observation and action spaces, relying primarily on the ability to compute gradients with respect to policy parameters as is readily found in most modern deep learning frameworks. However, the theoretical RL literature provides few insights into adding exploration to this class of methods, and there\u2019s a plethora of heuristics that aren\u2019t provably robust. Building on their <a href=\"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/publication\/on-the-theory-of-policy-gradient-methods-optimality-approximation-and-distribution-shift\/\">earlier theoretical work on better understanding of policy gradient approaches<\/a>, the researchers introduce the Policy Cover-Policy Gradient (PC-PG) algorithm, a model-free method by which an agent constructs an ensemble of policies, each one optimized to do something different. This ensemble provides a device for exploration; the agent continually seeks out further diverse behaviors not well represented in the current ensemble to augment it. 
The researchers theoretically prove PC-PG is more robust than many other strategic exploration approaches and demonstrate empirically that it works on a variety of tasks, from challenging exploration tasks in discrete spaces to those with richer observations.<\/p>\n\n\n\n<div class=\"annotations \" data-bi-aN=\"margin-callout\">\n\t<article class=\"annotations__list card depth-16 bg-body p-4 
annotations__list--left\">\n\t\t<div class=\"annotations__list-item\">\n\t\t\t\t\t\t<span class=\"annotations__type d-block text-uppercase font-weight-semibold text-neutral-300 small\">Publication<\/span>\n\t\t\t<a href=\"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/publication\/information-theoretic-regret-bounds-for-online-nonlinear-control\/\" data-bi-cN=\"Information Theoretic Regret Bounds for Online Nonlinear Control\" data-external-link=\"false\" data-bi-aN=\"margin-callout\" data-bi-type=\"annotated-link\" class=\"annotations__link font-weight-semibold text-decoration-none\"><span>Information Theoretic Regret Bounds for Online Nonlinear Control<\/span>&nbsp;<span class=\"glyph-in-link glyph-append glyph-append-chevron-right\" aria-hidden=\"true\"><\/span><\/a>\t\t\t\t\t<\/div>\n\t<\/article>\n<\/div>\n\n\n\n<p>In the paper \u201c<a href=\"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/publication\/information-theoretic-regret-bounds-for-online-nonlinear-control\/\">Information Theoretic Regret Bounds for Online Nonlinear Control<\/a>,\u201d researchers bring strategic exploration techniques to bear on continuous control problems. While reinforcement learning and continuous control both involve sequential decision-making, continuous control is more focused on physical systems, such as those in aerospace engineering, robotics, and other industrial applications, where the goal is more about achieving stability than optimizing reward, explains Krishnamurthy, a coauthor on the paper. <\/p>\n\n\n\n<p>The paper departs from classical control theory, which is grounded in linear relationships where random exploration is sufficient, by considering a <em>nonlinear <\/em>model that can more accurately capture real-world physical systems. However, nonlinear systems require more sophisticated exploration strategies for information acquisition. 
Addressing this challenge via the principle of optimism in the face of uncertainty, the paper proposes the Lower Confidence-based Continuous Control (LC<sup>3<\/sup>) algorithm, a model-based approach that maintains uncertainty estimates on the system dynamics and assumes the most favorable dynamics when planning. The paper includes theoretical results showing that LC<sup>3<\/sup> efficiently controls nonlinear systems, while experiments show that LC<sup>3<\/sup> outperforms existing control methods, particularly in tasks with discontinuities and contact points, which demonstrates the importance of strategic exploration in such settings.<\/p>\n\n\n\n<p><em><strong>Additional reading:<\/strong> For more on strategic exploration, check out the NeurIPS paper \u201c<a href=\"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/publication\/provably-adaptive-reinforcement-learning-in-metric-spaces\/\">Provably adaptive reinforcement learning in metric spaces<\/a>.\u201d<\/em><\/p>\n\n\n\n<h2 id=\"representation-learning-simplifying-complicated-environments\">Representation learning: Simplifying complicated environments<\/h2>\n\n\n\n<p><a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" rel=\"noopener noreferrer\" target=\"_blank\" href=\"https:\/\/arxiv.org\/abs\/1206.5538\">Gains in deep learning are due in part to representation learning<span class=\"sr-only\"> (opens in new tab)<\/span><\/a>, which can be described as the process of boiling complex information down into the details relevant for completing a specific task. 
<a href=\"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/people\/devonh\/\">Principal Researcher Devon Hjelm<\/a>, who works on representation learning in computer vision, sees representation learning in RL as shifting some emphasis from rewards to the internal workings of the agents\u2014how they acquire and analyze facts to better model the dynamics of their environment.<\/p>\n\n\n\n<blockquote class=\"wp-block-quote has-text-align-left is-layout-flow wp-block-quote-is-layout-flow\"><p>\u201cBeing able to look at your agent, look inside, and say, \u2018OK, what have you learned?\u2019 is an important step toward deployment because it\u2019ll give us some insight on how then they\u2019ll behave,\u201d says Hjelm. \u201cAnd if we don\u2019t do that, the risk is that we might find out just by their actions, and that\u2019s not necessarily as desirable.\u201d<\/p><\/blockquote>\n\n\n\n<p>Representation learning also provides an elegant conceptual framework for obtaining provably efficient algorithms for complex environments and advancing the theoretical foundations of RL.<\/p>\n\n\n\n<p>\u201cWe know RL is not statistically tractable in general; if you want to provably solve an RL problem, you need to assume some structure in the environment, and a nice conceptual thing to do is to assume the structure exists, but that you don\u2019t know it and then you have to discover it,\u201d says Krishnamurthy. 
But the challenge in doing so is tightly coupled with exploration in a chicken-and-egg situation: you need this structure, or compact representation, to explore because the problem is too complicated without it, but you need to explore to collect informative data to learn the representation.<\/p>\n\n\n\n<p>In two separate papers, Krishnamurthy and Hjelm, along with their coauthors, apply representation learning to two common RL challenges: exploration and generalization, respectively.<\/p>\n\n\n\n<div class=\"annotations \" data-bi-aN=\"margin-callout\">\n\t<article class=\"annotations__list card depth-16 bg-body p-4 annotations__list--right\">\n\t\t<div class=\"annotations__list-item\">\n\t\t\t\t\t\t<span class=\"annotations__type d-block text-uppercase font-weight-semibold text-neutral-300 small\">Publication<\/span>\n\t\t\t<a href=\"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/publication\/deep-reinforcement-and-infomax-learning\/\" data-bi-cN=\"Deep Reinforcement and InfoMax Learning\" data-external-link=\"false\" data-bi-aN=\"margin-callout\" data-bi-type=\"annotated-link\" class=\"annotations__link font-weight-semibold text-decoration-none\"><span>Deep Reinforcement and InfoMax Learning<\/span>&nbsp;<span class=\"glyph-in-link glyph-append glyph-append-chevron-right\" aria-hidden=\"true\"><\/span><\/a>\t\t\t\t\t<\/div>\n\t<\/article>\n<\/div>\n\n\n\n<p>With \u201c<a href=\"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/publication\/deep-reinforcement-and-infomax-learning\/\">Deep Reinforcement and InfoMax Learning<\/a>,\u201d Hjelm and his coauthors bring what they\u2019ve learned about representation learning in other research areas to RL. In his computer vision work, Hjelm has been doing self-supervised learning, in which tasks based on label-free data are used to promote strong representations for downstream applications. 
He gives the example of showing a vision model augmented versions of the same images\u2014so an image of a cat resized and then in a different color, then the same augmentations applied to an image of a dog\u2014so it can learn not only that the augmented cat images came from the same cat image, but that the dog images, though processed similarly, came from a different image. Through this process, the model learns the information content that is similar across instances of similar things. For example, it might learn that all cats tend to have certain key characteristics, such as pointy ears and whiskers. Hjelm likens these augmented images to different perspectives of the same object an RL agent might encounter moving around an environment.<\/p>\n\n\n\n<p>The paper explores how to encourage an agent to execute the actions that will enable it to decide that different states constitute the same thing. The researchers introduce Deep Reinforcement and InfoMax Learning (DRIML), an auxiliary objective based on <a href=\"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/blog\/deep-infomax-learning-good-representations-through-mutual-information-maximization\/\">Deep InfoMax<\/a>. From different time steps of trajectories over the same reward-based policy, an agent needs to determine if what it\u2019s \u201cseeing\u201d is from the same episode, conditioned on the action it took. Positive examples are drawn from the same trajectory in the same episode; negative examples are created by swapping one of the states out for a future state or state from another trajectory. Incorporating the objective into the RL algorithm C51, the researchers show improved performance in the series of gym environments known as Procgen. 
In performing well across increasingly difficult versions of the same environment, the agent proved it was learning information that wound up being applicable to new situations, demonstrating generalization.<\/p>\n\n\n\n<div class=\"annotations \" data-bi-aN=\"margin-callout\">\n\t<article class=\"annotations__list card depth-16 bg-body p-4 annotations__list--left\">\n\t\t<div class=\"annotations__list-item\">\n\t\t\t\t\t\t<span class=\"annotations__type d-block text-uppercase font-weight-semibold text-neutral-300 small\">Publication<\/span>\n\t\t\t<a href=\"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/publication\/flambe-structural-complexity-and-representation-learning-of-low-rank-mdps\/\" data-bi-cN=\"FLAMBE: Structural Complexity and Representation Learning of Low Rank MDPs\" data-external-link=\"false\" data-bi-aN=\"margin-callout\" data-bi-type=\"annotated-link\" class=\"annotations__link font-weight-semibold text-decoration-none\"><span>FLAMBE: Structural Complexity and Representation Learning of Low Rank MDPs<\/span>&nbsp;<span class=\"glyph-in-link glyph-append glyph-append-chevron-right\" aria-hidden=\"true\"><\/span><\/a>\t\t\t\t\t<\/div>\n\t<\/article>\n<\/div>\n\n\n\n<p>In \u201c<a href=\"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/publication\/flambe-structural-complexity-and-representation-learning-of-low-rank-mdps\/\">FLAMBE: Structural Complexity and Representation Learning of Low Rank MDPs<\/a>,\u201d Krishnamurthy and his coauthors present the algorithm FLAMBE. FLAMBE seeks to exploit the trove of information available in an environment by setting up a prediction problem to learn that much-needed representation, a step that is conceptually similar to the self-supervised problem in DRIML. The prediction problem used in FLAMBE is maximum likelihood estimation: given its current observation, what does an agent expect to see next. 
In making such a prediction, FLAMBE learns a representation that exposes information relevant for determining the next state in a way that\u2019s easy for the algorithm to access, facilitating efficient planning and learning. An important additional benefit is that redundant information is filtered away.<\/p>\n\n\n\n<p>FLAMBE uses this representation to explore by synthesizing reward functions that encourage the agent to visit all the directions in the representation space. The exploration process drives the agent to new parts of the state space, where it sets up another maximum likelihood problem to refine the representation, and the process repeats. The result of this iterative process is a universal representation of the environment that can be used after the fact to find a near-optimal policy for any reward function in that environment without further exploration. In the paper, the researchers show that FLAMBE provably learns such a universal representation and that the dimensionality of the representation, as well as the sample complexity of the algorithm, scales with the rank of the transition operator describing the environment.<\/p>\n\n\n\n<p><strong><em>Additional reading<\/em><\/strong>:<em> For more work at the intersection of reinforcement learning and representation learning, check out the NeurIPS papers \u201c<a href=\"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/publication\/learning-the-linear-quadratic-regulator-from-nonlinear-observations\/\">Learning the Linear Quadratic Regulator from Nonlinear Observations<\/a>\u201d and \u201c<a href=\"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/publication\/sample-efficient-reinforcement-learning-of-undercomplete-pomdps\/\">Sample-Efficient Reinforcement Learning of Undercomplete POMDPs<\/a>.\u201d<\/em><\/p>\n\n\n\n<h2 id=\"the-exploration-continues-additional-rl-neurips-papers\">The exploration continues: Additional RL NeurIPS papers<\/h2>\n\n\n\n<p>The above papers represent a portion of Microsoft research 
in the RL space included at this year\u2019s NeurIPS. To continue the journey, check out these other RL-related Microsoft NeurIPS papers, and for a deeper dive, check out <a href=\"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/blog\/research-collection-reinforcement-learning-at-microsoft\/\">milestones and past research contributing to today\u2019s RL landscape<\/a> and <a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" rel=\"noopener noreferrer\" target=\"_blank\" href=\"https:\/\/aka.ms\/AA99j9e\">RL\u2019s move from the lab into Microsoft products and services<span class=\"sr-only\"> (opens in new tab)<\/span><\/a>.<\/p>\n\n\n\n<p>To learn about other work being presented by Microsoft researchers at the conference, visit the <a href=\"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/event\/neurips-2020\/\">Microsoft at NeurIPS 2020<\/a> page.<\/p>\n\n\n\n<ul class=\"wp-block-list\"><li>\u201c<a href=\"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/publication\/policy-improvement-via-imitation-of-multiple-oracles\/\">Policy Improvement via Imitation of Multiple Oracles<\/a>,\u201d <em>Ching-An Cheng, Andrey Kolobov, Alekh Agarwal<\/em><\/li><\/ul>\n\n\n\n<ul class=\"wp-block-list\"><li>\u201c<a href=\"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/publication\/safe-reinforcement-learning-via-curriculum-induction\/\">Safe Reinforcement Learning via Curriculum Induction<\/a>,\u201d <em>Matteo Turchetta, Andrey Kolobov, Shital Shah, Andreas Krause, Alekh Agarwal<\/em><\/li><\/ul>\n\n\n\n<ul class=\"wp-block-list\"><li>\u201c<a href=\"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/publication\/the-loca-regret-a-consistent-metric-to-evaluate-model-based-behavior-in-reinforcement-learning\/\">The LoCA Regret: A Consistent Metric to Evaluate Model-Based Behavior in Reinforcement Learning<\/a>,\u201d <em>Harm van Seijen, Hadi Nekoei, Evan Racah, Sarath Chandar<\/em><\/li><\/ul>\n\n\n\n<ul 
class=\"wp-block-list\"><li>\u201c<a href=\"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/publication\/constrained-episodic-reinforcement-learning-in-concave-convex-and-knapsack-settings\/\">Constrained episodic reinforcement learning in concave-convex and knapsack settings<\/a>,\u201d <em>Kiant\u00e9 Brantley, Miroslav Dudik, Thodoris Lykouris, Sobhan Miryoosefi, Max Simchowitz, Aleksandrs Slivkins, Wen Sun<\/em><\/li><\/ul>\n\n\n\n<ul class=\"wp-block-list\"><li>\u201c<a href=\"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/publication\/efficient-contextual-bandits-with-continuous-actions\/\">Efficient Contextual Bandits with Continuous Actions<\/a>,\u201d <em>Maryam Majzoubi, Chicheng Zhang, Rajan Chari, Akshay Krishnamurthy, John Langford, Aleksandrs Slivkins<\/em><\/li><\/ul>\n\n\n\n<ul class=\"wp-block-list\"><li>\u201c<a href=\"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/publication\/ave-assistance-via-empowerment\/\">AvE: Assistance via Empowerment<\/a>,\u201d <em>Yuqing Du, Stas Tiomkin, Emre Kiciman, Daniel Polani, Pieter Abbeel, Anca Dragan<\/em><\/li><\/ul>\n","protected":false},"excerpt":{"rendered":"<p>As human beings, we encounter unfamiliar situations all the time\u2014learning to drive, living on our own for the first time, starting a new job. 
And while we can anticipate what to expect based on what others have told us or what we\u2019ve picked up from books and depictions in movies and TV, it isn\u2019t until [&hellip;]<\/p>\n","protected":false},"author":38838,"featured_media":710134,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"msr-url-field":"","msr-podcast-episode":"","msrModifiedDate":"","msrModifiedDateEnabled":false,"ep_exclude_from_search":false,"_classifai_error":"","msr-author-ordering":[],"msr_hide_image_in_river":0,"footnotes":""},"categories":[1],"tags":[],"research-area":[13556],"msr-region":[],"msr-event-type":[],"msr-locale":[268875],"msr-post-option":[243984],"msr-impact-theme":[],"msr-promo-type":[],"msr-podcast-series":[],"class_list":["post-708913","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-research-blog","msr-research-area-artificial-intelligence","msr-locale-en_us","msr-post-option-blog-homepage-featured"],"msr_event_details":{"start":"","end":"","location":""},"podcast_url":"","podcast_episode":"","msr_research_lab":[199560,199561,199563,199565,437514],"msr_impact_theme":[],"related-publications":[],"related-downloads":[],"related-videos":[],"related-academic-programs":[],"related-groups":[],"related-projects":[568491],"related-events":[708199],"related-researchers":[],"msr_type":"Post","featured_image_thumbnail":"<img width=\"960\" height=\"540\" src=\"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-content\/uploads\/2020\/12\/1400x788_RL_Comp_No_logo_still-960x540.jpg\" class=\"img-object-cover\" alt=\"diagram, schematic\" decoding=\"async\" loading=\"lazy\" srcset=\"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-content\/uploads\/2020\/12\/1400x788_RL_Comp_No_logo_still-960x540.jpg 960w, https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-content\/uploads\/2020\/12\/1400x788_RL_Comp_No_logo_still-300x169.jpg 300w, 
https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-content\/uploads\/2020\/12\/1400x788_RL_Comp_No_logo_still-1024x576.jpg 1024w, https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-content\/uploads\/2020\/12\/1400x788_RL_Comp_No_logo_still-768x432.jpg 768w, https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-content\/uploads\/2020\/12\/1400x788_RL_Comp_No_logo_still-1536x864.jpg 1536w, https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-content\/uploads\/2020\/12\/1400x788_RL_Comp_No_logo_still-2048x1152.jpg 2048w, https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-content\/uploads\/2020\/12\/1400x788_RL_Comp_No_logo_still-16x9.jpg 16w, https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-content\/uploads\/2020\/12\/1400x788_RL_Comp_No_logo_still-1066x600.jpg 1066w, https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-content\/uploads\/2020\/12\/1400x788_RL_Comp_No_logo_still-655x368.jpg 655w, https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-content\/uploads\/2020\/12\/1400x788_RL_Comp_No_logo_still-343x193.jpg 343w, https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-content\/uploads\/2020\/12\/1400x788_RL_Comp_No_logo_still-640x360.jpg 640w, https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-content\/uploads\/2020\/12\/1400x788_RL_Comp_No_logo_still-1280x720.jpg 1280w, https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-content\/uploads\/2020\/12\/1400x788_RL_Comp_No_logo_still-1920x1080.jpg 1920w\" sizes=\"auto, (max-width: 960px) 100vw, 960px\" \/>","byline":"","formattedDate":"December 7, 2020","formattedExcerpt":"As human beings, we encounter unfamiliar situations all the time\u2014learning to drive, living on our own for the first time, starting a new job. 
And while we can anticipate what to expect based on what others have told us or what we\u2019ve picked up from&hellip;","locale":{"slug":"en_us","name":"English","native":"","english":"English"},"_links":{"self":[{"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/posts\/708913","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/users\/38838"}],"replies":[{"embeddable":true,"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/comments?post=708913"}],"version-history":[{"count":37,"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/posts\/708913\/revisions"}],"predecessor-version":[{"id":710737,"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/posts\/708913\/revisions\/710737"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/media\/710134"}],"wp:attachment":[{"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/media?parent=708913"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/categories?post=708913"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/tags?post=708913"},{"taxonomy":"msr-research-area","embeddable":true,"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/research-area?post=708913"},{"taxonomy":"msr-region","embeddable":true,"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/msr-region?post=708913"},{"taxonomy":"msr-event-type","embeddable":true,"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\
/v2\/msr-event-type?post=708913"},{"taxonomy":"msr-locale","embeddable":true,"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/msr-locale?post=708913"},{"taxonomy":"msr-post-option","embeddable":true,"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/msr-post-option?post=708913"},{"taxonomy":"msr-impact-theme","embeddable":true,"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/msr-impact-theme?post=708913"},{"taxonomy":"msr-promo-type","embeddable":true,"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/msr-promo-type?post=708913"},{"taxonomy":"msr-podcast-series","embeddable":true,"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/msr-podcast-series?post=708913"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}