{"id":621762,"date":"2019-11-25T09:00:08","date_gmt":"2019-11-25T17:00:08","guid":{"rendered":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/?p=621762"},"modified":"2019-11-25T08:26:25","modified_gmt":"2019-11-25T16:26:25","slug":"icebreaker-new-model-with-novel-element-wise-information-acquisition-method-reduces-cost-and-data-needed-to-train-machine-learning-models","status":"publish","type":"post","link":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/blog\/icebreaker-new-model-with-novel-element-wise-information-acquisition-method-reduces-cost-and-data-needed-to-train-machine-learning-models\/","title":{"rendered":"Icebreaker: New model with novel element-wise information acquisition method reduces cost and data needed to train machine learning models"},"content":{"rendered":"<p><a href=\"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-content\/uploads\/2019\/11\/MSResearch_20191119_NeurIPS_Icebreaker_1400x788.gif\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-622197\" src=\"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-content\/uploads\/2019\/11\/MSResearch_20191119_NeurIPS_Icebreaker_1400x788.gif\" alt=\"Figure showing the two components of an Icebreaker model.\" width=\"1400\" height=\"788\" \/><\/a><\/p>\n<p style=\"text-align: left;\">In many real-life scenarios, obtaining information is costly, and getting fully observed data is almost impossible. For example, in the recruiting world, obtaining relevant information (in other words, a feature value) for a company could mean performing time-consuming interviews. The same applies to many other scenarios, such as in education and the medical field, where each feature value is an often more complex answer to a question. Unfortunately, AI-aided decision making usually requires large amounts of data. 
In the <a href=\"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/project\/minimum-data-ai\/\">Minimum Data AI<\/a> project, Microsoft researchers investigate how best to use AI algorithms to aid decision making while minimizing data requirements.<\/p>\n<p>In our paper accepted at <a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" rel=\"noopener noreferrer\" target=\"_blank\" href=\"https:\/\/neurips.cc\/Conferences\/2019\">the thirty-third Conference on Neural Information Processing Systems<span class=\"sr-only\"> (opens in new tab)<\/span><\/a> (NeurIPS 2019), titled \u201c<a href=\"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/publication\/icebreaker-element-wise-active-information-acquisition-with-bayesian-deep-latent-gaussian-model\/\">Icebreaker: Element-wise efficient information acquisition with a Bayesian Deep Latent Gaussian Model<\/a>,\u201d Microsoft researchers, along with researchers in the Department of Engineering at the University of Cambridge, tackle the challenge of deploying machine learning models when little or no training data is initially available and acquiring each feature element of the data incurs a cost. We call this challenge the ice-start problem. To solve it, we propose Icebreaker, a novel deep generative model that minimizes the amount and cost of data required to train a machine learning model. 
This work can be combined with our previous work, <a href=\"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/publication\/eddi-efficient-dynamic-discovery-of-high-value-information-with-partial-vae\/\">EDDI<\/a>, which performs efficient information acquisition at test time.<\/p>\n<div id=\"attachment_622251\" style=\"width: 506px\" class=\"wp-caption aligncenter\"><a href=\"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-content\/uploads\/2019\/11\/icebreaker-5dd58c1530fc3.png\"><img loading=\"lazy\" decoding=\"async\" aria-describedby=\"caption-attachment-622251\" class=\"wp-image-622251 \" src=\"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-content\/uploads\/2019\/11\/icebreaker-5dd58c1530fc3.png\" alt=\"Graphic showing the two components of an Icebreaker model. \" width=\"496\" height=\"486\" srcset=\"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-content\/uploads\/2019\/11\/icebreaker-5dd58c1530fc3.png 885w, https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-content\/uploads\/2019\/11\/icebreaker-5dd58c1530fc3-300x294.png 300w, https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-content\/uploads\/2019\/11\/icebreaker-5dd58c1530fc3-768x752.png 768w\" sizes=\"auto, (max-width: 496px) 100vw, 496px\" \/><\/a><p id=\"caption-attachment-622251\" class=\"wp-caption-text\">Figure 1: Our model employs two components. The first component is a deep generative model (PA-BELGAM), shown in the top half of the model above, which features a novel inference algorithm that can explicitly quantify epistemic uncertainty. 
The second component is a set of new element-wise training data selection objectives for data acquisition, shown in the bottom half of the model.<\/p><\/div>\n<h3><b><span lang=\"EN-GB\">Partial Amortized Bayesian Deep Latent Gaussian Model (PA-BELGAM) <\/span><\/b><\/h3>\n<p><span lang=\"EN-GB\">To enable element-wise training data selection for a flexible model with explicit parameter uncertainty modelling, our work consists of two components. The first is a deep generative model called the Partial Amortized Bayesian Deep Latent Gaussian Model (PA-BELGAM), whose novel inference algorithm explicitly quantifies epistemic uncertainty and can be trained with any volume of partially observed data. The second is a set of new element-wise objectives for selecting which training data to acquire.<\/span><\/p>\n<div id=\"attachment_622104\" style=\"width: 310px\" class=\"wp-caption alignleft\"><a href=\"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-content\/uploads\/2019\/11\/Icebreaker-fig2.png\"><img loading=\"lazy\" decoding=\"async\" aria-describedby=\"caption-attachment-622104\" class=\"wp-image-622104 size-medium\" src=\"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-content\/uploads\/2019\/11\/Icebreaker-fig2-300x261.png\" alt=\"Figure showing PA-BELGAM, a deep latent Gaussian model with explicit parameter uncertainty estimation.\" width=\"300\" height=\"261\" srcset=\"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-content\/uploads\/2019\/11\/Icebreaker-fig2-300x261.png 300w, https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-content\/uploads\/2019\/11\/Icebreaker-fig2.png 548w\" sizes=\"auto, (max-width: 300px) 100vw, 300px\" \/><\/a><p id=\"caption-attachment-622104\" class=\"wp-caption-text\">Figure 2: PA-BELGAM, a deep latent Gaussian model with explicit parameter uncertainty estimation, is also able to handle any subset of missing feature values.<\/p><\/div>\n<p><span lang=\"EN-GB\">The PA-BELGAM model 
is based on the variational autoencoder model. The key differences are the ability to handle missing elements of the data and the Bayesian treatment of the decoder weights \u03b8. Instead of using a standard deep neural network as the decoder to map a latent representation back to data, we use a Bayesian neural network, and we put a prior distribution over the decoder weights \u03b8. <\/span><\/p>\n<p><span lang=\"EN-GB\">The design of PA-BELGAM leads to the new challenge of having to approximate the intractable posterior of the decoder weights and the latent variables in an efficient and accurate way. Our solution is to combine the efficiency of amortized inference with the accuracy of sampling methods. In general, the inference consists of an encoder parameter update using amortized inference followed by sampling of the decoder weights using stochastic gradient MCMC. <\/span><\/p>\n<p>To enable active element-wise training data selection, we then designed element-wise data selection objectives for different applications, considering both unsupervised settings (such as matrix imputation tasks, which can be used for recommendation) and supervised settings (such as classification or regression tasks). For imputation tasks, the objective aims to select the feature element that maximizes the reduction of the model parameter uncertainty. For prediction tasks, the objective not only handles traditional one-step prediction but also considers active sequential prediction (such as <a href=\"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/publication\/eddi-efficient-dynamic-discovery-of-high-value-information-with-partial-vae\/\">EDDI<\/a>). 
We designed the objective to both reduce the model parameter uncertainty and maximize the predictive power.<\/p>\n<div id=\"attachment_622107\" style=\"width: 878px\" class=\"wp-caption aligncenter\"><a href=\"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-content\/uploads\/2019\/11\/Icebreaker-Figure3.jpg\"><img loading=\"lazy\" decoding=\"async\" aria-describedby=\"caption-attachment-622107\" class=\"wp-image-622107\" src=\"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-content\/uploads\/2019\/11\/Icebreaker-Figure3-1024x426.jpg\" alt=\"Figure 3: Results showing use of Icebreaker for training feature element acquisition on the MIMIC data set.\" width=\"868\" height=\"361\" srcset=\"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-content\/uploads\/2019\/11\/Icebreaker-Figure3-1024x426.jpg 1024w, https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-content\/uploads\/2019\/11\/Icebreaker-Figure3-300x125.jpg 300w, https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-content\/uploads\/2019\/11\/Icebreaker-Figure3-768x320.jpg 768w, https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-content\/uploads\/2019\/11\/Icebreaker-Figure3.jpg 1379w\" sizes=\"auto, (max-width: 868px) 100vw, 868px\" \/><\/a><p id=\"caption-attachment-622107\" class=\"wp-caption-text\">Figure 3: Results showing use of Icebreaker for training feature element acquisition on the MIMIC data set.<\/p><\/div>\n<p>As an example, we show our method using the largest publicly available medical dataset called <a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" rel=\"noopener noreferrer\" target=\"_blank\" href=\"https:\/\/mimic.physionet.org\/\">MIMIC<span class=\"sr-only\"> (opens in new tab)<\/span><\/a> (note: this dataset does not use protected health information). 
We use Icebreaker to acquire training feature elements and combine it with our prior work on test-time information acquisition and prediction, called <a href=\"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/publication\/eddi-efficient-dynamic-discovery-of-high-value-information-with-partial-vae\/\">EDDI<\/a>. In Figure 3, the graph on the left shows that our proposed method performs better than several baselines, achieving better test accuracy with less training data. The graph on the right in Figure 3 shows the number of data points for eight features as the total size of our data set grows. The non-linear growth for some of the features (for example, Glucose) indicates that our method is able to learn to select different features at different stages of the training procedure. For more experiments, please check out our <a href=\"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/publication\/icebreaker-element-wise-active-information-acquisition-with-bayesian-deep-latent-gaussian-model\/\">NeurIPS paper<\/a>.<\/p>\n<p>We have released our <a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" rel=\"noopener noreferrer\" target=\"_blank\" href=\"https:\/\/github.com\/microsoft\/Icebreaker\">code<span class=\"sr-only\"> (opens in new tab)<\/span><\/a> online on GitHub. We encourage you to experiment with the whole framework\u2014train a PA-BELGAM model with element-wise training data selection or simply try out PA-BELGAM as an alternative to variational autoencoders for your application. To learn more about the Minimum Data AI project, please visit our <a href=\"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/project\/minimum-data-ai\/\">project page<\/a>.<\/p>\n<p>If you will be attending NeurIPS 2019 and are interested in learning about this work or the Minimum Data AI project, come find us at our poster session on Wednesday, December 11<sup>th<\/sup>, from 5:00 PM to 7:00 PM PST in East Exhibition Hall B and C. 
We will be there and will be happy to speak with you more about our research. Feel free to stop by and meet us at the Microsoft booth as well!<\/p>\n","protected":false},"excerpt":{"rendered":"<p>In many real-life scenarios, obtaining information is costly, and getting fully observed data is almost impossible. For example, in the recruiting world, obtaining relevant information (in other words, a feature value) for a company could mean performing time-consuming interviews. The same applies to many other scenarios, such as in education and the medical field, where [&hellip;]<\/p>\n","protected":false},"author":38679,"featured_media":621765,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"msr-url-field":"","msr-podcast-episode":"","msrModifiedDate":"","msrModifiedDateEnabled":false,"ep_exclude_from_search":false,"_classifai_error":"","msr-author-ordering":[{"type":"user_nicename","value":"Cheng Zhang","user_id":"37428"},{"type":"user_nicename","value":"Sebastian Tschiatschek","user_id":"37179"}],"msr_hide_image_in_river":0,"footnotes":""},"categories":[1],"tags":[],"research-area":[13556],"msr-region":[256048],"msr-event-type":[],"msr-locale":[268875],"msr-post-option":[],"msr-impact-theme":[],"msr-promo-type":[],"msr-podcast-series":[],"class_list":["post-621762","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-research-blog","msr-research-area-artificial-intelligence","msr-region-global","msr-locale-en_us"],"msr_event_details":{"start":"","end":"","location":""},"podcast_url":"","podcast_episode":"","msr_research_lab":[199561],"msr_impact_theme":[],"related-publications":[],"related-downloads":[],"related-videos":[],"related-academic-programs":[],"related-groups":[],"related-projects":[587692],"related-events":[609480],"related-researchers":[],"msr_type":"Post","featured_image_thumbnail":"<img width=\"960\" height=\"540\" 
src=\"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-content\/uploads\/2019\/11\/MSResearch_20191106_NeurIPS_Icebreaker_1400x788-960x540.png\" class=\"img-object-cover\" alt=\"Graphic showing the components of the Icebreaker model\" decoding=\"async\" loading=\"lazy\" srcset=\"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-content\/uploads\/2019\/11\/MSResearch_20191106_NeurIPS_Icebreaker_1400x788-960x540.png 960w, https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-content\/uploads\/2019\/11\/MSResearch_20191106_NeurIPS_Icebreaker_1400x788-300x169.png 300w, https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-content\/uploads\/2019\/11\/MSResearch_20191106_NeurIPS_Icebreaker_1400x788-768x432.png 768w, https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-content\/uploads\/2019\/11\/MSResearch_20191106_NeurIPS_Icebreaker_1400x788-1024x576.png 1024w, https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-content\/uploads\/2019\/11\/MSResearch_20191106_NeurIPS_Icebreaker_1400x788-1066x600.png 1066w, https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-content\/uploads\/2019\/11\/MSResearch_20191106_NeurIPS_Icebreaker_1400x788-655x368.png 655w, https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-content\/uploads\/2019\/11\/MSResearch_20191106_NeurIPS_Icebreaker_1400x788-343x193.png 343w, https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-content\/uploads\/2019\/11\/MSResearch_20191106_NeurIPS_Icebreaker_1400x788-640x360.png 640w, https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-content\/uploads\/2019\/11\/MSResearch_20191106_NeurIPS_Icebreaker_1400x788-1280x720.png 1280w, https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-content\/uploads\/2019\/11\/MSResearch_20191106_NeurIPS_Icebreaker_1400x788.png 1400w\" sizes=\"auto, (max-width: 960px) 100vw, 960px\" \/>","byline":"Cheng Zhang and Sebastian Tschiatschek","formattedDate":"November 25, 2019","formattedExcerpt":"In many real-life scenarios, obtaining information is costly, and getting fully observed data is 
almost impossible. For example, in the recruiting world, obtaining relevant information (in other words, a feature value) for a company could mean performing time-consuming interviews. The same applies to many other&hellip;","locale":{"slug":"en_us","name":"English","native":"","english":"English"},"_links":{"self":[{"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/posts\/621762","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/users\/38679"}],"replies":[{"embeddable":true,"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/comments?post=621762"}],"version-history":[{"count":25,"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/posts\/621762\/revisions"}],"predecessor-version":[{"id":623319,"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/posts\/621762\/revisions\/623319"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/media\/621765"}],"wp:attachment":[{"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/media?parent=621762"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/categories?post=621762"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/tags?post=621762"},{"taxonomy":"msr-research-area","embeddable":true,"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/research-area?post=621762"},{"taxonomy":"msr-region","embeddable":true,"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/msr-region?post=621762"},{"taxonomy"
:"msr-event-type","embeddable":true,"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/msr-event-type?post=621762"},{"taxonomy":"msr-locale","embeddable":true,"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/msr-locale?post=621762"},{"taxonomy":"msr-post-option","embeddable":true,"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/msr-post-option?post=621762"},{"taxonomy":"msr-impact-theme","embeddable":true,"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/msr-impact-theme?post=621762"},{"taxonomy":"msr-promo-type","embeddable":true,"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/msr-promo-type?post=621762"},{"taxonomy":"msr-podcast-series","embeddable":true,"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/msr-podcast-series?post=621762"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}