{"id":1101105,"date":"2024-11-18T02:14:40","date_gmt":"2024-11-18T10:14:40","guid":{"rendered":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/?p=1101105"},"modified":"2025-10-31T13:38:59","modified_gmt":"2025-10-31T20:38:59","slug":"biomedparse-a-foundation-model-for-smarter-all-in-one-biomedical-image-analysis","status":"publish","type":"post","link":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/blog\/biomedparse-a-foundation-model-for-smarter-all-in-one-biomedical-image-analysis\/","title":{"rendered":"BiomedParse: A foundation model for smarter, all-in-one biomedical image analysis"},"content":{"rendered":"\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"1400\" height=\"788\" src=\"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-content\/uploads\/2024\/11\/BiomedParse-BlogHeroFeature-1400x788-1.jpg\" alt=\"A stylized illustration of a green line-drawn hand holding a transparent prism with colorful bands of light being refracted through it against a black background.\" class=\"wp-image-1102929\" srcset=\"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-content\/uploads\/2024\/11\/BiomedParse-BlogHeroFeature-1400x788-1.jpg 1400w, https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-content\/uploads\/2024\/11\/BiomedParse-BlogHeroFeature-1400x788-1-300x169.jpg 300w, https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-content\/uploads\/2024\/11\/BiomedParse-BlogHeroFeature-1400x788-1-1024x576.jpg 1024w, https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-content\/uploads\/2024\/11\/BiomedParse-BlogHeroFeature-1400x788-1-768x432.jpg 768w, https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-content\/uploads\/2024\/11\/BiomedParse-BlogHeroFeature-1400x788-1-1066x600.jpg 1066w, https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-content\/uploads\/2024\/11\/BiomedParse-BlogHeroFeature-1400x788-1-655x368.jpg 655w, 
https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-content\/uploads\/2024\/11\/BiomedParse-BlogHeroFeature-1400x788-1-240x135.jpg 240w, https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-content\/uploads\/2024\/11\/BiomedParse-BlogHeroFeature-1400x788-1-640x360.jpg 640w, https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-content\/uploads\/2024\/11\/BiomedParse-BlogHeroFeature-1400x788-1-960x540.jpg 960w, https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-content\/uploads\/2024\/11\/BiomedParse-BlogHeroFeature-1400x788-1-1280x720.jpg 1280w\" sizes=\"auto, (max-width: 1400px) 100vw, 1400px\" \/><\/figure>\n\n\n\n<p>In cancer diagnosis or advanced treatments like immunotherapy, every detail in a medical image counts. Radiologists and pathologists rely on these images to track tumors, understand their boundaries, and analyze how they interact with surrounding cells. This work demands pinpoint accuracy across several tasks\u2014identifying whether a tumor is present, locating it precisely, and mapping its contours on complex CT scans or pathology slides.&nbsp;<\/p>\n\n\n\n<p>Yet, these crucial steps\u2014object recognition, detection, and segmentation\u2014are often tackled separately, which can limit the depth of analysis. 
Current tools like <a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" href=\"https:\/\/www.nature.com\/articles\/s41467-024-44824-z\" target=\"_blank\" rel=\"noopener noreferrer\">MedSAM<span class=\"sr-only\"> (opens in new tab)<\/span><\/a> and <a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" href=\"https:\/\/segment-anything.com\/\" target=\"_blank\" rel=\"noopener noreferrer\">SAM<span class=\"sr-only\"> (opens in new tab)<\/span><\/a> focus on segmentation only, thus missing the opportunity to blend these insights holistically and relegating object semantics to an afterthought.&nbsp;<\/p>\n\n\n\n<p>In this blog, we introduce <a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" href=\"https:\/\/www.nature.com\/articles\/s41592-024-02499-w\" target=\"_blank\" rel=\"noopener noreferrer\">BiomedParse<span class=\"sr-only\"> (opens in new tab)<\/span><\/a>, a new approach to holistic image analysis that treats objects as first-class citizens. By unifying object recognition, detection, and segmentation into a single framework, BiomedParse allows users to specify what they\u2019re looking for through a simple, natural-language prompt. The result is a more cohesive, intelligent way of analyzing medical images that supports faster, more integrated clinical insights.\u00a0<\/p>\n\n\n\n<p>While biomedical segmentation datasets abound, there are relatively few prior works on object detection and recognition in biomedicine, let alone datasets covering all three tasks. 
To pretrain BiomedParse, we created the first such dataset by harnessing OpenAI\u2019s GPT-4 for data synthesis from <a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" href=\"https:\/\/huggingface.co\/datasets\/microsoft\/BiomedParseData\" target=\"_blank\" rel=\"noopener noreferrer\">standard segmentation datasets<span class=\"sr-only\"> (opens in new tab)<\/span><\/a>.<\/p>\n\n\n\n<p>BiomedParse is a single&nbsp;foundation model that can accurately segment biomedical objects across nine modalities, as seen in Figure 1, outperforming prior best methods while requiring orders of magnitude fewer user operations, as it doesn\u2019t require an object-specific bounding box. By learning semantic representation for individual object types, BiomedParse\u2019s superiority is particularly pronounced in the most challenging cases with irregularly shaped objects. Through joint pretraining of object recognition, detection, and segmentation, BiomedParse opens new possibilities for holistic image analysis and image-based discovery in biomedicine.&nbsp;&nbsp;<\/p>\n\n\n\n<figure class=\"wp-block-image aligncenter size-full is-resized\"><img loading=\"lazy\" decoding=\"async\" width=\"2185\" height=\"2560\" src=\"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-content\/uploads\/2024\/11\/Figure1_BiomedParse-scaled.jpg\" alt=\"a, The GPT-4 constructed ontology showing a hierarchy of object types that are used to unify semantic concepts across datasets. Bar plots showing the number of images containing that object type. b, Bar plot showing the number of image\u2013mask\u2013description triples for each modality in BiomedParseData. CT is abbreviation for Computed Tomography. MRI is abbreviation for Magnetic Resonance Imaging. OCT is abbreviation for Optical Coherence Tomography. c, Flowchart of BiomedParse. 
BiomedParse takes an image and a text prompt as input and then outputs the segmentation masks for the objects specified in the prompt. Image-specific manual interaction such as bounding box or clicks is not required in our framework. To facilitate semantic learning for the image encoder, BiomedParse also incorporates a learning objective to classify the meta-object type. For online inference, GPT-4 is used to resolve text prompt into object types using the object ontology, which also uses the meta-object type output from BiomedParse to narrow down candidate semantic labels. d, Uniform Manifold Approximation and Projection (UMAP) plots contrasting the text embeddings for different cell types derived from BiomedParse text encoder (left) and PubMedBERT (right). e, UMAP plots contrasting the image embeddings for different cell types derived from BiomedParse image encoder (left) and Focal (right). \" class=\"wp-image-1102950\" style=\"width:825px;height:auto\" srcset=\"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-content\/uploads\/2024\/11\/Figure1_BiomedParse-scaled.jpg 2185w, https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-content\/uploads\/2024\/11\/Figure1_BiomedParse-256x300.jpg 256w, https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-content\/uploads\/2024\/11\/Figure1_BiomedParse-874x1024.jpg 874w, https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-content\/uploads\/2024\/11\/Figure1_BiomedParse-768x900.jpg 768w, https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-content\/uploads\/2024\/11\/Figure1_BiomedParse-1311x1536.jpg 1311w, https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-content\/uploads\/2024\/11\/Figure1_BiomedParse-1748x2048.jpg 1748w, https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-content\/uploads\/2024\/11\/Figure1_BiomedParse-154x180.jpg 154w\" sizes=\"auto, (max-width: 2185px) 100vw, 2185px\" \/><figcaption class=\"wp-element-caption\">Figure 1. 
Overview of BiomedParse and BiomedParseData<em>.<\/em><\/figcaption><\/figure>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"image-parsing-a-unifying-framework-for-holistic-image-analysis\">Image parsing: a unifying framework for holistic image analysis&nbsp;<\/h2>\n\n\n\n<p>Back in 2005, researchers first introduced the concept of \u201cimage parsing\u201d\u2014a unified approach to image analysis that jointly conducts object recognition, detection, and segmentation. Built on Bayesian networks, this early model offered a glimpse into a future of joint learning and reasoning in image analysis, though it was limited in scope and application. Fast forward to today, cutting-edge advances in generative AI have breathed new life into this vision. With our model, BiomedParse, we have created a foundation for biomedical image parsing that leverages interdependencies across the three subtasks, thus addressing key limitations in traditional methods. BiomedParse enables users to simply input a natural-language description of an object, which the model uses to predict both the object label and its segmentation mask, thus eliminating the need for a bounding box (Figure 1c). 
In other words, this joint learning approach lets users segment objects based on text alone.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"harnessing-gpt-4-for-large-scale-data-synthesis-from-existing-datasets\">Harnessing GPT-4 for large-scale data synthesis from existing datasets&nbsp;<\/h2>\n\n\n\n<p>We created the first dataset for biomedical image parsing by harnessing GPT-4 for large-scale data synthesis from 45 existing biomedical segmentation datasets (Figure 1a and 1b). The key insight is to leverage readily available natural-language descriptions already in these datasets and use GPT-4 to organize this often messy, unstructured text with established biomedical object taxonomies.&nbsp;&nbsp;<\/p>\n\n\n\n<p>Specifically, we use GPT-4 to help create a unifying biomedical object taxonomy for image analysis and harmonize natural language descriptions from existing datasets with this taxonomy. We further leverage GPT-4 to synthesize additional variations of object descriptions to facilitate more robust text prompting.&nbsp;&nbsp;<\/p>\n\n\n\n<p>This enables us to construct BiomedParseData, a biomedical image analysis dataset comprising over 6 million sets of images, segmentation masks, and text descriptions drawn from more than 1 million images. 
This dataset includes 64 major biomedical object types and 82 fine-grained subtypes, spanning nine imaging modalities.<\/p>\n\n\n\n<figure class=\"wp-block-image aligncenter size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"2242\" height=\"2560\" src=\"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-content\/uploads\/2024\/11\/Figure2_BiomedParse-scaled.jpg\" alt=\"a, Box plot comparing the Dice score between our method and competing methods on 102,855 test instances (image\u2013mask\u2013label triples) across nine modalities. MedSAM and SAM require a bounding box as input. We consider two settings: oracle bounding box (minimum bounding box covering the gold mask); bounding boxes generated from the text prompt by Grounding DINO, a state-of-the-art text-based grounding model. Each modality category contains multiple object types. Each object type was aggregated as the instance median to be shown in the plot. n in the plot denotes the number of test instances in the corresponding modality. b, Nine examples comparing the segmentation results by BiomedParse and the ground truth, using just the text prompt at the top. c, Box plot comparing the Dice score between our method and competing methods on a cell segmentation test set with n=42 images. BiomedParse requires only a single user operation (the text prompt \u2018Glandular structure in colon pathology\u2019). By contrast, to get competitive results, MedSAM and SAM require 430 operations (one bounding box per individual cell). d, Five examples contrasting the segmentation results by BiomedParse and MedSAM, along with text prompts used by BiomedParse and bounding boxes used by MedSAM. e, Comparison between BiomedParse and MedSAM on a benign tumor image (top) and a malignant tumor image (bottom). The improvement of BiomedParse over MedSAM is even more pronounced on abnormal cells with irregular shapes. 
f, Box plot comparing the two-sided K\u2013S test P values between valid text prompt and invalid text prompt. BiomedParse learns to reject invalid text prompts describing object types not present in the image (small P value). We evaluated a total of 4,887 invalid prompts and 22,355 valid prompts. g, Plot showing the precision and recall of our method on detecting invalid text prompts across different K\u2013S test P value cutoff. h,i, Scatter-plots comparing the area under the receiver operating characteristic curve (AUROC) (h) and F1 (i) between BiomedParse and Grounding DINO on detecting invalid descriptions. \" class=\"wp-image-1102956\" srcset=\"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-content\/uploads\/2024\/11\/Figure2_BiomedParse-scaled.jpg 2242w, https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-content\/uploads\/2024\/11\/Figure2_BiomedParse-263x300.jpg 263w, https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-content\/uploads\/2024\/11\/Figure2_BiomedParse-897x1024.jpg 897w, https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-content\/uploads\/2024\/11\/Figure2_BiomedParse-768x877.jpg 768w, https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-content\/uploads\/2024\/11\/Figure2_BiomedParse-1345x1536.jpg 1345w, https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-content\/uploads\/2024\/11\/Figure2_BiomedParse-1793x2048.jpg 1793w, https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-content\/uploads\/2024\/11\/Figure2_BiomedParse-158x180.jpg 158w\" sizes=\"auto, (max-width: 2242px) 100vw, 2242px\" \/><figcaption class=\"wp-element-caption\">Figure 2: Comparison on large-scale biomedical image segmentation datasets.<\/figcaption><\/figure>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"state-of-the-art-performance-across-64-major-object-types-in-9-modalities\">State-of-the-art performance across 64 major object types in 9 modalities<\/h2>\n\n\n\n<p>We evaluated BiomedParse on a large held-out test set with 102,855 image-mask-label sets across 64 major 
object types in nine modalities. BiomedParse outperformed prior best methods such as MedSAM and SAM, even when oracle per-object bounding boxes were provided. In the more realistic setting where MedSAM and SAM used a state-of-the-art object detector (Grounding DINO) to propose bounding boxes, BiomedParse outperformed them by a wide margin, between 75 and 85 absolute points in Dice score (Figure 2a). BiomedParse also outperformed a variety of other prominent methods such as SegVol, Swin UNETR, nnU-Net, DeepLab V3+, and UniverSeg.<\/p>\n\n\n\n<figure class=\"wp-block-image aligncenter size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"2170\" height=\"2560\" src=\"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-content\/uploads\/2024\/11\/Figure3_BiomedParse-scaled.jpg\" alt=\"a, Attention maps of text prompts for irregular-shaped objects, suggesting that BiomedParse learns rather faithful representation of their typical shapes. US, ultrasound. b\u2013d, Scatter-plots comparing the improvement in Dice score for BiomedParse over MedSAM with shape regularity in terms of convex ratio (b), box ratio (c) and inversed rotational inertia (d). A smaller number in the x axis means higher irregularity on average. Each dot represents an object type. e, Six examples contrasting BiomedParse and MedSAM on detecting irregular-shaped objects. Plots are ordered from the least irregular one (left) to the most irregular one (right). f,g Comparison between BiomedParseData and the benchmark dataset used by MedSAM in terms of convex ratio (f) and box ratio (g). BiomedParseData is a more faithful representation of real-world challenges in terms of irregular-shaped objects. h, Box plots comparing BiomedParse and competing approaches on BiomedParseData and the benchmark dataset used by MedSAM. BiomedParse has a larger improvement on BiomedParseData, which contains more diverse images and more irregular-shaped objects. 
The number of object types are as follows: n=50 for MedSAM benchmark and n=112 for BiomedParseData. \" class=\"wp-image-1102965\" srcset=\"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-content\/uploads\/2024\/11\/Figure3_BiomedParse-scaled.jpg 2170w, https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-content\/uploads\/2024\/11\/Figure3_BiomedParse-254x300.jpg 254w, https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-content\/uploads\/2024\/11\/Figure3_BiomedParse-868x1024.jpg 868w, https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-content\/uploads\/2024\/11\/Figure3_BiomedParse-768x906.jpg 768w, https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-content\/uploads\/2024\/11\/Figure3_BiomedParse-1302x1536.jpg 1302w, https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-content\/uploads\/2024\/11\/Figure3_BiomedParse-1736x2048.jpg 1736w, https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-content\/uploads\/2024\/11\/Figure3_BiomedParse-153x180.jpg 153w\" sizes=\"auto, (max-width: 2170px) 100vw, 2170px\" \/><figcaption class=\"wp-element-caption\">Figure 3. Evaluation on detecting irregular-shaped objects.<\/figcaption><\/figure>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"recognizing-and-segmenting-irregular-and-complex-objects\">Recognizing and segmenting irregular and complex objects<\/h2>\n\n\n\n<p>Biomedical objects often have complex and irregular shapes, which present significant challenges for segmentation, even with oracle bounding box. By joint learning with object recognition and detection, BiomedParse learns to model object-specific shapes, and its superiority is particularly pronounced for the most challenging cases (Figure 3). 
Encompassing a large collection of diverse object types in nine modalities, BiomedParseData also provides a much more realistic representation of object complexity in biomedicine.&nbsp;&nbsp;<\/p>\n\n\n\n<figure class=\"wp-block-image aligncenter size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"2276\" height=\"2560\" src=\"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-content\/uploads\/2024\/11\/Figure4_BiomedParse-scaled.jpg\" alt=\"a, Six examples showing the results of object recognition by our method. Object recognition identifies and segments all objects in an image without requiring any user-provided input prompt. b\u2013d, Scatter-plots comparing the F1 (b), Precision (c) and Recall (d) scores between BiomedParse and Grounding DINO on identifying objects presented in the image. e, Comparison between BiomedParse and Grounding DINO on object identification in terms of median F1 score across different numbers of objects in the image. f, Box plot comparing BiomedParse and MedSAM\/SAM (using bounding boxes generated by Grounding DINO) on end-to-end object recognition (including segmentation) in relation to various modalities. 
g, Comparison between BiomedParse and MedSAM\/SAM (using bounding boxes generated by Grounding DINO) on end-to-end object recognition (including segmentation) in relation to numbers of distinct objects in the image.\" class=\"wp-image-1102968\" srcset=\"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-content\/uploads\/2024\/11\/Figure4_BiomedParse-scaled.jpg 2276w, https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-content\/uploads\/2024\/11\/Figure4_BiomedParse-267x300.jpg 267w, https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-content\/uploads\/2024\/11\/Figure4_BiomedParse-910x1024.jpg 910w, https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-content\/uploads\/2024\/11\/Figure4_BiomedParse-768x864.jpg 768w, https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-content\/uploads\/2024\/11\/Figure4_BiomedParse-1366x1536.jpg 1366w, https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-content\/uploads\/2024\/11\/Figure4_BiomedParse-1821x2048.jpg 1821w, https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-content\/uploads\/2024\/11\/Figure4_BiomedParse-160x180.jpg 160w\" sizes=\"auto, (max-width: 2276px) 100vw, 2276px\" \/><figcaption class=\"wp-element-caption\">Figure 4. Evaluation on object recognition.<\/figcaption><\/figure>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"promising-step-toward-scaling-holistic-biomedical-image-analysis\">Promising step toward scaling holistic biomedical image analysis<\/h2>\n\n\n\n<p>By operating through a simple text prompt, BiomedParse requires substantially less user effort than prior best methods that typically require object-specific bounding boxes, especially when an image contains a large number of objects (Figure 2c). By modeling an object recognition threshold, BiomedParse can detect invalid prompts and reject segmentation requests when an object is absent from the image. BiomedParse can be used to recognize and segment all known objects in an image in one fell swoop (Figure 4). 
By scaling holistic image analysis, BiomedParse can potentially be applied to key precision health applications such as early detection, prognosis, treatment decision support, and progression monitoring.&nbsp;&nbsp;<\/p>\n\n\n\n<p>Going forward, there are numerous growth opportunities. BiomedParse can be extended to handle more modalities and object types. It can be integrated into advanced multimodal frameworks such as <a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" href=\"https:\/\/aka.ms\/llava-med\" target=\"_blank\" rel=\"noopener noreferrer\">LLaVA-Med<span class=\"sr-only\"> (opens in new tab)<\/span><\/a> to facilitate conversational image analysis by \u201ctalking to the data.\u201d To facilitate research in biomedical image analysis, we have made BiomedParse <a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" href=\"https:\/\/aka.ms\/biomedparse-release\" target=\"_blank\" rel=\"noopener noreferrer\">open-source<span class=\"sr-only\"> (opens in new tab)<\/span><\/a> with Apache 2.0 license. We\u2019ve also made it available on <a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" href=\"https:\/\/ai.azure.com\/explore\/models\/MedImageParse\/version\/3\/registry\/azureml\/latest\" target=\"_blank\" rel=\"noopener noreferrer\">Azure AI<span class=\"sr-only\"> (opens in new tab)<\/span><\/a> for direct deployment and real-time inference. For more information, check out our <a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" href=\"https:\/\/microsoft.github.io\/BiomedParse\/\" target=\"_blank\" rel=\"noopener noreferrer\">demo.<span class=\"sr-only\"> (opens in new tab)<\/span><\/a>&nbsp;<\/p>\n\n\n\n<p>BiomedParse is a joint work with Providence and the University of Washington&#8217;s Paul G. 
Allen School of Computer Science & Engineering, and brings collaboration from multiple teams within Microsoft*. It reflects Microsoft\u2019s larger commitment to advancing multimodal generative AI for precision health, with other exciting progress such as <a href=\"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/blog\/gigapath-whole-slide-foundation-model-for-digital-pathology\/\" target=\"_blank\" rel=\"noreferrer noopener\">GigaPath<\/a>, <a href=\"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/publication\/large-scale-domain-specific-pretraining-for-biomedical-vision-language-processing\/\" target=\"_blank\" rel=\"noreferrer noopener\">BiomedCLIP<\/a>,\u202f <a href=\"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/project\/project-maira\/\" target=\"_blank\" rel=\"noreferrer noopener\">LLaVA-Rad<\/a>, <a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" href=\"https:\/\/arxiv.org\/abs\/2310.10765\" target=\"_blank\" rel=\"noopener noreferrer\">BiomedJourney<span class=\"sr-only\"> (opens in new tab)<\/span><\/a>, <a href=\"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/publication\/rad-dino-exploring-scalable-medical-image-encoders-beyond-text-supervision\/\" target=\"_blank\" rel=\"noreferrer noopener\">MAIRA<\/a>, <a href=\"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/publication\/rad-dino-exploring-scalable-medical-image-encoders-beyond-text-supervision\/\" target=\"_blank\" rel=\"noreferrer noopener\">Rad-DINO<\/a>, <a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" href=\"https:\/\/arxiv.org\/pdf\/2309.07778v5\" target=\"_blank\" rel=\"noopener noreferrer\">Virchow<span class=\"sr-only\"> (opens in new tab)<\/span><\/a>.\u202f&nbsp;<\/p>\n\n\n\n<p><em>(Acknowledgment footnote) *: Within Microsoft, it is a wonderful collaboration among Health Futures, MSR Deep Learning, and Nuance.&nbsp;<\/em><\/p>\n\n\n\n<p>Paper co-authors: Theodore Zhao, Yu Gu, <a 
href=\"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/people\/jianwyan\/\">Jianwei Yang<\/a>, <a href=\"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/people\/naotous\/\">Naoto Usuyama<\/a>, Ho Hin Lee, Sid Kiblawi, <a href=\"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/people\/tristan\/\">Tristan Naumann<\/a>, <a href=\"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/people\/jfgao\/\">Jianfeng Gao<\/a>, Angela Crabtree, Jacob Abel, Christine Moung-Wen, Brian Piening, Carlo Bifulco, Mu Wei, <a href=\"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/people\/hoifung\/\">Hoifung Poon<\/a>, <a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" rel=\"noopener noreferrer\" target=\"_blank\" href=\"https:\/\/homes.cs.washington.edu\/~swang\/\">Sheng Wang<span class=\"sr-only\"> (opens in new tab)<\/span><\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>BiomedParse reimagines medical image analysis, integrating advanced AI to capture complex insights across imaging types\u2014a step forward for diagnostics and precision 
medicine.<\/p>\n","protected":false},"author":38004,"featured_media":1102929,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"msr-url-field":"","msr-podcast-episode":"","msrModifiedDate":"","msrModifiedDateEnabled":false,"ep_exclude_from_search":false,"_classifai_error":"","msr-author-ordering":null,"msr_hide_image_in_river":null,"footnotes":""},"categories":[1],"tags":[],"research-area":[13556,13553],"msr-region":[],"msr-event-type":[],"msr-locale":[268875],"msr-post-option":[269148,243984,269142],"msr-impact-theme":[],"msr-promo-type":[],"msr-podcast-series":[],"class_list":["post-1101105","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-research-blog","msr-research-area-artificial-intelligence","msr-research-area-medical-health-genomics","msr-locale-en_us","msr-post-option-approved-for-river","msr-post-option-blog-homepage-featured","msr-post-option-include-in-river"],"msr_event_details":{"start":"","end":"","location":""},"podcast_url":"","podcast_episode":"","msr_research_lab":[849856],"msr_impact_theme":[],"related-publications":[],"related-downloads":[],"related-videos":[],"related-academic-programs":[],"related-groups":[144931],"related-projects":[978063],"related-events":[],"related-researchers":[{"type":"user_nicename","value":"Hoifung Poon","user_id":32016,"display_name":"Hoifung Poon","author_link":"<a href=\"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/people\/hoifung\/\" aria-label=\"Visit the profile page for Hoifung Poon\">Hoifung Poon<\/a>","is_active":false,"last_first":"Poon, Hoifung","people_section":0,"alias":"hoifung"},{"type":"guest","value":"theodore-zhao","user_id":"1104840","display_name":"Theodore Zhao","author_link":"Theodore Zhao","is_active":true,"last_first":"Zhao, Theodore","people_section":0,"alias":"theodore-zhao"},{"type":"guest","value":"mu-wei","user_id":"654207","display_name":"Mu Wei","author_link":"Mu 
Wei","is_active":true,"last_first":"Wei, Mu","people_section":0,"alias":"mu-wei"},{"type":"guest","value":"sheng-wang-2","user_id":"1104045","display_name":"Sheng Wang","author_link":"<a href=\"https:\/\/homes.cs.washington.edu\/~swang\/\" aria-label=\"Visit the profile page for Sheng Wang\">Sheng Wang<\/a>","is_active":true,"last_first":"Wang, Sheng","people_section":0,"alias":"sheng-wang-2"}],"msr_type":"Post","featured_image_thumbnail":"<img width=\"960\" height=\"540\" src=\"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-content\/uploads\/2024\/11\/BiomedParse-BlogHeroFeature-1400x788-1-960x540.jpg\" class=\"img-object-cover\" alt=\"A stylized illustration of a green line-drawn hand holding a transparent prism with colorful bands of light being refracted through it against a black background.\" decoding=\"async\" loading=\"lazy\" srcset=\"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-content\/uploads\/2024\/11\/BiomedParse-BlogHeroFeature-1400x788-1-960x540.jpg 960w, https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-content\/uploads\/2024\/11\/BiomedParse-BlogHeroFeature-1400x788-1-300x169.jpg 300w, https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-content\/uploads\/2024\/11\/BiomedParse-BlogHeroFeature-1400x788-1-1024x576.jpg 1024w, https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-content\/uploads\/2024\/11\/BiomedParse-BlogHeroFeature-1400x788-1-768x432.jpg 768w, https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-content\/uploads\/2024\/11\/BiomedParse-BlogHeroFeature-1400x788-1-1066x600.jpg 1066w, https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-content\/uploads\/2024\/11\/BiomedParse-BlogHeroFeature-1400x788-1-655x368.jpg 655w, https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-content\/uploads\/2024\/11\/BiomedParse-BlogHeroFeature-1400x788-1-240x135.jpg 240w, https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-content\/uploads\/2024\/11\/BiomedParse-BlogHeroFeature-1400x788-1-640x360.jpg 640w, 
https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-content\/uploads\/2024\/11\/BiomedParse-BlogHeroFeature-1400x788-1-1280x720.jpg 1280w, https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-content\/uploads\/2024\/11\/BiomedParse-BlogHeroFeature-1400x788-1.jpg 1400w\" sizes=\"auto, (max-width: 960px) 100vw, 960px\" \/>","byline":"","formattedDate":"November 18, 2024","formattedExcerpt":"BiomedParse reimagines medical image analysis, integrating advanced AI to capture complex insights across imaging types\u2014a step forward for diagnostics and precision medicine.","locale":{"slug":"en_us","name":"English","native":"","english":"English"},"_links":{"self":[{"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/posts\/1101105","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/users\/38004"}],"replies":[{"embeddable":true,"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/comments?post=1101105"}],"version-history":[{"count":27,"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/posts\/1101105\/revisions"}],"predecessor-version":[{"id":1154360,"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/posts\/1101105\/revisions\/1154360"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/media\/1102929"}],"wp:attachment":[{"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/media?parent=1101105"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/categories?post=1101105"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/cm-edgetun.pages.dev\/en
-us\/research\/wp-json\/wp\/v2\/tags?post=1101105"},{"taxonomy":"msr-research-area","embeddable":true,"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/research-area?post=1101105"},{"taxonomy":"msr-region","embeddable":true,"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/msr-region?post=1101105"},{"taxonomy":"msr-event-type","embeddable":true,"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/msr-event-type?post=1101105"},{"taxonomy":"msr-locale","embeddable":true,"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/msr-locale?post=1101105"},{"taxonomy":"msr-post-option","embeddable":true,"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/msr-post-option?post=1101105"},{"taxonomy":"msr-impact-theme","embeddable":true,"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/msr-impact-theme?post=1101105"},{"taxonomy":"msr-promo-type","embeddable":true,"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/msr-promo-type?post=1101105"},{"taxonomy":"msr-podcast-series","embeddable":true,"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/msr-podcast-series?post=1101105"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}