{"id":898830,"date":"2022-11-16T08:27:56","date_gmt":"2022-11-16T16:27:56","guid":{"rendered":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/?post_type=msr-blog-post&#038;p=898830"},"modified":"2022-12-01T02:09:23","modified_gmt":"2022-12-01T10:09:23","slug":"weakly-supervised-learning-substantially-reduces-the-number-of-labels-required-for-intracranial-haemorrhage-detection-on-head-ct","status":"publish","type":"msr-blog-post","link":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/articles\/weakly-supervised-learning-substantially-reduces-the-number-of-labels-required-for-intracranial-haemorrhage-detection-on-head-ct\/","title":{"rendered":"Should expert radiologists label individual images or entire examinations?"},"content":{"rendered":"\n<p><\/p>\n\n\n\n<p><strong>Winner of an <a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" rel=\"noopener noreferrer\" target=\"_blank\" href=\"https:\/\/www.rsna.org\/annual-meeting\/awards-recognition\/trainee-research-prize\">RSNA 2022 Trainee Research Prize<span class=\"sr-only\"> (opens in new tab)<\/span><\/a><\/strong>: <\/p>\n\n\n\n<p><em>Read the full pre-print on Arxiv <a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" rel=\"noopener noreferrer\" target=\"_blank\" href=\"https:\/\/arxiv.org\/abs\/2211.15924\">[2211.15924] Weakly Supervised Learning Significantly Reduces the Number of Labels Required for Intracranial Hemorrhage Detection on Head CT (arxiv.org)<span class=\"sr-only\"> (opens in new tab)<\/span><\/a><\/em><\/p>\n\n\n\n<p><a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" href=\"https:\/\/jacopoteneggi.github.io\/\" target=\"_blank\" rel=\"noopener noreferrer\">Jacopo Teneggi<span class=\"sr-only\"> (opens in new tab)<\/span><\/a>, a PhD student at Johns Hopkins University (JHU), used InnerEye OSS and Azure Machine Learning to help answer the question: \u201c<em>Should expert 
radiologists label individual images or entire examinations<\/em>?\u201d This project was conducted in collaboration with <a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" href=\"https:\/\/www.medschool.umaryland.edu\/profiles\/Yi-Paul\/\" target=\"_blank\" rel=\"noopener noreferrer\">Prof. Paul Yi, MD<span class=\"sr-only\"> (opens in new tab)<\/span><\/a> (Director of the University of Maryland Medical Intelligent Imaging (UM2ii) Center) and under the supervision of <a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" href=\"https:\/\/sites.google.com\/view\/jsulam\" target=\"_blank\" rel=\"noopener noreferrer\">Prof. Jeremias Sulam<span class=\"sr-only\"> (opens in new tab)<\/span><\/a> (Assistant Professor in the&nbsp;<a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" href=\"https:\/\/www.google.com\/url?q=https%3A%2F%2Fbme.jhu.edu&sa=D&sntz=1&usg=AOvVaw32_PFhHBW91FePwhltlQNn\" target=\"_blank\" rel=\"noopener noreferrer\">Biomedical Engineering Department<span class=\"sr-only\"> (opens in new tab)<\/span><\/a> at JHU).&nbsp;<\/p>\n\n\n\n<p><strong>Challenge&nbsp;<\/strong><\/p>\n\n\n\n<p>Deep Learning (DL) continues to drive exciting advances across medical imaging tasks, from image reconstruction and enhancement to automatic lesion detection and segmentation. However, the labor-intensive collection of image annotations by experts hinders the development of DL models in radiology. Classifiers are usually trained in a fully supervised fashion, which requires many labelled CT slices and, consequently, a large budget for radiologists to review each image slice. An alternative is weakly supervised learning, which uses coarser labels, such as those from radiology examination reports, instead of individual image labels. 
Jacopo\u2019s work compares the performance of these two approaches and how they scale with large numbers of images.&nbsp;<\/p>\n\n\n\n<p><strong>Solution&nbsp;<\/strong><\/p>\n\n\n\n<p>A Multiple Instance Learning (MIL) weakly supervised machine learning model was developed and compared to a strongly supervised model that used individual image labels for training. Both models were trained on the <a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" href=\"https:\/\/www.kaggle.com\/competitions\/rsna-intracranial-hemorrhage-detection\/overview\" target=\"_blank\" rel=\"noopener noreferrer\">RSNA 2019 Brain CT Haemorrhage Challenge<span class=\"sr-only\"> (opens in new tab)<\/span><\/a> dataset, which comprises 21,784 examinations with a total of 752,803 images.<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"830\" height=\"205\" src=\"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-content\/uploads\/2022\/11\/RSNAChallenge.jpg\" alt=\"graphical user interface, application\" class=\"wp-image-899124\" srcset=\"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-content\/uploads\/2022\/11\/RSNAChallenge.jpg 830w, https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-content\/uploads\/2022\/11\/RSNAChallenge-300x74.jpg 300w, https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-content\/uploads\/2022\/11\/RSNAChallenge-768x190.jpg 768w, https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-content\/uploads\/2022\/11\/RSNAChallenge-240x59.jpg 240w\" sizes=\"auto, (max-width: 830px) 100vw, 830px\" \/><figcaption><a href=\"https:\/\/www.kaggle.com\/competitions\/rsna-intracranial-hemorrhage-detection\/overview\" target=\"_blank\" rel=\"noreferrer noopener\">https:\/\/www.kaggle.com\/competitions\/rsna-intracranial-hemorrhage-detection\/overview<\/a><\/figcaption><\/figure>\n\n\n\n<p>Every image in the RSNA dataset was labelled by expert neuroradiologists with the type(s) 
of haemorrhage present. In addition to the RSNA dataset, models were compared on two external test sets: the <a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" href=\"http:\/\/headctstudy.qure.ai\/dataset\" target=\"_blank\" rel=\"noopener noreferrer\">CQ500 dataset<span class=\"sr-only\"> (opens in new tab)<\/span><\/a> (436 examinations) and the CT-ICH dataset (75 examinations). The CQ500 dataset only provides examination-level labels, while the CT-ICH dataset provides both image-level labels and manual annotations of the bleeds. Hence, Jacopo extended the CQ500 dataset with the segmentations provided in the <a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" href=\"https:\/\/physionet.org\/content\/bhx-brain-bounding-box\/1.1\/\" target=\"_blank\" rel=\"noopener noreferrer\">BHX dataset<span class=\"sr-only\"> (opens in new tab)<\/span><\/a>.&nbsp;&nbsp;<\/p>\n\n\n\n<p>The MIL framework regards each examination as a <em>bag of images<\/em>, labelled as \u201c<em>with haemorrhage<\/em>\u201d as soon as at least one image shows signs of haemorrhage, or \u201c<em>healthy<\/em>\u201d otherwise. Two models were trained: a fully supervised model using image labels, and a weakly supervised, attention-based MIL model using examination-level labels. The building blocks were a ResNet18 pretrained on ImageNet as the encoder, a two-layer attention module, and a binary linear classifier with sigmoid activation. 
With these, a strong learner is the composition of the encoder with the classifier, whereas a weak learner is the composition of the encoder (applied to each image in the examination), the attention mechanism, and the final classifier.&nbsp;&nbsp;<\/p>\n\n\n\n<p>The ML models were originally developed using the PyTorch framework and were easily ported to the <a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" href=\"https:\/\/github.com\/microsoft\/InnerEye-DeepLearning\" target=\"_blank\" rel=\"noopener noreferrer\">InnerEye OSS Deep Learning Toolkit<span class=\"sr-only\"> (opens in new tab)<\/span><\/a> via the <em>Bring Your Own PyTorch Lightning Model<\/em> functionality of the OSS. The integration of InnerEye OSS with <a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" href=\"https:\/\/learn.microsoft.com\/en-gb\/azure\/machine-learning\/\" target=\"_blank\" rel=\"noopener noreferrer\">Azure Machine Learning<span class=\"sr-only\"> (opens in new tab)<\/span><\/a> allowed Jacopo to straightforwardly store and manage the large volumes of data required for training, and to train models efficiently at scale on multiple GPUs. The flexibility of the OSS with Azure Machine Learning, coupled with the direct availability in the OSS of <a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" href=\"https:\/\/torchio.readthedocs.io\/\" target=\"_blank\" rel=\"noopener noreferrer\">TorchIO<span class=\"sr-only\"> (opens in new tab)<\/span><\/a>, a popular open-source Python library for efficient processing and augmentation of 3D medical images, enabled Jacopo to prototype his models quickly. 
Finally, Jacopo extended the InnerEye code to support <a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" href=\"https:\/\/wandb.ai\/site\" target=\"_blank\" rel=\"noopener noreferrer\">Weights & Biases<span class=\"sr-only\"> (opens in new tab)<\/span><\/a> logging to help track experiments, evaluate model performance, and finetune the hyperparameters of the training process.&nbsp;<\/p>\n\n\n\n<p>The results in Figure 1 show that there is virtually no difference between strong (<em>SL<\/em>) and weak learners (<em>WL<\/em>) on the RSNA and CQ500 datasets. The weak learner, however, exhibits significantly higher generalization power on the CT-ICH dataset.&nbsp;&nbsp;&nbsp;<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"602\" height=\"274\" src=\"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-content\/uploads\/2022\/11\/JHUFig1.png\" alt=\"graphical user interface, application, Word\" class=\"wp-image-899091\" srcset=\"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-content\/uploads\/2022\/11\/JHUFig1.png 602w, https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-content\/uploads\/2022\/11\/JHUFig1-300x137.png 300w, https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-content\/uploads\/2022\/11\/JHUFig1-240x109.png 240w\" sizes=\"auto, (max-width: 602px) 100vw, 602px\" \/><\/figure>\n\n\n\n<p>Figure 2 shows results for image-level haemorrhage detection. 
There appears to be no significant qualitative (saliency maps) or quantitative (<em>f<sub>1<\/sub><\/em> scores) difference between the strong and weak learners.&nbsp;&nbsp;<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"610\" height=\"443\" src=\"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-content\/uploads\/2022\/11\/JHUFig2.png\" alt=\"graphical user interface, application\" class=\"wp-image-899094\" srcset=\"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-content\/uploads\/2022\/11\/JHUFig2.png 610w, https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-content\/uploads\/2022\/11\/JHUFig2-300x218.png 300w, https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-content\/uploads\/2022\/11\/JHUFig2-240x174.png 240w\" sizes=\"auto, (max-width: 610px) 100vw, 610px\" \/><\/figure>\n\n\n\n<p>Figure 3 shows the mean examination-level haemorrhage detection performance on the RSNA dataset as a function of the number of labels available to each learner during training, denoted by <em>m<\/em>. The fully supervised strong learner performs better when fewer than around 10,000 labels are available; beyond that point, the weak learners quickly outperform the strong learners. 
Importantly, the performance of weak and strong learners trained on the entire RSNA dataset is comparable, with weak learners using \u2248 35 times fewer labels.&nbsp;<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"502\" height=\"228\" src=\"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-content\/uploads\/2022\/11\/JHUFig3.png\" alt=\"chart, line chart\" class=\"wp-image-899097\" srcset=\"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-content\/uploads\/2022\/11\/JHUFig3.png 502w, https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-content\/uploads\/2022\/11\/JHUFig3-300x136.png 300w, https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-content\/uploads\/2022\/11\/JHUFig3-240x109.png 240w\" sizes=\"auto, (max-width: 502px) 100vw, 502px\" \/><\/figure>\n\n\n\n<p><strong>Outcome&nbsp;<\/strong><\/p>\n\n\n\n<p>The results suggest that with MIL, radiologists may not need to provide labor-intensive, image-level annotations for 3D imaging volumes (e.g., CT\/MRI) to train high performing ML models. 
This approach could dramatically reduce the time-consuming data annotation process, overcoming a major hurdle in machine learning for medical imaging.&nbsp;<\/p>\n\n\n\n<p><em>Read the full pre-print on Arxiv  <a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" rel=\"noopener noreferrer\" target=\"_blank\" href=\"https:\/\/arxiv.org\/abs\/2211.15924\">[2211.15924] Weakly Supervised Learning Significantly Reduces the Number of Labels Required for Intracranial Hemorrhage Detection on Head CT (arxiv.org)<span class=\"sr-only\"> (opens in new tab)<\/span><\/a><\/em><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Winner of an RSNA 2022 Trainee Research Prize (opens in new tab): Read the full pre-print on Arxiv [2211.15924] Weakly Supervised Learning Significantly Reduces the Number of Labels Required for Intracranial Hemorrhage Detection on Head CT (arxiv.org) (opens in new tab) Jacopo Teneggi (opens in new tab), a PhD student at Johns Hopkins University (JHU), 
[&hellip;]<\/p>\n","protected":false},"author":32522,"featured_media":0,"template":"","meta":{"msr-url-field":"","msr-podcast-episode":"","msrModifiedDate":"","msrModifiedDateEnabled":false,"ep_exclude_from_search":false,"_classifai_error":"","msr-content-parent":740356,"msr_hide_image_in_river":0,"footnotes":""},"research-area":[],"msr-locale":[268875],"msr-post-option":[],"class_list":["post-898830","msr-blog-post","type-msr-blog-post","status-publish","hentry","msr-locale-en_us"],"msr_assoc_parent":{"id":740356,"type":"project"},"_links":{"self":[{"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/msr-blog-post\/898830","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/msr-blog-post"}],"about":[{"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/types\/msr-blog-post"}],"author":[{"embeddable":true,"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/users\/32522"}],"version-history":[{"count":20,"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/msr-blog-post\/898830\/revisions"}],"predecessor-version":[{"id":903288,"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/msr-blog-post\/898830\/revisions\/903288"}],"wp:attachment":[{"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/media?parent=898830"}],"wp:term":[{"taxonomy":"msr-research-area","embeddable":true,"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/research-area?post=898830"},{"taxonomy":"msr-locale","embeddable":true,"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/msr-locale?post=898830"},{"taxonomy":"msr-post-option","embeddable":true,"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/msr-post-option?post=898830"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}