{"id":589285,"date":"2019-05-31T09:02:06","date_gmt":"2019-05-31T16:02:06","guid":{"rendered":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/?p=589285"},"modified":"2019-06-26T14:14:38","modified_gmt":"2019-06-26T21:14:38","slug":"whats-in-a-name-using-bias-to-fight-bias-in-occupational-classification","status":"publish","type":"post","link":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/blog\/whats-in-a-name-using-bias-to-fight-bias-in-occupational-classification\/","title":{"rendered":"What\u2019s in a name?  Using Bias to Fight Bias in Occupational Classification"},"content":{"rendered":"<p><a href=\"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-content\/uploads\/2019\/05\/Whats-In-A-Name_Blog_Site_05_2019_1400x788.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter wp-image-589309 size-large\" src=\"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-content\/uploads\/2019\/05\/Whats-In-A-Name_Blog_Site_05_2019_1400x788-1024x576.png\" alt=\"What\u2019s in a name? 
Using Bias to Fight Bias in Occupational Classification\" width=\"1024\" height=\"576\" srcset=\"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-content\/uploads\/2019\/05\/Whats-In-A-Name_Blog_Site_05_2019_1400x788-1024x576.png 1024w, https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-content\/uploads\/2019\/05\/Whats-In-A-Name_Blog_Site_05_2019_1400x788-300x169.png 300w, https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-content\/uploads\/2019\/05\/Whats-In-A-Name_Blog_Site_05_2019_1400x788-768x432.png 768w, https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-content\/uploads\/2019\/05\/Whats-In-A-Name_Blog_Site_05_2019_1400x788-1066x600.png 1066w, https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-content\/uploads\/2019\/05\/Whats-In-A-Name_Blog_Site_05_2019_1400x788-655x368.png 655w, https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-content\/uploads\/2019\/05\/Whats-In-A-Name_Blog_Site_05_2019_1400x788-343x193.png 343w, https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-content\/uploads\/2019\/05\/Whats-In-A-Name_Blog_Site_05_2019_1400x788.png 1400w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/a><\/p>\n<p>Bias in AI is a big problem. In particular, AI can compound the effects of existing societal biases: in a recruiting tool, if more men than women are software engineers, AI is likely to use that data to identify job applicants and overscreen for men, creating a vicious circle of bias. Indeed, Amazon recently scrapped its <a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" rel=\"noopener noreferrer\" target=\"_blank\" href=\"https:\/\/www.reuters.com\/article\/us-amazon-com-jobs-automation-insight-idUSKCN1MK08G\">AI recruiting engine project<span class=\"sr-only\"> (opens in new tab)<\/span><\/a> for that reason. 
Now that AI is increasingly used in high-impact applications, such as criminal justice, hiring, and healthcare, characterizing and mitigating biases is more urgent than ever\u2014we must prevent AI from further disadvantaging already-disadvantaged people.<\/p>\n<p>To what extent can AI be used to address its own problems? Most methodologies that have been proposed to mitigate biases in AI and machine learning systems assume access to sensitive demographic attributes. However, in practice, this information is often unavailable and, in some contexts, may even be illegal to use. How can we mitigate biases if we do not have access to sensitive attributes?<\/p>\n<p>What if we could use fire to fight fire, or in this case, use bias to fight bias?<\/p>\n<p>That\u2019s the approach we came up with for our paper, &#8220;<a href=\"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/publication\/whats-in-a-name-reducing-bias-in-bios-without-access-to-protected-attributes\/\">What\u2019s in a Name? Reducing Bias in Bios Without Access to Protected Attributes<\/a>&#8220;, to be presented at the 2019 <a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" rel=\"noopener noreferrer\" target=\"_blank\" href=\"https:\/\/naacl2019.org\/\">North American Chapter of the Association for Computational Linguistics<span class=\"sr-only\"> (opens in new tab)<\/span><\/a> (NAACL) conference running June 2-7, 2019, in Minneapolis. We\u2019re also pleased to announce that our paper won the Best Thematic Paper award.<\/p>\n<p>In our paper, we propose a method that relies on word embeddings of names to reduce biases without requiring access to sensitive attributes. Our method even tackles intersectional biases, such as biases involving combinations of race and gender. 
As we showed in <a href=\"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/blog\/what-are-the-biases-in-my-data\/\">previous work<\/a>, inherent societal biases involving sensitive attributes are encoded in word embeddings. Essentially, word embeddings are mappings of words to vectors, learned from large collections of documents, so they capture any biases represented in those documents. For example, in commonly used word embeddings, Hispanic names are \u201cembedded\u201d closer to \u2018taco\u2019 than to \u2018hummus,\u2019 and such embeddings also reflect harmful cultural stereotypes.<\/p>\n<p>Specifically, we look at mitigating biases in occupation classification, using names from a large-scale dataset of online biographies: the predicted probability of an individual\u2019s occupation should not depend on their name\u2014nor on any sensitive attributes that may be inferred from it. Crucially, and in contrast to previous work, our method requires access to names only at training time and not at deployment time.<\/p>\n<h3>Using societal biases in word embeddings<\/h3>\n<p>Here\u2019s how our method works: we penalize the classifier if there is a correlation between the embedding of an individual\u2019s name and the probability of correctly predicting that individual\u2019s occupation. This encourages the classifier to use signals that are useful for occupation classification but not useful for predicting names or any sensitive attributes correlated with them. We find that our method reduces differences across race and gender in the classifier\u2019s true positive rates, while having very little effect on its overall performance.<\/p>\n<p>We propose two variations of our method. 
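To make the penalty described above concrete, here is a minimal, hypothetical sketch in numpy. The random data, the simple softmax classifier, and all names and shapes are our own illustrative assumptions, not the implementation from the paper; it shows only the core idea of penalizing covariance between name embeddings and the predicted probability of the true occupation.

```python
# Hypothetical sketch of a covariance-style penalty; not the paper's code.
import numpy as np

rng = np.random.default_rng(0)
n, d_bio, d_name, k = 200, 50, 10, 5    # examples, bio features, name-embedding dim, occupations

X = rng.normal(size=(n, d_bio))              # feature vectors for the biographies
E = rng.normal(size=(n, d_name))             # word embeddings of the individuals' names
y = rng.integers(0, k, size=n)               # true occupations
W = rng.normal(scale=0.01, size=(d_bio, k))  # classifier weights

def loss_with_penalty(W, lam=1.0):
    logits = X @ W
    logits = logits - logits.max(axis=1, keepdims=True)   # numerically stable softmax
    probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
    p_true = probs[np.arange(n), y]          # predicted probability of each true occupation
    ce = -np.log(p_true + 1e-12).mean()      # standard cross-entropy objective
    # covariance between p_true and each coordinate of the name embedding
    cov = ((p_true - p_true.mean())[:, None] * (E - E.mean(axis=0))).mean(axis=0)
    return ce + lam * np.linalg.norm(cov) ** 2   # penalize name/prediction correlation

print(loss_with_penalty(W))
```

In a sketch like this, the weight `lam` would trade off raw classification performance against how strongly name-correlated signal is suppressed.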
The first variation uses k-means to cluster word embeddings of the names of the individuals in the training set and then, for each pair of clusters, minimizes between-cluster differences in the predicted probabilities of the true occupations of the individuals in the training set. The second variation directly minimizes the covariance between the predicted probability of each (training set) individual\u2019s true occupation and a word embedding of that individual\u2019s name.<\/p>\n<p>Both variations of our method therefore mitigate societal biases that are encoded in names, including biases involving age, religion, race, and gender. (Biases that are not encoded in names, such as those involving disabilities, are not addressed by our method.) Crucially, our method mitigates intersectional biases involving specific combinations of these attributes, which may otherwise go undetected.<\/p>\n<p>Because our method requires access to names only at training time, it extends fairness benefits to individuals whose sensitive attributes are not reflected in their names, such as women named Alex.<\/p>\n<h3>Relationships started during internships at Microsoft<\/h3>\n<p>This work came about because we were interns together at Microsoft Research New England and worked on projects in the space of fairness in AI. The experience was very positive, and we formed a great team, so we continued to work together on this project even after the internship was over.<\/p>\n<p>Both of us were motivated by our desire to build AI systems that work well for everyone. We\u2019re passionate about understanding the roadblocks that prevent the effective use of AI and then developing ways to address them. 
Each of our co-authors (<a href=\"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/people\/wallach\/\">Hanna Wallach<\/a>, <a href=\"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/people\/jchayes\/\">Jennifer Chayes<\/a>, <a href=\"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/people\/borgs\/\">Christian Borgs<\/a>, Alexandra Chouldechova, Sahin Geyik, Krishnaram Kenthapadi, Anna Rumshisky, and <a href=\"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/people\/adum\/\">Adam Kalai<\/a>) contributed a unique perspective to this project, and we\u2019d like to thank them for their contributions and for the research environment they have created. In Hanna\u2019s words, \u201cMicrosoft is particularly interested in work at the intersection of machine learning and the social sciences that can be used to mitigate biases in AI systems. This paper presents one such method. Moving forward, we\u2019re excited to see how our method might be used in real-world settings.\u201d<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Bias in AI is a big problem. In particular, AI can compound the effects of existing societal biases: in a recruiting tool, if more men than women are software engineers, AI is likely to use that data to identify job applicants and overscreen for men, creating a vicious circle of bias. 
Indeed, Amazon recently scrapped [&hellip;]<\/p>\n","protected":false},"author":38022,"featured_media":589309,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"msr-url-field":"","msr-podcast-episode":"","msrModifiedDate":"","msrModifiedDateEnabled":false,"ep_exclude_from_search":false,"_classifai_error":"","msr-author-ordering":null,"msr_hide_image_in_river":0,"footnotes":""},"categories":[194467],"tags":[],"research-area":[13556],"msr-region":[],"msr-event-type":[],"msr-locale":[268875],"msr-post-option":[],"msr-impact-theme":[],"msr-promo-type":[],"msr-podcast-series":[],"class_list":["post-589285","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-artifical-intelligence","msr-research-area-artificial-intelligence","msr-locale-en_us"],"msr_event_details":{"start":"","end":"","location":""},"podcast_url":"","podcast_episode":"","msr_research_lab":[199571],"msr_impact_theme":[],"related-publications":[],"related-downloads":[],"related-videos":[],"related-academic-programs":[],"related-groups":[330695],"related-projects":[],"related-events":[589690],"related-researchers":[{"type":"guest","value":"alexey-romanov","user_id":"489746","display_name":"Alexey Romanov","author_link":"<a href=\"http:\/\/www.cs.uml.edu\/~aromanov\/\" aria-label=\"Visit the profile page for Alexey Romanov\">Alexey Romanov<\/a>","is_active":true,"last_first":"Romanov, Alexey","people_section":0,"alias":"alexey-romanov"},{"type":"guest","value":"maria-de-arteaga","user_id":"488954","display_name":"Maria De-Arteaga","author_link":"<a href=\"https:\/\/mariadearteaga.com\/\" aria-label=\"Visit the profile page for Maria De-Arteaga\">Maria De-Arteaga<\/a>","is_active":true,"last_first":"De-Arteaga, Maria","people_section":0,"alias":"maria-de-arteaga"}],"msr_type":"Post","featured_image_thumbnail":"<img width=\"960\" height=\"540\" 
src=\"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-content\/uploads\/2019\/05\/Whats-In-A-Name_Blog_Site_05_2019_1400x788.png\" class=\"img-object-cover\" alt=\"\" decoding=\"async\" loading=\"lazy\" srcset=\"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-content\/uploads\/2019\/05\/Whats-In-A-Name_Blog_Site_05_2019_1400x788.png 1400w, https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-content\/uploads\/2019\/05\/Whats-In-A-Name_Blog_Site_05_2019_1400x788-300x169.png 300w, https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-content\/uploads\/2019\/05\/Whats-In-A-Name_Blog_Site_05_2019_1400x788-768x432.png 768w, https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-content\/uploads\/2019\/05\/Whats-In-A-Name_Blog_Site_05_2019_1400x788-1024x576.png 1024w, https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-content\/uploads\/2019\/05\/Whats-In-A-Name_Blog_Site_05_2019_1400x788-1066x600.png 1066w, https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-content\/uploads\/2019\/05\/Whats-In-A-Name_Blog_Site_05_2019_1400x788-655x368.png 655w, https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-content\/uploads\/2019\/05\/Whats-In-A-Name_Blog_Site_05_2019_1400x788-343x193.png 343w\" sizes=\"auto, (max-width: 960px) 100vw, 960px\" \/>","byline":"<a href=\"http:\/\/www.cs.uml.edu\/~aromanov\/\" title=\"Go to researcher profile for Alexey Romanov\" aria-label=\"Go to researcher profile for Alexey Romanov\" data-bi-type=\"byline author\" data-bi-cN=\"Alexey Romanov\">Alexey Romanov<\/a> and <a href=\"https:\/\/mariadearteaga.com\/\" title=\"Go to researcher profile for Maria De-Arteaga\" aria-label=\"Go to researcher profile for Maria De-Arteaga\" data-bi-type=\"byline author\" data-bi-cN=\"Maria De-Arteaga\">Maria De-Arteaga<\/a>","formattedDate":"May 31, 2019","formattedExcerpt":"Bias in AI is a big problem. 
In particular, AI can compound the effects of existing societal biases: in a recruiting tool, if more men than women are software engineers, AI is likely to use that data to identify job applicants and overscreen for men,&hellip;","locale":{"slug":"en_us","name":"English","native":"","english":"English"},"_links":{"self":[{"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/posts\/589285","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/users\/38022"}],"replies":[{"embeddable":true,"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/comments?post=589285"}],"version-history":[{"count":5,"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/posts\/589285\/revisions"}],"predecessor-version":[{"id":589324,"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/posts\/589285\/revisions\/589324"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/media\/589309"}],"wp:attachment":[{"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/media?parent=589285"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/categories?post=589285"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/tags?post=589285"},{"taxonomy":"msr-research-area","embeddable":true,"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/research-area?post=589285"},{"taxonomy":"msr-region","embeddable":true,"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/msr-region?post=589285"},{"taxo
nomy":"msr-event-type","embeddable":true,"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/msr-event-type?post=589285"},{"taxonomy":"msr-locale","embeddable":true,"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/msr-locale?post=589285"},{"taxonomy":"msr-post-option","embeddable":true,"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/msr-post-option?post=589285"},{"taxonomy":"msr-impact-theme","embeddable":true,"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/msr-impact-theme?post=589285"},{"taxonomy":"msr-promo-type","embeddable":true,"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/msr-promo-type?post=589285"},{"taxonomy":"msr-podcast-series","embeddable":true,"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/msr-podcast-series?post=589285"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}