{"id":1103,"date":"2013-09-24T09:23:00","date_gmt":"2013-09-24T09:23:00","guid":{"rendered":"https:\/\/blogs.msdn.microsoft.com\/msr_er\/2013\/09\/24\/data-mining-competition-takes-center-stage-in-chicago\/"},"modified":"2016-07-20T07:31:16","modified_gmt":"2016-07-20T14:31:16","slug":"data-mining-competition-takes-center-stage-in-chicago","status":"publish","type":"post","link":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/blog\/data-mining-competition-takes-center-stage-in-chicago\/","title":{"rendered":"Data mining competition takes center stage in Chicago"},"content":{"rendered":"<p><span style=\"font-family: verdana,geneva; font-size: medium;\"><img decoding=\"async\" style=\"margin: 5px; border: 0px currentColor; float: left;\" title=\"Microsoft Research Connections was proud to sponsor the 2013 KDD Cup\" src=\"https:\/\/msdnshared.blob.core.windows.net\/media\/MSDNBlogsFS\/prod.evol.blogs.msdn.com\/CommunityServer.Blogs.Components.WeblogFiles\/00\/00\/01\/32\/81\/2318.KDD-logo_157px.png\" original-url=\"http:\/\/blogs.msdn.com\/resized-image.ashx\/__size\/157x0\/__key\/communityserver-blogs-components-weblogfiles\/00-00-01-32-81\/2318.KDD_2D00_logo_5F00_157px.png\" alt=\"Microsoft Research Connections was proud to sponsor the 2013 KDD Cup\" \/>In keeping with our mission to collaborate with top academic and scientific researchers to foster innovations in scientific inquiry, Microsoft Research Connections was proud to sponsor the 2013 KDD Cup, arguably the world&rsquo;s best-known competition in data mining. 
The winning teams were announced at <a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" rel=\"noopener noreferrer\" href=\"http:\/\/www.kdd.org\/kdd2013\/\" target=\"_blank\">KDD 2013<span class=\"sr-only\"> (opens in new tab)<\/span><\/a>, the 19th annual conference of ACM SIGKDD (the Association for Computing Machinery&rsquo;s Special Interest Group on Knowledge Discovery and Data Mining), which took place in Chicago in August. KDD is the premier event for researchers grappling with today&rsquo;s data deluge, as it&rsquo;s the only conference spanning big data, data mining, data science, and analytics and all the related algorithms, foundations, applications, and practices.<\/span><\/p>\n<p style=\"text-align: center;\"><span style=\"font-family: verdana,geneva; font-size: medium;\"><img decoding=\"async\" style=\"border: 0px currentColor;\" title=\"2013 KDD Cup challenge winners, Team Algorithm, from National Taiwan University\" src=\"https:\/\/msdnshared.blob.core.windows.net\/media\/MSDNBlogsFS\/prod.evol.blogs.msdn.com\/CommunityServer.Blogs.Components.WeblogFiles\/00\/00\/01\/32\/81\/7217.KDD-Cup-Award_Team-Algorithm-496.jpg\" original-url=\"http:\/\/blogs.msdn.com\/resized-image.ashx\/__size\/496x0\/__key\/communityserver-blogs-components-weblogfiles\/00-00-01-32-81\/7217.KDD_2D00_Cup_2D00_Award_5F00_Team_2D00_Algorithm_2D00_496.jpg\" alt=\"2013 KDD Cup challenge winners, Team Algorithm, from National Taiwan University\" \/><br \/><span style=\"color: #808080; font-size: small;\">2013 KDD Cup challenge winners, Team Algorithm, from National Taiwan University<\/span><\/span><\/p>\n<p><span style=\"font-family: verdana,geneva; font-size: medium;\">The 2013 KDD Cup challenge focused on the ability to search literature and to collect metrics around publications&mdash;a capability that is essential to modern research, as academic and industry researchers increasingly rely on search to discover what has been published and 
by whom. The competition made use of a dataset of 250,000 authors and 2.5 million published papers. The dataset was broken up into a distinct labeled training set, a validation set for the leaderboard, and a test set. The competitors faced two tasks: first, a prediction task to determine whether an author had written a paper, and second, a name disambiguation task to identify duplicate author names in a dataset with name variants.<\/span><\/p>\n<p><span style=\"font-family: verdana,geneva; font-size: medium;\">These tasks go to the heart of one of the main challenges of information extraction and curation in any people-centric dataset: resolving people-name ambiguity. In the scholarly publishing world, many authors publish under several variations of their own name, and to add to the complexity of discovery, different authors might share a similar or even the same name. As a result, the profile of an author with an ambiguous name tends to contain noise, resulting in papers that are incorrectly assigned to him or her. The KDD Cup task challenged participants to determine which papers in an author profile were truly written by a given author. Read the <a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" rel=\"noopener noreferrer\" href=\"http:\/\/www.kdd.org\/kddcup2013\/sites\/default\/files\/papers\/papers.pdf\" target=\"_blank\">full parameters of the challenge<span class=\"sr-only\"> (opens in new tab)<\/span><\/a>.<\/span><\/p>\n<p><span style=\"font-family: verdana,geneva; font-size: medium;\">The competition was fierce, with more than 800 teams from more than 40 different countries developing approximately 12,000 data-mining models over the course of a few months. The winning solution, created by Professor Chih-Jen Lin and Team Algorithm from National Taiwan University, was the product of outstanding teamwork: eighteen students and three teaching assistants actually designed a graduate course around the competition. 
Other winners included teams from University of Illinois at Urbana-Champaign, Moscow State University, and FICO. Winners presented their solutions at a KDD Cup workshop and poster session at the conference. Moreover, solutions created for the competition resulted in 10 research papers that are available through the <a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" rel=\"noopener noreferrer\" href=\"http:\/\/dl.acm.org\/citation.cfm?id=2517288\" target=\"_blank\">KDD Cup 2013 Workshop proceedings<span class=\"sr-only\"> (opens in new tab)<\/span><\/a>.<\/span><\/p>\n<p style=\"text-align: center;\"><span style=\"font-family: verdana,geneva; font-size: medium;\"><img decoding=\"async\" style=\"border: 0px currentColor;\" title=\"KDD Cup poster session participants at KDD 2013\" src=\"https:\/\/msdnshared.blob.core.windows.net\/media\/MSDNBlogsFS\/prod.evol.blogs.msdn.com\/CommunityServer.Blogs.Components.WeblogFiles\/00\/00\/01\/32\/81\/1185.KDD-poster-session-496.jpg\" original-url=\"http:\/\/blogs.msdn.com\/resized-image.ashx\/__size\/496x0\/__key\/communityserver-blogs-components-weblogfiles\/00-00-01-32-81\/1185.KDD_2D00_poster_2D00_session_2D00_496.jpg\" alt=\"KDD Cup poster session participants at KDD 2013\" \/><br \/><span style=\"color: #808080; font-size: small;\">KDD Cup poster session participants at KDD 2013<\/span><\/span><\/p>\n<p><span style=\"font-family: verdana,geneva; font-size: medium;\">On behalf of Microsoft Research Connections, I would like to thank the key collaborators who helped make this competition a success. The Microsoft Research Connections proposal for the KDD Cup challenge was selected after careful deliberation by 2013 KDD Cup chairpersons Claudia Perlich and Brian Dalessandro of Media6&deg;. 
Partnering with me in designing the contest rules and evaluation criteria were Professors Martine DeCock of Ghent University and Senjuti Basu Roy of the University of Washington Tacoma, along with Ben Hamner and Will Cukierski of Kaggle. Swapna Savvana and Yitao Li from the University of Washington Tacoma helped with the logistics of the contest execution.<\/span><\/p>\n<p><span style=\"font-family: verdana,geneva; font-size: medium;\">So congrats to the KDD Cup winners, and kudos to everyone who accepted the challenge. The many outstanding solutions showed great creativity, which is exactly what we&rsquo;ll need as we move forward in this new world of data-intensive scientific inquiry.<\/span><\/p>\n<p><em><span style=\"font-family: verdana,geneva; font-size: medium;\">&mdash;Vani Mandava, Senior Program Manager, Microsoft Research Connections<\/span><\/em><\/p>\n<p><strong><span style=\"font-family: verdana,geneva; font-size: medium;\">Learn more<\/span><\/strong><\/p>\n<ul>\n<li><span style=\"font-family: verdana,geneva; font-size: small;\"><a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" rel=\"noopener noreferrer\" href=\"http:\/\/www.kdd.org\/\" target=\"_blank\">KDD<span class=\"sr-only\"> (opens in new tab)<\/span><\/a> (Special Interest Group on Knowledge Discovery and Data Mining)<\/span><\/li>\n<li><span style=\"font-family: verdana,geneva; font-size: small;\"><a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" rel=\"noopener noreferrer\" href=\"http:\/\/dl.acm.org\/citation.cfm?id=2487575&CFID=244286608&CFTOKEN=65686451\" target=\"_blank\">KDD 2013 proceedings<span class=\"sr-only\"> (opens in new tab)<\/span><\/a><\/span><\/li>\n<li><span style=\"font-family: verdana,geneva; font-size: small;\"><a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" rel=\"noopener noreferrer\" href=\"http:\/\/www.acm.org\/\" 
target=\"_blank\">ACM<span class=\"sr-only\"> (opens in new tab)<\/span><\/a> (Association for Computing Machinery)<\/span><\/li>\n<li><span style=\"font-family: verdana,geneva; font-size: small;\"><a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" rel=\"noopener noreferrer\" href=\"http:\/\/research.microsoft.com\/en-us\/collaboration\/focus\/education\/default.aspx\" target=\"_blank\">Education and Scholarly Communication at Microsoft Research Connections<span class=\"sr-only\"> (opens in new tab)<\/span><\/a><\/span><\/li>\n<\/ul>\n","protected":false},"excerpt":{"rendered":"<p>In keeping with our mission to collaborate with top academic and scientific researchers to foster innovations in scientific inquiry, Microsoft Research Connections was proud to sponsor the 2013 KDD Cup, arguably the world&rsquo;s best-known competition in data mining. The winning teams were announced at KDD 2013, the 19th annual conference of ACM SIGKDD (the Association 
[&hellip;]<\/p>\n","protected":false},"author":32627,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"msr-url-field":"","msr-podcast-episode":"","msrModifiedDate":"","msrModifiedDateEnabled":false,"ep_exclude_from_search":false,"_classifai_error":"","msr-author-ordering":[],"msr_hide_image_in_river":0,"footnotes":""},"categories":[1],"tags":[194510,186834,186833,194721,194793,186831,194975,195220,193598,186854,187057,195580,195907,196305,196442,196551,196617,196878,196919,187353,186485,197151,197279,197352,197384,197597,197678,197761,197866],"research-area":[],"msr-region":[],"msr-event-type":[],"msr-locale":[268875],"msr-post-option":[],"msr-impact-theme":[],"msr-promo-type":[],"msr-podcast-series":[],"class_list":["post-1103","post","type-post","status-publish","format-standard","hentry","category-research-blog","tag-2013-kdd-cup-challenge","tag-algorithms","tag-analytics","tag-association-of-computing-machinery","tag-ben-hamner","tag-big-data","tag-chicago","tag-curation","tag-data","tag-data-mining","tag-data-science","tag-fico","tag-information-extraction","tag-martine-decock","tag-microsoft-research-connections-kdd-2013-acm-sigkdd","tag-moscow-state-university","tag-national-taiwan-university","tag-prof-chih-jen-lin","tag-publication-metrics","tag-scholarly-communication","tag-scholarly-publishing","tag-senjuti-basu-roy","tag-special-interest-group-on-knowledge-discovery-and-data-mining","tag-swapna-savvana","tag-team-algorithm","tag-university-of-illinois-at-urbana-champaign","tag-vani-mandava","tag-will-cukierski","tag-yitao-li","msr-locale-en_us"],"msr_event_details":{"start":"","end":"","location":""},"podcast_url":"","podcast_episode":"","msr_research_lab":[],"msr_impact_theme":[],"related-publications":[],"related-downloads":[],"related-videos":[],"related-academic-programs":[],"related-groups":[],"related-projects":[],"related-events":[],"related-researchers":[],"msr_type":"Pos
t","byline":"","formattedDate":"September 24, 2013","formattedExcerpt":"In keeping with our mission to collaborate with top academic and scientific researchers to foster innovations in scientific inquiry, Microsoft Research Connections was proud to sponsor the 2013 KDD Cup, arguably the world&rsquo;s best-known competition in data mining. The winning teams were announced at KDD&hellip;","locale":{"slug":"en_us","name":"English","native":"","english":"English"},"_links":{"self":[{"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/posts\/1103","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/users\/32627"}],"replies":[{"embeddable":true,"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/comments?post=1103"}],"version-history":[{"count":1,"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/posts\/1103\/revisions"}],"predecessor-version":[{"id":261423,"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/posts\/1103\/revisions\/261423"}],"wp:attachment":[{"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/media?parent=1103"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/categories?post=1103"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/tags?post=1103"},{"taxonomy":"msr-research-area","embeddable":true,"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/research-area?post=1103"},{"taxonomy":"msr-region","embeddable":true,"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/msr-region?post=11
03"},{"taxonomy":"msr-event-type","embeddable":true,"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/msr-event-type?post=1103"},{"taxonomy":"msr-locale","embeddable":true,"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/msr-locale?post=1103"},{"taxonomy":"msr-post-option","embeddable":true,"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/msr-post-option?post=1103"},{"taxonomy":"msr-impact-theme","embeddable":true,"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/msr-impact-theme?post=1103"},{"taxonomy":"msr-promo-type","embeddable":true,"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/msr-promo-type?post=1103"},{"taxonomy":"msr-podcast-series","embeddable":true,"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/msr-podcast-series?post=1103"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}