{"id":840193,"date":"2022-04-28T07:01:12","date_gmt":"2022-04-28T14:01:12","guid":{"rendered":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/?post_type=msr-research-item&#038;p=840193"},"modified":"2025-06-29T05:48:06","modified_gmt":"2025-06-29T12:48:06","slug":"fostering-coopetition-while-plugging-leaks-the-design-and-implementation-of-the-ms-marco-leaderboards","status":"publish","type":"msr-research-item","link":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/publication\/fostering-coopetition-while-plugging-leaks-the-design-and-implementation-of-the-ms-marco-leaderboards\/","title":{"rendered":"Fostering Coopetition While Plugging Leaks: The Design and Implementation of the MS MARCO Leaderboards"},"content":{"rendered":"<p>We articulate the design and implementation of the MS MARCO document ranking and passage ranking leaderboards. In contrast to &#8220;standard&#8221; community-wide evaluations such as those at TREC, which can be characterized as simultaneous games, leaderboards represent sequential games, where <em>every<\/em> player move is immediately visible to the entire community. The fundamental challenge with this setup is that <em>every<\/em> leaderboard submission leaks information about the held-out evaluation set, which conflicts with the fundamental tenet in machine learning about the separation of training and test data. These &#8220;leaks&#8221;, accumulated over long periods of time, threaten the validity of the insights that can be derived from the leaderboards. In this paper, we share our experiences grappling with this issue over the past few years and how our considerations are operationalized into a coherent submission policy. 
Our work provides a useful guide to help the community understand the design choices made in the popular MS MARCO leaderboards and offers lessons for designers of future leaderboards.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>We articulate the design and implementation of the MS MARCO document ranking and passage ranking leaderboards. In contrast to &#8220;standard&#8221; community-wide evaluations such as those at TREC, which can be characterized as simultaneous games, leaderboards represent sequential games, where every player move is immediately visible to the entire community. The fundamental challenge with this setup [&hellip;]<\/p>\n","protected":false},"featured_media":0,"template":"","meta":{"msr-url-field":"","msr-podcast-episode":"","msrModifiedDate":"","msrModifiedDateEnabled":false,"ep_exclude_from_search":false,"_classifai_error":"","msr-author-ordering":null,"msr_publishername":"ACM","msr_publisher_other":"","msr_booktitle":"","msr_chapter":"","msr_edition":"","msr_editors":"","msr_how_published":"","msr_isbn":"","msr_issue":"","msr_journal":"","msr_number":"","msr_organization":"","msr_pages_string":"","msr_page_range_start":"","msr_page_range_end":"","msr_series":"","msr_volume":"","msr_copyright":"","msr_conference_name":"","msr_doi":"","msr_arxiv_id":"","msr_s2_paper_id":"","msr_mag_id":"","msr_pubmed_id":"","msr_other_authors":"","msr_other_contributors":"","msr_speaker":"","msr_award":"","msr_affiliation":"","msr_institution":"","msr_host":"","msr_version":"","msr_duration":"","msr_original_fields_of_study":"","msr_release_tracker_id":"","msr_s2_match_type":"","msr_citation_count_updated":"","msr_published_date":"2022-4-1","msr_highlight_text":"","msr_notes":"","msr_longbiography":"","msr_publicationurl":"","msr_external_url":"","msr_secondary_video_url":"","msr_conference_url":"","msr_journal_url":"","msr_s2_pdf_url":"","msr_year":0,"msr_citation_count":0,"msr_influential_citations":0,"msr_reference_count":0,"msr_s2_match_confidence":
0,"msr_microsoftintellectualproperty":true,"msr_s2_open_access":false,"msr_s2_author_ids":[],"msr_pub_ids":[],"msr_hide_image_in_river":null,"footnotes":""},"msr-research-highlight":[],"research-area":[13556,13555],"msr-publication-type":[193716],"msr-publisher":[],"msr-focus-area":[],"msr-locale":[268875],"msr-post-option":[],"msr-field-of-study":[248503,246685],"msr-conference":[260209],"msr-journal":[],"msr-impact-theme":[],"msr-pillar":[],"class_list":["post-840193","msr-research-item","type-msr-research-item","status-publish","hentry","msr-research-area-artificial-intelligence","msr-research-area-search-information-retrieval","msr-locale-en_us","msr-field-of-study-information-retrieval","msr-field-of-study-machine-learning"],"msr_publishername":"ACM","msr_edition":"","msr_affiliation":"","msr_published_date":"2022-4-1","msr_host":"","msr_duration":"","msr_version":"","msr_speaker":"","msr_other_contributors":"","msr_booktitle":"","msr_pages_string":"","msr_chapter":"","msr_isbn":"","msr_journal":"","msr_volume":"","msr_number":"","msr_editors":"","msr_series":"","msr_issue":"","msr_organization":"","msr_how_published":"","msr_notes":"","msr_highlight_text":"","msr_release_tracker_id":"","msr_original_fields_of_study":"","msr_download_urls":"","msr_external_url":"","msr_secondary_video_url":"","msr_longbiography":"","msr_microsoftintellectualproperty":1,"msr_main_download":"","msr_publicationurl":"","msr_doi":"","msr_publication_uploader":[{"type":"url","viewUrl":"false","id":"false","title":"https:\/\/dl.acm.org\/doi\/10.1145\/3477495.3531725","label_id":"243109","label":0},{"type":"file","viewUrl":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-content\/uploads\/2022\/04\/sigir2022-msmarco-leaderboard.pdf","id":"840196","title":"sigir2022-msmarco-leaderboard","label_id":"243132","label":0},{"type":"url","viewUrl":"false","id":"false","title":"https:\/\/bhaskar-mitra.github.io\/showpdf\/?file=3477495.3531725","label_id":"243132","label":0}],"msr_related_up
loader":"","msr_citation_count":0,"msr_citation_count_updated":"","msr_s2_paper_id":"","msr_influential_citations":0,"msr_reference_count":0,"msr_arxiv_id":"","msr_s2_author_ids":[],"msr_s2_open_access":false,"msr_s2_pdf_url":null,"msr_attachments":[{"id":840196,"url":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-content\/uploads\/2022\/04\/sigir2022-msmarco-leaderboard.pdf"}],"msr-author-ordering":[{"type":"text","value":"Jimmy Lin","user_id":0,"rest_url":false},{"type":"text","value":"Daniel Campos","user_id":0,"rest_url":false},{"type":"user_nicename","value":"Nick Craswell","user_id":33088,"rest_url":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/microsoft-research\/v1\/researchers?person=Nick Craswell"},{"type":"user_nicename","value":"Bhaskar Mitra","user_id":31257,"rest_url":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/microsoft-research\/v1\/researchers?person=Bhaskar Mitra"},{"type":"text","value":"Emine Yilmaz","user_id":0,"rest_url":false}],"msr_impact_theme":[],"msr_research_lab":[437514],"msr_event":[852294],"msr_group":[267093],"msr_project":[],"publication":[],"video":[],"msr-tool":[571575],"msr_publication_type":"inproceedings","related_content":[],"_links":{"self":[{"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/msr-research-item\/840193","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/msr-research-item"}],"about":[{"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/types\/msr-research-item"}],"version-history":[{"count":2,"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/msr-research-item\/840193\/revisions"}],"predecessor-version":[{"id":1136358,"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/msr-research-item\/840193\/revisions\/1136358"}],"wp:attachment":[{"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/media?parent
=840193"}],"wp:term":[{"taxonomy":"msr-research-highlight","embeddable":true,"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/msr-research-highlight?post=840193"},{"taxonomy":"msr-research-area","embeddable":true,"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/research-area?post=840193"},{"taxonomy":"msr-publication-type","embeddable":true,"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/msr-publication-type?post=840193"},{"taxonomy":"msr-publisher","embeddable":true,"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/msr-publisher?post=840193"},{"taxonomy":"msr-focus-area","embeddable":true,"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/msr-focus-area?post=840193"},{"taxonomy":"msr-locale","embeddable":true,"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/msr-locale?post=840193"},{"taxonomy":"msr-post-option","embeddable":true,"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/msr-post-option?post=840193"},{"taxonomy":"msr-field-of-study","embeddable":true,"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/msr-field-of-study?post=840193"},{"taxonomy":"msr-conference","embeddable":true,"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/msr-conference?post=840193"},{"taxonomy":"msr-journal","embeddable":true,"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/msr-journal?post=840193"},{"taxonomy":"msr-impact-theme","embeddable":true,"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/msr-impact-theme?post=840193"},{"taxonomy":"msr-pillar","embeddable":true,"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/msr-pillar?post=840193"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}