{"id":358571,"date":"2017-01-26T16:05:16","date_gmt":"2017-01-27T00:05:16","guid":{"rendered":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/?post_type=msr-research-item&#038;p=358571"},"modified":"2018-10-16T20:02:45","modified_gmt":"2018-10-17T03:02:45","slug":"diversified-search-evaluation-lessons-ntcir-9-intent-task","status":"publish","type":"msr-research-item","link":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/publication\/diversified-search-evaluation-lessons-ntcir-9-intent-task\/","title":{"rendered":"Diversified Search Evaluation: Lessons from the NTCIR-9 INTENT Task"},"content":{"rendered":"<p>The evaluation of diversified web search results is a relatively new research topic and is not as well-understood as the time-honoured evaluation methodology of traditional IR based on precision and recall. In diversity evaluation, one topic may have more than one <em class=\"EmphasisTypeItalic \">intent<\/em>, and systems are expected to balance relevance and diversity. The recent NTCIR-9 evaluation workshop launched a new task called INTENT which included a diversified web search subtask that differs from the TREC web diversity task in several aspects: the choice of evaluation metrics, the use of intent popularity and per-intent graded relevance, and the use of topic sets that are twice as large as those of TREC. The objective of this study is to examine whether these differences are useful, using the actual data recently obtained from the NTCIR-9 INTENT task. Our main experimental findings are: (1) The <span id=\"IEq42\" class=\"InlineEquation\"><span id=\"MathJax-Element-1-Frame\" class=\"MathJax\" tabindex=\"0\" data-mathml=\"<math xmlns=\"http:\/\/www.w3.org\/1998\/Math\/MathML\"><mstyle displaystyle=\"false\" scriptlevel=\"0\"><mtext>D<\/mtext><\/mstyle><mspace width=\"thinmathspace\" \/><mi mathvariant=\"normal\">&#x266F;<\/mi><\/math>\"><span id=\"MathJax-Span-1\" class=\"math\"><span id=\"MathJax-Span-2\" class=\"mrow\"><span id=\"MathJax-Span-3\" class=\"mstyle\"><span id=\"MathJax-Span-4\" class=\"mrow\"><span id=\"MathJax-Span-5\" class=\"mtext\">D<\/span><\/span><\/span><span id=\"MathJax-Span-6\" class=\"mspace\"><\/span><span id=\"MathJax-Span-7\" class=\"mi\">\u266f<\/span><\/span><\/span><\/span><\/span><span id=\"IEq42\" class=\"InlineEquation\"><\/span> evaluation framework used at NTCIR provides more \u201cintuitive\u201d and statistically reliable results than Intent-Aware Expected Reciprocal Rank; (2) Utilising both intent popularity and per-intent graded relevance as is done at NTCIR tends to improve discriminative power, particularly for <span id=\"IEq43\" class=\"InlineEquation\"><span id=\"MathJax-Element-2-Frame\" class=\"MathJax\" tabindex=\"0\" data-mathml=\"<math xmlns=\"http:\/\/www.w3.org\/1998\/Math\/MathML\"><mstyle displaystyle=\"false\" scriptlevel=\"0\"><mtext>D<\/mtext><\/mstyle><mspace width=\"thinmathspace\" \/><mi mathvariant=\"normal\">&#x266F;<\/mi><\/math>\"><span id=\"MathJax-Span-8\" class=\"math\"><span id=\"MathJax-Span-9\" class=\"mrow\"><span id=\"MathJax-Span-10\" class=\"mstyle\"><span id=\"MathJax-Span-11\" class=\"mrow\"><span id=\"MathJax-Span-12\" class=\"mtext\">D<\/span><\/span><\/span><span id=\"MathJax-Span-13\" class=\"mspace\"><\/span><span id=\"MathJax-Span-14\" class=\"mi\">\u266f<\/span><\/span><\/span><\/span><\/span><span id=\"IEq43\" class=\"InlineEquation\"><\/span>-nDCG; and (3) Reducing the topic set size, even by just 10 topics, can affect not only significance testing but also the entire system ranking; when 50 topics are used (as in TREC) instead of 100 (as in NTCIR), the system ranking can be substantially different from the original ranking and the discriminative power can be halved. These results suggest that the directions being explored at NTCIR are valuable.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>The evaluation of diversified web search results is a relatively new research topic and is not as well-understood as the time-honoured evaluation methodology of traditional IR based on precision and recall. In diversity evaluation, one topic may have more than one intent, and systems are expected to balance relevance and diversity. The recent NTCIR-9 evaluation [&hellip;]<\/p>\n","protected":false},"featured_media":0,"template":"","meta":{"msr-url-field":"","msr-podcast-episode":"","msrModifiedDate":"","msrModifiedDateEnabled":false,"ep_exclude_from_search":false,"_classifai_error":"","msr-author-ordering":null,"msr_publishername":"Springer Netherlands","msr_publisher_other":"","msr_booktitle":"","msr_chapter":"","msr_edition":"","msr_editors":"","msr_how_published":"","msr_isbn":"","msr_issue":"4","msr_journal":"Information Retrieval","msr_number":"","msr_organization":"","msr_pages_string":"504\u2013529","msr_page_range_start":"504","msr_page_range_end":"529","msr_series":"","msr_volume":"16","msr_copyright":"","msr_conference_name":"","msr_doi":"10.1007\/s10791-012-9208-x","msr_arxiv_id":"","msr_s2_paper_id":"","msr_mag_id":"","msr_pubmed_id":"","msr_other_authors":"","msr_other_contributors":"","msr_speaker":"","msr_award":"","msr_affiliation":"","msr_institution":"","msr_host":"","msr_version":"","msr_duration":"","msr_original_fields_of_study":"","msr_release_tracker_id":"","msr_s2_match_type":"","msr_citation_count_updated":"","msr_published_date":"2013-08-01","msr_highlight_text":"","msr_notes":"","msr_longbiography":"","msr_publicationurl":"http:\/\/link.springer.com\/article\/10.1007\/s10791-012-9208-x","msr_external_url":"","msr_secondary_video_url":"","msr_conference_url":"","msr_journal_url":"","msr_s2_pdf_url":"","msr_year":0,"msr_citation_count":0,"msr_influential_citations":0,"msr_reference_count":0,"msr_s2_match_confidence":0,"msr_microsoftintellectualproperty":true,"msr_s2_open_access":false,"msr_s2_author_ids":[],"msr_pub_ids":[],"msr_hide_image_in_river":0,"footnotes":""},"msr-research-highlight":[],"research-area":[13555],"msr-publication-type":[193715],"msr-publisher":[],"msr-focus-area":[],"msr-locale":[268875],"msr-post-option":[],"msr-field-of-study":[],"msr-conference":[],"msr-journal":[],"msr-impact-theme":[],"msr-pillar":[],"class_list":["post-358571","msr-research-item","type-msr-research-item","status-publish","hentry","msr-research-area-search-information-retrieval","msr-locale-en_us"],"msr_publishername":"Springer Netherlands","msr_edition":"","msr_affiliation":"","msr_published_date":"2013-08-01","msr_host":"","msr_duration":"","msr_version":"","msr_speaker":"","msr_other_contributors":"","msr_booktitle":"","msr_pages_string":"504\u2013529","msr_chapter":"","msr_isbn":"","msr_journal":"Information Retrieval","msr_volume":"16","msr_number":"","msr_editors":"","msr_series":"","msr_issue":"4","msr_organization":"","msr_how_published":"","msr_notes":"","msr_highlight_text":"","msr_release_tracker_id":"","msr_original_fields_of_study":"","msr_download_urls":"","msr_external_url":"","msr_secondary_video_url":"","msr_longbiography":"","msr_microsoftintellectualproperty":1,"msr_main_download":"","msr_publicationurl":"http:\/\/link.springer.com\/article\/10.1007\/s10791-012-9208-x","msr_doi":"10.1007\/s10791-012-9208-x","msr_publication_uploader":[{"type":"url","title":"http:\/\/link.springer.com\/article\/10.1007\/s10791-012-9208-x","viewUrl":false,"id":false,"label_id":0},{"type":"doi","title":"10.1007\/s10791-012-9208-x","viewUrl":false,"id":false,"label_id":0}],"msr_related_uploader":"","msr_citation_count":0,"msr_citation_count_updated":"","msr_s2_paper_id":"","msr_influential_citations":0,"msr_reference_count":0,"msr_arxiv_id":"","msr_s2_author_ids":[],"msr_s2_open_access":false,"msr_s2_pdf_url":null,"msr_attachments":[{"id":0,"url":"http:\/\/link.springer.com\/article\/10.1007\/s10791-012-9208-x"}],"msr-author-ordering":[{"type":"text","value":"Tetsuya Sakai","user_id":0,"rest_url":false},{"type":"user_nicename","value":"rsong","user_id":33460,"rest_url":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/microsoft-research\/v1\/researchers?person=rsong"}],"msr_impact_theme":[],"msr_research_lab":[],"msr_event":[],"msr_group":[],"msr_project":[],"publication":[],"video":[],"msr-tool":[],"msr_publication_type":"article","related_content":[],"_links":{"self":[{"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/msr-research-item\/358571","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/msr-research-item"}],"about":[{"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/types\/msr-research-item"}],"version-history":[{"count":1,"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/msr-research-item\/358571\/revisions"}],"predecessor-version":[{"id":519856,"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/msr-research-item\/358571\/revisions\/519856"}],"wp:attachment":[{"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/media?parent=358571"}],"wp:term":[{"taxonomy":"msr-research-highlight","embeddable":true,"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/msr-research-highlight?post=358571"},{"taxonomy":"msr-research-area","embeddable":true,"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/research-area?post=358571"},{"taxonomy":"msr-publication-type","embeddable":true,"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/msr-publication-type?post=358571"},{"taxonomy":"msr-publisher","embeddable":true,"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/msr-publisher?post=358571"},{"taxonomy":"msr-focus-area","embeddable":true,"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/msr-focus-area?post=358571"},{"taxonomy":"msr-locale","embeddable":true,"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/msr-locale?post=358571"},{"taxonomy":"msr-post-option","embeddable":true,"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/msr-post-option?post=358571"},{"taxonomy":"msr-field-of-study","embeddable":true,"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/msr-field-of-study?post=358571"},{"taxonomy":"msr-conference","embeddable":true,"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/msr-conference?post=358571"},{"taxonomy":"msr-journal","embeddable":true,"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/msr-journal?post=358571"},{"taxonomy":"msr-impact-theme","embeddable":true,"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/msr-impact-theme?post=358571"},{"taxonomy":"msr-pillar","embeddable":true,"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/msr-pillar?post=358571"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}