{"id":184068,"date":"2005-07-13T00:00:00","date_gmt":"2009-10-31T13:20:46","guid":{"rendered":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/msr-research-item\/xml-full-text-search-and-scoring\/"},"modified":"2018-07-19T11:01:10","modified_gmt":"2018-07-19T18:01:10","slug":"xml-full-text-search-and-scoring","status":"publish","type":"msr-video","link":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/video\/xml-full-text-search-and-scoring\/","title":{"rendered":"XML Full-Text Search and Scoring"},"content":{"rendered":"<div class=\"asset-content\">\n<p>One of the key benefits of XML is its ability to represent a mix of structured and text data. Querying XML is a well-explored topic with powerful database-style query languages such as XPath\/XQuery set to become W3C standards. However, these languages are not powerful enough to express full-text search queries. For this reason, we developed TeXQuery, a full-text extension to XPath\/XQuery that provides a rich set of fully composable full-text search primitives, such as keyword and Boolean search, proximity distance, stemming, and regular expressions, and gracefully combines them with structured search in XPath\/XQuery. TeXQuery is the precursor of XQuery Full-Text, the current full-text extension to XPath 2.0 and XQuery 1.0 that is being developed by the W3C. TeXQuery also supports a flexible scoring construct that allows users to express queries such as &#8220;return the top 20 elements that are ranked by their relevance to some structural conditions and that contain 3 occurrences of some keywords within some distance of each other&#8221;. I will present a family of scoring methods for XML that are inspired by tf*idf and that take both content and structure into account when scoring answers to XML queries.<\/p>\n<\/div>\n<p><!-- .asset-content --><\/p>\n","protected":false},"excerpt":{"rendered":"<p>One of the key benefits of XML is its ability to represent a mix of structured and text data. 
Querying XML is a well-explored topic with powerful database-style query languages such as XPath\/XQuery set to become W3C standards. However, these languages are not powerful enough to express full-text search queries. For this reason, we developed [&hellip;]<\/p>\n","protected":false},"featured_media":195306,"template":"","meta":{"msr-url-field":"","msr-podcast-episode":"","msrModifiedDate":"","msrModifiedDateEnabled":false,"ep_exclude_from_search":false,"_classifai_error":"","msr_hide_image_in_river":0,"footnotes":""},"research-area":[13563],"msr-video-type":[],"msr-locale":[268875],"msr-post-option":[],"msr-session-type":[],"msr-impact-theme":[],"msr-pillar":[],"msr-episode":[],"msr-research-theme":[],"class_list":["post-184068","msr-video","type-msr-video","status-publish","has-post-thumbnail","hentry","msr-research-area-data-platform-analytics","msr-locale-en_us"],"msr_download_urls":"","msr_external_url":"https:\/\/youtu.be\/aSTcLW4_vlA","msr_secondary_video_url":"","msr_video_file":"","_links":{"self":[{"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/msr-video\/184068","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/msr-video"}],"about":[{"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/types\/msr-video"}],"version-history":[{"count":1,"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/msr-video\/184068\/revisions"}],"predecessor-version":[{"id":496229,"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/msr-video\/184068\/revisions\/496229"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/media\/195306"}],"wp:attachment":[{"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/media?parent=184068"}],"wp:term":[{"taxonomy":"msr-research-area","embeddable":true,"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/r
esearch\/wp-json\/wp\/v2\/research-area?post=184068"},{"taxonomy":"msr-video-type","embeddable":true,"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/msr-video-type?post=184068"},{"taxonomy":"msr-locale","embeddable":true,"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/msr-locale?post=184068"},{"taxonomy":"msr-post-option","embeddable":true,"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/msr-post-option?post=184068"},{"taxonomy":"msr-session-type","embeddable":true,"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/msr-session-type?post=184068"},{"taxonomy":"msr-impact-theme","embeddable":true,"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/msr-impact-theme?post=184068"},{"taxonomy":"msr-pillar","embeddable":true,"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/msr-pillar?post=184068"},{"taxonomy":"msr-episode","embeddable":true,"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/msr-episode?post=184068"},{"taxonomy":"msr-research-theme","embeddable":true,"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/msr-research-theme?post=184068"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}