Essential Pages

  • Darko Kirovski

MSR-TR-2008-15

Results for Web search queries are ranked using heuristics that typically analyze the global link topology, user behavior, and content relevance. We point to a particular inefficiency of such methods: information redundancy. For queries whose objective is to learn about a subject, modern search engines return relatively unsatisfactory results because they consider the query coverage of each page individually rather than that of a set of pages as a whole. We address this problem using essential pages. If we denote by $\mathbb{S}_Q$ the total knowledge that exists on the Web about a given query $Q$, we want to build a search engine that returns a set of essential pages $E_Q$ that maximizes the information covered over $\mathbb{S}_Q$. In this paper, we present a preliminary prototype that optimizes the selection of essential pages; we draw some informal comparisons with existing search engines; and finally, we demonstrate our prototype in action using a blind-test user study.
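To make the set-level objective concrete, the following is a minimal sketch and not the prototype described in the report. It assumes, purely for illustration, that the knowledge about a query can be modeled as a finite set of "knowledge units" (here, plain strings) and that each page covers a subset of them. It then applies the standard greedy heuristic for maximum coverage, a swapped-in textbook technique, to pick pages by marginal gain rather than by individual score. The names `select_essential_pages`, `coverage`, and the toy `pages` data are hypothetical.

```python
from typing import Dict, List, Set


def coverage(page_units: Dict[str, Set[str]], chosen: List[str]) -> Set[str]:
    """Union of the knowledge units covered by the chosen pages."""
    covered: Set[str] = set()
    for page in chosen:
        covered |= page_units[page]
    return covered


def select_essential_pages(page_units: Dict[str, Set[str]], k: int) -> List[str]:
    """Greedy max-coverage sketch (an assumption, not the report's algorithm).

    At each step, pick the page that adds the most knowledge units not yet
    covered by the pages already selected. Ties go to the page that appears
    first in the input dictionary.
    """
    covered: Set[str] = set()
    chosen: List[str] = []
    remaining = dict(page_units)  # shallow copy so the input is not mutated
    for _ in range(k):
        if not remaining:
            break
        best = max(remaining, key=lambda p: len(remaining[p] - covered))
        if not remaining[best] - covered:
            break  # no remaining page adds anything new
        chosen.append(best)
        covered |= remaining.pop(best)
    return chosen


# Hypothetical toy data: page "b" is largely redundant with page "a".
pages = {
    "a": {"history", "syntax", "types"},
    "b": {"history", "syntax"},
    "c": {"tooling", "types"},
    "d": {"community"},
}

essential = select_essential_pages(pages, 2)
print(essential, sorted(coverage(pages, essential)))
```

On this toy input, a ranking that scores each page on its own could return "a" and "b", whose combined coverage is no larger than that of "a" alone. The set-level selection skips the redundant page in favor of one that contributes new units, which is the kind of information redundancy the abstract points to.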