Pictor: an Interactive System for Importing Data from a Website

  • Shuyi Zheng,
  • Matthew R. Scott,
  • Ruihua Song,
  • Ji-Rong Wen

KDD '08: Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining

Published by ACM Press

We present a demonstration of Pictor, an interactive wrapper induction system that minimizes labeling cost while still extracting data from a website with high accuracy. The demonstration introduces two proposed technologies: record-level wrappers and a wrapper-assisted labeling strategy. Together, these allow Pictor to exploit previously generated wrappers to predict similar labels in a partially labeled webpage or in a completely new webpage. Our experimental results show the effectiveness of the Pictor system.