Towards Building a Scholarly Big Data Platform: Challenges, Lessons and Opportunities
- Zhaohui Wu
- Jian Wu
- Madian Khabsa
- Kyle Williams
- Hung-Hsuan Chen
- Wenyi Huang
- Suppawong Tuarob
- Sagnik Ray Choudhury
- Alexander G. Ororbia
- Prasenjit Mitra
- C. Lee Giles
Proceedings of the 14th ACM/IEEE-CS Joint Conference on Digital Libraries
Published by ACM
We introduce a big data platform that provides services for harvesting scholarly information and enabling efficient scholarly applications. The core architecture of the platform is built on a secured private cloud: it crawls data with a scholarly focused crawler that leverages a dynamic scheduler, processes the data with a MapReduce-based crawl-extraction-ingestion (CEI) workflow, and stores it in distributed repositories and databases. Services such as scholarly data harvesting, information extraction, and analytics of user information and log data are integrated into the platform and provided through an OAI and RESTful API. We also introduce a set of scholarly applications built on top of this platform, including citation recommendation and collaborator discovery.
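To make the crawl-extraction-ingestion (CEI) idea concrete, the following is a minimal single-process sketch of how extraction could act as a map step and ingestion as a reduce step. It is an illustration under stated assumptions, not the platform's implementation: the sample documents, the "Key: value" text format, the title-based grouping key, and the names `extract`, `ingest`, and `run_cei` are all hypothetical, and plain Python stands in for a real MapReduce framework.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical stand-in for crawler output: URL -> raw document text.
CRAWLED: Dict[str, str] = {
    "http://example.org/a.pdf": "Title: Graph Mining\nAuthor: A. Author",
    "http://example.org/b.pdf": "Title: Graph Mining\nAuthor: A. Author",
    "http://example.org/c.pdf": "Title: Focused Crawling\nAuthor: B. Writer",
}


def extract(url: str, raw: str) -> Tuple[str, Dict[str, str]]:
    """Map step: parse 'Key: value' lines into a record keyed by lowercased title."""
    fields: Dict[str, str] = {}
    for line in raw.splitlines():
        key, sep, value = line.partition(":")
        if sep:  # skip lines that contain no colon
            fields[key.strip().lower()] = value.strip()
    fields["url"] = url
    return fields.get("title", "").lower(), fields


def ingest(records: List[Dict[str, str]]) -> Dict[str, object]:
    """Reduce step: merge records that share a key into one repository entry."""
    return {
        "title": records[0].get("title", ""),
        "authors": sorted({r["author"] for r in records if "author" in r}),
        "urls": sorted(r["url"] for r in records),
    }


def run_cei(crawled: Dict[str, str]) -> Dict[str, Dict[str, object]]:
    """Run extraction over every crawled document, group by key, then ingest."""
    groups: Dict[str, List[Dict[str, str]]] = defaultdict(list)
    for url, raw in crawled.items():
        key, record = extract(url, raw)
        groups[key].append(record)  # plays the role of the shuffle phase
    return {key: ingest(recs) for key, recs in groups.items()}


repository = run_cei(CRAWLED)
```

In this sketch, the two copies of the same paper found at different URLs collapse into a single entry that lists both URLs. A real deployment would need a far more robust extraction step and a better deduplication key than an exact title match.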