On the Practicality of Integrity Attacks on Document-Level Sentiment Analysis

  • Andrew Newell
  • Rahul Potharaju
  • Luojie Xiang
  • Cristina Nita-Rotaru

AISec '14 Proceedings of the 2014 Workshop on Artificial Intelligent and Security Workshop

Sentiment analysis plays an important role in the way companies, organizations, and political campaigns are run, making it an attractive target for attacks. In integrity attacks, an attacker influences the data used to train the sentiment analysis classification model in order to decrease its accuracy. Previous work did not consider the practical constraints dictated by the characteristics of data generated by a sentiment analysis application, and relied instead on synthetic or pre-processed datasets inspired by spam, intrusion detection, or handwritten digit recognition. We identify and demonstrate integrity attacks against document-level sentiment analysis that take such practical constraints into account. Our attacks, while inspired by existing work, require novel improvements to function in a realistic environment where a victim performs typical steps such as data cleaning, labeling, and feature extraction prior to training the classification model. We demonstrate the effectiveness of the attacks on three datasets: two Twitter datasets and an Android dataset.