Understanding and Defending Against Malicious Crowdsourcing
Transcription
Understanding and Defending Against Malicious Crowdsourcing
Understanding and Defending Against Malicious Crowdsourcing SANDLab http://sandlab.cs.ucsb.edu [email protected] Ben Y. Zhao (PI), Haitao Zheng (CoPI), Gang Wang (PhD student), University of California at Santa Barbara 1. Malicious Crowdsourcing New Threat: Malicious Crowdsourcing = Crowdturfing + Hire a large group of real Internet users for malicious attacks + Fake reviews, rumors, targerted spam + Most existing defenses failed against real users (e.g., CAPTCHA) 2. Understanding Crowdturfing Key Players Target Networks Crowd-workers + Customers: pay to run a campaign + Workers: real users, spam for $ + Target Networks: social networks, revew sites Scale and Revenue + How does crowdturfing work? [1] + What’s the scale, economics and impact of crowturfing campaigns? [2] + How to defend against crowdturfing? [1] Crowdturfing around the World 3. Defense: Machine Learning Classifiers Machine Learning (ML) vs. Crowdturfing + Simple method does not work on real users (e.g., CAPTCHA, rate limit) + Machine learning: more sophiscaed modeling on user behaviors + Perfect context to study adversarial machine learning - Human workers are adaptive to evade classifiers - Crowdturf admins can temper with training data by chaning worker behaviors How Effective is ML-based Detecor? + Groundtruth: 28K workers in crowdturfing campaigns on Weibo (Chinenes Twitter) + Baseline users: 371K Weibo user accounts + 30 behavioral features + Classiiers: Random Forest, Decision Tree, SVM, Naive Bayes, Bayesian Network False Positive Rate False Negative Rate + Random Forest is the most accurate (95% accuracy) + 99% accuracy on professional workers (>100 tasks) + How robust are those classifiers? ZBJ, SDH 1000 100 10000 $ ZBJ 1000 SDH 10 Campaigns 1 Jan. 2008 $ 100 10 Campaigns Jan. 2009 Jan. 2010 1 Jan. 2011 Fiverr, Freelancer, MinuteWorkers, Myeasytasks, Microworkers, Shorttasks Adversarial Machine Learning + Evasion attack: individual workers change behaviors to evade the detection - Impact: single feature-change saves 95% of workers + Poisoning attack: site admins tamper with training data to mislead classifier training Dollars per Month Research Questions 1000000 100000 Paisalive Detection Model Training Training e.g. SVM Classifier Training Data Poisning Attack Evasion Attack Example: Poisoning Attack + Inject mislabeled samples to training data wrong classifier e.g., inject benign accounts as “workers” in training data + Uniformly change workers behavior by enforcing task policies hard to train an accurate classifier Summary More accurate classifiers can be more vulnerable 20 False Positive Rate + Web services that recruit Internet users as workers (spam for $) + Connect workers to customers who want to run malicious campaigns + Measurements of two largest crowdturfing sites (in China) - ZBJ (zhubajie.com), five years - SDH (sandaha.com), two yeras + 18.5M tasks, 79K campaigns, 180K workers + Millions dollars of revenue per month Site Growth Over Time 10000 Campaigns per Month Crowdturfing Sites 60% 50% 40% 30% 20% 10% 0% Customer Crowdturfing Site 15 Tree 10 RF SVMp 5 0 SVMr 0 0.2 0.4 0.6 0.8 1 Ratio of Injected Sample to Turfing + Machine learning classifiers are effective against current crowd-workers + Classifiers are highly vulnerable to adversarial attacks. Future works will focus on improving the robustiness of ML-classifiers [1] G. WANG, T. WANG, H. ZHENG, B. ZHAO. Man vs. machine: practical adversarial detection of malicious crowdsourcing workers. In Proc. of Usenix Security (2014) RF Tree SVMr SVMp NB BN [2] G. WANG, C. WILSON, X. ZHAO, Y. ZHU, M. MOHANLAL, H. ZHENG, B. ZHAO. Serf and turf: crowdturfing for fun and profit. In Proc. of WWW (2012)