Spotlight: Malware Lead Generation At Scale

Fabian Kaczmarczyck; Bernhard Grill; Luca Invernizzi; Jennifer Pullman; Cecilia M. Procopiuc; David Tao; Borbala Benko; Elie Bursztein

Malware is one of the key threats to online security today, with applications ranging from phishing mailers to ransomware and trojans. Due to the sheer size and variety of the malware threat, it is impractical to combat it as a whole. Instead, governments and companies have instituted teams dedicated to identifying, prioritizing, and removing specific malware families that directly affect their population or business model.

The identification and prioritization of the most concerning malware families (known as malware hunting) is a time-consuming activity, accounting for more than 20% of the work hours of a typical threat intelligence researcher, according to our survey. To save this precious resource and amplify the team’s impact on users’ online safety, we present Spotlight, a large-scale malware lead-generation framework.

Spotlight first sifts through a large malware data set to remove known malware families, based on first- and third-party threat intelligence. It then clusters the remaining malware into potentially undiscovered families and prioritizes them for further investigation using a score based on their potential business impact. We evaluate Spotlight on 67M malware samples to show that it can produce top-priority clusters with over 99% purity (i.e., homogeneity), which is higher than simpler approaches and prior work.
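The three stages described above (filter known families, cluster the rest, rank by impact) and the purity metric can be illustrated with a minimal Python sketch. This is not Spotlight's implementation: the `sha256` field, the `cluster_key` function (a stand-in for a real clustering algorithm), and the `impact` function are hypothetical names chosen for illustration. Purity is computed here in the standard way, as the fraction of samples that carry the majority label of their cluster.

```python
from collections import Counter, defaultdict


def generate_leads(samples, known_hashes, cluster_key, impact):
    """Toy lead-generation pipeline (illustrative assumption, not Spotlight itself).

    samples:      list of dicts, each with a hypothetical "sha256" field
    known_hashes: set of hashes already attributed to known families
    cluster_key:  hypothetical function mapping a sample to a cluster id
    impact:       hypothetical function mapping a sample to a numeric score
    """
    # Stage 1: drop samples that threat intelligence already explains.
    unknown = [s for s in samples if s["sha256"] not in known_hashes]

    # Stage 2: group the remainder; a real system would run a clustering
    # algorithm here instead of grouping by a precomputed key.
    clusters = defaultdict(list)
    for sample in unknown:
        clusters[cluster_key(sample)].append(sample)

    # Stage 3: rank clusters by total impact, highest first.
    return sorted(
        clusters.values(),
        key=lambda members: sum(impact(s) for s in members),
        reverse=True,
    )


def cluster_purity(cluster_ids, family_labels):
    """Fraction of samples whose label is the majority label of their cluster."""
    if len(cluster_ids) != len(family_labels):
        raise ValueError("cluster_ids and family_labels must have equal length")
    if not cluster_ids:
        raise ValueError("purity is undefined for an empty data set")

    # Count how often each ground-truth label occurs inside each cluster.
    label_counts = defaultdict(Counter)
    for cid, label in zip(cluster_ids, family_labels):
        label_counts[cid][label] += 1

    # Sum the size of the majority label in every cluster.
    majority_total = sum(max(c.values()) for c in label_counts.values())
    return majority_total / len(cluster_ids)
```

For example, `cluster_purity([0, 0, 0, 1, 1], ["a", "a", "b", "c", "c"])` evaluates to 0.8, because four of the five samples carry the majority label of their cluster.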

To showcase Spotlight’s effectiveness, we apply it to ad-fraud malware hunting on real-world data. Using Spotlight’s output, threat intelligence researchers were able to quickly identify three large botnets that perform ad fraud.

Conference	Proceedings of Annual Computer Security Applications Conference (ACSAC) - 2020
Authors	Fabian Kaczmarczyck, Bernhard Grill, Luca Invernizzi, Jennifer Pullman, Cecilia M. Procopiuc, David Tao, Borbala Benko, Elie Bursztein
Citation	BibTeX:
@inproceedings{NANSPOTLIGHT:,
  title = {Spotlight: Malware Lead Generation At Scale},
  author = {Fabian Kaczmarczyck and Bernhard Grill and Luca Invernizzi and Jennifer Pullman and Cecilia M. Procopiuc and David Tao and Borbala Benko and Elie Bursztein},
  booktitle = {Proceedings of Annual Computer Security Applications Conference},
  year = {2020},
  organization = {ACM}
}
