Malware is one of the key threats to online security today, with applications ranging from phishing mailers to ransomware and trojans. Due to the sheer size and variety of the malware threat, it is impractical to combat it as a whole. Instead, governments and companies have instituted teams dedicated to identifying, prioritizing, and removing specific malware families that directly affect their population or business model.
The identification and prioritization of the most disconcerting malware families (known as malware hunting) is a time-consuming activity, accounting for more than 20% of the work hours of a typical threat intelligence researcher, according to our survey. To save this precious resource and amplify the team’s impact on users’ online safety we present Spotlight, a large-scale malware lead-generation framework.
Spotlight first sifts through a large malware data set to remove known malware families, based on first and third-party threat intelligence.It then clusters the remaining malware into potentially-undiscovered families, and prioritizes them for further investigation using a score based on their potential business impact. We evaluate Spotlight on 67M malware samples, to show that it can produce top-priority clusters with over 99% purity (i.e., homogeneity), which is higher than simpler approaches and prior work.
To showcase Spotlight’s effectiveness, we apply it to ad-fraud malware hunting on real-world data. Using Spotlight’s output, threat intelligence researchers were able to quickly identify three large botnets that perform ad fraud.