QuickSand document malware similarity clustering

One of the most powerful features of QuickSand is the ability to identify similar malware samples. We can use both the structhash (an MD5 hash based on the structure of the document) and the struzzy fuzzy hash to cluster samples.

In this example, we will start with 500 recent samples of document malware, each with at least 10 detections on VirusTotal, and write some Python to count the number of unique similarity hashes. If two documents have the same structhash, they likely originate from the same criminal or APT group, or were generated by the same tool. We will build clusters to quickly group our 500 samples into buckets of similar samples and see what the main threats are.
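The counting step can be sketched roughly as follows. This is a minimal illustration, not the original script: it assumes each sample has already been processed and reduced to a dict with `sha256` and `structhash` keys, and the records below (other than the structhash quoted in this post) are placeholders.

```python
from collections import Counter

# Placeholder records: in practice there would be one dict per analysed
# document. The field names "sha256" and "structhash" are assumptions
# made for this sketch.
samples = [
    {"sha256": "a" * 64, "structhash": "422961367c3702a6891d7fcc727fd1ff"},
    {"sha256": "b" * 64, "structhash": "422961367c3702a6891d7fcc727fd1ff"},
    {"sha256": "c" * 64, "structhash": "0" * 32},  # placeholder hash
]


def count_structhashes(records):
    """Return a Counter mapping structhash -> number of samples."""
    return Counter(r["structhash"] for r in records)


counts = count_structhashes(samples)
print(f"{len(counts)} unique structhashes in {len(samples)} samples")

# Largest clusters first.
for structhash, n in counts.most_common(10):
    print(f"{n:4d}  {structhash}")
```

Sorting with `most_common` puts the biggest buckets at the top, which is usually where triage starts.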

Out of the 500 randomly selected malware documents, we can quickly see some clusters of up to 76 samples with an identical structure, denoted by an identical structhash.

We can also use the struzzy fuzzy hash to pull a few more near matches into the clusters. These samples might have a few minor differences where objects were renamed, removed, or added. Fuzzy matching tends to be quite useful for documents generated by exploit kits, where the core document is the same and additional exploit objects are added.
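One way to picture the fuzzy grouping is a simple greedy pass over the samples. The sketch below is only an illustration under stated assumptions: it uses Python's `difflib.SequenceMatcher` as a stand-in similarity measure rather than whatever comparison struzzy actually uses, the `struzzy` field name and the 0.8 threshold are hypothetical, and the strings are placeholders.

```python
from difflib import SequenceMatcher

# Placeholder records; the "struzzy" values are made-up strings, not
# real QuickSand output.
fuzzy_samples = [
    {"sha256": "doc1", "struzzy": "ABCDEFGHIJKLMNOP"},
    {"sha256": "doc2", "struzzy": "ABCDEFGHIJKLMNOQ"},  # one character differs
    {"sha256": "doc3", "struzzy": "zyxwvutsrqponmlk"},  # unrelated
]


def similarity(a, b):
    """Stand-in similarity score in [0, 1]; not the real struzzy comparison."""
    return SequenceMatcher(None, a, b).ratio()


def fuzzy_clusters(records, threshold=0.8):
    """Greedily attach each sample to the first cluster whose
    representative hash is similar enough, else start a new cluster."""
    clusters = []  # list of (representative_struzzy, [sha256, ...])
    for r in records:
        for representative, members in clusters:
            if similarity(representative, r["struzzy"]) >= threshold:
                members.append(r["sha256"])
                break
        else:
            clusters.append((r["struzzy"], [r["sha256"]]))
    return clusters


clusters = fuzzy_clusters(fuzzy_samples)
for representative, members in clusters:
    print(len(members), representative, members)
```

The greedy approach depends on input order and compares only against each cluster's first member, so it is best treated as a quick first pass rather than a rigorous clustering.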

Pivoting on the biggest cluster, let's dig in and see what the malware payload is. Let's do a quick loop to grab the SHA-256 hashes of the samples in our biggest cluster, 422961367c3702a6891d7fcc727fd1ff:
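A loop along these lines would do it. As before, this is a sketch that assumes each sample is a dict with `sha256` and `structhash` keys; the records are placeholders rather than the real cluster members.

```python
TARGET_STRUCTHASH = "422961367c3702a6891d7fcc727fd1ff"

# Placeholder records standing in for the 500 analysed samples.
cluster_samples = [
    {"sha256": "a" * 64, "structhash": TARGET_STRUCTHASH},
    {"sha256": "b" * 64, "structhash": "0" * 32},  # different cluster
    {"sha256": "c" * 64, "structhash": TARGET_STRUCTHASH},
]


def sha256s_for_cluster(records, structhash):
    """Return the sha256 of every sample whose structhash matches."""
    return [r["sha256"] for r in records if r["structhash"] == structhash]


cluster_hashes = sha256s_for_cluster(cluster_samples, TARGET_STRUCTHASH)
for sha256 in cluster_hashes:
    print(sha256)
```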

Grabbing a few of the samples at random, we can confirm that this cluster, with structhash 422961367c3702a6891d7fcc727fd1ff, is detected as Emotet:

We can also grab the domains from the behaviour report and see from this tweet that the muliarental domain is also confirmed as Emotet. As you identify structhashes in samples coming into your organization, you can use this information to quickly triage known threats and focus your identification effort on the less common ones.

Finally, grabbing a few more samples from this cluster showed that they were all communicating with the same Emotet infrastructure and are all likely part of the same group.