What is CASM?
CASM (Create Assembly Segmentation Matches) is a tool created by Internet 2.0 to detect code reuse. It was originally designed for use with ransomware samples. CASM lets malware analysts and reverse engineers determine the underlying similarities between files, using the assembly code within each file as the key reference point. To do this, CASM combines several similarity measures:
Jaccard similarity: a measure of the overlap between two sets, computed as the size of their intersection divided by the size of their union.
Levenshtein distance: a string metric that counts the minimum number of single-item insertions, deletions, or substitutions needed to turn one sequence into the other.
Ssdeep string matching: ssdeep is a fuzzy hashing tool used to find similarities between files; we use it to compare strings.
Sequence match ratio comparisons: a ratio that scores how closely two sequences match.
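To make three of these measures concrete, here is a minimal sketch in Python using only the standard library. It is not CASM's implementation and does not reproduce CASM's scoring formula. The function names and the sample instruction lists are hypothetical, chosen only for illustration, and ssdeep is left out because it requires a third-party library.

```python
from difflib import SequenceMatcher


def jaccard_similarity(a, b):
    """Jaccard similarity of two token collections: |A & B| / |A | B|."""
    set_a, set_b = set(a), set(b)
    if not set_a and not set_b:
        return 1.0  # two empty inputs are treated as identical
    return len(set_a & set_b) / len(set_a | set_b)


def levenshtein_distance(a, b):
    """Minimum number of single-item insertions, deletions, or substitutions."""
    previous = list(range(len(b) + 1))
    for i, item_a in enumerate(a, start=1):
        current = [i]
        for j, item_b in enumerate(b, start=1):
            cost = 0 if item_a == item_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def sequence_match_ratio(a, b):
    """difflib's ratio: 2 * matched items / total items in both sequences."""
    return SequenceMatcher(None, a, b).ratio()


# Hypothetical instruction listings that differ in a single line.
first = ["push ebp", "mov ebp, esp", "sub esp, 8", "call 0x401000", "ret"]
second = ["push ebp", "mov ebp, esp", "sub esp, 16", "call 0x401000", "ret"]

jaccard = jaccard_similarity(first, second)
distance = levenshtein_distance(first, second)
ratio = sequence_match_ratio(first, second)
print(f"jaccard={jaccard:.3f} levenshtein={distance} ratio={ratio:.3f}")
```

Here each instruction is treated as one token, so a single changed instruction counts as one edit. Comparing raw character strings instead would be equally valid and would give different numbers.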
The exact formula used in order to determine the similarity can be summed up as the following equation:
What is code reuse?
Code reuse can be summed up as “utilizing existing software components to construct new software solutions.” For example, envision a scenario where three distinct tools are each designed to achieve a specific task. Now imagine combining these three tools into a single tool that accomplishes the same task more efficiently. That is code reuse.
In the realm of malware analysis, the presence of code reuse within a sample can indicate either the usage of code from another sample or the work of the same developer reconfiguring their tools to accomplish a different task. One approach to identifying code reuse involves disassembling the file to its behavioral (assembly) code and comparing it with another sample. The following example shows this process:
Although it may look daunting at first, the process is straightforward. The key aspect to focus on is the relative amount of red and green. Green marks similarity and red marks dissimilarity, so the predominance of green shows how closely these two code strings resemble each other. The same principle applies when conducting code reuse analysis: disassemble the binaries, consolidate the assembly code, check the code for differences, and calculate a similarity score. This approach identifies both the commonalities and the variations between samples and reduces them to a single measure of similarity.
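The compare-and-score steps can be sketched with Python's standard difflib module. This is an illustration under stated assumptions, not how CASM itself works: the two listings are hypothetical, already-disassembled instruction sequences, and the helper names diff_listings and similarity_percent are invented for this example. Lines tagged "=" correspond to green, and lines tagged "-" or "+" correspond to red.

```python
from difflib import SequenceMatcher


def diff_listings(first, second):
    """Return (tag, line) pairs: '=' for shared lines, '-'/'+' for differing ones."""
    matcher = SequenceMatcher(None, first, second, autojunk=False)
    rows = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            rows.extend(("=", line) for line in first[i1:i2])
        else:
            # 'replace', 'delete' and 'insert' all mark a difference.
            rows.extend(("-", line) for line in first[i1:i2])
            rows.extend(("+", line) for line in second[j1:j2])
    return rows


def similarity_percent(first, second):
    """Share of matching lines across both listings, as a percentage."""
    return 100.0 * SequenceMatcher(None, first, second, autojunk=False).ratio()


# Hypothetical disassembly output for two samples.
listing_a = ["xor eax, eax", "mov ecx, 10", "inc eax", "loop 0x10", "ret"]
listing_b = ["xor eax, eax", "mov ecx, 20", "inc eax", "loop 0x10", "ret"]

rows = diff_listings(listing_a, listing_b)
score = similarity_percent(listing_a, listing_b)

for tag, line in rows:
    print(tag, line)
print(f"similarity: {score:.1f}%")
```

A line-level diff like this is sensitive to small changes such as a different immediate value or register, which is one reason a tool might combine several measures rather than rely on a single ratio.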
What makes CASM different?
CASM differs from other code reuse analysis tools in that it focuses solely on the disassembled assembly code of the file. Unlike other tools, CASM does not take into account factors such as code signing certificates or functionality when determining similarity. Its objective is to compare the assembly code through intelligent grouping and to apply the similarity measures listed earlier to generate an accurate similarity score. By relying on mathematically sound algorithms, CASM offers a different methodology for uncovering instances of code reuse within files.
CASM in action
Comparison #1:
Filenames:
KBDTH0.DLL
KBDTH1.DLL
Hashes:
61ac624aae7f02ba1ac57e9825167f4c11b21ebec206c72ece1ef215634d6174
83030d88d4ab7b70d6e716f95b88058d401a19b973428cd476d6086046f80358
VT links:
Malcore Link:
The first files we will analyze are Windows System32 files which, judging by their names, are obviously very similar.
The level of similarity between these two files may not come as a surprise, given their similar naming schemes and the fact that they are both Windows files. The more interesting cases are files that are not expected to be associated with one another.
Comparison #2:
Filenames:
AgentTesla
AvosLocker
Hashes:
cd2c6e74c9698f2069a0f2c76a88f9247ee537312196a3377b06db5c3f272596
01792043e07a0db52664c5878b253531b293754dc6fd6a8426899c1a66ddd61f
VT links:
Malcore link: https://malcore.io/share/637b72e6d2815d7e836350da/64947812f5f66e1327033fca
The degree of similarity between these files is astonishing. Despite their minimal differences, VT (VirusTotal) categorizes the two files as different entities, and one of them barely registers any detections. This contrast makes the relationship between them all the more surprising.
Comparison #3:
Filenames:
Babuk
BlueSky
Hashes:
79456569b6aba9d00e641ce0067a0b18e4fe69232d6c356201d1ab62ebfe4c8f
baaa0e49398ef681ef71e84a7a86dd2b78f36ac83785e0d9a061067ebaf8b006
VT links:
Malcore link:
The consensus within the cybersecurity community suggests that BlueSky and Conti either belong to the same group or that a new group has adopted the same ransomware. What often receives less attention is the remarkable similarity between BlueSky and Babuk. The comparison shows that BlueSky and Babuk share significant commonalities. This raises an intriguing question: could Conti and Babuk also share substantial similarities, perhaps indicating that they stem from the same developers? Exploring these potential connections sheds light on the relationships within ransomware development and underscores the need for further analysis and investigation.
Conclusion
CASM is a valuable asset for code reuse analysis, surfacing previously undiscovered commonalities between samples and shedding new light on the cyber security landscape. It gives analysts a fresh approach to conducting code reuse analysis quickly and effectively through Malcore. By signing up for Malcore, users can use the full capabilities of CASM today at no cost.