The Case for Behavioral Analysis

February 28, 2017 by Mounir Hahad

In this article, we will lay out arguments, with real-life examples, in support of behavioral detection of malware as opposed to more traditional static methods of detection. For the sake of simplicity, we will limit the scope of the discussion to executable files only. Similar arguments can be made for other types of files that harbor executable scripts (like Microsoft Office documents) or that rely on vulnerabilities in the handling application to trigger malicious code execution (like Flash exploits).

Behavioral detection relies on observing the execution of a program (the sample) and inferring malicious intent from those observations. This is usually done in a contained and instrumented environment such as a sandbox, but it can also happen in real time on an actual endpoint host.

Most network security solutions do not have the ability to observe program execution on each endpoint they protect; instead, they rely on an embedded sandbox to extract breadcrumbs of execution, known as traces, while the program runs. There are several types of sandboxes, performing partial emulation, full system emulation, or true execution with binary translation. We will not discuss the differences here, since all of them end up producing a set of traces that the solution needs to analyze.

Based on the observed traces of execution, a security solution will apply any number of methods to infer malicious intent. We will discuss heuristics-based rules and machine learning methods in a separate article. The goal of most methods is to render a verdict: the sample is either safe or malicious. Some methods shy away from rendering a verdict and instead provide a probability of maliciousness, shifting the responsibility of setting a malware verdict threshold to the user.
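As a rough illustration, here is a minimal Python sketch of the heuristics-based approach: a handful of weighted rules matched against raw trace lines, with the score mapped to a verdict by a user-settable threshold. The rule strings, weights, and the 0.7 cutoff are all invented for illustration and do not reflect any vendor's actual rule set.

```python
from typing import List

# Each rule pairs a substring to look for in the traces with a weight.
# These rules and weights are invented for illustration only.
RULES = [
    ("CreateRemoteThread", 0.4),   # classic code-injection API
    ("CurrentVersion\\Run", 0.3),  # persistence via a Run registry key
    ("IsDebuggerPresent", 0.2),    # probing for an analysis environment
    ("SetWindowsHookEx", 0.3),     # often used for keystroke logging
]

def score_traces(traces: List[str]) -> float:
    """Map raw sandbox trace lines to a maliciousness probability in [0, 1]."""
    score = sum(weight for pattern, weight in RULES
                if any(pattern in line for line in traces))
    return min(score, 1.0)

def verdict(traces: List[str], threshold: float = 0.7) -> str:
    """Render a binary verdict; the threshold choice is left to the user."""
    return "malicious" if score_traces(traces) >= threshold else "safe"
```

A solution that stops at score_traces is the kind that reports only a probability and leaves the threshold decision to the user, as described above.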

In contrast, static detection methods do not simulate the execution of a sample to infer malicious intent. They rely on extracting static attributes from the sample and applying some analysis method to render a verdict. Again, the analysis can be based on heuristic rules or machine learning, just as with behavioral analysis.
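For comparison, here is a minimal sketch of static feature extraction, using only attributes that can be read from the file on disk without ever executing it. The feature set is an illustrative assumption; real engines extract hundreds of attributes.

```python
import hashlib
import math
from collections import Counter

def shannon_entropy(data: bytes) -> float:
    """Bits per byte; values near 8.0 suggest compressed or encrypted content."""
    if not data:
        return 0.0
    counts = Counter(data)
    total = len(data)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def static_features(path: str) -> dict:
    """Extract a few static attributes from a file, without executing it."""
    with open(path, "rb") as f:
        data = f.read()
    return {
        "sha256": hashlib.sha256(data).hexdigest(),
        "size": len(data),
        "entropy": shannon_entropy(data),
        "is_pe": data[:2] == b"MZ",  # Windows executables start with 'MZ'
    }
```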

To appreciate the challenges either approach must overcome to be successful, consider two examples:

The first example is armoring. Armoring is a set of techniques implemented by malware authors to thwart any attempt at automated behavioral analysis and therefore prevent the security solution from issuing a malware verdict. There is a wide range of such techniques. Some rely on timing or randomness: a sample will only execute its malicious code if the host computer has been up for some period of time, or a compromised web site will deliver malware to only one in a thousand visitors. But most malware relies on detecting an analysis environment and either does nothing (exits) or performs some harmless activity to throw the analysis off track. The quality of a behavioral detection solution depends heavily on how it counters these armoring techniques. Countering them does not necessarily mean the malware has to be duped into executing its malicious payload: the very act of probing for an analysis environment is itself a strong behavioral signal. After all, how often do you see a legitimate application try to detect the presence of a debugger and exit if it thinks one is present?
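To make this concrete, here is a simplified sketch of two common armoring checks as a Windows sample might implement them. The ctypes calls target real kernel32 functions, so this runs on Windows only; the ten-minute uptime cutoff is an invented example value.

```python
import ctypes
import sys

kernel32 = ctypes.windll.kernel32
kernel32.GetTickCount64.restype = ctypes.c_ulonglong  # milliseconds since boot

def looks_like_analysis_environment() -> bool:
    # Check 1: a debugger attached to the process is a strong tell.
    if kernel32.IsDebuggerPresent():
        return True
    # Check 2: sandboxes are often freshly booted; real hosts accumulate uptime.
    uptime_minutes = kernel32.GetTickCount64() / 1000 / 60
    return uptime_minutes < 10

if looks_like_analysis_environment():
    sys.exit(0)  # exit quietly, leaving no malicious trace to observe
# ...the malicious payload would only run past this point...
```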

The second example is packing. This technique is implemented by malware authors to avoid detection by static analysis solutions. Packing is the process of packaging an executable payload inside another executable whose only role is to install the inner payload. It gives malware operators the ability to store the malware payload in a way that makes it virtually unique and hides all static attributes of the malware. One can no longer rely on hashes or string patterns to identify the malware, since each new variant is essentially a new binary blob. That said, recent research has shown that even in these cases, some attributes remain detectable by deep learning methods when the packing does not involve encryption. Otherwise, all bets are off, and most advanced engines resort to detecting that the file is packed, not that it is malicious in nature.
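A toy sketch shows why packing defeats hash-based signatures: the same inner payload, "packed" here with nothing more than a different one-byte XOR key, yields a file with a completely different hash every time. Real packers are vastly more elaborate; only the principle is illustrated.

```python
import hashlib

def pack(payload: bytes, key: int) -> bytes:
    """'Pack' the payload by XOR-ing every byte with a one-byte key."""
    return bytes(b ^ key for b in payload)

payload = b"...the same inner executable bytes every time..."
for key in (0x41, 0x42, 0x43):
    variant = pack(payload, key)
    print(hex(key), hashlib.sha256(variant).hexdigest()[:16])
# Three distinct hashes for one unchanged behavior once unpacked in memory.
```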

Some malware relies on known packers for which AV companies have developed unpackers (on the order of a few dozen packers), but the more sophisticated threats use custom packers for which no unpackers exist.

Most malware goes after the mass market: it is a numbers game. As long as the authors can build a bigger botnet or infect a large number of endpoints with ransomware or keyloggers, they couldn't care less whether the malware is detected by a behavioral analysis solution. Since behavioral analysis tends to be more expensive to implement and operate, it is usually not present on endpoints, especially non-enterprise endpoints. This drives the vast majority of malware to employ techniques that defeat static analysis tools, while only the more sophisticated malware attempts to thwart behavioral analysis tools.


Case In Point – Real-Life Examples

Let’s analyze how static anti-virus engines and Cyphort’s behavioral detection engine fare against some relatively well-known malware families.

We will use the trojan Dynamer as an example. Dynamer has evolved from a run-of-the-mill trojan into a fairly sophisticated banking trojan. It is capable of downloading modules to update itself and perform new tasks.

Here is a table showing the progression of AV engine detections on VirusTotal, from the first time a sample of this family was uploaded to the most recent scan some time later:

| sha256 | First Seen Date | VT positives on First Seen date | VT positives 24h later |
| --- | --- | --- | --- |
| bb428124645e1da9f9ce18a1eee477c8fd0d0f0d39fed249a02b9b25bea31da6 | 2016-11-29 04:18:36 | 5 | 33 (current: 47) |
| 1755b6a732ec5099af2812a9f93e59695022af9dc71f5e95931edf361d0e5134 | 2016-12-02 14:01:27 | 7 | 28 (current: 37) |
| c0b885350b2e8f76dfeba186328219058159c7b28628d780dfde369eb70d9ffd | 2016-12-03 10:10:48 | 9 | 28 (current: 44) |
| e153d8bd7c9996e04c9d76fe167d7caf6f4d52b5ead76f2d0dbb5b25fcd991fb | 2016-12-04 11:29:43 | 4 | 26 (current: 38) |

This example illustrates the pattern of detection that emerges over several days: as a new sample of this family is discovered in the wild, AV engines struggle to detect it on the first day, then catch up 24 hours later as they develop new signatures. But the very next day a new sample is discovered, and detection for it is again low. This cycle keeps repeating day after day.
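For readers who want to reproduce this kind of measurement, here is a hedged sketch that pulls detection counts from VirusTotal's v2 file/report API. It assumes you have your own API key; the field names (response_code, positives, total, scan_date) follow the v2 documentation.

```python
import requests

VT_URL = "https://www.virustotal.com/vtapi/v2/file/report"
API_KEY = "YOUR_API_KEY"  # placeholder: a real VirusTotal key is required

def vt_positives(sha256: str) -> str:
    """Return 'positives/total as of scan_date' for a given file hash."""
    resp = requests.get(VT_URL, params={"apikey": API_KEY, "resource": sha256})
    resp.raise_for_status()
    report = resp.json()
    if report.get("response_code") != 1:
        return "not found"
    return "%d/%d as of %s" % (report["positives"], report["total"],
                               report["scan_date"])

print(vt_positives("bb428124645e1da9f9ce18a1eee477c8fd0d0f0d39fed249a02b9b25bea31da6"))
```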

On the flip side, a well-implemented behavioral analysis has no problem consistently detecting all of these variants, because the behavior of each sample when executed in a sandbox remains largely the same. Malware authors have neither the interest nor the time to vastly change the behavior of a malware family every day, but it is trivial for them to change the packing technique or encryption parameters to create new samples that evade signature detection. So as long as the sandbox is well implemented and the analysis of its traces uses a solid method, detecting variants poses no problem.
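As a final sketch, the behavioral similarity between two variants can be approximated as the Jaccard overlap of the API-call sets observed in the sandbox; a high overlap between samples with completely different hashes is exactly the signal behavioral detection exploits. The API names below are invented for illustration.

```python
def jaccard(a: set, b: set) -> float:
    """Jaccard similarity of two sets of observed API calls."""
    return len(a & b) / len(a | b) if a | b else 1.0

# Two hypothetical variants of the same family: different binaries,
# nearly identical behavior in the sandbox.
variant_1 = {"CreateFile", "WriteFile", "RegSetValue", "CreateRemoteThread"}
variant_2 = {"CreateFile", "WriteFile", "RegSetValue", "InternetOpen"}

print("behavioral overlap: %.0f%%" % (100 * jaccard(variant_1, variant_2)))  # 60%
```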