Monday, July 2, 2012
You Can't Teach Analytical Skills
From time to time, I get asked to teach people how to be analysts. What I've found over the years is that there are those who are naturally analytical and can become solid, experienced analysts. There are also those who are not naturally analytical. Teach someone how to use a tool or tools to ask incisive questions of the data? Sure. Teach someone what makes one network traffic sample legitimate and another network traffic sample malicious? Sure. Teach someone how an attack pattern/intrusion vector works? Sure. Teach someone how to be an analyst? Nope. Can't be done. They either have analytical skills or they don't. My job is to lay the foundation and share my experience. If a person is analytically inclined, he or she will take off running. If not? Then, unfortunately, no amount of training will be able to make an analyst of the person.
End of an Era?
For many years, domain-based intelligence (e.g., lists of known malicious domain names) provided actionable information that could be leveraged to identify infected systems on enterprise networks. In its day, domain-based intelligence represented a considerable step forward over IP-based intelligence, which had proven quite prone to false positives. Of late, however, domain-based intelligence has itself fallen victim to a high rate of false positives. There are a number of reasons for this, but chief among them is the fact that attackers have moved from using entirely malicious domains to compromising small corners of legitimate domains. Because of this, URL patterns (e.g., a POST request for /res.php) have proven to be far more effective at identifying infected systems. To be sure, some entirely malicious domains are still in use. These domains are often randomly generated via algorithms that change daily, hourly, or even more frequently. Quite simply put, the domains change faster than the intelligence lists can share them out. Could it be that we've reached the end of an era vis-à-vis domain-based intelligence? Has the era of URL pattern based intelligence begun? I know that I am leveraging URL patterns heavily, and I know that I am not alone in that.
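As a rough illustration of why URL-pattern matching keeps working when domain reputation fails, here is a minimal sketch in Python. The POST to /res.php pattern comes from the example above; the log format, the second pattern, and all domain names are hypothetical:

```python
import re

# Hypothetical URL patterns associated with malicious traffic. The POST to
# /res.php example is from the post; the second pattern is an illustrative
# placeholder, not a real indicator.
URL_PATTERNS = [
    re.compile(r"^POST\s+\S*/res\.php$"),
    re.compile(r"^GET\s+\S*/gate\.php\?\S*$"),
]

def matches_malicious_pattern(method: str, url_path: str) -> bool:
    """Return True if the request matches a known-malicious URL pattern,
    regardless of which domain (legitimate or not) hosts it."""
    request = f"{method} {url_path}"
    return any(p.match(request) for p in URL_PATTERNS)

# Simplified proxy-log entries: (domain, method, path). Note that the hit
# is on a legitimate-looking domain -- the compromised-corner case that
# defeats pure domain-based intelligence.
log_entries = [
    ("cdn.example-legit.com", "POST", "/res.php"),
    ("www.example-legit.com", "GET", "/index.html"),
]

hits = [e for e in log_entries if matches_malicious_pattern(e[1], e[2])]
```

Because the match keys on the request itself rather than the hostname, the same indicator fires whether the attacker uses a throwaway DGA domain or a compromised corner of a legitimate site.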
Wednesday, June 13, 2012
Humility
In my experience, the most successful organizations are those that are humble. Successful organizations are smart enough to know what they don't know and are also humble enough to consider that others may know better than they do.
I can think of two recent examples of this:
1) I was recently on an email thread where some individuals from the federal government were discussing the federal government's analytical capabilities. After much chest beating and boastfulness, someone wondered aloud whether the private sector might also have some interesting and unique analytical capabilities. I've worked in both the federal sector and the private sector, and that statement may very well take the prize for understatement of the year. If I were still in the federal sector, I would assume that others were better analytically until I found out otherwise, and moreover, I would try to learn from them. The attitude in the government appears to be the opposite and, in my experience, is unjustified. It's a shame, really.
2) I recently witnessed a vendor pitch gone horribly wrong. Although the vendor was explicitly told several times what the customer was looking for, the vendor chose to decide for itself what it wanted to sell the customer. The result was downright embarrassing and painful to watch. A catastrophic miscalculation? Sure. But a little humility and willingness to actually listen to the customer could have gone a long way towards avoiding what turned out to be a dead end and waste of everyone's time.
A little humility can go a long way.
Friday, June 1, 2012
The Right Pivot is Everything
Every incisive query into network traffic data needs to be anchored, or keyed, on some field. This is, essentially, the pivot field. Pivoting on the right field is crucial -- I've seen inexperienced analysts spend days mired in data that is off-topic and non-convergent. In some cases, simply changing the pivot vantage point produces an answer and convergence in a matter of minutes.
For example, consider the simple case of a malicious binary download detected in proxy logs. If we want to understand what else the client endpoint was doing around the time of the download, we would pivot on source IP address and search a variety of different data sources keyed on source IP address during the given time period. If we want to quickly assess who else may have also downloaded the malicious binary, we would pivot on domain and/or URL.
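The two pivots described above can be sketched as follows; the record layout, field names, and values are illustrative assumptions, not any particular product's schema:

```python
# Simplified in-memory proxy-log records. In practice these would come from
# a log store or SIEM query; the field names here are assumptions.
logs = [
    {"ts": 100, "src_ip": "10.0.0.5", "domain": "bad.example.net", "url": "/payload.exe"},
    {"ts": 105, "src_ip": "10.0.0.5", "domain": "cdn.example.com", "url": "/beacon"},
    {"ts": 110, "src_ip": "10.0.0.9", "domain": "bad.example.net", "url": "/payload.exe"},
]

def pivot_on_source_ip(logs, src_ip, start, end):
    """What else was this endpoint doing around the time of the download?"""
    return [r for r in logs if r["src_ip"] == src_ip and start <= r["ts"] <= end]

def pivot_on_url(logs, domain, url):
    """Who else downloaded the same malicious binary?"""
    return sorted({r["src_ip"] for r in logs
                   if r["domain"] == domain and r["url"] == url})

# Pivot 1: all activity from the infected endpoint in the time window.
activity = pivot_on_source_ip(logs, "10.0.0.5", 95, 115)

# Pivot 2: every endpoint that fetched the same binary.
victims = pivot_on_url(logs, "bad.example.net", "/payload.exe")
```

Same data, two very different questions -- the only thing that changed is the field the query is keyed on.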
Naturally, these are simple pivots, but the point is a good one. Take care to use the right pivot. Otherwise, the results may be confusing, divergent, and inconclusive.
Peer Inside The Tangled Web
Most enterprises do a reasonably good job these days of monitoring their edges, or at the very least logging the network transactions traversing those edges. What's interesting, though, is that most enterprises have almost no visibility into (or interest in, for that matter) what's happening in the interior of their networks. I have found great value in peering inside the tangled web that is an enterprise network.
There are several data sources that can prove extremely valuable for monitoring the interior of a network:
- Interior firewall logs
- DNS logs
- Network flow data (netflow)
First, let's consider the example where an enterprise monitors proxy logs and DPI (Deep Packet Inspection) at the edge. Let's say that a client endpoint was redirected via a drive-by attack (e.g., the Blackhole Exploit Kit), downloaded a malicious executable, and was subsequently infected by it. Further, let's say that the malicious executable is not proxy aware. If we miss the executable download (which happens somewhat regularly, even in an enterprise that is monitored 24x7), then our proxy and DPI will likely be of little help to us in detecting the artifacts of intrusion, for two main reasons:
- The malicious code is not proxy aware, and thus its callback/C&C (command and control) attempts will most likely be blocked by an interior firewall.
- The infected system will likely attempt domain name lookups for callback/C&C domain names. Even if these domain name requests resolve (they don't always resolve, e.g., when the domain name has been taken down), there will be no subsequent HTTP request (remember, it was blocked by an interior firewall). Because of this, there will be no detectable footprint in the proxy log. In the DPI data, the DNS request will be co-mingled with the millions of other DNS requests and will show as coming from an enterprise DNS server. This makes detection and endpoint identification nearly impossible.
- Interior firewall logs will allow us to detect attempts to connect to callback/C&C sites that have been blocked/denied.
- DNS logs will allow us to identify endpoints requesting suspicious/malicious domain names.
- Netflow data will allow us to very quickly identify other systems that may be exhibiting the same suspicious/malicious behavior.
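As a rough sketch of how these three interior data sources can be correlated, consider the following Python fragment; all record formats, addresses, and domain names are illustrative assumptions:

```python
# Hypothetical watchlist of callback/C&C domain names.
suspicious_domains = {"cnc.badsite.example"}

# Simplified interior data sources; field names are assumptions.
dns_logs = [
    {"src_ip": "10.0.0.5", "qname": "cnc.badsite.example"},
    {"src_ip": "10.0.0.7", "qname": "www.example.com"},
]
firewall_denies = [
    {"src_ip": "10.0.0.5", "dst_ip": "203.0.113.50", "dst_port": 443},
]
netflow = [
    {"src_ip": "10.0.0.5", "dst_ip": "203.0.113.50", "dst_port": 443},
    {"src_ip": "10.0.0.9", "dst_ip": "203.0.113.50", "dst_port": 443},
]

# DNS logs: endpoints requesting suspicious domain names.
dns_hits = {r["src_ip"] for r in dns_logs if r["qname"] in suspicious_domains}

# Interior firewall logs: endpoints whose outbound attempts were denied.
denied = {r["src_ip"] for r in firewall_denies}

# High-confidence infections: a suspicious lookup AND a blocked callback.
infected = dns_hits & denied

# Netflow: other endpoints exhibiting the same behavior (same destination),
# even if they never appeared in the DNS or firewall evidence.
bad_dsts = {(r["dst_ip"], r["dst_port"]) for r in firewall_denies}
also_suspect = {r["src_ip"] for r in netflow
                if (r["dst_ip"], r["dst_port"]) in bad_dsts} - infected
```

The point is the workflow, not the code: each interior source answers a different question, and intersecting them turns three noisy logs into a short, actionable list of endpoints.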
Hopefully this hard earned advice is helpful to all.
Wednesday, May 2, 2012
Best Practices
Around the information security community, you'll often hear people talking about "implementing best practices". While best practices provide helpful guidance at a high level, I'd argue they can't be "implemented". Implementing something involves wrestling with the realities, nuances, and imperfections of a real enterprise network. This pain is felt particularly acutely when developing reliable and actionable jumping off points for analysis. As I'm sure you're aware, the number of false positives caused by blindly implementing "best practices" is enormous and is enough to stifle any incident response workflow. What I've found over the course of my career is that the best and most reliable jumping off points are created through sweat equity that comes from an iterative cycle of intuition, data-driven refinement, and automation. No manual, white paper, or vendor can sell you the secret sauce.
So how does one get there? By using analysis (of real data) to navigate the difficult path from conception to implementation. Know your network.
Monday, April 16, 2012
Actions Speak Louder Than Words
We all know the popular phrase, "actions speak louder than words". What some of us may not realize is just how wise and true that statement really is. In my experience, I've seen people express words of respect for others, but I rarely witness people demonstrating respect for others through their actions. The difference is important, and it is one that people notice. Here's to action.