Friday, May 21, 2010

Aggregation

Aggregation is my friend. When I'm first introduced to a pile of data, be it logs, flow data, PCAP, etc., it can be overwhelming. With a client eagerly awaiting results, what is an analyst to do? Enter aggregation. Aggregating data over multiple fields helps an analyst slice through the data very quickly, get a big-picture view, and pull out events of interest to analyze further. It's also a great way to create jumping-off points (reference an earlier post).

What are some of my favorite fields to aggregate over, you may ask? In this post, I'll start with one of my favorites:

Source Port, Destination Port, Number of Bytes

Why do I find this particular aggregation so interesting? Let's go through it. Those familiar with the Internet Protocol suite (TCP/IP) know that servers typically listen on a fixed port. For example, most web servers serve web pages on port 80. For this example, we will equate server port to destination port. In other words, we will assume that we are on the inside of our network looking out (in practice, this is actually a useful vantage point to take). Clients, on the other hand, pick a source (client) port from an ephemeral range, often in an incremental fashion. The exact method of picking the source port varies by operating system, but with a large enough sample, we can assume that the distribution of source ports is roughly uniform. In practice, network traffic is so voluminous that this is a relatively safe assumption.

So, where does that leave us? Well, for starters, we can exploit the roughly uniform distribution of source (client) ports to identify cases in which the source port does not appear to have been chosen as expected. In other words, one or more source ports were "favored" for one reason or another. Typically, an automated/machine action will cause one or more source ports to be "favored", whereas a human action would not have this same effect. If we aggregate on source port, destination port, and number of bytes, what we're in effect doing is picking out instances where the same number of bytes is sent repeatedly from the same source port(s) to the same destination port. Pretty neat, eh?
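
To make the idea concrete, here is a minimal sketch of the aggregation in Python. The flow records, the helper names (aggregate, repeated), and the min_count threshold are all made up for illustration. In practice the tuples would come from whatever flow or log data you have on hand, and the threshold would need tuning to your traffic volume.

```python
from collections import Counter

# Hypothetical flow records: (source_port, destination_port, byte_count).
# Real data would come from flow logs, PCAP summaries, etc.
flows = [
    (49152, 80, 1500),
    (49153, 80, 312),
    (50001, 443, 2048),
    (4444, 8080, 64),
    (4444, 8080, 64),
    (4444, 8080, 64),
    (51022, 80, 980),
]


def aggregate(flows):
    """Count how often each (source port, destination port, bytes) tuple occurs."""
    return Counter(flows)


def repeated(flows, min_count=3):
    """Return tuples seen at least min_count times, most frequent first.

    min_count is an arbitrary, assumed threshold for this sketch.
    """
    return [(key, n) for key, n in aggregate(flows).most_common() if n >= min_count]


for (sport, dport, nbytes), n in repeated(flows):
    print(f"src={sport} dst={dport} bytes={nbytes} count={n}")
```

Note that timestamps never enter the key, so a tuple repeated once a day counts the same as one repeated once a second.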

As an added benefit, this aggregation is time agnostic. That means that it can catch the low and slow attacks just as well as it can catch the blatantly obvious. Love it.
