Thursday, March 31, 2011
80-20 Rule
Some emerging technologies are coming onto the scene to get the uber analyst closer to that last 20%. At the same time, the broader cyber security community is awakening to the first 80%: awareness of the need for network monitoring is rising. We live in interesting times for sure, and I'm excited to watch the evolution.
Tuesday, March 29, 2011
Sharing
My intent is to continue to share knowledge and techniques with the larger community. All indications are that the community is extremely interested in the topic.
Monday, March 21, 2011
Collection and Analysis
We mustn't forget the other half of the equation: analysis. I think if we as a community thought more about what we wanted to do analytically before we instrumented our networks, we would save ourselves a lot of pain down the line. Perhaps we will improve in this area in the coming years.
Wednesday, March 16, 2011
The Future
I issued the students this challenge: "Seek out, identify, and study the unknown unknowns and turn them into known knowns" (see my earlier post on known knowns). I believe this is the boiled-down essence of our obligation as network monitoring professionals/analysts.
The future holds great potential for our field. I am realizing that the onus is on those of us currently in it to capture the interest and energy of the brightest minds. Network monitoring and the broader cyber security field face many challenges, and to conquer them, we will need the best and the brightest.
Sunday, March 13, 2011
Jumping Off Points Revisited
I have previously blogged about the concept of jumping off points -- identifying subsets of data in a workflow that an analyst can run with. I often speak about this topic as well: raw data can be overwhelming, and presenting it to an analyst along with a way forward can greatly improve productivity.
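To make the idea concrete, here is a minimal sketch (in Python) of one way to build a jumping off point from raw data. The flow-record fields ("dst_ip", "bytes") and the ranking by volume are illustrative assumptions on my part, not any particular tool's schema; the point is simply that the analyst starts from a short, ranked list instead of the raw records.

    from collections import defaultdict

    def jumping_off_points(flows, top_n=10):
        # Collapse raw flow records into a short, ranked list an
        # analyst can pivot from, rather than facing all of the
        # raw data at once.
        bytes_per_dst = defaultdict(int)
        for flow in flows:
            bytes_per_dst[flow["dst_ip"]] += flow["bytes"]
        # The top of this list is the analyst's jumping off point
        # back into the underlying records.
        ranked = sorted(bytes_per_dst.items(), key=lambda kv: kv[1],
                        reverse=True)
        return ranked[:top_n]

    # Example: three raw flows collapse into one obvious place to start.
    flows = [
        {"dst_ip": "203.0.113.7", "bytes": 9400000},
        {"dst_ip": "198.51.100.2", "bytes": 1200},
        {"dst_ip": "203.0.113.7", "bytes": 2100000},
    ]
    print(jumping_off_points(flows))
    # [('203.0.113.7', 11500000), ('198.51.100.2', 1200)]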
What I discovered last week is that this concept holds for other types of work as well. I was working on a paper with a co-worker, and we were having trouble finding a way forward until we identified a jumping off point we could work from. Once we figured out our jumping off point, everything else flowed. It was amazing.
I guess I shouldn't be surprised by now that a structured, well-organized approach would yield good results.
Monday, March 7, 2011
Known Knowns
"Known Knowns" are network traffic data that we understand well and can firmly identify. Members of this class of network traffic can be categorized as either benign or malicious. Detection methods here can be automated and don't require much human analyst labor on a continuing basis. Unfortunately, this is the class of network traffic that we as a community spend the bulk of our time on. Why do I use the term unfortunately? More on that later.
"Known unknowns" are network traffic data that we have detected, but are puzzled by. We don't have a good, solid understanding of how to categorize this class of network data. One would think that because of this, we should spend a decent amount of time trying to figure out what exactly this traffic is. After all, if we don't know what it is, it could be malicious, right? Unfortunately, not enough time is put into this class of network traffic, and as a result, most organizations remain puzzled and/or turn a blind eye to the known unknowns. Why don't we work harder here? We're too focused on the known knowns.
"Unknown unknowns" are network traffic data that we have not yet detected, and as a result, we aren't aware of what this class of network traffic is (or isn't) doing on our network. This is the class of network data that contains most of the large breaches (and thus most of the collateral damage), as well as most of the truly interesting network traffic. Finding this traffic takes a skilled analyst, good tools, the right data, and a structured, well-organized approach to network monitoring. Ironically, this class would be extremely interesting to a skilled analyst, but due to the known known "rut" that we as a community are in, analysts don't really get a chance to touch this class.
So now I think you can understand why I find it unfortunate that we as a community are so focused on the known knowns. We are so busy "detecting" that which we've detected time and time again that we ignore the bulk of the rest of the network traffic out there. That's where we get in trouble repeatedly.
On the bright side, I do see the idea of taking an analytical approach to information security slowly spreading throughout the community. I think it's only a matter of time before one organization after another wakes up to the fact that their 1990s-era signature-based approaches are only one part of the larger solution. With proper analysis and monitoring of network traffic comes knowledge. And with knowledge comes the realization that what you don't know is often a lot scarier than what you do know.
Friday, March 4, 2011
Data Value
In thinking about why organizations end up in an overloaded/confused/complicated state, I've come up with two primary reasons:
1) No one data type by itself gives them what they need analytically/forensically/legally.
2) There is great uncertainty about what data needs to be collected and maintained to ensure adequate "network knowledge", so organizations err on the side of caution and collect everything.
To me, this seems quite wasteful. It's wasteful not only of computing resources (storage, instrumentation hardware, etc.), but also of precious analytical/monitoring/forensics cycles. With so few individuals skilled in how to properly monitor a network, the last thing we want to do is make the job harder, more confusing, and more opaque.
The good news is, I think there is a way forward here. As I discussed in a previous post, enriching layer 4 meta-data (e.g., network flow data) with some of the layer 7 (application layer) meta-data can open up a world of analytical/monitoring/forensics possibilities. I believe that one could take the standard netflow (layer 4) meta-data fields and enrich them with some application layer (layer 7) meta-data to create an "uber" data source that would meet the network monitoring needs of most organizations. I'm not sure exactly what that "uber" data source would look like, but I know it would be much easier to collect, store, and analyze than the current state of the art. The idea would be to find the right balance between the extremes of netflow (extremely compact size, but no context) and full packet capture (full context, but extremely large size). The "uber" data source would be somewhat compact in size and have some context. Exactly how to tweak that dial should be the subject of further thought and dialogue.
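As a hedged illustration only, here is one possible shape for such a record in Python. The specific layer-7 fields (HTTP host, user agent, DNS query) are my assumptions, not a proposal from any standard or tool; which fields earn their storage cost is exactly the dial in question.

    from dataclasses import dataclass
    from typing import Optional

    @dataclass
    class EnrichedFlow:
        # Layer 4: the standard, compact netflow-style fields.
        src_ip: str
        dst_ip: str
        src_port: int
        dst_port: int
        protocol: int
        bytes: int
        packets: int
        # Layer 7: optional context, populated only when the
        # application protocol can be parsed. (Field choice is an
        # assumption for illustration.)
        http_host: Optional[str] = None
        http_user_agent: Optional[str] = None
        dns_query: Optional[str] = None

    # A flow that is opaque in pure netflow gains enough context to be
    # worth an analyst's attention:
    f = EnrichedFlow("10.0.0.5", "203.0.113.7", 49152, 80, 6,
                     bytes=9400000, packets=6200,
                     http_host="update.example.com",
                     http_user_agent="curl/7.21.0")

Each record stays a fixed handful of fields (compact, like netflow) while carrying just enough application-layer context to answer the first-order analytical questions.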
This is something I intend to continue thinking about, as I see great promise here.