Wednesday, April 16, 2014

Data Value vs. Data Volume

There is no shortage of big data talk these days. One concept that I have always hoped would get more attention is data value. Much of today’s big data discussion is dominated by volume, velocity, and variety; unfortunately, I haven’t seen much discussion of value.

When security organizations tackle the big data challenge, they primarily focus on two things:
  1. Gaining access to every data source that might be relevant to security operations
  2. Warehousing the data from all of those data sources
In my experience, organizations take this approach for two primary reasons:
  1. Historically, access to log data was scarce, creating a “let’s take everything we can get our hands on” culture
  2. The value of each data source to security operations is often poorly understood, creating a “let’s collect everything so that we don’t miss anything” philosophy
Unfortunately, this creates new challenges that are particularly acute in the era of big data:
  1. The variety of data sources creates confusion, uncertainty, and inefficiency: the first question during incident response is often “Where do I go to get the data I need?” rather than “What question do I need to ask of the data?”
  2. The volume and velocity of the data overwhelm the collection/warehouse system, making it impossible to retrieve the data in a timely manner when it is needed
While it is true that a conservative, “collect everything” approach is good in the absence of anything better, I would suggest an alternative process when facing the challenges of collection and analysis head-on:
  1. Determine logging/visibility needs scientifically based on business needs, policy requirements, incident response process, and other guidelines
  2. Review the network architecture to identify the most efficient collection points
  3. Instrument the network appropriately wherever visibility is lacking
  4. Identify the smallest subset of data sources that provide the required visibility and value with the least amount of volume
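
As a rough illustration of step 4, choosing data sources can be framed as a weighted set-cover problem: each source covers some visibility requirements and costs some volume. The sketch below uses a simple greedy heuristic, which is an assumption of this example rather than a prescribed method, and it does not guarantee the true minimum. The source names, requirement names, and daily volumes are all hypothetical.

```python
from typing import Dict, List, Set


def select_sources(
    required: Set[str],
    coverage: Dict[str, Set[str]],
    daily_gb: Dict[str, float],
) -> List[str]:
    """Greedily pick sources that cover `required` at low total volume."""
    uncovered = set(required)
    chosen: List[str] = []
    while uncovered:
        best = None
        best_cost = float("inf")
        # Sort names so that ties are broken deterministically.
        for name in sorted(coverage):
            if name in chosen:
                continue
            gain = len(coverage[name] & uncovered)
            if gain == 0:
                continue
            # Volume paid per newly covered requirement.
            cost = daily_gb[name] / gain
            if cost < best_cost:
                best, best_cost = name, cost
        if best is None:
            raise ValueError(f"no source covers: {sorted(uncovered)}")
        chosen.append(best)
        uncovered -= coverage[best]
    return chosen


# Hypothetical visibility requirements (the output of step 1).
required = {"dns_queries", "web_requests", "auth_events", "network_sessions"}

# Hypothetical mapping of each candidate source to what it lets you see.
coverage = {
    "proxy_logs": {"web_requests"},
    "dns_server_logs": {"dns_queries"},
    "firewall_logs": {"network_sessions"},
    "network_metadata_sensor": {"dns_queries", "web_requests", "network_sessions"},
    "directory_auth_logs": {"auth_events"},
}

# Hypothetical daily volume of each source, in GB.
daily_gb = {
    "proxy_logs": 40.0,
    "dns_server_logs": 25.0,
    "firewall_logs": 120.0,
    "network_metadata_sensor": 60.0,
    "directory_auth_logs": 5.0,
}

chosen = select_sources(required, coverage, daily_gb)
total_gb = sum(daily_gb[name] for name in chosen)
print(chosen, total_gb)
```

With these made-up numbers, two sources satisfy all four requirements at 65 GB per day, compared with 250 GB per day for collecting all five. Real inputs would come from steps 1 through 3, and the result is only as good as the coverage mapping fed into it.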
This process may seem radical at first glance, but those of us who have worked with log data in an incident response setting will see that it is really the only way that security operations programs can keep pace with the big data deluge. After all, if you can’t get a timely answer from the very data you insisted on collecting, was there really any value in collecting it? What goes in must come out easily, efficiently, and rapidly. Otherwise, there is simply no point in collecting it.
