In the early days of security operations and incident response, organizations collected every network traffic log they could get their hands on. There were many reasons for this, including:
- It was not clear which data did or did not provide value to security operations and incident response
- It was not clear when or how often the data should be reviewed and analyzed, or how exactly to review and analyze it
- The velocity at which the data streamed into centralized collection points was far lower than it is today
- The volumes of data being collected were far lower than they are today, in part because network speeds were lower, and in part because networks were less well instrumented for collection
- The variety and diversity of the data being collected were far lower than they are today, in part because networks were less well instrumented for collection, and in part because there were fewer specialized technologies collecting data
Organizations followed this model through the years, and for good reason -- there was no reasonable alternative. Over the years, the incident response community has grown, organizational knowledge has increased, and capabilities have matured. We are now at the point where we can, with some effort, assess the value of each available data source to security operations and incident response. The incident response process and the incident handling life cycle are both well documented and well understood. The velocity, volume, and variety of data have increased tremendously and continue to increase.
All of these factors contribute to the new reality -- that for security operations, incident response, and network forensics purposes, it is no longer possible to collect every data source available. The operational complexity, workflow inefficiency, storage requirements, and query performance simply do not allow for this. Rather, each data source should be evaluated based upon its value-add to security operations, incident response, and network forensics, while at the same time being weighed against the volume of data it produces. This is something we routinely do in other aspects of our lives -- we opt to carry one $20 bill rather than 80 quarters, because it scales better. Why shouldn't security operations use the same approach?
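One way to make that weighing concrete is to score each data source and rank by value per unit of volume. The sketch below is only an illustration of the idea, not a prescribed method: the `DataSource` fields, the 0-10 value scale, the source names, the daily volumes, and the greedy selection rule are all assumptions made up for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataSource:
    name: str
    value_score: float  # analyst-assigned value, hypothetical 0-10 scale
    gb_per_day: float   # measured or estimated daily volume, must be > 0


def value_density(source: DataSource) -> float:
    """Value delivered per GB collected per day."""
    return source.value_score / source.gb_per_day


def select_sources(sources, daily_budget_gb):
    """Greedily keep the sources with the highest value per GB
    until the daily collection budget is used up."""
    selected = []
    used_gb = 0.0
    for source in sorted(sources, key=value_density, reverse=True):
        if used_gb + source.gb_per_day <= daily_budget_gb:
            selected.append(source)
            used_gb += source.gb_per_day
    return selected


# Entirely made-up numbers, for illustration only.
candidates = [
    DataSource("proxy_logs", value_score=9, gb_per_day=40),
    DataSource("dns_logs", value_score=8, gb_per_day=25),
    DataSource("firewall_allow", value_score=3, gb_per_day=300),
    DataSource("netflow", value_score=7, gb_per_day=60),
    DataSource("debug_syslog", value_score=1, gb_per_day=200),
]

kept = select_sources(candidates, daily_budget_gb=150)
for source in kept:
    print(f"{source.name}: {value_density(source):.3f} value per GB")
```

In practice the value scores would come from the organization's own review of which sources actually contribute to detection and investigation. A greedy ranking like this is only a starting point for that conversation.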
Critics of this approach will say that if they omit certain types of data from their collection, they run the risk of losing visibility and/or not being able to perform incident response. To those critics, I would ask two questions: 1) What makes you so certain that you cannot retain the same level of visibility using fewer data sources of higher value? And, 2) If it takes 8 hours to query 24 hours' worth of non-prioritized log data looking for the few log entries that are relevant, are you really able to perform timely and accurate incident response? Clearly, a balance needs to be struck. In these cases, I am a big fan of the Pareto principle (sometimes called the 80/20 rule). I have seen organizations that never look at 80% (or more) of the log data they collect. For those organizations, the same visibility can be retained with the remaining 20% of the data, and as an added bonus, retention can be increased five-fold (say from 30 days to 150 days) at the same storage cost. With the need to be ready to perform rapid incident response as critical as ever, it pays to think about how less allows us to do more.
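The retention arithmetic is simple enough to check directly. The following is a minimal sketch that assumes a fixed storage budget and a steady daily volume; the function name and the `keep_percent` parameter are hypothetical, introduced only for this example.

```python
def retention_after_reduction(current_days: float, keep_percent: float) -> float:
    """Retention achievable at the same storage cost when only
    keep_percent of the current daily volume is still collected.

    Assumes a fixed storage budget and a steady daily volume.
    """
    if not 0 < keep_percent <= 100:
        raise ValueError("keep_percent must be in the range (0, 100]")
    return current_days * 100 / keep_percent


# Keeping 20% of the data turns 30 days of retention into 150 days.
print(retention_after_reduction(30, 20))
```

The same function shows how quickly the benefit grows: keeping half the data doubles retention, and keeping a fifth multiplies it by five.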