Monday, March 17, 2014

Signal-to-Noise Ratio

Recent media reports discussing the Target and Neiman Marcus breaches have indicated that, in both cases, numerous alerts fired as a result of the intrusion activity. In both cases, the alerts were not properly handled, and the breaches remained undetected. I'm sure there are many angles from which these reports can be dissected. Rather than play the blame game, I would like to discuss a subject that remains a challenge for our profession as a whole: the signal-to-noise ratio.

Wikipedia defines the signal-to-noise ratio as "a measure used in science and engineering that compares the level of a desired signal to the level of background noise." In other words, the more you have of what you want, and the less you have of what you don't want, the easier it is to measure something. Let's illustrate this concept by imagining a conversation between two people in a noisy cafe. If I record that conversation from the next table, upon playback, it will be very difficult for me to truly understand what was discussed. Conversely, if I record that conversation in a quiet room, it will be much easier to understand what was discussed upon playback. The signal-to-noise ratio in the second scenario is much higher than in the first scenario.

The same concept applies to security operations and incident response. In security operations, true positives are the signal, and false positives are the noise. Consider the case of two different Security Operations Centers (SOCs), SOC A and SOC B. In SOC A, the daily work queue contains approximately 100 reliable, high-fidelity, actionable alerts. Each alert is reviewed by an analyst. If incident response is necessary for a given alert, it is performed. In SOC B, the daily work queue contains approximately 100,000 alerts, almost all of which are false positives. Analysts attempt to review the highest-priority alerts, but the volume of even those alerts is so large that they cannot get through all of them. Additionally, because of the large number of false positives, SOC B's analysts become desensitized to alerts and do not take them particularly seriously.

One day, 10 additional alerts relating to payment-card-stealing malware fire within a few minutes of each other.

In SOC A, where every alert is reviewed by an analyst, where the signal-to-noise ratio is high, and where 10 additional alerts seems like a lot, analysts successfully identify the breach less than 24 hours after it occurs. SOC A's team is able to perform analysis, containment, and remediation within the first 24 hours of the breach. The team is able to stop the bleeding before any payment card data is exfiltrated. Although there has been some damage, it can be controlled. The organization can assess the damage, respond appropriately, and return to normal business operations.

In SOC B, where an extremely small percentage of the alerts are reviewed by an analyst, where the signal-to-noise ratio is low, and where 10 additional alerts doesn't even raise an eyebrow, the breach remains undetected. Months later, SOC B will learn of the breach from a third party. The damage will be extensive, and it will take the organization months or years to fully recover.
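To put rough numbers on why the same 10 alerts stand out in one queue and vanish in the other, here is a minimal sketch that expresses the extra alerts as a fraction of each SOC's normal daily volume. The queue sizes come from the scenario above; the function name `relative_increase` is simply something I made up for illustration.

```python
def relative_increase(baseline_alerts: int, extra_alerts: int) -> float:
    """Return the extra alerts as a fraction of the normal daily queue."""
    if baseline_alerts <= 0:
        raise ValueError("baseline_alerts must be positive")
    return extra_alerts / baseline_alerts


# Daily queue sizes taken from the SOC A / SOC B scenario in this post.
soc_a = relative_increase(100, 10)
soc_b = relative_increase(100_000, 10)

print(f"SOC A: {soc_a:.2%} more alerts than usual")  # SOC A: 10.00% more alerts than usual
print(f"SOC B: {soc_b:.2%} more alerts than usual")  # SOC B: 0.01% more alerts than usual
```

A 10% jump in a queue where everything gets reviewed is hard to miss. A 0.01% jump in a queue that is mostly ignored is effectively invisible.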

Unfortunately, in my experience, there are a lot more SOC B's out there than there are SOC A's. It is relatively straightforward to turn a SOC B into a SOC A, but it does require experienced professionals, organizational will, and focus. How do I know? I've turned SOC B's into SOC A's several times during my career.

We are fortunate to have some great technology choices these days that we can leverage to improve our security operations and incident response functions. These technology choices can enable us to learn of and respond to breaches soon after they occur. Before purchasing any technology intended to produce alerts destined for the work queue, we should ensure that it supports the ability to ask very precise, targeted, incisive questions of the data. This enables us to home in on the activity we want to identify (the true positives/the signal), while minimizing the activity we do not want to identify (the false positives/the noise). As always, these technologies are tools that need to be properly leveraged as part of a larger people, process, and technology picture.

What is your signal-to-noise ratio? Is it high enough to detect the next breach, or could it stand to be strengthened? I would posit that the ratio of true positives to false positives (the signal-to-noise ratio) is an important metric that all organizations should review. Not doing so could have dire consequences.
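If you want to start tracking this metric, one possible approach is to count analyst dispositions on closed alerts and divide true positives by false positives. The sketch below assumes each closed alert carries a disposition label of either `"true_positive"` or `"false_positive"`; those label names, the function `signal_to_noise`, and the sample counts are all hypothetical and would need to be mapped to whatever your own ticketing or SIEM workflow records.

```python
from collections import Counter
from typing import Iterable


def signal_to_noise(dispositions: Iterable[str]) -> float:
    """Ratio of true positives to false positives for a set of closed alerts.

    Labels other than "true_positive" and "false_positive" are ignored.
    """
    counts = Counter(dispositions)
    true_positives = counts["true_positive"]
    false_positives = counts["false_positive"]
    if false_positives == 0:
        # No noise at all: infinite ratio if there was any signal, else 0.
        return float("inf") if true_positives else 0.0
    return true_positives / false_positives


# Illustrative (made-up) daily queues, loosely modeled on SOC A and SOC B.
high_fidelity_queue = ["true_positive"] * 90 + ["false_positive"] * 10
noisy_queue = ["true_positive"] * 50 + ["false_positive"] * 99_950

print(signal_to_noise(high_fidelity_queue))  # 9.0
print(round(signal_to_noise(noisy_queue), 6))
```

Reviewing this number over time, per alert source, is one way to see which detections are earning their place in the work queue and which are only adding noise.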
