Because I'm all about the "good enough."

Saturday, February 11, 2012

In 50 gigabytes, turn left: data-driven security.

I love Scott Crawford's research into data-driven security.  I agree with him that IT operations and development can both benefit from the right security data -- where "right" means at the appropriate level and relevant to what they're doing.  It also has to be in the right mode:  an alert should be a conclusion drawn from the analysis of data (20 failed logins per second = someone is using automation to try to break in), triggered by an event or a confluence of events.  Once someone in IT needs to perform an investigation, the need changes to looking at more atomic data (exactly which logins are being targeted, whether they're active or disabled, etc.).  In other words, the details need to be available on demand, but they shouldn't be shoved at the IT staff in lieu of useful alerts.
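That failed-login example can be sketched as a simple sliding-window check. This is a minimal illustration only -- the threshold, window size, and the shape of the alert are my own assumptions, not from any particular product:

```python
from collections import deque

FAILED_LOGIN_THRESHOLD = 20   # failures per window we treat as automated
WINDOW_SECONDS = 1.0

failures = deque()  # (timestamp, username) of recent failed logins

def record_failed_login(timestamp, username):
    """Record one failed login; return an alert dict if the rate crosses
    the threshold, else None."""
    failures.append((timestamp, username))
    # Drop events that have aged out of the window.
    while failures and failures[0][0] < timestamp - WINDOW_SECONDS:
        failures.popleft()
    if len(failures) >= FAILED_LOGIN_THRESHOLD:
        # The alert carries the conclusion; the atomic detail (exactly which
        # logins were targeted) stays available on demand for the investigator.
        return {
            "conclusion": "probable automated login attack",
            "rate": len(failures) / WINDOW_SECONDS,
            "targeted_accounts": sorted({u for _, u in failures}),
        }
    return None
```

Note that the alert leads with the conclusion, and the per-account detail rides along for drill-down rather than being pushed at the operator event by event.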

Another kind of data that is useful is situational data:  how things are configured and what is happening during "normal" operation.  Viewing all the responses from a database is too much to ask of a developer -- but the developer would benefit a lot by knowing that some queries are taking 25 minutes to return (do you suppose that would have some effect on application performance?).  This is the sort of data that is incredibly useful, but setting up every possible abnormal situation to trigger an alert is way beyond the scope of an overworked operations team.  Every so often you just have to sit down and do some exploring to uncover these sorts of operational problems.  Packet captures can teach you things you can't learn any other way -- if you have the time and skills to read them.
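The slow-query case is the kind of exploration you can do with a few lines of script. The log line format here is hypothetical (duration in seconds, then the query text); real databases each have their own slow-query log layout, so treat this as a sketch of the approach, not a parser for any specific product:

```python
import re

# Assumed log line format: "<duration_seconds> <query text>"
# e.g. "1500.2 SELECT * FROM orders WHERE ..."
SLOW_THRESHOLD_SECONDS = 60.0

def find_slow_queries(log_lines, threshold=SLOW_THRESHOLD_SECONDS):
    """Return (duration, query) pairs slower than the threshold, worst
    first -- the kind of situational data a developer can actually act on."""
    slow = []
    for line in log_lines:
        match = re.match(r"(\d+(?:\.\d+)?)\s+(.*)", line)
        if not match:
            continue  # skip lines that don't fit the assumed format
        duration = float(match.group(1))
        if duration > threshold:
            slow.append((duration, match.group(2)))
    return sorted(slow, reverse=True)
```

A 25-minute query shows up as 1500 seconds at the top of that list -- exactly the kind of thing nobody wrote an alert for, but that explains the application performance complaints.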

Because detection is expensive.  It requires the luxury of having staff knowledgeable both in the technology and in the context of those particular systems, and having them devote a lot of their time just to sitting and looking at things, sorting out what's normal from what's not.  Those are the kinds of costly eyeballs that have been transferred so frequently to managed security service providers.  It's the kind of thing you pay consultants to do, because if your staff weren't completely occupied with keeping the infrastructure running, you wouldn't be allowed to keep them.  Data analysis today is expensive, and it's a one-off deal unless you can find economies of scale somewhere.

Yes, automation is getting better, but it's not there yet.  There are still too many alerts taking up too much time to sort through (particularly in the tuning phase).  IT staff get hundreds of emails a day; they can't handle more than two or three alerts that require real investigation.  (By the way, this is why operations often can't respond to something until it's down -- it's the most severe and least frequent kind of alert that they receive all day, and they don't have time to chase down anything lower-level, like a warning message that hasn't resulted in badness yet.)

If you break security events down, you're generally looking for two kinds of things:  normal activities that are being done by the wrong people (as in, from a Tor exit node through your administration console), or abnormal activities that are being done by the "right" people (internal fraud, or someone has taken over an authorized account).  And by "people," of course, I also mean "systems," but at first glance it's sometimes hard to tell the difference. 
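Those two buckets can be made concrete with a toy classifier. Both inputs here -- the set of known Tor exit nodes and the per-account activity baseline -- are made-up stand-ins for illustration, not real feeds:

```python
# Hypothetical "wrong people" indicator: known Tor exit nodes.
TOR_EXIT_NODES = {"198.51.100.7", "203.0.113.42"}

# Hypothetical per-account baseline of normal activities.
USUAL_ACTIVITY = {
    "dba_alice": {"query", "backup"},
    "web_svc": {"query"},
}

def classify_event(account, source_ip, activity):
    """Sort an event into the two buckets: normal activity by the wrong
    actor, or abnormal activity by the 'right' actor."""
    wrong_actor = source_ip in TOR_EXIT_NODES
    abnormal = activity not in USUAL_ACTIVITY.get(account, set())
    if wrong_actor and not abnormal:
        return "normal activity, wrong actor"    # e.g. admin console via Tor
    if abnormal and not wrong_actor:
        return "abnormal activity, right actor"  # e.g. takeover or fraud
    if wrong_actor and abnormal:
        return "wrong actor and abnormal activity"
    return "looks normal"
```

The hard part in practice isn't this branch logic, of course -- it's building and maintaining the baselines and the "wrong people" lists, and telling people from systems in the first place.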

This determination of "wrong" and "right" is a security activity, and for the reasons I listed above, operations people may not care that much until it makes something happen that they have to fix.  If someone wipes a database, they'll care a whole lot, but if there's some unusual encrypted traffic leaving the enterprise on port 80, not so much.  A fully leveraged (i.e. overworked) ops team doesn't have time to analyze alerts at that level.

"Wrong" and "right" to the business is on a completely different stratum, and it's one that's hard for automation to reach today.  Executives care when it gets to the level where they have to do something about it, like fire someone for looking at patient data, or talk to the press about a breach.  They care when an event starts to present the risk of legal liability or increased cost.  But you can't bring them alerts like that until you have digested everything at a lower level and put together enough evidence to reveal a business issue.

And finally, historical data can be extremely useful in determining what works in security and operations and what doesn't.  But that kind of data has to be analyzed in a different way from real-time operational data or situational data.  It requires a different model that caters to the requirements of risk analysis -- and that, too, is expensive, even assuming you know how to do it today.  (Hi, Chris.)

My point here is to say that data-driven security is where we need to go, absolutely.  But there is no single path to take with the data we have; there are a number of divergent paths that are all needed in the enterprise.  We also need to be able to drive the data in the right delivery directions -- which means that we need a really good data navigation system.