So a while back, I posted the following to Twitter:
Thought of the Day: We don’t need to share raw data if we can share meta-data generated using uniform analytical methodologies.
To which Adam replied:
@mortman You can’t test & refine models without raw data, & you can’t ask people with the same orientation to bring diverse perspectives.
We went back and forth a bit until it became clear that this needed an actual blog post, so here it is:
I don’t disagree with Adam that we need raw data. He’s absolutely right that without it, you can’t test models. What I was trying to get at was that, even though I would absolutely love to have access to more raw data to test my own theories, it just isn’t realistic to expect that sort of access in the legal and business environment we have today. So until things change, we have to figure out another way to get at the data.
One thing that has become increasingly popular is for vendors to publish aggregate data about what they’ve seen with their customers or on their networks. Verizon and WhiteHat have used this model to great effect. Not only has it generated a lot of press for them, but we as an industry have learned a lot from these reports.
What would be even better is if people would share the models they are using when generating their data. This way, other organizations could use the same models, and as reports were published, the rest of us could actually compare apples to apples. This would also allow us to more quickly identify errors in the models, allow for public discussion of necessary tweaks, and then test said changes while limiting liability for the data owners.
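To make the apples-to-apples idea a bit more concrete, here's a minimal sketch in Python. Everything in it is hypothetical: the `summarize` and `compare` functions, the `category` field and the sample records are made up for illustration and aren't taken from any vendor's actual methodology. The point is only that if two organizations run the same published model over their own private data, the aggregates they publish can be lined up directly.

```python
from collections import Counter


def summarize(incidents):
    """Hypothetical shared model: reduce raw incident records to aggregates.

    Only the returned summary would be published; the raw records stay private.
    """
    total = len(incidents)
    counts = Counter(rec["category"] for rec in incidents)
    shares = {cat: round(n / total, 3) for cat, n in sorted(counts.items())}
    return {"total": total, "share_by_category": shares}


def compare(report_1, report_2):
    """Difference in category share between two reports built by summarize()."""
    s1 = report_1["share_by_category"]
    s2 = report_2["share_by_category"]
    return {
        cat: round(s1.get(cat, 0.0) - s2.get(cat, 0.0), 3)
        for cat in sorted(set(s1) | set(s2))
    }


# Made-up raw data that each organization keeps to itself.
org_a_raw = [{"category": "sqli"}, {"category": "xss"},
             {"category": "xss"}, {"category": "sqli"}]
org_b_raw = [{"category": "xss"}, {"category": "xss"},
             {"category": "xss"}, {"category": "sqli"}]

# Each organization publishes only the output of the shared model.
report_a = summarize(org_a_raw)
report_b = summarize(org_b_raw)

# Anyone can now compare the published reports without seeing raw data.
delta = compare(report_a, report_b)
```

Because both reports come out of the same function, a difference in `delta` reflects a difference in what the two organizations saw rather than a difference in how they counted. And if someone finds a flaw in `summarize`, the fix can be discussed in public and re-run by each data owner on their own.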
This is really where I was going with my initial thought above: we need common models so we can have an intelligent discussion. This is also how things generally work in the sciences (yes, Alex, I know, we’re not a science yet :). Researchers almost never publish their raw data, just their models, methods and results. I feel strongly that until we can convince people to share raw data more openly, this is our best shot at figuring out what’s really going on in the security world. It’s also what drove me to start developing the soon-to-be-renamed Mortman/Hutton Model that Alex and I presented at Black Hat and BSides Las Vegas.
More data, even if it’s aggregate, is better than no data.