Category: data

Everybody Should Be Doing Something about InfoSec Research

Previously, Russell wrote “Everybody complains about lack of information security research, but nobody does anything about it.”

In that post, he argues for a model where

Ideally, this program should be “idea capitalists”, knowing some people and ideas won’t pay off but others will be huge winners. One thing for sure — we shouldn’t focus this program only on people who have been “officially” anointed by some hierarchy, some certification program, or by credentials alone.

I agree that a focus on those anointed won’t help, but that doesn’t mean it’s easy to set up such an institution.

The trouble with the approach is that we have such institutions (*ARPA, venture capital) and they’ve all failed for institutional reasons. However high their aspirations, such organizations get flak over time from their funders about their failures and their bizarre, newsworthy ideas, and the organizations become conservative. They trend towards “proven entrepreneurs” and incrementalism. The “Pioneer Fellows” idea does not overcome this structural issue. (There is an argument that the MacArthur genius grants overcome it. I’m not aware of any research into the relative importance of work done before and after such grants, but I have my suspicions, prejudices and best practices.)

Of course, I might be wrong. If you have a spare million bucks, please set this up, and we can see how it goes. An experiment, if you will.

Experiments are a big part of why Andrew and I focused on free availability of data. With data, those with ideas can test them. There will be a scrum of entrepreneurial types analyzing the data. Fascinating stuff will emerge from that chaos. With evidence, they will go to the extant ‘big return’ organizations and get funding. Or they’ll work for big companies and shift product directions.

That is, the issue in infosec is not a lack of interesting ideas, it’s the trouble in testing them without data. We need data to test ideas and figure out how they impact outcomes.

Krebs on Cyber vs Physical Crooks

In addition, while traditional bank robbers are limited to the amount of money they can physically carry from the scene of the crime, cyber thieves have a seemingly limitless supply of accomplices to help them haul the loot, by hiring so-called money mules to carry the cash for them.

I can’t help but notice one other important distinction between these two types of bank crimes: The federal government sure publishes a lot more information about physical bank robberies than it makes available about online stick-ups.

Go read “Cyber Crooks Leave Traditional Bank Robbers in the Dust” by Brian Krebs. Then ask why we sweep these crimes under the rug.

Open Security Foundation Looking for Advisors

Open Security Foundation – Advisory Board – Call for Nominations:

The Open Security Foundation (OSF) is an internationally recognized 501(c)(3) non-profit public organization seeking senior leaders capable of providing broad-based perspective on information security, business management and fundraising to volunteer for an Advisory Board. The Advisory Board will provide insight and guidance when developing future plans, an open forum for reviewing community feedback and a broader view when prioritizing potential new services.

I figure readers of this blog should be interested in helping drive open data sources.

Data Not Assertions

There have already been a ton of posts out there about the Verizon DBIR Supplement that came out yesterday, so I’m not going to dive into the details, but I wanted to highlight this quick discussion from twitter yesterday that really sums up the value of the supplement and similar reports:

georgevhulme: I’m glad we have data to refute the “insiders conduct 80% of all attacks” mantra that has been repeated, ad nauseum for at least a decade

adamshostack: @alexhutton @georgevhulme yeah, but… Data, not assertions

This is so awesome, I can barely stand it. We’re actually starting to be able to make data-based decisions, as opposed to just asserting something is true because we believe it on faith or like the way it sounds.

“Data, not assertions” really sums up so much of what I was trying to get at in the discussion on securosis last week about password changing time frames. Read the comments over there. It really shows how far we have yet to go.

"80 Percent of Cyber Attacks Preventable"

Threatlevel (aka 27B/6) reported yesterday that Richard Schaeffer, the NSA’s information assurance director, testified to the Senate Judiciary Subcommittee on Terrorism, Technology and Homeland Security on the issue of computer-based attacks.

If network administrators simply instituted proper configuration policies and conducted good network monitoring, about 80 percent of commonly known cyber attacks could be prevented, a Senate committee heard Tuesday.

The remark was made by Richard Schaeffer, the NSA’s information assurance director, who added that simply adhering to already known best practices would sufficiently raise the security bar so that attackers would have to take more risks to breach a network, “thereby raising [their] risk of detection.”

I’m really curious, however, about what data Director Schaeffer is basing his testimony on. Is it the DBIR? Another open set of breach data? Or data gathered by the NSA? Regardless, it’s great to see more folks talking about what the Verizon DBIR told us and what we’ve known anecdotally for a long time: we still aren’t even close to doing the basics well.

The article then goes on to tell us:

A 2009 Price Waterhouse Cooper study on global information security found that 47 percent of companies are reducing or deferring their information security budgets, despite the growing dangers of cyber incursions.

The thing is, as we’ve learned from the Verizon study, most of the found issues were due to failing to do the basics: not removing default passwords, not revoking accounts when employees leave, and misconfigurations. Even in the case of patching, the vast majority of holes exploited had patches available for over a year, and 100% had patches available for over six months. This is not the stuff of big budgets and sexy technology, but rather of solid, repeatable and auditable processes — in other words, serious operational discipline.

Budget cuts might actually be a good thing, because they will force organizations to focus on the people and process portions of security rather than the technology. It’d be really cool if PwC were to track the correlation of budgets to breaches within their survey groups; then we’d have some actual data on potential optimal spend levels.

Botnet Research

Rob Lemos has a new article up on the MIT Technology Review about some researchers from UC Santa Barbara who spent several months studying the Mebroot botnet. They found some fascinating stuff, and I’m looking forward to reading the paper when it’s finally published. While the vast majority of infected machines were Windows-based (64% XP, 23% Vista), 6.4% were running either OS X Tiger or Leopard, demonstrating yet again that just because you have a Mac doesn’t mean you are safe. More interesting to me was:

The researchers also discovered that nearly 70 percent of those redirected by Mebroot–as classified by Internet address–were vulnerable to one of almost 40 vulnerabilities regularly used by the most popular infection toolkits designed to compromise computer systems. About half that number were vulnerable to the six specific vulnerabilities used by the Mebroot toolkit.

The research suggests that users need to update more often, says UCSB’s Vigna.

Unfortunately, until the paper comes out we won’t know which vulnerabilities were being used and how old they are. Hopefully, that will be explained further as it would be really interesting to see how this data compares with what Verizon found in their research.
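If the UCSB paper does break out patch ages, the Verizon-style tabulation is easy to reproduce. Here’s a minimal sketch; the CVE labels and dates are invented placeholders, not the actual Mebroot vulnerabilities:

```python
from datetime import date

# Hypothetical exploited-vulnerability records:
# (CVE id, patch release date, date seen exploited). All invented.
exploited = [
    ("CVE-A", date(2007, 3, 1), date(2009, 1, 10)),
    ("CVE-B", date(2008, 2, 15), date(2009, 4, 2)),
    ("CVE-C", date(2006, 11, 5), date(2009, 2, 20)),
]

def patch_age_days(patched, seen):
    """Days between patch availability and observed exploitation."""
    return (seen - patched).days

over_6mo = sum(patch_age_days(p, s) > 182 for _, p, s in exploited)
over_1yr = sum(patch_age_days(p, s) > 365 for _, p, s in exploited)

print(f"{over_6mo}/{len(exploited)} exploited >6 months after patch")
print(f"{over_1yr}/{len(exploited)} exploited >1 year after patch")
```

This is exactly the kind of comparison that becomes possible only when researchers publish the underlying vulnerability lists and dates.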

Models are Distracting

So Dave Mortman wrote:

I don’t disagree with Adam that we need raw data. He’s absolutely right that without it, you can’t test models. What I was trying to get at was that, even though I would absolutely love to have access to more raw data to test my own theories, it just isn’t realistic to expect that sort of access in the legal and business environment we have today. So until things change, we have to figure out another way to get at the data.

First off, I don’t disagree with why Dave is going where he’s going, but I think it’s built on a mis-read of where we are, and a strategic error regardless.

Where we are: we do have raw data. It’s coming to us from unexpected sources, and we’re getting more of it day by day. We’d like more details, we’d like more consistency and we’d like more depth, and each of those will come.

But far more important is the strategic error of asking for something that isn’t the fullness of what we want, and the risk that the cover-up club will use it to avoid the real goal by talking about how much progress we’ve made sharing models.

You almost never get anything you don’t ask for. If we have a list of requests, the top of the list is data, data and data.

Further, I declare that this is a realistic request, and attach precisely the level of proof that the good Mr. Mortman did when asserting that “it just isn’t realistic.”

Not that I’m opposed to model sharing. We just need to recognize it for the poor substitute that it is, and keep our eyes on the real goal.

Speaking of where your eyes are: that’s Claire; she’s represented by Specs Model Management. And, as the title says, quite distracting.


So a while back, I posted the following to twitter:

Thought of the Day: We don’t need to share raw data if we can share meta-data generated using uniform analytical methodologies.

Adam, disagreed:

@mortman You can’t test & refine models without raw data, & you can’t ask people with the same orientation to bring diverse perspectives.

We went back and forth a bit until it became clear that this needed an actual blog post, so here it is:

I don’t disagree with Adam that we need raw data. He’s absolutely right that without it, you can’t test models. What I was trying to get at was that, even though I would absolutely love to have access to more raw data to test my own theories, it just isn’t realistic to expect that sort of access in the legal and business environment we have today. So until things change, we have to figure out another way to get at the data.

One thing that has become increasingly popular is for vendors to publish aggregate data about what they’ve seen with their customers or on their networks. Verizon and WhiteHat have used this model to great effect. Not only has it generated a lot of press for them, but we as an industry have learned a lot from these reports.

What would be even better is if people would share the models they are using when generating their data. This way, other organizations could use the models and as reports were published, the rest of us could actually compare apples to apples. This would also allow us to more quickly identify issues/errors in the models, allow for public discussion of necessary tweaks and then test said changes while limiting liability for the data owners.

This is really where I was going with my initial thought above: we need common models so we can have an intelligent discussion. This is also how things generally work in the sciences (yes, Alex, I know, we’re not a science yet :). Researchers almost never publish their raw data, just their models, methods and results. I feel strongly that until we can convince people to share raw data more openly, this is our best shot at getting real information about what’s going on in the security world. It’s also what drove me to start developing the soon-to-be-renamed Mortman/Hutton Model that Alex and I presented at Blackhat and BSides Las Vegas.
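To make the apples-to-apples point concrete: with a shared model, two vendors’ aggregate reports can be mapped onto common categories and compared directly. A minimal sketch; the vendor labels and counts below are invented, not drawn from any real report:

```python
# Two hypothetical vendor reports, each using its own category labels.
vendor_a = {"default creds": 40, "sqli": 25, "stolen laptop": 10}
vendor_b = {"weak/default passwords": 120, "sql injection": 95}

# The shared model: a mapping from each vendor's labels onto
# common, agreed-upon categories.
COMMON = {
    "default creds": "default-credentials",
    "weak/default passwords": "default-credentials",
    "sqli": "sql-injection",
    "sql injection": "sql-injection",
    "stolen laptop": "physical-theft",
}

def normalize(report):
    """Re-bucket a vendor's counts into the common categories."""
    out = {}
    for label, count in report.items():
        key = COMMON[label]
        out[key] = out.get(key, 0) + count
    return out

# Once normalized, shares are directly comparable across vendors.
for name, report in [("A", vendor_a), ("B", vendor_b)]:
    norm = normalize(report)
    total = sum(norm.values())
    shares = {k: round(v / total, 2) for k, v in norm.items()}
    print(name, shares)
```

The hard part, of course, isn’t the code — it’s getting vendors to agree on the mapping in the first place, which is the whole argument for publishing models.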

More data, even if it’s aggregate, is better than no data.

Notes to the Data People

Over on his Guerilla CISO blog, Rybolov suggests that we ask the folks for infosec data using their Suggest a data set page. It sounds like a good idea to me! I took his request and built on it. Rather than breaking the flow with quotes and edit marks, I’ll simply say the requests are mostly his work, the context mine.

I’d love feedback before I submit this next week.


Thank you for the opportunity to suggest data which would be in the public interest to release or make more available. I applaud your mission of improving and updating your site with a wide variety of data.

Today, the public is sorely lacking in data about information security outcomes, that is, what really goes wrong, and how often. The government gathers a great deal of data on federal enterprise security under FISMA and many regulations. It gathers a good deal of information about consumer issues at the FBI and FTC. It may seem that this sort of information falls under the sensitive data exemption which you call out on your suggestion page. I believe that’s not the case, and so before summarily rejecting this request, I ask that you consider the following.

First, it is understood that enterprise security is a challenge, and security failures are widespread. These failures range from unsecured tax records to sensitive plans showing up on peer-to-peer filesharing networks. A great many government security failures are documented in a project of the Open Security Foundation. It is disgraceful that hobbyists must comb through news reports to make this data available.

Second, security failings are consistent and not improving. Failings are documented as far back as a GAO report, “Computer Related Crimes in Federal Programs,” submitted to Congress in April 1976. Stripped of jargon and brands, and updated to reflect reorganizations of government departments, the issues and recommendations could be issued today and few people would notice. Making available data about what goes wrong would allow researchers and scientists to assess their advice. The importance of cyber-security has caused President Obama to dedicate a speech to it, order a 60-day special review, etc. The general availability of this data would support and enhance the President’s goals in securing cyber-space.

Third, some small subset of the data may represent on-going issues which are not yet remediated, rather than past issues which have been addressed. These are clearly sensitive, and drawing attention to them would have negative operational consequences. At the same time, there is a public interest in oversight and accountability, and I urge you to consider partial, redacted, or summarized data releases as you balance that sensitivity. For example, information on how many issues each department has open, how long they have been open, and how severe they are is unlikely to change the daily flood of attacks focusing on the Federal information infrastructure.
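The kind of partial, summarized release suggested above is cheap to produce from raw incident records. A minimal sketch, with entirely invented departments and numbers:

```python
from collections import defaultdict
from statistics import mean

# Hypothetical open-issue records: (department, days open, severity 1-5).
# All values are invented for illustration.
issues = [
    ("Treasury", 400, 4), ("Treasury", 30, 2),
    ("Energy", 700, 5), ("Energy", 90, 3), ("Energy", 15, 1),
]

# Group raw records by department.
summary = defaultdict(list)
for dept, days_open, severity in issues:
    summary[dept].append((days_open, severity))

# Release only counts, average age, and worst severity per department --
# no operational detail about the individual issues.
for dept, rows in sorted(summary.items()):
    ages = [d for d, _ in rows]
    worst = max(s for _, s in rows)
    print(f"{dept}: {len(rows)} open, avg {mean(ages):.0f} days, max severity {worst}")
```

Nothing in the summary identifies a specific unremediated system, yet it still supports oversight of how each department is doing.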

Therefore, most of the data I am requesting is not sensitive, and its rapid release serves an important public interest.

I am requesting:

Complete responses from the Departments and Agencies to the FISMA reporting requirements for FY2004-2009 based on OMB Memoranda 04-25, 05-15, 06-20, 07-19, 08-21, and 09-29.

Raw incident data for years 2005-2007 as reported to OMB and summarized in their report to Congress on FY2007 FISMA performance and published at

Raw incident data for years 2007 and later in any type and format which would allow a researcher to compile data similar to the Verizon Data Breach Incident Report available at

Data collected by the FTC and/or FBI on identity theft, broken down by type and duration, making clear the differences between credit card and other short term thefts and SSN, drivers license, or other longer term impersonation.

This information is necessary for researchers to study the effectiveness of information security management techniques and regulatory schemes, and for industry to propose changes to national-level information security management frameworks and legislation such as FISMA. This information has, for the most part, been released only in summary format to Congress; the release of the complete dataset would greatly aid the information security community.