Learning from Near Misses

[Update: Steve Bellovin has a blog post]

One of the major pillars of science is the collection of data to disprove arguments. That data gathering can include experiments, observations, and, in engineering, investigations into failures. One of the issues that makes security hard is that we have little data about large scale systems. (I believe that this is more important than our clever adversaries.) The work I want to share with you today has two main antecedents.

First, in the nearly ten years since Andrew Stewart and I wrote The New School of Information Security, and called for more learning from breaches, we’ve seen a dramatic shift in how people talk about breaches. Unfortunately, we’re still not learning as much as we could. There are structural reasons for that, primarily fear of lawsuits.

Second, last year marked 25 years of calls for an “NTSB for infosec.” Steve Bellovin and I wrote a short note asking why that was. We’ve spent the last year asking what else we might do. We’ve learned a lot about other Aviation Safety Programs, and think there are other models that may be better fits for our needs and constraints in the security realm.

Much of that investigation has been a collaboration with Blake Reid, Jonathan Bair, and Andrew Manley of the University of Colorado Law School, and together we have a new draft paper on SSRN, “Voluntary Reporting of Cybersecurity Incidents.”

A good deal of my own motivation in this work is to engineer a way to learn more. The focus of this work, on incidents rather than breaches, and on voluntary reporting and incentives, reflects lessons learned as we try to find ways to measure real world security. The writing and abstract reflect the goal of influencing those outside security to help us learn better:

The proliferation of connected devices and technology provides consumers immeasurable amounts of convenience, but also creates great vulnerability. In recent years, we have seen explosive growth in the number of damaging cyber-attacks. 2017 alone has seen the Wanna Cry, Petya, Not Petya, Bad Rabbit, and of course the historic Equifax breach, among many others. Currently, there is no mechanism in place to facilitate understanding of these threats, or their commonalities. While information regarding the causes of major breaches may become public after the fact, what is lacking is an aggregated data set, which could be analyzed for research purposes. This research could then provide clues as to trends in both attacks and avoidable mistakes made on the part of operators, among other valuable data.

One possible regime for gathering such information would be to require disclosure of events, as well as investigations into these events. Mandatory reporting and investigations would result in better data collection. This regime would also cause firms to internalize, at least to some extent, the externalities of security. However, mandatory reporting faces challenges that would make this regime difficult to implement, and possibly more costly than beneficial. An alternative is a voluntary reporting scheme, modeled on the Aviation Safety Reporting System housed within NASA, and possibly combined with an incentive scheme. Under it, organizations that were the victims of hacks or “near misses” would report the incident, providing important details, to some neutral party. This database could then be used both by researchers and by industry as a whole. People could learn what does work, what does not work, and where the weak spots are.

Please, take a look at the paper. I’m eager to hear your feedback.

“Comparing the Usability of Cryptographic APIs”


(The abstract:) Potentially dangerous cryptography errors are well documented in many applications. Conventional wisdom suggests that many of these errors are caused by cryptographic Application Programming Interfaces (APIs) that are too complicated, have insecure defaults, or are poorly documented. To address this problem, researchers have created several cryptographic libraries that they claim are more usable; however, none of these libraries have been empirically evaluated for their ability to promote more secure development. This paper is the first to examine both how and why the design and resulting usability of different cryptographic libraries affects the security of code written with them, with the goal of understanding how to build effective future libraries. We conducted a controlled experiment in which 256 Python developers recruited from GitHub attempt common tasks involving symmetric and asymmetric cryptography using one of five different APIs. We examine their resulting code for functional correctness and security, and compare their results to their self-reported sentiment about their assigned library. Our results suggest that while APIs designed for simplicity can provide security benefits—reducing the decision space, as expected, prevents choice of insecure parameters—simplicity is not enough. Poor documentation, missing code examples, and a lack of auxiliary features such as secure key storage, caused even participants assigned to simplified libraries to struggle with both basic functional correctness and security. Surprisingly, the availability of comprehensive documentation and easy-to-use code examples seems to compensate for more complicated APIs in terms of functionally correct results and participant reactions; however, this did not extend to security results. We find it particularly concerning that for about 20% of functionally correct tasks, across libraries, participants believed their code was secure when it was not. Our results suggest that while new cryptographic libraries that want to promote effective security should offer a simple, convenient interface, this is not enough: they should also, and perhaps more importantly, ensure support for a broad range of common tasks and provide accessible documentation with secure, easy-to-use code examples.

It’s interesting that even when developers took care to consider usability of their APIs, usability testing revealed serious issues. But it’s not surprising. The one constant of usability testing is that people surprise you.
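To make the “decision space” point concrete, here is a minimal sketch of my own (not code or tasks from the paper) contrasting a high-level Python library such as PyNaCl with a low-level one such as PyCrypto. The snippets are illustrative assumptions about how those two styles of API get used, nothing more:

    # Illustrative sketch only; not taken from the study.
    # High-level API: the library picks the cipher, mode, nonce, and authentication.
    import nacl.secret
    import nacl.utils

    key = nacl.utils.random(nacl.secret.SecretBox.KEY_SIZE)
    box = nacl.secret.SecretBox(key)
    ciphertext = box.encrypt(b"attack at dawn")   # nonce chosen and attached automatically
    plaintext = box.decrypt(ciphertext)

    # Low-level API: each of these decisions is the developer's to get wrong.
    from Crypto.Cipher import AES

    cipher = AES.new(key, AES.MODE_ECB)           # mode choice exposed; ECB leaks plaintext patterns
    ct = cipher.encrypt(b"attack at dawn!!")      # padding, IVs, and authentication are all manual

The high-level call leaves no nonce, mode, or padding decision to get wrong, which is the reduced decision space the abstract describes; the low-level call exposes all of them, and documentation and examples have to carry the load.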

The paper is: “Comparing the Usability of Cryptographic APIs,” Yasemin Acar (CISPA, Saarland University), Michael Backes (CISPA, Saarland University & MPI-SWS), Sascha Fahl (CISPA, Saarland University), Simson Garfinkel (National Institute of Standards and Technology), Doowon Kim (University of Maryland), Michelle Mazurek (University of Maryland), Christian Stransky (CISPA, Saarland University), The Increasingly-misnamed Oakland Conference, 2017.

Adam & Chris Wysopal webcast

(Today) Wednesday, May 24th, 2017 at 1:00 PM EDT (17:00:00 UTC), Chris Wysopal and I are doing a SANS webcast, “Choosing the Right Path to Application Security.” I’m looking forward to it, and hope you can join us!

Update: the webcast is now archived, and the white paper associated with it, “Using Cloud Deployment to Jump-Start Application Security,” is in the SANS reading room.

“…the Elusive Goal of Security as a Scientific Pursuit”

That’s the subtitle of a new paper by Cormac Herley and Paul van Oorschot, “SoK: Science, Security, and the Elusive Goal of Security as a Scientific Pursuit,” forthcoming in IEEE Security & Privacy.

The past ten years has seen increasing calls to make security research more “scientific”. On the surface, most agree that this is desirable, given universal recognition of “science” as a positive force. However, we find that there is little clarity on what “scientific” means in the context of computer security research, or consensus on what a “Science of Security” should look like. We selectively review work in the history and philosophy of science and more recent work under the label “Science of Security”. We explore what has been done under the theme of relating science and security, put this in context with historical science, and offer observations and insights we hope may motivate further exploration and guidance. Among our findings are that practices on which the rest of science has reached consensus appear little used or recognized in security, and a pattern of methodological errors continues unaddressed.

Cyber Grand Shellphish

There’s a very interesting paper on the Cyber Grand Challenge by team Shellphish. Lots of details about the grand challenge itself, how they designed their software, how they approached the scoring algorithm, and what happened in the room.

There are lots of good details, but perhaps my favorite is:

How would a team that did *nothing* do? That is, if a team connected and then ceased to play, would they fare better or worse than the other players? We ran a similar analysis to the “Never patch” strategy previously (i.e., we counted a CS as exploited for all rounds after its first exploitation against any teams), but this time removed any POV-provided points. In the CFE, this “Team NOP” would have scored 255,678 points, barely *beating* Shellphish and placing 3rd in the CGC.

The reason I like this is that scoring systems are hard. Really, really hard. I know that DARPA spent substantial time and energy on the scoring system, and this outcome happened anyway. We should not judge either DARPA or the contest on that basis, because it was hard to see that that would happen ahead of time: it’s a coincidence of the scores teams actually achieved.
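To see how that can happen, here is a toy model of my own, with made-up weights and round counts rather than DARPA's actual CFE formula: if exploits against an unpatched service are rare, the availability cost of fielding a patch can exceed the security points the patch preserves.

    # Toy model only: invented numbers, not DARPA's CFE scoring.
    ROUNDS = 95                        # hypothetical number of rounds
    EXPLOITED_ROUNDS = range(40, 45)   # hypothetical: the flaw is exploited in only 5 rounds

    def never_patch():
        # Full availability every round; loses the security bonus only
        # in the rounds where the service is actually exploited.
        return sum(1.0 * (1 if r in EXPLOITED_ROUNDS else 2) for r in range(ROUNDS))

    def patch_at(round_fielded=30, perf_cost=0.10):
        # Fielding a patch costs a round of downtime and a small slowdown
        # afterwards, but the service is never exploited.
        score = 0.0
        for r in range(ROUNDS):
            if r == round_fielded:
                availability = 0.0              # fielding the patch takes the service down
            elif r > round_fielded:
                availability = 1.0 - perf_cost  # patched binary runs a bit slower
            else:
                availability = 1.0
            score += availability * 2           # never exploited, keeps the security bonus
        return score

    print(never_patch(), patch_at())   # roughly 185 vs. 175 with these invented numbers

With these invented numbers, never patching comes out ahead. The point is not the numbers themselves, but that the ordering flips depending on values nobody could know before the event, which is exactly why scoring systems are hard.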

Calls for an NTSB?

In September, Steve Bellovin and I asked “Why Don’t We Have an Incident Repository?”

I’m continuing to do research on the topic, and I’m interested in putting together a list of such things. I’d like to ask you for two favors.

First, if you remember such things, can you tell me about them? I recall “Computers at Risk,” the National Cyber Leap Year report, and the Bellovin & Neumann editorial in IEEE S&P. Oh, and “The New School of Information Security.” But I’m sure there have been others.

In particular, what I’m looking for are calls like this one in Computers at Risk (National Academies Press, 1991):

3a. Build a repository of incident data. The committee recommends that a repository of incident information be established for use in research, to increase public awareness of successful penetrations and existing vulnerabilities, and to assist security practitioners, who often have difficulty persuading managers to invest in security. This database should categorize, report, and track pertinent instances of system security-related threats, risks, and failures. […] One possible model for data collection is the incident reporting system administered by the National Transportation Safety Board… (chapter 3)

Second, I am trying to do searches such as “cites ‘Computers at Risk’ and contains ‘NTSB’.” I have tried without luck to do this on Google Scholar, Microsoft Academic and Semantic Scholar. Only Google seems to be reliably identifying that report. Is there a good way to perform such a search?
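I don’t know of a search engine that exposes “cites X and contains Y” directly. One rough workaround, sketched below, is to pull the citing papers through the Semantic Scholar Graph API and grep their titles and abstracts. The paper ID is a placeholder, the endpoint and field names are my assumptions about how that API behaves, it only helps if the report is indexed at all (which, as noted above, is unreliable), and it searches abstracts rather than full text, so it is a partial answer at best:

    # Rough sketch, assuming the Semantic Scholar Graph API's
    # /paper/{id}/citations endpoint and that the report is indexed there.
    import requests

    PAPER_ID = "PUT-PAPER-ID-HERE"   # hypothetical: the indexed ID for "Computers at Risk"
    url = f"https://api.semanticscholar.org/graph/v1/paper/{PAPER_ID}/citations"
    offset = 0
    while True:
        resp = requests.get(url, params={"fields": "title,abstract,year",
                                         "limit": 100, "offset": offset})
        resp.raise_for_status()
        batch = resp.json()
        for item in batch.get("data", []):
            paper = item["citingPaper"]
            text = " ".join(filter(None, [paper.get("title"), paper.get("abstract")]))
            if "NTSB" in text:
                print(paper.get("year"), paper.get("title"))
        if batch.get("next") is None:   # no more pages
            break
        offset = batch["next"]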

Do Games Teach Security?

There’s a new paper from Mark Thompson and Hassan Takabi of the University of North Texas. The title captures the question:
Effectiveness Of Using Card Games To Teach Threat Modeling For Secure Web Application Developments

Gamification of classroom assignments and online tools has grown significantly in recent years. There have been a number of card games designed for teaching various cybersecurity concepts. However, effectiveness of these card games is unknown for the most part and there is no study on evaluating their effectiveness. In this paper, we evaluate effectiveness of one such game, namely the OWASP Cornucopia card game which is designed to assist software development teams identify security requirements in Agile, conventional and formal development processes. We performed an experiment where sections of graduate students and undergraduate students in a security related course at our university were split into two groups, one of which played the Cornucopia card game, and one of which did not. Quizzes were administered both before and after the activity, and a survey was taken to measure student attitudes toward the exercise. The results show that while students found the activity useful and would like to see this activity and more similar exercises integrated into the classroom, the game was not easy to understand. We need to spend enough time to familiarize the students with the game and prepare them for the exercises using the game to get the best results.

I’m very glad to see games like Cornucopia evaluated. If we’re going to push the use of Cornucopia (or Elevation of Privilege) for teaching, then we ought to be thinking about how well they work in comparison to other techniques. We have anecdotes, but to improve, we must test and measure.

Usable Security: History, Themes, and Challenges (Book Review)

Simson Garfinkel and Heather Lipford’s Usable Security: History, Themes, and Challenges should be on the shelf of anyone who is developing software that asks people to make decisions about computer security.

We have to ask people to make decisions because they have information that the computer doesn’t. My favorite example is the Windows “new network” dialog, which asks what sort of network you’re connecting to: work, home, or coffee shop. The information is used to configure the firewall. My least favorite example is phishing, where people are asked to make decisions about technical minutiae before authenticating. Regardless, we are not going to entirely remove the need for people to make decisions about computer security. So we can either learn to gain their participation in more effective ways, or we can accept a very high failure rate. The former option is better, and this book is a substantial contribution.

It’s common for designers to throw up their hands at these challenges, saying things like “given a choice between security and dancing babies, people will choose dancing babies every time,” or “you can’t patch human stupidity.” However, a recently published study by Google and UCSD found that the best phishing sites fooled only 45% of the people who clicked through, while overall only 13% did. (There’s a good summary of that study available.) Claiming that “people will choose dancing babies 13% of the time” just doesn’t seem like a compelling argument against trying.

This slim book is a review of the academic work that’s been published, almost entirely in the last 20 years, on how people interact with information security systems. It summarizes and contextualizes the many things we’ve learned and the mistakes that have been made, and does so in a readable and concise way. The book has six chapters:

  • Intro
  • A brief history
  • Major Themes in UPS Academic Research
  • Lessons Learned
  • Research Challenges
  • Conclusion/The Next Ten Years

The “Major themes” chapter is 61 or so pages, which is over half of the 108 pages of content. (The book also has 40 pages of bibliography.) Major themes include authentication, email security and PKI, anti-phishing, storage, device pairing, web privacy, policy specification, mobile, social media and security administration.

The “Lessons Learned” chapter is quite solid, covering “reduce decisions,” “safe and secure defaults,” “provide users with better information, not more information,” “users require clear context to make good decisions,” “information presentation is critical” and “education works but has limits.” I have a quibble, which is that Sasse’s concept of mental ‘compliance budgets’ is also important, and I wish it were given greater prominence. (My other quibble is more of a pet peeve: the term “user” where “people” would serve. Isn’t it nicer to say “people require clear context to make good decisions”?) Neither quibble should take away from my key message, which is that this is an important new book.

The slim nature of the book is, I believe, an excellent usability property. The authors present what’s been done, lessons that they feel can be taken away, and move to the next topic. This lets you, the reader, design, build, or deploy systems that help the person behind the keyboard make the decisions they want to make. To reiterate, anyone building software that asks people to make decisions should read the lessons contained within.

Disclaimer: I was paid to review a draft of this book, and my name is mentioned kindly in the acknowledgements. I am not being paid to write or post reviews.

[Updated to correct the sentence about the last 20 years.]

Indicators of Impact — Ground Truth for Breach Impact Estimation

Ice bag might be a good ‘Indicator of Impact’ for a night of excess.

One big problem with existing methods for estimating breach impact is the lack of credibility and reliability of the evidence behind the numbers. This is especially true if the breach is recent or if most of the information is not available publicly.  What if we had solid evidence to use in breach impact estimation?  This leads to the idea of “Indicators of Impact” to provide ‘ground truth’ for the estimation process.

The idea is premised on the view that breach impact is best measured by the costs or resources associated with response, recovery, and restoration actions taken by the affected stakeholders. These activities can include both routine incident response and also rarer activities. (See our paper for more.) This leads to ‘Indicators of Impact’, which are evidence of the existence or non-existence of these activities. Here’s a definition (p 23 of our paper):

An ‘Indicator of Impact’ is an observable event, behavior, action, state change, or communication that signifies that the breached or affected organizations are attempting to respond, recover, restore, rebuild, or reposition because they believe they have been harmed. For our purposes, Indicators of Impact are evidence that can be used to estimate branching activity models of breach impact, either the structure of the model or key parameters associated with specific activity types. In principle, every Indicator of Impact is observable by someone, though maybe not outside the breached organization.

Of course, there is a close parallel to the now-widely-accepted idea of “Indicators of Compromise,” which are basically technical traces associated with a breach event. There’s a community supporting an open exchange format, OpenIoC. The big difference is that Indicators of Compromise are technical and are used almost exclusively in tactical information security. In contrast, Indicators of Impact are business-oriented, even if they involve InfoSec activities, and are used primarily for management decisions.

From Appendix B, here are a few examples:

  • Was there a forensic investigation, above and beyond what your organization would normally do?
  • Was this incident escalated to the executive level (VP or above), requiring them to make resource decisions or to spend money?
  • Was any significant business process or function disrupted for a significant amount of time?
  • Due to the breach, did the breached organization fail to meet any contractual obligations with its customers, suppliers, or partners? If so, were contractual penalties imposed?
  • Were top executives or the Board significantly diverted by the breach and aftermath, such that other important matters did not receive sufficient attention?

The list goes on for three pages in Appendix B but we fully expect it to grow much longer as we get experience and other people start participating.  For example, there will be indicators that only apply to certain industries or organization types.  In my opinion, there is no reason to have a canonical list or a highly structured taxonomy.

As signals, the Indicators of Impact are not perfect, nor do they individually provide sufficient evidence.  However, they have the very great benefit of being empirical, subject to documentation and validation, and potentially observable in many instances, even outside of InfoSec breach events.  In other words, they provide a ‘ground truth’ which has been sorely lacking in breach impact estimation. When assembled as a mass of evidence and using appropriate inference and reasoning methods (e.g. see this great book), Indicators of Impact could provide the foundation for robust breach impact estimation.

There are also applications beyond breach impact estimation.  For example, they could be used in resilience planning and preparation.  They could also be used as part of information sharing in critical infrastructure to provide context for other information regarding threats, attacks, etc. (See this video of a Shmoocon session for a great panel discussion regarding challenges and opportunities regarding information sharing.)

Fairly soon, it would be good to define a lightweight standard format for Indicators of Impact, possibly as an extension to VERIS. I also think that Indicators of Impact could be a good addition to the upcoming NIST Cybersecurity Framework. There’s a public meeting April 3rd, and I might fly out for it. Either way, I will submit a response to the NIST RFI.
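For concreteness, here is one way such a record might look, sketched as a Python dict. Every field name below is hypothetical, chosen to mirror the Appendix B questions; this is not an existing or proposed VERIS extension, just a sketch of the shape a lightweight format could take:

    # Purely hypothetical field names; a sketch, not a standard.
    indicator_of_impact = {
        "incident_id": "2014-0231",             # hypothetical link to the breach record
        "indicator_type": "forensic_investigation",
        "observed_date": "2014-03-12",
        "activity_category": "response",        # response / recovery / restoration / repositioning
        "description": "Outside forensics firm retained, beyond routine IR capacity",
        "observable_by": "external",            # e.g. disclosed in a filing or press report
        "evidence_source": "SEC filing",
        "confidence": "medium",
    }

A flat record like this keeps the reporting burden low while still being easy to aggregate, which seems more important at this stage than a canonical list or a highly structured taxonomy.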

Your thoughts and comments?