The DREAD Pirates

Then he explained the name was important for inspiring the necessary fear. You see, no one would surrender to the Dread Pirate Westley.

The DREAD approach was created early in the security pushes at Microsoft as a way to prioritize issues. It's not a very good way; you see, no one would surrender to the Bug Bar Pirate, Roberts. And so the approach keeps going, despite its many problems.

There are many properties one might want in a bug ranking system for internally found bugs. They include:

  • A cool name
  • A useful mnemonic
  • A reduction in argument about bugs
  • Consistency between raters
  • Alignment with intuition
  • Immutability of ratings: the bug is rated once, and then is unlikely to change
  • Alignment with post-launch/integration/ship rules

DREAD certainly meets the first of these, and perhaps the next two. It was also an early attempt at a multi-factor rating of bugs. But DREAD brings many problems that newer approaches address.

The most problematic aspect of DREAD is that there's little consistency, especially in the middle of the scale. What counts as a damage of 6 versus 7, or an exploitability of 6 versus 7? Without calibration, different raters will not be consistent. Each of the scores can be mis-estimated, and there's a tendency to underestimate things like the discoverability of bugs in your own product.

The second problem is that you set an arbitrary bar for fixes, for example, everything above a 6.5 gets fixed. That means the distinction between a 6 and a 7 sometimes matters a lot. And the score does not relate to what needs to get fixed when a bug is found externally.

This illustrates why Discoverability is an odd thing to bring into the risk equation. You may have a discoverability of 1 on Monday, and 10 on Tuesday. ("Thanks, Full-Disclosure!") So something could have a 5.5 DREAD score because of low discoverability and yet require a critical update. Suddenly the DREAD score of the issue is mutable. That makes it hard to use DREAD on an externally discovered bug, or one delivered via a bug bounty. So now you have two bug-ranking systems, and what do you do when they disagree? This happened to Microsoft repeatedly, and led to the switch to a bug bar approach.
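To make the arithmetic concrete, here is a minimal sketch assuming the common formulation of DREAD (five factors, each scored 1-10, averaged). The factor values and the 6.5 fix bar are hypothetical, carried over from the example above; the point is only how a single changed factor drags a bug across the bar.

```python
# Minimal sketch of the common DREAD formulation: each factor scored 1-10,
# averaged, and compared against an arbitrary "fix bar" (6.5, per the example
# above). All factor values here are hypothetical.

FIX_BAR = 6.5

def dread_score(damage, reproducibility, exploitability, affected_users, discoverability):
    """Average of the five factors, each on a 1-10 scale."""
    return (damage + reproducibility + exploitability
            + affected_users + discoverability) / 5

# Monday: the bug is obscure, so discoverability is rated 1.
monday = dread_score(damage=8, reproducibility=7, exploitability=6,
                     affected_users=8, discoverability=1)
print(monday, monday > FIX_BAR)    # 6.0 False -> sits below the fix bar

# Tuesday: full disclosure; discoverability jumps to 10, nothing else changed.
tuesday = dread_score(damage=8, reproducibility=7, exploitability=6,
                      affected_users=8, discoverability=10)
print(tuesday, tuesday > FIX_BAR)  # 7.8 True -> now a must-fix
```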

Affected users is also odd: does an RCE in Flight Simulator matter less than one in Word? Perhaps in the grand scheme of things, but I hope the Flight Simulator team is fixing their RCEs.

Stepping beyond the problems internal to DREAD to how DREAD operates within a software organization: it only measures half of what you need to measure. You need to measure both the security severity and the fix cost. Otherwise, you run the risk of finding something with a DREAD score of 10, but it's a key feature (Office Macros), so the decision escalates, and you don't fix it. Other issues are easy to fix (S3 bucket permissions), and so it doesn't matter whether you thought discoverability was low. This problem is shared by other systems, but DREAD's focus on a crisp line, everything above a 6.5 gets fixed, exacerbates it.
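To illustrate the two-axis point, here is a purely hypothetical sketch. The bug names echo the examples above, but the scores, cost labels, and triage rules are invented; the point is only that severity alone doesn't determine the order of work.

```python
# Hypothetical sketch of prioritizing on two axes: security severity and fix
# cost. Bugs, scores, and costs are invented for illustration.
bugs = [
    {"name": "macro code execution", "severity": 10, "fix_cost": "high"},  # key feature, fix escalates
    {"name": "open S3 bucket",       "severity": 6,  "fix_cost": "low"},   # cheap to fix regardless of discoverability
    {"name": "verbose error page",   "severity": 3,  "fix_cost": "low"},
]

# Cheap fixes just go out; expensive fixes for severe issues get escalated
# rather than silently dropped; the rest go to the backlog.
for bug in bugs:
    if bug["fix_cost"] == "low":
        print("just fix:", bug["name"])
    elif bug["severity"] >= 8:
        print("escalate:", bug["name"])
    else:
        print("backlog:", bug["name"])
```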

For all these reasons, with regard to DREAD? Fully skeptical, and I have been for over a decade. If you want to fix these things, the thing to do is not to create confusion by saying "DREAD can also be a 1-3 system!", but to version and revise DREAD, for example, by releasing DREAD 2. I'm exploring a similar approach with DFDs.

I’m hopeful that this post can serve as a collection of reasons to not use DREAD v1, or properties that a new system should have. What’d I miss?

NTSB on Uber (Preliminary)

The NTSB has released “Preliminary Report Highway HWY18MH010,” on the Uber self-driving car which struck and killed a woman. I haven’t had a chance to read the report carefully.

Brad Templeton has excellent analysis of the report at “NTSB Report implies serious fault for Uber in fatality” (and Brad’s writings overall on the subject have been phenomenal.)

A few important things to note, cribbed from Brad.

  • The driver was not looking at her phone, but a screen with diagnostic information from the self-driving systems.
  • The car detected a need to brake with approximately enough time to stop had it automatically applied the brakes.
  • That system was turned off for a variety of reasons that look bad (in hindsight, and probably could have been critiqued at the time).

My only comment right now is: wouldn't it be nice to have this level of fact-finding in the world of cyber?

Also, it’s very clear that the vehicle was carefully preserved. Can anyone say how the NTSB and/or Uber preserved the data center, cloud or other remote parts of the computer systems involved, including the algorithms that were deployed that day (versus reconstructing them later)?

Threat Model Thursday: Google on Kubernetes

There’s a recent post on the Google Cloud Platform Blog, “Exploring container security: Isolation at different layers of the Kubernetes stack,” that’s the subject of our next Threat Model Thursday post. As always, our goal is to look and see what we can learn, not to say ‘this is bad.’ There’s more than one way to do it. Also, last time I did a who/what/why/how analysis, which turned out to be pretty time-consuming, so I’m going to avoid that going forward.

The first thing to point out is that there’s a system model that’s intended to support multiple analyses of ‘what can go wrong.’ (“Sample scenario…Time to do a little threat modeling.”) This is a very cool demonstration of how to communicate about security along a supply chain. In this instance, the answers to “what are we working on” vary with who “we” are. That might be the Kubernetes team, or it might be someone using Kubernetes to implement a multi-tenant SaaS workload.

What are we working on?

The answers to this are either Kubernetes or the multi-tenant system. The post includes a nice diagram (reproduced above) of Kubernetes and its boundaries. Speaking of boundaries, they break out security boundaries, which enforce the trust boundaries. I’ve also heard ‘security boundaries’ referred to as ‘security barriers’ or ‘boundary enforcement.’ They also say, “At Google, we aim to protect all trust boundaries with at least two different security boundaries that each need to fail in order to cross a trust boundary.”

But you can use this diagram to help you either improve Kubernetes, or to improve the security of systems hosted in Kubernetes.

What can go wrong?

Isolation failures. You get a resource isolation failure…you get a network isolation failure…everyone gets an isolation failure! Well, no, not really. You only get an isolation failure if your security boundaries fail. (Sorry! Sorry?)

[Image: Oprah announcing that everyone gets an isolation failure]

This use of isolation is interestingly different from STRIDE or ATT&CK. In many threat models of userland code on a desktop, the answer is ‘and then they can run code, and do all sorts of things.’ There, the isolation failure is the end of a chain rather than the start, and you focus on the spoof, tamper, or EoP/RCE that gets you onto the chain. In that sense, isolation may seem frustratingly vague to many readers. But isolation is a useful property to have, and more importantly, it’s what we’re asking Kubernetes to provide.

There’s also mention of cryptomining (and cgroups as a fix) and of running untrusted code (use sandboxing). Especially with regard to untrusted code, I’d like to see more discussion of how to run it, which may or may not be inside a web trust boundary, either semi-safely, which is to say attempting to control its output, or safely, which is to say in a separate web namespace.
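To make the cgroups point concrete: in Kubernetes, per-container resource requests and limits are the knob that cgroups enforce underneath, which caps what a cryptomining (or simply misbehaving) container can consume. Here is a minimal sketch; the pod name, image, and values are hypothetical and not from Google's post.

```python
# Hedged sketch: a pod manifest with resource requests and limits. Kubernetes
# enforces these limits via cgroups on the node, bounding CPU and memory for
# the container. Names, image, and values are hypothetical.
import json

pod_manifest = {
    "apiVersion": "v1",
    "kind": "Pod",
    "metadata": {"name": "untrusted-workload"},   # hypothetical name
    "spec": {
        "containers": [{
            "name": "app",
            "image": "example.com/app:latest",    # hypothetical image
            "resources": {
                "requests": {"cpu": "250m", "memory": "256Mi"},
                "limits":   {"cpu": "500m", "memory": "512Mi"},
            },
        }],
    },
}

# kubectl accepts JSON manifests as well as YAML.
print(json.dumps(pod_manifest, indent=2))
```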

What are we going to do about it?

You can use this model to decide what you’re going to do about it. How far up or down the list of isolations should you be? Does your app need its own container, pod, node, or cluster?
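As a purely hypothetical illustration (not something the post prescribes), one could operationalize that decision as a crude mapping from workload properties to an isolation layer, mirroring the list of container, pod, node, cluster:

```python
# Hypothetical decision helper, not from the post: the more hostile or
# sensitive the workload, the further up the isolation list it goes.
def isolation_for(runs_untrusted_code: bool, multi_tenant: bool, sensitive_data: bool) -> str:
    if runs_untrusted_code and multi_tenant:
        return "cluster"   # hostile code from different tenants: separate clusters
    if runs_untrusted_code or (multi_tenant and sensitive_data):
        return "node"      # keep it off nodes shared with more trusted workloads
    if multi_tenant:
        return "pod"
    return "container"

print(isolation_for(runs_untrusted_code=True, multi_tenant=True, sensitive_data=True))  # cluster
```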

I would like to see more precision in the wording of the controls: what does ‘some’ control-plane isolation mean? Is it a level of effort to overcome, or a set of things you can rely on and some you can’t? The crisp expression of these qualities isn’t easy, but the authors are in a better position to express them than their readers. (There may be more on this elsewhere in the series.)

Did we do a good job?

There’s no explicit discussion, but my guess is that the post was vetted by a great many people.

To sum up, this is a great example of using threat modeling to communicate between supplier and customer. By drawing the model and using it to threat model, they help people decide if GCP is right, and if so, how to configure it in the most secure way.

What do you see in the models?

Joining the Continuum Team

I’m pleased to share the news that I’ve joined Continuum Security’s advisory board. I am excited about the vision that Continuum is bringing to software security: “We help you design, build and manage the security of your software solutions.” They’re doing so for both happy customers and a growing community. And I’ve come to love their framing: “Security is not special. Performance, quality and availability is everyone’s responsibility and so is security. After all, who understands the code and environment better than the developers and ops teams themselves?” They’re right. Security has to earn our seat at the table. We have to learn to collaborate better, and that requires software that helps the enterprise manage application security risk from the start of, and throughout, the software development process.

Just Culture and Information Security

Yesterday, Twitter revealed they had accidentally stored plain-text passwords in some log files. There was no indication the data had been accessed, and users were warned to update their passwords. There was no known breach, but Twitter went public anyway, and was excoriated in the press and… on Twitter.

This is a problem for our profession and industry. We get locked into a cycle where any public disclosure of a breach or security mistake results in…

Well, you can imagine what it results in, or you can go read “The Security Profession Needs to Adopt Just Culture” by Rich Mogull. It’s a very important article, and you should read it, and the links, and take the time to consider what it means. In that spirit, I want to reflect on something I said the other night. I was being intentionally provocative, and perhaps crossed the line away from being just. What I said was that a password management company has one job, and if they expose your passwords, you should not use their password management software.

Someone else in the room, coming from a background where they have blameless post-mortems, challenged my use of the phrase ‘you had one job,’ and praised the company for coming forward. I’ve been thinking about that, and my take is that a design where all the passwords are stored at a single site is substantially and predictably worse than a design where the passwords are distributed across local clients and local data storage. (There are tradeoffs. With a single site, you may be able to monitor for and respond to unusual access patterns rapidly, and you can upgrade all the software at once. There is an availability benefit. My assessment is that the single-store design is not worth it, because of its catastrophic failure modes.)

It was a fair criticism. I’ve previously said “we live in an ‘outrage world’ where it’s easier to point fingers and giggle in 140 characters and hurt people’s lives or careers than it is to make a positive contribution.” Did I fall into that trap myself? Possibly.

In “Just Culture: A Foundation for Balanced Accountability and Patient Safety,” which Rich links, there’s a table in Figure 2, headed “Choose the column that best describes the caregiver’s action.” In reading that table, I believe that a password manager with central storage falls into the reckless category, although perhaps it’s merely risky. In either case, the system leaders are supposed to share in accountability.

Could I have been more nuanced? Certainly. Would it have carried the same impact? No. Justified? I’d love to hear your thoughts!