Threat Model Thursday: Architectural Review and Threat Modeling

For Threat Model Thursday, I want to use current events here in Seattle as a prism through which we can look at technology architecture review. If you want to take this as an excuse to civilly discuss the political side of this, please feel free.

Seattle has a housing and homelessness crisis. The cost of a house has risen nearly 25% above the 2007 market peak, and has roughly doubled in the 6 years since April 2012. Fundamentally, demand has outstripped supply and continues to do so. As a city, we need more supply, and that means evaluating the value of things that constrain supply. This commentary from the local Libertarian party lists some of them.

The rules on what permits are needed to build a residence, what housing is acceptable, or how many unrelated people can live together (no more than eight) are expressions of values and priorities. We prefer that the developers of housing build no housing rather than build housing that doesn’t comply with the city’s Office of Planning and Community Development’s 32 pages of neighborhood design guidelines. We prefer to bring developers back after a building is built if the siding is not the agreed color. This is a choice that expresses the values of the city. And because I’m not a housing policy expert, I may miss some of the nuances, but I can see the effect of the policies overall.

Let’s transition from the housing crisis here in Seattle to the architecture crisis that we face in technology.

No, actually, I’m not quite there. The city killed micro-apartments, only to replace them with … artisanal micro-houses. Note the variation in size and shape of the two houses in the foreground. Now, I know very little about construction, but I’m reasonably confident that if you read the previous piece on micro-housing, many of the concerns regulators were trying to address apply to “True Hope Village,” construction pictured above. I want you, dear reader, to read the questions about how we deliver housing in Seattle, and treat them as a mirror into how your organization delivers software. Really, please, go read “How Seattle Killed Micro-Housing” and the “Neighborhood Design Guidelines” carefully. Not because you plan to build a house, but as a mirror of your own security design guidelines.

They may be no prettier.

In some companies, security is valued, but has no authority to force decisions. In others, there are mandatory policies and review boards. We in security have fought for these mandatory policies because without them, products ignored security. Similarly, we have housing rules because of unsafe, unsanitary or overcrowded housing, and to reduce the blight of slums.

Security has design review boards which want to talk about the color of the siding a developer installed on the now-live product. We have design regulation which kills apodments and tenement housing, and then glorifies tiny houses. From a distance, these rules make no sense. I didn’t find them sensible, myself. I remember a meeting with the Microsoft Crypto board. I went in with some very specific questions about parameters and algorithms: should we use this hash algorithm or that one? The meeting took less than five minutes to go off the rails with suggestions about non-cryptographic architecture. I remember shipping the SDL Threat Modeling Tool, going through the roughly five policy tracking tools we had at the time, and discovering at the very last minute that we had extra rules that were not documented in the documents I found at the start. It drives a product manager nuts!

Worse, rules expand. From the executive suite, if a group isn’t growing, maybe it can shrink? From a security perspective, the rapidly changing threat landscape justifies new rules. So there’s motivation to ship new guidelines that, in passing, spend a page explaining all the changes that are taking place. And then I see “Incorporate or acknowledge the best features of existing early to mid-century buildings in new development.” What does that mean? What are the best features of those buildings? How do I acknowledge them? I just want to ship my peer-to-peer blockchain features! And nothing in the design review guidelines is clearly objectionable. But taken as a whole, they create a complex, unpredictable, and thus expensive path to delivery.

We express values explicitly and implicitly. In Seattle, implicit expression of values has hobbled the market’s ability to address a basic human need. One of the reasons that embedding is effective is that the embedded gatekeepers can advise and interpret rules in relation to real questions. Embedding expresses the value of collaboration, of dialogue over review. Does your security team express that security is more important than product delivery? Perhaps it is. When Microsoft stood down product shipping for security pushes, it was an explicit statement. Making your values explicit and debating prioritization is important.

What side effects do your security rules have? What rule is most expensive to comply with? What initiatives have you killed, accidentally or intentionally?

Threat Model Thursday: Chromium Post-Spectre

Today’s Threat Model Thursday is a look at “Post-Spectre Threat Model Re-Think,” from a dozen or so folks at Google. As always, I’m looking at this from a perspective of what can we learn and to encourage dialogue around what makes for a good threat model.

What are we working on?

From the title, I’d assume Chromium, but there’s a fascinating comment in the introduction that this is wider: “any software that both (a) runs (native or interpreted) code from more than one source; and (b) attempts to create a security boundary inside a single address space, is potentially affected.” This is important, and in fact, why I decided to highlight the model. The intro also states, “we needed to re-think our threat model and defenses for Chrome renderer processes.” In the problem statement, they mention that there are other, out-of-scope variants such as “a renderer reading the browser’s memory.”

It would be helpful to me, and probably others, to diagram this, both for the Chrome case (the relationship between browser and renderer) and the broader case of that other software, because the modern web browser is a complex beast. As James Mickens says:

A modern Web page is a catastrophe. It’s like a scene from one of those apocalyptic medieval paintings that depicts what would happen if Galactus arrived: people are tumbling into fiery crevasses and lamenting various lamentable things and hanging from playground equipment that would not pass OSHA safety checks. This kind of stuff is exactly what you’ll see if you look at the HTML, CSS, and JavaScript in a modern Web page. Of course, no human can truly “look” at this content, because a Web page is now like V’Ger from the first “Star Trek” movie, a piece of technology that we once understood but can no longer fathom, a thrashing leviathan of code and markup written by people so untrustworthy that they’re not even third parties, they’re fifth parties who weren’t even INVITED to the party…

What can go wrong?

There is a detailed set of ways that confidentiality breaks current boundaries. Most surprising to me is the claim that clock jitter is not as useful as we’d expect, and even enumerating all the clocks is tricky! (WebKit seems to have a different perspective: that reducing timer precision is meaningful.)
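To make the timer-precision idea concrete, here is a minimal sketch of clock coarsening. The 100-microsecond granularity is my illustrative assumption, not WebKit’s or Chromium’s actual value; the point is only that events closer together than the granularity become indistinguishable to the attacker’s clock:

```python
def coarsen(timestamp_us: float, granularity_us: float = 100.0) -> float:
    """Round a timestamp (in microseconds) down to the nearest bucket,
    hiding any timing difference smaller than the granularity."""
    return timestamp_us - (timestamp_us % granularity_us)

# Two events 40 nanoseconds apart (a plausible cache-hit vs. cache-miss
# difference) read as the same instant through the coarsened clock:
t_hit, t_miss = 1_000_000.00, 1_000_000.04  # microseconds
assert coarsen(t_hit) == coarsen(t_miss)
```

The Google document’s caution is that this mitigation alone is weak: an attacker can amplify a timing difference by repetition, or build an implicit clock from any other source of monotonic progress, which is why enumerating “all the clocks” matters.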

There is also an issue of when to pass autofilled data to a renderer, and a goal of “Ensure User Intent When Sending Data To A Renderer.” This is good, but usability may depend on normal people understanding that their renderer and browser are different. That’s mitigated by taking user gestures as evidence of intent. That seems like a decent balancing of usability and security, but as I watch people using devices, I see a lot of gesturing to explore and discover the rapidly changing meanings of gestures, both within and across applications.

What are we going to do about it?

As a non-expert in browser design, I’m not going to attempt to restate the mitigations. Each of the defensive approaches is presented with clear discussion of its limitations and the current intent. This is both great to see, and hard to follow for those not deep in browser design. That form of writing is probably appropriate, because otherwise the meaning gets lost in verbosity that’s not useful to the people most impacted. I would like to see more out-linking as an aid to those trying to follow along.

Did we do a good job?

I’m very glad to see Google sharing this because we can see inside the planning of the architects, the known limits, and the demands on the supply chain (changes to compilers to reduce gadgets, changes to platforms to increase inter-process isolation), and in the end, “we now assume any active code can read any data in the same address space. The plan going forward must be to keep sensitive cross-origin data out of address spaces that run untrustworthy code.” Again, that’s more than just browsers. If your defensive approaches, mitigations or similar sections are this clear, you’re doing a good job.

‘EFAIL’ Is Why We Can’t Have Golden Keys

I have a new essay at Dark Reading, “‘EFAIL’ Is Why We Can’t Have Golden Keys.” It starts:

There’s a newly announced set of issues labeled the “EFAIL encryption flaw” that reduces the security of PGP and S/MIME emails. Some of the issues are about HTML email parsing, others are about the use of CBC encryption. All show how hard it is to engineer secure systems, especially when those systems are composed of many components that had disparate design goals.

The DREAD Pirates

Then he explained the name was important for inspiring the necessary fear. You see, no one would surrender to the Dread Pirate Westley.

The DREAD approach was created early in the security pushes at Microsoft as a way to prioritize issues. It’s not a very good way; you see, no one would surrender to the Bug Bar Pirate, Roberts. And so the approach keeps going, despite its many problems.

There are many properties one might want in a bug ranking system for internally found bugs. They include:

  • A cool name
  • A useful mnemonic
  • A reduction in argument about bugs
  • Consistency between raters
  • Alignment with intuition
  • Immutability of ratings: the bug is rated once, and then is unlikely to change
  • Alignment with post-launch/integration/ship rules

DREAD certainly meets the first of these, and perhaps the second two. And it was an early attempt at a multi-factor rating of bugs. But there are many problems which DREAD brings that newer approaches deal with.

The most problematic aspect of DREAD is that there’s little consistency, especially in the middle. What counts as a 6 damage versus 7, or 6 versus 7 exploitability? Without calibration, different raters will not be consistent. Each of the scores can be mis-estimated, and there’s a tendency to underestimate things like discoverability of bugs in your own product.

The second problem is that you set an arbitrary bar for fixes: for example, everything above a 6.5 gets fixed. That makes the distinction between a 6 and a 7 sometimes matter a lot. And the score does not relate to what needs to get fixed when a bug is found externally.

This illustrates why Discoverability is an odd thing to bring into the risk equation. You may have a discoverability of 1 on Monday, and 10 on Tuesday. (“Thanks, Full-Disclosure!”) So something could have a 5.5 DREAD score because of low discoverability but require a critical update. Suddenly the DREAD score of the issue is mutable. That makes it hard to use DREAD on an externally discovered bug, or one delivered via a bug bounty. Now you have two bug-ranking systems, and what do you do when they disagree? This happened to Microsoft repeatedly, and led to the switch to a bug bar approach.
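The mutability problem is easy to see in a sketch. The five factors and the 6.5 fix bar come from the discussion above; the averaging and the specific scores are my illustrative assumptions, not Microsoft’s actual implementation:

```python
def dread_score(damage, reproducibility, exploitability,
                affected_users, discoverability):
    """Hypothetical DREAD v1-style score: the mean of five 1-10 factors."""
    return (damage + reproducibility + exploitability +
            affected_users + discoverability) / 5

FIX_BAR = 6.5  # "everything above a 6.5 gets fixed"

# Monday: the bug is privately known, so Discoverability is rated 1.
monday = dread_score(8, 7, 7, 8, 1)    # 6.2: below the bar, deferred
# Tuesday: full disclosure. Only Discoverability changed, to 10.
tuesday = dread_score(8, 7, 7, 8, 10)  # 8.0: now demands a critical fix
assert monday < FIX_BAR < tuesday
```

Nothing about the bug itself changed between the two scores; a single rater-controlled factor moved it from “defer” to “critical,” which is exactly why an immutable rating and an external-report process can’t both be built on it.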

Affected users is also odd: does an RCE in Flight Simulator matter less than one in Word? Perhaps in the grand scheme of things, but I hope the Flight Simulator team is fixing their RCEs.

Stepping beyond the problems internal to DREAD to its use within a software organization: it only measures half of what you need to measure. You need to measure both the security severity and the fix cost. Otherwise, you run the risk of finding something with a DREAD of 10 that’s a key feature (Office macros), and so it escalates, and you don’t fix it. Other issues are easy to fix (S3 bucket permissions), and so it doesn’t matter if you thought discoverability was low. This problem is shared by other systems, but DREAD’s focus on a crisp line (everything above a 6.5 gets fixed) exacerbates the issue.

For all these reasons, with regards to DREAD? Fully skeptical, and I have been for over a decade. If you want to fix these things, the thing to do is not create confusion by saying “DREAD can also be a 1-3 system!”, but to version and revise DREAD, for example, by releasing DREAD 2. I’m exploring a similar approach with DFDs.

I’m hopeful that this post can serve as a collection of reasons to not use DREAD v1, or properties that a new system should have. What’d I miss?

Threat Model Thursday: Google on Kubernetes

There’s a recent post on the Google Cloud Platform Blog, “Exploring container security: Isolation at different layers of the Kubernetes stack” that’s the subject of our next Threat Model Thursday post. As always, our goal is to look and see what we can learn, not to say ‘this is bad.’ There’s more than one way to do it. Also, last time, I did a who/what/why/how analysis, which turned out to be pretty time-consuming, and so I’m going to avoid that going forward.

The first thing to point out is that there’s a system model that’s intended to support multiple analyses of ‘what can go wrong.’ (“Sample scenario…Time to do a little threat modeling.”) This is a very cool demonstration of how to communicate about security along a supply chain. In this instance, the answers to “what are we working on” vary with who “we” are. That might be the Kubernetes team, or it might be someone using Kubernetes to implement a multi-tenant SaaS workload.

What are we working on?

The answers to this are either Kubernetes or the multi-tenant system. The post includes a nice diagram (reproduced above) of Kubernetes and its boundaries. Speaking of boundaries, they break out security boundaries which enforce the trust boundaries. I’ve also heard ‘security boundaries’ referred to as ‘security barriers’ or ‘boundary enforcement.’ They also say “At Google, we aim to protect all trust boundaries with at least two different security boundaries that each need to fail in order to cross a trust boundary.”

But you can use this diagram to help you either improve Kubernetes, or to improve the security of systems hosted in Kubernetes.

What can go wrong?

Isolation failures. You get a resource isolation failure…you get a network isolation failure, everyone gets an isolation failure! Well, no, not really. You only get an isolation fail if your security boundaries fail. (Sorry! Sorry?)

Oprah announcing everyone gets an isolation failure

This use of isolation is interestingly different from STRIDE or ATT&CK. In many threat models of userland code on a desktop, the answer is ‘and then they can run code, and do all sorts of things.’ The isolation failure was the end of a chain, rather than the start, and you focus on the spoof, tamper or EoP/RCE that gets you onto the chain. In that sense, isolation may seem frustratingly vague for many readers. But isolation is a useful property to have, and more importantly, it’s what we’re asking Kubernetes to provide.

There’s also mention of cryptomining (and cgroups as a fix) and running untrusted code (use sandboxing). Especially with regard to untrusted code, which may or may not be inside a web trust boundary, I’d like to see more discussion of how to run it semi-safely, which is to say attempting to control its output, or safely, which is to say in a separate web namespace.

What are we going to do about it?

You can use this model to decide what you’re going to do about it. How far up or down the list of isolations should you be? Does your app need its own container, pod, node, cluster?

I would like to see more precision in the wording of the controls — what does ‘some’ control-plane isolation mean? Is it a level of effort to overcome, a set of things you can rely on and some you can’t? The crisp expression of these qualities isn’t easy, but the authors are in a better position to express them than their readers. (There may be more on this elsewhere in the series.)

Did we do a good job?

There’s no explicit discussion, but my guess is that the post was vetted by a great many people.

To sum up, this is a great example of using threat modeling to communicate between supplier and customer. By drawing the model and using it to threat model, they help people decide if GCP is right, and if so, how to configure it in the most secure way.

What do you see in the models?

Joining the Continuum Team

I’m pleased to share the news that I’ve joined Continuum Security’s advisory board. I am excited about the vision that Continuum is bringing to software security: “We help you design, build and manage the security of your software solutions.” They’re doing so for both happy customers and a growing community. And I’ve come to love their framing: “Security is not special. Performance, quality and availability is everyone’s responsibility and so is security. After all, who understands the code and environment better than the developers and ops teams themselves?” They’re right. Security has to earn our seat at the table. We have to learn to collaborate better, and that requires software that helps the enterprise manage application security risk from the start of, and throughout, the software development process.

Threat Model Thursday: Q&A

In a comment on “Threat Model Thursday: ARM’s Network Camera TMSA“, Dips asks:

Would it been better if they had been more explicit with their graphics ? I am a beginner in Threat Modelling and would have appreciated a detailed diagram denoting the trust boundaries. Do you think it would help? Or it would further complicate?

That’s a great question, and exactly what I hoped for when I thought about a series. The simplest answer is ‘probably!’ More explicit boundaries would be helpful. My second answer is ‘that’s a great exercise!’ Where could the boundaries be placed? What would enforce them there? Where else could you put them? What are the tradeoffs between the two?

My third answer is to re-phrase the question. Rather than asking ‘would it help,’ let’s ask ‘who might be helped by better boundary demarcation,’ ‘when would it help them,’ and ‘is this the most productive thing to improve?’ I would love to hear everyone’s perspective.

Lastly, it would be reasonable to expect that Arm might produce a model that depends on the sorts of boundaries that their systems can help protect. It would be really interesting to see a model from a different perspective. If someone draws one or finds one, I’d be happy to look at it for the next article in the series.

Designing for Good Social Systems

There’s a long story in the New York Times, “Where Countries Are Tinderboxes and Facebook Is a Match:”

A reconstruction of Sri Lanka’s descent into violence, based on interviews with officials, victims and ordinary users caught up in online anger, found that Facebook’s newsfeed played a central role in nearly every step from rumor to killing. Facebook officials, they say, ignored repeated warnings of the potential for violence, resisting pressure to hire moderators or establish emergency points of contact.

I’ve written previously about the drama triangle, how social media drives engagement through dopamine and hatred, and a tool to help you breathe through such feelings.

These social media tools are dangerous, not just to our mental health, but to the health of our societies. They are actively being used to fragment, radicalize and undermine legitimacy. The techniques to drive outrage are developed and deployed at rates that are nearly impossible for normal people to understand or engage with. We, and these platforms, need to learn to create tools that preserve the good things we get from social media, while inhibiting the bad. And in that sense, I’m excited to read about “20 Projects Will Address The Spread Of Misinformation Through Knight Prototype Fund.”

We can usefully think of this as a type of threat modeling.

  • What are we working on? Social technology.
  • What can go wrong? Many things, including threats, defamation, and the spread of fake news. Each new system context brings with it new types of failure. We have to extend our existing models and create new ones to address those.
  • What are we going to do about it? The Knight prototypes are an interesting exploration of possible answers.
  • Did we do a good job? Not yet.

These emergent properties of the systems are not inherent. Different systems have different problems, and that means we can discover how design choices interact with these downsides. I would love to hear about other useful efforts to understand and respond to these emergent types of threats. How do we characterize the attacks? How do we think about defenses? What’s worked to minimize the attacks or their impacts on other systems? What “obvious” defenses, such as “real names,” tend to fail?

Image: Washington Post