Shostack + Friends Blog

 

Data Flow Diagrams 3.0

[Figure: data flow diagram sample]

In the Brakesec podcast, I used a new analogy for why we need to name our work. When we talk about cooking, we have very specific recipes that we talk about: Julia Child's beef bourguignon. Paul Prudhomme's blackened fish. We hope that new cooks will follow the recipes until they get a feel for them, and that they can then start adapting and modifying them, as they generate mental models of what they're doing.

But when we talk about threat modeling, we don't label our recipes. We say "this is how to threat model," as if that's not as broad as "this is how to cook."

And in that podcast, I realized that I've been guilty of definition drift in how I talk about data flow diagrams. Data flow diagrams (DFDs) are also called 'threat model diagrams' because they're so closely associated with threat modeling. And as I've used them over the course of a decade, there have been many questions:

  • Do you start with a context diagram?
  • What's a multi-process, and when should I use one?
  • Do I really need to draw single-headed arrows? They make my diagram hard to read!
  • Is this process inside this arc? Is an arc the best way to show a trust boundary?
  • Should I color things?

In response to those questions, I've initiated changes, such as showing a process as a rounded rectangle (versus a circle), eliminating rules such as "all arrows are uni-directional," and advocating for trust boundaries as labeled boxes.

What I have not done is be crisp about what these changes are, in a way that lets a team say "we use v3 DFDs" the way they might say "we use Python 3." (ok, no one says either, I know!)

I'm going to retroactively label all of these changes as DFD3.0. DFD v1 was a 1970s construct. DFD2 was the critical addition of trust boundaries. And a version 3 DFD is defined as follows:

  1. It uses 5 symbols. A rectangle represents an external entity, a person or code outside your control. A rounded rectangle represents a process. They're connected by arrows, which can be single or double headed. Data stores are represented by parallel lines. A trust boundary is a closed shape, usually a box. All lines are solid, except those used for trust boundaries, which are dashed or dotted. (There is no "multi-process" symbol in DFD3.)
  2. It must not* depend on the use of color, but can use color for additional information.
  3. All elements should have a label.
  4. You may have a context diagram if the system is complex. One is not required.

* Must, must not, should, and should not are used per IETF norms (RFC 2119).
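
To make the definition concrete, here is a minimal Python sketch of a checker for the rules above. It is an illustration under stated assumptions, not part of DFD3 or any existing tool: the names `Kind`, `Element`, and `check_dfd3` are hypothetical, and a diagram is assumed to be a flat list of elements. It only covers the rules that are easy to test mechanically (line styles, arrow heads, and labels); the color rule and the optional context diagram are left out.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional, Tuple


class Kind(Enum):
    """The five DFD3 symbols. There is deliberately no multi-process."""
    EXTERNAL_ENTITY = "rectangle"
    PROCESS = "rounded rectangle"
    DATA_FLOW = "arrow"
    DATA_STORE = "parallel lines"
    TRUST_BOUNDARY = "closed shape"


@dataclass
class Element:
    # Hypothetical representation of one drawn element.
    kind: Kind
    label: str = ""
    line_style: str = "solid"    # "solid", "dashed", or "dotted"
    heads: Optional[int] = None  # arrows only: 1 or 2 arrowheads


def check_dfd3(elements: List[Element]) -> List[Tuple[int, str]]:
    """Return (rule number, message) pairs for each problem found."""
    findings = []
    for index, element in enumerate(elements):
        name = element.label.strip() or f"element #{index}"

        # Rule 1: only trust boundaries are dashed or dotted.
        if element.kind is Kind.TRUST_BOUNDARY:
            if element.line_style not in ("dashed", "dotted"):
                findings.append((1, f"trust boundary '{name}' is not dashed or dotted"))
        elif element.line_style != "solid":
            findings.append((1, f"'{name}' is not drawn with a solid line"))

        # Rule 1: arrows are single or double headed.
        if element.kind is Kind.DATA_FLOW and element.heads not in (1, 2):
            findings.append((1, f"arrow '{name}' needs one or two heads"))

        # Rule 3: all elements should have a label.
        if not element.label.strip():
            findings.append((3, f"{name} should have a label"))
    return findings


diagram = [
    Element(Kind.EXTERNAL_ENTITY, "Customer"),
    Element(Kind.PROCESS, "Web app"),
    Element(Kind.DATA_FLOW, "Order request", heads=2),
    Element(Kind.DATA_STORE, "Orders"),
    Element(Kind.TRUST_BOUNDARY, "Data center", line_style="dashed"),
]
for rule, message in check_dfd3(diagram):
    print(f"rule {rule}: {message}")
```

The sample diagram follows the rules, so the loop prints nothing. Because the symbol set is an enum, adding a hypothetical DFD3.1 "drum" symbol would be a visible, named change rather than a silent redefinition.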

This also allows us to talk about what might be in a DFD3.1. I know that I usually draw disks with the "drum" symbol, and I see a lot of people using that. It seems like a reasonable addition.

Using specific naming also allows us to fork. If you want to define a different type of DFD, have at it. If we have a bunch, we can figure out how to keep things clear. Oh, and speaking of forking, I put this on GitHub: DFD3.

Using specific naming allows us to talk about testing and maturity in the sense of "this is in alpha test." "This has been used for several years, we took feedback, adjusted, and now it's release quality." I think that DFD3 is release quality, but it probably needs some beta testing for the definitions.

Similarly, DREAD has a bunch of problems, including a lack of definition. I use mentions of DREAD as a way to see if people are threat modeling well. And one challenge there is that people silently redefine DREAD to mean something other than what it meant when Michael Howard and David LeBlanc talked about it in Writing Secure Code (2nd ed, 2003). If you want to build something new, your customers and users need to understand that it's new, so they don't get confused by it. Therefore, you need to give your new thing a new name. You could call it DREAD2, DRE4D, DRECK, I don't really care. What I care about is that it's easily distinguished, and the first step towards that is a new name.

[Update: What's most important is not the choices that I've made for what's in DFD3, but the grouping of those choices into DFD3, so that you can make your own choices and our tools can compete in the market.]