Data Flow Diagrams 3.0

In the Brakesec podcast, I used a new analogy for why we need to name our work. When we talk about cooking, we have very specific recipes that we talk about: Julia Child’s beef bourguignon. Paul Prudhomme’s blackened fish. We hope that new cooks will follow the recipes until they get a feel for them, and that they can then start adapting and modifying them, as they generate mental models of what they’re doing.

But when we talk about threat modeling, we don’t label our recipes. We say this is how to threat model, as if that’s not as broad as “this is how to cook.”

And in that podcast, I realized that I’ve been guilty of definition drift in how I talk about data flow diagrams. Data flow diagrams (DFDs) are also called ‘threat model diagrams’ because they’re so closely associated with threat modeling. And as I’ve used them over the course of a decade, there have been many questions:

  • Do you start with a context diagram?
  • What’s a multi-process, and when should I use one?
  • Do I really need to draw single-headed arrows? They make my diagram hard to read!
  • Is this process inside this arc? Is an arc the best way to show a trust boundary?
  • Should I color things?

In response to those questions, I’ve initiated changes, such as showing a process as a rounded rectangle (versus a circle), eliminating rules such as “all arrows are uni-directional,” and advocating for trust boundaries as labeled boxes.

What I have not done is been crisp about what these changes are in a way that lets a team say “we use v3 DFDs” the way they might say “we use Python 3.” (ok, no one says either, I know!)

I’m going to retroactively label all of these changes as DFD3.0. DFD v1 was a 1970s construct. DFD2 was the critical addition of trust boundaries. And a version 3 DFD is defined as follows:

  1. It uses 5 symbols. A rectangle represents an external entity, a person or code outside your control. A rounded rectangle represents a process. They’re connected by arrows, which can be single or double headed. Data stores are represented by parallel lines. A trust boundary is a closed shape, usually a box. All lines are solid, except those used for trust boundaries, which are dashed or dotted. (There is no “multi-process” symbol in DFD3.)
  2. It must not* depend on the use of color, but can use color for additional information.
  3. All elements should have a label.
  4. You may have a context diagram if the system is complex. One is not required.

* Must, must not, should, should not are used per IETF norms (RFC 2119).
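
The definition above is concrete enough to check mechanically. What follows is a minimal sketch, not part of DFD3 itself: the names `Kind`, `Element`, and `check_dfd3` are hypothetical, and the split between errors and warnings is an assumption (line-style violations are treated as errors, a missing label as a warning because rule 3 says “should”). Rule 2 (no dependence on color) and rule 4 (optional context diagram) are not checked; color is only carried along as optional extra information.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional, Tuple


class Kind(Enum):
    """The five DFD3 symbols. There is deliberately no multi-process."""
    EXTERNAL_ENTITY = "rectangle"
    PROCESS = "rounded rectangle"
    DATA_FLOW = "arrow (single or double headed)"
    DATA_STORE = "parallel lines"
    TRUST_BOUNDARY = "closed shape, usually a box"


@dataclass
class Element:
    kind: Kind
    label: str = ""
    line_style: str = "solid"      # "solid", "dashed", or "dotted"
    color: Optional[str] = None    # optional; extra information only (rule 2)


def check_dfd3(elements: List[Element]) -> Tuple[List[str], List[str]]:
    """Return (errors, warnings) for a list of diagram elements."""
    errors: List[str] = []
    warnings: List[str] = []
    for index, el in enumerate(elements):
        name = el.label.strip() or f"unlabeled {el.kind.name} at index {index}"
        # Rule 1: all lines are solid except trust boundaries.
        if el.kind is Kind.TRUST_BOUNDARY:
            if el.line_style not in ("dashed", "dotted"):
                errors.append(f"{name}: trust boundaries are dashed or dotted")
        elif el.line_style != "solid":
            errors.append(f"{name}: only trust boundaries may be dashed or dotted")
        # Rule 3: all elements should have a label ("should", so a warning).
        if not el.label.strip():
            warnings.append(f"{name}: all elements should have a label")
    return errors, warnings


# Example diagram with one unlabeled data flow and one solid trust boundary.
diagram = [
    Element(Kind.EXTERNAL_ENTITY, "Customer"),
    Element(Kind.PROCESS, "Web app", color="blue"),
    Element(Kind.DATA_STORE, "Orders DB"),
    Element(Kind.DATA_FLOW, ""),
    Element(Kind.TRUST_BOUNDARY, "Data center", line_style="solid"),
]
errors, warnings = check_dfd3(diagram)
print("errors:", errors)
print("warnings:", warnings)
```

Modeling the symbols as a closed enumeration is one way to make “we use v3 DFDs” testable: a DFD3.1 that adds the drum symbol would show up as one new member of `Kind` rather than as a silent redefinition.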

This also allows us to talk about what might be in a DFD3.1. I know that I usually draw disks with the “drum” symbol, and I see a lot of people using that. It seems like a reasonable addition.


Using specific naming also allows us to fork. If you want to define a different type of DFD, have at it. If we have a bunch, we can figure out how to keep things clear. Oh, and speaking of forking, I put this on GitHub: DFD3.

Using specific naming allows us to talk about testing and maturity in the sense of “this is in alpha test.” “This has been used for several years, we took feedback, adjusted, and now it’s release quality.” I think that DFD3 is release quality, but it probably needs some beta testing for the definitions.

Similarly, DREAD has a bunch of problems, including a lack of definition. I use mention of DREAD as a way to see if people are threat modeling well. And one challenge there is that people silently redefine DREAD to mean something other than what it meant when Michael Howard and David LeBlanc talked about it in Writing Secure Code (2nd ed, 2003). If you want to build something new, your customers and users need to understand that it’s new, so they don’t get confused by it. Therefore, you need to give your new thing a new name. You could call it DREAD2, a DRE4D, DRECK, I don’t really care. What I care about is that it’s easily distinguished, and the first step towards that is a new name.

[Update: What’s most important is not the choices that I’ve made for what’s in DFD3, but the grouping of those choices into DFD3, so that you can make your own choices and our tools can compete in the market.]

6 thoughts on “Data Flow Diagrams 3.0”

  1. So a bit more on the changes:
    Why box trust boundaries? Because it’s clear what’s inside them, and what’s not. This doesn’t always matter. For example, in
    https://en.wikipedia.org/wiki/Threat_model#/media/File:Data_Flow_Diagram_-_Online_Banking_Application.jpg an arc is fine but in figure 2 of https://technet.microsoft.com/en-us/security/hh855044.aspx are the ui service and the news feed service inside the same boundary? If you always draw boxes you don’t need to re-check to see if they’re clear.

    Rounded rectangles replace circles for space efficiency.

  2. Here you are: Damage, Reproducibility, Exploitability, Containability (how noisy it is, therefore how easy to discover), Knowledge required. Now aren’t you sorry for what you’ve done? 🙂

  3. Question, why allow both single-headed and double-headed arrows?
    Double-headed arrows don’t show which direction the data is flowing and ‘who is calling who’ first. Single-headed arrows make it clear that data comes from one thing and goes to another.

    1. Great question! The answer is that in practice, people rebel against the rule. I saw people hack up the Visio stencils in the MS TM tool to let themselves use two-headed arrows. So we need strong reasoning to block it.

      There are arguments in favor of the dual-head arrows: diagrams get really busy when every data flow is represented by two arrows, and it’s twice the work to draw two arrows.

      I’ve seen people use a convention, “filled arrowhead shows who called.” That works pretty well, but it’s subtle and the detail is easy to miss.

      1. Thank you for the reply! That makes sense for building more buy-in. I could imagine many flows where it doesn’t matter ‘who calls who’ as much as that there is a connection there in the first place.

        I guess the main issue for a lot of flows is that step 1 might be the request and then the response might be step 10. A double headed arrow doesn’t lend itself to show the order of calls very well. (Though, denying a perfectly good data flow diagram over a double headed arrow would be a great way to make friends ;P )

        1. You’re welcome!

          I think my key point is not that one method is right and the other is wrong, but rather we should see this like Perl and Python: we can use different languages at different times.
