
DNS Security

I’m happy to say that some new research by Jay Jacobs, Wade Baker, and myself is now available, thanks to the Global Cyber Alliance.

They asked us to look at the value of DNS security, such as when your DNS provider uses threat intel to block malicious sites. It’s surprising how effective it is for a tool that’s so easy to deploy. (Just point to a DNS server like 9.9.9.9.)
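To see the effect for yourself, here's a minimal sketch (mine, not from the report) that asks a protective resolver (Quad9, 9.9.9.9) and an unfiltered resolver for the same domain and compares the answers. It assumes the third-party dnspython package, and "known-bad.example" is a hypothetical placeholder for a domain you know is on a blocklist.

```python
# Minimal sketch: compare a protective resolver with an unfiltered one.
# Assumes the third-party dnspython package (pip install dnspython).
import dns.exception
import dns.resolver

def resolves(domain: str, nameserver: str) -> bool:
    """Return True if the given nameserver answers with an A record."""
    resolver = dns.resolver.Resolver(configure=False)
    resolver.nameservers = [nameserver]
    try:
        resolver.resolve(domain, "A")
        return True
    except (dns.resolver.NXDOMAIN, dns.resolver.NoAnswer,
            dns.resolver.NoNameservers, dns.exception.Timeout):
        return False

domain = "known-bad.example"  # hypothetical placeholder for a blocklisted domain
print("Unfiltered resolver answers:", resolves(domain, "8.8.8.8"))
print("Protective resolver answers:", resolves(domain, "9.9.9.9"))
```

If the protective resolver declines to answer for a domain the unfiltered one resolves, that's the blocking at work.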


The report is available from GCA’s site: Learn About How DNS Security Can Mitigate One-Third of Cyber Incidents

Polymorphic Warnings On My Mind

There’s a fascinating paper, “Tuning Out Security Warnings: A Longitudinal Examination of Habituation Through fMRI, Eye Tracking, and Field Experiments.” (It came out about a year ago.)

The researchers examined what happens in people’s brains when they look at warnings, and they found that:

Research in the fields of information systems and human-computer interaction has shown that habituation—decreased response to repeated stimulation—is a serious threat to the effectiveness of security warnings. Although habituation is a neurobiological phenomenon that develops over time, past studies have only examined this problem cross-sectionally. Further, past studies have not examined how habituation influences actual security warning adherence in the field. For these reasons, the full extent of the problem of habituation is unknown.

We address these gaps by conducting two complementary longitudinal experiments. First, we performed an experiment collecting fMRI and eye-tracking data simultaneously to directly measure habituation to security warnings as it develops in the brain over a five-day workweek. Our results show not only a general decline of participants’ attention to warnings over time but also that attention recovers at least partially between workdays without exposure to the warnings. Further, we found that updating the appearance of a warning—that is, a polymorphic design—substantially reduced habituation of attention.

Second, we performed a three-week field experiment in which users were naturally exposed to privacy permission warnings as they installed apps on their mobile devices. Consistent with our fMRI results, users’ warning adherence substantially decreased over the three weeks. However, for users who received polymorphic permission warnings, adherence dropped at a substantially lower rate and remained high after three weeks, compared to users who received standard warnings. Together, these findings provide the most complete view yet of the problem of habituation to security warnings and demonstrate that polymorphic warnings can substantially improve adherence.

It’s not short, but it’s not hard reading. Worthwhile if you care about usable security.
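To make the polymorphic idea concrete, here's a minimal sketch (my illustration, not the researchers' implementation) of a warning whose presentation varies on each display, so repeated exposures don't look identical:

```python
# Minimal sketch of a "polymorphic" warning: same message, varied presentation.
import random

VARIATIONS = [
    {"border": "red", "icon": "!", "layout": "banner"},
    {"border": "orange", "icon": "?", "layout": "dialog"},
    {"border": "yellow", "icon": "!!", "layout": "toast"},
]

def render_warning(message: str) -> str:
    """Pick a different look each time the warning is shown."""
    style = random.choice(VARIATIONS)
    return f"[{style['layout']} | border={style['border']}] {style['icon']} {message}"

for _ in range(3):
    print(render_warning("This app can read your contacts."))
```

The paper's warnings varied in more sophisticated ways, but the principle is the same: change the stimulus so the brain keeps attending to it.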

Leave Those Numbers for April 1st

“90% of attacks start with phishing!*” “Cyber attacks will cost the world $6 trillion by 2020!”

We’ve all seen these sorts of numbers from vendors, and in a sense they’re April Fools’ Day numbers: you’d have to be a fool to believe them. But vendors quote insane numbers because there’s no downside and plenty of upside. We need to create a downside, and the road there lies through lost sales.

We need to call vendors on these numbers and say, “I’m sorry, but if you’d lie to me about that, what should I make of the numbers you’re claiming that are hard to verify? The door is to your left.”

If we want to change the behavior, we have to change its consequences. We need to tell vendors that there’s no place for made-up, debunked, or unsupported numbers in our buying processes. If those numbers are in their sales and marketing material, they’re going to lose business over it.

* This one seems to trace back to analysis that 90% of APT attacks in the Verizon DBIR started with phishing, but APT and non-APT attacks are clearly different.

There’s an interesting article in the CBC, where journalists took a set of flights, swabbed surfaces, and worked with a microbiologist to culture their samples.

What they found will shock you!

Well, airplanes are filthy. Not really shocking. What was surprising to me was that the dirtiest of the surfaces they tested was the headrest. (They did not test the armrests.) Also, the seat pocket is a nice incubator and rarely cleaned. Not all that surprising, but I hadn’t considered it.

I’m pleased to be able to share work that Shostack & Associates and the Cyentia Institute have been doing for the Global Cyber Alliance. In doing this, we created some new threat models for email, and some new statistical analysis of the value of deploying DMARC.

It shows that the 1,046 domains that have successfully activated strong protection with GCA’s DMARC tools will save an estimated $19 million to $66 million from limiting BEC in 2018 alone, and these organizations will continue to reap that reward every year in which they maintain their DMARC deployment.
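For readers who want to check their own domains, here's a minimal sketch (not GCA's tooling) that looks up a domain's DMARC record and reports whether the published policy is an enforcing one (p=quarantine or p=reject); it assumes the third-party dnspython package:

```python
# Minimal sketch: fetch a domain's DMARC policy from its _dmarc TXT record.
# Assumes the third-party dnspython package (pip install dnspython).
from typing import Optional

import dns.resolver

def dmarc_policy(domain: str) -> Optional[str]:
    """Return the p= policy from the domain's _dmarc TXT record, if any."""
    try:
        answers = dns.resolver.resolve(f"_dmarc.{domain}", "TXT")
    except (dns.resolver.NXDOMAIN, dns.resolver.NoAnswer):
        return None
    for rdata in answers:
        record = b"".join(rdata.strings).decode()
        if record.lower().startswith("v=dmarc1"):
            tags = dict(
                part.strip().split("=", 1)
                for part in record.split(";")
                if "=" in part
            )
            return tags.get("p")
    return None

policy = dmarc_policy("example.com")
print("enforcing" if policy in ("quarantine", "reject") else f"policy: {policy}")
```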

Their press release from this morning is here, and the report download is here.

The DREAD Pirates

Then he explained the name was important for inspiring the necessary fear. You see, no one would surrender to the Dread Pirate Westley.

The DREAD approach was created early in the security pushes at Microsoft as a way to prioritize issues. It’s not a very good way; you see, no one would surrender to the Bug Bar Pirate Roberts. And so the approach keeps going, despite its many problems.

There are many properties one might want in a bug ranking system for internally found bugs. They include:

  • A cool name
  • A useful mnemonic
  • A reduction in argument about bugs
  • Consistency between raters
  • Alignment with intuition
  • Immutability of ratings: the bug is rated once, and then is unlikely to change
  • Alignment with post-launch/integration/ship rules

DREAD certainly meets the first of these, and perhaps the second and third. And it was an early attempt at a multi-factor rating of bugs. But there are many problems which DREAD brings that newer approaches deal with.

The most problematic aspect of DREAD is that there’s little consistency, especially in the middle of the scale. What counts as a damage of 6 versus 7, or an exploitability of 6 versus 7? Without calibration, different raters will not be consistent. Each of the scores can be mis-estimated, and there’s a tendency to underestimate things like the discoverability of bugs in your own product.

The second problem is that you set an arbitrary bar for fixes, for example, everything above a 6.5 gets fixed. That means the distinction between a 6 and a 7 sometimes matters a lot, and the score does not relate to what needs to get fixed when a bug is found externally.

This illustrates why Discoverability is an odd thing to bring into the risk equation. You may have a discoverability of 1 on Monday, and 10 on Tuesday. (“Thanks, Full-Disclosure!”) So something could have a 5.5 DREAD score because of low discoverability but still require a critical update; suddenly the DREAD score of the issue is mutable. That makes it hard to use DREAD on an externally discovered bug, or one delivered via a bug bounty. Now you have two bug-ranking systems, and what do you do when they disagree? This happened to Microsoft repeatedly, and led to the switch to a bug bar approach.
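A small sketch (my illustration, not an official DREAD tool) shows how that mutability interacts with an arbitrary fix bar: a single change in Discoverability flips the same bug from "defer" to "fix."

```python
# Minimal sketch of classic DREAD: average of five 1-10 ratings,
# with an arbitrary "fix anything above 6.5" bar.
from statistics import mean

def dread_score(damage, reproducibility, exploitability, affected_users, discoverability):
    """Classic DREAD: the mean of five 1-10 ratings."""
    return mean([damage, reproducibility, exploitability, affected_users, discoverability])

FIX_BAR = 6.5  # arbitrary threshold

# Monday: the bug is obscure, so Discoverability is rated 1.
monday = dread_score(8, 7, 7, 8, 1)
# Tuesday: a full-disclosure post appears; Discoverability jumps to 10.
tuesday = dread_score(8, 7, 7, 8, 10)

for label, score in [("Monday", monday), ("Tuesday", tuesday)]:
    print(f"{label}: DREAD={score:.1f} -> {'fix' if score > FIX_BAR else 'defer'}")
```

Nothing about the bug changed overnight except its visibility, yet the decision flips.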

Affected users is also odd: does an RCE in Flight Simulator matter less than one in Word? Perhaps in the grand scheme of things, but I hope the Flight Simulator team is fixing their RCEs.

Stepping beyond the problems internal to DREAD to how DREAD works within a software organization: it only measures half of what you need to measure. You need to measure both the security severity and the fix cost. Otherwise, you run the risk of finding something with a DREAD of 10 that’s a key feature (Office macros), so it escalates, and you don’t fix it. There are other issues which are easy to fix (S3 bucket permissions), so it doesn’t matter if you thought discoverability was low. This problem is shared by other systems, but DREAD’s focus on a crisp line, everything above a 6.5 gets fixed, exacerbates it.

For all these reasons, with regard to DREAD? Fully skeptical, and I have been for over a decade. If you want to fix these things, the thing to do is not to create confusion by saying “DREAD can also be a 1-3 system!”, but to version and revise DREAD, for example by releasing a DREAD 2. I’m exploring a similar approach with DFDs.

I’m hopeful that this post can serve as a collection of reasons to not use DREAD v1, or properties that a new system should have. What’d I miss?

John Harrison’s Struggle Continues

Today is John Harrison’s 352nd birthday, and Google has a doodle to celebrate. Harrison was rescued from historical obscurity by Dava Sobel’s excellent book Longitude, which documented Harrison’s struggle to first build and then demonstrate the superiority of his clocks to the mathematical and astronomical solutions heralded by leading scientists of the day. Their methods were complex, tedious and hard to execute from the deck of a ship.

To celebrate, I’d like to share this photo I took at the Royal Museums Greenwich in 2017:

[Image: framed longitude worksheet, Royal Museums Greenwich]

(A full-size version is on Flickr.)

As the placard says, “First produced in 1768, this worksheet gave navigators an easy process for calculating their longitude using new instruments and the Nautical Almanac. Each naval ship’s master was required to train with qualified teachers in London or Portsmouth in order to gain a certificate of navigational competence.” (Emphasis added.)
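The worksheet supports the Almanac's lunar-distance method; Harrison's chronometers made the arithmetic far simpler. As a minimal sketch (my arithmetic, not the worksheet's), longitude falls straight out of the difference between local apparent noon and the Greenwich time shown on the clock, because the Earth turns 15 degrees per hour:

```python
# Minimal sketch: longitude from a chronometer keeping Greenwich time.
def longitude_from_clock(greenwich_time_at_local_noon_hours: float) -> float:
    """Degrees of longitude; positive means west of Greenwich."""
    hours_behind_greenwich = greenwich_time_at_local_noon_hours - 12.0
    return hours_behind_greenwich * 15.0  # Earth rotates 15 degrees per hour

# The sun is highest (local noon) while the chronometer reads 16:48 GMT:
print(longitude_from_clock(16.8))  # 72.0 degrees west
```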

a fresh look at a 3700-year-old clay tablet suggests that Babylonian mathematicians not only developed the first trig table, beating the Greeks to the punch by more than 1000 years, but that they also figured out an entirely new way to look at the subject. However, other experts on the clay tablet, known as Plimpton 322 (P322), say the new work is speculative at best. (“This ancient Babylonian tablet may contain the first evidence of trigonometry.”)

The paper, “Plimpton 322 is Babylonian exact sexagesimal trigonometry” is short and open access, and also contains this gem:

If this interpretation is correct, then P322 replaces Hipparchus’ ‘table of chords’ as the world’s oldest trigonometric table — but it is additionally unique because of its exact nature, which would make it the world’s only completely accurate trigonometric table. These insights expose an entirely new level of sophistication for OB mathematics.
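To see what "exact" means here, consider a minimal sketch (my illustration, not the authors' reconstruction of P322): Pythagorean triples give side ratios as exact rational numbers, with none of the rounding a conventional sine table requires.

```python
# Minimal sketch: exact side ratios from Pythagorean triples.
from fractions import Fraction

def triple(p: int, q: int):
    """Euclid's formula: a Pythagorean triple from integers p > q > 0."""
    return (p * p - q * q, 2 * p * q, p * p + q * q)

for p, q in [(2, 1), (3, 2), (4, 1)]:
    a, b, c = triple(p, q)
    # The ratios a trig table would hold are exact fractions here, no rounding.
    print(f"sides {a},{b},{c}: b/a = {Fraction(b, a)}, c/a = {Fraction(c, a)}")
```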

Do Games Teach Security?

There’s a new paper from Mark Thompson and Hassan Takabi of the University of North Texas. The title captures the question:
Effectiveness of Using Card Games to Teach Threat Modeling for Secure Web Application Developments

Gamification of classroom assignments and online tools has grown significantly in recent years. There have been a number of card games designed for teaching various cybersecurity concepts. However, effectiveness of these card games is unknown for the most part and there is no study on evaluating their effectiveness. In this paper, we evaluate effectiveness of one such game, namely the OWASP Cornucopia card game which is designed to assist software development teams identify security requirements in Agile, conventional and formal development processes. We performed an experiment where sections of graduate students and undergraduate students in a security related course at our university were split into two groups, one of which played the Cornucopia card game, and one of which did not. Quizzes were administered both before and after the activity, and a survey was taken to measure student attitudes toward the exercise. The results show that while students found the activity useful and would like to see this activity and more similar exercises integrated into the classroom, the game was not easy to understand. We need to spend enough time to familiarize the students with the game and prepare them for the exercises using the game to get the best results.

I’m very glad to see games like Cornucopia evaluated. If we’re going to push the use of Cornucopia (or Elevation of Privilege) for teaching, then we ought to be thinking about how well they work in comparison to other techniques. We have anecdotes, but to improve, we must test and measure.
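As a minimal sketch of the kind of measurement involved (illustrative data, not the paper's), a paired comparison of pre- and post-activity quiz scores is one simple way to test whether a game moved the needle; this assumes scipy is available:

```python
# Minimal sketch: did quiz scores improve after playing the game?
# Illustrative, made-up scores; requires scipy (pip install scipy).
from scipy import stats

pre = [55, 60, 48, 70, 62, 58, 66, 51]    # hypothetical pre-activity scores (%)
post = [63, 64, 55, 74, 70, 61, 72, 58]   # hypothetical post-activity scores (%)

t_stat, p_value = stats.ttest_rel(post, pre)
print(f"mean improvement: {sum(post)/len(post) - sum(pre)/len(pre):.1f} points")
print(f"paired t-test: t={t_stat:.2f}, p={p_value:.3f}")
```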

What Boards Want in Security Reporting

[Image: a sub-optimal security dashboard]

Recently, some of my friends were talking about a report by Bay Dynamics, “How Boards of Directors Really Feel About Cyber Security Reports.” In that report, we see things like:

  • More than three in five board members say they are both significantly or very “satisfied” (64%) and “inspired” (65%) after the typical presentation by IT and security executives about the company’s cyber risk, yet the majority (85%) of board members believe that IT and security executives need to improve the way they report to the board.
  • Only one-third of IT and security executives believe the board comprehends the cyber security information provided to them, versus 70% of board members surveyed who report that they understand everything they’re being told by IT and security executives in their presentations.

Some of this may be poor survey design or reporting: it’s hard to survey someone to see if they don’t understand, and the questions aren’t listed in the survey.

But that may be taking the easy way out. Perhaps what we’re being told is consistent. Security leaders don’t think the boards are getting the nuance, while the boards are getting the big picture just fine. Perhaps boards really do want better reporting, and, having nothing useful to suggest, consider themselves “satisfied.”

They ask for numbers, but not because they really want numbers. I’ve come to believe that the reason they ask for numbers is that they lack a feel for the risks of cyber. They understand risks in things like product launches or moving manufacturing to China, or making the wrong hire for VP of social media. They are hopeful that in asking for numbers, they’ll learn useful things about the state of what they’re governing.

So what do boards want in security reporting? They want concrete, understandable, and actionable reports. They want to know if they have the right hands on the rudder, and if those hands are reasonably resourced. (Boards also know that no one who reports to them is ever really satisfied with their budget.)

(Lastly, the graphic? Overly complex, not actionable, lacks explicit recommendations or requests. It’s what boards don’t want.)
