The Fights We Have to Fight: Fixing Bugs

One of the recurring lessons from Petroski is how great engineers overcome not only the challenges of physical engineering: calculating loads, determining build orders, but they also overcome the real world challenges to their ideas, including financial and political ones. For example:

Many a wonderful concept, beautifully drawn by an inspired structural artist, has never risen off the paper because its cost could not be justified. Most of the great bridges of the nineteenth century, which served to define bridge building and other technological achievements for the twentieth century, were financed by private enterprise, often led by the expanding railroads. Engineers acting as entrepreneurs frequently put together the prospectuses, and in some cases almost single-handedly promoted their dreams to the realists. […] Debates over how to pay for them were common. (Engineers of Dreams: Great Bridge Builders and the Spanning of America, Henry Petroski)

Many security professionals have a hobby of griping that products get rushed to market, maybe to be secured later. We have learned to be more effective at building security in, and in doing so, reduce product costs and increase on-time delivery. But some products were built before we knew how to do that, and others are going to get built by companies who choose not to do that. And in that sense, Collin Greene’s retrospective, “Fixing Security Bugs” is very much worth your time. It’s a retrospective on the Vista security program from a pen-test perspective.

Hacking: Exciting.
Finding bugs: Exciting.
Fixing those bugs: Not exciting.
The thing is, the finish line for our job in security is getting bugs fixed¹, not just found and filed. Doing this effectively is not a technology problem. It is a communications, organizational² and psychology problem.

I joined Microsoft while the Vista pen test was finishing up, and so my perspective is complimentary. I’d like to add a few additional perspectives to his points.

First, he asks “is prioritization correct?” After Vista, the SDL team created security bug bars, and then later refined them to align with the MSRC update priorities. That alignment with the MSRC priorities was golden. It made it super-clear that if you didn’t fix this before ship, you were going to have to do an update later. As a security engineer, you need to align your prioritization to the all up delivery priorities. Having everything be “extremely critical,” “very critical,” or “moderately critical” means you don’t know what matters, and so nothing does.

Second, “why security matters” was still a fight to be fought in Vista. By Windows 7, security had completed its “move left.” The spec form contained sections for security and privacy. Threat model review was a gate for start of coding. These process changes happened while developers were “rebelling” against Vista’s “overweight” engineering process. They telegraphed that security mattered to management and executives. As a security engineer, you need to get management to spend time talking about how security is balanced with other priorities.

Third, he points out that escalating to a manager can feel bad, but he’s right: “Often the manager has the most context on priorities.” Management saying “get this fixed” is an expression of prioritization. If you’ve succeeded in your work on “why security matters,” then management will know that they need to reinforce that message. Bringing the issues to them, responsibly, helps them get their job done. If it feels bad to escalate, then it’s worth asking if you have full buy in on security.

Now, I’m talking about security as if it matters to management. More and more, that’s the case. Something in the news causes leadership to say “we have to do better,” and they believe that there are things that they can do. In part that belief is because very large companies have been talking about how to make it work. But when that belief isn’t there, it’s your job as an engineer to, as Petroski says, single-handedly promote your dreams to the realists. Again, Greene’s post is full of good ideas.

Lastly, not everything is a bug. I discussed vulnerabilities versus design recently in “Emergent Design Issues.”

(Photo: https://www.pexels.com/photo/black-and-brown-insect-37733/)

Why is “Reply” Not the Strongest Signal?

So apparently my “friends” at outlook.com are marking my email as junk today, with no explanation. They’re doing this to people who have sent me dozens of emails over the course of months or years.

Why does no spam filter seem to take repeated conversational turns into account? Is there a stronger signal that I want to engage with someone than…repeatedly engaging?

20 Year Software: Engineering and Updates

Twenty years ago, Windows 95 was the most common operating system. Yahoo and Altavista were our gateways to the internet. Steve Jobs just returned to Apple. Google didn’t exist yet. America Online had just launched their Instant Messenger. IPv6 was coming soon. That’s part of the state of software in 1997, twenty years ago. We need to figure out what engineering software looks like for a twenty year lifespan, and part of that will be really doing such engineering, because theory will only take us to the limits of our imaginations.

Today, companies are selling devices that will last twenty years in the home, such as refrigerators and speakers, and making them with network connectivity. That network connectivity is both full internet connectivity, that is, Internet Protocol stacks, and also local network connectivity, such as Bluetooth and Zigbee.

We have very little idea how to make software that can survive as long as AOL IM did. (It’s going away in December, if you missed the story.)

Recently, there was a news cycle around Sonos updating their privacy policy. I don’t want to pick on Sonos, because I think what they’re experiencing is just a bit of the future, unevenly distributed. Responses such as this were common: “Hardware maker: Give up your privacy and let us record what you say in your home, or we’ll destroy your property:”

“The customer can choose to acknowledge the policy, or can accept that over time their product may cease to function,” the Sonos spokesperson said, specifically.

Or, as the Consumerist, part of Consumer Reports, puts it in “Sonos Holds Software Updates Hostage If You Don’t Sign New Privacy Agreement:

Sonos hasn’t specified what functionality might no longer work in the future if customers don’t accept the new privacy policy.

There are some real challenges here, both technical and economic. Twenty years ago, we didn’t understand double-free or format string vulnerabilities. Twenty years of software updates aren’t going to be cheap. (I wrote about the economics in “Maintaining & Updating Software.”)

My read is that Sonos tried to write a privacy policy that balanced their anticipated changes, including Alexa support, with a bit of future-proofing. I think that the reason they haven’t specified what might not work is because they really don’t know, because nobody knows.

The image at the top is the sole notification that I’ve gotten that Office 2011 is no longer getting security updates. (Sadly, it’s only shown once per computer, or perhaps once per user of the computer.) Microsoft, like all tech companies, will cut functionality that it can’t support, like <"a href="https://www.macworld.com/article/1154785/business/welcomebackvisualbasic.html">Visual Basic for Mac and also “end of lifes” its products. They do so on a published timeline, but it seems wrong to apply that to a refrigerator, end of lifeing your food supply.

There’s probably a clash coming between what’s allowable and what’s economically feasible. If your contract says you can update your device at any time, it still may be beyond “the corners of the contract” to shut it off entirely. Beyond economically challenging, it may not even be technically feasible to update the system. Perhaps the chip is too small, or its power budget too meager, to always connect over TLS4.2, needed addresses the SILLYLOGO attack.

What we need might include:

  • A Dead Software Foundation, dedicated to maintaining the software which underlies IoT devices for twenty years. This is not only the Linux kernel, but things like tinybox and openssl. Such a foundation could be funded by organizations shipping IoT devices, or even by governments, concerned about the externalities, what Sean Smith called “the Cyber Love Canal” in The Internet of Risky Things. The Love Canal analogy is apt; in theory, the government cleans up after the firms that polluted are gone. (The practice is far more complex.)
  • Model architectures that show how to engineer devices, such as an internet speaker, so that it can effectively be taken offline when the time comes. (There’s work in the mobile app space on making apps work offline, which is related, but carries the expectation that the app will eventually reconnect.)
  • Conceptualization of the legal limits of what you can sign away in the fine print. (This may be everything; between severability and arbitration clauses, the courts have let contract law tilt very far towards the contract authors, but Congress did step in to write the Consumer Review Fairness Act.) The FTC has commented on issues of device longevity, but not (afaik) on limits of contracts.

What else do we need to build software that survives for twenty years?

The Dope Cycle and a Deep Breath

Back in January, I wrote about “The Dope Cycle and the Two Minutes Hate.” In that post, I talked about:

Not kidding: even when you know you’re being manipulated into wanting it, you want it. And you are being manipulated, make no mistake. Site designers are working to make your use of their site as pleasurable as possible, as emotionally engaging as possible. They’re caught up in a Red Queen Race, where they must engage faster and faster just to stay in place. And when you’re in such a race, it helps to steal as much as you can from millions of years of evolution. [Edit: I should add that this is not a moral judgement on the companies or the people, but rather an observation on what they must do to survive.] That’s dopamine, that’s adrenaline, that’s every hormone that’s been covered in Popular Psychology. It’s a dope cycle, and you can read that in every sense of the word dope.

I just discovered a fascinating tool from a company called Dopamine Labs. Dopamine Labs is a company that helps their corporate customers drive engagement: “Apps use advanced software tools that shape and control user behavior. We know because [we sell] it to them.” They’ve released a tool called Space: “Space uses neuroscience and AI to help you kick app addiction. No shame. No sponsors. Just a little breathing room to help you take back control.” As they say: “It’s the same math that we use to get people addicted to apps, just run backwards.”

Space app
There are some fascinating ethical questions involved in selling both windows and bricks. I’m going to say that you participants in a red queen race might as well learn what countermeasures to their techniques are by building them. Space works as a Chrome plugin and as an iOS and Android App. I’ve installed it, and I like it more than I like another tool I’ve been using (Dayboard). I really like Dayboard’s todo list, but feel that it cuts me off in the midst of time wasting, rather than walking me away.)

The app is at http://youjustneedspace.com/.

As we go into big conferences, it might be worth installing. (Also as we head into conferences, be excellent to each other. Know and respect your limits and those of others. Assume good intent. Avoid getting pulled into a “Drama Triangle.”)

Threat Modeling Password Managers

There was a bit of a complex debate last week over 1Password. I think the best article may be Glenn Fleishman’s “AgileBits Isn’t Forcing 1Password Data to Live in the Cloud,” but also worth reading are Ken White’s “Who moved my cheese, 1Password?,” and “Why We Love 1Password Memberships,” by 1Password maker AgileBits. I’ve recommended 1Password in the past, and I’m not sure if I agree with Agilebits that “1Password memberships are… the best way to use 1Password.” This post isn’t intended to attack anyone, but to try to sort out what’s at play.

This is a complex situation, and you’ll be shocked, shocked to discover that I think a bit of threat modeling can help. Here’s my model of

what we’re working on:

Password manager

Let me walk you through this: There’s a password manager, which talks to a website. Those are in different trust boundaries, but for simplicity, I’m not drawing those boundaries. The two boundaries displayed are where the data and the “password manager.exe” live. Of course, this might not be an exe; it might be a .app, it might be Javascript. Regardless, that code lives somewhere, and where it lives is important. Similarly, the passwords are stored somewhere, and there’s a boundary around that.

What can go wrong?

If password storage is local, there is not a fat target at Agilebits. Even assuming they’re stored well (say, 10K iterations of PBKDF2), they’re more vulnerable if they’re stolen, and they’re easier to steal en masse [than] if they’re on your computer. (Someone might argue that you, as a home user, are less likely to detect an intruder than Agilebits. That might be true, but that’s a way to detect; the first question is how likely is an attacker to break in? They’ll succeed against you and they’ll succeed against Agilebits, and they’ll get a boatload more from breaking into Agilebits. This is not intended as a slam of Agilebits, it’s an outgrowth of ‘assume breach.’) I believe Agilebits has a simpler operation than Dropbox, and fewer skilled staff in security operations than Dropbox. The simpler operation probably means there are fewer usecases, plugins, partners, etc, and means Agilebits is more likely to notice some attacks. To me, this nets out as neutral. Fleishman promises to explain “how AgileBits’s approach to zero-knowledge encryption… may be less risky and less exposed in some ways than using Dropbox to sync vaults.” I literally don’t see his argument, perhaps it was lost in the complexity of writing a long article? [Update: see also Jeffrey Goldberg’s comment about how they encrypt the passwords. I think of what they’ve done as a very strong mitigation; with a probably reasonable assumption they haven’t bolluxed their key generation. See this 1Password Security Design white paper.]

To net it out: local storage is more secure. If your computer is compromised, your passwords are compromised with any architecture. If your computer is not compromised, and your passwords are nowhere else, then you’re safe. Not so if your passwords are somewhere else and that somewhere else is compromised.

The next issue is where’s the code? If the password manager executable is stored on your device, then to replace it, the attacker either needs to compromise your device, or to install new code on it. An attacker who can install new code on your computer wins, which is why secure updates matter so much. An attacker who can’t get new code onto your computer must compromise the password store, discussed above. When the code is not on your computer but on a website, then the ease of replacing it goes way up. There’s two modes of attack. Either you can break into one of the web server(s) and replace the .js files with new ones, or you can MITM a connection to the site and tamper with the data in transit. As an added bonus, either of those attacks scales. (I’ll assume that 1Password uses certificate pinning, but did not chase down where their JS is served.)

Netted out, getting code from a website each time you run is a substantial drop in security.

What should we do about it?

So this is where it gets tricky. There are usability advantages to having passwords everywhere. (Typing a 20 character random password from your phone into something else is painful.) In their blog post, Agilebits lists more usability and reliability wins, and those are not to be scoffed at. There are also important business advantages to subscription revenue, and not losing your passwords to a password manager going out of business is important.

Each 1Password user needs to make a decision about what the right tradeoff is for them. This is made complicated by family and team features. Can little Bobby move your retirement account tables to the cloud for you? Can a manager control where you store a team vault?

This decision is complicated by walls of text descriptions. I wish is that Agilebits would do a better job of crisply and cleanly laying out the choice that their customers can make, and the advantages and disadvantages of each. (I suggest a feature chart like this one as a good form, and the data should also be in each app as you set things up.) That’s not to say that Agilebits can’t continue to choose and recommend a default.

Does this help?

After years of working in these forms, I think it’s helpful as a way to break out these issues. I’m curious: does it help you? If not, where could it be better?

Secure updates: A threat model

Software updates

Post-Petya there have been a number of alarming articles on insecure update practices. The essence of these stories is that tax software, mandated by the government of Ukraine, was used to distribute the first Petya, and that this can happen elsewhere. Some of these stories are a little alarmist, with claims that unnamed “other” software has also been used in this way. Sometimes the attack is easy because updates are unsigned, other times its because they’re also sent over a channel with no security.

The right answer to these stories is to fix the damned update software before people get more scared of updating. That fear will survive long after the threat is addressed. So let me tell you, [as a software publisher] how to do secure upadtes, in a nutshell.

The goals of an update system are to:

  1. Know what updates are available
  2. Install authentic updates that haven’t been tampered with
  3. Strongly tie updates to the organization whose software is being updated. (Done right, this can also enable whitelisting software.)

Let me elaborate on those requirements. First, know what updates are available — the threat here is that an attacker stores your message “Version 3.1 is the latest revision, get it here” and sends it to a target after you’ve shipped version 3.2. Second, the attacker may try to replace your update package with a new one, possibly using your keys to sign it. If you’re using TLS for channel security, your TLS keys are only as secure as your web server, which is to say, not very. You want to have a signing key that you protect.

So that’s a basic threat model, which leads to a system like this:

  1. Update messages are signed, dated, and sequenced. The code which parses them carefully verifies the signatures on both messages, checks that the date is later than the previous message and the sequence number is higher. If and only if all are true does it…
  2. Get the software package. I like doing this over torrents. Not only does that save you money and improve availability, but it protects you against the “Oh hello there Mr. Snowden” attack. Of course, sometimes a belief that torrents have the “evil bit” set leads to blockages, and so you need a fallback. [Note this originally called the belief “foolish,” but Francois politely pointed out that that was me being foolish.]
  3. Once you have the software package, you need to check that it’s signed with the same key as before.
    Better to sign the update and the update message with a key you keep offline on a machine that has no internet connectivity.

  4. Since all of the verification can be done by software, and the signing can be done with a checklist, PGP/GPG are a fine choice. It’s standard, which means people can run additional checks outside your software, it’s been analyzed heavily by cryptographers.

What’s above follows the four-question framework for threat modeling: what are we working on? (Delivering updates securely); what can go wrong? (spoofing, tampering, denial of service); what are we going to do about it? (signatures and torrents). The remaining question is “did we do a good job?” Please help us assess that! (I wrote this quickly on a Sunday morning. Are there attacks that this design misses? Defenses that should be in place?)

Bicycling and Threat Modeling

Bikeshare

The Economist reports on the rise of dockless bike sharing systems in China, along with the low tech ways that the system is getting hacked:

The dockless system is prone to abuse. Some riders hide the bikes in or near their homes to prevent others from using them. Another trick involves photographing a bike’s QR code and then scratching it off to stop others from scanning it. With the stored image, the rider can then monopolise the machine. But customers caught misbehaving can have points deducted from their accounts, making it more expensive for them to rent the bikes.

Gosh, you mean you give people access to expensive stuff and they ride off into the sunset?

Threat modeling is an umbrella for a set of practices that let an organization find these sorts of attacks early, while you have the greatest flexibility in choosing your response. There are lots of characteristics we could look for: practicality, cost-effectiveness, consistency, thoroughness, speed, et cetera, and different approaches will favor one or the other. One of those characteristics is useful integration into business.

You can look at thoroughness by comparing bikes to the BMW carshare program I discussed in “The Ultimate Stopping Machine,” I think that the surprise that ferries trigger an anti-theft mechanism is somewhat surprising, and I wouldn’t dismiss a threat modeling technique, or criticize a team too fiercely for missing it. That is, there’s nuance. I’d be more critical of a team in Seattle missing the ferry issue than I would be of a team in Boulder.)

In the case of the dockless bikes, however, I would be skeptical of a technique that missed “reserving” a bike for your ongoing use. That threat seems like an obvious one from several perspectives, including that the system is labelled “dockless,” so you have an obvious contrast with a docked system.

When you find these things early, and iterate around threats, requirements and mitigations, you find opportunities to balance and integrate security in better ways than when you have to bolt it on later. (I discuss that iteration here and here.

For these bikes, perhaps the most useful answer is not to focus on misbehavior, but to reward good behavior. The system wants bikes to be used, so reward people for leaving the bikes in a place where they’re picked up soon? (Alternately, perhaps make it expensive to check out the same bike more than N times in a row, where N is reasonably large, like 10 or 15.)

Photo by Viktor Kern.

The Ultimate Stopping Machine?

a metal Immobilizer around a tireSecurity is hard in the real world. There’s an interesting story on Geekwire, “BMW’s ReachNow investigating cases of cars getting stuck on Washington State Ferries.” The story:

a ReachNow customer was forced to spend four hours on the Whidbey Island ferry this weekend because his vehicle’s wheels were locked, making the vehicle immovable unless dragged. The state ferry system won’t let passengers abandon a car on the ferry because of security concerns.

BMW’s response:

We believe that the issue is related to a security feature built into the vehicles that kicks in when the car is moving but the engine is turned off and the doors are closed.

I first encountered these immobilizing devices on a friend’s expensive car in 1999 or so. The threat is thieves equipped with a towtruck. It’s not super-surprising to discover that a service like Reachnow, where “random” people can get into a car and drive it away will have tracking devices in those cars. It’s a little more surprising that there are immobilizers in them.

Note the competing definitions of security (emphasis added in both quotes above):

  • BMW is worried about theft.
  • The state ferry system is worried about car bombs.
  • Passengers might worry about being detained next to a broken car, or about bugs in the immobilization technology. What if that kicks in on the highway because “a wire gets loose”?

In “The Evolution of Secure Things,” I wrote:

It’s about the constant imperfection of products, and how engineering is a response to perceived imperfections. It’s about the chaotic real world from which progress emerges. In a sense, products are never perfected, but express tradeoffs between many pressures, like manufacturing techniques, available materials, and fashion in both superficial and deep ways.

Surprise! There’s a way to move a vehicle a long distance with the engine off, and it’s not a tow truck!

Real products, introduced into the real world, will often involve surprises like this. One characteristic of a good security architecture is that there’s the right degree of adjustability in the product, and judging that is still a matter of engineering experience.

Similarly, one of the lessons of entrepreneurship is that the problems you experience are often surprising. Investors look for flexibility in the leaders they back because they know that they’ll be surprised along the way.

Threat Modeling & IoT

Threat modeling internet-enabled things is similar to threat modeling other computers, with a few special tensions that come up over and over again. You can start threat modeling IoT with the four question framework:

  1. What are you building?
  2. What can go wrong?
  3. What are you going to do about it?
  4. Did we do a good job?

But there are specifics to IoT, and those specifics influence how you think about each of those questions. I’m helping a number of companies who are shipping devices, and I would love to fully agree that “consumers shouldn’t have to care about the device’s security model. It should just be secure. End of story.” I agree with Don Bailey on the sentiment, but frequently the tensions between requirements mean that what’s secure is not obvious, or that “security” conflicts with “security.” (I model requirements as part of ‘what are you building.’)

When I train people to threat model, I use this diagram to talk about the interaction between threats, mitigations, and requirements:

Threats mitigations requirements

The interaction has a flavor in working with internet-enabled things, and that interaction changes from device to device. There are some important commonalities.

When looking at what you’re building, IoT devices typically lack sophisticated input devices like keyboards or even buttons, and sometimes their local output is a single LED. One solution is to put a web server on the device listening, and to pay for a sticker with a unique admin password, which then drives customer support costs. Another solution is to have the device not listen but to reach out to your cloud service, and let customers register their devices to their cloud account. This has security, privacy, and COGS downsides tradeoffs. [Update: I said downsides, but it’s more a different set of attack vectors become relevant in security. COGS is an ongoing commitment to operations; privacy is dependent on what’s sent or stored.]

When asking what can go wrong, your answers might include “a dependency has a vulnerability,” or “an attacker installs their own software.” This is an example of security being in tension with itself is the ability to patch your device yourself. If I want to be able to recompile the code for my device, or put a safe version of zlib on there, I ought to be able to do so. Except if I can update the device, so can attackers building a botnet, and 99.n% of typical consumers for a smart lightbulb are not going to patch themselves. So we get companies requiring signed updates. Then we get to the reality that most consumer devices last longer than most silicon valley companies. So we want to see a plan to release the key if the company is unable to deliver updates. And that plan runs into the realities of bankruptcy law, which is that that signing key is an asset, and its hard to value, and bankruptcy trustees are unlikely to give away your assets. There’s a decent pattern (allegedly from the world of GPU overclocking), which is that you can intentionally make your device patchable by downloading special software and moving a jumper. This requires a case that can be opened and reclosed, and a jumper or other DFU hardware input, and can be tricky on inexpensive or margin-strained devices.

That COGS (cost of goods sold) downside is not restricted to security, but has real security implications, which brings us to the question of what are you going to do about it. Consumers are not used to subscribing to their stoves, nor are farmers used to subscribing to their tractors. Generally, both have better things to do with their time than to understand the new models. But without subscription revenue, it’s very hard to make a case for ongoing security maintenance. And so security comes into conflict with consumer mental models of purchase.

In the IoT world, the question of did we do a good job becomes have we done a good enough job? Companies believe that there is a first mover advantage, and this ties to points that Ross Anderson made long ago about the tension between security and economics. Good threat modeling helps companies get to those answers faster. Sharing the tensions help us understand what the tradeoffs look like, and with those tensions, organizations can better define their requirements and get to a consistent level of security faster.


I would love to hear your experiences about other issues unique to threat modeling IoT, or where issues weigh differently because of the IoT nature of a system!

(Incidentally, this came from a question on Twitter; threading on Twitter is now worse than it was in 2010 or 2013, and since I have largely abandoned the platform, I can’t figure out who’s responding to what. A few good points I see include:

How Not to Design an Error Message

SC07FireAlarm

The voice shouts out: “Detector error, please see manual.” Just once, then a few hours later. And when I did see the manual, I discovered that it means “Alarm has reached its End of Life

No, really. That’s how my fire alarm told me that it’s at its end of life. By telling me to read the manual. Why it doesn’t say “device has reached end of life?” That would be direct and to the point. But no. When you press the button, it says “please see manual.” Now, this was a 2009 device, so maybe, just maybe, there was a COGS issue in how much storage was needed.

But sheesh. Warning messages should be actionable, explanatory and tested. At least it was loud and annoying.