Shostack + Friends Blog Archive

Another critique of Ponemon's method for estimating 'cost of data breach'

Adam just posted his general critiques of the annual US Cost of Data Breach Study.  I agree with his critique of the survey methods, but I have more fundamental objections to the methods used to estimate ‘indirect costs’ due to lost customers (‘abnormal churn’) and the cost of replacing them (‘customer acquisition costs’).

A noble effort, but…

Before I start chopping it up, let me say that I think their annual survey is a good effort, and it’s positive that they can get sponsorship and also readership for the results.  I think they have good intentions and try to give a fair, balanced, and reasonable estimate.  Our field would be better off if there were similar data-gathering efforts in other areas of InfoSec.  I also don’t believe that any of the errors are due to an intention to ‘spin’ or mislead.  It looks like they simply didn’t have sufficient expertise on their team in business finance, marketing analysis, and economics.

But I see some serious problems with their methods.  This is a big deal since ‘indirect costs’ make up a majority (68%) of their estimate of total costs.

Problem #1: A fog of buzz words

If their data and analysis were bulletproof, then maybe we could forgive sloppy use of terms.  But they aren’t bulletproof, and their use of terms is actually misleading because it gives the impression that the method is well established and well executed when it really isn’t.  Furthermore, it’s a sign that whoever is doing this part of the analysis doesn’t know what they are talking about.  Examples:

  • “The survey design relied upon a shadow costing method used in applied economic research.” (p. 36) — There is no such method as ‘shadow costing’.  Do a web search if you doubt me.  The only examples of ‘shadow costing’ are economic studies that use ‘shadow prices’ multiplied by input quantities to derive ‘shadow costs’ for certain manufacturing or service processes.  Having just completed a Mathematical Economics class last semester, I can assure you that the Ponemon method has nothing to do with shadow prices or shadow costs.
  • “Utilizing activity-based costing…” (p. 3) and “The diagram below illustrates the activity-based costing schema…” (p. 36) — They do not use activity-based costing.  Activity-based costing (ABC) is a way of allocating overhead costs by measuring some ‘activity’ in operations that is thought to drive those overhead costs.  The ratio of ‘activity’ for each business unit to the total is used to allocate the overhead cost to that business unit (or product line, or customer segment, or whatever); the sketch just after this list illustrates the arithmetic.  You can read more about ABC here and here.  How big a flub is this?  Big.  It’s like labeling signature-based AV software as an ‘expert system’.  Anyone who uttered such a statement would immediately be dismissed by security experts.  Whatever the Ponemon method is, it’s not ABC.  Just because costs are related to activities does not mean you are doing activity-based costing.
  • “…most companies experience opportunity costs associated with a breach incident...” — No, these aren’t ‘opportunity costs’.  The term ‘opportunity cost’ has a very specific meaning in microeconomics.  Basically, an opportunity cost is the cost of giving up your next-best alternative when you make a decision.  They misuse the term here to refer to costs that are expected in the future, i.e. beyond the historical frame of the breach incident and post-incident remediation.  Their use of the term here is just sloppy and could easily mislead someone who doesn’t know economics.
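Just to make the contrast concrete, here is a minimal sketch of ABC-style allocation (referred to in the second bullet above).  Every number and business-unit name is made up for illustration; nothing comes from the Ponemon report.

```python
# A minimal sketch of activity-based costing: overhead is allocated to
# business units in proportion to a measured activity driver.
# All figures and names below are hypothetical.

overhead_cost = 500_000               # total overhead to allocate
activity_driver = "support tickets"   # the 'activity' assumed to drive that overhead

# Activity measured per business unit
activity_by_unit = {"Retail": 1_200, "Wholesale": 300, "Online": 1_500}
total_activity = sum(activity_by_unit.values())

# Each unit absorbs overhead in proportion to its share of the activity
allocation = {unit: overhead_cost * qty / total_activity
              for unit, qty in activity_by_unit.items()}

print(f"Allocating ${overhead_cost:,} of overhead by {activity_driver}:")
for unit, cost in allocation.items():
    print(f"  {unit}: ${cost:,.0f}")
# Retail: $200,000   Wholesale: $50,000   Online: $250,000
```

Nothing in the Ponemon write-up resembles this driver-and-ratio allocation, which is the whole point of calling their use of the term a flub.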

Problem #2: Mixing accounting costs with economic costs

This is a subtle but fundamental problem, and it’s why people get degrees in accounting and economics.  Accounting costs are those that appear in a financial statement somewhere and follow specific costing rules, e.g. GAAP.  They have already occurred (historical costs) or they are forecasted to occur (pro forma costs).  In the Ponemon method, they list four categories of accounting costs:

  1. Detection or discovery
  2. Escalation
  3. Notification
  4. Ex-post response

Now it’s probably true that most organizations do not have explicit accounts for these costs, so they have to be derived from other accounting costs.  But it’s pretty easy to slice and dice accounting data (i.e. general ledger entries) to get decent estimates of these costs.  It’s also possible to estimate costs by using per-resource costs (labor cost per hour) multiplied by the usage of those resources (hours to resolve an incident).  In the Ponemon survey, they ask their point-of-contact for their estimate of these costs.  That’s probably OK, given that the point-of-contact is a privacy/security person directly involved in the incident.
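As a rough illustration of that per-resource approach (my own sketch, not Ponemon’s worksheet), with hypothetical hours and fully loaded labor rates:

```python
# A minimal sketch of estimating the four accounting-cost categories from
# resource usage: hours spent multiplied by a labor rate per hour.
# Hours and rates are hypothetical.

incident_effort = {
    # category: (hours spent, fully loaded labor rate per hour)
    "Detection or discovery": (120, 95.0),
    "Escalation":             (40, 110.0),
    "Notification":           (200, 75.0),
    "Ex-post response":       (350, 85.0),
}

costs = {cat: hours * rate for cat, (hours, rate) in incident_effort.items()}

for cat, cost in costs.items():
    print(f"{cat:24s} ${cost:>10,.2f}")
print(f"{'Total accounting cost':24s} ${sum(costs.values()):>10,.2f}")
```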

But then they mix in future economic costs (what they mislabel as ‘opportunity costs’):

  1. Turnover intentions of existing customers
  2. Diminished new customer acquisition

(Leave aside for a moment that they are asking about “intentions” of customers to defect.  Adam discussed this in his post.)

These are both economic costs.  (See this slide deck, slide #2.  The Wikipedia article on this topic is not good.)  Basically, economic costs are all in the future.  There is no such thing as ‘historical economic costs’.  Only cash flows count in economic costs — no ‘intangibles’, no depreciation, no ‘goodwill’.  Those can only be included in the form of future cash flows discounted for time and risk (and uncertainty).  Economic costs include opportunity costs (see above), which are the cash flows associated with the next-best alternative.  However, opportunity costs will never appear on a financial statement, now or in the future.  Economic costs are incurred when the commitment is made, not when they are recognized in the accounting system.

Most important: All cash flows are discounted for the time value of money and the riskiness of the cash flow.  This feature is essential for rational decision-making over time and over risky alternatives, but it also guarantees that no estimate of economic costs will ever equal the corresponding accounting costs, because accounting systems do not adjust for the time value of money or risk.  Finally, a full estimate of economic costs includes the present value of ‘real options’ and should be adjusted for risk (e.g. derating by using the costs of insuring against unexpected/extreme events, the cost of a lowered credit rating, etc.).
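To illustrate why mixing the two kinds of cost breaks the arithmetic, here is a minimal sketch with hypothetical cash flows and a hypothetical risk-adjusted discount rate.  The same nominal outflows have a smaller present value once discounted for time and risk:

```python
# A minimal sketch of discounting expected post-breach cash outflows for
# time and risk. Cash flows and discount rate are hypothetical.

future_costs = [100_000, 80_000, 60_000]   # expected outflows in years 1-3
risk_adjusted_rate = 0.12                  # cost of capital for this risk class

nominal_total = sum(future_costs)
present_value = sum(cf / (1 + risk_adjusted_rate) ** (t + 1)
                    for t, cf in enumerate(future_costs))

print(f"Nominal (accounting-style) total: ${nominal_total:,.0f}")
print(f"Risk-adjusted present value:      ${present_value:,.0f}")
# The gap between these two numbers is why a total that mixes accounting
# costs with economic costs literally doesn't add up.
```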

The element of their method that specifically invokes economic cost is ‘lifetime value of customers’ (LTV).  In their method, the cost associated with lost customers is estimated by multiplying the percentage of breached customer records that will defect (the ‘abnormal customer churn rate’) by LTV.  (LTV originated in direct marketing in the 1980s.  Wikipedia has a decent article that explains it.  Here’s a good demonstration.)  LTV is a net present value, discounted by the cost of capital associated with the riskiness of the cash flow.  It’s an economic profit, not an accounting profit.
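Here is a minimal sketch of that calculation: LTV as a discounted, retention-weighted margin stream, and the lost-customer cost as breached records x abnormal churn x LTV.  All inputs (margin, retention rate, discount rate, churn, breach size) are hypothetical, not taken from the report:

```python
# A minimal sketch of customer lifetime value (LTV) as a net present value,
# and of the lost-customer cost built from it. All inputs are hypothetical.

annual_margin = 120.0     # contribution margin per customer per year
retention_rate = 0.80     # probability a customer stays for another year
discount_rate = 0.10      # cost of capital for this cash-flow stream
horizon_years = 10

# Present value of the expected margin stream from one retained customer
ltv = sum(annual_margin * retention_rate ** t / (1 + discount_rate) ** t
          for t in range(horizon_years))

breached_records = 50_000
abnormal_churn = 0.036    # share of breached customers assumed to defect

lost_customer_cost = breached_records * abnormal_churn * ltv
print(f"LTV per customer:       ${ltv:,.2f}")
print(f"Cost of lost customers: ${lost_customer_cost:,.2f}")
```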

Putting this all together, either the costing method should use only accounting costs (historical and/or pro forma) or it should only use economic costs (prospective discounted cash flows, risk-adjusted).  Otherwise, the numbers don’t add up, literally.

Problem #3: Decision policies matter

(i.e. cheap, short-sighted bastards can have lower costs than prudent, socially responsible managers)
Here’s another problem with mixing accounting costs with economic costs.  Let me illustrate this with a story.  There are two companies — Cheap Bastard, Inc. (CBI) and Nice Guys R Us (NGRS).  CBI has decision policies to spend as little as possible on InfoSec, especially in incident detection and incident response.  They push all liability onto their customers, suppliers, and contractors.  They systematically downplay evidence of breaches, and downplay the severity or costs of breaches.  They avoid forensic analysis if they can get away with it.  And so on.

In contrast, NGRS puts a lot of attention on pro-active security and detection, and goes out of its way to mitigate the costs of insecurity on its ecosystem.  They are especially eager to spend money post-breach to restore public trust and to learn from the event to get at root causes.

How would CBI and NGRS show up in the Ponemon survey?  My guess is that NGRS would have a cost per record 2X or 3X greater than CBI’s, primarily because CBI will have much lower accounting costs (as covered by the survey) by decision policy.  It’s also because CBI can ‘safely’ ignore the probable future costs of their rapacious behavior (i.e. class action lawsuits, regulatory penalties, even larger security breaches).  I put ‘safely’ in quotes because such corporate behavior is only safe until you get caught or get screwed.

I don’t see a way around this if you only use accounting costs.  If you use a sufficiently broad framework for economic costs, then you stand a chance of estimating ‘total costs’ in a way that exposes the riskiness of CBI’s decision policy.

Problem #4: Do respondents really know anything about customer LTV or ‘churn’ intentions?

I’m surprised that no one has brought this up before.  As someone who has calculated and published LTV metrics for a business unit, I can say with some confidence that almost no one who didn’t read those reports would have been able to guess the LTV of customers, including accounting people who knew about the cost and revenue categories but never put them together into LTV.  SWAGs (as in def. 5) could be off by an order of magnitude.

My opinion is that asking a privacy/security/incident response specialist to estimate LTV is fundamentally flawed unless that person has access to their own company’s management reports that include LTV.  It might be possible to elicit useful estimates from them after their estimates are calibrated through exercises, including exercises that estimate the weighted average cost of capital, average lifetime of a customer, acquisition costs, retention costs, etc.

Same goes for ‘churn’ rate (percentage of customers who leave because their records were breached).  To estimate ‘abnormal churn’ due to the breach, the point-of-contact would need to know something about ‘normal churn rate’ and, as Adam says in his post, the variability of churn rate.  If churn rate varies widely from year to year, then a small increase in churn due to a data breach would be washed out by the other factors driving variability.
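A minimal sketch of that ‘washed out’ problem, using a hypothetical churn history:

```python
# A minimal sketch of why churn variability matters: if normal churn swings
# widely year to year, a small breach-related bump is hard to distinguish
# from noise. The historical series is hypothetical.
import statistics

historical_churn = [0.11, 0.14, 0.09, 0.15, 0.12]  # annual churn rates, pre-breach
post_breach_churn = 0.145                           # churn in the breach year

mean_churn = statistics.mean(historical_churn)
sd_churn = statistics.stdev(historical_churn)
excess = post_breach_churn - mean_churn

print(f"Normal churn: {mean_churn:.3f} +/- {sd_churn:.3f}")
print(f"Post-breach excess: {excess:.3f} ({excess / sd_churn:.1f} standard deviations)")
# An excess within one standard deviation of normal variation is weak
# evidence that the breach caused any 'abnormal churn' at all.
```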

It would be much more useful to find out if the company increased their marketing budget as a direct consequence of a given data breach.  If they did, then this would be credible evidence that the number and value of lost customers was great enough for the company to change its spending decisions.

Problem #5: Leaving out significant cost categories

This problem may be bigger than all the others combined.  If they left out major categories of cost, then their estimate of cost per breach could be off by 50% or more.

To answer this question, you first need to decide between estimating accounting costs vs. estimating economic costs.  Every economist and every B-school professor will advise you to estimate economic costs.  It may be useful to analyze historical accounting costs as a way to estimate future economic costs, but that is a separate exercise.

As an economic cost analysis, it might be best to frame the decisions this way:

  • Given a breach of customer data of size X records (same size as historical breach), how would the firm’s economic costs change vs. no breach?

I’ll point out two categories of cost that this analysis would include that are currently excluded in the Ponemon survey.

Cost of additional spending on security
If a firm incurs incremental spending on security due to a breach, shouldn’t those costs be included in the ‘cost of a data breach’?  This goes back to my story about the fictional companies CBI and NGRS, above.  If CBI is likely to spend more in the future to fix their crappy security after experiencing a large breach today, then they will be forced to ‘pay the piper’ in economic terms, and their decision policy of spending as little as possible won’t help them avoid the full costs of data breaches.  This will also capture the cost of half-measures, since trying to get off cheap on the security upgrade will still show up as higher expected costs for future breaches.
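As a rough sketch of that ‘pay the piper’ logic (my framing, not anything in the survey), the breach-triggered remediation spending can be treated as a probability-weighted, discounted future cash flow:

```python
# A minimal sketch: expected present value of security spending that a
# breach forces on the firm. Probability, spending, and rate are hypothetical.

prob_forced_upgrade = 0.6            # chance the breach forces a security overhaul
upgrade_spend = [400_000, 250_000]   # incremental spending in years 1 and 2
discount_rate = 0.12

expected_pv = sum(prob_forced_upgrade * spend / (1 + discount_rate) ** (t + 1)
                  for t, spend in enumerate(upgrade_spend))

print(f"Expected present value of forced security spending: ${expected_pv:,.0f}")
# Under an economic-cost framing this belongs in the cost of the breach;
# under the survey's accounting framing it never appears.
```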

Of course, this raises sensitive political issues with respondents to the survey.  They may be reluctant to answer questions about actual spending on improvements to security or, even more, to speculate about possible future costs.  For example, what if a company’s outsourcing strategy is hopelessly insecure and the firm is forced to reverse those decisions and insource those processes?  What if a company is forced to exit a line of business because the security risks and costs are too high?  What if a data breach leads to process changes that diminish or eliminate their key competitive advantage?

Factoring in these costs could increase the cost of a breach by 0.5X to 10X.  It would make it much harder to do cross-company and cross-industry comparisons.  But wouldn’t it make the true economic costs of data breaches more relevant to management decision-making?

Social costs
The other ‘elephant in the room’ is the cost to consumers or employees of having their private data breached.  Add these all up and you get ‘social cost’ or, appropriately adjusted, ‘social welfare’.  I understand that the Ponemon survey is estimating only costs that are incurred by the single organization that experiences the breach, not by any other stakeholders in that firm’s ecosystem.

There are plenty of studies on the direct and indirect costs of identity theft and the perceived costs of breaches of privacy.  Drawing on these studies might make it possible to estimate the social cost of a breach.  Then the estimation question is “What portion of social cost will the firm have to bear?”

The answer to this question depends on firm policy (see the CBI and NGRS story, above), the legal system, the regulatory system, and also the legislative system.  Basically, if a firm or collection of firms consistently and egregiously impose large costs on their customers or employees, then one or more of these other social/political mechanisms might kick in to impose an ‘equity remedy’.

The most immediate remedy, from the American firm’s point of view, is a class action lawsuit.  Of course, estimating the likelihood of getting sued, the damages sought, and the likelihood of losing such a suit is risky business :-).  But just because it’s difficult to estimate with precision, should it be excluded?

Again, including this cost category might increase the cost per breach by 2X to 10X in some cases.  But it might also shift management attention to crucial questions such as “What is our role in our value network regarding information security and risk?”

Problem #6: Unsupportable inferences

Given that their survey method is not statistically robust (see p. 33), they do not have sufficient confidence to make the inferences summarized on p. 28.  I won’t go through these one by one, but anyone who has done statistical sampling and inference knows how sample size and variability affect confidence intervals.  If the difference in question does not exceed the confidence interval, then you cannot support the inference from the data.  The best they can do is say, “We see X% of companies reporting Y, vs. A% of companies reporting B.  This suggests that…”.  All such suggestions would then need to be subjected to additional tests.
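For example, here is a minimal sketch of a 95% confidence interval on a difference in reported proportions, with hypothetical sample sizes and percentages.  If the interval straddles zero, the data do not support the inference:

```python
# A minimal sketch of the sample-size problem: with small samples, the
# confidence interval on a difference of proportions can easily swallow
# the difference itself. All numbers are hypothetical.
import math

n1, p1 = 45, 0.42   # e.g., 42% of 45 companies report Y
n2, p2 = 45, 0.31   # e.g., 31% of 45 companies report B

diff = p1 - p2
se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
margin = 1.96 * se  # ~95% confidence

print(f"Difference: {diff:.2f}, 95% CI: [{diff - margin:.2f}, {diff + margin:.2f}]")
# Here the interval includes zero, so the 'difference' is not supportable.
```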

Problem #7: Is ‘Cost per Record’ the best measure?

It appears that only a few costs truly vary with the number of records breached.  These include the costs of ‘notification’ and some of the ‘ex-post response’ costs.  But ‘discovery’, ‘escalation’, and ‘indirect costs’ are mostly independent of the size of the breach as measured by number of records.  Some might be fixed costs that are independent of the size of breach.  Some might be increasing functions, perhaps relative to some threshold that defines ‘big’ or ‘material’ (to use the accountant’s term).
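A minimal sketch (with a hypothetical cost structure) of how a mostly-fixed cost base makes ‘cost per record’ swing with breach size:

```python
# A minimal sketch of fixed vs. variable breach costs. If discovery,
# escalation, and indirect costs are largely fixed, cost per record falls
# sharply as breach size grows. The cost structure is hypothetical.

fixed_costs = 750_000            # discovery, escalation, most indirect costs
variable_cost_per_record = 6.0   # notification and some ex-post response

for records in (5_000, 50_000, 500_000):
    total = fixed_costs + variable_cost_per_record * records
    print(f"{records:>7,d} records -> ${total / records:,.2f} per record")
# A single 'cost per record' figure hides this dependence on breach size.
```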

This problem may not be significant compared to the others.  I just think it needs to be justified by comparing it to alternative formulations.

Summary

The summary result ($204 per record) reported in the Ponemon survey is not reliable.  No one should rely on the absolute value of this measure.  Some of the relative measures might be informative, especially the direct costs that the point-of-contact respondents are qualified to answer.  Trend analysis might be somewhat informative.  None of the recommendations reported (e.g. the value of hiring outside IT security consultants) can be supported by statistically significant inferences.

To get a reliable measure of Cost of a Data Breach will require substantial revision to the survey instrument, sampling method, analysis methods, and reliability controls.  I’m guessing that this is beyond the appetite of PGP, the sponsor of the survey.

Call to action

What would it take to launch a Version 2.0 of this study with more robust methods and a stronger team of experts to execute it and analyze the results?  There’s no mystery about how to do Version 2.0.  The only obstacle is resources and commitment.

<Addendum:  For a related discussion, see my previous post: “Cost of a Near-miss Data Breach“>

6 comments on "Another critique of Ponemon's method for estimating 'cost of data breach'"

  • Chris Hayes says:

    Well stated, sir! A percentage breakdown of what they refer to as “direct, indirect, and opportunity costs” would be welcomed. I have done some analysis (deconstructing response costs / LTV) in this area, and most of the financial folks I have talked to care most about “green” or hard dollars being spent to augment the efforts of internal staff. The internal spend does matter as well, but they consider this effort part of our jobs. Finally, it’s hard to mix up data across industries and still maintain some level of meaning. With more access to their dataset, and some right-sizing, some of this analysis could be useful for simulation / modeling.

  • shrdlu says:

    Very nice work. I have a problem with the whole “cost per record” metric myself, because it implies that all of the breach impact depends on the sheer number of records compromised.

  • Adam says:

    I wanted to comment on this:

    It would be much more useful to find out if the company increased their marketing budget as a direct consequence of a given data breach. If they did, then this would be credible evidence that the number and value of lost customers was great enough for the company to change its spending decisions.

    There’s an alternate interpretation, which is that management fears lost customers and decided to spend.

  • Adam says:

    I think it’s worth mentioning two other issues with the report, which are the anchoring value of the $197/$204 reports. At this point, they have become touchstones, and it’s not hard to imagine survey recipients thinking about their total costs and comparing.

    Having been through breaches, they rarely feel ‘minor,’ and so survey recipients may look for ways to ensure that their costs are greater than their anchor points. That could explain the tendency of the ‘customer churn’ numbers to be the only rising numbers.

  • Al says:

    I agree with all of this except the intentions. Let’s call a spade a spade, can we? Ponemon has historically been rife with some of the biggest charlatans to ever grace our industry. Countless vendors buy this research and use it to sell products to corporations.

    I recently had a vendor quote me this exact number, and say “If you lose 500,000 records, this will be a cost of $10 million.” When I corrected him that by his method of calculation the cost would be $100 million, he got embarrassed and realized how inflated the numbers were.

    It’s all nonsense, and Ponemon is making money hand over fist doing it.

  • Heath says:

    Russell, thanks again for your perspective here! I’m communicating to my managers the straight dope on some of Ponemon’s other work from 2010 using the same methods. And the hits keep coming. In January 2011 they published, “The True Cost of Compliance,” using the same methods. I think it may suffer from some of the same problems you mention above, despite good intentions.

    Chris Hayes mentioned needing more access to their data sets. They did some of that in this new paper. Again, good effort, but… On page 26 they do go as far as to offer their correlation coefficients for the implied indirect relationship between their SES scores and “non-compliance cost.” At -0.3 it’s possible that some sort of weak indirect relationship exists, but I have difficulty swallowing the headline conclusion they reached using correlation on page 4. The narrative in the paragraph is more nuanced. If anyone has taken a look at it, I’d love to hear someone else’s opinion on it. They also offered a couple of their other cost data sets in this compliance paper.

    I hate to harp on their obviously massive efforts too much, but I have my managers coming to me asking for perspective on these papers.

    “The True Cost of Compliance” http://www.tripwire.com/ponemon-cost-of-compliance/
