Sharing Research Data

I wanted to share an article from the November issue of the Public Library of Science, both because it’s interesting reading and because of what it tells us about the state of security research. The paper is “Willingness to Share Research Data Is Related to the Strength of the Evidence and the Quality of Reporting of Statistical Results.” I’ll quote the full abstract, and encourage you to read the entire 6 page paper.

The widespread reluctance to share published research data is often hypothesized to be due to the authors’ fear that reanalysis may expose errors in their work or may produce conclusions that contradict their own. However, these hypotheses have not previously been studied systematically.

Methods and Findings
We related the reluctance to share research data for reanalysis to 1148 statistically significant results reported in 49 papers published in two major psychology journals. We found the reluctance to share data to be associated with weaker evidence (against the null hypothesis of no effect) and a higher prevalence of apparent errors in the reporting of statistical results. The unwillingness to share data was particularly clear when reporting errors had a bearing on statistical significance.

Our findings on the basis of psychological papers suggest that statistical results are particularly hard to verify when reanalysis is more likely to lead to contrasting conclusions. This highlights the importance of establishing mandatory data archiving policies.

Despite the fact that the research was done on papers published in psychology journals, it can teach us a great deal about the state of security research.

First, the full paper is available for free online. Compare and contrast with too many venues in information security.

Second, the paper considers and tests alternative hypotheses:

Although our results are consistent with the notion that the reluctance to share data is generated by the author’s fear that reanalysis will expose errors and lead to opposing views on the results, our results are correlational in nature and so they are open to alternative interpretations. Although the two groups of papers are similar in terms of research fields and designs, it is possible that they differ in other regards. Notably, statistically rigorous researchers may archive their data better and may be more attentive towards statistical power than less statistically rigorous researchers. If so, more statistically rigorous researchers will more promptly share their data, conduct more powerful tests, and so report lower p-values. However, a check of the cell sizes in both categories of papers (see Text S2) did not suggest that statistical power was systematically higher in studies from which data were shared. [Ed: “Text S2” is supplemental data considering the discarded hypothesis.]

But most important, what does it say about the quality of the data we so avariciously hoard in information security? Could it have something to do with higher prevalence of apparent errors?

Probably not. It might surprise you to hear me saying that, but hear me out. We almost never have hypotheses to test, and so our ability to perform statistical re-analysis is almost irrelevant. We’re much more fond of saying things like “It calls the same DLLs as Stuxnet, so it’s clearly also by the Israelis.” Actually, there are several implied hypotheses in there:

  1. No code by different authors calls the same DLL
  2. No code calls any undocumented APIs
  3. Stuxnet DLLs are not documented

Stuxnet being written by the Israelis is clearly not a hypothesis, but a fact, as documented by Nostradamus.

More seriously, read the paper, see how good science is done, and ask if anyone is holding us back but ourselves.

Thanks to Cormac Herley for the pointer.

Yes, Google+ Is a Failure

One of the most common bits of feedback about my post “Google+ Failed Because of Real Names” is that Google+ is now a huge service, and that the word failed is an exaggeration, or a trick of the rhetorician.

Some folks might advise me to stop digging a hole, put down the shovel and walk away. But I’m going to pick up that shovel, and try to convince you that I’m not exaggerating. Google+ may not be a New Coke level failure, it may be a successful failure, but it’s a failure nonetheless.

The goal of Google+ is to dominate the social network space, replacing Facebook, LinkedIn and Twitter, and building a moat around Google’s core business of advertising. That moat ought to consist of Google having more information about you than the CIA does (ok, that’s hyperbole. The CIA can’t store that much info). The moat ought to be that Google can show your wallet-name ads that tug at your wallet-strings.

Do you really think that Google wanted to enter this market to play second-fiddle to Facebook? Do you think that Google is happy that Facebook is going to pop out in the biggest IPO in history real soon now, giving them a massive war chest?

I think that the answer is fairly obviously a no. Now, you could argue that Google+ is en route to topple Facebook. That Google will take three tries to get it right or something, like they did with Search and Mail and Maps. (Oh, wait, they didn’t take three tries on any of those.)

What’s more, I don’t think that no was pre-ordained because of Facebook’s massive user-base. People were willing to show up at Google+ and explore. And that exploration rapidly foundered on the nymwars.

I think the system could and should have done better, if Google wasn’t so hell-bent on controlling what name people could display for themselves.

Twitter Weekly Updates for 2012-01-29

Powered by Twitter Tools

Aviation Safety

The past 10 years have been the best in the country’s aviation history with 153 fatalities. That’s two deaths for every 100 million passengers on commercial flights, according to an Associated Press analysis of government accident data.

The improvement is remarkable. Just a decade earlier, at the time the safest, passengers were 10 times as likely to die when flying on an American plane. The risk of death was even greater during the start of the jet age, with 1,696 people dying — 133 out of every 100 million passengers — from 1962 to 1971. The figures exclude acts of terrorism.

There are a number of reasons for the improvements.

  • The industry has learned from the past. New planes and engines are designed with prior mistakes in mind. Investigations of accidents have led to changes in procedures to ensure the same missteps don’t occur again.
  • Better sharing of information. New databases allow pilots, airlines, plane manufacturers and regulators to track incidents and near misses. Computers pick up subtle trends. For instance, a particular runway might have a higher rate of aborted landings when there is fog. Regulators noticing this could improve lighting and add more time between landings.

(“It’s never been safer to fly; deaths at record low”, AP, link to Seattle PI version.)

Well, it seems there’s nothing for information security to learn here. Move along.

Google+ Failed Because of Real Names

It’s now been a few months since the launch of Google+, and it’s now fairly clear that it’s not a mortal threat to Facebook, or even Orkut. I think it’s worth thinking a bit about why Google+ isn’t doing better, despite its many advantages. Obviously, Google wants to link Google+ profiles to things in the physical world that matter to its paying customers: advertisers. To me, the most interesting part is how the real name issue acted as a lens, focusing attention on Google’s plans for the service, the horse-trade Google is asking people to make, and Google’s weighting of a communications platform versus having an online Disneyland where nothing offensive is allowed.

There’s a lot that Google gets right in Google+, most notably the idea of circles. Circles could be a great way for Google to mirror how people interact, and let them present different things to different sets of people, under their control. It’s a simple, understandable metaphor.

But Google hasn’t derailed Facebook, because Google shot themselves in the foot at launch. That’s why TechCrunch has articles like “Raise Your Hand If You’re Still Using Google+.” Let’s be clear, this was an own-goal, and it was avoidable. I know of at least two Googlers who left because they felt Google wasn’t living up to its own values in the internal debate. Google has put their desire to have a real-name driven internet ahead of their users’ desires. Maybe a free name change would make that ok? But it’s not ok, and name changes won’t make it ok.

Within days of Google+ being launched, the positive press was being driven out by stories about the “Nymwars.” A lot of it revolved around Google’s claims that your displayed name could be what people called you, but as Skud clearly documented, that was a bizarre and bureaucratic lie. But documenting your “government name” isn’t enough, as people like 3ric have documented. (It’s pronounced “Three-Rick,” and that’s how I’ve always known him.)

As bad as it is to tell people what they can write on the “Hello, My Name is” badges, it’s worse to be inconsistent and upsetting around something as personal as a name, or to tell someone that a Capulet they’ll no longer be. The very worst part is that Google managed to do it at the wrong time.

What Google did by focusing attention on “real names” when they did was to take attention from the really cool aspects of Google+, and draw it to an emotionally laden set of battles that they can’t win. They managed to calm the waters a bit by declaring that they’d “support” other names, leading to this awesome bit of politically-incorrect-calling-bullshit: “EFF declares premature victory in Nymwars.”

Another way to see this is Google knowingly burned an awful lot of goodwill with one of their key communities, techies. The way that they did it hampered Google+ during its launch, preventing it from getting the momentum it probably deserved.

They did all that in order to get one unique name for everyone. Oops, wait, there’s lots of people named Mike Jones. They did it to get a name that links to “the real world you.” They wanted to get a commercial advantage for Google, at the expense of people’s ability to choose how they present themselves.

It hasn’t worked out, and yesterday, Google announced the next set of changes. (EFF has some comments in “Google+ and Pseudonyms: A Step in the Right Direction, Not the End of the Road.”)

Most interesting to me, Yonatan Zunger, Chief Architect of Google+ says:

We thought this was going to be a huge deal: that people would behave very differently when they were and weren’t going by their real names. After watching the system for a while, we realized that this was not, in fact, the case. (And in particular, bastards are still bastards under their own names.) We’re focusing right now on identifying bad behaviors themselves, rather than on using names as a proxy for behavior.

That’s gotta hurt.

The key takeaway: Google spent a huge amount of goodwill on an attractive, but untested idea, which Yonatan summarizes as “Bastards won’t be bastards under their real name.” (As an aside, there’s a lean startup lesson there, but Google has yet to pivot.) You shouldn’t make the same mistake.

Names are personal. They shouldn’t be subject to policies for vague, untested reasons. They shouldn’t be subject to policies at all unless your idea is even better than Google can do. Don’t make your new thing fail by sacrificing it on the altar of real names.

Some follow-on posts: “Yes, Google+ Is a Failure” and “More on Real Name Policies.”

Vendor shout out: Gourmet Depot

You know those random parts of kitchen appliances that break, and the manufacturer is no longer making, and so you buy a new one that breaks after 4 months? Yeah, you know what I’m talking about.

Next time, look to Gourmet Depot and see if they have replacement parts.

It was easy to find their recommendation for our specific coffee machine, the recommended pot fits great, and was cheap.

Check out Gourmet Depot next time you’re in this bind.

Kudos to Ponemon

In the past, we have had some decidedly critical words for the Ponemon Institute reports, such as “A critique of Ponemon Institute methodology for ‘churn’” or “Another critique of Ponemon’s method for estimating ‘cost of data breach’”. And to be honest, I’d become sufficiently frustrated that I’d focused my time on other things.

So I’d like to now draw attention to a post by Patrick Florer, “Some Thoughts about PERT and other distributions”, in which he says:

What follows are the results of an attempt to answer this question using a small data set extracted from a Ponemon Institute report called “Compliance Cost Associated with the Storage of Unstructured Information”, sponsored by Novell and published in May, 2011. I selected this report because, starting on page 14, all of the raw data are presented in tabular format. As an aside, this is the first report I have come across that publishes the raw data – please take note, Verizon, if you are reading this!

So I simply wanted to offer kudos to the Ponemon Institute for doing this.

I haven’t yet had a chance to dig into the report, but felt that given our past critiques I should take note of a very positive step.

Twitter Weekly Updates for 2012-01-22

  • What's the best history of @Defcon Capture the Flag? (cc @rileycaezar @thedarktangent ) #
  • RT @thedarktangent What's the best history of #DEFCON Capture the Flag? @adamshostack asks, & we need to update the site. Send your links! #
  • RT @jccannon7 My sci fi book launches today. More info at #
  • RT @mortman New posts: The "Continuous Deployment and Security" of "Chocolate Waffles" #
  • RT @_nomap CCTV operators in London have remote access to the surveillance network, so that they can work from home. #
  • How did @kuow find a guest on technology who's not familiar with @GreatDismal (William Gibson)? Impressive! #
  • Congrats to @metrixcreate for being the first to show @greatdismal a 3d printed object. #
  • RT @TimKarr Not so free at last: Sony issues takedown notices for those posting MLK's "I have a dream speech" #
  • Is it just me or is AT&T wireless displaying a fail whale in Seattle? #
  • My friend @votescannell is running for the Alaska State Legislature. You should follow her & vote for her! #
  • RT @pmocek Eric Rachner settles his suit against Seattle over 2008 false arrest by the ever-out-of-control @SeattlePD #
  • NoScript routes around Wikipedia blackout. There's a metaphor here. #
  • New blog: "Seattle in the snow" #
  • RT @josephmenn Former HBGary Federal CEO @aaronbarr, who drew @LulzSec fire for doxing anons, has left his subsequent job at Sayres #
  • New blog: "Seattle in the Snow" Give us your 2mm! 🙂 #
  • RT @hmason E-mail is still the biggest social network << Always will be. De-centralized, choose your name, many accounts per person. #
  • MT @rmogull Finished a call on exec protection. Not something I usually cover, told them hire an expert << Not thought leader behavior! 😉 #
  • New bloggage: "More on the weather" #
  • Kahneman prefers short words since long ones take up more mental space, & slow thinking. Jargonistas, take note! #
  • RT @oneraindrop Money Games for teaching kids financial literacy, just as important to teach parents budgets, saving #
