A Few Data Points

First, for those who might have missed it, Google has released Google Refine, a free tool for cleaning dirty data sets.  It allows you to pull in disparate data, then organize and clean it for consistency.
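For a rough feel of what "clean it for consistency" means in practice, here is a minimal Python sketch of key-based clustering: normalize each value to a crude fingerprint and group the values that collide. This is an illustration only, not Refine's actual implementation; the `fingerprint` and `cluster` helpers and the sample names are hypothetical.

```python
import re
from collections import defaultdict


def fingerprint(value: str) -> str:
    """Reduce a string to a normalized key so near-duplicate spellings collide."""
    # Trim, lowercase, and drop punctuation.
    cleaned = re.sub(r"[^\w\s]", "", value.strip().lower())
    # Sort and de-duplicate the tokens so word order does not matter.
    tokens = sorted(set(cleaned.split()))
    return " ".join(tokens)


def cluster(values):
    """Group values by fingerprint; keep only groups with more than one member."""
    groups = defaultdict(list)
    for v in values:
        groups[fingerprint(v)].append(v)
    return {k: vs for k, vs in groups.items() if len(vs) > 1}


# Hypothetical dirty column: three spellings of one company, plus one other.
names = ["Acme, Inc.", "acme inc", "Inc. Acme ", "Widget Co"]
clusters = cluster(names)
```

Here `clusters` groups the three Acme variants under the key `"acme inc"` and leaves "Widget Co" alone; a human would then pick the canonical spelling for each group.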

Next, some interesting thoughts on how “anonymized” data sets aren’t, along with the implications of this from a risk perspective.  None of this is groundbreaking, but it’s nice to see some sane thinking about two facts that aren’t going away, no matter how much people might like them to:  that data will continue to be accumulated, and that it will be shared with varying levels of consideration for the risks of doing so.
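To make the "anonymized data sets aren't" point concrete, here is a minimal Python sketch of a linkage attack: names have been stripped from the released records, but the fields left behind (ZIP code, birth year, sex) can be joined against some other data set that does carry names. Every record, field name, and helper below is made up for illustration; real attacks and real defenses are considerably more involved.

```python
from collections import Counter

# Hypothetical "anonymized" release: no names, but quasi-identifiers remain.
released = [
    {"zip": "02138", "birth_year": 1945, "sex": "M", "diagnosis": "hypertension"},
    {"zip": "02139", "birth_year": 1980, "sex": "F", "diagnosis": "asthma"},
    {"zip": "02139", "birth_year": 1980, "sex": "F", "diagnosis": "diabetes"},
]

# Hypothetical public data set that includes names (think: a voter roll).
public = [
    {"name": "Alice Example", "zip": "02139", "birth_year": 1980, "sex": "F"},
    {"name": "Bob Example", "zip": "02138", "birth_year": 1945, "sex": "M"},
]

QUASI_IDENTIFIERS = ("zip", "birth_year", "sex")


def key(row):
    """Tuple of quasi-identifier values used to join the two data sets."""
    return tuple(row[field] for field in QUASI_IDENTIFIERS)


def reidentify(released_rows, public_rows):
    """Return {name: diagnosis} where the join is unambiguous on both sides."""
    released_counts = Counter(key(r) for r in released_rows)
    public_counts = Counter(key(p) for p in public_rows)
    # Only released rows whose quasi-identifier combination is unique.
    unique_released = {
        key(r): r for r in released_rows if released_counts[key(r)] == 1
    }
    matches = {}
    for person in public_rows:
        k = key(person)
        if public_counts[k] == 1 and k in unique_released:
            matches[person["name"]] = unique_released[k]["diagnosis"]
    return matches


matches = reidentify(released, public)
```

In this toy data, `matches` comes out as `{"Bob Example": "hypertension"}`: Bob is the only person with his combination, so removing his name protected nothing. Alice escapes only because a second record happens to share her combination, which is luck rather than design.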

Finally, yet another real-world example of risk homeostasis at work:  People who take vitamins make poorer health decisions in other areas.  Based on the number of times I’ve been asked questions along the lines of, “I don’t need to worry about x because I’ve {patched|installed anti-virus|switched to Apple|etc.}, right?” I’d say this still holds true for computing, too.

Now if you’ll excuse me, I have to go clean my anonymized data set so I can share it far and wide, which is OK since I’m going to encrypt it before I send it, right?