The COVID testbed and AI

There’s a really interesting article in MIT Tech Review, Hundreds of AI tools have been built to catch covid. None of them helped.

Oops, I think I gave away the ending. But there’s a lot of fascinating details:

  • Many unwittingly used a data set that contained chest scans of children who did not have covid as their examples of what non-covid cases looked like. As a result, the AIs learned to identify kids, not covid.
  • Because patients scanned while lying down were more likely to be seriously ill, the AIs wrongly learned to predict serious covid risk from a person’s position.
  • AIs were found to be picking up on the text font that certain hospitals used to label the scans.
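Each of these failures is the same underlying mistake: the model latches onto a feature that happens to correlate with the label in the training data but has nothing to do with the disease. Here's a minimal, hypothetical sketch of the lying-down example: we simulate scans with two features, a genuinely (but imperfectly) informative severity signal and a patient-position flag that almost perfectly tracks illness in the training hospital. A trivially simple learner that picks the single most predictive feature chooses the shortcut, and collapses to coin-flip accuracy when the confound goes away.

```python
import random

random.seed(0)

def make_data(n, confounded):
    """Each sample: ((severity_signal, lying_down), ill).
    severity_signal agrees with true illness 80% of the time.
    When confounded=True, lying_down tracks illness almost perfectly
    (seriously ill patients were scanned lying down); otherwise it is
    independent of illness."""
    data = []
    for _ in range(n):
        ill = random.random() < 0.5
        # noisy but genuine signal: matches the label 80% of the time
        signal = int(ill) if random.random() < 0.8 else int(not ill)
        if confounded:
            lying = int(ill) if random.random() < 0.98 else int(not ill)
        else:
            lying = int(random.random() < 0.5)
        data.append(((signal, lying), int(ill)))
    return data

def best_stump(data):
    """Pick the single feature most predictive of the label on this data."""
    best_feat, best_acc = None, -1.0
    for f in range(2):
        acc = sum(x[f] == y for x, y in data) / len(data)
        if acc > best_acc:
            best_feat, best_acc = f, acc
    return best_feat, best_acc

train = make_data(5000, confounded=True)   # training hospital
test = make_data(5000, confounded=False)   # deployment: confound broken

feat, train_acc = best_stump(train)
test_acc = sum(x[feat] == y for x, y in test) / len(test)

print("chosen feature:", "lying_down" if feat == 1 else "severity_signal")
print(f"train accuracy: {train_acc:.2f}, test accuracy: {test_acc:.2f}")
```

The stump "wins" on training data precisely by ignoring the medical signal, which is exactly why high benchmark scores on the collection hospital's data told researchers nothing about deployment. (The data, features, and learner here are all invented for illustration; the article's models were of course far more complex.)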

The most obvious internal problem is that no one seems to have asked “what could go wrong?” or “are there things in the data other than what we care about?” Of course no one asks that, because there always are such things, and the nifty thing about machine learning is that it can sometimes overcome them.

That’s exacerbated by the lack of interdisciplinarity on teams. Some teams were strong on ML, some were strong on medicine. Developing a good working relationship, especially across disciplinary boundaries, takes time and energy.

Lastly, the most important problem is not a lack of forethought; it’s a lack of independent analysis. Models are being developed for a variety of reasons, none of which leads to people wanting to say “let’s have someone else look at this.” That’s expensive. It’s slow. It may result in critiques of your work. It may violate a corporate desire to keep secrets. But as models get used to make more and more decisions, we’re going to need to sort out how to do independent evaluation at scale.

If we don’t, the natural tendency of these systems will be to privatize the gains and externalize the costs. That’s not a critique of anyone’s motives, just a natural result of how the incentives are distributed.