There’s an interesting article by Phil Bull, “Why you can ignore reviews of scientific code by commercial software developers“. It’s an interesting, generally convincing argument, with a couple of exceptions. (Also worth remembering: What We Can Learn From the Epic Failure of Google Flu Trends.)
The first interesting point is the difference between production code and exploratory code. Production code is intended to do the same thing every time it’s run. Exploratory code can reasonably be intended to do different things from year to year or even day by day.
The cost of going from exploratory code to production code is large. Fred Brooks taught us that a program is 1/9th of a product. This distinction really hurts in the ML space, where ML specialists are used to producing exploratory code (and models) which are often pushed to production.
The world doesn’t have a large amount of production-quality pandemic modeling code. We can bemoan this state of affairs ad nauseam (literally), but we can’t rapidly change it.
The second interesting point relates to test suites, and here, I actually disagree respectfully. Of all the practices discussed, of maintainability, documentation, error checking, I think that automated system tests, included in the ‘makefile,’ are the least excusable. Forgetting to run tests is human. Failing to understand the impact of a change across a program is easy.
When the models were used purely for science, the errors were egg on the scientists’ faces. When the models were used purely for science, the pressure to get results was, relatively speaking, negligible. Many of the demands, for documentation, maintainability, and the like, are fine ideas. In a perfect world, each might be addressed. But like demands for security, many of these things involve tradeoffs – work to improve the model’s accuracy might be sacrificed for work to improve its maintainability.
Talking about tradeoffs, I’ll digress to add that we live in a world where sites like Twitter and Facebook reward ‘engagement’, by which I generally mean yelling at each other, rather than thoughtful commentary. Responding to those attacks is additional work that has to be prioritized against the work that will get us out of this pandemic faster, with fewer lives lost.
Interesting work involves making tradeoffs between different meanings of quality. What qualities matter, and how to best achieve them, is obvious far less often than we think.