Continuous Deployment and Security
From an operations and security perspective, continuous deployment is either the best idea since sliced bread or the worst idea since organic spray pancakes in a can. It’s all a matter of execution. Continuous deployment is the logical extension of the Agile development methodology. Adam recently linked to a study that showed that a 25% increase in features led to a 200% increase in code complexity, so by making change sets smaller we dramatically decrease the complexity in each release. This translates to a much lower chance of failure. Smaller change sets also make rolling back after a failure much easier. Finally, smaller change sets make it easier to identify what broke unit and integration tests, and they are far easier to code review, which increases the chances of catching serious issues prior to deployment. All of this points to building systems that are more stable, more reliable, have less downtime and are easier to secure. This assumes, of course, that you are doing continuous deployment well.
In order for continuous deployment (and DevOps in general) to be successful, there needs to be consistent process and automation. There are lots of other factors as well, such as qualified developers, proper monitoring, and the right deployment tools, but those are for another discussion.
Consistent processes are essential if you are to guarantee that the deployment happens the same way every time. To put it bluntly, when it comes to operations and security, variation is evil. Look to Gene Kim’s research (Visible Ops, Visible Ops Security) or more traditional manufacturing methodologies like Six Sigma for a deep dive into why variation is so very, very bad. The short version, though, is that in manufacturing, variation means products you can’t sell. In IT, variation means downtime, performance issues, and security issues. At the most basic level, if you are making changes and also changing how you make those changes, you create a situation that is much harder to troubleshoot. This translates to longer incident response times and longer times to recovery, which nobody wants, especially in an online business.
The easiest way to keep the deployment process consistent is to remove the human element as much as possible. In other words, automate as much of it as possible. This has the added advantage of freeing the humans to review errors and identify potential issues faster. It doesn’t matter which automation mechanism you use as long as it’s stable and supports your operating environment well. Ideally, it will either be the same system already used by the operations and applications teams (e.g. Chef, Puppet, CFEngine) or be one that can be integrated with those systems (e.g. Hudson/Jenkins).
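As a rough illustration of what "the same way every time" can look like, here is a minimal Python sketch of a deployment pipeline that always runs its steps in one fixed order and stops at the first failure. The function name `run_pipeline` and the step names are hypothetical, chosen for this example; a real setup would delegate each step to whatever tool the team already uses.

```python
from typing import Callable, List, Tuple

# A step is a (name, action) pair; the action returns True on success.
Step = Tuple[str, Callable[[], bool]]


def run_pipeline(steps: List[Step]) -> Tuple[bool, List[str]]:
    """Run steps in the given order and stop at the first failure.

    Returns (success, log). The log holds one line per step that actually
    ran, so a human reviewing a failed run sees exactly where it stopped.
    """
    log: List[str] = []
    for name, action in steps:
        ok = action()
        log.append(f"{name}: {'ok' if ok else 'FAILED'}")
        if not ok:
            # Later steps (including the deploy itself) never run.
            return False, log
    return True, log


if __name__ == "__main__":
    # Hypothetical steps; each lambda stands in for a real tool invocation.
    success, log = run_pipeline([
        ("unit tests", lambda: True),
        ("integration tests", lambda: True),
        ("deploy", lambda: True),
    ])
    print(success, log)
```

Because the order lives in one list rather than in someone's head, changing the process becomes a reviewable code change instead of an ad hoc variation.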
With good check-in/build release messages, you even get automated logging for your change management systems and updates to your configuration management database (CMDB).
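To make that concrete, here is a small Python sketch that turns a check-in message into a change record. It assumes a purely hypothetical convention in which the first line of each message starts with a ticket ID such as `CHG-42:`; the `ChangeRecord` type and `to_change_record` function are invented for this example, and sending the record to an actual change management system or CMDB is left out.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Assumed convention (not from any particular tool): "<TICKET-ID>: <summary>"
TICKET_PATTERN = re.compile(r"^(?P<ticket>[A-Z]+-\d+):\s*(?P<summary>.+)$")


@dataclass
class ChangeRecord:
    ticket: str
    summary: str
    revision: str


def to_change_record(revision: str, message: str) -> Optional[ChangeRecord]:
    """Build a change record from the first line of a check-in message.

    Returns None when the message does not follow the assumed convention,
    so the caller can flag it for a human instead of logging a guess.
    """
    stripped = message.strip()
    first_line = stripped.splitlines()[0] if stripped else ""
    match = TICKET_PATTERN.match(first_line)
    if match is None:
        return None
    return ChangeRecord(
        ticket=match.group("ticket"),
        summary=match.group("summary").strip(),
        revision=revision,
    )


if __name__ == "__main__":
    print(to_change_record("abc123", "CHG-42: rotate TLS certs\n\nDetails here."))
    print(to_change_record("def456", "fixed stuff"))
```

Messages that do not match return `None` rather than a partial record, which keeps the automated log trustworthy and hands the odd cases back to a person.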
Great article. I used this heavily on a side project last year and became a total convert. We used a git hook to call Hudson, which ran the barrage of tests (rspec, cucumber, etc.) and then pushed via git to Heroku only if everything passed. We often deployed 10 to 15 times a day as a two-man team! If you ever committed any code that failed a test, you would get a mail and the CI would fail until you fixed it and made all tests pass.
I am writing about this in a book “Practical Software Security” now.
I tried this at work by making the dev team deal with deployment until it became so painful they wanted to automate it, BTW. It was an interesting experiment, but MSFT is so atypical that I refuse to draw any conclusions from it 😉
Mark,
It’s great to hear of more success stories, especially at a large shop like MSFT. It goes to show that being enormous doesn’t get in the way of experimentation and new frameworks/techniques.
Your first paragraph needs more thinking IMHO. Basically what you’re saying is that a smaller change leads to less complexity and fewer risks – that’s obvious.
But do 25 small changes (1% features) really bring less complexity/mess/technical debt than one bigger (25% features) change? Sorry but until I see hard figures I won’t take that for granted.
Another way to look at it: if we deploy after every commit, are there really fewer risks than if we deploy only every 25 commits? It’s the same code at the end, isn’t it? Sorry, but deploying more often doesn’t automagically correct bugs in your code.
Granted, continuous deployment forces you to automate more (esp. the testing), which is what drives quality up (i.e. it is not continuous deployment per se). Nothing prevents you from performing the same automation without continuously deploying.
Moreover, as you explicitly mention security, I have yet to see how you integrate human knowledge (activities such as security testing, guided by manual analysis and review, not even mentioning gates) into such a continuous deployment process.
It’s really interesting, but I have to say you have brought out an important issue regarding security and deployment.
Making changes predictably and automating changes is good. Breaking releases down into small pieces is good. But nothing that you say here shows how Continuous Deployment makes a system easier to secure or more secure.
Everything that I’ve seen so far on Continuous Deployment, and evidence from the continuing security and privacy problems at sites that use it, points towards it adding to security problems, not minimizing them.
I wrote about this almost two years ago:
http://swreflections.blogspot.com/2010/03/continuously-putting-your-customers-at.html
The problems with Continuous Deployment today are the same as they were then. Passing a set of automated tests isn’t enough to prove that a system is secure – at least not with the kinds of tests that we have available to us today.