Blue Green Deployments

I’m surprised that in 2017 more developers and IT departments haven’t heard about blue-green deployment. But apparently that’s been a problem since at least 2010, when Martin Fowler noted that blue-green deployment hadn’t gotten the recognition it deserved. It is a DevOps pattern used widely for testing and deploying websites and web services in the cloud with minimal downtime. Amazon Web Services now has a CodeDeploy template for blue-green deployments. Microsoft lists the blue-green deployment technique first in its Azure Continuous Deployment whitepaper. It’s safe to say that blue-green deployments have graduated from an edge technique to the current state of the art.

The idea behind the blue-green deployment strategy is simple. When you want to test a new build of your website, you create a copy of your “blue” production instance in a standby or “green” environment. Then you deploy the latest code to the new green environment and perform a battery of tests on it. Once the green environment passes your load, performance, security and other tests, you direct traffic from the live “blue” infrastructure to the “green” one instead.
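
The test-then-switch step can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `router` dictionary stands in for whatever actually directs traffic (a load balancer, a DNS record), and `promote_if_healthy`, `load_test` and `security_scan` are hypothetical names, not part of any real tool.

```python
from typing import Callable, Dict, List

# Hypothetical check signature: takes an environment name, returns True on pass.
Check = Callable[[str], bool]


def promote_if_healthy(router: Dict[str, str], standby: str, checks: List[Check]) -> bool:
    """Point live traffic at `standby` only if every check passes."""
    for check in checks:
        if not check(standby):
            # A failed check leaves the live pointer untouched.
            return False
    router["live"] = standby
    return True


def load_test(env: str) -> bool:
    return True  # stand-in for a real load test suite


def security_scan(env: str) -> bool:
    return True  # stand-in for a real security scan


router = {"live": "blue"}
promoted = promote_if_healthy(router, "green", [load_test, security_scan])
print(promoted, router["live"])
```

The point of the sketch is the ordering: traffic moves only after the whole battery of tests has passed, and a single failure leaves production exactly where it was.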

You can leave the original blue infrastructure in place temporarily, just in case you want to roll back to the previous build. You could also decommission the blue infrastructure to save operating costs. When it is time for the next release, you make a new blue environment, deploy the release to it, and perform the tests again. If the release passes, you cut over from the currently live green environment to blue.

If releases are frequent enough, or if you are doing this with physical servers, you can skip the decommissioning step and simply alternate releases between the two environments, as long as you make sure to “reset” the standby environment back to a known baseline before you deploy your latest release to it.
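
Alternating between two long-lived environments might look roughly like the sketch below. The `Environments` class, its `log` field and the build labels are all hypothetical; a real reset would restore a machine image or re-run provisioning scripts rather than append to a list.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Environments:
    """Tracks which of two long-lived environments is live (illustrative only)."""
    live: str = "blue"
    builds: Dict[str, str] = field(default_factory=lambda: {"blue": "v1", "green": "v0"})
    log: List[Tuple[str, str]] = field(default_factory=list)

    def standby(self) -> str:
        return "green" if self.live == "blue" else "blue"

    def release(self, build: str) -> str:
        target = self.standby()
        self.log.append(("reset", target))   # restore the known baseline first
        self.builds[target] = build          # then deploy the new build
        self.log.append(("deploy", target))
        # Tests against `target` would run here, before the cutover.
        self.live = target
        return target


envs = Environments()
first = envs.release("v2")    # lands on green
second = envs.release("v3")   # lands on blue
print(first, second, envs.live)
```

Each release targets whichever environment is not currently live, so the colors simply swap roles from one release to the next.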

There are three huge advantages to performing releases this way:

  1. Your standby environment is a copy of production and fully scaled. You can run full acceptance, performance and load tests on this environment and be guaranteed that it will work the same in production — because it will be production, as soon as you direct traffic to it.
  2. Cutover to a new build is almost instantaneous, since all you need to do is redirect traffic. There should be no downtime.
  3. Rolling back to the previous build is almost instantaneous, since all you need to do is redirect traffic to the previous environment. Disaster recovery is easy and your disaster recovery process is tested with every release.
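
To show why cutover and rollback are the same cheap operation, here is a minimal sketch in which a toy `Router` class (a hypothetical stand-in for a load balancer or DNS record) remembers the previous environment:

```python
from typing import Optional


class Router:
    """Toy stand-in for a load balancer or DNS record (hypothetical)."""

    def __init__(self, live: str) -> None:
        self.live = live
        self.previous: Optional[str] = None

    def cutover(self, target: str) -> None:
        # Remember where traffic was going so a rollback stays possible.
        self.previous, self.live = self.live, target

    def rollback(self) -> None:
        if self.previous is None:
            raise RuntimeError("no previous environment to roll back to")
        # Rolling back is just a cutover in the opposite direction.
        self.live, self.previous = self.previous, self.live


router = Router(live="blue")
router.cutover("green")
router.rollback()
print(router.live)
```

Nothing is redeployed in either direction, which is also why this only works while the previous environment is still running.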

Of course, there are some challenges to this approach:

  1. You do have to have a fully scaled standby environment, which will increase hardware and software costs. You can minimize these in a cloud setting, however, because that standby environment can be decommissioned when it isn’t needed.
  2. You have to invest in DevOps. If you are building and tearing down environments on a regular basis, you’ll need automated scripts for provisioning infrastructure, deploying code, and making configuration changes.
  3. Your platform should be as stateless as possible. When a cutover happens, a user’s next request will be handled by the new environment. Any state that must be maintained from one request to the next must be shared by both environments.
  4. Database schema updates must be handled differently from regular code deployments.
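
The statelessness requirement in item 3 can be illustrated with a small sketch. It assumes session data lives in a store that both environments reach. The in-memory `SessionStore` here is a hypothetical stand-in for a real shared database or cache, and `AppServer` is likewise invented for illustration.

```python
from typing import Dict, List


class SessionStore:
    """In-memory stand-in for a store shared by both environments."""

    def __init__(self) -> None:
        self._carts: Dict[str, List[str]] = {}

    def cart(self, session_id: str) -> List[str]:
        return self._carts.setdefault(session_id, [])


class AppServer:
    """Keeps no per-user state of its own; everything goes through the store."""

    def __init__(self, env: str, store: SessionStore) -> None:
        self.env = env
        self.store = store

    def add_item(self, session_id: str, item: str) -> None:
        self.store.cart(session_id).append(item)

    def view_cart(self, session_id: str) -> List[str]:
        return list(self.store.cart(session_id))


store = SessionStore()
blue = AppServer("blue", store)
green = AppServer("green", store)

blue.add_item("session-1", "book")                # request served before the cutover
cart_after_cutover = green.view_cart("session-1") # next request lands on green
print(cart_after_cutover)
```

Had each server kept the cart in its own memory, the user’s first request after the cutover would have seen an empty cart.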

Of these challenges, hardware and software costs usually draw the most resistance, typically because companies haven’t embraced consumption pricing yet. With physical hardware or perpetual software licensing, you do wind up paying double for your production setup. But for critical applications it’s probably worth the expense.

Schema updates remain a difficult engineering challenge too, but they always have been, regardless of the deployment model.

Here are a few other links on the blue-green deployment pattern, if you’d like to learn more: