May 24 / 2017
Automate Everything! (Except Platform Upgrades)
Christian Duvall, Group Vice President, Enterprise Services
AKA: Backwards Progression – Keeping Control of Your Solution Environment
Software development has always had its struggles. From dropping your perfectly ordered stack of punch cards to the nightmare known as "DLL Hell," developers are not strangers to environmental, platform and ecosystem pains.
In today's world, we have plenty of new challenges, such as dealing with integrations across physical and virtual boundaries or maintaining identity and access control no matter where you are working. Interestingly, these are all problems that have arisen from a thematic force in technology: progress. Over the last month or so, I was reminded of this force on two separate occasions, and both caused major struggles for a large development team (but fortunately, not its customers).
The first incident happened when Google automatically rolled out a shiny new version of Chrome to nearly everyone. In this update, Google chose to stop validating domains against the common name of X.509 (SSL) certificates. So, certificates that did not come with a subject alternative name were immediately treated as invalid.
As the browsers began to automatically update, you could see the trickle turn into a stream (… and then into a gigantic wave). The QA team started to raise defects, the developers lost their ability to work, and the infrastructure team had to go into firefighting mode to figure out how to get the certs reissued quickly. Nearly everyone was affected.
Of course, Google published an Intent to Remove in February, but like most companies, the client did not have its DevOps team looking at platform risks.
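A check like the one Chrome started enforcing can be run ahead of time against your own certificates. The following is a minimal sketch, not a complete audit tool. It assumes the certificate has already been decoded into the dictionary shape that Python's `ssl.SSLSocket.getpeercert()` returns; the helper names (`common_name`, `san_hosts`, `needs_reissue`) and the host `app.example.test` are hypothetical.

```python
# Assumption: `cert` has the shape returned by ssl.SSLSocket.getpeercert().
# All helper names and sample data below are hypothetical, for illustration.

def common_name(cert):
    """Return the subject commonName, or None if the subject has none."""
    for rdn in cert.get("subject", ()):
        for key, value in rdn:
            if key == "commonName":
                return value
    return None


def san_hosts(cert):
    """Return the DNS and IP entries listed in subjectAltName."""
    return [value for kind, value in cert.get("subjectAltName", ())
            if kind in ("DNS", "IP Address")]


def needs_reissue(cert):
    """True when the certificate relies on the common name alone."""
    return not san_hosts(cert)


# Sample certificates: one with only a common name, one with a SAN entry.
cn_only = {"subject": ((("commonName", "app.example.test"),),)}
with_san = {
    "subject": ((("commonName", "app.example.test"),),),
    "subjectAltName": (("DNS", "app.example.test"),),
}

for label, cert in (("cn_only", cn_only), ("with_san", with_san)):
    status = "reissue" if needs_reissue(cert) else "ok"
    print(label, common_name(cert), status)
```

Running something along these lines across an inventory of internal certificates would surface the ones that depend on the common name before a browser update does it for you.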
Only a couple of weeks later, some new iOS devices made their way into circulation. These new devices wreaked major havoc in a process that had been very (very, very) heavily tested over the previous couple of months. While prior devices were locked down by policy, the new ones had iOS 10.3.1 on them, which has some show-stopping issues with the WebSocket protocol. Fortunately, this situation was more isolated, but it definitely cost thousands in dev-hours to isolate and mitigate.
What should we take away from all of this? Here are a few tips to make sure you don’t get bitten when your platforms “progress.”
Freeze the Production Versions of Your Platforms
All things that could impact your solution should be locked down to the patch level. This includes OS, browsers, and any other framework that could be updated. Turn off automatic updates by policy, setting or privileges. If possible, freeze development and QA environments as well, keeping things as controlled as possible.
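One way to make a patch-level freeze checkable is to keep the approved versions in a manifest and compare it with what each environment actually reports. The sketch below assumes the versions have already been collected by whatever inventory tooling you use. The manifest contents and the name `find_drift` are hypothetical; only the iOS 10.3.1 value comes from the incident described earlier.

```python
# Hypothetical pinned manifest: platform name -> exact approved version.
PINNED = {
    "ios": "10.2.1",
    "chrome": "57.0.2987.133",
}


def find_drift(pinned, observed):
    """Return {name: (approved, found)} for every platform off its pin."""
    drift = {}
    for name, approved in pinned.items():
        found = observed.get(name)  # None means no version was reported
        if found != approved:       # exact match: the pin is at patch level
            drift[name] = (approved, found)
    return drift


# Versions reported by a device or build agent (sample data).
observed = {"ios": "10.3.1", "chrome": "57.0.2987.133"}

drift = find_drift(PINNED, observed)
for name, (approved, found) in sorted(drift.items()):
    print(f"{name}: approved {approved}, found {found}")
```

The comparison is a plain string match on purpose: if the pin is meant to hold at the patch level, any difference at all counts as drift and should be reviewed before it reaches QA or production.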
Perform integration testing prior to rolling out any platform upgrade. If your QA process doesn’t include an environment with the ability to isolate integrations, you’ll find yourself behind when it really counts.
Mature Your DevOps Processes
At a minimum, your new processes should include monitoring and proactive risk mitigation for all dependencies in your chain. Many platforms allow for alpha- and beta-level access to upcoming versions. A DevOps team firing on all cylinders will vet versions early, then provide feedback to the vendors to try to block potential issues in their platform upgrades.
A little planning can save you a lot of time and the expense of disaster recovery. Don’t hinder your progress … with progress.