Many teams using cloud computing are adopting immutable servers to simplify configuration management and improve the reliability of their infrastructure as code (IaC) systems. The basic premise is that, rather than making a configuration change on a running server, a new server is created with the change. This allows the new server to be tested before it goes live, but it means the team needs a more sophisticated process for building and testing server images.
The time it takes to make a change through immutable servers can be a challenge. I'm going to share some techniques for keeping the change process quick. To do this, first let’s consider how immutable servers are built.
Baking immutable server images
The purest implementation of immutable servers is to bake all of the server’s software and configuration into a server image and use that to create new servers. For example, a team running applications on Tomcat has a Packer template file that is used to build a separate AWS AMI for each application. It installs the JVM, the Tomcat server, and the latest version of the relevant web app. When there is a new version of the web app or an update to the JVM, Tomcat server, or OS packages, Packer is used to build a new version of the AMI. Each running server for that application is replaced by a server built from the new AMI.
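As a rough sketch, one of these application image templates might look something like the following, shown in Packer's HCL syntax. The application name, AMI filter, package names, and paths are all illustrative:

```hcl
source "amazon-ebs" "shopping_cart" {
  region        = "eu-west-1"
  instance_type = "t3.small"
  ssh_username  = "ubuntu"
  ami_name      = "shopping-cart-${formatdate("YYYYMMDDhhmmss", timestamp())}"

  # Start from a stock distribution image (or a previously baked base image).
  source_ami_filter {
    owners      = ["099720109477"] # Canonical
    most_recent = true
    filters = {
      name                = "ubuntu/images/hvm-ssd/ubuntu-jammy-22.04-amd64-server-*"
      root-device-type    = "ebs"
      virtualization-type = "hvm"
    }
  }
}

build {
  sources = ["source.amazon-ebs.shopping_cart"]

  # Install the JVM and Tomcat, then deploy the latest build of the web app.
  provisioner "shell" {
    inline = [
      "sudo apt-get update",
      "sudo apt-get install -y openjdk-17-jre-headless tomcat9"
    ]
  }

  provisioner "file" {
    source      = "build/shopping-cart.war"
    destination = "/tmp/shopping-cart.war"
  }

  provisioner "shell" {
    inline = ["sudo mv /tmp/shopping-cart.war /var/lib/tomcat9/webapps/ROOT.war"]
  }
}
```

Running packer build against a template like this boots a temporary instance, applies the provisioning steps, and saves the result as a new AMI.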
New AMIs can be thoroughly tested before being rolled out to production. If the Packer template file is managed in version control, changes can automatically trigger a CI or CD server job. This job spins up a test server from the new AMI and runs automated tests against it using a tool such as Serverspec. If the server fails the testing, the AMI will be rejected, but if it passes, it can be rolled out, either automatically or after a human approves it for use.
Release techniques such as blue-green deployment or canary releasing can be used so that servers are replaced without interrupting service. This is essential with immutable servers because servers are replaced frequently.
Where does the time go?
The normal process for creating a server image, including automatically testing it, involves the following steps:
1. Boot a server instance from an origin image (or from the previous version of the image).
2. Run scripts or a configuration tool to update and configure the server instance to the desired state.
3. Save the server instance as a new server image.
4. Boot a test server instance from the new image.
5. Run automated tests against the test server instance.
Note that steps 1-3 can be managed with Packer, as in my earlier example, and would ideally be run from a CI or CD tool. Steps 4 and 5 would typically be a separate job in the CI or CD tool, which would then trigger jobs to roll out the change to other environments.
Booting server instances tends to take the most time. With an IaaS cloud platform such as AWS, it can take a few minutes from making the API call to create an instance until you can connect to it and run configuration tools. So this is where the process offers the most opportunity for optimization, especially since we’re doing it twice.
The process for updating and configuring the instance can also take a while, depending on how much is done. For example, running apt-get upgrade -y can take quite a while, as can downloading and installing large custom applications.
Potential solutions for building images more quickly
The following are solutions that teams may consider to build and test server images more quickly.
Test while baking
An obvious optimization for the above process is to run automated tests on the instance before saving it as a new image—inserting step 5 (testing) between steps 2 and 3. This would cut out the time to boot a new instance and may work well in some cases.
However, in other situations, tests may make changes to the instance, and at times teams won't want those changes included in the server image that is used to create production instances. Cleaning up files, user accounts, keys, etc. may mitigate this problem.
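With Packer, one way to arrange this is to copy the test suite onto the instance and run it as the final provisioning step, so a failing test aborts the build before any image is saved. A minimal sketch, assuming the tests are bundled into a self-contained script (the paths and script name are illustrative):

```hcl
build {
  sources = ["source.amazon-ebs.shopping_cart"]

  # ... the normal provisioning steps go here ...

  # Copy the test suite onto the instance and run it as the last step.
  # If any test fails, the shell provisioner exits non-zero and Packer
  # abandons the build, so no image is created from this instance.
  provisioner "file" {
    source      = "tests/image-tests.sh"
    destination = "/tmp/image-tests.sh"
  }

  provisioner "shell" {
    inline = [
      "sudo bash /tmp/image-tests.sh",
      # Clean up the test files (and any accounts or keys the tests created)
      # so they are not baked into the production image.
      "sudo rm -f /tmp/image-tests.sh"
    ]
  }
}
```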
Build images in a chroot
Rather than booting a server instance to configure it, it may be possible to mount a boot disk onto an existing machine (for example, the CI/CD agent running Packer) and make changes to it as a static directory structure. Running installation and update tools in a chroot jail can make this possible. The Packer amazon-chroot builder can do this for AWS AMIs.
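A hedged sketch of what that might look like with the amazon-chroot builder. Packer itself has to run as root on an EC2 instance that is allowed to create and attach EBS volumes, and the source AMI ID here is illustrative:

```hcl
# The source AMI's root volume is attached to the machine running Packer and
# mounted as a directory; provisioners then run inside a chroot of that
# directory instead of on a freshly booted instance.
source "amazon-chroot" "shopping_cart" {
  ami_name   = "shopping-cart-chroot-${formatdate("YYYYMMDDhhmmss", timestamp())}"
  source_ami = "ami-0123456789abcdef0" # illustrative
}

build {
  sources = ["source.amazon-chroot.shopping_cart"]

  # Commands run as root inside the chroot, so no sudo is needed.
  provisioner "shell" {
    inline = [
      "apt-get update",
      "apt-get install -y openjdk-17-jre-headless tomcat9"
    ]
  }
}
```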
Reduce test bloat
Automated test suites can become heavy and slow-running over time. Make sure to keep tests pruned and fast-running so they don’t overwhelm the change process.
Work on changes in a sandbox
Using automation to make changes becomes truly painful when people need to run through the CD pipeline to see whether a simple change works. It’s essential that people working on code, and this definitely includes infrastructure code, are able to make and test changes locally before committing them to version control and kicking off the pipeline. With infrastructure, this typically means running local virtual machines, using a tool such as Vagrant.
It’s relatively easy to configure Packer to build a Vagrant box along with other server images—not surprising since Hashicorp makes both Packer and Vagrant. A team making a change that will lead to a new server image being made can spin up the current version of the server locally, so they can try out changes and make sure they have it right before committing. It should be possible to run the automated tests locally, to avoid “edit - commit - test - fail - edit - commit” loops.
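One way this can look in a Packer template: two builders share the same provisioning steps, with the AWS source producing the AMI and a local VirtualBox build feeding a Vagrant post-processor. This is a sketch; the base OVF path, credentials, and script name are illustrative, and the amazon-ebs source is the one from the earlier template:

```hcl
# Local VirtualBox build of the same server, started from an exported base VM.
source "virtualbox-ovf" "shopping_cart" {
  source_path      = "base-boxes/ubuntu-22.04.ovf" # illustrative
  ssh_username     = "ubuntu"
  ssh_password     = "ubuntu"
  shutdown_command = "sudo shutdown -P now"
}

build {
  sources = [
    "source.amazon-ebs.shopping_cart",    # produces the AMI
    "source.virtualbox-ovf.shopping_cart" # produces the local VM image
  ]

  # The same provisioning steps run against both images.
  provisioner "shell" {
    script = "scripts/install-app.sh" # illustrative
  }

  # Only the VirtualBox artifact is packaged as a Vagrant box.
  post-processor "vagrant" {
    only   = ["virtualbox-ovf.shopping_cart"]
    output = "builds/shopping-cart.box"
  }
}
```

Developers can then add the resulting .box file with vagrant box add and spin up the current version of the server locally before committing a change.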
Cache installation files
If the time needed to run updates and install packages on the server image is significant, teams should find ways to optimize it. One way is to cache the installation files closer to where the image is built. This could mean mirroring package repositories, using caching proxies, or moving in-house software repositories to a closer location on the network or in the cloud.
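For example, if images are built on AWS and packages come from public apt repositories, a provisioner step near the start of the build could point apt at a caching proxy or mirror running close to the build. A sketch of such a step, to be placed inside the template's build block; the hostname and port are illustrative, assuming something like apt-cacher-ng:

```hcl
  # Route package downloads through a nearby caching proxy while baking,
  # then remove the setting so production servers don't depend on it.
  provisioner "shell" {
    inline = [
      "echo 'Acquire::http::Proxy \"http://apt-cache.internal.example:3142\";' | sudo tee /etc/apt/apt.conf.d/01proxy",
      "sudo apt-get update",
      "sudo apt-get upgrade -y",
      "sudo rm /etc/apt/apt.conf.d/01proxy"
    ]
  }
```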
Layer images
Another way to reduce the time needed to update servers is to use multiple layers of images. For example, a base server image could have the OS, with all of the packages installed and updated. New images are created starting from this base image, so only the newest changes need to be applied. The base image is updated from time to time, especially when updates to OS packages and other common files are released.
This approach works particularly well when teams have many different images for different applications and services, all based on the same OS distribution. In some cases it can make sense to have multiple layers of images. For example, a team may have a base OS image used to create all of their server types, then a Java server image with the JDK and an application server installed, which is in turn used to create server images for individual applications.
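With this layering, the application image's Packer template no longer starts from a stock distribution AMI; its source block selects the newest version of the team's own base layer instead. A sketch, with illustrative image names:

```hcl
# Application images start from the team's Java layer, which was itself baked
# from a minimal OS base image with packages already updated, so only the
# application needs to be installed during this build.
source "amazon-ebs" "shopping_cart" {
  region        = "eu-west-1"
  instance_type = "t3.small"
  ssh_username  = "ubuntu"
  ami_name      = "shopping-cart-${formatdate("YYYYMMDDhhmmss", timestamp())}"

  source_ami_filter {
    owners      = ["self"]   # our own AWS account
    most_recent = true
    filters = {
      name = "base-java-*"   # JDK and application server already installed
    }
  }
}
```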
Minimize the OS image
The time needed to boot server instances and save server images increases with the size of the OS installation. So teams can optimize the process by stripping the base OS down to the bare minimum—files and packages actually required for their use case. This has many added benefits, including reducing the surface area for security attacks and lowering the time to boot servers for automated scaling and recovery.
However, a risk of starting with a truly minimal image is that it may increase the number and size of packages installed during the image update. This can be handled through the layered approach described above. Start with a barebones OS distribution, then add the files needed by the team into a base image, which is in turn used to build role-specific images.
Make (some) changes at boot time
Teams with many, frequently changing services (or microservices) may find that baking each service application onto its own server image results in a large number of images and significant time spent waiting for server images to be built. One option is to loosen their approach to immutability.
These teams create a single server image that can run any microservice conforming to their packaging and installation conventions, and have an installer run at boot time. For example, a simple cloud-init script passed to the new instance can tell it where to download the relevant microservice package and how to install it.
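A hedged sketch of how the boot-time installer might be wired up, shown here as a Terraform launch template purely as one way to pass user data to new instances. The AMI ID, bucket name, and packaging conventions are all illustrative assumptions:

```hcl
variable "service_package" {
  type        = string
  description = "Microservice package to install at boot, e.g. cart-service-1.4.2.deb"
}

resource "aws_launch_template" "microservice" {
  name_prefix   = "generic-service-host-"
  image_id      = "ami-0123456789abcdef0" # the shared, general-purpose image
  instance_type = "t3.small"

  user_data = base64encode(<<-EOT
    #!/bin/bash
    # Runs once at first boot via cloud-init. Assumes the AWS CLI is baked
    # into the image, the instance profile can read the bucket, and the
    # package's own scripts enable and start the service.
    set -e
    aws s3 cp "s3://example-service-packages/${var.service_package}" /tmp/service.deb
    apt-get install -y /tmp/service.deb
  EOT
  )
}
```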
This can still be considered immutable in that the server’s configuration is not modified after it has been booted. But it weakens the testing benefit because it’s possible that a server image and application package combination may not have been tested in the pipeline before being deployed to production. Teams will need to decide whether this tradeoff makes sense for them.
Containers make immutable infrastructure changes go faster
Containers can dramatically change the dynamic of immutable infrastructure. Applications are packaged into a container image, which is promoted through a pipeline. This follows the immutable configuration model, since a new image is built whenever any of the files or configuration in the container is changed. And it is much quicker to build and deliver a container image than a full server image.
This leaves the question of how to build and configure the host servers used to run the containers. Many teams will build these hosts following the immutable model. In this case, the process tends to be easier to manage than one where applications are run directly on servers, without containers. Container hosts are simplified, only needing the software used to run and manage container instances. So they can be smaller, and tend to change less often.
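As a rough sketch, a container host template can be very small; the packages shown are illustrative, and teams might install Docker, containerd, or their orchestrator's node agent instead:

```hcl
source "amazon-ebs" "container_host" {
  region        = "eu-west-1"
  instance_type = "t3.small"
  ssh_username  = "ubuntu"
  ami_name      = "container-host-${formatdate("YYYYMMDDhhmmss", timestamp())}"

  source_ami_filter {
    owners      = ["099720109477"] # Canonical
    most_recent = true
    filters = {
      name = "ubuntu/images/hvm-ssd/ubuntu-jammy-22.04-amd64-server-*"
    }
  }
}

build {
  sources = ["source.amazon-ebs.container_host"]

  # Only the container runtime goes into the host image; application code
  # lives in container images that move through their own pipeline.
  provisioner "shell" {
    inline = [
      "sudo apt-get update",
      "sudo apt-get install -y docker.io"
    ]
  }
}
```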
Learn more about infrastructure as code (IaC)
Immutable servers are an increasingly popular way for teams to improve the consistency and predictability of their infrastructure. Hopefully your team now has some good ideas for how to implement them without seriously impacting the time it takes to roll out changes. You can read more about server images, infrastructure testing, and pipelines for infrastructure in my book Infrastructure as Code (O’Reilly Media).