Using Puppet to configure new and existing servers in the Cloud

Introduction

Cloud Computing grants us access to an incredible amount of compute resources right at our fingertips - just hanging there, waiting to be tapped. Nonetheless, it is often overlooked that while provisioning new servers is incredibly easy, the trick is to get them to do something actually useful (a pristine OS happily idling in the cloud only generates expenses - not business value or revenue).

In this post we’ll discuss the art and science of making the configurations and customizations required to bring our cloud-provisioned servers to a state where they deliver business value.

First, let’s review our options

Let’s start the discussion by outlining a simple scenario in which we set out to provision a new, fully functioning web server in an existing environment.

To that end, we will typically need to perform the following steps:

Install the web server on the base OS.
Install any required modules/frameworks (like Rails, .NET, mod_php, etc.).
Import the web application itself, along with any required static content.
Import and configure any relevant server identifications such as hostname, public DNS name and X.509 certificates.
Configure the server to take its place in the application tiers by opening firewall ports, creating an ODBC connection to the database back-end and establishing connections with the relevant application servers.
Fire up the server, insert it into the load-balancing scheme and validate it’s actually functioning properly.

To fulfill these requirements, we could go down one of the following paths:

Manually install and configure the required components (as many, if not most, IT shops do today).
Bundle all of the requirements into an Image (with physical servers this is often done using Symantec Ghost or the equivalent) that our cloud provider will use to provision new machines.
Provision a vanilla OS image and, as a part of its initialization, fetch the desired components on-the-fly and perform the necessary customizations to bring it to production.

So which configuration method is best?

Manually configuring our servers is time consuming at best and error-prone at worst - making it by far the worst of the three. The simple truth is that we humans are poor at performing repetitive tasks - and are much better off delegating them to the consistent and cheap computers we have at our disposal.

Imaging is easy to set up (just manually customize once, and let the provider duplicate this configuration time and time again), but is limited in the sense that we need to create a separate Image for every different configuration - often resulting in a large number of nearly-identical images that all need to be manually maintained.

I strongly suggest implementing a scripted installation and configuration mechanism (we’ll discuss the actual implementation in just a bit) for software and components. It may require more work up-front, but it offers benefits that easily repay that investment:

First and foremost, unlike imaging, we can apply the desired configurations to existing machines! A machine that has been around for a month is not fundamentally different from one that has been around for a minute as far as running scripts is concerned.
No need to create and maintain multiple images. All of the customization is based on the same single baseline - think about having to patch and test only one baseline (this also applies to configuration removals, such as uninstalling the Apache web server).
Allows us to create new configurations literally on-the-spot (as they are made-to-order to begin with) - resulting in a much more dynamic and agile IT.
It is hardware agnostic (to a large degree), so the same configuration can be reused on different types of virtual/physical machines.

Scripted configurations have been around for a while - why should we do things differently in the Cloud?

First off, I strongly recommend scripted installations to manage all of your machines - be they physical, virtual or in the Cloud.

It’s the most efficient way to rein in your IT infrastructure, and with some research suggesting that system configuration and installation tasks consume 60%-80% of IT departments’ time, it is also a very hard-hitting method of lowering operational costs.

True, we can insist on configuring Cloud machines the old-fashioned way (read: the inefficient way), but that won’t allow us to really leverage the elastic and dynamic nature of the Cloud infrastructure - resulting in nothing more than a glorified pay-by-the-hour hosting.

Although in traditional environments we could get away with doing things inefficiently - the only way to fully leverage the rapidly provisioned Cloud resources and offer highly dynamic and scalable solutions is to rely heavily on a strong configuration and automation toolkit.

Puppet to the rescue

Puppet 101

Puppet is an open source project, backed by a commercial company, that aims to automatically configure Linux (and soon Windows!) systems from a centralized location (providing all of the benefits I’ve mentioned in the previous section).

Puppet revolves around resources, which represent various components of a system (such as a file or a process), and enables the administrator to define how a particular resource should be configured on a particular server (or on a group of similar servers such as “all of the web servers”).

For example, the following resource definitions represent the Apache web server:

package { "httpd":
  ensure => installed,
}

service { "webserver":
  name       => "httpd", # the actual service name on RedHat
  require    => Package["httpd"],
  ensure     => running,
  hasstatus  => true,
  hasrestart => true,
}

The require and ensure lines of the service are the main piece of it: require makes the service depend on the web server package (“httpd” in RedHat or “apache2” in Debian), so that the package resource installs it first, while ensure => running makes sure that the service is always running.
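The RedHat/Debian naming difference mentioned above can be handled inside the manifest itself. The following is only a minimal sketch: it assumes Facter exposes the OS family as $::osfamily (depending on your Facter version this may be spelled differently, for example $facts['os']['family']), and the variable name $apache_package is made up for illustration.

```puppet
# Pick the package name from the OS family reported by Facter.
# Assumption: the $::osfamily fact is available on this Puppet/Facter version.
$apache_package = $::osfamily ? {
  "RedHat" => "httpd",
  "Debian" => "apache2",
  default  => "httpd",
}

# Install whichever package name was selected above.
package { $apache_package:
  ensure => installed,
}
```

The selector keeps the distribution-specific detail in one place, so the rest of the configuration can refer to a single variable.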

These statements are carried out by providers - components of the system that supply the actual configuration functionality, such as the package manager and service manager in the above example.

We won’t dig any deeper into the implementation here - suffice it to say that it leverages the existing distribution components, is completely configurable and allows for the addition of custom providers to further extend Puppet’s abilities.
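Puppet normally picks a suitable provider on its own, but one can also be named explicitly. As a rough sketch, assuming a RedHat-style system where the yum package provider and the redhat service provider apply:

```puppet
# The provider attribute is usually omitted and inferred from the platform;
# it is spelled out here only to show where providers fit in.
package { "httpd":
  ensure   => installed,
  provider => yum,
}

service { "httpd":
  ensure   => running,
  provider => redhat,
  require  => Package["httpd"],
}
```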
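To tie this back to the web server scenario from the start of this post, here is a minimal sketch of how the first few steps might look as Puppet resources. The class name, the module name in the source URL and the file path are assumptions made for illustration (a RedHat-style layout), not something prescribed by Puppet.

```puppet
# Hypothetical class bundling the web server steps from the scenario.
class webserver {
  # Step 1: install the web server on the base OS.
  package { "httpd":
    ensure => installed,
  }

  # Import configuration from the Puppet server.
  # Assumption: a module named "webserver" ships files/httpd.conf.
  file { "/etc/httpd/conf/httpd.conf":
    ensure  => file,
    source  => "puppet:///modules/webserver/httpd.conf",
    require => Package["httpd"],
    notify  => Service["httpd"], # restart Apache when the file changes
  }

  # Final step: make sure the server is started now and on every boot.
  service { "httpd":
    ensure  => running,
    enable  => true,
    require => Package["httpd"],
  }
}
```

Because the resources describe a desired state rather than a sequence of commands, the same class can be applied to a freshly launched machine or to one that has been running for months.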

Deploying Puppet In The Cloud

Puppet is a client/server framework: Puppet Agents running on every machine identify themselves to an aptly named Puppetmaster server that centrally holds all of our configurations. For obvious reasons, we’ll need to make the Puppetmaster reachable from our Cloud machines.

Naturally, we probably don’t want all of our machines to have the same configuration - to this end I recommend using AWS’s user-data parameter when starting EC2 Instances to provide the name of the configuration we’d like Puppet to assign to this instance.

For example: if we want this instance to be a web server then we could pass “webserver-ApplicationA”, or if we want a DB machine we could pass “mysql-ApplicationA” and have Puppet install and configure MySQL. The important benefit is that once the instance has been launched, Puppet does all of the heavy lifting required to bring this Instance into full production use!
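One way this mapping could be expressed on the Puppet side is sketched below. It assumes that Facter exposes the raw user-data string as a fact named $::ec2_userdata (check that your Facter version provides it) and that the string contains nothing but the configuration name; the two class names are hypothetical stand-ins for the real configurations.

```puppet
# Hypothetical classes standing in for the real configurations.
class webserver_application_a {
  package { "httpd": ensure => installed }
}

class mysql_application_a {
  package { "mysql-server": ensure => installed }
}

# Assumption: $::ec2_userdata holds exactly the name passed at launch.
case $::ec2_userdata {
  "webserver-ApplicationA": { include webserver_application_a }
  "mysql-ApplicationA":     { include mysql_application_a }
  default: {
    # Surface unexpected values in the agent report instead of failing silently.
    notify { "no configuration matches user-data '${::ec2_userdata}'": }
  }
}
```

Keeping a default branch makes a mistyped configuration name visible in the Puppet run, rather than leaving an unconfigured instance idling.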

For advanced users, I also recommend pairing Puppet with Subversion to enable rollback and auditing of configurations.

For EC2 users, there is also an EC2 Puppet configuration recipe (enabling some nifty functionality such as having the instance map an Elastic IP on boot) freely available at the Puppet website.

Conclusion

The beauty of Cloud Computing is that we switch our focus from Servers to Services. Obviously servers are still the building blocks of services, but instead of maintaining and configuring them one-by-one - we choose to couple Cloud computing with automation to alleviate the need to interact with individual machines and instead focus on the endgame: Services.