
Day 9: Deploy your projects with Git

To make error is human. To propagate error to all server in automatic
way is #devops.

— @DEVOPS_BORAT

In the last few years, spurred no doubt by the usefulness and enormous
popularity of GitHub, even the most casual developers have become hooked
on version control, and on Git in particular. It’s a wonderful
development; Git has finally lowered the barrier to entry for version
control, bringing its benefits even to those who become filled with
dread at the mere thought of installing a version control daemon.

What gets a little less airtime, though, is the subject of what to do
once your code is in Git. Since Git is distributed, everyone who
develops websites with Git is faced with the problem of how to get their
code from their development environment onto their live server. Just
what is best practice for deploying your code from Git?

Principles of deployment

When we talk about deployment, we’re talking about the logistical
problem of getting your code from one place to another. The “from” is
typically your own development machine; the “to” might be a staging
server, where it’s tested before going live, or the live server, where
it will run in production, or a thousand servers behind several load
balancers from which your hugely popular web app is served.

Whatever the scale, the fundamental principles are the same. You want to
be able to deploy the code easily, ideally with a single command. You
want to be able to roll back the deployment should something go wrong.
You want to be able to automate any tasks that go along with deploying
the code — restarting servers, minifying static assets, clearing caches,
etc.

There are lots of ways to achieve this, from the simple to the
incredibly complex. I’m going to focus on three: first, manual
deployment; second, a hook-based workflow; and finally, a more robust
system that uses Capistrano.

Manual deployment with Git

If you think about the criteria for a good deployment system, the
sharper among you might have noticed that they’re also the qualities
that make a good version control system, and we know that Git is
a great example of one of those. Most importantly, it can be automated
and it supports rolling back to previous revisions.
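
As a quick illustration of that last point, here’s a minimal sketch of
a rollback using nothing but Git. It runs in a throwaway repository; the
file name, commit messages and demo identity are all made up for the
example, and `git reset --hard` is only one of several ways you could
return to an earlier revision.

```shell
#!/bin/sh
# Work in a throwaway repository so that nothing real is touched.
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q .

# Identity used for the demo commits only (assumed values).
git config user.name "Demo"
git config user.email "demo@example.invalid"
git config commit.gpgsign false

# First "release".
echo "version 1" > index.html
git add index.html
git commit -q -m "First release"

# Second "release", which turns out to be broken.
echo "version 2 (broken)" > index.html
git commit -q -a -m "Second release"

# Roll back: move the branch and the working files to the previous commit.
git reset -q --hard HEAD~1

cat index.html   # prints: version 1
```

Bear in mind that `git reset --hard` also throws away any uncommitted
changes in the working directory, so it’s best suited to a checkout that
nobody edits by hand.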

So, for uses at the simplest end of the spectrum, it might well be
feasible to deploy using nothing more than Git. How might we do this?

Well, we need a Git repository, obviously. We’ll assume that the first
Git repository that you have is on your local machine, and it’s the one
that you’ve been committing to as you’ve been developing your project.

Next, you’ll need to clone this repository to the remote machine. Now,
there are two ways to do this. The first is a straight clone, in which
case your local repository will be added as a remote on the live server.
However, this means that your local development repository must be
accessible, usually over SSH, from the live server.

This is something that’s usually either impossible or undesirable. You
would generally have no way of knowing where your local repository was;
you might want to deploy from the airport departure lounge in which
you’re doing a spot of development, or from a friend’s house; if you
hardcoded the IP address of your machine, then it would quickly become
obsolete.

If we can’t access our development machine from the live server, then
we’ll need an intermediary. Luckily, Git offers just what we need, in
the form of bare repositories. A “bare” repository is just Git’s
technical term for a repository that doesn’t actually have its own copy
of the tracked files, but rather just has the repository information —
that is, that only has the information that’s usually stored in the
.git directory in an ordinary Git repository.

Using a bare repository means adapting the process slightly, but only
slightly: instead of going to the live repository and pulling directly
from your local repository, you would push from your local repository to
the bare repository (the “hub”), and then go to the live repository and
pull from the hub. A small change, but one that is far more flexible,
especially when you’re working with multiple developers.
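
The one-off setup for this arrangement might look something like the
sketch below. To keep it runnable anywhere, all three repositories live
in a temporary directory on one machine; in practice the hub and the
live copy would sit on your server and be reached over SSH. The
directory names (hub.git, dev, live) and the demo identity are
assumptions made for the example.

```shell
#!/bin/sh
base=$(mktemp -d)

# 1. The hub: a bare repository (repository data only, no working files).
git init -q --bare "$base/hub.git"
git --git-dir="$base/hub.git" symbolic-ref HEAD refs/heads/master

# 2. The development repository, with the hub added as its "origin" remote.
git init -q "$base/dev"
cd "$base/dev" || exit 1
git symbolic-ref HEAD refs/heads/master
git config user.name "Demo"
git config user.email "demo@example.invalid"
git config commit.gpgsign false
git remote add origin "$base/hub.git"

echo "hello" > index.html
git add index.html
git commit -q -m "Initial commit"
git push -q origin master

# 3. The live site: an ordinary clone of the hub, with working files.
git clone -q "$base/hub.git" "$base/live"

# The hub holds only what would normally be inside .git ...
ls "$base/hub.git"
# ... while the live clone has the tracked files checked out.
cat "$base/live/index.html"   # prints: hello
```

Because the live copy was cloned from the hub, a plain `git pull` there
fetches from the hub without any further configuration.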

So, a typical deployment might look like this:

local % git commit
local % git push
local % ssh [email protected]
remote % cd /sites/example.com/
remote % git pull

Fairly simple, eh? In fact, it’s so simple that it’s all that lots of
people end up doing, tolerating the repetition of these commands each
time they want to deploy a change to their site. But we don’t have to
tolerate it: we can be clever. Let’s automate this!

Automatic deployment with Git hooks

Git — like many version control systems — comes with a system of
hooks that allow you to automate many common tasks. Hooks are no more
than scripts that are run before or after certain events, and they’re
fairly self-explanatory: after doing a merge, for example, Git looks to
see if there’s a hook called post-merge and, if it exists, runs it;
before committing, it looks for the pre-commit hook; and so on. An
exhaustive list of Git’s hooks can be found
here.

Fundamentally, the process of deploying your site using Git hooks is no
different to the manual process outlined above; all that we’re going to
do is to automate it, so that we only have to do the first two commands;
that is, to commit our changes and then push them to the hub. After
that, the hub itself should take care of putting the changes live.

Everything that we need to do can be done in a single hook,
post-update, which runs after a push has completed. So, if we create
a script called post-update in our hub repository’s hooks directory,
it will be executed each and every time someone pushes to the hub.

Let’s think about what this hook needs to do.

First, it needs to be discerning about what it deploys. Typically, we’d
only want to deploy our stable branch (usually master, but often
release or similar). So, we’ll need to check which branch is being
pushed. Helpfully, Git passes the full name of each ref that was updated
as an argument to the post-update script, so when a single branch is
pushed we can check the first argument easily. In Bash, we’d do:

if [ "$1" = "refs/heads/master" ]; then
    # The master branch is being pushed
fi
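
A single push can update more than one ref (several branches, or
a branch plus some tags), in which case the hook receives all of them as
arguments. A slightly more defensive sketch loops over every argument
rather than looking only at the first; the `should_deploy` helper and
the `DEPLOY_REF` variable are names I’ve invented for the example.

```shell
#!/bin/sh
# The ref we are willing to deploy (assumed to be master here).
DEPLOY_REF="refs/heads/master"

# Hypothetical helper: succeeds if any of the given refs is DEPLOY_REF.
should_deploy() {
    for ref in "$@"; do
        if [ "$ref" = "$DEPLOY_REF" ]; then
            return 0
        fi
    done
    return 1
}

# Inside the hook, "$@" holds the names of the refs that were updated.
if should_deploy "$@"; then
    echo "master was pushed: deploying"
fi
```

Run with no arguments, the script does nothing; run as
`./post-update refs/tags/v1 refs/heads/master`, it reports that master
was pushed.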

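What next? Well, not very much. If your hub is on the same server as
your live site, then all that remains is to change to the directory in
which your live site resides and pull from the hub. There is one gotcha:
Git sets the GIT_DIR environment variable while a hook runs, so we unset
it first; otherwise Git uses that value instead of discovering the live
site’s repository, and the pull typically fails:

cd /sites/example.com/ && unset GIT_DIR && git pull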

If your hub is on a different server to the server that you’re deploying
to, though, then the process is slightly different. First, you’ll need
to make sure that your user on the hub server has its public key on the
live server; this will allow it to log in and issue the Git pull command
without having to enter a password. (More about configuring passwordless
SSH can be found
here.)

Assuming that the hub user’s public key is in place on the live server,
our hook code would look like this:

ssh [email protected] "cd /sites/example.com && git pull"

This command will log into the remote server as “user”, navigate to the
live site’s directory, and then pull from the hub.

That’s it: now, whenever you push to the hub, you should see some output
showing you that the live site was automatically updated.
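
If you’d like to try the whole thing without touching a real server,
the sketch below wires up a hub, a live copy and a post-update hook
inside a temporary directory, then pushes a change through. The paths,
file contents and demo identity are assumptions made for the example,
and it covers only the case where the hub and the live site share
a machine.

```shell
#!/bin/sh
base=$(mktemp -d)

# Hub (bare), development repository, and a first push.
git init -q --bare "$base/hub.git"
git --git-dir="$base/hub.git" symbolic-ref HEAD refs/heads/master
git init -q "$base/dev"
cd "$base/dev" || exit 1
git symbolic-ref HEAD refs/heads/master
git config user.name "Demo"
git config user.email "demo@example.invalid"
git config commit.gpgsign false
git remote add origin "$base/hub.git"
echo "version 1" > index.html
git add index.html
git commit -q -m "First release"
git push -q origin master

# Live site: an ordinary clone of the hub.
git clone -q "$base/hub.git" "$base/live"

# Install the hook in the hub. The heredoc is unquoted so that $base is
# filled in now, while \$ keeps the hook's own variables literal.
mkdir -p "$base/hub.git/hooks"
cat > "$base/hub.git/hooks/post-update" <<EOF
#!/bin/sh
for ref in "\$@"; do
    if [ "\$ref" = "refs/heads/master" ]; then
        # Git sets GIT_DIR while a hook runs; clear it so that the pull
        # operates on the live repository rather than on the hub.
        cd "$base/live" && unset GIT_DIR &&
            git pull -q --ff-only origin master
    fi
done
EOF
chmod +x "$base/hub.git/hooks/post-update"

# Push a second commit; the hook should update the live copy for us.
echo "version 2" > index.html
git commit -q -a -m "Second release"
git push -q origin master

# After the push, the live copy should contain the second version.
cat "$base/live/index.html"
```

I’ve used `--ff-only` so that the live copy is only ever fast-forwarded;
if someone has edited files directly on the server, the pull stops
rather than attempting a merge.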

Going further: Capistrano

Capistrano is, at its root, a way to run commands on remote systems from
your own. You can define servers — there could be one, or one thousand
— and then define tasks to run on them. Anything you might normally
SSH into the server and run from the command line can be broken down and
automated into what Capistrano calls “tasks”.

In that sense, Capistrano serves two purposes. First, it standardises
your deployment procedure; when you have multiple people deploying code,
you can make sure that they do so in the same way.

But second, it offers a high degree of automation. Sets of commands that
you perform regularly can be distilled into a single task, and these
tasks can even be chained. This has huge implications for managing your
servers generally, but for deployment the benefits should be obvious:
if, after deploying, you always need to do supplementary things —
restarting daemons, compiling SASS or CoffeeScript files, clearing
caches — then with Capistrano you can do these automatically.

It also offers some peace of mind. Built into Capistrano is the ability
to roll back a deployment, instantly (or near enough, anyway) reverting
to the previous version. It does this by maintaining physical copies of
the previously deployed versions, and linking to the latest one; if
a deployment goes wrong, it can simply update that link to point to the
second-to-last one.
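
To make that idea concrete, here’s a minimal shell sketch of
a releases-and-symlink layout. It is not Capistrano’s own code, just an
illustration of the principle; the helper names, the timestamp-style
directory names and the directory layout are assumptions made for the
example.

```shell
#!/bin/sh
site=$(mktemp -d)
mkdir -p "$site/releases"

# Hypothetical helper: create a release directory and point the
# "current" symlink at it.
deploy_release() {
    name=$1
    content=$2
    mkdir -p "$site/releases/$name"
    echo "$content" > "$site/releases/$name/index.html"
    # -n replaces the existing link instead of following it.
    ln -sfn "releases/$name" "$site/current"
}

# Hypothetical helper: repoint "current" at the second-newest release.
# Release names sort chronologically, so sort + tail finds it.
rollback() {
    previous=$(ls -1 "$site/releases" | sort | tail -n 2 | head -n 1)
    ln -sfn "releases/$previous" "$site/current"
}

deploy_release 20240101120000 "version 1"
deploy_release 20240102120000 "version 2 (broken)"

rollback
cat "$site/current/index.html"   # prints: version 1
```

The web server is pointed at the `current` link, so switching versions
is a matter of replacing one symlink rather than copying files around.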

Capistrano is something that runs entirely on your local machine; it
requires only SSH access to your server(s), and doesn’t need anything
installing or configuring on them.

Full instructions on using Capistrano are far beyond the scope of this
article, but the basics are quite straightforward. First, you must
install Capistrano on your local machine; instructions on how to do that
can be found on the Capistrano wiki.

By convention, Capistrano stores its settings in a file called
deploy.rb in the config directory within your project. Let’s walk
through an incredibly barebones config file that will get us started:

set :application, "Foo"
set :scm, :git
set :deploy_to, "/sites/example.com"

Here, we’re telling Capistrano what our application is called, telling
it that we’re deploying via Git (rather than, say, Subversion), and
letting Capistrano know the directory that we’re deploying to on the
remote server.

set :branch, "master"
set :deploy_via, :remote_cache

We tell Capistrano here that our master branch is the one that we’re
deploying from, and that — rather than cloning all of the files every
time — it should maintain its own Git repository and do a pull when
updating. This is almost always what you’ll want to do.

set :user, "deploy"
set :use_sudo, true
server "example.com", :app

Here, we tell Capistrano about the server itself. We want to connect
using the user deploy; we want to use sudo where appropriate; and
finally, we’re giving the hostname of the server we want Capistrano to
connect to. We’re only defining one server here, but some setups might
define tens or even hundreds, each potentially with different roles.

set :keep_releases, 3
after "deploy:update", "deploy:cleanup"

Since Capistrano keeps its own history of past deployments, we need to
tell it how many to keep. Three should be enough; that gives us the
ability to roll back to the previous release if our deploy goes wrong,
plus an extra revision again for peace of mind. We also tell Capistrano
here that, after deploying, we should clean up old releases, leaving
just three.
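
The clean-up step amounts to “sort the release directories by age and
delete all but the newest few”. Here’s a rough shell sketch of that
logic, assuming release directories whose names sort chronologically;
`prune_releases` is a name I’ve made up, and this is an illustration
rather than what Capistrano actually runs.

```shell
#!/bin/sh
# Hypothetical helper: delete all but the newest $2 directories in $1.
prune_releases() {
    releases_dir=$1
    keep=$2
    total=$(ls -1 "$releases_dir" | wc -l | tr -d ' ')
    excess=$((total - keep))
    if [ "$excess" -le 0 ]; then
        return 0
    fi
    # Names sort chronologically, so the oldest releases come first.
    ls -1 "$releases_dir" | sort | head -n "$excess" | while read -r name; do
        rm -rf "${releases_dir:?}/$name"
    done
}

# Five pretend releases, oldest first.
releases=$(mktemp -d)
for stamp in 20240101 20240102 20240103 20240104 20240105; do
    mkdir "$releases/$stamp"
done

prune_releases "$releases" 3
ls -1 "$releases"   # prints 20240103, 20240104 and 20240105
```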

after :deploy, "compass:compile"
after :deploy, "admin:restart"

Here, I’ve defined some tasks that should run after each deployment. In
this fictional project, I’ve assumed that we’ll need to compile Compass
and then restart some server software, so these are the two tasks that
we run after the deploy task is complete.

Defining these tasks is quite simple. Here’s what the compass:compile
task might look like:

namespace :compass do
    task :compile do
        run "cd #{deploy_to}/current/web && compass compile"
    end
end

Our namespace is the first part of our command; it lets us group
similar tasks together. Then we define our task. Here, there’s nothing
more to our task than running a couple of commands on the remote server,
so we use run to do exactly that: we change into the “web” directory
of the current release, and then compile Compass.

It should hopefully seem clear, even from this incredibly basic
explanation, that Capistrano tasks are enormously flexible; anything you
might conceivably do repeatedly on a server can be automated, which has
huge benefits anyway but becomes a life-saver if you’re dealing with
deploying to multiple servers.

Better living through deployment

If nothing else, I hope this article has made you think a little about
your deployment process. Is it robust enough? If you had to add a second
webserver into your infrastructure for performance reasons, could you
scale the process easily, or would you be doubling your work? If
a deployment went wrong, how painful — or even possible — would it
be to go back to the previous version?

Which process you end up choosing depends on many factors: the size of
your team and of your codebase (or codebases); the size of your server
infrastructure; and the criticality of what you’re deploying.

Whatever you end up doing, though, you can feel safe in the knowledge
that Git will scale with you; whether you’re deploying your personal
blog or a million-user web app, Git can cope.