Aegir Support for Multi-Server Site Deployment and Management for Drupal

The next Aegir release will enable managing sites across multiple servers. This means that you no longer need to have Aegir on the same box as your site, and you can even migrate your sites between servers. Big sites can now also be spread across multiple web servers, giving your hosting infrastructure the ability to grow with your sites.

Below are the details of what came out of our June sprint, where we worked closely with Chapter Three’s team. While the code will not be officially packaged until mid-July in Aegir 0.4 Alpha 9, you can check out the latest code from our git repository at git.aegirproject.org.

And then there were servers

The heart of this sprint was introducing the concept of “servers” and “services” into Aegir. We have rewritten both the front end and the backend of Aegir with a new OOP API to properly make use of the entities. So out with the “db_server” and “web_server” node types, and in with the new “server” node type.

Each server can now have one or more services associated with it, such as an HTTP service or a DB service. What’s more, there can be multiple implementations of a service type, so you can now have both Apache and Nginx support. Beyond alternative implementations of an existing service, you can also create entirely new service types, such as a DNS service.

On a conceptual level, this means we can cleanly model the case where everything is on the same server: you will have a single server node with both the http_apache and db_mysql services associated with it. If you then add a remote database server, you will have a second server node with just the db_mysql service associated with it. Being able to model these relationships cleanly is already a significant improvement to our codebase.
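To make the relationship concrete, here is a toy model in POSIX shell. It assumes a made-up format of one line per server (`server service...`), and the names `server_master`, `server_db1`, `SERVERS` and `servers_with_service` are hypothetical. In Aegir these are server nodes in the front end, so this only illustrates the shape of the data, not how Aegir stores it.

```shell
#!/bin/sh
# Hypothetical server table: "<server> <service> [<service> ...]".
# Mirrors the example above: one box running everything, plus a remote DB box.
SERVERS='server_master http_apache db_mysql
server_db1 db_mysql'

# Print every server that provides the requested service, one per line.
servers_with_service() {
  printf '%s\n' "$SERVERS" | while read -r server services; do
    for svc in $services; do
      if [ "$svc" = "$1" ]; then
        echo "$server"
      fi
    done
  done
}

servers_with_service db_mysql     # prints server_master, then server_db1
servers_with_service http_apache  # prints server_master
```

A platform or site only needs to ask "who provides the service I need?". It never has to know whether that service shares a box with anything else.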

This architectural change extends to the backend too, making it much easier to extend Aegir’s functionality. When you create a platform or a site, all it cares about is that it has a service it can connect to; it doesn’t care about the internals of how that service is put together. This means that, for the first time, it’s completely feasible to use provision without having it manage your virtual hosts in Apache.

It also means that we can support some fairly complex setups we could never handle before. The prime example of this is the ‘web cluster’ service. If you have multiple web servers, you can create a new server to represent a cluster of web servers. When you create a new site, you can simply select the new cluster to host your site on.
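One way to picture a cluster is as one more server entry whose members are real web servers. The sketch below, written in the same illustrative style, expands a target into the concrete machines that the central hostmaster would push a site to. `CLUSTERS`, `cluster_web`, `web1` to `web3` and `expand_targets` are all hypothetical names. This is not Aegir’s web cluster code.

```shell
#!/bin/sh
# Hypothetical cluster table: "<cluster> <member> <member> ...".
CLUSTERS='cluster_web web1 web2 web3'

# Print the concrete web servers behind a target: a cluster expands to its
# members, while any other name is treated as a single server.
expand_targets() {
  members=$(printf '%s\n' "$CLUSTERS" | while read -r name rest; do
    if [ "$name" = "$1" ]; then echo "$rest"; fi
  done)
  if [ -n "$members" ]; then
    echo "$members"
  else
    echo "$1"
  fi
}

# The central hostmaster fans the same site out to every member.
for server in $(expand_targets cluster_web); do
  echo "would publish mysite.com to $server"
done
```

Because a site simply points at a server, choosing a cluster or a single box looks identical from the site’s side. Only the expansion step differs.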

Remote procedure calls

So how does publishing sites on remote web servers work? This is directly related to another of the architectural changes we have made to Aegir. You may already be aware of the ‘remote site alias’ functionality that exists in Drush 3.x, which I wrote about before. That functionality in Drush is actually a user interface on some nifty magic juju that was developed by the Aegir project and contributed to the Drush community in the form of a function called drush_backend_invoke(). Basically it allows Drush commands to call each other as if you were working with web services online, with a pretty RESTful API. Because of how it was implemented, it also allowed you to call Drush commands on remote servers over ssh connections.

It’s incredibly cool and useful stuff, which is why it was such a difficult decision, in the end, not to use remote Drush calls. Even though we had developed this functionality for later use and had oriented a lot of our architecture around it, we came to realize that it was not the right tool for the job.

Going back to basics

One of our base assumptions up to this point had been that Aegir’s correct architecture would involve having Drush and provision installed on every server we manage. That is the base requirement for using Drush for remote procedure calls via the site alias functionality.

This assumption has been challenged regularly and consistently throughout the entire history of the project. Even early on, the biggest point of contention was why on earth we would need all the requirements set up on a server we only intended to use for hosting databases. It also became very clear to us that we had muddled up the concept of the web server with that of the server on which Drush and provision would actually be installed.

After we went through our code base and cataloged exactly when and where we needed to run code on the web server specifically, we realized that for the most part we didn’t. The few places where we legitimately did could be scripted using ssh. So now, when we need to issue a restart command for the Apache server and we detect that it is not the local server, we simply issue that one command remotely and nothing else.
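A minimal sketch of the "only this one command goes remote" idea is shown below. It assumes key-based SSH as the `aegir` user to the web server, and it assumes that `apachectl graceful` is permitted there; real setups often need `sudo`. The `run` and `restart_apache` helpers and the `DRY_RUN` switch are hypothetical and are not provision’s actual functions.

```shell
#!/bin/sh
# Print commands instead of executing them when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "$*"
  else
    "$@"
  fi
}

restart_apache() {  # $1 = web server hostname
  if [ "$1" = localhost ] || [ "$1" = "$(uname -n)" ]; then
    run apachectl graceful
  else
    # Only this single command is executed on the remote machine;
    # no Drush or provision is needed there.
    run ssh "aegir@$1" apachectl graceful
  fi
}
```

For example, `DRY_RUN=1 restart_apache web2.example.com` prints the ssh command it would run. In that sketch, the remote box needs nothing installed beyond Apache and sshd.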

Work locally, publish remotely

This freed us up tremendously and allowed us to use an architecture where Drush simply generates the configuration files locally and then distributes them to their required locations on the remote servers. At the moment this is done via rsync, but it could be made pluggable in the future.

You manage all the different platforms on your hostmaster server, and when you select a web server to publish a platform to, Aegir intelligently syncs it to the remote server. The same goes for sites created on those platforms: migrate a site between platforms hosted on different servers, and it will automagically move the site between the servers for you.
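The "generate locally, then push" flow can be sketched in a few lines. The sketch assumes a hypothetical layout, not Aegir’s actual one: configuration for each server is rendered under `$CONFIG_ROOT/<server>/` on the hostmaster and then mirrored to `/var/aegir/config/` on that server. The vhost template is deliberately minimal, and `write_vhost` and `publish_config` are made-up helper names.

```shell
#!/bin/sh
# Hypothetical staging area on the hostmaster.
CONFIG_ROOT=${CONFIG_ROOT:-/tmp/aegir-config}

write_vhost() {  # $1 = server, $2 = site uri, $3 = platform root
  mkdir -p "$CONFIG_ROOT/$1/vhost.d"
  cat > "$CONFIG_ROOT/$1/vhost.d/$2" <<EOF
<VirtualHost *:80>
  ServerName $2
  DocumentRoot $3
</VirtualHost>
EOF
}

publish_config() {  # $1 = server
  # Arguments are expanded before "set --" replaces $1, so both paths
  # still use the server name. The trailing slash on the source makes
  # rsync copy the directory's contents rather than the directory itself.
  set -- rsync -az --delete "$CONFIG_ROOT/$1/" "aegir@$1:/var/aegir/config/"
  if [ "${DRY_RUN:-0}" = 1 ]; then echo "$*"; else "$@"; fi
}
```

Usage: `write_vhost web1 mysite.com /var/aegir/platform` followed by `publish_config web1`. All the logic lives on the hostmaster, and the web server only ever receives files. That separation is what would make the transport pluggable later.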

This design is what is referred to as a Spoke model. We found that we could meet the majority of our use cases by assuming there will only be one central ‘worker’. That assumption allowed us to get the basics right, so that we can eventually evolve to a Mesh model if we ever need to.

Testing ground for future Drush functionality

What’s interesting about provision is that, in large part, it serves as the testing ground for future Drush functionality. It’s a two-way street though, as evidenced by my semi-frequent love letters to Drush as a tool.

The new release of Aegir makes extensive re-use of the ‘site alias’ functionality found in Drush 3.x. We’ve taken that base functionality and extended and integrated with it in many interesting ways, so much so that some of the concepts we’ve been working on will likely form the basis of an object-oriented rewrite of Drush for its 4.0 release.

First, some background on the traditional relationship between the Aegir front end and backend. Changes to the nodes/entities in the front end were recorded and a task was created, for instance ‘provision-install’. For every task we collected all the information the task could possibly need and used that information inside the Drush command.

Here’s where that breaks down. When you are working with multiple objects of the same type, how are you to know which fields belong to which object? Say you have two servers providing http and db services: when you call provision-install, how do you decide which settings to pass through to the Drush command? Inside the Drush command in the backend it gets even more confusing, because you are essentially receiving a bunch of values with no real concept of which entity they belong to. In some ways it is similar to programming using only global variables!

We also found that we were introducing a lot of complexity into the hosting front end, figuring out which variables to pass, how to manipulate them, and so forth. We had essentially designed a command line interface that could only sensibly be used in concert with the front end.

Provision-save me

Now, when you create or modify a server, platform, or site in the front end, the first thing the front end does is pass the node’s new values to the backend using the new provision-save command.

So what is provision-save? It forms the basis of how we interact with the backend, and it’s also the key to making provision useful without installing the front end. We call provision-save once with the right parameters. From then on, instead of specifying the root every time we run a command that needs it, we can simply run the command against the alias.

```
drush provision-save @platform_name --context_type=platform --root=/var/aegir/platform
drush @platform_name provision-verify # do the permission check and make sure it's usable
```

More importantly, we can also use this functionality to represent relationships between objects. For example, to create a new site on this platform we would do:

```
drush provision-save @mysite.com --context_type=site --platform=@platform_name --uri=mysite.com
drush @mysite.com provision-install # this is the actual site install.
```
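To show why named contexts solve the "global variables" problem, here is a toy model of the idea behind provision-save. Each context’s properties are saved once under its own name, and later lookups follow references such as the site’s `platform=@platform_name`. This is not how provision stores data; it writes real Drush alias files. The file format and the `save_context`, `context_get` and `resolve` helpers here are invented for illustration.

```shell
#!/bin/sh
# Hypothetical storage: one file per context, one key=value per line.
STORE=${STORE:-/tmp/contexts}

save_context() {  # save_context @name key=value ...
  name=${1#@}; shift
  mkdir -p "$STORE"
  : > "$STORE/$name"  # saving again replaces the old values
  for pair in "$@"; do
    printf '%s\n' "$pair" >> "$STORE/$name"
  done
}

context_get() {  # context_get @name key
  sed -n "s/^$2=//p" "$STORE/${1#@}"
}

# Follow a chain of references, e.g. site -> platform -> root.
resolve() {  # resolve @start ref_key ... final_key
  ctx=$1; shift
  while [ $# -gt 1 ]; do
    ctx=$(context_get "$ctx" "$1"); shift
  done
  context_get "$ctx" "$1"
}

save_context @platform_name context_type=platform root=/var/aegir/platform
save_context @mysite.com context_type=site platform=@platform_name uri=mysite.com
resolve @mysite.com platform root  # prints /var/aegir/platform
```

Every value belongs to exactly one named context, so two servers with the same kinds of settings no longer collide. A command only needs the alias it is run against.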

We make use of this in many useful and interesting ways, such as to migrate a site between different platforms:

```
# set up a new platform on a remote web server
drush provision-save @platform_live --context_type=platform --root=/var/aegir/live --web_server=@server_live1
drush @platform_live provision-verify

# migrate the site onto this new remotely hosted platform
drush @mysite.com provision-migrate @platform_live
```
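Under the Spoke model described earlier, the hostmaster holds the canonical copy of every platform. A cross-server migrate can therefore be pictured as a local move followed by a push to the target platform’s web server. The sketch below covers only that file-movement step and skips the database entirely. It is a guess at the shape of the work, not provision-migrate’s code, and `run`, `migrate_site` and `DRY_RUN` are hypothetical.

```shell
#!/bin/sh
# Print commands instead of executing them when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then echo "$*"; else "$@"; fi
}

# $1 = site uri, $2 = old platform root, $3 = new platform root,
# $4 = web server hosting the new platform
migrate_site() {
  # 1. Move the site directory between platforms on the hostmaster.
  run cp -R "$2/sites/$1" "$3/sites/$1"
  # 2. If the new platform lives elsewhere, push the site out to it.
  if [ "$4" != localhost ]; then
    run rsync -az "$3/sites/$1/" "aegir@$4:$3/sites/$1/"
  fi
}
```

For example, `DRY_RUN=1 migrate_site mysite.com /var/aegir/platform /var/aegir/live server_live1` prints both steps. If the target server is `localhost`, only the local copy is printed.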

Also, these are all real Drush site aliases, so you can run any pre-existing Drush commands on them.

Conclusion

This has been a relatively thorough look at some of the changes we have been implementing and will package up soon in Aegir 0.4 Alpha 9. And these are only the major changes; there are plenty of other surprises for new and old users alike. Anybody interested in using Aegir to manage their Development -> Staging -> Production workflow should definitely be watching this space :)

Background notes

For those who haven’t been involved in or haven’t read about the project, for every Aegir release we choose a goal. Our goal with the first release (0.1) was to be able to install and manage Drupal sites using Aegir, while our goal for the second release (0.2) was to be able to safely and intelligently manage sites across several Drupal releases, allowing you to upgrade and migrate these sites between major and minor Drupal upgrades. For the next release (0.3) we focused on porting our existing features to Drupal 6 while fixing as many bugs as we could. For Aegir to be able to manage sites across multiple servers and handle situations like being able to move sites between servers, we needed to take a long hard look at how we were representing these entities internally, which also directly affected how we allowed our users to manage them.

When we were working on our earlier releases, we understood that we would eventually need to manage multiple servers, and therefore introduced the ‘db_server’ and ‘web_server’ node types. We may have given these generic names, but in reality they were ‘mysql’ and ‘apache’ node types. These assumptions made it all the way into our backend, where no clear abstraction was even attempted. Another thing we realized too late was that even though we were maintaining separate nodes for a database server and a web server, in reality these two entities represented specific aspects of the same thing, namely a server.

Having been newly freed from the constraints of CVS, we set out to resolve these limitations in a “feature” branch in our git repository. This allowed multiple developers to collaborate on the new major release while we continued to maintain and ship new alpha releases. Over the last several months we have been merging components of our new architecture into those alpha releases, and very recently we merged in the last of the major refactoring.