community.aegirproject.org

Architecture Guide

Tagged:

Sysadmin

This section deals with providing coherence and understanding of various aspects of Aegir - what it is, and what it is not.

It tries to explain some of the concepts with which Aegir is developed for and how all the bits hook together.

Aegir design and terminology

Tagged:

@TODO - this document is out of date and needs updating.

This page documents all the different terms used when referring to components of the Hostmaster system, and how the different entities relate to each other.

Front End

The user interface used to administrate your sites. The front end is provided by the Hostmaster install profile, and the Hosting contributed module. It defines a complete Drupal based data model for managing the various aspects of your installation.

Entities

These entities are used primarily in the front end to manage the configuration and data. Upon calling the back end, the system breaks down the properties of these entities to be passed as options for the command line, so the back end only has a flat data structure to work with. During this process, all relationships are automatically retrieved, making it a one step process for the developer.

Client: The person or group that runs the site. This information is usually required for billing and access purposes, to assure that only certain people are able to view the information for sites they run. If you do not intend on having more than one client access the system, you will not need to create any additional clients for your purposes.
Site: An instance of a hosted site. It contains information relating to the site, most notably the domain name, database server and platform it is being published on. A site may also have several aliases for additional domains the site needs to be accessible on.
Task: The mechanism whereby Hostmaster keeps track of all changes that occur to the system. Each task acts as a command for the back end, and contains a full log of all changes that have occurred. If a task should fail, the administrator will be notified with an explanation of exactly what went wrong, and how to fix it.
Platform: The file system location on a specific web server on which to publish sites. Multiple platforms can co-exist on the same web server, and need to do so for upgrades to be managed, as this is accomplished by moving the site to a platform hosting an updated release. Platforms are most commonly built for specific releases of Drupal.
Server: The server that provides various services.
Service: The service that runs on the server. This might be HTTP or a Database server. Each web server hosts one or more platforms, which act as publishing points for the hosted sites. If you are not intending to use Hostmaster in a distributed fashion, you will not need to create additional web servers for your purposes. Most web servers and database servers are on the same machine, but for performance reasons external database servers might be required. It is not uncommon for one database server to be shared amongst all site instances. If you are not intending to use an external database server, or multiple database servers, you will not need to create any additional database servers for your purposes.
Service type: The type of service that runs on the server. In the example of 'HTTP' as a service, a service *type* is Apache or Nginx, for example. MySQL is a *type* of service 'Database'.
Package: An installable component for a site. Hostmaster keeps tracks of all the modules, themes and install profiles installed across your entire network. These packages most commonly link to their project on drupal.org, but might not in the case of custom modules. Users are not able to create nodes of this type, as they are generated automatically during verify and sync tasks.
Package Release: The release of a specific package. Each package on your network has at least one installed release, and each platform (including Drupal itself) may have several releases installed across your network.
Package Instance: Where a package release has been installed. A package release may be installed in multiple places on a platform, and sites may have access to multiple versions of packages. This entity also tracks which modules are enabled, and what version of the schema (if any) they have installed.

Task types

These are "hosting commands" that the server can accomplish. These are mapped to back end commands by the task queue processor.

Verify platform: Verify that a platform is correctly installed. Automatically created when new platforms are added, to ensure that they are capable of having new sites installed on them. Operates on: platforms
Import sites: Import existing sites onto a platform. Operates on: platforms
Install site: Install a new site. This is automatically generated when a site node is created. Operates on: sites
Sync site: Synchronize changes to the site record with the back end. Automatically generated on editing of the site node. Operates on: sites
Backup site: Generate a backup of an existing site. This backup contains everything needed to run the site. Operates on: sites
Restore backup: Restore a site to a previous backed up version. Operates on: sites
Disable site: Disable a running site. This is commonly used in the case of non-payment from clients. Operates on: sites
Enable site: Enable a disabled site. The opposite of Disable. Operates on: sites

This is an incomplete list, and will be updated as more tasks are added

Queues

The front end uses and maintains several queues, which are then processed through the drush.php hosting dispatch command. More queues can be added by optional contributed modules, and are often used to provide regularly scheduled back end commands, such as backing up sites.

Task Queue: The primary and most important queue. Whenever a change is made to the data set, the front end creates a "Task" node. These task relate to such things as installing new sites, regenerating the configuration files of sites, verifying that a platform has been correctly configured and importing existing sites on a newly added platform.
Cron Queue: This queue is used to manage the regular execution of the cron process required by most sites. It is configurable with a frequency you want all the sites to be cronned within. This is commonly 1 hour, but might be higher or lower. Higher frequency will cause higher system load. This command will iterate through all enabled sites, and batch the cron calls, based on how many sites are available, and how frequently it is running.
Backup Queue: This queue operates in the same way as the cron queue, in that it iterates through all installed sites, based on a configurable frequency. This executes the same functionality as requesting a site backup through the user interface. You are able to revert back to a previous backup of your sites.
Statistics Queue: This queue operates on the same mechanism as the cron queue, and alows you to retrieve statistics of your installed sites. This will retrieve such metrics as number of registered and active users, and number of posts / comments, which will then be viewable on the front end.

Back End

The back end is a command line interface provided by the Provisioning Framework, of all the tasks it is capable of performing.

It is designed to operate separately from the front end, so that you can have multiple back ends on one or more servers.

The back end handles the server level configuration of the system, such as creating new databases and virtual host records and restarting the apache process for new installations to take effect.

When the queues are processed by the front end, these are turned into calls to the back end, for whichever server they happen to be on. The back end communicates to the front end through a very simple mechanism, whereby it returns a serialized string containing state information and log messages / error return codes.

Talk

Aegir Architecture

Tagged:

Sysadmin

This wiki page documents the Aegir architecture in schematic form. It is intented to complement a related page on the typical file system structure on an Aegir installation.

Click on image for a larger version.

Preview	Attachment	Size
	aegir-architecture-multiserver-2010-09-29.png	246.43 KB
	aegir-architecture-2010-09-29.pdf	174.58 KB

Aegir file system structure

Tagged:

Sysadmin

This page documents the typical file system structure on an Aegir installation. It is intended to complement a related wiki page on Aegir Architecture. The following paths are based on an Aegir 0.4-x installation and assume that all directories are within the /var/aegir folder.

Scripts and Configuration Files

Path	Notes
/backups	site-specific tar balls containing a database dump and folders under path/to/platform/sites/example.com
/config
/config/server_master
/config/server_master/apache
/config/server_master/apache/conf.d	non-aegir or non-drupal virtual hosts files
/config/server_master/apache/platform.d	contains .htaccess information for each aegir platform
/config/server_master/apache/vhost.d	apache virtual host files for aegir platforms
/config/server_master/apache/vhost.d/aegir.example.com	virtual host file for aegir web site - specifies path to platform directory and site database settings (so that database credentials are not exposed directly in site settings.php file)
/config/server_master/apache/vhost.d/site-1.com	virtual host file for deployed web site
/config/server_master/apache/vhost.d/site-2.com
/config/server_master/apache/vhost.d/site-3.com
/drush	drush folder
/drush/drush.php	drush script
/.drush	drush extensions and server, platform and site aliases
/.drush/drush_make	drush_make project folder - used for building web sites from .make files
/.drush/provision	provision folder
/.drush/server_master.alias.drushrc.php	settings for the master server where the main aegir database, hosting platform and aegir site reside
/.drush/platform_hostmaster.alias.drushrc.php	settings for the hostmaster platform on which the aegir site is based
/.drush/hostmaster.alias.drushrc.php	settings for the aegir site
/.drush/platform_platform1.alias.drushrc.php	settings for platform1 on which site-1.com is based
/.drush/site-1.com.alias.drushrc.php	settings for site-1.com

Hostmaster Platform and Aegir (front-end) Site

Path	Notes
/hostmaster-0.x	hostmaster platform
/hostmaster-0.x/profiles
/hostmaster-0.x/profiles/default
/hostmaster-0.x/profiles/hostmaster	hostmaster profile
/hostmaster-0.x/profiles/hostmaster/modules
/hostmaster-0.x/profiles/hostmaster/modules/admin_menu
/hostmaster-0.x/profiles/hostmaster/modules/hosting	hosting module
/hostmaster-0.x/profiles/hostmaster/modules/install_profile_api	install_profile_api – facilitates provisioning of sites based on a non-default profile
/hostmaster-0.x/profiles/hostmaster/modules/jquery_ui
/hostmaster-0.x/profiles/hostmaster/modules/modalframe
/hostmaster-0.x/profiles/hostmaster/themes
/hostmaster-0.x/profiles/hostmaster/themes/eldir	eldir theme – provides Aegir front end look and feel
/hostmaster-0.x/profiles/hostmaster/hostmaster.profile	profile file – used in site provisioning to configure a drupal database
/hostmaster-0.x/profiles/hostmaster/hostmaster.make	make file – used to include modules, themes, libraries etc. from various sources
/hostmaster-0.x/modules
/hostmaster-0.x/themes
/hostmaster-0.x/sites
/hostmaster-0.x/sites/aegir.example.com	aegir web site folders

Deployed Platforms

Note: the directory /platforms is optional but can be useful to separate deployed platforms from directories for scripts, config files and hostmaster platform.

Path	Notes
/platforms
/platforms/platform-1
/platforms/platform-1/profiles
/platforms/platform-1/profiles/default
/platforms/platform-1/profiles/custom-profile
/platforms/platform-1/profiles/custom-profile/modules
/platforms/platform-1/profiles/custom-profile/themes
/platforms/platform-1/profiles/custom-profile/custom.profile	profile file – used in site provisioning to configure a drupal database
/platforms/platform-1/profiles/custom-profile/custom.make	make file – used to include modules, themes, libraries etc. from various sources
/platforms/platform-1/modules
/platforms/platform-1/themes
/platforms/platform-1/sites
/platforms/platform-1/sites/site-1.com
/platforms/platform-1/sites/site-1.com/modules
/platforms/platform-1/sites/site-1.com/themes
/platforms/platform-1/sites/site-1.com/files
/platforms/platform-1/sites/site-1.com/settings.php	site-specific drupal configuration file
/platforms/platform-1/sites/site-1.com/drushrc.php	site-specific aegir-specific configuration file
/platforms/platform-1/sites/site-2.com
/platforms/platform-1/sites/site-3.com
/platforms/platform-1/sites/site-n.com
/platforms/platform-2
/platforms/platform-3
/platforms/platform-n

Platform permissions

Path	Directory	File	Notes
`./*`	`aegir:aegir` `drwxr-xr-x`	`aegir:aegir` `-rw-r--r--`	The webserver has no business writing or moving files in the Drupal codebase.
Inside ./sites/example.com:
`drushrc.php`		`aegir:aegir` `-r--------`	Web server shouldn't be able to read drushrc.php, it's not a component of the Drupal site.
`settings.php`		`aegir:www-data` `-r--r-----`	Web server can read this file, but otherwise tight control over this file which can contain sensitive information.
`libraries` `modules` `themes`	`aegir:aegir` `drwxrwsr-x`	`aegir:aegir` `-rw-r--r--`	Because of the sticky bit (s) on the parent directory, each child directory will inherit attributes of the parent. The attribute that is consistently inherited is the group. This means that a developer in the "aegir" group can add files which will retain the "aegir" group ownership of the parent directory.
`files` `private`	`aegir:www-data` `drwxrws---`		Aegir sets these directories with a sticky bit (s) so that under certain conditions new folders and files will inherit parent permissions. There are only a few cases where this happens though.
`files/` `private/`	`www-data:www-data` `drwxr-sr-x`	`www-data:www-data` `-rw-r--r--`	The permissions shown here are how files and directories created by www-data will look. When verifying a platform, Aegir won't "correct" these files and directories to match the parents. If you have trouble with permissions/ownership on these directories, you can safely run the following commands (in this case on the files directory): `# chown -R aegir:www-data /path/to/site/files/* # chmod -R 775 /path/to/site/files/*`

Talk

Provision - the Aegir backend.

Tagged:

Sysadmin

Each of the major entities managed by Aegir (namely servers, platforms and sites) are represented by nodes in the front end, which in turn generate Drush 'named contexts' in the backend. Named contexts are similar to what Drush 3.0 refers to as 'site aliases', but are more structured and cover more than just sites.

Whenever you create or edit a node on the front end, Aegir will create or update the named context on the backend, and any tasks run on the entity will be run on that named context, ie:

drush @server_master provision-verify

All of the actual work related to managing your system is done by the backend, and the front end primarily serves as a way to manage these aliases. You can very easily use the backend on it's own if you so choose.

Creating or updating named contexts.

Contexts are created and updated through the provision-save command. The format of the command is :

drush provision-save @contextname --context_type=type --property=value --other_property=value

Every time this command is called, it will update the named context and only modify the properties that have been specified. It is important to note that the provision-save command never has a site alias or context name before the command, but requires you to specify which context you are modifying as the first argument.

Deleting named contexts.

To delete an existing named context, you simply need to pass the 'delete' option to the provision-save command, ie:

drush provision-save @contextname --delete

Using named contexts

Named contexts are used in 3 ways in provision: as aliases, arguments or options.

Aliases : Aliases appear to the left of the drush command, and are making use of the Drush 'site alias' functionality.

# Verify the site represented by @example.com
drush @example.com provision-verify

Options : Certain options to provision-save require you to pass the alias of another named context of a specific type, and uses this information to build relationships between objects.

 # generate a new site context in the directory represented by @platform_test
 drush provision-save @example.com --context_type=site --uri=example.com --platform=@platform_example

Arguments : A select few provision commands require you to pass the alias of another specific type of named context as an argument.

# Migrate the site to the platform represented by @platform_newversion
drush @example.com provision-migrate @platform_newversion

Context types

There are (currently) 3 different entities represented as named contexts in the aegir backend. You determine which type of entity you are working with by specifying the context_type option to the provision-save command. You dont need to specify it every time, but it is best to always explicitly specify it, just in case.

Because the namespace of all contexts is shared, they each have different naming conventions to avoid collisions. There are certain named contexts of each type which are always present in the system.

Each of the context types have different semantics and properties, and certain provision commands will only operate on certain types of entities.

Server contexts

Naming guideline : Server with a hostname of server.example.com becomes @server_serverexamplecom.

Special contexts : @server_master always refers to the local server where aegir is installed.

Required properties :

remote_host : This should always be set to the FQDN of the server being managed.

Services : Servers are the only entities which have services associated to them. Services are things such as http, db or dns.

Each service may select one service_type, which chooses which implementation to load for this service. Examples of these are the 'mysql' implementation of the 'db' service, or the 'apache' and 'apache_ssl' implementations of the http service.

All services that are found by Aegir are enabled, but unless you specifically select one of the implementations, the 'null' service type is loaded. Each service type has additional required properties and values that can be passed to it, but describing them all is outside of the scope of this document.

Important commands :

provision-verify : Verify that the configuration is correct and fix them if they aren't.

Example :

# Enable a new remote server hosting sites with apache on port 8080
drush provision-save @server_webexamplecom --context_type=server \
  --remote_host=web.example.com \
  --http_service_type='apache' \
  --http_port=8080

# Verify that the settings are correct
drush @server_webexamplecom provision-verify --debug

Platform contexts

Platforms are how aegir refers to a specific copy of Drupal checked out on a file system. Platforms are hosted on web servers and sites are hosted on platforms.

Naming guideline : A platform in /var/aegir/codebase-6.1 becomes @platform_codebase61 , but this is not strictly enforced. We only use that naming convention when we do not have a node representing the platform available. If a node is available, we use the title of the node to generate the context name.

Basically, you can name your platform anything but it should still be prefixed with 'platform_'.

Special contexts : There are no special contexts per se, but the hostmaster-install and hostmaster-migrate commands generate aliases for the hostmaster platform in the form of '@platform_hostmaster$version'. If the front end is present, one or more of these contexts are usually available too.

Required properties :

root : The absolute path to a Drupal directory, generally in the form /var/aegir/platformdir.

Optional properties :

web_server : The name of a server context that provides the http service. The default is to @server_master
makefile : The absolute path or url to a drush make makefile. If the root directory does not exist and the makefile property does, provision will call drush_make to attempt to create the directory.

Important commands :

provision-verify : This will ensure that all the files are the correct permissions, and are all present on the correct remote servers. Also rebuild package database. Run this command whenever the contents of the platform changes.

Example :

# New open atrium platform in /var/aegir/atrium-head that will be built from a remotely hosted
# makefile and published on an external web server.
drush provision-save @platform_atriumhead --context_type=platform \
        --root=/var/aegir/atrium-head \
        --web_server=@server_webexamplecom \
        --makefile='http://omega8.cc/dev/atrium_stub.make.txt'

# Verify settings and generate the directory with drush make, pushing to the remote server after.
drush provision-verify @platform_atriumhead

Site contexts

Naming guidelines : A site with the url 'site1.example.com' becomes @site1.example.com. The context name is always the url.

Special contexts : @hostmaster always refers to the site running the hostmaster front end. This helps with debugging and makes it more easily scriptable.

Required properties :

uri : The FQDN of the site. should match the context name.
platform : The name of a platform context that the site will be hosted on.

Optional properties :

db_server : The name of a server context which provides the db server. defaults to @server_master.
client_email : The email address to send the login information for the admin user.
profile : The installation profile the site is running. dependent on the platform supporting it.
language : The ISO language code of the site. dependent on the platform and profile supporting it.

Important commands :

Almost all of the provision commands are related to sites, so this is just the very core of them. Check drush help for more.

provision-install : Install the site represented by the context. Just creating the context doesn't install the site yet.
provision-verify : Verify the site is running correctly, and ensure the files are all published to the correct servers.
provision-delete : Delete the site and all it's information. The context will still be there.

Example :

# Set up a new site to be installed on the open atrium platform (hosted on a remote server),
# using an external database server , with the openatrium profile and localize it into spanish.
drush provision-save @atrium.example.com --context_type=site \
        --uri=atrium.example.com \
        --platform=@platform_atriumhead \
        --db_server=@server_dbexamplecom \
        --client_email=client@example.com \
        --profile=openatrium \
        --language=es

# Install the new site to the specifications requested above.
drush @atrium.example.com provision-install --debug

# Verify the site to make sure everything is ok.
drush @atrium.example.com provision-verify --debug

Importing contexts into the front end

Hostmaster provides a drush command which will attempt to generate or updates front end nodes that match your defined contexts. To import a context into the front end, just run :

drush @hostmaster hosting-import @contextname

This will import that context, and every context it depends on into the front end. This is useful for integration with other systems such as the puppet-aegir module for Puppet, which can add newly configured servers directly to the front end.

Talk

Messaging systems evaluation

So we've been talking seemingly forever about possible queuing system instead of our crappy "PHP/Vixie Cron based" one, but we have yet to document pro/cons of possible alternatives. Here are the goods (or the beginning of em).

Note that this document was written before (in november 2009) the current multiserver implementation was designed and completed.

Current design

Tasks are basically serialized in a MySQL database and polled at regular interval by a cron job, running drush hosting dispatch, which bootstraps Drupal, fetches the object, and fires up provision commands.

The possibility here is to have another dispatcher that would always run, and maybe not even talk to the mysql database, to fire up the provision commands.

Another requirement is that the queuing system may be network-aware. We want multi-server support and that is a critical issue for the next release (0.4 at the time of writing). One thing to consider is how we pass files around...

Roll our own

Pro:

full control over the API, functionality and whatnot (ie. function of manpower invested)

Con:

rolling our own is hard! let's go shopping!

Meta: AMQP vs JMS

So yeah, there's a standard in development for all this stuff: http://www.AMQP.org/ . There is also a Drupal module being developed to work with it: http://drupal.org/project/amqp

JMS on the other hand is the "Java Messenging Service", that creates a language- (as opposed to network-) based API.

Maybe we should just not care about which messaging system you use in the backend and allow our frontend to talk to any backend, with the stupid cronjob mysql poller as a default...

Gearman

Pros (mostly out of the home page):

supports PHP in frontend and backend
Multi-language - There are interfaces for a number of languages, and this list is growing. You also have the option to write heterogeneous applications with clients submitting work in one language and workers performing that work in another.
Flexible - You are not tied to any specific design pattern. You can quickly put together distributed applications using any model you choose, one of those options being Map/Reduce.
Fast - Gearman has a simple protocol and interface with a new optimized server in C to minimize your application overhead.
Embeddable - Since Gearman is fast and lightweight, it is great for applications of all sizes. It is also easy to introduce into existing applications with minimal overhead.
No single point of failure - Gearman can not only help scale systems, but can do it in a fault tolerant way.
Production ready - ran on digg, yahoo, livejournal, ...
Gearman uses an efficient binary protocol and no XML. There's an a line-based text protocol for admin so you can use telnet and hook into Nagios plugins.
Persistent queues are available using mysql, memcached or sqlite
There's pretty good PHP support and it's easy to get up and running quickly.

Cons:

"somewhat higher latency, signal-and-pull architecture"
The system makes no guarantees. If there's a failure the client is told about the failure and the client is responsible for retries.
gearman-specific binary/network protocol
Notice how sixapart.com maintains both gearman and the shwartz, below

OHLOH:

Mostly written in Perl
Increasing year-over-year development activity
Established codebase
Few source code comments

Resque

Pro:

redis-based (therefore somehow language agnostic deep in there, but certainly network-centric)
github's (aka: it's got to work at least somewhere in production)

Con:

ruby-centric
I haven't actually looked at the thing or tested it

Active MQ

Apache ActiveMQ:

Pros:

Supports a variety of Cross Language Clients and Protocols from Java, C, C++, C#, Ruby, Perl, Python, PHP
Supports many advanced features such as Message Groups, Virtual Destinations, Wildcards and Composite Destinations
Supports pluggable transport protocols such as in-VM, TCP, SSL, NIO, UDP, multicast, JGroups and JXTA transports
REST API to provide technology agnostic and language neutral web based API to messaging
Persistent

Cons:

uses too many acronyms on the frontpage (really.)
uses a lot of Java jargon
production-ready?

OHLOH:

Mostly written in Java
Large, active development team

RabbitMQ

Pros:

"high reliability, availability and scalability along with good throughput and latency performance that is predictable and consistent"
"based on the emerging AMQP standard"
in debian
Drupal module - AMQP
apparently very fast, according to dellintosh

Cons:

erlang? do you know erlang? (Arguably not really necessary, since all publishers and consumers can be written in other languages that have AMQP support)

OHLOH

Mostly written in Erlang
Large, active development team
Well-commented source code

Open Messenging Queue

https://mq.dev.java.net/overview.html

Pros:

SOAP/HTTP interface
Scalable

Con:

Java/C specific

The Schwartz

Pros:

Perl (come on, perl is everywhere... isn't it?)
"reliable job queue"
has retry
"lightweight"

Cons:

Perl (aka write only)
"library, so some assembly required"
"light on documentation"
Notice how sixapart.com maintains both the shwartz and gearman, above

Open AMQ

Pros:

Multi-language (but not PHP)
"learn in a day or so" then "another day or two for wireAPI", development will take "some weeks"...
"remote admin tools, one-line failover, instant federation, protection against slow clients, detailed logging"
AMQP standard
"Linux, AIX, Solaris, Mac OS/X, other UNIX"

Cons:

"for C/C++ and JMS"

OHLOH

Mostly written in C
Well-commented source code
Short source control history
Only a single active developer
Apache Software License may conflict with GPL
Apache License 2.0 may conflict with GPL

Beanstalkd

http://kr.github.com/beanstalkd/

Drupal Module for Drupal 6 and 7 is at http://drupal.org/project/beanstalkd

Pros: * used in production ("causes" application in facebook)

Cons:

http://www.ohloh.net/p/beanstalkd

Mostly written in C
Large, active development team
Few source code comments

SimpleMQ

http://code.google.com/p/simple-mq/

Pros:

persistent or in-memory

Cons:

java-only.

Zero MQ

http://www.zeromq.org/

Pros:

fast
lightweight messaging implementation.
supports different messaging models.
already very fast. We're getting 13.4 microseconds end-to-end latencies and up to 4,100,000 messages a second today.
very thin. Requires just a couple of pages in resident memory.
provides C, C++, Java, Python, .NET/Mono, Ruby, Fortran, COBOL, Tcl, Lua and Delphi language APIs.
supports different wire-level protocols: TCP, PGM, AMQP, SCTP.
runs on AIX, FreeBSD, HP-UX, Linux, Mac OS X, OpenBSD, OpenVMS, QNX Neutrino, Solaris and Windows.
supports i386, x86-64, Sparc, Itanium, Alpha and ARM microarchitectures.
fully distributed: no central servers to crash, millions of WAN and LAN nodes.

Cons:

no PHP whatsoever
production?

OHLOH:

Mostly written in C++

Jenkins

There's a good vibe around using Jenins (formerly known as Hudson) in place of cron for those kind of things. In our scenario, Aegir would queue up jobs to Jenkins, which would take care of dispatching them to remote servers.

Pros:

supports tons of different notification modes
scales well
supports remote servers

Cons:

application-specific (unclear?) protocol to queue up tasks
Java (so a bit heavy)

OHLOH: * has had 63,860 commits made by 708 contributors * representing 893,049 lines of code * is mostly written in Java * with an average number of source code comments * has a well established, mature codebase * maintained by a very large development team * with decreasing year-over-year commits

Others to evaluate?

Amazon's SQS. The latencies for this service tend to be high and variable so it may not be appropriate for all tasks.
Spread Queue ([[http://search.cpan.org/~jmay/Spread-Queue-0.4/Queue.pod|alpha software]]
Starling

twitter ditched starling

scala

Houston From Zivtech, not much documentation yet

Others mentionned here: ActiveMessaging, BackgroundJob, DelayedJob, and Kestrel.

Requirements

All the above must have:

Open Source - It's free as in beer and speech

Talk

Provision contexts

This section is currently being written, so its probably going to be full of errors, and possibly not that useful. Feel free to edit.

I'm probably going to pop a nice table of contents here, for now look to the right...

Purpose

Contexts are simply a way for Aegir to store certain pieces of information that it thinks may be useful later. They are named so that they can be referred to easily.

Every server, platform and site in Aegir has an associated context. Each one of these contexts stores information about the associated entity. So, for example, a site context stores the base URI of the site. Later, the code responsible for informing the web server of the site's existence can look at this context and determine the requested URI.

In theory contexts allow most of the back-end commands in Aegir to operate just on the name of the context. This is an important concept: the alternative would be to try to guess everything that the command being invoked might want to know and pass it in as a series of command line arguments.

For example, the command that Aegir uses to clone a site just needs to be given the context of the site to clone and the platform to clone it to. From there the code can work out where the destination platform actually is, what servers it needs to talk to, and with what credentials.

Drush contexts

Note that Drush also has the concept of a context, but this is not to be confused with Aegir contexts which are different. Aegir contexts are closer to, and stored as, Drush 'aliases'.

Implementation

Contexts are always named to start with a '@' symbol, except for in the filename of the file in which they are stored.

Contexts in Aegir are stored as a simple array of data in a file within the backend of Aegir. However this array of data is accessed and modified with a set of objects and accessors. Here is an example of what a context file looks like:

<?php
$aliases['hostmaster'] = array (
  'context_type' => 'site',
  'platform' => '@platform_hostmaster',
  'server' => '@server_master',
  'db_server' => '@server_localhost',
  'uri' => 'aegir.example.com',
  'root' => '/var/aegir/hostmaster-0.4-beta2',
  'site_path' => '/var/aegir/hostmaster-0.4-beta2/sites/aegir.example.com',
  'site_enabled' => true,
  'language' => 'en',
  'client_email' => 'aegir@example.com',
  'aliases' =>
  array (
  ),
  'redirection' => false,
  'profile' => 'hostmaster',
);

These files are stored in the /var/aegir/.drush directory on the master Aegir server.

In the above example we can see the properties of the context, like the 'uri' of the site, and the 'profile' that the site is running.

There are also a number of more interesting properties in the example, note the 'db_server' property, which has a value that is another context. We shall see later that these properties are special, and allow developers to easily access functions of the 'db_server' without needing to care which db server they are talking to.

Contexts are used within drush commands as objects, which subclass provision_context. This allows much more flexibility and cleverness, though can make them very confusing to use sometimes! As a developer you will only need to worry about the context objects, as Aegir handles storing them in the files for you. But it is important to note that each context must be representable in a flat text file (or be prepared to do some serious leg-work) by that I mean you probably don't want to be storing massive amounts of relational data in them, use a database for that!

Using contexts in your code

Getting the current context is really easy, just call the 'd' accessor:

<?php
$current_context = d();
?>

What the current context is will depend on the drush command that is running and the context that was passed into that drush command. For example if you ran a site verify task, then the caller would name a site context, and thus initially calls to d() would return the site context. The values of the context are accessible as simple properties of this object.

So, suppose a verify task is started for the main Aegir frontend, which has a context that is always called '@hostmaster', the drush command invocation would look like this:

drush @hostmaster provision-verify

Then the drush command for provision-verify could do this:

<?php
function drush_provision_verify() {
  $context = d();
  drush_print('Passed context of type: ' . d()->type);
}
?>

Which would print the type of the context passed to the verify command, in this case 'site'.

The d accessor is not to be feared! It is used all over the place in Aegir and if you're not sure what context you're going to get then you can always print d()->name and you'll get the name of the context that you're dealing with.

Loading a named context

You can pass an optional argument to the d accessor of the name of the context that you want. So, if you feel compelled to access a property about the main Aegir frontend site anywhere you can do this:

<?php
$hostmaster_context = d('@hostmaster');
?>

though most of the time you'll be dealing with the current context, and contexts that it references.

'Subcontexts'

Here's an example context:

<?php
$aliases['hostmaster'] = array (
  'context_type' => 'site',
  'platform' => '@platform_hostmaster',
  'server' => '@server_master',
  'db_server' => '@server_localhost',
  'uri' => 'aegir.example.com',
  'root' => '/var/aegir/hostmaster-0.4-beta2',
  'site_path' => '/var/aegir/hostmaster-0.4-beta2/sites/aegir.example.com',
  'site_enabled' => true,
  'language' => 'en',
  'client_email' => 'aegir@example.com',
  'aliases' =>
  array (
  ),
  'redirection' => false,
  'profile' => 'hostmaster',
);

Notice that some of the properties are the name of contexts, such as 'db_server' which has a value of '@server_localhost'.

Suppose now that I have some code that wants to get a property of the db_server associated with the @hostmaster context, I can simply do this:

<?php
$db_server_context = d('@hostmaster')->db_server;
// $db_server_context is now a fully populated context object, not the string '@server_localhost'
drush_print('DB server on port: . $db_server_context->db_port);
?>

The provisionContext object will return full context objects for properties that store the names of other contexts. The how and why of determining which to return as strings and which to return as a context object will be covered later. It is entirely possible to have the name of a context stored as a property in a context, that when accessed returns the string, rather than the context named in the string.

Properties and services

Services are the way in which new properties are added to contexts, in fact if you wish to indicate you want to store additional properties in a context, this must be done by a service. There is no other way to store data in a context. One might assume that you can just ask provision to save an additional value on a context, but this will not work, and you'll probably get very frustrated indeed!

For example, the provisionService_pdo service adds a 'master_db' property to the server context, it does this in a method thus:

<?php
function init_server() {
  parent::init_server();
  $this->server->setProperty('master_db');
}
?>

This means that the 'master_db' property can now be set in a provison-save drush command, and will be persisted when the context is saved.

A service may also define properties on the context as actually representing another context, so that you may access that subcontext directly. You can see an example of using this subcontext in the 'Implementation' section of this Provision Contexts guide, but the an example of how you let provision know that a property name is not just a string, but a named context is with a method on the service:

<?php
  /**
   * Register the http handler for platforms, based on the web_server option.
   */
  static function subscribe_platform($context) {
    $context->setProperty('web_server', '@server_master');
    $context->is_oid('web_server');
    $context->service_subscribe('http', $context->web_server->name);
  }
?>

Here the provisionService_http is being asked to 'subscribe' to a platform, it sets a property on the context, as above, but additionally uses the is_oid() method to indicate that the property is a named context. It then uses that immediately when it gets the name of the web_server context: $context->web_server->name. We'll worry about what 'subscribing' to a platform means later,.

Using services from contexts

This is stub page that will describe the way that you can use the service() method on a context.

Provision services

We should add some documentation about Provision services, what they are and how they work and interact with contexts.

UNIX group limitations

While working on the security enhancements, I stumbled upon a few fundamental UNIX limitations with groups. These limitations similarly apply to GNU/Linux.

Table of Contents

1. Group name length limits
2. Group membership limits

1. Group name length limits

A group name (e.g. dialout) has a length limit, usually hardcoded in the operating system. This limit varies across different UNIX systems and implementations.

NIS: 8 characters (source)
HPUX: 8 characters, 16 starting from HPUX 10 (source, source)
AIX: 8 characters, 255 for AIX 5.3 and above (source)
Debian: 16 characters (source, defined in shadow at compile-time)
Redhat: 16 characters, 32 starting from 2005 (source, source, source)
Solaris: 8 characters? (source)
Active Directory: 11 or 64 characters (source unclear)

In general, we can probably assume that 16 characters is a safe limit, and that's what's being used in hosting_client_sanitize.

2. Group membership limits

A UNIX user has a primary group, defined in the passwd database, but can also be a member of many other groups, defined in the groups database. UNIX has a hardcoded limit of 16 groups a user can be a member of (source). This means that if we translate Aegir clients into UNIX groups and you have access to multiple Aegir clients, you'll be able to access only 16 of those clients at a time. Rather annoying.

The problem is related with NFS, or rather AUTH_SYS (source), which has an in-kernel data structure where the groups a user has access to is hardcoded as an array of 16 identifiers (gids to be more precise, see this post for details). Even though Linux now supports 65536 groups, it is still not possible to operate on more than 16 from userland, through NFS - on a regular ext3 filesystem, it works without problems (tested in Debian Squeeze).

Solutions for this are not obvious. The blog post on sun.com proposes a technical enhancement to OpenSolaris. It's unclear whether that fix was implemented. This SAGE post explains that using GSS_API (used in NFSv4) should (should!) resolve the issue.

This more thorough analysis of the problem is more direct, to shamelessly quote it:

Use NFSv4
Use RPCSEC_GSS authentication
Use ACLs

As mentionned above, this only applies to NFS mounts. NFSv4 should (should!) be simple enough and available on Linux. It is also unclear exactly how to use RPCSEC_GSS (is that Kerberos?). As for ACLs, we are fully supporting this now with the provision ACL extension, so that shouldn't be an issue.

Running drush on Aegir sites as a regular user

N.B. The ACL-based approach, described below, has been implemented as an extension to Provision in http://drupal.org/project/provisionacl.

What's the problem here?

As things stand, you cannot run tasks as a regular UNIX user on your Aegir Drupal sites:

anarcat@marcos:~$ drush @hostmaster cc all
A Drupal installation directory could not be found                                            [error]
The drush command '@hostmaster cc all' could not be found.                                    [error]

The reason the site is not found is because the alias cannot be loaded. If we change our home directory to find the alaises, we then get permission problems:

anarcat@marcos:~$ env HOME=/var/aegir drush @hostmaster cc all
include(/var/aegir/.drush/platform_hostmaster.alias.drushrc.php): failed to open stream:      [warning]
Permission denied sitealias.inc:433
include(): Failed opening '/var/aegir/.drush/platform_hostmaster.alias.drushrc.php' for       [warning]
inclusion (include_path='.:/usr/share/php:/usr/share/pear') sitealias.inc:433
include(/var/aegir/.drush/hostmaster.alias.drushrc.php): failed to open stream: Permission    [warning]
denied sitealias.inc:433
include(): Failed opening '/var/aegir/.drush/hostmaster.alias.drushrc.php' for inclusion      [warning]
(include_path='.:/usr/share/php:/usr/share/pear') sitealias.inc:433
A Drupal installation directory could not be found                                            [error]
The drush command '@hostmaster cc all' could not be found.                                    [error]

... so that doesn't seem to be a good approach, although one could just fix the permissions on the alias files for the site (which don't contain the DB password, oddly enough) and platforms (which have been safe since the "server" seperation):

cd /var/aegir/.drush
chmod a+r ls *alias* | grep -v &#039;server&#039;

Then we end up with another problem:

Cannot open drushrc "/var/aegir/hostmaster-HEAD/sites/aegir.anarcat.ath.cx/drushrc.php",      [warning]
ignoring.
Cannot open drushrc "/var/aegir/hostmaster-HEAD/drushrc.php", ignoring.                       [warning]
Cannot open drushrc "/var/aegir/hostmaster-HEAD/sites/aegir.anarcat.ath.cx/drushrc.php",      [warning]
ignoring.
include_once(/var/aegir/hostmaster-HEAD/sites/aegir.anarcat.ath.cx/settings.php): failed to   [warning]
open stream: Permission denied bootstrap.inc:400
include_once(): Failed opening './sites/aegir.anarcat.ath.cx/settings.php' for inclusion      [warning]
(include_path='.:/usr/share/php:/usr/share/pear') bootstrap.inc:400
Cannot modify header information - headers already sent by (output started at                 [warning]
/usr/share/drush/includes/drush.inc:820) install.inc:618
Cannot modify header information - headers already sent by (output started at                 [warning]
/usr/share/drush/includes/drush.inc:820) install.inc:619
Drush command could not be completed.                                                         [error]

... and this is really the core of the issue here: we need to allow more than one user to access those files, and aegir must be able to fix those permissions when the site is created/verified.

So to sum up:

a regular user doesn't have permissions to access the drushrc.php and settings.php files
the aliases are not accessible unless you change your home directory (note that -i doesn't work either, for some reason)

Hackish solution: add user to more groups

Miguel indicates here that files should be writable by the Aegir group, and, if we start with this idea, we can actually get pretty far. The drushrc.php file is owned by the Aegir group, so adding yourself to that group and fixing up the file to be group-readable actually goes a long way. You also need to add the user to the www-data group since drush expects to be able to open the settings.php file also:

adduser anarcat aegir
adduser anarcat www-data
chmod g+r drushrc.php

With the above changes, we can actually run drush commands!

anarcat@marcos:~$ env HOME=/var/aegir drush @hostmaster cc all
Cannot open drushrc "/var/aegir/hostmaster-HEAD/drushrc.php", ignoring.                       [warning]
WD php: include(/var/aegir/.drush/server_localhost.alias.drushrc.php): failed to open stream: [error]
Permission denied in /usr/share/drush/includes/sitealias.inc on line 433.
WD php: include(): Failed opening '/var/aegir/.drush/server_localhost.alias.drushrc.php' for  [error]
inclusion (include_path='.:/usr/share/php:/usr/share/pear') in
/usr/share/drush/includes/sitealias.inc on line 433.
WD php: include(/var/aegir/.drush/server_master.alias.drushrc.php): failed to open stream:    [error]
Permission denied in /usr/share/drush/includes/sitealias.inc on line 433.
WD php: include(): Failed opening '/var/aegir/.drush/server_master.alias.drushrc.php' for     [error]
inclusion (include_path='.:/usr/share/php:/usr/share/pear') in
/usr/share/drush/includes/sitealias.inc on line 433.
include(/var/aegir/.drush/server_localhost.alias.drushrc.php): failed to open stream:         [warning]
Permission denied in /usr/share/drush/includes/sitealias.inc on line 433.
include(): Failed opening '/var/aegir/.drush/server_localhost.alias.drushrc.php' for inclusion[warning]
(include_path='.:/usr/share/php:/usr/share/pear') in /usr/share/drush/includes/sitealias.inc
on line 433.
include(/var/aegir/.drush/server_master.alias.drushrc.php): failed to open stream: Permission [warning]
denied in /usr/share/drush/includes/sitealias.inc on line 433.
include(): Failed opening '/var/aegir/.drush/server_master.alias.drushrc.php' for inclusion   [warning]
(include_path='.:/usr/share/php:/usr/share/pear') in /usr/share/drush/includes/sitealias.inc
on line 433.
'all' cache was cleared                                                                       [success]

We get a bunch of warnings, because provision can't load the server aliases (for a good reason - they contain DB passwords), but the command runs fine! In fact, if we want, we can avoid the luxury of using the aliases and just run the drush command in the site directory:

anarcat@marcos:aegir.anarcat.ath.cx$ drush cc all
Cannot open drushrc "/var/aegir/hostmaster-HEAD/drushrc.php", ignoring.                       [warning]
'all' cache was cleared                                                                       [success]

That warning can be removed by simply fixing the permissions on the drushrc.php file to be group-readable.

However, the problem with that approach is that the user has access to all sites, from there on. No regard to what client the user has access to. Obviously, this can work for a dedicated server (and it would need patches to provision so that drushrc.php files and optionnally aliases are group-readable) but it will not work for shared hosting, which is my prime objective here.

Patches necessary:

group-readable drushrc.php (changes in provision)
add users to the aegir and www-data groups manually
optional: group-readable site aliases (so that users can access the aliases)

Issues:

all or nothing: the user has access to all sites or none as we have a single group

Traditionnal UNIX permission-based approach

In order for this to work really correctly, we would need the following permissions setup. Instead of having the typical following:

-r--r-----  1 aegir aegir    55456 16 mar 13:35 drushrc.php
drwxrws--- 10 aegir www-data  4096 16 mar 13:34 files
drwxrwsr-x  2 aegir aegir     4096 16 mar 13:34 modules
-r--r-----  1 aegir www-data  3276 16 mar 13:35 settings.php

(I am skipping libraries, themes and private directories for simplicity - libraries and themes are the same as modules and private is the same as files.)

... We would need to have the following:

-r--r-----  1 aegir    client   55456 16 mar 13:35 drushrc.php
drwxrws--- 10 www-data client    4096 16 mar 13:34 files
drwxrwsr-x  2 aegir    client    4096 16 mar 13:34 modules
-r--r-----  1 www-data client    3276 16 mar 13:35 settings.php

That is, all files and directories are readable by the "client" group (which allow inter-client isolation) and some would be readable or writable by the webserver (which allows for uploads and the webserver to read the settings.php and so on).

In this scenario, the webserver has access to all sites all the time: it can read/write uploaded files of other modules, but, because of database cloaking, it can't access other sites' databases. If we want to fix this we would have to go with ACLs.

Note that this setup doesn't require the webserver running any special patch or daemon to run the PHP process as a normal user: it still runs as the www-data user.

Patches necessary:

patches mentionned above (group-readable drushrc and aliases)
creating the client group and adding users to it
adding the aegir user to the client group
patching provision so it changes the group of some files to the client group
patching provision so it changes the owner of some files to the www-data user (this actually requires root: ouch!)

Issues:

requires aegir to run as root
aegir need to be a member of every group

The ACL approach

This is based on this excellent comment from malclocke.

This is one of the simplest and elegant approach, as it:

doesn't require root access
fits well with the n to n client access list model of aegir
works without having to change the existing permissions (so would fit well in a contrib module)

To configure ACLs, you need to add the option to your mountpoint in fstab and remount the mountpoint (or reboot):

apt-get install acl
mount -o remount,acl /var # and edit /etc/fstab to add the acl option to the mountpoint

The rest is fairly simple: we just need to add access to the required groups to the drushrc and settings.php files:

setfacl -m g:client:r-- settings.php drushrc.php

... where "client" is the client group.

Optionnally, we can also give access to the alias files the same way, but we end up with the same warning on the server level files.

Note: this is a good introduction to ACLs, in Redhat

Patches necessary:

make platform-level drushrc readable by all (or add an ACL for each client with access to the platform..)
patch provision to add relevant ACLs on verify and install, to drushrc.php and settings.php:
system-level configuration of mountpoints and extra package
creating the client group and adding users to it

Issues:

requires system-level modifications
NFS has not supported ACLs very well traditionnally (see the NFSv4 implementation notes and the issues page for ACLs for more information) - but according to my tests (vanilla Debian squeeze), it just works

Other similar research

This is actually a well-discussed topic in the community, although nobody came up with a clean and complete solution for this.

Brilliant comment from mig5 explaining group-ownership of key files
A similar solution proposed here actually shows which chunks of code need to be modified in 0.4-alpha6
Aegir permissions best practices thread - mentions everything from the "aegir group" to ACLs
My original research into this problem