The best recipes for disaster and how to avoid them

This is a short list of all those bad things you should avoid while still learning how to use Aegir system properly, compiled from many real-world issues, so you could enjoy reading about creative ways to destroy your sites or Aegir system instead of experience it.

How to get all kinds of failed migrate or clone tasks?

Very easy. Simply use sites/domain-name/modules space for your code. This will guarantee many half-broken migrations with duplicates of the same site existing in two platforms, with fatal errors because some required include file in some module no longer exists, or not yet exists, as all paths to sites/domain-name/modules change on the fly when either domain-name or platform used changes, so some entries in your site's system table will cause either failed and reverted tasks in Aegir or some nice WSOD, until you will repair registry with http://drupal.org/project/registry_rebuild.

No, really, don't use sites/domain-name/modules for anything and save yourself headaches and frustration.

To help you understand this better, let's quote mig5 - Aegir core developer: "Aegir doesn't track site-specific modules/themes in /sites /$yoursite /(modules|themes) because it's not possible to ever upgrade those modules (they are tarballed up and moved with the site on a Migration). For this reason it's unwise to store site specific modules in this directory as you'll never be able to upgrade them if you do your upgrades properly (via Migrations, to new target platforms). To have site-specific modules not in that directory, you should learn install profiles, and store the modules/themes in /profiles /$yourprofile /(modules|themes) for a site using that install profile. (...) I don't see us ever handling the upgrades of modules in /sites/$site, since that entire dir gets tarballed up and moved across. In fact we made a deliberate design decision to literally ignore those modules in the Aegir package system." We highly recommend to read all comments there. There is also an excellent handbook entry you should read to understand this better.

How to break the site so even the Restore task will not help?

To make that happen you need to forget about correct upgrade workflow and good habits, use Backup task in Aegir to create a pseudo-safe copy of your site, upload new contrib modules to your new platform and then use Migrate task to move/upgrade your site to the new platform. To make it even worse, you could then try to use Restore task if anything went wrong with the migration and this will either fail completely or it will revert your database, but it will leave the site in the new platform – because Restore task will never move the site back to the old platform, so all your previous backups are immediately useless, and you are locked in the new platform without any chance to rollback automatically.

OK, so what is the correct workflow for sites upgrades in Aegir? It highly depends on how you manage your code, but some general rules are always valid and we will list them below:

  1. Create or choose new platform.
  2. Upload all your contrib modules and themes to the new target platform.
  3. Re-verify the target platform in Aegir.
  4. Clone your live site with some working subdomain in the old platform.
  5. Re-verify old platform and also just cloned site.
  6. Migrate cloned site to the new platform.
  7. Check if the cloned site works without any issues.
  8. If the step 7 above works, you can safely migrate also the live site.

Note: if this is a Drupal core version upgrade, it is always better to split the upgrade in two parts: first migrate to the new core, but using the same contrib modules versions, and then migrate to the final target platform with newer contrib modules. This may help if you experience failed upgrades with mysterious errors like "Call to undefined function node_types_rebuild()" without any hint on which module causes the issue.

How to break the site and lock it in the broken state in a one step?

This sounds frightfully, but it is really easy to do. All you need is to skip the steps with creating and testing the site’s clone in the existing platform first and simply use the Clone task to clone the site directly to the new platform. While this appears to be a handy shortcut, it is in fact one of the best recipes for disaster. Why? Because Aegir will not be able to properly check for all possible issues related to the contrib code and db updates to alert you before running the Clone+Migrate task. Yes, this kind of Clone task is in fact Clone and Migrate in a one step. It is very handy, but only when you already tested it with both old and new platform before. To stay safe, just avoid cloning the site to the different platform directly. Instead, clone it first in the existing platform, then migrate to the new platform, and finally re-verify it.