This site is a static archive of the Aegir community site. Documentation has moved to http://docs.aegirproject.org. Other community resources can be found on the Contacting the community page.

How this site crashed when it was last verified - lessons learned from migrating a "standard site"


We just had a pretty nasty outage on this site, about an hour ago (thanks to smthomas on IRC for the heads-up!). All modules were disabled, and the front page was unthemed and displayed only "Page not found".

Ouch.

While distributions can make hosting and porting sites between providers a breeze (I didn't have to guess which Drupal modules I needed to install to import the site), they can have tricky implications that most people are not used to when performing certain operations, such as migrating or verifying a site. I have taken some time to explain here how I fixed this and how to avoid the problem in the future, so that others can profit from that nasty experience...

After freaking out (Koumbit is the one taking care of hosting this site, so those things often fall on me), I started looking at the problem and found that all modules, or more precisely all OpenAtrium-related modules, were disabled. After a "what the heck" headbang, I realized I had hit the exact same issue back when we deployed the site during the original migration. Back then, the problem was that I didn't pass the --profile=openatrium option to provision-deploy when installing the new site. That, in turn, made the update script disable all OpenAtrium modules: the Drupal bootstrap couldn't find them, because they are in profiles/openatrium/modules/...

(Now you're supposed to have that "aaaaaaah... i seeeeee!!" moment.)
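To make that moment concrete, here is a simplified shell model of where Drupal 6 looks for modules. It is loosely based on my reading of drupal_system_listing(), which scans modules/, then profiles/<profile>/modules for the active profile only (taken from the global $profile or the install_profile variable), then sites/all/modules and the site's own directory. The function name module_search_dirs is hypothetical and this is not Drupal code, just an illustration:

```shell
#!/bin/sh
# module_search_dirs ROOT PROFILE SITE_DIR
# Hypothetical, simplified model of the directory list Drupal 6 builds in
# drupal_system_listing(): core modules, the *active* profile's modules,
# sites/all/modules, then the site's own modules directory.
module_search_dirs() {
  echo "$1/modules"
  # Only the profile Drupal believes is installed gets searched; modules in
  # any other profiles/<name>/modules directory are invisible to it.
  if [ -d "$1/profiles/$2/modules" ]; then
    echo "$1/profiles/$2/modules"
  fi
  echo "$1/sites/all/modules"
  if [ -d "$1/sites/$3/modules" ]; then
    echo "$1/sites/$3/modules"
  fi
}
```

Called with openatrium as the profile, profiles/openatrium/modules is in the list; called with default, it silently drops out. The OpenAtrium modules are still on disk, but Drupal no longer looks for them, so they end up disabled.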

I fixed our procedure (in French, and a bit chaotic) by adding the --profile option to the provision-deploy call, which solved the problem during that original deployment. But then the site got imported, and the import picked up the profile not from settings.php (which provision-deploy had configured correctly) but from the alias, which had been created earlier with provision-save and had therefore fallen back to the default install profile. On import, the site node was created in the frontend with the default profile.

So when the site was verified after a migration, the profile was reset to default again, and all modules were disabled when the cache was cleared.

The proper way to do that deployment would have been to set the profile correctly in the alias (through provision-save) in the first place; that way, everything would have been right.
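One way to make that mistake harder to repeat is to wrap provision-save in a small helper that refuses to save a site alias without an explicit profile. This is a hypothetical sketch: the helper name and argument order are made up, and the option names (--context_type, --uri, --platform, --db_server, --profile) are the ones I understand provision-save to accept, so check drush help provision-save on your Aegir version and add whatever else your setup needs (web server, client, etc.). With DRY_RUN=1 it only prints the command:

```shell
#!/bin/sh
# save_site_alias ALIAS URI PLATFORM PROFILE DB_SERVER
# Hypothetical wrapper around provision-save that insists on a profile.
# Option names are assumptions; verify with `drush help provision-save`.
save_site_alias() {
  alias_name=$1  # e.g. @community.aegirproject.org
  uri=$2         # e.g. community.aegirproject.org
  platform=$3    # platform alias (placeholder name)
  profile=$4     # install profile, e.g. openatrium
  db_server=$5   # database server alias (placeholder name)
  if [ -z "$profile" ]; then
    echo "refusing to save $alias_name without an install profile" >&2
    return 1
  fi
  # Build the full command in the function's positional parameters.
  set -- drush provision-save "$alias_name" --context_type=site \
    --uri="$uri" --platform="$platform" \
    --db_server="$db_server" --profile="$profile"
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "$@"   # dry run: print the command instead of running it
  else
    "$@"
  fi
}
```

For this site, that would be something like DRY_RUN=1 save_site_alias @community.aegirproject.org community.aegirproject.org @platform_openatrium openatrium @server_localhost, where the platform and database server aliases are placeholders for your own.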

If you ever end up in a similar situation, you have a few options:

  1. restore from a backup - not possible, we had live data in there and changes since the last backup
  2. enable all missing modules manually - yuck: how do I know which ones? I can look at "Disabled" (as opposed to "Not installed") modules in drush pm-list, but that's not really reliable
  3. partially restore from backup - what I ended up doing

I took the original backup that was used to deploy the site in the first place and extracted database.sql from it:

tar zxf backup.tgz ./database.sql

Then I edited the file (in vi!!) to remove everything except the statements for the system table (including the DROP TABLE system statement), and loaded the dump into the site:

drush @community.aegirproject.org sqlc < database.sql

That way, the system table, and only the system table, was restored from the backup, and all modules were back in their original enabled/disabled state. A little cache clear (drush @community.aegirproject.org cc all) and the site was fully functional again.
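The vi step can also be scripted. The sketch below is a hypothetical extract_table helper using awk; it assumes the standard mysqldump layout, where each table starts with a "-- Table structure for table `name`" comment and the trailer restores session settings with SET ...=@OLD_... lines. It keeps the dump's header, the given table's section (DROP TABLE, CREATE TABLE, LOCK/INSERT/UNLOCK) and that trailer, and drops everything else. It is worth eyeballing the result before loading it, since a dump produced with non-default options may look different:

```shell
#!/bin/sh
# extract_table DUMP TABLE
# Hypothetical helper: print only what is needed to restore one table from
# a mysqldump file: the session-setting header, that table's section, and
# the trailer that restores the saved session settings.
extract_table() {
  awk -v table="$2" '
    BEGIN { in_header = 1 }
    # Each table section starts with this comment; keep only the wanted one.
    /^-- Table structure for table `/ {
      in_header = 0
      in_table = ($0 == "-- Table structure for table `" table "`")
    }
    # The trailer begins with lines like: SET TIME_ZONE=@OLD_TIME_ZONE
    /^\/\*![0-9]+ SET [A-Z_]+=@OLD_/ { in_table = 0; in_trailer = 1 }
    in_header || in_table || in_trailer { print }
  ' "$1"
}
```

Here that would be extract_table database.sql system > system.sql, followed by loading system.sql with drush @community.aegirproject.org sqlc, as with the hand-edited file.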

Bottom line: be careful with distributions when you move them around. Setting the profile both in settings.php and in the alias (with provision-save) is essential for the site to work.
