This site is a static archive of the Aegir community site. Documentation has moved to http://docs.aegirproject.org. Other community resources can be found on the Contacting the community page.
Revision of first quarterly report of Aegir funding work from Thu, 07/26/2012 - 21:48

first quarterly report of Aegir funding work

Since May 15th, I have been working one full day a week on Aegir development thanks to monthly funding sponsored jointly by Omega8cc and Koumbit.org. Rather than fixing random (and usually urgent) issues that crop up in our regular use of Aegir, that funding has allowed me to concentrate on actual development. We have now agreed to publish a work log every two months of the results of the sponsorship.

Here is the first report.

1. Executive summary

Koumbit has spent around 48 hours of raw work (excluding breaks and overhead). It is roughly equivalent to 6 hours a week. That is all work where I have been sitting in front of a computer or a piece of paper hacking at Aegir. So that's already a huge gain over the previous situation where I didn't have time to work on the project, and is in line with the objective Koumbit had planned for this project, which is to allow me to work freely around one day a week on Aegir.

After reviewing the work, I am quite satisfied with the results. I have been able to push significant changes in the 2.x codebase, brought the hosting-queue-runner into core and made everything work with Drush 5 properly. I was also able to provide support to the community, something which was lacking in recent times, although I have had trouble doing this in the last month because I was traveling offline.

2. Notable accomplishments

Most of my work has been to contribute code, comments and ideas back to the community, publicly on Drupal.org and community.aegirproject.org.

All the gory details are on my tracker here:

Here's an outline of significant contributions.

2.1. Issue tracking and queue support

I have tried to start the issue queue review I had been doing in the past. I have reviewed, closed and responded to a lot of issues. Since the beginning of the sponsorship, I have tried to read all issue updates, although since my travel in June, I have a huge backlog of issues to catch up with, which I will try to do in the coming weeks.

Nevertheless, in that first month of followup, I have been able to participate in over 50 issues on Drupal.org. I have also made around 50 commits to the Aegir and Drush projects over that period.

2.2. The Drush sql credentials leak

Issue #671906

I have tried (again) to resolve the serious security issue with the way Drush calls mysql, which leaks the mysql password. I have been able to create an exploit that actually works and demonstrates the problem. We (with ergonlogic) have also found that the archive-dump command is vulnerable (#1600782).

I have provided a patch to the drush folks, but it was reverted as it was breaking sql-sync and similar remote commands (#1605998). I have tried to fix the patch to work around that problem, but came to the conclusion that this code is too tangled to be fixed without a significant refactoring.

In other words, I have currently given up on this, out of frustration, but I will try to get back to it as this seems like a really critical issue to fix.
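To illustrate the class of problem: any argument on a command line can be read by other local users through the process list, so a password passed as an option leaks. The following is a minimal sketch of the usual mitigation, not the Drush patch itself. It writes the credentials to a private temporary file and hands mysql only the path through its `--defaults-extra-file` option. The function names and the sample credentials are hypothetical, and a real implementation would also need to escape special characters in the password.

```python
import os
import tempfile


def leaky_argv(user, password, database):
    """Password ends up in argv, visible to other users through `ps`."""
    return ["mysql", "--user=" + user, "--password=" + password, database]


def safer_argv(user, password, database):
    """Write credentials to a private file; only its path appears in argv.

    Returns (argv, path). The caller must delete the file afterwards,
    including when the command is interrupted.
    """
    # mkstemp creates the file readable and writable by its owner only.
    fd, path = tempfile.mkstemp(prefix="mysql-", suffix=".cnf")
    with os.fdopen(fd, "w") as handle:
        handle.write("[client]\nuser=%s\npassword=%s\n" % (user, password))
    # mysql requires --defaults-extra-file to come before other options.
    return ["mysql", "--defaults-extra-file=" + path, database], path


if __name__ == "__main__":
    argv, path = safer_argv("aegir", "s3cret", "site_db")
    try:
        print(argv)
    finally:
        os.remove(path)
```

The `finally` clause matters here: a credentials file that survives an interrupted run is its own leak, which is why the temporary file cleanup issues listed below are related.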

On top of the archive-dump issue, this research also allowed me to discover several more issues with drush:

  • interruptions should launch rollbacks (#590634)
  • Drush doesn't clean up temporary files if interrupted (#1589108)
  • can't create two temporary files (#1603744, needs review)
  • the drush test suite pops up a terminal (#1603688)

2.3. Drush 5 debian package

Issue #1611934

I have uploaded the Drush 5 Debian package into the Debian archive. I have removed the debian directory from the main drush git repository on Drupal.org to move it to "alioth", the collaborative maintenance server for Debian packages:

That way, people not part of the Drush core team can participate in packaging, see this wiki page for more information.

The main reason for the move, however, was to separate the Debian-specific files from the main repository, allowing me to have different branches for the different Drush releases. Before, there was a special "debian" branch in the drush git repository, which made it difficult to maintain different versions of the package.

Right now, there are two branches in that new repository: the 4.x and master branches, the latter following the upstream master releases (which are currently 5.x).

Now the git repository for the Debian package only contains the debian directory, in what is called an "overlay" setup: the directory is simply deployed over the upstream source, instead of having the git repository contain all the upstream history. This makes upgrades trivial, as we do not need to pull and merge the upstream code in our repository.
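The overlay idea can be sketched in a few lines. This is only an illustration of the principle, not the tooling actually used for the package: the function name and the directory layout are assumptions.

```python
import shutil
from pathlib import Path


def apply_overlay(packaging_repo, upstream_tree):
    """Copy the packaging repo's debian/ directory over an upstream tree."""
    source = Path(packaging_repo) / "debian"
    target = Path(upstream_tree) / "debian"
    if not source.is_dir():
        raise FileNotFoundError("no debian/ directory in %s" % packaging_repo)
    # Replace any debian/ directory shipped upstream with the overlay.
    if target.exists():
        shutil.rmtree(target)
    shutil.copytree(source, target)
    return target
```

Because the packaging repository never contains upstream files, moving to a new upstream release only means unpacking the new source and applying the same overlay again.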

Since this was done before the Wheezy freeze, this means that Drush 5 will be the first Drush release to be officially shipped with a Debian release, hurray!

2.4. Provision ACL release

Issue #1585664

I have done a minor release of the Provision ACL module, which fixes ACLs for the webserver, which was a key blocker for the Koumbit infrastructure.

2.5. Provision mergelog published

Provision Mergelog homepage

I have also taken the time to share with the community the code for our "provision mergelog" implementation. To quote the homepage:

This Aegir extension allows you to automate fetching and merging logs from multiple servers part of the same cluster in a single logfile for later analysis with various analysis tools (awstats, webalizer, piwik, etc). It is aimed to be a simple tool that does one thing well.

The code was originally written by Koumbit.org before the subsidy, but the publishing work was done during sponsored time.
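For readers unfamiliar with the idea, here is a small sketch of what merging per-server logs into one chronological file involves. It is not the Provision Mergelog code. It assumes each input is already sorted by time and uses the Common Log Format timestamp, and the function names are made up for the example.

```python
import heapq
from datetime import datetime


def log_time(line):
    """Parse the timestamp of a Common Log Format line."""
    stamp = line.split("[", 1)[1].split("]", 1)[0]
    return datetime.strptime(stamp, "%d/%b/%Y:%H:%M:%S %z")


def merge_logs(*streams):
    """Lazily merge already-sorted log streams into one chronological stream."""
    return heapq.merge(*streams, key=log_time)
```

Since `heapq.merge` is lazy, the inputs can be open file objects and the merged result can be written out line by line without loading whole logs in memory.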

2.6. SSL and IP allocation work

Issue #1126640 and related

I dug my head in the ugly SSL and IP allocation code. I have mostly reviewed the proposed patches and issues, and provided a workaround for bug #1603722 (deleting a site doesn't delete its SSL certificate) and reported #1603702 (allows creation of SSL site even if there are no IPs available).

It seems most of our problems revolve around being stuck with the IP allocation system in the backend. To quote the issue:

The IP allocation is done by touching files in the backend now, and from what I understand, there is no good reason for that. This mechanism should be removed and recreated in the frontend, which should pass the allocated IP to the backend, which should add it to the vhost.

This should also be made to support IP-less allocation, for example by allowing "*" to be passed as an IP. We can assume the frontend sends us proper data and just inject this in the vhost.

And actually, the "receipt files" in the backend, which are the current allocation system for IPs/certificates, are a duplicate of a hosting_ip_addresses table that exists in the frontend. We therefore need to do some refactoring there to clean up this stuff.
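The proposal quoted above could look roughly like the following. This is a hypothetical sketch of the division of labour, not existing Aegir code: the `allocations` dictionary stands in for the frontend's hosting_ip_addresses table, and every function name is an assumption.

```python
WILDCARD = "*"


def allocate_ip(site, needs_dedicated_ip, pool, allocations):
    """Frontend side: return the IP for a site and record the allocation."""
    if not needs_dedicated_ip:
        # IP-less allocation: the vhost listens on every address.
        return WILDCARD
    if site in allocations:
        return allocations[site]
    used = set(allocations.values())
    for ip in pool:
        if ip not in used:
            allocations[site] = ip
            return ip
    # Refuse up front instead of creating a broken SSL site (see #1603702).
    raise RuntimeError("no free IP address for %s" % site)


def vhost_open_tag(ip, port=443):
    """Backend side: inject whatever the frontend sent into the vhost."""
    return "<VirtualHost %s:%d>" % (ip, port)
```

With the allocation kept in one place, the backend no longer needs receipt files, and deleting a site only requires removing its entry from the frontend table.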

2.7. Hosting_queue_runner merged!

Issue #1189556

I have spent a significant amount of time to improve the third-party module hosting_queue_runner. It is now fully integrated in the 2.x series, as hosting_queued. I have updated the upgrade path information accordingly.

This also involved porting the daemon to Drush 5 (#1548490).

Thanks to the Debian package, the init script is also installed automatically on install, which is a nice improvement. The script was also improved to have its "status" argument actually work, which improves interoperability with configuration management tools like Puppet. Those improvements should also resolve the problems with the daemon stalling on upgrade (#1261800). The daemon also now has a "reload" command that just reloads the daemon without stopping it by sending a signal. The daemon is also "niced" by default which should help with the load on Aegir servers.
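As an illustration of the reload and nice behaviour described above, here is a toy daemon loop. It is not the hosting_queued code: the choice of SIGHUP as the reload signal, the niceness value and the class name are all assumptions made for the example.

```python
import os
import signal


class QueueDaemon:
    """Toy daemon that rereads its configuration when it receives SIGHUP."""

    def __init__(self):
        self.reload_requested = False
        self.reloads = 0

    def handle_reload(self, signum, frame):
        # Only set a flag here; the main loop does the real work.
        self.reload_requested = True

    def install(self, niceness=10):
        signal.signal(signal.SIGHUP, self.handle_reload)
        # Lower our scheduling priority so queued tasks compete less
        # with the web server for CPU time.
        os.nice(niceness)

    def tick(self):
        """One iteration of the main loop."""
        if self.reload_requested:
            self.reload_requested = False
            self.reloads += 1  # a real daemon would reread its settings here
```

An init script's "reload" action then only has to send the signal to the PID recorded in the daemon's pidfile, without interrupting a task that is currently running.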

By shipping a specific init.d script for the Debian package, we are now able to ship a different init.d script, more platform-neutral, for other platforms along with the code. This will improve support for CentOS and Arch Linux, which both had patches waiting to be supported properly (#1335776 and #1493300).

2.8. Aegir core completely ported to Drush 5

Issue #1612042

Thanks to the hard work of tstoeckler, darthsteven and greg.1.anderson, Aegir was already working pretty well with Drush 5, so I didn't really have much work to do here. On top of the port of hosting-queue-runner to Drush 5, I had to work with a verbosity issue with hosting-dispatch that was flooding the terminal, and we can now consider Aegir 2.x to be fully compatible with Drush 5, although we are probably not taking full advantage of the new features of that release.

Things that would need to be pushed forward for better Drush 5 compatibility:

  • declare the options to our drush commands
  • more code reuse (site-install, archive-dump, etc)

This, however, seems to be a lower priority.

2.9. Retiring the drush_make package

With the arrival of Drush 5, the drush_make Debian package became irrelevant, and was therefore removed from the official Debian archive. The package should survive in the squeeze-backports archive, but will not be shipped with wheezy, as it is now part of Drush 5.

2.10. Aegir 2.x upgrade path assured

Upgrade report

I have worked on making sure we can upgrade from the 1.x branch to 2.x. This required some patches to provision-migrate due to new Drush 5 idiosyncrasies (drush now has a commandfile cache that needs to be flushed, see #1612044).

2.11. Nginx testing and cleanup

2.x Install notes

For the first time, I have spent some time trying out the Nginx code on my Debian Wheezy laptop. I have noticed some issues with the way the nginx configuration is managed that depart from the current practices in the project ("tools not policy" and minimal configurations). I have filed a few bug reports about this and provided patches on a separate branch (dev-nginx-cleanup) out of respect for the maintainer of the Nginx code. The issues are:

  • Nginx config include files should be merged into one (#1622846, needs review)
  • remove unused config file (#1635552, needs review)
  • remove duplicate nginx fastcgi params (#1635586, needs review)
  • nginx: do not decide the policy for users (#1635596, needs work)
  • default nginx config doesn't talk to the default php-fpm config in Debian Wheezy and sid (#1635622)

Most of those issues have fixes in the dev-nginx-cleanup branch and are awaiting review.

2.12. Provision backup and migrate optimizations

Issue #1484214

My last trip on the train allowed me to spend some time looking at some optimization work for provision-migrate that I had been postponing for a while. My idea was to move files out of the sites directory, and make provision-backup follow symlinks (or not) depending on whether we are doing a real backup/clone (or a migrate), which would make migrate much faster as it wouldn't copy the files directory.

Unfortunately, this introduced a serious security issue (a symlink traversal) that would allow a user to hijack the data of any other site hosted on the same Aegir install. I have reported my findings and solutions in the above issue, but I believe the proper fix is to rewrite provision-migrate significantly so that it doesn't use backup and deploy. This may involve quite a lot of work, so I have postponed this work for now.
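To make the traversal concrete: if a backup follows symlinks, a user who can create a link inside their own site directory can point it at another site's files and have them included in their backup. The sketch below shows one way to detect such links before following them. It is an illustration only, not the fix proposed in the issue, and the function names are hypothetical.

```python
import os


def escapes_site(path, site_root):
    """True if `path` resolves to a location outside `site_root`."""
    root = os.path.realpath(site_root)
    target = os.path.realpath(path)
    return os.path.commonpath([root, target]) != root


def unsafe_symlinks(site_root):
    """List symlinks under site_root that point outside of it."""
    found = []
    # os.walk does not descend into symlinked directories by default.
    for directory, subdirs, files in os.walk(site_root):
        for name in subdirs + files:
            full = os.path.join(directory, name)
            if os.path.islink(full) and escapes_site(full, site_root):
                found.append(full)
    return found
```

A check like this is still subject to races if the user can modify the directory while the backup runs, which is one reason a redesign of migrate looks safer than patching the symlink handling.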

I am still looking at how to move files out of the sites directory to facilitate SFTP account management, and this may yet allow migrate optimizations, especially if we rewire the backup system to use zip files, which can be incrementally updated, so would allow backups to be performed from multiple locations.

We should also obviously look at how drush archive-dump does its magic before going too far here.

See also issue #1205458.

2.13. Fix syncing problems with files/

Issue #1083366

I have had the privilege of sitting down with an Aegir contributor (jmcclelland) who actually sent meaningful patches to fix the files/ sync problems with remote servers. Since he is using those patches in production, I figured it was worthwhile to take extra time to look at those patches with him.

The result is that he rerolled a new patch based on my feedback that fixes all the concerns I had with the patch. I have therefore asked the core team to approve an exception to the API freeze to fix this bug that is plaguing remote server support.

I am waiting for feedback from the community (for tests) and for the core team (for the exception) before merging this in.

3. Upcoming work

This is the work I will be looking at doing in the coming months as part of the sponsorship project.

3.1. Publish the 2.x Debian package

Issue #1599606

I have been wanting to do this forever, and we are pretty much ready to do this; we are only missing a separate Debian archive for publishing that code. Since we are reconfiguring the Jenkins server, however, I figured it was worth waiting a while before setting this up.

3.2. 2.x redesign

2.x redesign notes document

During my work on the above issues, I had some time to reflect on possible refactoring and redesign of the Aegir core, and ended up formulating a few recommendations. The above document has extensive reflections on the internals of the provision backend and the queuing systems.

I need to basically take a look at this again and figure out the way forward and start pounding out patches for this quite abstract problem.

3.3. Moving intelligence to the spokes

I'm quite eager to start moving more logic down to the spokes, now that we have Debian packages in order. This goes hand in hand with the 2.x redesign analysis, but it is a practical issue that could be addressed right now, which would be to start looking at having one queue per server, and having those servers run those queues locally instead of having the hub do everything.

The first step for this is the above issue with the files/ directory which should be fixed shortly. The next steps are to move the dispatcher to the backend and start maintaining aliases on each server.
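As a rough picture of what "one queue per server" means, the sketch below splits a hub-wide task list into per-server queues and lets each spoke drain only its own. This is purely illustrative: the task structure, the `server` key and the function names are assumptions, not part of the current Aegir design.

```python
from collections import defaultdict, deque


def split_by_server(tasks):
    """Group a hub-wide task list into one FIFO queue per server."""
    queues = defaultdict(deque)
    for task in tasks:
        queues[task["server"]].append(task)
    return queues


def run_local_queue(server, queues, execute):
    """What each spoke would do: drain only its own queue, in order."""
    done = []
    queue = queues.get(server, deque())
    while queue:
        done.append(execute(queue.popleft()))
    return done
```

Tasks for one server keep their relative order, while a slow task on one spoke no longer delays the tasks queued for the others.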

3.4. Nginx cleanup

I would very much like to finish the work I started on this and start merging those patches in. I am waiting for feedback on those issues, so please let me know how those things can be improved!

3.5. Fix security issues in Drush

This is still a priority for me, and so I will look again at the credentials leak in the Drush IPC code.

3.6. Merge the platform auto code

One of the things we have in 2.x is a reference to the "platform auto" module which gets pulled in during the drush_make process. However, I feel that we shouldn't depend on contrib modules in the core makefile, so we should either remove that from the makefile or merge that code. I think we should just merge it. :)

3.7. Subsite support

I really need to take a look at the work mig5 has done to support subdirectory sites.
