First quarterly report of Aegir funding work
Since May 15th, I have been working one full day a week on Aegir development thanks to monthly funding sponsored jointly by Omega8cc and Koumbit.org. Rather than fixing random (and usually urgent) issues that pop up in our regular use of Aegir, that funding has allowed me to concentrate on actual development. We have now agreed to publish a work log of the results of the sponsorship every two months.
Here is the first report.
- 1. Executive summary
- 2. Notable accomplishments
- 2.1. Issue tracking and queue support
- 2.2. The Drush sql credentials leak
- 2.3. Drush 5 debian package
- 2.4. Provision ACL release
- 2.5. Provision mergelog published
- 2.6. SSL and IP allocation work
- 2.7. Hosting_queue_runner merged!
- 2.8. Aegir core completely ported to Drush 5
- 2.9. Retiring the drush_make package
- 2.10. Aegir 2.x upgrade path assured
- 2.11. Nginx testing and cleanup
- 2.12. Provision backup and migrate optimizations
- 2.13. Fix syncing problems with files/
- 3. Upcoming work
1. Executive summary
Koumbit has spent around 48 hours of raw work (excluding breaks and overhead) on this project, which is roughly equivalent to 6 hours a week. That is all work where I have been sitting in front of a computer or a piece of paper hacking at Aegir. That is already a huge gain over the previous situation, where I didn't have time to work on the project, and it is in line with the objective Koumbit had planned for this project, which was to allow me to work freely around one day a week on Aegir.
After reviewing the work, I am quite satisfied with the results. I have been able to push significant changes in the 2.x codebase, bring hosting-queue-runner into core and make everything work properly with Drush 5. I was also able to provide support to the community, something which had been lacking recently, although I have had trouble keeping this up in the last month because I was traveling offline.
2. Notable accomplishments
Most of my work has been to contribute code, comments and ideas back to the community, publicly on Drupal.org and community.aegirproject.org.
All the gory details are on my tracker here:
Here's an outline of significant contributions.
2.1. Issue tracking and queue support
I have tried to restart the issue queue review I had been doing in the past. I have reviewed, closed and responded to a lot of issues. Since the beginning of the sponsorship, I have tried to read all issue updates, although since my travels in June, I have a huge backlog of issues to catch up with, which I will try to do in the coming weeks.
Nevertheless, in that first month of follow-up, I have been able to participate in over 50 issues on Drupal.org. I have also made around 50 commits to the Aegir and Drush projects over that period.
2.2. The Drush sql credentials leak
I have tried (again) to resolve the serious security issue with the way Drush calls mysql, which leaks the MySQL password. I have been able to create an exploit that actually works and demonstrates the problem. Together with ergonlogic, we have also found that the archive-dump command is vulnerable as well (#1600782).
I have provided a patch to the Drush folks, but it was reverted as it was breaking sql-sync and similar remote commands (#1605998). I have tried to fix the patch to work around that problem, but came to the conclusion that this code is too tangled to be fixed without significant refactoring.
In other words, I have currently given up on this, out of frustration, but I will try to get back to it as this seems like a really critical issue to fix.
On top of the archive-dump issue, this research also allowed me to discover several more issues with Drush:
- interruptions should launch rollbacks (#590634)
- Drush doesn't clean up temporary files if interrupted (#1589108)
- can't create two temporary files (#1603744, needs review)
- the Drush test suite pops up a terminal (#1603688)
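The report does not spell out the exact leak mechanism. Assuming it is the well-known one, where the password is passed to the mysql client as a command-line argument and can therefore be read by any local user from the process list (for example with ps), the usual mitigation is to hand the credentials over in a private option file instead. The following Python sketch only illustrates that technique and is not Drush code: write_client_option_file() and build_mysql_command() are hypothetical helpers, while --defaults-extra-file is a real MySQL client option that must be the first option on the command line.

```python
import os
import tempfile


def write_client_option_file(user: str, password: str, host: str = "localhost") -> str:
    """Write MySQL client credentials to a private temporary option file.

    tempfile.mkstemp() creates the file readable and writable only by the
    current user (mode 0600), so other local users cannot read the password.
    """
    fd, path = tempfile.mkstemp(prefix="mysql-creds-", suffix=".cnf")
    with os.fdopen(fd, "w") as handle:
        handle.write("[client]\n")
        handle.write(f"user={user}\n")
        handle.write(f"password={password}\n")
        handle.write(f"host={host}\n")
    return path


def build_mysql_command(option_file: str, database: str) -> list[str]:
    """Build a mysql argv that never contains the password itself.

    --defaults-extra-file must be the first option given to the client.
    """
    return ["mysql", f"--defaults-extra-file={option_file}", database]


if __name__ == "__main__":
    path = write_client_option_file("aegir_site", "s3cret")
    try:
        argv = build_mysql_command(path, "site_db")
        print(argv)  # the password does not appear in the process arguments
        # subprocess.run(argv, check=True) would start the client here.
    finally:
        os.unlink(path)  # remove the credentials even if something fails
```

The try/finally removes the option file even when an exception or a Ctrl-C (KeyboardInterrupt) interrupts the work, which is the kind of cleanup #1589108 asks Drush to do with its own temporary files.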
2.3. Drush 5 debian package
I have uploaded the Drush 5 Debian package into the Debian archive. I have also removed the debian directory from the main Drush git repository on Drupal.org and moved it to "alioth", the collaborative maintenance server for Debian packages:
- git://git.debian.org/git/collab-maint/drush.git
- http://git.debian.org/git/collab-maint/drush.git
That way, people not part of the Drush core team can participate in packaging, see this wiki page for more information.
The main reason for the move, however, was to separate the Debian-specific files from the main repository, allowing me to have different branches for the different Drush releases. Before, there was a special "debian" branch in the drush git repository, which made it difficult to maintain different versions of the package.
Right now, there are two branches in that new repository: 4.x and master, the latter following the upstream master releases (which are currently 5.x).
Now the git repository for the Debian package only contains the debian directory, in what is called an "overlay" setup: the directory is simply deployed over the upstream source, instead of having the git repository contain all the upstream history. This makes upgrades trivial, as we do not need to pull and merge the upstream code into our repository.
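To illustrate the overlay idea, here is a minimal Python sketch of what building from such a repository amounts to: unpack the upstream release, then lay the packaging repository's debian/ directory on top of it. In practice Debian tooling such as git-buildpackage handles this step; apply_debian_overlay() is a hypothetical helper written only to show the mechanics.

```python
import shutil
from pathlib import Path


def apply_debian_overlay(upstream_dir: Path, packaging_dir: Path) -> Path:
    """Copy the packaging repository's debian/ directory over an upstream tree.

    The packaging repository holds only debian/, so a new upstream release
    needs no merge: unpack the new tarball and lay debian/ on top of it.
    """
    source = packaging_dir / "debian"
    if not source.is_dir():
        raise FileNotFoundError(f"no debian/ directory in {packaging_dir}")
    target = upstream_dir / "debian"
    # dirs_exist_ok (Python 3.8+) lets the overlay replace files shipped upstream.
    shutil.copytree(source, target, dirs_exist_ok=True)
    return target
```

Because the packaging repository never contains upstream files, importing a new upstream release is just a matter of unpacking a new tarball and running the same step again.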
Since this was done before the Wheezy freeze, Drush 5 will be the first Drush release to be officially shipped with a Debian release, hurray!
2.4. Provision ACL release
I have done a minor release of the Provision ACL module, which fixes ACLs for the webserver, a key blocker for the Koumbit infrastructure.
2.5. Provision mergelog published
I have also taken the time to share with the community the code for our "provision mergelog" implementation. To quote the homepage:
This Aegir extension allows you to automate fetching and merging logs from multiple servers part of the same cluster in a single logfile for later analysis with various analysis tools (awstats, webalizer, piwik, etc). It is aimed to be a simple tool that does one thing well.
The code was originally written by Koumbit.org before the subsidy, but the publishing work was done during sponsored time.
2.6. SSL and IP allocation work
Issue #1126640 and related
I dug into the ugly SSL and IP allocation code. I have mostly reviewed the proposed patches and issues, provided a workaround for bug #1603722 (deleting a site doesn't delete its SSL certificate) and reported #1603702 (allows creation of SSL site even if there are no IPs available).
It seems most of our problems revolve around being stuck with the IP allocation system in the backend. To quote the issue:
The IP allocation is done by touching files in the backend now, and from what I understand, there is no good reason for that. This mechanism should be removed and recreated in the frontend, which should pass the allocated IP to the backend, which should add it to the vhost.
This should also be made to support IP-less allocation, for example by allowing "*" to be passed as an IP. We can assume the frontend sends us proper data and just inject this in the vhost.
And actually, the "receipt files" in the backend, which are the current allocation system for IPs/certificates, duplicate a hosting_ip_addresses table that exists in the frontend. We therefore need to do some refactoring there to clean this up.
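As a rough sketch of the proposed split, assuming the frontend knows which addresses are free (in Aegir that knowledge would come from the hosting_ip_addresses table, modelled here as a plain set), allocation could happen in the frontend, with "*" standing in for IP-less allocation, and the backend would simply inject whatever it receives into the vhost. Both functions are hypothetical and written in Python for illustration only; the Apache <VirtualHost> address syntax, including brackets around IPv6 addresses, is real.

```python
from typing import Optional


def allocate_ip(
    available: list[str], allocated: set[str], ip_less: bool = False
) -> Optional[str]:
    """Frontend side: pick an address for an SSL site (hypothetical helper).

    Returns "*" for IP-less allocation, the first free address otherwise,
    or None when every address is already taken.
    """
    if ip_less:
        return "*"
    for ip in available:
        if ip not in allocated:
            return ip
    return None


def render_vhost_listen(ip: str, port: int = 443) -> str:
    """Backend side: trust the frontend and inject the address as-is."""
    # IPv6 addresses must be bracketed in an Apache <VirtualHost> address.
    host = f"[{ip}]" if ":" in ip else ip
    return f"<VirtualHost {host}:{port}>"
```

Returning None instead of raising lets the frontend refuse to create an SSL site up front when no address is left, which is the situation reported in #1603702.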
2.7. Hosting_queue_runner merged!
I have spent a significant amount of time improving the third-party module hosting_queue_runner. It is now fully integrated in the 2.x series, as hosting_queued. I have updated the upgrade path information accordingly.
This also involved porting the daemon to Drush 5 (#1548490).
Thanks to the Debian package, the init script is now installed automatically, which is a nice improvement. The script was also improved so that its "status" argument actually works, which improves interoperability with configuration management tools like Puppet. Those improvements should also resolve the problems with the daemon stalling on upgrade (#1261800). The daemon also now has a "reload" command that reloads it by sending it a signal, without stopping it. The daemon is also "niced" by default, which should help with the load on Aegir servers.
By shipping a specific init.d script in the Debian package, we are now able to ship a different, more platform-neutral init.d script for other platforms along with the code. This will improve support for CentOS and Arch Linux, which both had patches waiting to be supported properly (#1335776 and #1493300).
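hosting_queued itself is PHP code running under Drush; the Python sketch below only illustrates the reload-without-restart and "niced by default" patterns described above. It assumes SIGHUP as the reload signal, which is the usual convention but not necessarily the signal the real daemon uses, and QueueDaemon with its methods is a hypothetical class. The signal handler only sets a flag and the main loop performs the reload between tasks, so a task is never interrupted halfway.

```python
import os
import signal
from typing import Callable


class QueueDaemon:
    """Hypothetical queue daemon that reloads on SIGHUP instead of restarting."""

    def __init__(self, niceness: int = 10) -> None:
        self.niceness = niceness
        self.reload_requested = False
        self.reload_count = 0

    def request_reload(self, signum, frame) -> None:
        # Keep the handler tiny: just record the request for the main loop.
        self.reload_requested = True

    def install(self) -> None:
        signal.signal(signal.SIGHUP, self.request_reload)
        # Add to our niceness (like starting under nice(1)) so queued tasks
        # yield CPU time to the web server.
        os.nice(self.niceness)

    def reload(self) -> None:
        # A real daemon would re-read its configuration here.
        self.reload_count += 1
        self.reload_requested = False

    def run_once(self, process_tasks: Callable[[], None]) -> None:
        # Reload between tasks, never in the middle of one.
        if self.reload_requested:
            self.reload()
        process_tasks()
```

A real daemon would call install() once and then call run_once() in a loop with a short sleep; sending the process SIGHUP (for example with kill -HUP) then triggers a reload on the next iteration while the loop keeps running.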
2.8. Aegir core completely ported to Drush 5
Thanks to the hard work of tstoeckler, darthsteven and greg.1.anderson, Aegir was already working pretty well with Drush 5, so I didn't have much work to do here. On top of porting hosting-queue-runner to Drush 5, I had to fix a verbosity issue with hosting-dispatch that was flooding the terminal. We can now consider Aegir 2.x to be fully compatible with Drush 5, although we are probably not taking full advantage of the new features of that release.
Things that would need to be pushed forward for better Drush 5 compatibility:
- declare the options to our drush commands
- more code reuse (site-install, archive-dump, etc.)
This, however, seems to be a lower priority.
2.9. Retiring the drush_make package
With the arrival of Drush 5, the drush_make Debian package became irrelevant and was therefore removed from the official Debian archive. The package should survive in the squeeze-backports archive, but it will not be shipped with Wheezy, as drush_make is now part of Drush 5.
2.10. Aegir 2.x upgrade path assured
I have worked on making sure we can upgrade from the 1.x branch to 2.x. This required some patches to provision-migrate due to new Drush 5 idiosyncrasies (drush now has a commandfile cache that needs to be flushed, see #1612044).
2.11. Nginx testing and cleanup
For the first time, I have spent some time trying out the Nginx code on my Debian Wheezy laptop. I have noticed some issues with the way the nginx configuration is managed that depart from the current practices in the project ("tools not policy" and minimal configurations). I have filed a few bug reports about this and provided patches on a separate branch (dev-nginx-cleanup) out of respect for the maintainer of the Nginx code. The issues are:
- Nginx config include files should be merged into one (#1622846, needs review)
- remove unused config file (#1635552, needs review)
- remove duplicate nginx fastcgi params (#1635586, needs review)
- nginx: do not decide the policy for users (#1635596, needs work)
- default nginx config doesn't talk to the default php-fpm config in Debian Wheezy and sid (#1635622)
Most of those issues have fixes in the dev-nginx-cleanup branch and are awaiting review.
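Issue #1635622 boils down to the nginx configuration and the php-fpm pool not agreeing on where PHP listens. php-fpm's listen directive accepts a Unix socket path, an address:port pair or a bare port, while nginx needs a "unix:" prefix for sockets. As a hedged illustration (not code from the dev-nginx-cleanup branch), this Python helper derives the matching fastcgi_pass line from a pool file instead of hard-coding one; fastcgi_pass_for_pool() is hypothetical, and any socket path used with it is only an example.

```python
def fastcgi_pass_for_pool(pool_conf: str) -> str:
    """Derive an nginx fastcgi_pass line from a php-fpm pool's `listen` setting."""
    for raw in pool_conf.splitlines():
        line = raw.strip()
        if not line or line.startswith((";", "#")):
            continue  # skip blank lines and comments
        key, sep, value = line.partition("=")
        # Exact match, so related keys such as listen.owner are ignored.
        if not sep or key.strip() != "listen":
            continue
        target = value.strip()
        if target.startswith("/"):
            return f"fastcgi_pass unix:{target};"  # Unix domain socket
        if target.isdigit():
            # A bare port means "all addresses", so connect via loopback.
            return f"fastcgi_pass 127.0.0.1:{target};"
        return f"fastcgi_pass {target};"  # address:port
    raise ValueError("no listen directive in php-fpm pool configuration")
```

Deriving the value from the pool file leaves the choice between a socket and TCP to the administrator, in line with the "tools not policy" principle.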
2.12. Provision backup and migrate optimizations
My last trip on the train allowed me to spend some time on optimization work for provision-migrate that I had been postponing for a while. My idea was to move files out of the sites directory and make provision-backup follow symlinks (or not) depending on whether we are doing a real backup/clone (or a migrate), which would make migrate much faster as it wouldn't copy the files directory.
Unfortunately, this introduced a serious security issue (a symlink traversal) that would allow a user to hijack the data of any other site on the same Aegir server. I have reported my findings and solutions in the above issue, but I believe the proper fix is to rewrite provision-migrate significantly so that it doesn't use backup and deploy. This may involve quite a lot of work, so I have postponed it for now.
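To make the hole concrete: if backups follow symlinks, a user who can write to their own site can plant a link to another site's files and have them copied into their own backup. A guard like the one below, a hypothetical Python sketch using only the standard library's os.walk and tarfile, refuses to follow links that resolve outside the site directory. It only narrows the problem and does not replace the rewrite argued for above: the check and the archiving are separate steps, so a user could swap a link in between.

```python
import os
import tarfile
from pathlib import Path


def unsafe_symlinks(site_dir: Path) -> list[Path]:
    """Return symlinks under site_dir whose targets resolve outside of it."""
    root = site_dir.resolve()
    offenders = []
    # os.walk() lists symlinked directories but does not descend into them.
    for dirpath, dirnames, filenames in os.walk(site_dir):
        for name in dirnames + filenames:
            path = Path(dirpath) / name
            if not path.is_symlink():
                continue
            target = path.resolve()
            if target != root and root not in target.parents:
                offenders.append(path)
    return offenders


def backup_site(site_dir: Path, archive: Path, follow_symlinks: bool) -> None:
    """Archive a site, dereferencing symlinks only if none escape the site."""
    if follow_symlinks:
        offenders = unsafe_symlinks(site_dir)
        if offenders:
            raise PermissionError(f"symlinks escape {site_dir}: {offenders}")
    # dereference=True stores link targets; False stores the links themselves.
    with tarfile.open(archive, "w:gz", dereference=follow_symlinks) as tar:
        tar.add(site_dir, arcname=site_dir.name)
```

In this sketch a migrate-style run (follow_symlinks=False) stores links as links, while a backup/clone-style run refuses to start if any link points outside the site.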
I am still looking at how to move files out of the sites directory to facilitate SFTP account management, and this may yet allow migrate optimizations, especially if we rewire the backup system to use zip files, which can be updated incrementally and would therefore allow backups to be performed from multiple locations.
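The report does not say which zip tooling the backup system would use. As a hedged illustration of the "incrementally updated" property, Python's zipfile module can open an existing archive in append mode ("a") and add only the members that are missing; incremental_zip_backup() is a hypothetical helper. Note the limitation: the zip format lets you append members but not replace one in place, so files that changed since the last run would need another strategy.

```python
import zipfile
from pathlib import Path


def incremental_zip_backup(site_dir: Path, archive: Path) -> list[str]:
    """Append files missing from a zip backup without rewriting existing ones.

    Mode "a" opens an existing archive (or creates a new one) and appends
    new members at the end, so only new files cost any I/O.
    """
    added = []
    with zipfile.ZipFile(archive, "a", compression=zipfile.ZIP_DEFLATED) as zf:
        existing = set(zf.namelist())
        for path in sorted(site_dir.rglob("*")):
            if not path.is_file():
                continue  # directories are implied by member names
            name = path.relative_to(site_dir).as_posix()
            if name not in existing:
                zf.write(path, arcname=name)
                added.append(name)
    return added
```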
We should also obviously look at how drush archive-dump does its magic before going too far here.
See also issue #1205458.
2.13. Fix syncing problems with files/
I have had the privilege of sitting down with an Aegir contributor (jmcclelland) who actually sent meaningful patches to fix the files/ sync problems with remote servers. Since he is using those patches in production, I figured it was worthwhile to take extra time to review them with him.
The result is that he rerolled a new patch based on my feedback that fixes all the concerns I had with the patch. I have therefore asked the core team to approve an exception to the API freeze to fix this bug that is plaguing remote server support.
I am waiting for feedback from the community (for tests) and for the core team (for the exception) before merging this in.
3. Upcoming work
This is the work I will be looking at doing in the coming months as part of the sponsorship project.
3.1. Publish the 2.x Debian package
I have been wanting to do this forever, and we are pretty much ready: we are only missing a separate Debian archive for publishing that code. Since we are reconfiguring the Jenkins server, however, I figured it was worth waiting a while before setting this up.
3.2. 2.x redesign
During my work on the above issues, I had some time to reflect on possible refactoring and redesign of the Aegir core, and ended up formulating a few recommendations. The above document has extensive reflections on the internals of the provision backend and the queuing systems.
I basically need to take another look at this, figure out the way forward and start pounding out patches for this quite abstract problem.
3.3. Moving intelligence to the spokes
I'm quite eager to start moving more logic down to the spokes, now that we have Debian packages in order. This goes hand in hand with the 2.x redesign analysis, but it is a practical issue that could be addressed right now: we could start looking at having one queue per server, with each server running its queue locally instead of having the hub do everything.
The first step for this is the above issue with the files/ directory which should be fixed shortly. The next steps are to move the dispatcher to the backend and start maintaining aliases on each server.
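One way to picture "one queue per server" is to tag every task with the server that must run it and split the hub's single queue accordingly, with each spoke then draining only its own queue. The Task class and both functions are hypothetical and do not reflect Aegir's actual task schema; this Python sketch only shows the partitioning idea.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Task:
    """A hypothetical queued task, tagged with the server that must run it."""

    task_id: int
    server: str
    command: str


def split_by_server(tasks: list[Task]) -> dict[str, list[Task]]:
    """Partition a single hub queue into one FIFO queue per server."""
    queues: dict[str, list[Task]] = defaultdict(list)
    for task in tasks:
        queues[task.server].append(task)  # keeps submission order per server
    return dict(queues)


def run_local_queue(
    server: str, queues: dict[str, list[Task]], run: Callable[[Task], None]
) -> list[int]:
    """What each spoke would do: run only its own tasks, in order."""
    done = []
    for task in queues.get(server, []):
        run(task)
        done.append(task.task_id)
    return done
```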
3.4. Nginx cleanup
I would very much like to finish the work I started on this and start merging those patches in. I am waiting for feedback on those issues, so please let me know how those things can be improved!
3.5. Fix security issues in Drush
This is still a priority for me, and so I will look again at the credentials leak in the Drush IPC code.
3.6. Merge the platform auto code
One of the things we have in 2.x is a reference to the "platform auto" module which gets pulled in during the drush_make process. However, I feel that we shouldn't depend on contrib modules in the core makefile, so we should either remove that from the makefile or merge that code. I think we should just merge it. :)
3.7. Subsite support
I really need to take a look at the work mig5 has done to support subdirectory sites.