community.aegirproject.org
Messaging systems evaluation
So we've been talking seemingly forever about replacing our crappy "PHP/Vixie Cron based" queuing system with something else, but we have yet to document the pros and cons of the possible alternatives. Here are the goods (or the beginning of 'em).
Note that this document was written in November 2009, before the current multiserver implementation was designed and completed.
Current design
Tasks are basically serialized in a MySQL database and polled at regular intervals by a cron job running drush hosting dispatch, which bootstraps Drupal, fetches the task object, and fires up the provision commands.
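To make the polling design concrete, here is a minimal sketch of what one cron pass amounts to. It is not Aegir's actual code: it uses Python with an in-memory SQLite table as a stand-in for the MySQL database, and the `tasks` table, its columns, the status values and the `run` callable are all hypothetical names chosen for illustration.

```python
import sqlite3


def make_queue():
    """Create an in-memory stand-in for the task table (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE tasks ("
        " id INTEGER PRIMARY KEY AUTOINCREMENT,"
        " command TEXT NOT NULL,"
        " status TEXT NOT NULL DEFAULT 'queued')"
    )
    return conn


def enqueue(conn, command):
    """Serialize a task into the table and return its id."""
    cur = conn.execute("INSERT INTO tasks (command) VALUES (?)", (command,))
    conn.commit()
    return cur.lastrowid


def dispatch_once(conn, run, limit=5):
    """One polling pass, roughly what a single cron invocation would do.

    `run` is a hypothetical callable that executes a command and returns
    True on success. Returns the number of tasks picked up in this pass.
    """
    rows = conn.execute(
        "SELECT id, command FROM tasks WHERE status = 'queued' "
        "ORDER BY id LIMIT ?",
        (limit,),
    ).fetchall()
    for task_id, command in rows:
        # Mark the task first so a later pass does not pick it up again.
        conn.execute("UPDATE tasks SET status = 'running' WHERE id = ?", (task_id,))
        conn.commit()
        ok = run(command)
        conn.execute(
            "UPDATE tasks SET status = ? WHERE id = ?",
            ("done" if ok else "failed", task_id),
        )
        conn.commit()
    return len(rows)
```

The weak spot is visible in the sketch: a task waits for up to one full polling interval before anything looks at it, and every pass pays the cost of starting up even when the table is empty.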
The possibility here is to have another dispatcher that would always run, and maybe not even talk to the MySQL database, to fire up the provision commands.
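As a rough illustration of the always-running alternative, the sketch below blocks on an in-process queue instead of polling a table. It only shows the shape of the idea: the standard-library `queue.Queue` stands in for whatever messaging system ends up being chosen, and `dispatcher_loop`, `start_dispatcher`, `STOP` and the `run` callable are hypothetical names.

```python
import queue
import threading

STOP = None  # sentinel telling the dispatcher to exit


def dispatcher_loop(tasks, run, results):
    """Run until STOP arrives: block for a task, run it, record the outcome."""
    while True:
        command = tasks.get()  # blocks, so there is no polling interval
        try:
            if command is STOP:
                return
            results.append((command, run(command)))
        finally:
            tasks.task_done()


def start_dispatcher(run):
    """Start the loop in a background thread.

    Returns (tasks, results, thread): put commands on `tasks`, read
    (command, outcome) pairs from `results` once the thread has finished.
    """
    tasks = queue.Queue()
    results = []
    thread = threading.Thread(
        target=dispatcher_loop, args=(tasks, run, results), daemon=True
    )
    thread.start()
    return tasks, results, thread
```

A task is picked up as soon as it is put on the queue, which is the main thing the cron-based poller cannot offer.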
Another requirement is that the queuing system should be network-aware. We want multi-server support, and that is a critical issue for the next release (0.4 at the time of writing). One thing to consider is how we pass files around...
Roll our own
Pro:
- full control over the API, functionality and whatnot (i.e. a function of the manpower invested)
Con:
- rolling our own is hard! let's go shopping!
Meta: AMQP vs JMS
So yeah, there's a standard in development for all this stuff: http://www.AMQP.org/
JMS, on the other hand, is the "Java Message Service", which defines a language-based (as opposed to network-based) API.
Maybe we should just not care about which messaging system you use in the backend and allow our frontend to talk to any backend, with the stupid cron job MySQL poller as a default...
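One way to picture the "don't care about the backend" idea is a small interface that the frontend codes against, with the poller registered as the default. This is only a sketch under assumptions: `QueueBackend`, `PollingBackend`, `BACKENDS` and `get_backend` are hypothetical names, and the default backend here is a plain in-memory FIFO rather than a real MySQL poller.

```python
from abc import ABC, abstractmethod
from collections import deque


class QueueBackend(ABC):
    """What the frontend would need from any messaging system (hypothetical)."""

    @abstractmethod
    def push(self, command):
        """Queue a command for later execution."""

    @abstractmethod
    def pop(self):
        """Return the next command, or None if nothing is queued."""


class PollingBackend(QueueBackend):
    """Stand-in for the default cron/MySQL poller: a plain FIFO."""

    def __init__(self):
        self._tasks = deque()

    def push(self, command):
        self._tasks.append(command)

    def pop(self):
        return self._tasks.popleft() if self._tasks else None


# Other backends (Gearman, an AMQP broker, ...) would register themselves here.
BACKENDS = {"polling": PollingBackend}


def get_backend(name="polling"):
    """Instantiate a backend by name, defaulting to the poller."""
    try:
        return BACKENDS[name]()
    except KeyError:
        raise ValueError(f"unknown queue backend: {name}") from None
```

The frontend would only ever call `push` and `pop`, so swapping the messaging system becomes a configuration choice rather than a rewrite.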
Gearman
Pros (mostly out of the home page):
- supports PHP in frontend and backend
- Multi-language - There are interfaces for a number of languages, and this list is growing. You also have the option to write heterogeneous applications with clients submitting work in one language and workers performing that work in another.
- Flexible - You are not tied to any specific design pattern. You can quickly put together distributed applications using any model you choose, one of those options being Map/Reduce.
- Fast - Gearman has a simple protocol and interface with a new optimized server in C to minimize your application overhead.
- Embeddable - Since Gearman is fast and lightweight, it is great for applications of all sizes. It is also easy to introduce into existing applications with minimal overhead.
- No single point of failure - Gearman can not only help scale systems, but can do it in a fault tolerant way.
- Production ready - has run at Digg, Yahoo, LiveJournal, ...
- Gearman uses an efficient binary protocol and no XML. There's a line-based text protocol for admin so you can use telnet and hook into Nagios plugins.
- Persistent queues are available using MySQL, memcached or SQLite
- There's pretty good PHP support and it's easy to get up and running quickly.
Cons:
- "somewhat higher latency, signal-and-pull architecture"
- The system makes no guarantees. If there's a failure the client is told about the failure and the client is responsible for retries.
- gearman-specific binary/network protocol
- Notice how sixapart.com maintains both Gearman and The Schwartz, below
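Since the client is responsible for retries, the frontend would need something like the wrapper below around every job submission. This is a generic sketch and does not use the Gearman client API: `submit` is a hypothetical callable standing in for whatever call hands a job to the queue, and `JobFailed` is a hypothetical exception it raises when the queue reports a failure.

```python
import time


class JobFailed(Exception):
    """Raised by the hypothetical `submit` callable when a job fails."""


def submit_with_retry(submit, payload, attempts=3, delay=0.0):
    """Call submit(payload), retrying on JobFailed up to `attempts` times.

    Waits delay * attempt seconds between tries (simple linear backoff)
    and re-raises the last failure once all attempts are used up.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return submit(payload)
        except JobFailed as exc:
            last_error = exc
            if attempt < attempts:
                time.sleep(delay * attempt)
    raise last_error
```

Note that retrying blindly is only safe if the job itself can be run twice without harm, which is something to check for each provision command.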
OHLOH:
- Mostly written in Perl
- Increasing year-over-year development activity
- Established codebase
- Few source code comments
Resque
Pro:
- redis-based (therefore somehow language agnostic deep in there, but certainly network-centric)
- github's (aka: it's got to work at least somewhere in production)
Con:
- ruby-centric
- I haven't actually looked at the thing or tested it
Active MQ
Apache ActiveMQ:
Pros:
- Supports a variety of Cross Language Clients and Protocols from Java, C, C++, C#, Ruby, Perl, Python, PHP
- Supports many advanced features such as Message Groups, Virtual Destinations, Wildcards and Composite Destinations
- Supports pluggable transport protocols such as in-VM, TCP, SSL, NIO, UDP, multicast, JGroups and JXTA transports
- REST API to provide technology agnostic and language neutral web based API to messaging
- Persistent
Cons:
- uses too many acronyms on the frontpage (really.)
- uses a lot of Java jargon
- production-ready?
OHLOH:
- Mostly written in Java
- Large, active development team
RabbitMQ
Pros:
- "high reliability, availability and scalability along with good throughput and latency performance that is predictable and consistent"
- "based on the emerging AMQP standard"
- in debian
Cons:
- Mostly written in Erlang
OHLOH:
- Large, active development team
- Well-commented source code
Open Message Queue
https://mq.dev.java.net/overview.html
Pros:
- SOAP/HTTP interface
- Scalable
Con:
- Java/C specific
The Schwartz
Pros:
- Perl (come on, Perl is everywhere... isn't it?)
- "reliable job queue"
- has retry
- "lightweight"
Cons:
- Perl (aka write only)
- "library, so some assembly required"
- "light on documentation"
- Notice how sixapart.com maintains both The Schwartz and Gearman, above
Open AMQ
Pros:
- Multi-language (but not PHP)
- "learn in a day or so" then "another day or two for wireAPI", development will take "some weeks"...
- "remote admin tools, one-line failover, instant federation, protection against slow clients, detailed logging"
- AMQP standard
- "Linux, AIX, Solaris, Mac OS/X, other UNIX"
Cons:
- "for C/C++ and JMS"
OHLOH:
- Mostly written in C
- Well-commented source code
- Short source control history
- Only a single active developer
- Apache License 2.0 may conflict with GPL
Beanstalkd
http://kr.github.com/beanstalkd/
Drupal Module for Drupal 6 and 7 is at http://drupal.org/project/beanstalkd
Pros:
- used in production (the "Causes" application on Facebook)
Cons:
- C
OHLOH (http://www.ohloh.net/p/beanstalkd):
- Mostly written in C
- Large, active development team
- Few source code comments
SimpleMQ
http://code.google.com/p/simple-mq/
Pros:
- persistent or in-memory
Cons:
- Java-only
Zero MQ
Pros:
- fast
- lightweight messaging implementation.
- supports different messaging models.
- "already very fast. We're getting 13.4 microseconds end-to-end latencies and up to 4,100,000 messages a second today."
- very thin. Requires just a couple of pages in resident memory.
- provides C, C++, Java, Python, .NET/Mono, Ruby, Fortran, COBOL, Tcl, Lua and Delphi language APIs.
- supports different wire-level protocols: TCP, PGM, AMQP, SCTP.
- runs on AIX, FreeBSD, HP-UX, Linux, Mac OS X, OpenBSD, OpenVMS, QNX Neutrino, Solaris and Windows.
- supports i386, x86-64, Sparc, Itanium, Alpha and ARM microarchitectures.
- fully distributed: no central servers to crash, millions of WAN and LAN nodes.
Cons:
- no PHP whatsoever
- production?
OHLOH:
- Mostly written in C++
Others to evaluate?
- Amazon's SQS. The latencies for this service tend to be high and variable so it may not be appropriate for all tasks.
- Spread Queue (alpha software: http://search.cpan.org/~jmay/Spread-Queue-0.4/Queue.pod)
- Starling (apparently, Twitter ditched Starling for something written in Scala)
- Houston, from Zivtech: not much documentation yet
Other systems that have been mentioned: ActiveMessaging, BackgroundJob, DelayedJob, and Kestrel.
Requirements
All the above must have:
- Open Source - It's free as in beer and speech
- Talk
#1
ZeroMQ now has PHP bindings (http://zeromq.org/bindings:php) and is used in production by products like SaltStack (http://saltstack.org). It is also super simple to administer, as it isn't a separate system that the sysadmin would have to stand up.