Messaging systems evaluation

So we've been talking seemingly forever about a possible queuing system to replace our crappy "PHP/Vixie Cron based" one, but we have yet to document the pros and cons of the possible alternatives. Here are the goods (or the beginning of 'em).

Note that this document was written in November 2009, before the current multiserver implementation was designed and completed.

Current design

Tasks are basically serialized in a MySQL database and polled at regular intervals by a cron job running drush hosting dispatch, which bootstraps Drupal, fetches the task object, and fires up provision commands.

One possibility here is to have another dispatcher that would always run, and maybe not even talk to the MySQL database, to fire up the provision commands.

Another requirement is that the queuing system may be network-aware. We want multi-server support and that is a critical issue for the next release (0.4 at the time of writing). One thing to consider is how we pass files around...

Roll our own

Pro:

  • full control over the API, functionality and whatnot (i.e. a function of the manpower invested)

Con:

  • rolling our own is hard! let's go shopping!

Meta: AMQP vs JMS

So yeah, there's a standard in development for all this stuff: http://www.AMQP.org/ - also a module being developed to work with it: http://drupal.org/project/amqp

JMS, on the other hand, is the "Java Message Service", which defines a language-based (as opposed to network-based) API.

Maybe we should just not care about which messaging system you use in the backend and allow our frontend to talk to any backend, with the stupid cronjob mysql poller as a default...
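
A rough sketch of what "not caring about the backend" could look like, in Python for brevity. Every name here (QueueBackend, PollingBackend, make_backend) is hypothetical, and the default backend is just an in-memory stand-in for the cron/MySQL poller; a Gearman or AMQP backend would be another class registered under another name.

```python
from abc import ABC, abstractmethod
from collections import deque
from typing import Optional


class QueueBackend(ABC):
    """The only thing the frontend would know about a queue."""

    @abstractmethod
    def put(self, task: str) -> None:
        """Queue a task."""

    @abstractmethod
    def get(self) -> Optional[str]:
        """Return the next task, or None if nothing is queued."""


class PollingBackend(QueueBackend):
    """Default backend: in-memory stand-in for the cron/MySQL poller."""

    def __init__(self) -> None:
        self._tasks = deque()

    def put(self, task: str) -> None:
        self._tasks.append(task)

    def get(self) -> Optional[str]:
        return self._tasks.popleft() if self._tasks else None


# Registry of available backends; other systems would be added here.
BACKENDS = {"polling": PollingBackend}


def make_backend(name: str = "polling") -> QueueBackend:
    # Look the backend up by a (hypothetical) configuration value.
    return BACKENDS[name]()
```

The catch with this approach is that the interface can only expose what every backend supports, so features like retries or persistence either get pushed into each backend or left out.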

Gearman

Pros (mostly out of the home page):

  • supports PHP in frontend and backend
  • Multi-language - There are interfaces for a number of languages, and this list is growing. You also have the option to write heterogeneous applications with clients submitting work in one language and workers performing that work in another.
  • Flexible - You are not tied to any specific design pattern. You can quickly put together distributed applications using any model you choose, one of those options being Map/Reduce.
  • Fast - Gearman has a simple protocol and interface with a new optimized server in C to minimize your application overhead.
  • Embeddable - Since Gearman is fast and lightweight, it is great for applications of all sizes. It is also easy to introduce into existing applications with minimal overhead.
  • No single point of failure - Gearman can not only help scale systems, but can do it in a fault tolerant way.
  • Production ready - runs at Digg, Yahoo, LiveJournal, ...
  • Gearman uses an efficient binary protocol and no XML. There's also a line-based text protocol for admin, so you can use telnet and hook into Nagios plugins.
  • Persistent queues are available using MySQL, memcached or SQLite
  • There's pretty good PHP support and it's easy to get up and running quickly.
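
To give an idea of how easy the text admin protocol is to hook into monitoring, here is a small Python sketch that parses the reply to the "status" admin command. As far as the Gearman protocol documentation describes it, each line is a function name followed by three tab-separated counters (jobs in queue, jobs running, available workers), and the list ends with a line containing a single period. The sample reply and the function names in it are made up, and the parser only works on text already read from the socket.

```python
def parse_status(text):
    """Parse a reply to the Gearman "status" admin command.

    Returns a dict mapping each function name to its three counters.
    """
    stats = {}
    for line in text.splitlines():
        if line == ".":
            break  # a lone period terminates the list
        if not line:
            continue
        name, total, running, workers = line.split("\t")
        stats[name] = {
            "total": int(total),      # jobs in the queue
            "running": int(running),  # jobs currently being worked on
            "workers": int(workers),  # workers able to run this function
        }
    return stats


if __name__ == "__main__":
    # Made-up sample reply, for illustration only.
    sample = "resize_image\t3\t1\t2\nsend_mail\t0\t0\t1\n.\n"
    for name, counters in parse_status(sample).items():
        print(name, counters)
```

A Nagios-style check would then just compare "total" against a threshold, or alert when "workers" drops to zero for a function that should always have one.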

Cons:

  • "somewhat higher latency, signal-and-pull architecture"
  • The system makes no guarantees. If there's a failure the client is told about the failure and the client is responsible for retries.
  • gearman-specific binary/network protocol
  • Notice how sixapart.com maintains both Gearman and The Schwartz, below
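
Since retries are left to the client, every submitter ends up carrying something like the following. This is a generic Python sketch and not Gearman client code: submit is any callable that raises on failure, and the attempt count and delays are arbitrary assumptions.

```python
import time


def submit_with_retries(submit, attempts=3, base_delay=0.5, sleep=time.sleep):
    """Call submit() until it succeeds or the attempts run out.

    Waits base_delay, then twice that, and so on between attempts
    (exponential backoff). The sleep function is injectable for testing.
    """
    last_error = None
    for attempt in range(attempts):
        try:
            return submit()
        except Exception as exc:
            last_error = exc
            if attempt < attempts - 1:
                sleep(base_delay * 2 ** attempt)
    raise RuntimeError(f"job failed after {attempts} attempts") from last_error
```

Note that blind retries are only safe if the job is idempotent; a provision command that half-completed before failing may not be, which is worth keeping in mind when comparing this against systems with built-in retry.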

OHLOH:

  • Mostly written in Perl
  • Increasing year-over-year development activity
  • Established codebase
  • Few source code comments

Resque

Pro:

  • redis-based (therefore somehow language agnostic deep in there, but certainly network-centric)
  • github's (aka: it's got to work at least somewhere in production)

Con:

  • ruby-centric
  • I haven't actually looked at the thing or tested it

Active MQ

Apache ActiveMQ:

Pros:

  • Supports a variety of Cross Language Clients and Protocols from Java, C, C++, C#, Ruby, Perl, Python, PHP
  • Supports many advanced features such as Message Groups, Virtual Destinations, Wildcards and Composite Destinations
  • Supports pluggable transport protocols such as in-VM, TCP, SSL, NIO, UDP, multicast, JGroups and JXTA transports
  • REST API to provide technology agnostic and language neutral web based API to messaging
  • Persistent

Cons:

  • uses too many acronyms on the frontpage (really.)
  • uses a lot of Java jargon
  • production-ready?

OHLOH:

  • Mostly written in Java
  • Large, active development team

RabbitMQ

Pros:

  • "high reliability, availability and scalability along with good throughput and latency performance that is predictable and consistent"
  • "based on the emerging AMQP standard"
  • in debian
  • Drupal module - AMQP

Cons:

  • erlang? do you know erlang? Not really necessary, since all publishers and consumers can be written in other languages that have AMQP support.

OHLOH

  • Mostly written in Erlang = very fast!
  • Large, active development team
  • Well-commented source code

Open Message Queue

https://mq.dev.java.net/overview.html

Pros:

  • SOAP/HTTP interface
  • Scalable

Con:

  • Java/C specific

The Schwartz

Pros:

  • Perl (come on, perl is everywhere... isn't it?)
  • "reliable job queue"
  • has retry
  • "lightweight"

Cons:

  • Perl (aka write only)
  • "library, so some assembly required"
  • "light on documentation"
  • Notice how sixapart.com maintains both The Schwartz and Gearman, above

Open AMQ

Pros:

  • Multi-language (but not PHP)
  • "learn in a day or so" then "another day or two for wireAPI", development will take "some weeks"...
  • "remote admin tools, one-line failover, instant federation, protection against slow clients, detailed logging"
  • AMQP standard
  • "Linux, AIX, Solaris, Mac OS/X, other UNIX"

Cons:

  • "for C/C++ and JMS"

OHLOH

  • Mostly written in C
  • Well-commented source code
  • Short source control history
  • Only a single active developer
  • Apache License 2.0 may conflict with GPL

Beanstalkd

http://kr.github.com/beanstalkd/

Drupal Module for Drupal 6 and 7 is at http://drupal.org/project/beanstalkd

Pros:

  • used in production (the "Causes" application on Facebook)

Cons:

  • C

http://www.ohloh.net/p/beanstalkd

  • Mostly written in C
  • Large, active development team
  • Few source code comments

SimpleMQ

http://code.google.com/p/simple-mq/

Pros:

  • persistent or in-memory

Cons:

  • java-only.

Zero MQ

http://www.zeromq.org/

Pros:

  • fast
  • lightweight messaging implementation.
  • supports different messaging models.
  • already very fast. We're getting 13.4 microseconds end-to-end latencies and up to 4,100,000 messages a second today.
  • very thin. Requires just a couple of pages in resident memory.
  • provides C, C++, Java, Python, .NET/Mono, Ruby, Fortran, COBOL, Tcl, Lua and Delphi language APIs.
  • supports different wire-level protocols: TCP, PGM, AMQP, SCTP.
  • runs on AIX, FreeBSD, HP-UX, Linux, Mac OS X, OpenBSD, OpenVMS, QNX Neutrino, Solaris and Windows.
  • supports i386, x86-64, Sparc, Itanium, Alpha and ARM microarchitectures.
  • fully distributed: no central servers to crash, millions of WAN and LAN nodes.

Cons:

  • no PHP whatsoever
  • production?

OHLOH:

  • Mostly written in C++

Others to evaluate?

Others mentioned here: ActiveMessaging, BackgroundJob, DelayedJob, and Kestrel.

Requirements

All the above must have:

  • Open Source - It's free as in beer and speech

#1

ZeroMQ now has PHP bindings (http://zeromq.org/bindings:php) and is used in production by products like http://saltstack.org. It is also super simple to administer, as it isn't a separate system that the sysadmin would have to stand up.