by Sundarram P. V.

Sunday, May 03, 2009

Projects to watch out for

These are some projects I have been watching for some time now. They can help you scale easily while keeping your setup flexible. Three things that help a system scale are partitioning, caching and queuing. A lot of what happens inside an OS is queuing.

Gearman
It can be looked at as a load balancer of sorts for your functions. The Gearman server simply sits between the client (who wants a particular job done) and the worker. The best thing about Gearman is that it allows both synchronous and asynchronous job calls. Say, for instance, there is a search query for which you need to consolidate data from multiple shards. Through Gearman you can make multiple job calls and do the processing in parallel. When all the workers have returned, all you need to do is consolidate the data. When the load is high, responses will take longer, but the system can still handle it gracefully. When used with a cloud service like EC2, all you need is a service that monitors the jobs in the queue and creates new EC2 worker instances depending on the load. When the load drops, some of those instances can be killed.
What Gearman does not provide as of now (both are in the works):
1. broadcast, i.e. sending a message to all the workers (useful for housekeeping)
2. persistence of unprocessed jobs, i.e. keeping queued jobs so that they survive a crash of the Gearman server
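The sharded search described above is a scatter-gather pattern. Here is a minimal sketch of it in Python. It does not use Gearman's API: a thread pool from the standard library stands in for the job server and its workers, and the names SHARDS, search_shard and search_all are made up for illustration.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical shard contents; in a real setup each shard would sit behind its own worker.
SHARDS = [
    ["gearman queue", "thrift rpc"],
    ["protobuf schema", "gearman worker"],
    ["json encoding"],
]


def search_shard(shard, query):
    """Worker function: return the documents in one shard that contain the query."""
    return [doc for doc in shard if query in doc]


def search_all(query):
    """Client: submit one job per shard, wait for all of them, then merge the results."""
    with ThreadPoolExecutor(max_workers=len(SHARDS)) as pool:
        futures = [pool.submit(search_shard, shard, query) for shard in SHARDS]
        results = []
        # Collect in submission order so the merged list is deterministic.
        for future in futures:
            results.extend(future.result())
    return results


if __name__ == "__main__":
    print(search_all("gearman"))
```

The client only blocks while gathering results, so the total time is roughly that of the slowest shard rather than the sum of all of them.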

To use Gearman, one needs either a neutral encoding system, or both client and worker have to use the same platform. One option is JSON (JavaScript Object Notation), because of its ubiquity. The other is a language-neutral binary encoding format such as Thrift or ProtoBuf.

Thrift
This encoding format is used by Facebook for making RPCs. It supports versioning of data objects and is language neutral. Its bindings for different languages are still under development, but bindings are available for C++, PHP, Perl, Erlang, etc. Apart from serialization/deserialization of objects, it also provides transport. The documentation for the various language bindings is not great, but one can understand them by looking at their test code. Thrift has been somewhat influenced by Google's protobuf. It also provides an interface for plugging in a custom encoding/decoding format.
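To see why versioning of data objects matters, here is a rough sketch in Python of the general idea behind tagged fields: every field is identified by a numeric tag, so a reader can skip tags it does not know and fall back to defaults for tags that are missing. This is not Thrift's actual wire format or API. JSON is used only to keep the example short, and the schemas and function names are invented for illustration.

```python
import json

# Hypothetical schemas: field tag -> (field name, default value).
SCHEMA_V1 = {1: ("user_id", 0), 2: ("name", "")}
SCHEMA_V2 = {1: ("user_id", 0), 2: ("name", ""), 3: ("email", "")}


def encode(record, schema):
    """Serialize a record keyed by numeric tag rather than by field name."""
    tagged = {tag: record[name] for tag, (name, _default) in schema.items() if name in record}
    return json.dumps(tagged).encode("utf-8")


def decode(data, schema):
    """Deserialize: ignore unknown tags, fill in defaults for missing ones."""
    # JSON object keys are strings, so convert them back to integer tags.
    tagged = {int(tag): value for tag, value in json.loads(data.decode("utf-8")).items()}
    return {name: tagged.get(tag, default) for tag, (name, default) in schema.items()}


if __name__ == "__main__":
    new_data = encode({"user_id": 7, "name": "ann", "email": "a@x.org"}, SCHEMA_V2)
    print(decode(new_data, SCHEMA_V1))  # an old reader simply never sees tag 3
```

With this arrangement a client and a worker can be upgraded at different times, as long as existing tags are never reused for a different field.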

ProtoBuf
It is similar to Thrift; the main difference is that it does not provide transport. It was open sourced by Google. Bindings are available for different languages, and some of them have been developed outside of Google.

Both of them generate static code from a definition file for creating classes/types. When used with Gearman, they can provide a queued, language-neutral backend setup.
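As a rough illustration of a queued, language-neutral job, the sketch below passes jobs through a queue as plain bytes. It uses JSON rather than Thrift or ProtoBuf so that no generated code is needed, and Python's in-process queue.Queue stands in for the Gearman server. The envelope layout and the names submit_job, work_once and HANDLERS are assumptions made for this example.

```python
import json
import queue

# Hypothetical registry of the functions this worker knows how to run.
HANDLERS = {"reverse": lambda args: args["text"][::-1]}


def submit_job(jobs, function_name, arguments):
    """Client side: encode the job as UTF-8 JSON bytes and put it on the queue."""
    payload = json.dumps({"function": function_name, "args": arguments})
    jobs.put(payload.encode("utf-8"))


def work_once(jobs):
    """Worker side: take one job off the queue, decode it and run the named function."""
    job = json.loads(jobs.get().decode("utf-8"))
    return HANDLERS[job["function"]](job["args"])


if __name__ == "__main__":
    jobs = queue.Queue()
    submit_job(jobs, "reverse", {"text": "abc"})
    print(work_once(jobs))
```

Since the queue only ever carries bytes, the client and the worker could be written in different languages as long as both agree on the envelope.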
