The Road Ahead...

by Sundarram P. V.

Sunday, May 03, 2009

Projects to watch out for

These are some projects I have been watching for some time now that let you scale easily while staying flexible. Three things that help a system scale are partitioning, caching and queuing. A lot of what happens inside an OS is queuing.

Gearman
Gearman can be looked at as a load balancer of sorts for your functions. The Gearman server simply sits between the client (who wants a particular job done) and the worker. The best thing about Gearman is that it allows both synchronous and asynchronous job calls. Say, for instance, there is a search query for which you need to consolidate data from multiple shards. Through Gearman you can make multiple job calls and do the processing in parallel. When all the workers have returned, all you need to do is consolidate the data. When the load is high, responses will take longer, but the system can still handle it gracefully. When used with a cloud service like EC2, all you need is a service that monitors the jobs in the queue and creates new EC2 worker instances depending on the load. When the load drops, some of those instances can be killed.
What Gearman does not provide as of now (both are in the works):
1. broadcast, i.e. sending a message to all the workers (useful for housekeeping)
2. persistence of unprocessed jobs, i.e. keeping queued jobs when the Gearman server crashes
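The shard search described above is a scatter-gather pattern. Here is a minimal sketch of it in plain JavaScript. It does not talk to a real Gearman server: submitJob is a hypothetical stand-in that simulates a worker, and the shard data is made up.

```javascript
// Hypothetical stand-in for submitting one job to a worker and waiting for
// its result; a real setup would go through a Gearman client library here.
function submitJob(shard, query) {
  return new Promise(function (resolve) {
    // Simulate a worker that searches one shard and replies a little later.
    setTimeout(function () {
      var hits = shard.docs.filter(function (doc) {
        return doc.indexOf(query) !== -1;
      });
      resolve(hits);
    }, 10);
  });
}

// Fan the query out to every shard, then consolidate once all have returned.
function searchAllShards(shards, query) {
  var jobs = shards.map(function (shard) {
    return submitJob(shard, query);
  });
  return Promise.all(jobs).then(function (results) {
    // Flatten the per-shard result lists into one sorted list.
    return results.reduce(function (all, hits) {
      return all.concat(hits);
    }, []).sort();
  });
}

// Made-up shard contents for illustration.
var shards = [
  { docs: ["gearman intro", "thrift notes"] },
  { docs: ["gearman workers", "protobuf notes"] }
];
```

Calling searchAllShards(shards, "gearman") resolves once both simulated workers have replied. The caller sees only the consolidated list, never the individual shards.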

To use Gearman, one needs either a language-neutral encoding or a client and worker written on the same platform. One option is JSON (JavaScript Object Notation), which is ubiquitous. The other is a language-neutral binary encoding format like

Thrift
This encoding format is used by Facebook for making RPCs. It supports versioning of data objects and is language neutral. Bindings for some languages are still under development, but they are already available for C++, PHP, Perl, Erlang and others. Apart from serialization and deserialization of objects, it also provides the transport. The documentation for the various language bindings is not great, but one can understand them by looking at their test code. Thrift has been somewhat influenced by Google's protobuf. It also provides an interface for plugging in a custom encoding/decoding format.

ProtoBuf
It is similar to Thrift; the main difference is that it does not provide a transport. It was open sourced by Google. Bindings are available for several languages, and some of them have been developed outside Google.

Both generate static code from a definition file to create classes/types. When used with Gearman, they can provide a queued, language-neutral backend setup.
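For the simpler JSON route, a versioned envelope gives a rough idea of what a language-neutral job payload might look like. This is only a sketch: the field names (v, job, args) and the job name are made up for illustration and are not part of Gearman, Thrift or protobuf.

```javascript
// Hypothetical job payload: the field names are illustrative only.
var PAYLOAD_VERSION = 1;

// Client side: wrap the arguments in a versioned envelope and serialize.
function encodeJob(name, args) {
  return JSON.stringify({ v: PAYLOAD_VERSION, job: name, args: args });
}

// Worker side: parse the envelope and reject versions it does not understand.
function decodeJob(text) {
  var payload = JSON.parse(text);
  if (payload.v !== PAYLOAD_VERSION) {
    throw new Error("unsupported payload version: " + payload.v);
  }
  return { job: payload.job, args: payload.args };
}

var wire = encodeJob("resize_image", { width: 200, height: 100 });
var decoded = decodeJob(wire);
```

Any worker that can parse JSON can read the string in wire, whatever language it is written in. The explicit version number is a crude substitute for the schema versioning that Thrift and protobuf build in.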

Saturday, April 11, 2009

OOPS in Javascript: Part III

Apart from inheritance and prototypes for defining an object, there is the closure, a powerful and often misused feature of JavaScript. It is one feature I love, and everyone has used it, knowingly or unknowingly. Whenever a function is defined inside another function, all the variables of the parent function that the inner function can see continue to exist even after the parent function has returned. It is along the lines of creating a parallel world each time you invoke a function.


/**
 * A simple example of closure
 */
var rank = 0;
function greet(name) {
    var my_rank = ++rank;
    /* adding an anonymous function to greet the person after 5 seconds */
    window.setTimeout(function() {
        alert("hi " + name + " you came in " + my_rank);
    }, 5000);
}

greet("ram");
greet("geetha");


To create a closure, two things are needed:
1. Create a function inside a parent function.
2. Pass a reference to the child function outside the scope of the parent function.
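The two conditions above can be seen in a small sketch that runs without a browser. The names makeCounter, ramCounter and geethaCounter are made up for this illustration.

```javascript
// Parent function: `count` is local to each call of makeCounter.
function makeCounter(label) {
  var count = 0;
  // Condition 1: a child function is created inside the parent.
  var next = function () {
    count += 1;
    return label + " #" + count;
  };
  // Condition 2: its reference leaves the parent's scope.
  return next;
}

// Each call to makeCounter creates its own "parallel world".
var ramCounter = makeCounter("ram");
var geethaCounter = makeCounter("geetha");

ramCounter();                 // "ram #1"
var second = ramCounter();    // "ram #2"
var first = geethaCounter();  // "geetha #1"
```

The two counters do not share count. Each call of the parent function keeps its own copy alive for as long as the returned function is referenced.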

First, the variables declared inside a function can be accessed only inside that function, unless a reference is explicitly passed outside. With a closure, all the variables the child function can access stay alive for as long as a reference to the child function exists.
JavaScript, as you know, does not ask you to manage memory and uses garbage collection to free it. All the garbage collector looks at is whether each object is still referenced, and in this particular case that means my_rank, name and the anonymous function. It will not release them until the setTimeout fires and the timer drops its reference to the anonymous function.


var class2 = function () {

};

class2.prototype = {
    manager : function (params, postcallback) {
        var my_dialog;
        var usr_callback = function(filename) {
            // make the ajax call and get the content (omitted here);
            // filename stands in for the fetched content
            my_dialog.hide();
            postcallback(filename);
        };
        // get user input from the dialog
        // (Dialog is assumed to be a dialog widget defined elsewhere)
        my_dialog = new Dialog({
            message : "Enter the name of the file",
            callback : usr_callback
        });
        my_dialog.show();
    },
    req : function() {
        var my_div = document.getElementById('status');
        var content_div = document.getElementById('content');

        my_div.innerHTML = "working...";

        // this closure keeps my_div and content_div alive
        var callback = function(params) {
            my_div.innerHTML = "Done.";
            content_div.innerHTML = params;
        };
        this.manager({
            url : "http://www.ajaxian.com"
        }, callback);
    }
};

var ob = new class2();
ob.req();

If for some reason setTimeout's internal code fails to remove the anonymous function's reference, or has no mechanism to release it, all these variables become zombies or, in technical terms, leaked memory. The problem multiplies when the function is called a hundred times, bogging down a lot of memory. That memory will not be reclaimed until the page is unloaded. In the above example, all we are doing is getting the name of a file so that the content can be updated. Both the manager and req functions use closures, and if not handled properly they can and will lead to a memory leak. The leak occurs only while a reference to the anonymous function is still held outside its scope. IE's internal handling of closures is somewhat different from that of Firefox or Safari, and it leaks a lot more memory.
A memory leak is not a serious problem for a page that does not live long. Detecting leaks is tricky in any language, and they are very difficult to debug in complex applications. It is bad for an application like Gmail, which runs throughout the day. For such pages it is a good idea to reload the page every so often.
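One way to limit how long a closure keeps its variables alive is to drop the reference as soon as the callback has done its job. The once helper below is a hypothetical sketch of that idea. It is not taken from any library, and it cannot cure leaks caused by the browser itself.

```javascript
// Wrap a callback so it can run only once and then drops its reference to
// the original function (and, with it, to everything that function closed over).
function once(callback) {
  var fn = callback;
  return function () {
    if (fn === null) {
      return undefined;
    }
    var current = fn;
    fn = null; // break the link so the closure can be collected
    return current.apply(this, arguments);
  };
}

var calls = 0;
var handler = once(function (content) {
  calls += 1;
  return "Done: " + content;
});

var firstResult = handler("page");  // runs the callback
var secondResult = handler("page"); // no-op, callback already released
```

Even if some widget holds on to handler forever, the wrapped function and the variables it captured are no longer reachable through it after the first call.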

Wednesday, August 27, 2008

My experiments with ImageMagick

I have been experimenting with ImageMagick for quite some time now. It is one of those great libraries one can't manage without, and the best part is that it just works. I had to write an implementation on top of ImageMagick to draw styled text on an image. I thought it would be pretty straightforward and started to dig through the ImageMagick API docs. In the end I found that there was no function in the API to wrap text that has different styles in the same line. No big deal, right? But there are a lot of subtleties that one does not realize until one is deep in it.
I then went through my normal ritual of writing the pseudocode (comments, I can't work without them). At first I thought the only thing I would have to do was wrap the text and then set each line's height to the maximum of the styles on it. There was also a straightforward API call in ImageMagick (QueryMultilineFontMetrics). So I started, not knowing what I was getting into. The wrap logic was something like this:

1. get the width of all the visible characters by making the API call,
2. get all the words in the sentence/paragraph,
3. for each word, add up the widths of its characters and check whether the running total has exceeded the line width,
4. if it has, start over on the next line and calculate the height needed.

To describe the above solution in one word, it was specious, and it didn't work. Some of the characters were going outside the boundary. I didn't know a thing about typography and typefaces, which in a way increased my curiosity. Here I was coding without really knowing the fundamentals of what goes on inside. I decided to give programming a pass and started reading about typefaces and related things.
I have been using computers for more than a decade now, but one thing that had never interested me was fonts. I was finally coming to terms with my demons. As I read more about fonts, it became clear that a lot of thought, logic and research had gone into them, so much so that it is as much a science as a creative art. I feel it is one of those things that stares us right in the face without our realizing it. Fonts are mainly categorized into monospaced ones (the ones we see in code editors), where all characters have equal widths, and proportional ones (what we normally write on paper), where 'i' has a smaller width than 'W'. There is something else in a font apart from these, and it created the bug in my code: the small adjustments to the space between two characters when they are rendered. My algorithm didn't include kerning (that is what this adjustment is called) in its calculations, which resulted in text bleeding out. I had to change my approach slightly to include it, but since the font could be proportional or monospaced, you don't really know that width in advance. The only way to get it is to make the font metrics query for the whole word and use that for all the calculations. Apart from that, I also found that QueryFontMetrics was trimming spaces, so the width of a space had to be found in a roundabout way.
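The difference between adding up character widths and measuring a whole string can be shown with a toy model. Everything below is an illustration and not ImageMagick's API: measure stands in for a font metrics query, and the advance widths and kerning values are made up. Real kerning adjustments can go either way. The point is only that the two totals differ.

```javascript
// Toy font: per-character advance widths plus kerning adjustments for
// specific pairs. All numbers are made up for illustration.
var ADVANCE = { A: 12, V: 12, a: 9, " ": 5 };
var KERNING = { AV: -3, VA: -3 };
var DEFAULT_ADVANCE = 10;

function charWidth(ch) {
  return ADVANCE.hasOwnProperty(ch) ? ADVANCE[ch] : DEFAULT_ADVANCE;
}

// Stand-in for a font-metrics query on a whole string.
function measure(text) {
  var width = 0;
  for (var i = 0; i < text.length; i++) {
    width += charWidth(text[i]);
    if (i > 0) {
      width += KERNING[text[i - 1] + text[i]] || 0;
    }
  }
  return width;
}

// The first attempt: add up isolated character widths (ignores kerning).
function naiveWidth(text) {
  return text.split("").reduce(function (sum, ch) {
    return sum + measure(ch);
  }, 0);
}

// Greedy wrap that always measures the whole candidate line.
function wrap(text, maxWidth) {
  var lines = [];
  var line = "";
  text.split(" ").forEach(function (word) {
    var candidate = line ? line + " " + word : word;
    if (line && measure(candidate) > maxWidth) {
      lines.push(line);
      line = word;
    } else {
      line = candidate;
    }
  });
  if (line) {
    lines.push(line);
  }
  return lines;
}

// Width of a space, recovered indirectly: width("a a") - 2 * width("a").
var spaceWidth = measure("a a") - 2 * measure("a");
```

In this toy font, measure("AVA") and naiveWidth("AVA") disagree, which is exactly the kind of mismatch that pushed text past the boundary. The spaceWidth line uses the same roundabout trick as get_space_metrics in the Perl code.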


use strict;
use warnings;
use Image::Magick;


my ($my_styles);
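#each style carries the font, fill color and point size used for Set/Annotate
#(the key is named pointsize because that is what the code below reads)
$my_styles = {
style1 => {
font => "Arial",
color => "#000000",
pointsize => 20
},
style2 => {
font => "Arial",
color => "#00ff00",
pointsize => 25
}
};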
#returns the image magick object after rendering the text, with given height and width.
#it truncates the paragraph after reaching the height limit.
#wraps the text, and also can support multiple styles in the same line
sub draw_styled_text {
my ($text, $height, $width, $default_style) = @_;
my $im = Image::Magick->new();
my $last_pos = 0;
my ($x, $y) = (0, 0);
my %lines;

$im->Set(
size => $width . "x" . $height
);
$im->Read('xc:none');
#format
#sentence%%$$style_name$$sentence%%sentence
#for each styled sentence
my @sentences = split /\%\%/, $text;
for my $sentence (@sentences) {
my ($style_name, $style_details);
$style_name = $default_style;

if ($sentence =~ /\$\$(.*)\$\$([\w\W]*)/) {
#get the style name,
$style_name = $1;
$sentence = $2;
}
#check length of the string
unless (length $sentence) {
next;
}

#get the style details,
$style_details = get_style_details($style_name);
$im->Set(
font => $style_details->{'font'},
pointsize => $style_details->{'pointsize'}
);
#wrap text
my ($new_text, $line_height);
my $space_width = get_space_metrics($im);
#Annotate trims text. to work around that, count the spaces at the beginning and at the end.
#spaces at the beginning of the string
if ($sentence =~ /^( +).*/) {
$last_pos = $last_pos + ($space_width * length($1));
$x = $last_pos;
}

($new_text, $last_pos, $line_height) = Wrap($sentence, $im, $width, $last_pos);
#spaces at the end of the string (no leading .* so that $1 captures the whole run)
if ($sentence =~ /( +)$/) {
$last_pos = $last_pos + ($space_width * length($1));
}

#for each line
for my $line (split /\n/, $new_text) {
$y = $line_height unless ($y);
my $arr;
if (defined $lines{$y}) {
$arr = $lines{$y};
} else {
$arr = [];
}
#write the line
push @$arr, {
x => $x,
y => $y,
text => $line,
color => $style_details->{'color'},
font => $style_details->{'font'},
pointsize => $style_details->{'pointsize'},
line_height => $line_height
};
$lines{$y} = $arr;

$x = 0;
$y = $y + $line_height;
}
$x = $last_pos;

if ($new_text !~ /.*\n$/) {
$y = $y - $line_height;
} else {
$x = $last_pos = 0;
}

}

my $adjustment_y = 0;
for my $y(sort {$a <=> $b} keys %lines) {
my $arr = $lines{$y};
my $max_height = 0;
my $considered_height = 0;

for my $line (@$arr) {
if ($max_height < $line->{'line_height'}) {
$max_height = $line->{'line_height'};
}
unless ($considered_height) {
$considered_height = $line->{'line_height'};
}
}
if ($considered_height != $max_height) {
$adjustment_y += $max_height - $considered_height;
}
$y += $adjustment_y;
for my $line (@$arr) {
$im->Annotate(
font => $line->{'font'},
fill => $line->{'color'},
pointsize => $line->{'pointsize'},
text => $line->{'text'},
x => $line->{'x'},
y => $y,
);
}
}
#return
return $im;
}

#QueryFontMetrics trims spaces, so derive the width of a space indirectly
sub get_space_metrics {
my ($im) = @_;
my $val = 0;
#first get for a
my $a_width = ($im->QueryFontMetrics(text => "a"))[4];
my $a_with_space_width = ($im->QueryFontMetrics(text => "a a"))[4];
$val = $a_with_space_width - (2 * $a_width);
return $val;
}

sub Wrap {
my ($text, $img, $maxwidth, $lastpos) = @_;
my (@newtext, $pos);
$pos = $lastpos || 0;
my (@words, @lines, @char, $i, $word);
@char = split //, $text;
$i = 0;
$word = "";

for my $c (@char) {
$i++;
#get the word
if ($c eq " " || $c eq "-" || $c eq "\n") {
push @words, $word;
} else {
$word = $word . $c;
if ($i >= scalar @char) {
push @words, $word;
} else {
next;
}
}
$word = "";
#measure the length of current line
my @metrics = $img->QueryFontMetrics(text => join("", @words));
if (($pos + $metrics[4]) > $maxwidth) {
$word = pop @words;
push @lines, join("", @words), "\n";

$pos = 0;
@words = ();
#check whether the size of the word is greater than max width
if ($word && ($img->QueryFontMetrics(text => $word))[4] > $maxwidth) {
#force a line break and -
my $splitword = "";
for my $ch (split //, $word) {
#check whether the set is greater than max width
if (($img->QueryFontMetrics(text => "$splitword$ch-"))[4] > $maxwidth) {
push @lines, "$splitword-", "\n";
$splitword = "";
} else {
$splitword .= $ch;
}
}
if ($splitword) {
$word = $splitword;
}
}
#push it to the current line
if ($word) {
push @words, $word;
$word = "";
}
}
#add the white space
if (scalar @words) {
$words[-1] .= $c if ($c eq " " || $c eq "-");
if ($c eq "\n") {
push @lines, join("", @words), "\n";
@words = ();
}
}
}
#push the last remaining words
push @lines, join("", @words);
my $str = join "", @lines;
my @arr = split /\n/, $str;
#try to measure the width of the last line
my @metrics = $img->QueryMultilineFontMetrics(text => $arr[-1]);
$pos = $metrics[4];
my $line_height = $metrics[5];
unless (scalar @arr > 1) {
$pos += $lastpos;
}
return ($str, $pos, $line_height);
}

sub get_style_details {
my ($style_name) = @_;
return $my_styles->{$style_name} if defined $my_styles->{$style_name};
die "style '$style_name' does not exist";
}


P.S.ttt...:
I couldn't find many solutions on the web to suit my requirements, and that is one of the reasons why I am sharing this one. The code is not neat and there are surely optimizations that could be made, but at the very least it works and can be used as a base for further refinement.

download the source code here: Perl WordWrap for ImageMagick

Friday, June 06, 2008

Five9s Availability - Is it too much to ask for?

Imagine logging into GMail and suddenly getting an oil change page. No way that's going to happen now, right? Nearly 4 years back I used to see this page quite often, but over time GMail has certainly matured as a product. They are still adding features seamlessly and making releases without any downtime of the service. It makes me look in awe at the many services that achieve this, and I sometimes wonder what it takes to achieve 99.999% (Five9s) availability, i.e. a little over 5 minutes of downtime in a year.


Message from ABC: OOPS, No donuts for you.
Developer of ABC: The possibility of this is infinitesimally small and guess what, Shit happens!!!
User: Damn!!! It always happens when I am in the middle of something important.
But hey, why go through such an ordeal? Let me try XYZ...

Real donuts for guessing the service ;)

Why?

For starters, if there are a lot of outages, users will lose trust in the service and might start looking at it as a liability. Twitter is a classic example of that. I would hate to see GMail go down while I am using it. In the Web 2.0 world there are a lot of 'me too' services ready to take your place, and in many instances the only differentiators are the speed and availability of your service.
Availability normally gets attention only in a later stage of a start-up, as features take precedence early on. But a little common sense at the start will postpone the availability issues.

Why measure?

It is always better to measure than not to. The numbers can be used to compare the availability of the service month over month and to track the progress made. In extreme cases, one can boast about them at some scalability and availability conference. ;)

Why Five9s and not any other number?

Five to six minutes of downtime a year is neither very large nor very small, though this depends on the type of outage and service. Choose a number that suits your goals. It is a measure of the availability of the service during its business hours (24x7 for most Web 2.0 startups). It is a number that is quite difficult to achieve but not impossible. Every year very few services achieve Five9s. Personally I look at it as a benchmark, as only a select few make it to this league.
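The downtime budget behind each availability figure is simple arithmetic, sketched below. It assumes a 365-day year and round-the-clock business hours, and the function name is made up.

```javascript
var MINUTES_PER_YEAR = 365 * 24 * 60; // 525600, ignoring leap years

// Allowed downtime per year, in minutes, for an availability percentage.
function downtimeMinutesPerYear(availabilityPercent) {
  return MINUTES_PER_YEAR * (1 - availabilityPercent / 100);
}

// Budgets for two, three, four and five nines.
var budgets = [99, 99.9, 99.99, 99.999].map(function (pct) {
  return { availability: pct, minutes: downtimeMinutesPerYear(pct) };
});
```

Five nines works out to roughly 5.26 minutes a year, four nines to under an hour, and three nines to a little under nine hours. Each extra nine cuts the budget by a factor of ten.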

Downtime can be categorized into two kinds, viz. predictable and unpredictable.

1. Unknown/Unpredictable ones
a. DB/web server needing a restart (often in Windows-based environments)
b. Hardware failures

At least two things should have to fail before the whole service fails completely. It pays to have redundancy, but it is expensive too. RAID, depending on the configuration, will only protect the data, and the time required to recover from such a disaster and go back online will still be huge. It is better to have redundant hardware and software that is hot swappable. This requires a lot of design and architecture consideration. It is always better to assume that a failure like this will happen and to be prepared to tackle it. It is not financially viable for many early-stage startups to have redundancy for everything, but being prepared for such an eventuality will not hurt the company.
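A rough feel for what redundancy buys comes from a simplified model. It assumes that replicas fail independently and that failover is instant, which real systems only approximate. The function name and the 99% figure are made up for illustration.

```javascript
// Availability of n independent replicas where any one can serve traffic:
// the service is down only when all replicas are down at the same time.
function combinedAvailability(single, replicas) {
  return 1 - Math.pow(1 - single, replicas);
}

var one = combinedAvailability(0.99, 1); // a single 99% box
var two = combinedAvailability(0.99, 2); // two such boxes, hot swappable
```

Under these assumptions a second 99% box takes the pair to about 99.99%. Correlated failures (same rack, same datacenter, same bad deploy) make the real figure worse.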

c. A spike in traffic, taking everything down with it
It is difficult to be prepared for a spike; it is a given that there will be at least one spike in a year that you can't handle. Having an architecture and design built for rainy days will pay off in the longer run. Services like Amazon EC2 will certainly help in handling a spike, but it all depends on how prepared you are for it.
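Part of being prepared is deciding in advance how capacity should follow load. The sketch below sizes a worker pool from the depth of the job queue. The function and its parameters (jobsPerWorker, minWorkers, maxWorkers) are hypothetical tuning knobs and not part of EC2 or any other service.

```javascript
// Decide how many workers to run for the current queue depth.
// jobsPerWorker, minWorkers and maxWorkers are hypothetical tuning knobs.
function desiredWorkers(queuedJobs, jobsPerWorker, minWorkers, maxWorkers) {
  var needed = Math.ceil(queuedJobs / jobsPerWorker);
  return Math.max(minWorkers, Math.min(maxWorkers, needed));
}

var quiet = desiredWorkers(3, 50, 2, 20);    // stays at the floor
var busy = desiredWorkers(480, 50, 2, 20);   // scales with the queue
var spike = desiredWorkers(5000, 50, 2, 20); // capped at the ceiling
```

The ceiling is the part that matters during a spike. It bounds the bill, and once it is reached the queue simply grows and responses slow down, which is still better than the service falling over.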

d. Some goof in the datacenter restarting the server by accident (I am not making it up. It has happened twice.)
To avoid such a scenario, don't have your server in India unless you have a redundant counterpart elsewhere. Not kidding!!! India is light years behind in providing any serious hosting services. If you do want a server in India for whatever reason, avoid resellers. If you have a larger number of servers, go for co-location.

continued...

Sunday, November 04, 2007

OCC Mumbai Meet

Of late I have come to notice that the number of startups in India is increasing. Last year the startups could be counted, but now there is an explosion in their number, thanks mainly to Web 2.0.
Here are some of the pics from the OCC and Barcamp Mumbai meets I visited.

Thursday, October 11, 2007

Funny!!!



I came across this advertisement while reading a blog, and I found this ad by Google funny.....

Saturday, October 06, 2007

The First Step

September 3rd:
0825 hrs: 30 pairs of eyes zero in on a location and are carefully calculating their next move. From the look in their eyes one could tell that they won't hesitate to kill if necessary. Chaos erupts as it comes to a halt and people are in a frenzy to board. With room for only a dozen, somehow all 29 of them make it. My jaw dropped, my shoulders sagged and my legs felt loose when I suddenly realised that I would have to take such a train daily to reach my office.
0855 hrs:
After watching 4 trains pass by, I summoned all my strength and courage to actually approach the train. I was pushed aside by a set of very decent looking hooligans when the next train came. After that moment, I really became one of them and anyone could see the same fire in my eyes.
I finally got in and felt elated about the Herculean task I had just finished, but to my dismay I found the challenge had just begun, as I was standing in the way of people who wanted to get down at the next station.

October 1st:
0840 hrs: Nearly ## (couldn't count because I was concentrating) pairs of eyes zoom in on a location and I made it (don't know about the rest, as I was the first).

After a month, I started enjoying the whole experience. In retrospect I feel the first step is often the most difficult and the most important one to take. Things do get easier as time goes by.

I don't know when I will be taking my first step.

Sunday, September 30, 2007

Changed Address!!!!

The idea of buying my own domain had been at the back of my mind for quite some time.
Finally I bought one and have changed the blog address from pvsun.blogspot.com to blog.pvsundarram.com.
All the old links will still work!!!!
Looking forward to buying more in the future....

Missed it this time!!!

I went for a walk in the ground near my house and found lots of people working on something huge. Even after looking more closely, I couldn't determine what they were really trying to do. As Ganesh Chaturthi was nearing, the artists were adding finishing touches to their work. I could hardly believe my eyes when I saw what they had built just 10 days later. It was like a full-blown ship!!!!

This photo was taken at 00:30 hrs from a distance of more than 400 meters and the place was still buzzing with people.

If you are wondering where this photo is from, you can check zoom.in.