Perl profiling

First of all, I'd like to say a few words about profiling.
"Measure, don't guess" is the main principle of performance optimization. So if you see that your script/application/whatever is running slowly, profile it. Perl has several good tools for this, and every Perl programmer should know them.

Devel::DProf is the basic profiler and, by the way, in a week it will be the 10th anniversary of its release :)

You just run perl5 -d:DProf test.pl and get a tmon.out file with raw profiling data.
Then you run dprofpp and see the parsed data, something like this:
Total Elapsed Time = -0.03244 Seconds
User+System Time = 0 Seconds
Exclusive Times
%Time ExclSec CumulS #Calls sec/call Csec/c Name
0.00 0.090 0.090 30000 0.0000 0.0000 main::__ANON__
0.00 0.029 0.024 2730 0.0000 0.0000 Benchmark::new
0.00 0.010 0.020 3 0.0033 0.0065 main::BEGIN
0.00 0.010 0.010 5 0.0020 0.0019 Benchmark::BEGIN
0.00 0.004 0.048 6 0.0007 0.0080 Benchmark::runloop
0.00 - -0.000 1 - - vars::import
0.00 - -0.000 1 - - version::(cmp
0.00 - -0.000 1 - - Config::TIEHASH
0.00 - -0.000 1 - - Config::import
0.00 - -0.000 1 - - DynaLoader::dl_load_flags
0.00 - -0.000 1 - - DynaLoader::dl_load_file
0.00 - -0.000 1 - - DynaLoader::dl_undef_symbols
0.00 - -0.000 1 - - DynaLoader::dl_find_symbol
0.00 - -0.000 1 - - Time::HiRes::bootstrap
0.00 - -0.000 1 - - warnings::unimport

This shows the top 15 CPU consumers.
So, let's imagine that you have found a subroutine that consumes 50% of the total CPU time.
But it is very long, so you need to find the exact place to optimize...

Use Devel::FastProf!
It is a line-by-line profiler. It dumps your code, showing how many times every line was run and how much time it consumed. All you have to do is optimize :)

And the third one is the sexy Devel::NYTProf, all-in-one and even more.
It produces accurate, detailed results with minimal overhead, can export data to HTML, CSV and a KCachegrind-readable format, and shows both exclusive and inclusive time.

I prefer Devel::NYTProf, btw.
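
To try the three profilers side by side, a small script with one obviously expensive subroutine is enough. The sketch below is only an illustration: the file name test.pl and both subroutine names are made up for this example, and the commands in the comments assume the corresponding modules are installed.

```perl
#!/usr/bin/perl
# test.pl: a toy workload for trying out the profilers.
#
#   perl -d:DProf    test.pl && dprofpp       # per-subroutine times
#   perl -d:FastProf test.pl && fprofpp       # per-line times
#   perl -d:NYTProf  test.pl && nytprofhtml   # HTML report in ./nytprof
use strict;
use warnings;

# Counts words with an explicit Perl-level loop over every word,
# so it should stand out in the profile.
sub slow_count_words {
    my ($text) = @_;
    my $count = 0;
    for my $word (split /\s+/, $text) {
        $count++ if length $word;
    }
    return $count;
}

# Counts words by counting regex matches in list context.
sub fast_count_words {
    my ($text) = @_;
    my $count = () = $text =~ /\S+/g;
    return $count;
}

my $text = join ' ', ('perl profiling example') x 200;
my ($slow, $fast) = (0, 0);
for (1 .. 500) {
    $slow = slow_count_words($text);
    $fast = fast_count_words($text);
}
print "slow=$slow fast=$fast\n";
```

Both subroutines return the same count, so any difference in the reports comes from how the work is done, not from what is computed.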

Eval performance in perl

Do not forget that "eval" has two forms: it can take an expression (a string) or a block.
And the speed varies a lot:

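use strict;
use warnings;
use Benchmark qw(cmpthese);

cmpthese(10_000_000, {
    'plain'  => sub { int(rand(100)) },
    'inline' => sub { eval 'int(rand(100))' },    # string eval: recompiled on every call
    'block'  => sub { eval { int(rand(100)) } },  # block eval: compiled once with the surrounding code
});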

            Rate inline  block  plain
inline   87055/s     --   -96%   -99%
block  2173913/s  2397%     --   -65%
plain  6211180/s  7035%   186%     --


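What does it mean? The string form has to be parsed and compiled on every call, while the block form is compiled once together with the rest of the program. So every Perl programmer who can use the block form should use it.
But of course, the best way is not to use eval at all :))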
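
For the common case where eval is only there to catch exceptions, the block form is all you need. Here is a minimal sketch; the helper name try_call is made up for this example and is not part of any module.

```perl
use strict;
use warnings;

# Runs $code inside a block eval.
# Returns (1, $result) on success and (0, $error) if the code died.
sub try_call {
    my ($code, @args) = @_;
    my $result;
    # The trailing "1" makes the eval return true only if nothing died.
    my $ok = eval { $result = $code->(@args); 1 };
    return $ok ? (1, $result) : (0, $@);
}

my ($ok, $value) = try_call(sub { $_[0] / $_[1] }, 10, 2);
print "ok=$ok value=$value\n";

# Division by zero dies; the block eval catches it.
($ok, $value) = try_call(sub { $_[0] / $_[1] }, 10, 0);
print "ok=$ok error=$value";
```

Testing the return value of the eval, rather than looking at $@ alone, is the usual idiom because $@ can be clobbered before you get to inspect it.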

Perl popularity all over the world

Did you know that, according to Google Insights, the largest number of Perl programmers live in Bangalore, India?
And the top 10 Perl countries look like this:
  1. India
  2. Japan (They already have Ruby, why do they need Perl??)
  3. Russia (Yes, it's true. All the biggest Russian websites use Perl)
  4. USA (Of course)
  5. Singapore
  6. Belarus
  7. Hong Kong
  8. Taiwan
  9. Germany
  10. Armenia

Nginx, Memcached, some Perl: shaken, not stirred. Part 1


Nginx + Memcached + some perl = extra quick stateless application!

One of the tasks I had to solve last month was a high-performance stateless web application. It had to handle up to 10-20 million requests a day.

It was harder to design than to implement ;)

First of all, a small backend. I prefer FastCGI, but it could also be implemented on mod_perl+Apache, or even as plain CGI. 99% of the work is done by nginx, so backend speed is not a bottleneck, even if the backend is very slow.

The backend's job is to write to memcached in the following cases:
a) a cache miss
b) some data has changed.
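
The write side can be sketched roughly like this. It is not the actual backend code: the function names are made up for illustration, the key scheme assumes the "prefix$uri" key from the nginx configuration in this post, and $cache is assumed to be any client object with a set($key, $value) method (a Cache::Memcached client would fit).

```perl
use strict;
use warnings;

# Must produce the same key nginx looks up: set $memcached_key "prefix$uri";
sub cache_key {
    my ($uri) = @_;
    return "prefix$uri";
}

# Stores a rendered page under the key for $uri.
# $html should be a plain string, since nginx serves the stored bytes as-is.
sub store_page {
    my ($cache, $uri, $html) = @_;
    return $cache->set(cache_key($uri), $html);
}

# Called on a cache miss or after a data change:
# render the page, store it for nginx, and return it to the client.
sub handle_miss {
    my ($cache, $uri, $render) = @_;
    my $html = $render->($uri);
    store_page($cache, $uri, $html);
    return $html;
}

# Usage with a real client (needs Cache::Memcached from CPAN):
#   use Cache::Memcached;
#   my $cache = Cache::Memcached->new({ servers => ['127.0.0.1:20000'] });
#   print handle_miss($cache, '/index.html', sub { "<html>...</html>" });
```

Keeping the key construction in one small function makes it harder for the backend and the nginx config to drift apart.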

Then, the nginx+memcached part:

location / {
    set $memcached_key "prefix$uri";
    memcached_pass 127.0.0.1:20000;
    error_page 404 502 504 =200 @fallback;
}

@fallback is a named location for the backend: on a cache miss we go to the backend, otherwise the data is served from memcached.

Notice the 127.0.0.1. I usually have a very low write-to-read ratio in my projects, so I prefer to install memcached on the same machine as nginx and have nginx read from memcached through localhost. I haven't done proper benchmarks of "nginx+memcached talking through localhost" vs "nginx+memcached talking through eth0", but I have seen a boost of roughly 30-40% with the localhost model.

So we've got a trivial nginx+memcached site. Where is the magic stateless application? Let me continue.

In my task I had to show different data depending on the time, the user's country and sometimes at random.

So, I defined some variables in nginx:

perl_set $time '
    sub {
        my $r = shift;
        my @loc = localtime(time);
        return sprintf("%02d:00-%02d:00", $loc[2], ($loc[2] + 1) % 24);
    }
';

geo $real_addr $geo {
    include geo.conf;
}
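
The time-slot logic inside the perl_set handler above is easy to check outside nginx. Here is the same calculation as a standalone sketch; the function names hour_slot and current_slot are made up for this example.

```perl
use strict;
use warnings;

# Formats an hour (0..23) as an "HH:00-HH:00" slot, wrapping at midnight.
sub hour_slot {
    my ($hour) = @_;
    return sprintf('%02d:00-%02d:00', $hour, ($hour + 1) % 24);
}

# Slot for the current local time, as the perl_set handler computes it.
sub current_slot {
    my @loc = localtime(time);
    return hour_slot($loc[2]);
}

print current_slot(), "\n";
```

The "% 24" matters only for the last hour of the day, where the slot has to end at 00:00 rather than 24:00.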

And then I put some data like this into memcached:

<!--# if expr="$time = 14:00-15:00" -->

<!--# elif expr="$time = 18:00-19:00" -->
...
<!--# else -->
...
<!--# endif -->

Because nginx SSI does not support nested "if", instead of

<!--# if expr="$time = 14:00-15:00" -->

<!--# if expr="$geo = fr" -->

<!--# endif -->
<!--# endif -->

I had to put the inner "if" into a separate memcached key and do it like this:
<!--# if expr="$time = 14:00-15:00" -->

<!--# include virtual="/someurl/14-15/fr" -->

<!--# endif -->

It is a legitimate, if slightly tedious, way to do nested ifs in nginx SSI :)
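
Since these pages are repetitive, they can be generated rather than written by hand. The sketch below shows one way to do it and is not the code I actually used: the function names are made up, the URL layout follows the /someurl/14-15/fr example above, and the country is fixed per generated page just as in that example.

```perl
use strict;
use warnings;

# "14:00-15:00": the value that $time is compared against.
sub slot_label {
    my ($hour) = @_;
    return sprintf('%02d:00-%02d:00', $hour, ($hour + 1) % 24);
}

# "14-15": the path segment used in the include URL.
sub slot_path {
    my ($hour) = @_;
    return sprintf('%02d-%02d', $hour, ($hour + 1) % 24);
}

# Builds a flat if/elif chain with one include per hour,
# instead of nesting a second "if" inside each branch.
sub build_ssi_page {
    my ($base, $country, @hours) = @_;
    return '' unless @hours;
    my @lines;
    my $keyword = 'if';
    for my $hour (@hours) {
        push @lines, sprintf('<!--# %s expr="$time = %s" -->',
                             $keyword, slot_label($hour));
        push @lines, sprintf('<!--# include virtual="%s/%s/%s" -->',
                             $base, slot_path($hour), $country);
        $keyword = 'elif';
    }
    push @lines, '<!--# endif -->';
    return join("\n", @lines) . "\n";
}

print build_ssi_page('/someurl', 'fr', 14, 18);
```

The generated text is what goes into memcached as the page body; each included URL then needs its own memcached key with the inner content.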

So, using this approach, we can create an ultra-high-performance stateless application.
Yes, of course it's a tradeoff: the application becomes more complex, somewhat harder to support, and consumes more memory.

But the prize is REALLY GREAT:
http_load output for a small test.

106289 fetches, 999 max parallel, 1.18216e+08 bytes, in 30.0015 seconds
1112.22 mean bytes/connection
3542.79 fetches/sec, 3.94034e+06 bytes/sec
msecs/connect: 0.433669 mean, 33.696 max, 0.02 min
msecs/first-response: 146.733 mean, 14978.5 max, 2.541 min
6733 bad byte counts
HTTP response codes:
code 200 -- 106215
code 504 -- 74

Look again: 3.5k page fetches per second on an ordinary server: FreeBSD 7.0, quad-core with 4GB of memory. And notice:
1) neither the network subsystem nor nginx itself was tuned.
2) Being stateless, this application is very scalable. So with enough RAM, every Perl programmer can do magic :)
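
To put the numbers side by side, here is a quick back-of-the-envelope conversion between requests per day and requests per second. It uses only the figures from this post (the 10-20 million requests a day target and the http_load totals); the function names are made up, and an average rate says nothing about peak load.

```perl
use strict;
use warnings;

use constant SECONDS_PER_DAY => 24 * 60 * 60;

# Average requests per second for a given daily total.
sub per_second {
    my ($per_day) = @_;
    return $per_day / SECONDS_PER_DAY;
}

# Daily total if a per-second rate were sustained for 24 hours.
sub per_day {
    my ($per_sec) = @_;
    return $per_sec * SECONDS_PER_DAY;
}

# From the http_load output: 106289 fetches in 30.0015 seconds.
my $measured = 106289 / 30.0015;

printf "target:   %.0f-%.0f requests/sec on average\n",
    per_second(10_000_000), per_second(20_000_000);
printf "measured: %.2f fetches/sec, about %.0f million/day if sustained\n",
    $measured, per_day($measured) / 1e6;
```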

And this is just a small piece of nginx's functionality.

HTML::Template, Template::Toolkit and LibXSLT performance

For ages I thought that HT (HTML::Template) was simple but fast.
It was a great, great mistake.
All of us are lazy Perl programmers and prefer to read blogs instead of benchmarking :) Here is what my simple Benchmark.pm-powered test showed: TT gives 431% of HT's performance, and XSLT (using XML::LibXSLT) beats them all: 1190% of HTML::Template's performance. Are you still using HTML::Template?