Thursday, June 24, 2010

Implementing file upload progress bar

At work I was assigned the task of replacing the previous hodgepodge of tools that provided progress bars for forms with file uploads. At first glance this seems like a trivial thing to do - you periodically check how much of the file has arrived and update the progress bar - but there are details that make it hard to do with the current tools. I report here how I solved the problem - not because I think this is the optimal way, but to open the discussion and because I could not find any description of a solution when I googled for it.

First of all, the common programming tools (like CGI.pm or Mason, which we use here) assume that the page handler receives the whole request as input - and the whole request is not available until after the file is uploaded. So, for example, 'my $q = CGI->new' will not return until it is too late to measure the upload progress. The solution is to use another page to report the upload progress and to call that page via Ajax from the JavaScript code updating the progress bar. This would work great - but the file is normally uploaded to a temporary file with a random name, and the other script would have no chance of guessing it. We need to generate a new random file name in the form page and then pass that name both to the form handler script, so that it saves the data to that file, and to the Ajax script, which checks the size of that file in parallel.
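The bookkeeping described above can be sketched roughly as follows. This is only an illustration, not the code I actually used: the function names, the "upload-" file prefix and the use of the system temp directory are all made up for the example, and I assume the expected size is known to the progress script (for instance because the handler recorded the request's CONTENT_LENGTH somewhere).

```perl
use strict;
use warnings;
use File::Spec;

# Hypothetical directory where in-progress uploads are written.
my $UPLOAD_DIR = File::Spec->tmpdir;

# Generate a random hexadecimal ID for the form page to hand out.
# rand() is not cryptographically strong; this is only a sketch.
sub new_upload_id {
    return join '', map { sprintf '%02x', int rand 256 } 1 .. 16;
}

# Map an upload ID to the file the form handler writes to.
sub upload_path {
    my ($id) = @_;
    return File::Spec->catfile($UPLOAD_DIR, "upload-$id");
}

# Percentage of the upload received so far, based on the file size.
# $expected_bytes is assumed to be known to the caller.
sub upload_progress {
    my ($id, $expected_bytes) = @_;
    return 0 unless $expected_bytes && $expected_bytes > 0;
    my $path     = upload_path($id);
    my $received = (-s $path) || 0;    # missing or empty file counts as 0
    my $percent  = int(100 * $received / $expected_bytes);
    return $percent > 100 ? 100 : $percent;
}
```

The Ajax endpoint would then just print the result of upload_progress for the ID it was given, and the JavaScript side would poll it every second or so.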


To save the data into a specified filename I used the CGI.pm callback feature:

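use IO::Handle;    # provides $fh->flush on older perls

# $fh is handed to the hook as its fourth argument; the false third
# argument tells CGI.pm not to write its own temp file as well
my $q = CGI->new( \&hook, $fh, undef );
...
# Called by CGI.pm for each chunk of the upload as it is read
sub hook {
    my ($filename, $buffer, $bytes_read, $fh) = @_;
    print $fh substr($buffer, 0, $bytes_read);
    $fh->flush();    # make the bytes visible to the script checking the file size
}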

It is described in the subsection called "Progress bars for file uploads and avoiding temp files" of the CGI.pm documentation, but it is a great leap to say that it supports a progress bar implementation: you still cannot use it directly to get the progress from the CGI object on the form landing page, and you still need a separate script measuring the progress. For my solution all I needed was to pass the target file name to the code saving the data, and this could be easier than writing the callback above. And the callback is still not everything - I still need a way to pass the generated filename from the form page to that script - and not via form parameters; remember, they are not available at that stage. So how can that be done? Simple - as PATH_INFO - which is available in the %ENV hash even before the params are parsed by CGI.pm.
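Reading the name from PATH_INFO might look something like the sketch below. The function name and the ID format (32 lowercase hex digits) are assumptions made for the example, not part of the actual implementation; the point is that anything arriving in the URL should be validated before it is turned into a file name.

```perl
use strict;
use warnings;

# Extract the upload ID from PATH_INFO, e.g. "/3f9a..." for a request to
# /cgi-bin/upload.cgi/3f9a... (the script name is hypothetical).
# Returns undef unless the ID is exactly 32 lowercase hex digits, so a
# client cannot smuggle "../" or other path components into the file name.
sub upload_id_from_path_info {
    my ($path_info) = @_;
    return unless defined $path_info;
    return $path_info =~ m{\A/([0-9a-f]{32})\z} ? $1 : undef;
}

# In the handler, before CGI->new starts parsing the request body:
# my $id = upload_id_from_path_info($ENV{PATH_INFO})
#     or die "missing or malformed upload ID";
```

The form page would then set the form's action to the handler URL with the generated ID appended as an extra path segment.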

This is the skeleton of the solution - there are a few more details in the actual implementation - but the code will be published soon as Open Source - so I hope everyone will be able to look them up there.

Saturday, June 19, 2010

When those micro-seconds matter

Catalyst is the only web framework whose mailing list I subscribe to - but I am sure that it happens on others too. In a recurring pattern, someone posts a benchmark showing that Catalyst is, for some trivial operation, many times (or many hundreds of times) slower than some other web framework or, for that matter, PHP. That never fails to generate a heated debate - but eventually the seasoned framework developers gain the upper hand with the argument that on all the, often big, web sites they have worked on, those few micro-seconds lost in the Catalyst dispatcher never mattered much, because the application spent a hundred times more time in other code, mostly in the business logic, so shaving off some part of those few micro-seconds would not improve the overall speed by more than 1%. This is a great argument, perfectly reasonable and rational, but it is biased towards the status quo. It might be true that Catalyst works great everywhere it is currently used, but it is also not hard to imagine an application with very simple business logic that needs to serve millions of users - Twitter, anyone? Or an app that makes many simple Ajax callbacks. Sure, you can always code the simple, speed-critical parts in PHP and keep Catalyst only for the other, more heavy-weight tasks, but having a universal solution would be so much more convenient.

Sunday, June 13, 2010

Installing dependencies for your CPAN-packaged library

Update: See the comments - apparently the warning against auto_install is outdated information. I don't have an opinion of my own here.

It has always puzzled me what the 'canonical' way is to install dependencies for an unpacked distribution, be it downloaded from CPAN or your own in-house product packaged the CPAN way for convenience. Module::Install provides an 'auto_install' option that you add to your Makefile.PL, and then perl Makefile.PL; make (or perl Makefile.PL; sudo make) should do the trick. This seemed like a good solution that could become a standard, but now I see that the FAQ of that module says:


Do I really have to avoid auto_install()?

auto_install() has a long history of breaking CPAN toolchains. Lots of people had a bad feeling on it, and have said it should be strongly avoided. In fact it was deprecated and removed once.

Although most of the known problems have been fixed and you can use it more safely than ever, the use of auto_install() is still discouraged, especially if what you are writing is a module to be uploaded on the CPAN. auto_install() does lots of things itself, thus does not always do the same things as other toolchains do (including extra attribute handling, etc; which can be fixed somehow but that's not too DRY). It only supports CPAN and CPANPLUS as its backends. If you use other tools to install, it may still cause a trouble.

Besides, now you can do what auto_install() does with other means. If your CPAN module is new enough, you can pass a dot to the cpan command it provides, and it will install all the required distributions from the CPAN:

$ cpan .

So apparently this is the way to go - using the cpan shell. There is a minor problem with this when used for testing - it automatically installs the package from the directory you are in. Sure, you can suppress that by adding another command switch (-m), but it would be better if the default action was the safe one.