Friday, April 27, 2012

If all of your state is on the stack then you are doing functional programming

Consider a function that works with local variables and changes them as it runs. This is imperative code, it uses variables with state, but from the outside it is a pure mathematical function. This is because all of its variables are created anew when the function is executed and destroyed afterwards. It does not matter whether the variables contain simple values or objects: a function that keeps a Markdent::Simple::Document in a local variable can also be purely functional, provided that Markdent::Simple::Document does not use globals or system calls.
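A minimal sketch of the idea, assuming a made-up function name (sum_of_squares is only an illustration, not code from any library): the body mutates a local accumulator in a loop, yet the caller can only ever observe a value computed from the arguments.

```perl
use strict;
use warnings;

# Imperative on the inside: $total is mutated on every iteration.
# Pure on the outside: the result depends only on the arguments,
# and no state survives the call.
sub sum_of_squares {
    my @numbers = @_;
    my $total   = 0;            # created anew on every call
    for my $n (@numbers) {
        $total += $n * $n;      # mutation that no caller can see
    }
    return $total;              # $total is gone after this
}

print sum_of_squares(1, 2, 3), "\n";    # prints 14
```

Calling it twice with the same arguments always gives the same answer, which is all that purity asks for.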

Imagine that this was enforced by the compiler/interpreter - maybe with a new keyword, function, or something similar. I have the feeling that this would be very similar to how use strict works - giving the end user some kind of safety.

Tuesday, April 17, 2012

A Data::Dumper bug - or Latin1 strikes again

I did not believe my boss when he said he had found a bug in Data::Dumper - but run this:

To get a character whose internal representation is Latin1 I could also use HTML::Entities::decode( '&oacute;' ) there, with the same result. The output I get on 5.14.2 and 5.10.1 is:

When I check the dumped string, it has the right character encoded in Latin1, and apparently eval expects UTF8 when use utf8 is set. Without use utf8, eval works OK on it. If the internal representation of the initial character is UTF8 (as when the first line is my $initial = 'ó';), then the dumped string contains UTF8, which in turn might be interpreted incorrectly if the code does not have the use utf8 preamble.
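A minimal sketch of the round trip described here. The helper name round_trips and the use of $Data::Dumper::Terse (so that the dumped text is a bare value that evals cleanly under strict) are choices made for this illustration. Whether the Latin1 case survives depends on the Perl version and on whether use v5.16 or later is in effect, so no result is claimed for it.

```perl
use strict;
use warnings;
use utf8;                       # the pragma whose effect leaks into eval
use Data::Dumper;

$Data::Dumper::Terse = 1;       # dump the bare value, without "$VAR1 = "

# Dumps a string, evals the dump, and reports whether the copy is identical.
sub round_trips {
    my ($value) = @_;
    local $SIG{__WARN__} = sub { };     # hide "Malformed UTF-8" warnings
    my $copy = eval Dumper($value);     # may return undef if parsing fails
    return ( defined($copy) && $copy eq $value ) ? 1 : 0;
}

my $ascii  = "abc";
my $latin1 = "\x{f3}";          # o-acute, held internally as one Latin1 byte

print "ASCII string:  ", ( round_trips($ascii)  ? "OK" : "FAILED" ), "\n";
print "Latin1 string: ", ( round_trips($latin1) ? "OK" : "FAILED" ), "\n";
```

Removing the use utf8 line gives a second run to compare against.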

Considering that Data::Dumper is a core module, one of the most commonly used, and that its docs say:

The return value can be evaled to get back an identical copy of the original reference structure.
this looks like a serious bug.

Is that a known problem? Should I post it to the Perl RT?

Update: Removed the initial eval - "\x{f3}" is enough to get the Latin1 encoded character. Some editing.
Update: I tested it also on 5.15.9 and it fails in the same way.
Update: I've reported it to the Perl RT - I am not sure about the severity chosen and the subject - this was my first Perl bug report.
Update: In reply to the ticket linked above Father Chrysostomos explains: "The real bug here is that ‘eval’ is respecting the ‘use utf8’ from outside it." and later adds that 'use v5.16' will fix the problem in 5.16.

Saturday, April 14, 2012

Breaking problems down and defaults

In a classic essay Dave Rolsky wrote: Want Good Tools? Break Your Problems Down. I wish more people had read this and applied the advice - CPAN libraries would be more useful then. But stating the goal is probably not enough - we also need to talk about how it can be reached and about the problems encountered on the way there. For example, let's take Markdent, the module that resulted from the process described in the essay linked above.

The problem is that the criticized approach, a unified library that just converts Markdown to HTML, would result in a simpler API - for example a single function that takes Markdown and returns HTML.

Maybe the difference does not look very significant - but after a while it can get annoying. In 99% of cases you don't need the extra flexibility that comes with the replaceable parser - so why should you pay for it? If I had to use Markdent frequently I would write a wrapper around it with an API like the one above.

By the way, Text::Markdown already has this wrapper and presents a dual, functional/object-oriented API: the simple functional interface described above does the most common thing, while the object-oriented one gives you more control over the choices made. The only drawback is that it still couples parsing and generation.

Another way of simplifying the API is providing defaults for function arguments, for example for the object constructor. Dependency Injection is all about breaking the problem down and making flexible tools - but it might become unbearable if we do not soften it up a bit with defaults.
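A rough sketch of how a wrapper and constructor defaults can sit on top of separated parts. All package and function names here (My::Parser, My::HTMLGenerator, My::Converter, to_html) are hypothetical stand-ins invented for the example, and the "parser" only splits paragraphs; this is not the API of Markdent or Text::Markdown.

```perl
use strict;
use warnings;

{
    package My::Parser;                 # hypothetical: splits text into paragraphs
    sub new { return bless {}, shift }
    sub parse {
        my ($self, $text) = @_;
        $text =~ s/\A\s+|\s+\z//g;      # trim surrounding whitespace
        return [ grep { length } split /\n{2,}/, $text ];
    }
}

{
    package My::HTMLGenerator;          # hypothetical: wraps paragraphs in <p>
    sub new { return bless {}, shift }
    sub generate {
        my ($self, $paragraphs) = @_;
        return join '', map { "<p>$_</p>\n" } @$paragraphs;
    }
}

{
    package My::Converter;
    # Both collaborators can be injected, but each has a default,
    # so the common case needs no arguments at all.
    sub new {
        my ($class, %args) = @_;
        my $self = {
            parser    => $args{parser}    || My::Parser->new,
            generator => $args{generator} || My::HTMLGenerator->new,
        };
        return bless $self, $class;
    }
    sub convert {
        my ($self, $text) = @_;
        return $self->{generator}->generate( $self->{parser}->parse($text) );
    }
}

# Functional wrapper for the most common case.
sub to_html { return My::Converter->new->convert(shift) }

print to_html("Hello\n\nWorld\n");
```

The one-line to_html covers everyday use, while My::Converter->new( generator => ... ) remains available when a different output format is needed.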

Programming is always about making trade-offs - here we add some internal complexity (by adding the wrappers or providing the defaults) and in exchange get a simplified API that covers the most common cases while still maintaining the full power under the hood. I think this is a good trade-off in most cases, and especially in the case of libraries published to CPAN that need to be as universal as possible.

Wednesday, April 04, 2012

What if "character != its utf8 encoding" is overengineering?

"You shall not assume anything about the internal representation of characters in Perl" - is a mantra that has been repeated over and over by the Perl pundits for something like a decade. But there are still people who refuse to take that advice and want to peek into the internal representation of characters. What if our sophisticated approach of isolating the 'idea of a character' from its representation is a case of overengineering? People often overreact to past traumas - programming is not an exception - and the conversion from many national 'charsets' to Unicode was a big event. Maybe expecting another conversion soon is such an overreaction?

Getting rid of the Latin1 internal encoding does not look like a big price for improving simplicity and getting rid of all these subtle mistakes. I think it is important that the language is understood by its users, and if it is not, then maybe, instead of blaming the programmers, we could make it easier to understand? Sure, it is nice to have the possibility of changing the internal encoding from UTF8 to UTF16, or maybe something completely different, in the future - but I have the feeling that this might be a case of architecture astronautics.
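For readers who have not run into the two internal encodings, here is a small sketch that makes them visible. The helper name describe is made up for the illustration; utf8::is_utf8, utf8::upgrade and the bytes pragma are core Perl.

```perl
use strict;
use warnings;

# Reports how Perl currently stores a string internally.
sub describe {
    my ($string) = @_;
    my $flag  = utf8::is_utf8($string) ? 'UTF8' : 'Latin1/bytes';
    my $chars = length $string;
    my $bytes = do { use bytes; length $string };   # length of the raw storage
    return "$flag, $chars char(s), $bytes byte(s)";
}

my $latin1 = "\x{f3}";          # o-acute stored as a single byte
my $utf8   = $latin1;
utf8::upgrade($utf8);           # same character, now stored as two UTF-8 bytes

print describe($latin1), "\n";  # Latin1/bytes, 1 char(s), 1 byte(s)
print describe($utf8),   "\n";  # UTF8, 1 char(s), 2 byte(s)
print "equal as characters: ", ( $latin1 eq $utf8 ? 'yes' : 'no' ), "\n";   # yes
```

The two strings compare equal, yet code that looks at the raw bytes sees different data, which is where the subtle mistakes come from.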