Tuesday, January 04, 2011

Dependency Injection or Removing Hardcoded Values?

Ever since I first heard about Dependency Injection (DI) a few years ago, I have read and reread all the available documents, and I kept having the feeling that I was still missing something. Yesterday, after some googling, I found that even those whom I would have expected to explain it have similar problems with it (see also this interesting reply to that rant). I am not sure if I really understand DI now, maybe I am still missing something important, but here is my tentative theory about it.

Let us say that somewhere deep in your web app code you have these lines:
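(The exact lines don't matter; the sketch below, with made-up credentials and plain DBI, shows the idea.)

use DBI;

# database name, user and password hardcoded deep inside the application code
my $dbh = DBI->connect(
    'dbi:mysql:database=myapp;host=localhost',
    'myapp_user',
    's3cret',
);
my $users = $dbh->selectall_arrayref('SELECT name FROM users');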



Every minimally experienced programmer will notice immediately that, for any reusability, the database name, user name and password should be extracted into a place where they can be changed easily, like the first lines of the program or, better, a config file. This comes very naturally - but the reasoning behind it does not come from any theory - it is our experience that tells us that these pieces of data will change more frequently than the surrounding code.
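For example (a sketch - Config::Tiny and the file layout are just one possible choice):

use Config::Tiny;
use DBI;

# the frequently changing values now live in myapp.conf,
# say in a [database] section with dsn, user and password keys
my $config = Config::Tiny->read('myapp.conf');
my $db     = $config->{database};
my $dbh    = DBI->connect( $db->{dsn}, $db->{user}, $db->{password} );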

I don't know if anyone would call this Dependency Injection; I think they would rather say that they are removing hardcoded constants.

Now let's take this snippet:
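(Again a sketch: assume the connection parameters now come from the config, but MyClass still builds the connection itself.)

package MyClass;
use DBI;

sub new {
    my ( $class, %config ) = @_;
    my $self = bless {}, $class;
    # the credentials are configurable now, but the DBI class
    # and the way the handle is constructed are still fixed in here
    $self->{dbh} = DBI->connect(
        $config{dsn}, $config{user}, $config{password},
    );
    return $self;
}

1;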



This is better - but the name of the DBI class and the way you construct the $dbh object are still hardcoded here - the point about DI is that this is still wrong, because these parts will also change more often than the rest of MyClass. The most important case for this is tests. In Perl it is possible to change the definition of the whole DBI package (mock it up) for the tests - but a more elegant solution is to move the database connection creation out of the MyClass code:
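(A sketch of that move; the parameter name is just for illustration.)

package MyClass;

sub new {
    my ( $class, %args ) = @_;
    # the connection is created elsewhere and simply passed in
    return bless { dbh => $args{dbh} }, $class;
}

# ... and somewhere near the entry point of the program:
my $dbh    = DBI->connect( $dsn, $user, $password );
my $object = MyClass->new( dbh => $dbh );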



Then, to test the MyClass methods that don't use the dbh field, you could pass undef there and not bother with setting up a database at all. But it is not only tests that require changes to that code - if, for example, at some point you decide that you want to log all the DBI warnings and add { PrintWarn => 1 } to the DBI->connect call - this would also be easier with this design. Again, just like with the initial example, I haven't seen any theory explaining this, but from my programming practice I am convinced that this kind of change happens much more often, and at different phases of development, than changes to the rest of the MyClass code. Maybe the distinction is between the what, i.e. values (literals or objects), and the how, i.e. methods and subroutines - the what is more expected and easier to change - and to get the advantage of that ease you cannot mix it together with the how.
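A test for such a method could then look roughly like this (Test::More is assumed, and format_name is a made-up method that never touches the dbh field):

use Test::More tests => 1;
use MyClass;

# no database needed: the method under test never looks at $self->{dbh}
my $object = MyClass->new( dbh => undef );
is( $object->format_name('zby'), 'Zby', 'format_name works without a database' );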

The code removed in the last sample is a bit more complicated than in the initial example - but at its core it has the same nature - it is about computing some values that are later used across a part of the program (or the whole program). In the initial example these values are literals, here they are objects. Literals can be created independently, objects depend on other objects: we can have a MyUser that depends on MyClass that (as in our example) depends on the database connection. Fortunately, solving the dependency graph and finding out what needs to be created to have a MyUser object can be automated - this is what the various DI frameworks (like Bread::Board) do. In more strictly typed languages like Java this dependency graph is mostly defined by the types used; in Perl the dependencies need to be declared by the programmer explicitly - but the graph solving works in the same way. This is why this object creation code is (or can be made) quite simple, and for all objects in the same scope it can fit into one package where you can adjust the various cooperating objects just like you do with the values of the configuration variables.
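With Bread::Board such a package could look roughly like this (a sketch - the service names and the MyClass/MyUser constructors are just the ones assumed in this post):

use Bread::Board;
use DBI;

my $c = container 'MyApp' => as {

    # the plain configuration values
    service 'dsn'      => 'dbi:SQLite:dbname=myapp.db';
    service 'user'     => 'myapp_user';
    service 'password' => 's3cret';

    # the database handle is built from the three values above
    service 'dbh' => (
        block => sub {
            my $s = shift;
            DBI->connect( $s->param('dsn'), $s->param('user'), $s->param('password') );
        },
        dependencies => [ 'dsn', 'user', 'password' ],
    );

    # MyClass->new( dbh => ... ), as in the earlier sketch
    service 'my_class' => (
        class        => 'MyClass',
        dependencies => { dbh => depends_on('dbh') },
    );

    # MyUser depends on MyClass, which depends on the connection
    service 'my_user' => (
        class        => 'MyUser',
        dependencies => { my_class => depends_on('my_class') },
    );
};

# Bread::Board walks the dependency graph and builds everything that is needed
my $user = $c->resolve( service => 'my_user' );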

This is my understanding of Dependency Injection - maybe I am still missing something - but I don't see where, in any non-convoluted way, the Injection is happening. There are no new dependencies injected into the code. Maybe Dependency Passing would be more appropriate - as the dependencies are passed as parameters - but most of the work done when performing this refactoring is removing parts of code; parameter passing is the trivial part. Additionally, Inversion of Control, which is used to explain DI in many places, is just a minor characteristic of it, not a very visible one, and very different from other, more common, occurrences of Inversion of Control. Martin Fowler coined the term Dependency Injection specifically to distinguish it from Inversion of Control in general - but I think that even presenting it as an example of IoC is misleading, because it is not very important for understanding what is going on, and it is so different from the Hollywood Principle that everyone thinks about when reading Inversion of Control. Finally, explanations of DI are frequently blurred by the details of the particular DI framework used and by dwelling on constructor injection versus setter injection - i.e. by going into the how before the why is well explained.

I believe that removing hardcoded values is what Dependency Injection is really about. Most programmers already have good intuitions about why this is needed.

2 comments:

NPEREZ said...

I believe you are overcomplicating things a little bit. DI is the idea that your class encapsulation should be so complete, and your objects should really be black boxes, to the point where it doesn't really matter where the constituent pieces come from, only that they behave/type as expected. When the OO paradigm is taken to the proper encapsulation level, DI is simply the natural conclusion that is reached.

A code example illustrates this best:

package Foo;
use Moose;

has logger => (
    is       => 'ro',
    isa      => 'Logger',
    handles  => [ 'log' ],
    required => 1,
);

sub frobinate {
    shift->log('frobinate');
}

Foo doesn't care much about the logger, only that it is provided. It obviously depends on the logger being there when frobinate() is called, but it doesn't worry about constructing the logger. When you build your classes in such a fashion it becomes much easier to mock and test them. Your object graph becomes much cleaner.
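For instance (a rough sketch - MockLogger is made up here just to stand in for a real logger):

package MockLogger;
our @ISA = ('Logger');   # satisfies the 'Logger' type constraint on the attribute
sub new { bless { lines => [] }, shift }
sub log { my $self = shift; push @{ $self->{lines} }, @_ }

package main;
my $logger = MockLogger->new;
my $foo    = Foo->new( logger => $logger );
$foo->frobinate;
# $logger->{lines} now holds ['frobinate'] - no real logging infrastructure needed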

Instead of looking at it as refactoring or removing code, look at it as the class simply being written as if some external resource will be provided and available for use. The complexity of your class will collapse. This also makes it easy to define the boundaries of responsibility between the pieces in your program.

TL;DR
DI is the natural result when proper OO encapsulation occurs.

zby said...

Yeah Moose takes over 'new' and thus encourages DI, but people do find workarounds by using some 'init' or 'setup' methods or using BUILDARGS or BUILD. This is material for another post.

As to whether DI follows from complete class encapsulation - that is entirely possible, but I think my approach is a bit more concrete and the explanation goes deeper.