ETL tools

I’ve looked at a few ETL (Extract Transform and Load) tools, like what was Data transformation services in SQL server (DTS) and Hyperion Application Layer (HAL).
I’m sure they are very good, but I’m not totally clear what benefits they offer over normal programming.
As far as I can tell:

Easier to pick up than programming(???)
Highly focused on manipulating data in various sources.
Easier to create user modifiable (parameterised) packages?
good integration with the data destination product.

Still need significant training to use them
Can only really be used for manipulating data
Vendor lock in

In the past I have used Excel/VBA as a ‘poor mans’ ETL tool, and have used Access for plenty of data cleaning exercises.
I recently replaced/extended a HAL process with a custom C# command line app, which seemed easy enough, so I wonder if I’m missing something about why these tools are better than custom .net apps.

Here is the C# to process a text file, replacing all occurences of ‘a’ with 0, it seems pretty quick, taking just a few seconds for a 17 Mb text file on my old laptop.
I probably should have passed in the strings – version 2 maybe.

using (StreamReader sr = new StreamReader(sFileName))
     string s = sr.ReadToEnd();
     using (StreamWriter sw = new StreamWriter( fixedFileName(sFileName)))
      sw.Write( s.Replace(“a”, “0” ));

I’m sure there is a grep like equivalent that will do this is a single line, but 3 or 4 lines in a general tool like C# is good enough for me, and easier than finding that command line app. Likewise via the gift of ADO doing all the DB stuff was only a few lines of code for each operation.

Do you use ETL tools? If so why? what benefits do they offer over ‘standard’ programming?

Maybe its a question of using general development skills such as programming, ADO, SQL etc versus using one (ETL) vendors ‘simplification’ of those things?




8 Responses to “ETL tools”

  1. Harlan Grove Says:

    General tool? One could claim C is as general a tool as C#. Then there are the standard POSIX tools. For your example,

    sed “s/a/0/g” inputfile > outputfile

    I’ve been using Unix-like tools so long that I just couldn’t adapt to overly specialized ETL tools. That said, does Monarch count as an ETL tool?

  2. Ross Says:


    Simon, I was doing this type of thing (in VB.Net though) just yesterday.
    To start off with i thought, .net must have a classs to do all this stuff, then i googled for it and after a bit of looking around found this:

    Which I tried, then i thought, bugger that, I just do it with bog standard code – i read what the chap ad to say, but for me i could really see the point.

    I have some code now that runs though (puts in array) a 8000 x 2 CSV in about 0.0000001 seconds – not tried it with bigger file – but i think it will be fine.

  3. Marcus Says:

    I love the smell of irony. I’m in the middle of a migration project at the moment, moving a Monte Carlo simulation model from Excel/Access to a production environment as the technology utilised does not meet the bank’s standards. For ETL the project is using Informatica. For the Informatica developer I wrote a VB6 DLL to extract and cleanse the data received in Excel workbooks as Informatica (apparently) wasn’t up to the task.
    It took the business about 3 months to get a suitable Informatica resource (they’re a bit light the ground at the moment).
    Data Types – it seems to see everything a strings.

    I try to use Access and queries (SQL) for ETL, otherwise I’ll use Excel or VB6. I have worked on a project where the IT department coded the ETL functionality in C++. While it took a while to code, performance was critical. For some other “production environment” projects, we’ve use DTS.

  4. Simon Says:

    Thanks Harlan, I had a feeling you would know the command.
    Marcus, Informatica is one that is being considered, glad to hear there may be some ‘opportunities’ there!

  5. Will Riley Says:

    Well we use SSIS (the replacement for DTS in SQL 2005)

    I have no issues with it. As a non-programmer, I find it does everything I require in an extremely efficient & intuitive manner. Most of our source systems are Oracle, yet the BI stack is a mixture of SQL 2000 & SQL 2005 – SSIS is the most efficient tool we tested for getting data from the source systems to the SQL BI Data Warehouse….

  6. charlie murphy Says:

    I just finished a project using the pervasive etl tools and was pretty pleased.

    I have not used any others etl tools but I could write an entire book about how this was the best thing I’ve ever discovered.

    Just not having to custom code saved me days of work on that one project. Going forward I will try another brand, not becasue anything was wrong with pervasive just that I want to try somebody else too.

  7. Runrig Says:

    My opinions about ETL tools can summed up and linked from here:

    I once had a talk over lunch with a manager about how, for our department’s needs, custom programming/scripts would be simpler, faster to develop, more maintainable, cheaper, etc. Every reason he had for not using custom scripts came from a position of not understanding of what was possible with them (you can still connect to every database, XML source, flat files, even spreadsheets), and his points were easily refuted. Except for one: that with an ETL tool, the developer is more replaceable.

  8. Jim Says:

    HI Simon, since you used access access for data cleaning process can you link me with some notes how access works as an ETL tool? I know access is Not an Etl tool but it has some characteristics of an ETl tool and thats what I am inquiring about…

Leave a Reply

Please log in using one of these methods to post your comment: Logo

You are commenting using your account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: