Getting the data right

I was chatting with a fellow dev recently.

He said that for him the most important first step (in design/development rather than analysis) was to get the data right.

I started off disagreeing, suggesting the most important first step was proving the concept will even work. But during our discussion, and thinking about it afterwards, I decided that I agree with him. To an extent, anyway.

My new most important first step is to be clear about how you will check that the data is correct. The system may work in concept, but if you can’t prove your data is correct then you can’t prove your system is correct, and it is therefore not a right lot of use. I use prove and correct here in the traditional business consulting sense rather than the legal one. (correct = near enough to blame the old system, prove = no one will find out before you are out of there, invoice paid.) ;-)

We were talking about a standard kind of system: load data to a server DB, run some queries, extract to Excel reports.
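For that kind of load-then-report system, one way to be clear up front about how the data will be checked is a reconciliation step that compares control totals between the source extract and what actually landed in the database. The following is only a minimal sketch under stated assumptions: the `deals` table, its columns, the sample rows and the rounding tolerance are all hypothetical, and SQLite stands in for whatever server DB is really in use.

```python
import sqlite3

# Hypothetical source extract: (deal_id, amount) rows as received from upstream.
SOURCE_ROWS = [("D1", 100.0), ("D2", 250.5), ("D3", 75.25)]


def load(conn, rows):
    """Load the source rows into a (hypothetical) deals table."""
    conn.execute("CREATE TABLE deals (deal_id TEXT PRIMARY KEY, amount REAL)")
    conn.executemany("INSERT INTO deals VALUES (?, ?)", rows)


def reconcile(conn, rows, tolerance=0.005):
    """Compare row count and amount total between the source and the loaded table."""
    db_count, db_total = conn.execute(
        "SELECT COUNT(*), COALESCE(SUM(amount), 0) FROM deals"
    ).fetchone()
    src_count = len(rows)
    src_total = sum(amount for _, amount in rows)
    return {
        "count_matches": db_count == src_count,
        # Allow a small tolerance for floating-point rounding (assumed value).
        "total_matches": abs(db_total - src_total) <= tolerance,
    }


conn = sqlite3.connect(":memory:")
load(conn, SOURCE_ROWS)
result = reconcile(conn, SOURCE_ROWS)
print(result)
```

The same two numbers (row count and a control total) can be repeated on the Excel side, so each hop in the chain has something to be checked against.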

What would your most important first step be? (Assuming you have some idea of what the users want/need, of course.)



3 Responses to “Getting the data right”

  1. Patrick O'Beirne Says:

    Not so much a first step as a parallel process that needs to go on while the other analysis and development goes on.
    Because Excel is so tied to the structure of the data through pointing references, changes in layout, order, detail, field formatting, etc. can all have large implications for the development effort.
    At the same time, the data quality – correctness, completeness, uptodatedness (ok, ‘currency’ for the purists) – needs to be verified from the start. I remember a struggling project which had not faced up to the effort of mapping old product IDs to new ones, and the Excel wizard had to spend most of their time looking for matches by numeric values in time series.

    Click to access information-and-data-quality-in-spreadsheets-obeirne.pdf

  2. sam Says:

    Incorrect data is not a problem... I have never worked on a project where the data was completely error free.

    More important is the data structure.

    If the data is in a DB structure, the system can be built and inaccuracies can be sorted out later on.

  3. fastexcel Says:

    I have lost considerable grey hair over this unmentionable system which has:

    – 2 different data sources for IDs (deal IDs and Project IDs)
    – customer repeatedly insists there is a one-to-one relationship between deal IDs and Project IDs, but of course that turns out not to be true in practice
    – And by the way one customer can have many deals, and a deal can also span many customers
    – and we are on the 12th attempt by the customer to tell me how to calculate accruals etc
    – and the business logic/data structures change every few months but the system has to track history and forecast covering 4 years
    – and the customer people responsible seem to change every 6-9 months

    The Excel bit is relatively straightforward but the DB bit is fairly horrendous.
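A claim like the one-to-one relationship between deal IDs and Project IDs mentioned in the last comment can be tested against the data before anything is built on it. This is only an illustrative sketch: the function name, the pair format and the sample IDs are made up, and real data would come from the two ID sources rather than a hard-coded list.

```python
from collections import defaultdict


def one_to_one_violations(pairs):
    """Return the deals linked to several projects, and the projects linked to several deals."""
    projects_by_deal = defaultdict(set)
    deals_by_project = defaultdict(set)
    for deal_id, project_id in pairs:
        projects_by_deal[deal_id].add(project_id)
        deals_by_project[project_id].add(deal_id)
    deals_multi = {d: sorted(p) for d, p in projects_by_deal.items() if len(p) > 1}
    projects_multi = {p: sorted(d) for p, d in deals_by_project.items() if len(d) > 1}
    return deals_multi, projects_multi


# Hypothetical (deal_id, project_id) pairs; D2 and P3 each break the one-to-one claim.
PAIRS = [("D1", "P1"), ("D2", "P2"), ("D2", "P3"), ("D3", "P3")]

deals_multi, projects_multi = one_to_one_violations(PAIRS)
print(deals_multi)
print(projects_multi)
```

If both results come back empty, the relationship holds for the data checked; anything else is a concrete list of exceptions to take back to the customer.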
