External data in spreadsheets

There are 2 common ways to get external data into a spreadsheet. Either via a worksheet formula with criteria, or via some refresh style command that sets cell values.

Essbase mainly uses the second, this overwrites cell contents with a value from the datasource. Essbase is pretty smart about it and you can intermingle formulas in some cells with data values in others.

Some other data sources provide a worksheet function like “GetExtData(…)” where you pass it some parameters and it goes off and queries the data source and returns a result. I think Excel 2007 has this for returning info from AS cubes (CUBExxx functions).

I’ve always been fairly neutral about which approach is better. But in researching my paper for Eusprig this year I came across a formula that was something like:

=GetExtData(…) + 180,000,000 + 5,968,364.

I.e. manual adjustments in the same cell as the audited ‘correct’ figures. The effect of this is that a ‘refresh’ may not be as thorough as some users might expect. My immediate concern is that those hardcoded numbers maybe a hold over from a prior periods reporting and be no longer valid. Thats not a theoretical concern, I have seen that happen in the real world.

I guess this could be avoided by effective build practices, but they would still need checking.

For me this ‘feature’ is enough to make this approach second best in my view, and I now think I prefer the ‘overwrite-cell-values’ approach (xlSet). I guess to an extent it depends on usage, if you are just reviewing your shopping list then who cares, if you are reporting billions of pounds to the regulator, maybe you need to be more concerned?

I see this as a spreadsheet builders choice thing, rather than a criticism of external data providers, I think most offer both options anyway. I’m thinking my advice in future will be to prefer the ‘set-cell-values’ approach for important data.

Do you have a preference? Or a good reason to prefer the formula based approach?



3 Responses to “External data in spreadsheets”

  1. Harlan Grove Says:

    Re: =GetExtData(…) + 180,000,000 + 5,968,364

    This is the quintessential spreadsheet problem. If you’re not free to make necessary ad hoc changes, the tool isn’t practical. If you’re undisciplined about how you make those changes, the tool isn’t reliable.

    The goal is to make ad hoc changes in a disciplined way. In this case, something along the lines of using an exception table with ReportDate and CellRef key fields, a value field (for this particular cell, = 180,000,000 + 5,968,364), AND a MANDATORY description/explanation field. Then use either a clever lookup formula or a conditional sum to add the exception FOR THE PARTICULAR ReportDate to the GetExtData value. If there’s no match against the particular ReportDate or CellRef in the exception table, return 0. Extra credit for generating report footnotes based on the relevant entries in the exception table.

    Is this common practice? Probably not. Should it be?

    I imagine the rest of you know how to generate formula listings and check them for patterns like + or – followed by numeric constants greater than a particular magnitude or * or / followed by any numeric constants. This is a necessary pre-release task.

  2. Charlie Says:

    I was thinking the same thing as Harlan, in that the problem you connect to getting external data is likely separate and really just the age-old problem. Your approach would not work with some of my clients in that they sometimes refresh their external data several times during the reporting period, and would not be happy to find their corrections erased by the data refresh. As such, they would find another place to put the corrections which would not solve the problem, just move it.

    I like Harlan’s suggestion of an exception table, but I wonder if my clients would follow it – I work hard at just trying to get them to highlight the cell with a different background colour.

    I really like the idea of checking all formulas before a release, although this would need to be automated and run by my clients during their processing. It would catch some overwrites, but not ones where they have changed the formula to pick up another cell value, or if they replace the formula with a constant – both of which I have seen

    As to which style is better, I think it depends – sometimes one needs a good chunk of data and so the set cell values approach would be preferred, but other times, a lookup into the database by formula is best.


  3. Biggus Dickus Says:


    This is not a simple question. I refuse to work on spreadsheets that import data from corporate databases unless the file is Protected completely. I either refresh all the values on the spreadsheet (using ADO) whenever they run an update or I create a link between the cells where the values are displayed and the actual values (which reside in a hidden worksheet). I have a way to do that which would probably be a good article or technote somewhere.

    If it’s absolutely necessary to revise these values I put an unprotected cell (or column of cells) and then add the two together to produce the final value.

    Users typing + or – and adjusting a formula are not allowed.

    This is not because I am mean (ok I am a little mean) – it’s simple data integrity and auditing. Remember SOX ? I do everything as if there is a SOX auditor about to descend on me ;-)..

    It also saves me a bunch of stupid phone calls when they forget that they altered the formulas.

    You CANNOT allow this !!

    Mean Dick …

Leave a Reply

Please log in using one of these methods to post your comment:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

This site uses Akismet to reduce spam. Learn how your comment data is processed.

%d bloggers like this: