Archive for January, 2007

Spreadsheet lifecycle (long)

Wednesday, 31st January, 2007

A few comments have touched on parts of this, so I thought I would post what I have seen, and what I think could be improved.

Many important commercial spreadsheets I see seem to go through the following lifecycle (at least once):

  • A requirement becomes apparent
  • Previous work is adapted if possible
  • If not, a new model is built, rarely completely from scratch
  • The results are reviewed for reasonableness by the business person who developed the model
  • It’s used live, usually with gradually reducing scepticism
  • Other users feed back corrections and enhancements
  • Eventually abandoned as business needs change

This is fairly close to the standard Systems Development Life Cycle but a few elements are missing to my mind.

There is often very limited effort around developing useful requirements, and rarely any initial design. There is also rarely any sort of conscious technology choice: it was always going to be in Excel. A bit of effort in initial design may often highlight the fact that a spreadsheet may not be ideal. I remember one Excel-based unit costing system that was never going to work. A simple migration to Access and the whole thing was almost trivial. And not a 3-line multi-nested array formula in sight. I accept the whole skill set argument – but that is why short term flexible consultants and contractors exist (and training courses).

We are always encouraged to reuse previous work, and in many cases reuse is a Good Thing. But what if you didn’t do the necessary due diligence to check that what you are reusing is both correct and appropriate? What if by dragging in old stuff you bring in a couple of hundred broken names and a load of useless styles, or some crazy VBA? (Did I ever tell you about the guy who trapped Ctrl+C to trigger some VBA that deleted a load of stuff on the active sheet? When I did a copy on his machine it killed my spreadsheet.)

Reviewing the results for reasonableness is unlikely to generate much valuable insight. If you ‘check it’s OK’, you’ll usually find it is; if you look for the errors or the weak points, you’re far more likely to find something worth correcting. I prefer to test with meaningless data so I can focus on the logic. A final review does no harm, unless it is in place of proper testing. Proper testing involves people other than the developer.

Using your customers as beta testers is fine, as long as they know it. Doing so without warning them is likely to reduce your credibility. I often worry that consumers of spreadsheet systems are a bit gullible. I prefer to have a developer pick my work apart as they tend to be more thorough; a real tester might be even better. As long as it’s formatted nicely and not too blatantly wrong, customers seem to have little difficulty accepting spreadsheets they are given.

As ballpark figures, the stuff I work on takes 2-3 months to develop, test and deploy. Its lifetime is roughly 2-5 years. There is a ton of info available that is useful for those early months, but almost none that covers how to manage a live spreadsheet based system (a phase that is probably 10 times longer than the development). I would suggest that most of the real world errors are caused by poor version control causing a correct but inappropriate version of a workbook or worksheet to be used. Abandonment (ie leaving it on the network, but not using it) rather than graceful retirement (archiving, maybe leaving a shortcut in the original location) means there are normally more wrong versions to choose from than correct ones. Anyone else ever run last year’s data set with this year’s balances to produce almost-believable nonsense? I think a decent SharePoint implementation would go a long way to solving these version issues.

I tried to warn one client about having multiple copies of very similar spreadsheets; he cut me off with ‘Disc space is cheap, I don’t care about redundancy’. As I was explaining the reconciliation nightmare that would follow, and the unnecessary risk he was carrying, he was already out the door. (Maybe I should have gone on a Dale Carnegie Win Friends and Influence People course?)

[I keep wondering about productionising, as a commercial venture, the tatty version-compare tools I use. Would anyone buy such a tool (if it was any good, of course!)? Would you prefer an add-in or a standalone exe? Would you swap a relative for one, or just want to swap some (secondhand) chewing gum? Anyone using something similar already? I’ve seen a few on-line, but they don’t seem to do what I want.]

Anyone else had similar experience? Think I’ve missed anything? Think I’ve got it wrong? Anyone got a better approach than warnings that fall on deaf ears?

Cheers

simon

comments

Tuesday, 30th January, 2007

Hi

Just to let you know, a few people’s comments have been marked as spam – no idea why (even my own have been caught!).

I check regularly and will release them when it happens. If there are several that are pretty much the same and from the same author, I’ll just release one, unless anyone has any objections?

Sorry about this – I haven’t done anything clever, it’s all default settings – so if anyone has any suggestions, let me know. Otherwise, if your post goes through OK but doesn’t show up, feel free to drop me an email at simon.murphy@codematic.net and I’ll log on ASAP and sort it out.

Incidentally, how do you recover a previous comment? I got halfway through writing something at DDOE at the weekend, touched the wrong part of my mousepad and navigated away; going back did not work. Is it a browser thing?

 cheers

Simon

Spreadsheet auditing (1 of many)

Tuesday, 30th January, 2007

I went to a Eusprig conference a few years ago (highly recommended btw (I’ve been going ever since)) and was struck by the quoted error rates in spreadsheets. The evidence seemed overwhelming – 90-100% of all spreadsheets had material errors, and a lot of it was to do with developer overconfidence. Like Catch-22: if I believed the figures, my life’s work was riddled with errors (time to renew my professional indemnity insurance!); if I didn’t believe it, that was overconfidence – so then I’m a cocky error monger! Blimey!

To slightly misquote: have you stopped building error ridden spreadsheets yet? Yes or no?

  • Yes (I used to but have stopped) or
  • No (I am still building error ridden spreadsheets)

My biggest issue was: how could 100% of spreadsheets have findable ‘errors’? I can believe 100% of spreadsheets could be improved in the quality department, but not that 100% have errors so obvious that even someone unconnected with the initial development could spot them.

XLAnalyst was my attempt to find out. The linked version there is getting on a bit now, and newer versions are on their way (as is a web site update). I took a look at the auditing tools around at the time, some of the reported horror stories, and my own experience. I then came up with a list of things that a workbook may contain, and what that might mean for the risk of errors. What I wanted was a spreadsheet risk assessor, so I could point it at a file and see if it contained risky constructs (like 3 argument lookups and external links). A key criterion was speed of analysis. I was trying to get a sense of the quality of the spreadsheets rather than some kind of errors? yes/no oversimplification.
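To give a flavour of the sort of scan I mean, here is a rough VBA sketch. To be clear, this is not XLAnalyst itself, just an illustration of the approach – the ‘risk’ tests are deliberately crude:

Sub QuickRiskScan()
    'a minimal sketch of a formula risk scan - the checks are simplistic on purpose
    Dim ws As Worksheet, cell As Range, formulaCells As Range
    Dim totalFormulas As Long, externalLinks As Long, arrayFormulas As Long

    For Each ws In ActiveWorkbook.Worksheets
        Set formulaCells = Nothing
        On Error Resume Next                'SpecialCells errors if the sheet has no formulas
        Set formulaCells = ws.Cells.SpecialCells(xlCellTypeFormulas)
        On Error GoTo 0
        If Not formulaCells Is Nothing Then
            For Each cell In formulaCells
                totalFormulas = totalFormulas + 1
                'crude test for a link to another workbook
                If InStr(cell.Formula, "[") > 0 Then externalLinks = externalLinks + 1
                If cell.HasArray Then arrayFormulas = arrayFormulas + 1
            Next cell
        End If
    Next ws

    MsgBox totalFormulas & " formulas scanned" & vbCrLf & _
           externalLinks & " appear to link to other workbooks" & vbCrLf & _
           arrayFormulas & " are array formulas"
End Sub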

The idea was that people would try XLAnalyst (it’s free (no cost), never expires etc), and maybe submit back the short text string containing the result of each test (but no identifying info). I was then hoping to present the findings to Eusprig. The proportion of people who submitted data back was less than 0.5%, which made the results statistically insignificant. In fact I think I had more hack attacks than data submissions.

But hopefully it was of some value to the people that used it; I know of a few cases where it spotted a real life error in production models, which is good. I downloaded over 500 workbooks as part of testing XLAnalyst, but in general they seemed pretty trivial compared to the monsters I get to deal with. I may try again soon, but the ones in the public domain don’t seem to be that juicy. I’d quite like to get my hands on some government ones using the Freedom of Information Act – anyone know how (UK specific I think)?

The Dartmouth team have been giving XLAnalyst a good run out, and were going to modify some of my baseline risk weightings. I have not heard yet what their conclusions were/are.

I think Don’s finding of around 14% is probably realistic. I’ll post another time about why either 100% or 14% will be hard to prove or disprove as a general estimate.

Anyone here found a big error in one of your spreadsheets? Or had someone else find one? (I have (both options – I prefer the first!))

cheers

Simon

Links to external workbooks

Monday, 29th January, 2007

I have never been a fan of linked workbooks, here are some of the reasons:

  1. You can’t easily be sure that every workbook in the linked chain was updated in the right order
  2. Excel can’t warn you of circular references through closed workbooks
  3. Workbook links can lock you into a specific file and folder structure
  4. People never seem sure whether they should update links when asked on opening or not – more confusion.
  5. If the source file is not found, error messages can be confusing
  6. If the source workbook changes (eg a row gets inserted) your linked formulas might end up pointing to the wrong place, and returning incorrect data.
  7. In certain circumstances you get to pick the replacement file for every linked formula (could be hundreds of times) – this sometimes leads to crashes.
  8. Changing file and path names, or locations, can burst formula length limits, making formulas uneditable
  9. They just seem to be a bit too fragile and unauditable.
  10. There is rarely an agreement as to what the source data was created for, what it is appropriate for, and how stable it is.
  11. Links are one way – there is no (easy) way to know who has linked to your workbook – this is a big danger if you need to make significant changes
  12. They seem to rapidly degenerate into a rat’s nest of links, half of which always end in ‘book1’ or file not found. (I’ve seen over 100 links feeding one workbook, directly and indirectly)
  13. You can’t stop them (assuming people can read your data, they can link to it)
  14. The big one – see below.

The big reason why I don’t like links finally became clear recently:

I don’t like linked workbooks because it indicates a lack of investment in proper information systems, or training in existing systems.

I have found that such a lack of investment shows through in many places in organisations that have lots of linked workbooks. Likewise, spreadsheet quality tends to be low and neglected in such organisations, and general understanding of basic development, design and risk is low to non-existent.

My preference is to use Excel as a thin presentation and analysis layer for information workers to interact with corporate information in a meaningful way. To do that, the spreadsheets these analysts use really need to link to reliable, flexible data stores. Well built relational databases usually fit this bill, as do Essbase databases and Analysis Services cubes. Half finished Excel workings do not. And let’s be honest, very few Excel workings ever reach ‘finished’ (if you define that as tested, proven and assured).

I still occasionally use links if I have to, but generally I prefer to use a VBA import routine with date, time and user ID stamps. But even that is making the best of a bad job; what is really needed is a commitment to information that is fit for purpose, and that means getting it directly from a reliable source.
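For what it’s worth, here is a rough sketch of that kind of stamped import. The file path and sheet names are made up for illustration – the point is the values-only paste and the provenance stamp, not the specifics:

Sub ImportSourceData()
    'all paths and sheet names below are hypothetical
    Const SOURCE_FILE As String = "\\server\share\SourceData.xls"
    Dim src As Workbook, destSheet As Worksheet

    Set destSheet = ThisWorkbook.Worksheets("ImportedData")
    destSheet.Cells.Clear

    'bring the data in as values - no live links back to the source file
    Set src = Workbooks.Open(Filename:=SOURCE_FILE, ReadOnly:=True)
    src.Worksheets(1).UsedRange.Copy
    destSheet.Range("A1").PasteSpecial Paste:=xlPasteValues
    Application.CutCopyMode = False
    src.Close SaveChanges:=False

    'stamp the import so the provenance travels with the workbook
    With ThisWorkbook.Worksheets("ImportLog")
        .Range("B1").Value = Now                   'date and time of import
        .Range("B2").Value = Environ$("USERNAME")  'who ran it
        .Range("B3").Value = SOURCE_FILE           'where it came from
    End With
End Sub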

Anyone got some more bad things to say about links that I missed? Anyone want to defend them? Am I reading too much into this?

I think my favourite data source currently is Essbase (I haven’t used Analysis Services for a while) – what’s yours?

cheers

Simon

Worksheet protection

Friday, 26th January, 2007

An awful lot of people advise us to protect the worksheets in our workbooks.

I take the opposite approach: I think worksheet protection should be avoided in almost all cases, and here is why:

1. it breaks all the auditing features

2. it prevents people checking your work

3. some users take it as an insult

4. it stops the data validation ‘circle invalid data’ feature from working

5. it lulls you into a false sense of security

6. it is trivial to bypass – often just copying the cells to a new sheet and swapping sheets does it, well within the capabilities of many users. If not, VBA routines to remove it (and provide the password so you can re-protect it to hide your tracks) are freely available via Google. Worksheet protection is not a security feature, it’s a usability thing – and that’s the MS view, not just mine.

I think you should design spreadsheets so it is hard for people to get things wrong. If you really do need to protect your work, or your customers from themselves, then you should use something other than Excel, like Access forms, or VB6, or .net.

If you are going to protect a worksheet, you should probably protect them all and protect the structure, to make it fractionally harder for people to bypass.
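If you do go that route, it is only a few lines of VBA – something along these lines (the password is obviously a placeholder, and remember this is a usability aid, not security):

Sub ProtectEverything()
    Const PWD As String = "placeholder"
    Dim ws As Worksheet
    'protect every sheet, then the workbook structure
    For Each ws In ThisWorkbook.Worksheets
        ws.Protect Password:=PWD, UserInterfaceOnly:=True
    Next ws
    ThisWorkbook.Protect Password:=PWD, Structure:=True
End Sub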

What do you think – are you a protection fan or not? And why?

cheers

Simon

International string merchant

Thursday, 25th January, 2007

String merchant offers 2 different types of string deal:

one costs 5 pounds

one costs 10 pounds

Which should you buy?

Do you feel you have enough info to make a sensible decision? How long is a piece of string?

According to the recruitment consultants that contact me, the answer is yes, that is plenty of info: buy the 5 pound one. It doesn’t matter how long the piece of string is, nor do any possible quality issues matter – even if the 10 pound piece of string is 10 times as long as the 5 pound one. (Good developers are at least 10 times (up to 28 times) more effective than the average.)

I normally work on a value added basis rather than a day rate, as I prefer to work in close partnership with clients, and this shared risk/reward model supports that better. Every now and then I do day rate stuff.

Anyway, an agent phoned me recently, asked my day rate, then said the client would only pay half that. OK, I said, I’ll work half as fast – how’s that? He couldn’t see the irony. I always work as effectively as I can: I leverage all my training and experience, all my codebase, all my contacts, and all my favourite, proven on-line resources in providing a high quality service for all clients. I believe that is worth something, and I very much prefer to work with clients who also think that way.

Low price rarely equates to value for money. Especially in software, where the outputs are so intangible.

Anyone else have this sort of issue?

Anyone got any suggestions how to improve the situation ?

 cheers

Simon

My best boss

Tuesday, 23rd January, 2007

I have been a self employed software developer specialising in Excel and VBA since 1996. Before that I did manage to hold down the odd proper job or two.

My best (and last) permanent job was for a Leeds based brewery. As a Yorkshire lad, Tetley’s would have been ideal; unfortunately I ended up at Bass (you would have to be a Yorkshireman to understand!). But it was not all bad news: for a start I had a superb boss, and also Bass introduced Hooch, Staropramen and Carling Premier – all great ways to get drunk on expenses (officially called ‘sampling’ – damn hard, working so late on Friday and Saturday nights!). Overall it worked out brilliantly.

Anyway, back to the point, my great boss, and what made him so great.

He educated me in the art of checking my work.

Every time I brought him a new report, or a new version of a report,

“Have you checked the numbers?” he would ask.

“Yes,” I eventually learnt to say.

“What to?” he would ask, and I would explain where and how I checked it. I don’t remember him ever checking my work himself, although I’m sure he did. (Trust, but check!)

His approach was: you’re a professional, it’s your work, here are the consequences if it’s wrong, you take responsibility for it. Of course the odd blunder still slipped through, but I don’t think I have worked in many places since with such a good quality record. Our customers had a high level of confidence in the information we gave them to manage their business.

Now I’m not a massive fan of testing, or at least not a fan of having a late testing phase, or a ‘this will get sorted in testing’ type of attitude. But one secondary effect of knowing I was going to have to test my work was to make sure my work was easy to test. Work that is easy to test I am a huge fan of: I hate complex, I hate ‘clever’, I love clear and simple, step by step.

I still think certain types of testing are a bit like locking the stable door after the horse has long gone. In mainstream software, those companies that follow a waterfall methodology (there still are some – someone even thought this was real: http://www.waterfall2006.com/) leave themselves wide open to quality issues that do not surface until very late in the project. What if you turn up on test Monday 3 weeks before delivery, only to find that what has been developed is almost untestable?

More modern approaches to software development favour early, regular unit tests, and in fact this is how I tend to work. I don’t go as far as full test driven development, but I do decide the test, perform the operation, then build and run the test.
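In a spreadsheet context that can be as simple as pushing known inputs into the model and checking the output cell against a value you worked out beforehand. A rough sketch – the sheet, cells and expected value here are invented for illustration:

Sub CheckMarginCalc()
    Dim ws As Worksheet
    Set ws = ThisWorkbook.Worksheets("Model")   'hypothetical model sheet

    'decide the test first: with these inputs the margin in B5 should be 0.25
    ws.Range("B2").Value = 100   'sales
    ws.Range("B3").Value = 75    'costs
    ws.Calculate

    'run the test
    If Abs(ws.Range("B5").Value - 0.25) > 0.0001 Then
        MsgBox "Margin check FAILED: got " & ws.Range("B5").Value
    Else
        MsgBox "Margin check passed"
    End If
End Sub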

In my experience spreadsheets that do not have quality and testability built in are very difficult and very expensive to check for reasonableness, never mind to test thoroughly.

I believe the target should be in-built quality. The vehicle to get us there may be testing or inspection, and I think testability is possibly the same thing as quality – it’s certainly a part of what I see as quality.

I used to think developer skill was critical; now I’m beginning to think that as long as they have the ability to build simple, testable models, and confidence in line with their abilities, then that’s enough. There are other benefits to using highly skilled developers, but I’ll carry that forward to another post.

One recent client implemented peer reviews on all important spreadsheets. That is an excellent idea. I don’t think the review itself is the key part; I think building a model knowing someone else will test it is the critical factor here. Anything spotted during the review I would see as a bonus.

Do you agree?

What do you think are the key factors in reducing the risk of spreadsheet errors during development? (Let’s cover the rest of the lifecycle later.)

cheers

Simon

The joy of text (long)

Monday, 22nd January, 2007

It dawned on me recently why cell text formatting (.NumberFormat = “@” in VBA) bothers me so much.

If you know a bit of VBA, then thinking of a cell value as a variant is near enough to be helpful. A variant is a chunk of computer memory that is big enough to store any of the main data types, plus something to tell the computer how to treat the value (eg if you read the value 65 as a number it’s… 65, but as text (ASCII) it’s the letter A).
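You can see the same value read both ways from the VBA Immediate window:

Debug.Print Chr(65)    'prints the letter A - 65 read as a character code
Debug.Print Asc("A")   'prints 65 - the character read back as a number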

If you follow Thomas the Tank (probably because you have kids – but not necessarily) this toy box may help:

[Image: Thomas the Tank Engine toy box as an xloper]

Imagine the toy box is the cell, and it’s subdivided into sections to hold each of the possible data types (I only had slots for 4; there are others).

The fat controller is responsible for putting the values you enter into their correct place. Excel can then easily and quickly evaluate the value in the right context. For example, if you put some sort of cell reference or formula in a cell (as in Harold the Helicopter above), Excel goes and gets the value from that cell for you.

The fat controller also makes sure that only one of the boxes is populated – you can’t have a value that is simultaneously (stored as) a number and a piece of text.

Changing the cell format from any (non-text) number format to any other (non-text) number format does not affect the underlying value. The issue with text formatting is that it does change the underlying data. Formatting a cell from a number to text takes it out of the number slot and puts it in the text slot (with Thomas). That means it may not be included in number-based calculations correctly, as those often only look in the number slot.

Try this

  • Open a new workbook
  • Enter 1 in A1
  • Enter 1 in A2
  • In B1 enter = SUM(A1:A2)
  • In B2 enter = A1+A2

All should be well, both formulas should give the result 2

Now format cell A1 as text (the order you do all this matters as Excel (2003 anyway) guesses the format of formula cells based on their precedents).

Now change the value of A1 to 2 and recalculate: B1 becomes 1 and B2 becomes 3, ie SUM and ‘+’ give different results. Which is right? I don’t know – neither, probably. I’d have thought the correct answer would be some sort of error (#VALUE! probably). You just told Excel to treat A1 as pure text, not any kind of number, so it probably shouldn’t let you do any arithmetic on it at all.
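If you want to reproduce this from VBA (using the .NumberFormat = “@” mentioned at the start of the post), a sketch along these lines should do it – this is based on the behaviour described above in Excel 2003, other versions may differ:

Sub SumVersusPlus()
    With Workbooks.Add.Worksheets(1)
        .Range("A1").Value = 1
        .Range("A2").Value = 1
        .Range("B1").Formula = "=SUM(A1:A2)"
        .Range("B2").Formula = "=A1+A2"

        .Range("A1").NumberFormat = "@"   'now format A1 as text
        .Range("A1").Value = 2            'and re-enter the value
        .Calculate

        Debug.Print .Range("B1").Value    '1 - SUM ignores the text "2"
        Debug.Print .Range("B2").Value    '3 - "+" coerces the text back to a number
    End With
End Sub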

Why does this difference happen?

‘SUM’ works purely on the Troublesome Trucks section of the box (number), whereas ‘+’ manages to end up with something numeric from something stored in the text-only slot.

If you change A1 from 2 to ‘a’, the SUM continues to show the result as 1, but the ‘+’ shows a #VALUE! error. Precisely how or why that happens I don’t know, but I think it is buried deep inside the Excel code base. It could be something as simple as the order of some Select Case statement.

An important related fact is that SUM is often faster than ‘+’, and more resilient to non-numeric data types (which may be a bad thing).

As a matter of interest, OpenOffice Calc gives 1 as the result in both cases, even if the 1 is replaced with ‘a’. At least this is consistent, although I think I would rather have an error. But that probably goes against the spreadsheet ethos of relaxed controls. My copy of Gnumeric seems to have disappeared so I couldn’t test that, but I will. The joy of these open source spreadsheets is you can ratch around in the source code trying to understand why things happen the way they do.

Note: if you have Excel 2002 (XP) or 2003 you can ask to be warned about numbers stored as text (Tools>>Options>>Error Checking), and there is one UK school I read about that wished it had (they lost GBP 30K due to numbers formatted as text).

The precise data type used to store data in Excel cells is called an xloper (an xl operator/operand). It is defined in xlcall.h and is a C language data structure called a struct (a user defined type (UDT) in VB). The struct combines 2 pieces of data: the data type and the actual data. The actual data is in a C structure called a union; VB doesn’t have an equivalent, the nearest thing is a variant. This is roughly what it would look like in VB:


Public Type xloper
  lDataType As Long '1= number, 2 = string, 4 = boolean, etc
  vActualData As Variant
End Type

This isn’t strictly correct, as a variant already knows its own data sub-type (the C language VARIANT data type is itself implemented as a struct/union combination), but hopefully it is near enough to give you the idea.

So to sum up then:

Setting a cell’s number format to text is unique because it changes the way the underlying value is interpreted, not just its presentation to the user – probably in a bad way. This post tries to explain the how and the why. I never got the chance to mention Max and Monty, which every Thomas the Tank fan will know are the real trouble causers.

What do you think?

Also do you want to see more technical content like this, or do you want to see more ‘management’ style content, or a mix (this is the current plan)?

cheers

Simon

Training Versus Education

Friday, 19th January, 2007

I saw a great quote recently – can’t remember the source, sorry.

Roughly it was that you can train a monkey, but you need to educate a person. Funny how there are all these technical ‘training’ courses and very few ‘education’ programs.

At most clients I initially do some development to solve the immediate need, then often end up running some 1-2 hour sessions for a few of the internal team, to coach them on some of the more useful and relevant features of their tools. This is followed up by desk-side support as they actually use what we discussed. I have found this works well, and people seem to use the things we worked on. Probably most important from a quality point of view, it gets people thinking and opens a dialogue. When people consider the alternatives, and think a little before diving in, I usually find their work to be easier to understand and test.

My general view is that 2-5 days away from work at a training company can be useful for introductory level stuff. But for more advanced content, it often needs to be set in a work context, and tailored significantly for each individual. Also, the more senior people struggle to dedicate 5 consecutive days to non-operational work activities.

What do you think?

Been on any good training courses? Or had any in-house coaching recently?

Got any good course recommendations?

cheers

simon

Real world Excel

Thursday, 18th January, 2007

Something Marcus said in a comment reminded me of my favourite Excel usage story, by Rory Blyth:

http://www.neopoleon.com/home/blogs/neo/archive/2003/09/29/5458.aspx

Hopefully someone out there hasn’t seen this before. Joel Spolsky (http://www.joelonsoftware.com/) also published it in one of his software books a couple of years ago. Be sure to read the comments too. It’s so sad how true it rings.

Did I mention I saw someone doing their calculations on a calculator and typing the results into the spreadsheet last week?

(I hope it wasn’t a discounted cashflow for 100 years by month)

cheers

Simon