The joy of text (long)

It dawned on me recently why cell text formatting (.NumberFormat = “@” in VBA) bothers me so much.

If you know a bit of VBA then thinking of a cell value as a variant is near enough to be helpful. These are a chunk of computer memory that is big enough to store all the main data types, and something to tell the computer how to treat the value (eg if you read the value 65 as a number its… 65, but as text (ASCII) it’s the letter A).

If you follow Thomas the Tank (probably because you have kids – but not necessarily) this toy box may help:

Thomas xloper

Imagine the toy box is the cell and its subdivided into sections to hold each of the possible data types (I only had slots for 4, there are others).

The fat controller is responsible for putting the values you enter into their correct place. Excel can then easily and quickly evaluate the value in the right context. For example of you put some sort of cell reference or formula in a cell (as in Harold the Helicopter above) Excel goes and gets the value from that cell for you.

The fat controller also makes sure that only one of the boxes is populated – you can’t have a value that is simultaneously (stored as) a number and a piece of text.

Changing the cell format from any (non text) number format to any other (non text) number format does not affect the underlying value. The issue with text formatting is that it does change the underlying data. Formatting a cell from a number to text takes it out of the number slot and puts it in the text slot (with Thomas). That means it may not be included in number based calculations correctly, as they often only look in the number slot.

Try this

  • Open a new workbook
  • Enter 1 in A1
  • Enter 1 in A2
  • In B1 enter = SUM(A1:A2)
  • In B2 enter = A1+A2

All should be well, both formulas should give the result 2

Now format cell A1 as text (the order you do all this matters as Excel (2003 anyway) guesses the format of formula cells based on their precedents).

Now change the value of A1 to 2 and calculate, B1 becomes 1 and B2 becomes 3. ie SUM and ‘+’ give different results. Which is right? I don’t know, neither probably. I’d have thought the correct answer would be some sort of error (#VALUE probably). You just told Excel to treat A1 as pure text, not any kind of number, so it probably shouldn’t let you do any arithmetic on it at all.

Why does this difference happen?

‘SUM‘ works purely on the Troublesome Trucks section of the box (number), whereas ‘+’ manages to end up with something numeric from something stored in the text only slot.

If you change A1 from 2 to ‘a’ the SUM continues to show the result as 1, but the ‘+’ shows a #VALUE! value error. Precisely how/why that happens I don’t know, but I think it is buried deep inside the Excel code base. It could something as simple as the order of some Select Case statement.

An important related fact is that SUM is often faster than ‘+’, and more resilient to non numeric data types (may be a bad thing).

As a matter of interest Open Office Calc gives 1 as the result in both cases, and even if the 1 is replaced with ‘a’. At least this is consistent, although I think I would rather have an error. But that probably goes against the spreadsheet ethos of relaxed controls. My copy of Gnumeric seems to have disappeared so I couldn’t test that, but I will. The joy of these open source spreadsheets is you can ratch around the source code trying to understand why things happen the way they do.

Note if you have Excel 2002 (XP) or 2003 you can asked to be warned about numbers stored as text, and there is one UK school I read about that wished they did (they lost GBP 30K due to numbers formatted as text). (Tools>>Options>>Error Checking)

The precise data type used to store data in Excel cells is called an xloper (an xl operator/operand). It is defined in xlcall.h and is a C language data structure called a struct (a user defined type (UDT) in VB). The struct combines 2 pieces of data; the data type and the actual data. The actual data is in a C structure called a union, VB doesn’t have an equivalent, the nearest thing is a variant. This is what it would sort of look like in VB


Public Type xloper
  lDataType As Long '1= number, 2 = string, 4 = boolean, etc
  vActualData As Variant
End Type

This isn’t strictly correct as a variant already knows its own data sub-type (the C language VARIANT data type is itself implemented as struct/union combination), but hopefully this is near enough to give you the idea.

So to sum up then:

Setting a cells number format to text is unique because it changes the way the underlying value is interpreted, not just its presentation to the user. Probably in a bad way. This post tries to explain the how and the why. I never got chance to mention Max and Monty, which every Thomas the Tank fan will know are the real trouble causers.

What do you think?

Also do you want to see more technical content like this, or do you want to see more ‘management’ style content, or a mix (this is the current plan)?

cheers

Simon

Advertisements

7 Responses to “The joy of text (long)”

  1. Marcus Says:

    Hi Simon,

    Good post – with three boys I immediately understood the Thomas the Tank Engine analogy.

    I’ve certainly seen this phenomenon transpire but more commonly with dates and with text file imports where Excel interprets numbers as strings. Often the culprit is a single apostrophe preceding the value. The quickest solution I’ve found for the ‘numbers as text’ is a quick macro to find and replace every digit between 0 and 9 forcing Excel to reassess the nature of the cell value (I’ve unfortunately witnessed people working their way down a (l-o-o-o-ng) list of ‘numbers as text’ pressing F2 and Enter to force Excel to reassess, one cell at a time.

    P.S.
    I’m happy to see a mix of content although I do have a bias towards more technical content. It’s also interesting learning from other peoples’ “War Stories”.

  2. sam Says:

    Hi Simon,
    Nice post…Its good to see techical content…every now and then…

    Here is are 2 versions of Text 2 Number code- The first is for Multiple Cols, the Second for a Single Col – which uses the little used feature of Text to Coloums.

    Method 1
    Sub Text2Num()
    Dim myRange
    Dim lngctr As Long
    myRange = Selection.Value
    For lngctr = LBound(myRange) To UBound(myRange)
    For c = 1 To Selection.Columns.Count
    myRange(lngctr, c) = myRange(lngctr, c) * 1
    Next
    Next
    Selection.Value = myRange

    End Sub

    Method2
    Sub Text2Num_SingleCol()
    Selection.TextToColumns Destination:=Range(ActiveCell.Address), DataType:=xlDelimited, _
    TextQualifier:=xlDoubleQuote, ConsecutiveDelimiter:=False, Tab:=False, _
    Semicolon:=False, Comma:=False, Space:=False, Other:=False, FieldInfo _
    :=Array(1, 1), TrailingMinusNumbers:=True
    End sub

  3. Simon Murphy Says:

    Marcus
    I just saw NG posting pointing out that this date ‘help’ is new in 2003. I think some biologist/scientist permanatently lost a load of data because Excel converted 12.2 to 12/2 to 39125.
    For text numbers from imports I normally paste special add 0 or multiply by 1. I have seen the same painful cell by cell approach you mention too.
    Sam
    Ive not seen the test to cols approach before.
    These dont fix the underlying text formatting stored in .str rather than the .num part of the xloper though.
    cheers
    Simon

  4. sam Says:

    Simon,
    Dont understand what you mean by
    “These dont fix the underlying text formatting stored in .str rather than the .num part of the xloper though.”

    If you have a column of text numbers, then it gets converted in to real numbers with either of the code… The only difference is that with the second apporach the numbers will continue to be left alinged…

    Try this manually…
    Select a column of Text Numbers, Go To Text To Columns – say Next, Next and come to Step 3 of the wizard. Choose the General option and let the destination cell be the first cell in the selection and say finish….
    You will see that the text numbers have become real numbers (Sum will start working)

    You can covert these numbers back in to text by chosing the Text Option in Step 3 of the wizard…

    Also If you have a buch of text dates

    For Example
    20061225
    20061227
    etc
    You can convert them to real dates by choosing date and “YMD” from the drop down….
    Its also useful if you imports dates in to excel from other applications having a different date style , Ex a “.” Seperator as agaist a “-” seperator in your copy of excel…then these dates will get imported as text dates… and can be converted in to real dates by choosing Date and then the rigth option in the date drop down…

    Regards
    Sam

  5. Simon Herbert Says:

    There was a discussion not so long ago on the Excel L list about numbers stored as text.

    The following snippit quickly removes them:

    With ActiveSheet.UsedRange
    .Value = .Value
    End With

    Of course, this also replaces any formulas with values. If this needs to be avoided the replace the second .Value with .Formula.

    I have recently implemented this on some SAP BW reports I run. The problem is that when you embed a SAPBeX query into Excel the results are returned as text. If you want to compare this to other data (or link into Access) the formatting can cause incorrect / missing results. Adding that code (using resultArea instead of UsedRange if desired) removes this issue.

    (credit to the Excel L list for the original code)

  6. Simon Herbert Says:

    (addition)

    Sorry, forgot to say, with SAP BW you need to add the code to the SAPBEXonRefresh sub.

  7. Simon Murphy Says:

    Sam and Simon
    These approaches may work temporarily, but as long as the xloper internal type is xltypeStr (caused by formatting the cell as text) there is a real risk. I messed around with these and got inconsistent results.
    If you overwrite one of these newly ‘fixed’ numbers with a string, and then overwrite it again with a number, that number does not participate in many calcs. (from my very limited testing)
    To be safe you have to get rid of that text formatting too.

    Simon I think this may be a different issue to what you describe (I must have missed it on Excel-l). I’m talking specifically about cells whose numberformat is text. I know some imports get treated as text initially, which can be fixed as you describe, but those cells are not usually formatted as text, so no risk they will revert to text.

    I’ll post a proper example next week.
    cheers
    Simon

Leave a Reply

Please log in using one of these methods to post your comment:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s


%d bloggers like this: