Data types

Bearing in mind that cells in Excel worksheets are effectively variants is there any point in having an integrated programming language that is typed?

Is the whole ‘Option Explicit’ thing wrong headed for a spreadsheet automation language?

The modern trend seems to be towards dynamic typing, would that be a better option?

I’ve always properly typed all variables (and not with that $# nonsense either). But thats maybe a throwback to other types of dev work I have done.

When working with spreadsheet data in code should we worry about the underlying type?

And if so, should we also worry about it in spreadsheet formulas too?

What do you think?

cheers

Simon

14 Responses to “Data types”

  1. Harlan Grove Says:

    If you’re writing macros, no big deal using nothing but variant variables. OTOH, if you’re writing calculation-intensive functions, there could be a noticeable performance penalty due to type checking.

    I don’t have any urls at hand, but I’d figure this issue has been addressed in the context of scripting languages vs compiled languages a few times by now.

  2. jonpeltier Says:

    Simon – The cells themselves are ranges. Their contents (.Value) are variants. Option Explicit doesn’t care what types you use, as long as you dim all of your variables. I usually use variants to get the cell values into my code, then I do whatever validation I need, put things into an array of longs or doubles, do the math, and spit it back into the sheet.

  3. Johan Nordberg Says:

    As Homer Simpson would say:
    -Datatypes – the cause of – and solution – to all my life’s problems!

    Actually JavaScript 2 probably will get proper typing.

    And you already are worrying about types in worksheet functions. Just look at the vlookup when you try to find a integer in a list of strings even though it seems like “1” = 1.

    Dates very much have a date type in Excel and “2008-04-03” is not the same as 2008-04-03.

    I would actually like more specific data types in the worksheet. If you try to use Data, Validation to validate that the user has submitted a date it’s okay to enter 100 into the cell.

    Where do you draw the line of when to use just Variants and when to use the correct type? I can’t imagine a database with just varchar fields.

  4. Mike Staunton Says:

    I use only three types for all my UDFs – 90% are doubles, 9% are integers and 1% are longs – sometimes I’ll type arrays as well, sometimes I ‘ll just leave them to default as Variants

    And can I just defend my use of old-fashioned type-declaration characters – since my code gets printed in books and magazines, it would be much too verbose and look horrible with all the As Doubles so # is much quicker to type and saves lots of characters so readers can focus on the numerical algorithms

    And Option Explicit’s primary purpose is to warn of more common mistakes such as mis-spelling variable names

  5. Charles Williams Says:

    Data typing in VBA/Excel is IMHO something of a minefield:
    – for instance using .Value (which is the default) to get a value from a cell which is formatted as currency gives you a VBA currency data type, which means that the default loses all decimals beyond 4!
    – using default datatypes can be 100s of times slower than being sensibly explicit
    – under-the covers type conversions occur frequently and cause many subtle bugs

    So I try to always do my own type conversions explicitly, but dont always succeed in defeating VBA.

  6. Harlan Grove Says:

    Johan: the old 1-2-3 approach to date validation was formatting date entry cells as Text (actually as Label in 1-2-3 terminology, but it’s the same as Excel’s Text number format), then checking @ISERR(@DATEVALUE(…)). Same approach works in Excel, and it’s more robust than allowing dates to be entered as date values. Excel also converts valid text-dates automatically into date values in arithmetic expressions and function arguments where dates are expected, e.g., =YEAR(“3 APR”) returns 2008, at least with transition formula evaluation disabled. Only thing to watch out for is date calculations like the midpoint between dates- use (A+B)/2 rather than AVERAGE(A,B).

  7. Dick Kusleika Says:

    OK, somebody educate me. Why does Simon say that cells are variants and Charles say they are currency? If they are variants, why does typename(activecell.value) return Double when there’s a number in the cell? Does Value2 always return either Double or String? Should I be using Value2 by default so formatting can never thwart my efforts?

  8. Dick Kusleika Says:

    And as long as I’m displaying my ignorance: I’ve been telling people for years to stop using Integer. Since VB6, it’s only there for compatibility. The compiler only uses Long and all Integers are converted to Long anyway. I don’t remember where I “learned” that. Can anyone confirm or deny.

  9. Charles Williams Says:

    I would think that Excel cells are not strictly variants (probably safearray?) since thats only a VBA datattype.
    So I think the question should be what happens when you read a cell into a VBA variable.
    If for instance you read a cell that is formatted as currency using cell.Value into a VBA variable that is typed as double, the cell.value is implicitly converted to currency datatype unders the covers, and the currency temporary variable is then converted to double and stored in the BNA variable.
    Using .Value2 you do not get the under-the-cover conversion to currency (also no Date conversion from Excel epoch to VBA epoch).

    If you read an Excel cell into a VBA variant then you will get a variant containing some other datatype (string, double, date, logical, error …)

    I never use integers, longs are faster and less likely to overflow.

  10. Charles Williams Says:

    (BNA must be a non-spell-checked successor language to VBA)

  11. Marcus Says:

    “…spreadsheet data in code …worry about the underlying type?”
    Not if you don’t care about the accuracy of your calculations. Like some others, I usually load a range into an array for processing which, as you’re aware is a variant array. I always need to validate the data as there’s no guarantee that the data is of the expected data type or within permissible ranges.

    My basic assumption is never trust the data unless I can guarantee its integrity (rare instance). Hence, variants are a blessing and a curse.

    Jon’s may be on to something casting the variant array into arrays of specific data types for calculation. I usually leave the data in the variant array and validate during processing.

    Longs vs Integers
    Dick, I’ve read/been told the same – that they are both assigned the same amount of memory so there is no memory gain using integers. I also tend to stick to longs for the same reason as Charles (overflows).

    Cheers – Marcus

  12. Simon Says:

    The Excel cell datatype is a discriminated union (just like a variant is).
    That means its 2 pieces of data:
    1. a discriptor to tell you how to handle the next bit
    2. the actual data (just some binary)

    At least this is the data type used in the SDK, I guess deep inside it might be slightly different, but the principle will be the same.

    Here is a good explanation from the horses mouth:
    http://msdn.microsoft.com/archive/en-us/office97/html/SF7EF.asp?frame=true

    here was my version:

    The joy of text (long)


    and

    The joy of text (part 2 of 2 hopefully)

    UDF to get the underlying cell datatype:
    http://www.codematic.net/files/joyoftext2.zip

    A VB integer is 16 bits I think, and a long is 32, when we were all using 32 bit processors and operating systems I too had read a long was faster than an integer as its the native type. In a 64 bit world I dunno what will happen, does VB even have a 64bit integer datatype?

    I think we should be using .value2 rather than value usually, as its faster and doesn’t have implicit data conversions in the background. Charles?

  13. Mike Staunton Says:

    In the .NET world there are at least Int16 (corresponding to Integer) and Int32 (corresponding to Long) types

    Here’s a more detailed discussion
    http://forums.devx.com/archive/index.php/t-77309.html

    And yes, it looks like I can now replace my 16-bit integers with 32-bit longs since the latter are quicker in a 32-bit world and have a greater range (so now I’ll just have two types – Doubles and Longs)

  14. dougaj4 Says:

    I came here via a discussion at Daily-Dose-of-Excel:
    http://www.dailydoseofexcel.com/archives/2007/10/10/calculation-bug-fixed/

    I always explicitly declare my data types because if I don’t the routines I write interfacing with an engineering analysis program (via an API) just don’t work. Seems like a good habit to get into.

    From today I will also be adding a 2 to all my .values.

Leave a comment

This site uses Akismet to reduce spam. Learn how your comment data is processed.