User defined functions

I’m still on the look out for spreadsheet horror stories for Eusprig this year, but in the meantime…

This is the final post on developer features for the time being. I think this one may be a big issue though.
User defined functions.
In mainstream programming you define a function once and use it potentially many times, so for example

bool AssertApproxEqual(double firstValue, double secondValue, double materialityLevel){
  return (fabs(firstValue - secondValue) < materialityLevel);
  //fabs is the float(/double) version of abs - absolute value
}

and then in various places you call it. This is great as you can thoroughly test the function with appropriate data, and then be assured that it will work right, as long as the parameters are valid.

In native spreadsheets we don’t really have this functionality easily accessible. And I think we probably should.
In spreadsheeting each use of a function is actually a definition, or redefinition. This creates many comprehension and auditing headaches. If I use =SUM(D10:D20), (or =SUM(R[-11]C:R[-1])) to sum sales, and then copy it across for each period, then to check the spreadsheet is correct I need to make sure that those formulas remain the same. If the formula is very long or complex then this becomes significantly harder. In big spreadsheets I personally often end up coding the formula population in VBA so I can be sure they are consistent.

We do have some options:
You can put formulas in names, I quite like this but I think its so unusual as to be unfair on potential maintainers, and users possibly. There is also a performance hit as names use array evaluation rules. It is possible to pass parameters to names but it is messy.
nameformula2.jpg

You can write worksheet functions in VBA, I never do this, the performance hit is just too great for me. It also gives clients a bigger maintenance challenge as now they need people with VBA skills as well as spreadsheet skills. (The answer to this of course is to get a maintenance agreement from a member of the soon to be announced spreadsheet support network thingy. or codematic of course) I also think VBA is a bit of a sledgehammer to crack a nut in this case, its power creating unnecessary security worries.

You can use XLM, I do do this, although I accept XLM skills are pretty rare these days. I do it because the performance is good and distribution is easy. There are plenty of downsides, scary ‘unstoppable macro’ warnings, no ability to sign code are 2 obvious ones. Even if you have no XLM experience, and forgetting for a minute how old and deprecated XLM is, just look at this, ignore the first 2 lines and see if you can guess what it does. (note too my spangly new Excel 2007)

xlm1.jpg

XLLs, these make sense for general functions, but not no so much for stuff that relates to one file only, as they must be distributed separately. These are actually pretty easy to write these days as there are some good tools available to help (at a price) eg TurboExcel (soon to change name??) and XLL+. Writing them from scratch isn’t that hard if you know a bit of C.

There are also assorted other ways using automation add-ins with VB6 or .net .

Dermot Balson demonstrates an interesting approach here. He uses data tables, a table of data, and a complex calculation to define a function once and use it many times. There was a VBA version at one time too. Access has a similar feature when you create a calculated column, or a calc in a query – defined once, used many.

I would like to see this custom function facility properly integrated into spreadsheet applications. The key point being accessible to ‘normal’ users. Defining and using functions is what spreadsheeting is all about, its odd that you currently need to be a power user to do it effectively.

You certainly should not need additional tools like a C/C++ compiler for xlls, or Visual Basic (no longer sold), or Visual Studio or other .net IDE for automation add-ins. Deployment should be a doddle too, no registration, no local admin rights required etc. I know there are lots of negatives associated with distributing multiple copies, but it seems to have worked for spreadsheets (so far anyway!). It would be nice if such UDFs did not trigger macro warnings, maybe by limiting their functionality. I’m open on protecting UDF’s from prying eyes, personal preference is probably not to bother though.

Easy migration would be the icing on the cake, imagine modelling the function in Excel, click a button, and deploy some sort of compiled component (dll probably) to your web server, that doesn’t even know what Excel is. TurboExcel does this (I think), and XLL+ can host xlls without needing Excel I think, but as I don’t currently have a working version of either (hint!) I can’t check.

I was thinking of creating a homemade version with a UDF called (imaginatively) UDF. It would be used like this:

=UDF(“SumSales”, param1, param2, param3…)
SumSales would refer to a table of workings a bit like Dermot describes. These would be defined on a special new sort of ‘function definition’ sheet. The definition would include a decent description, lots of info about the parameters, including expected data type and error values (so it would never need to be wrapped in ISERROR, or IF(ISERROR for us old school types (although PED has an IFERROR xll, and I have an XLM version above)
I’m sure this could be built into spreadsheets natively avoiding the need for the ugly UDF piece. Actually I don’t think current Excel XLM macro sheets are that far off. Not sure if Ooc or Gnumeric have something similar. Here is a rough example:
udf1.jpg

Here are some links (corrected – thanks Rob) to some other peoples thoughts in this area, if you know of others please leave a link in a comment.
Some (Microsoft funded, I think) research 1, 2
A Microsoft patent in this area
Useful names for googling: Simon Peyton-Jones, Alan Blackwell
Cortes and Hansen a simplified spreadsheet showing how user defined functions may be integrated, and plenty of onward interestng links.

I have heard this mentioned many times as an important feature in taking spreadsheeting to the ‘next level’ whatever that is. What do you think?, is it vital to have some form of function definition separate from its usage? or is the repeated combined definition and use a key part of what makes spreadsheets spreadsheets? Do we have enough options already? Quite a few people have been less than enthusiastic about other suggestions, mainly on the basis they would add complexity, is this the same? or would it reduce complexity? (Depends on the user I guess).

Cheers

Simon

Advertisements

14 Responses to “User defined functions”

  1. Dermot Balson Says:

    Thanks for the reference, Simon.

    I’ve eaten my own dogfood recently, and implemented my approach for an actuarial valuation system, creating 3 “layers” in 3 worksheets, the top one calculating just 3 times, and the bottom one calculating 30,000 times, in one particular example.

    It works very, very well, enabling me to fit and test complex clients very quickly. The downside is speed, if the number of iterations gets too large. I estimated that a sheet with no formulae at all still requires about .0015 to calculate,and that adds up when you repeat thousands of times. It needs Microsoft to give us a subroutine (or nested sheet) facility that works efficiently on a large scale.

  2. Rob Bruce Says:

    Simon, I’m getting 404s on the links in the second last paragraph of your post.

  3. Simon Says:

    Rob
    Thaks for the heads up, I seem to have managed to get them relative instead of absolute. Should be sorted now
    cheers
    Simon

  4. Rich Says:

    We can take a hint…

    And yes, TurboExcel will, if requested, create DLL’s that can be called from activex Web pages, and also, if requested, it can automatically create Web services of the generated functions.

    The com server route is extremely slow; fully 5,000 times slower than calling the C++ DLL generated code (no typo). And you are absolutely correct, VBA UDF’s are slow: they give you a 300 fold performance hit.

    Though we’re not sure if or how these numbers change in 2007.

  5. Bob Phillips Says:

    Can you tell me where you get that mutiple of 300, it is not a figure I recognise?

    UDFs will vary depending upon whether you write all the code from scratch, whether you use built-in functionality etc., and whilst I wouldn’t dispute a performance hit, 300 times is well in excess of my experience.

  6. Simon Says:

    Bob
    If you choose something VB is especially bad at like string manipulation you could get something like this.
    Steve Dalton (http://www.eigensys.com/) measures C at 309 times faster than VBA for summing the ascii values of a string. Not sure why you might do that, but if you did and you needed it done quickly choose C not VBA!

    Rich I have heard good reports of TurboExcel, when I get around to updating the UDF performance stuff at codematic I’d like include it in the tests. I want to get several tests though to give a better picture.

    cheers
    Simon

  7. Rich Says:

    The 300 figure happened to be from a rondom number generator, not string stuff. It was almost a nothing routine, so it measured more the overhead of calling into VBA than the VBA code itself. If string stuff is especially slow, I would have expected an even bigger performance difference.

    Simon, just let us know when you’d like a copy.

  8. Bob Phillips Says:

    Simon,

    Comparing a C routine to say a VBA routine is not measuring the overhead of a UDF surely, that only comes with comparing a worksheet function with a similalrly functioning UDF. In a random number generatro, the worst I can get is 50% overhead, nowhere near 30,000%.

  9. Harlan Grove Says:

    Summing the character code values of a string is one step in CRC error checking routines and many hashing functions. Not something the sane would screw around with in VBA.

    The example in your 4th paragraph doesn’t hold. Formulas aren’t equivalent to functions in other languages. They’re equivalent to expressions in other languages, and nothing prevents coders from writing ugly long expressions in most procedural languages.

    If your concern is consistency of formulas that should be the same (modulo copying from one location to another, so identical in R1C1 addressing), then I’ve already mentioned that it’d be useful for MSFT to add another entry mechanism like [Ctrl]+[Enter] that enters formulas in a batch *and* prevents modification of individual formulas so entered, like with multiple cell array formulas. Maybe [Alt]+[Ctrl]+[Enter].

    There are other ways to achieve this, but they require more setup. You could use lots of defined names to name ranges that should always have the same formulas (in R1C1 addressing). Then have Open and BeforeSave event handlers call the same procedure to check whether all formulas in the indicated named ranges were the same. Best to provide an option to allow the user/developer to proceed with inconsistent formulas, but the proc should also list the formulas and allow the user/developer to select the correct formula, then copy that formula (.FormulaR1C1) to all other cells in the range. Maybe also call that proc from SheetChange as well. This SHOULD be built-in functionality for developers, but it can be added with VBA.

    An alternative would be parametrized defined names. Purely as a parsing exercise, it wouldn’t be all that difficult to detect that the defined name

    eqa(x,y)

    was something different than a static token like eqa_of_x_and_y, i.e., it’d be easy to detect just one set of matching parentheses, detect commas (or more stringently, system list separator characters) within the parentheses, then split the remaining characters into tokens with the first token, the only one outside the parentheses, put into the udf or add-in function name space, and use the other tokens to allocate argument slots. The defined name parser would need to ensure that only the other tokens appeared in the definition (no recursion for now), and that each of the other tokens in the name appeared at least once in the definition or issue a warning about unused arguments). For the defined name above, the definition could be

    =ABS(x-y)

  10. Harlan Grove Says:

    Your blog software ate some of my response, I’ll use the FORTRAN .LT. in place of the less than operator/left angle bracket.

    An alternative would be parametrized defined names. Purely as a parsing exercise, it wouldn’t be all that difficult to detect that the defined name

    eqa(x,y)

    . . . For the defined name above, the definition could be

    =ABS(x-y) .LT. TOLERANCE

    where TOLERANCE would be a different, scalar defined name evaluating to a ‘small’ positive value considered close enough to equality. When worksheet functions call such macro functions, they just performs text substitution on the definition, replacing the definition’s argument tokens with the values from the formula’s function call.

    Maybe the initial intent for XLM was the same, but XLM provides looping and functions that could affect the Excel environment or even run external processes. Parametrized macros, as I see them, would provide nothing more than argument handling, but the definitions would only be able to call built-in, udf and add-in functions and evaluate arithmetic and string operations, basically nothing more than what formulas could contain without parametrized macros. That should be fairly safe, so no need for signing code.

  11. Simon Says:

    Bob It was in the context of an xll UDF. I have some donothing functions (http://www.codematic.net/Excel-development/Excel-Visual-studio/excel-user-defined-functions.htm) to test the calling overhead, and its significant but not massive. I can’t remember the exact results but it was maybe 10 times faster or something at most. Probably similar to your findings actually. I’m going to look at this all more thoroughly in a couple of months, in particular I want to try some different testing methods. And I want to test VS2005 and Excel 2007 on a dual core pc. Do you fancy joining up to do that?

    Harlan, sorry your comment disappeared
    Whats the difference between a function and an expression?
    At least you can embed comments in normal func/pressions, although you are right you can create monsters if you like (I’ve seen a single VBA routine with 1,600 lines – that was pretty messy).
    The ‘block’ idea sounds good, I find the inability to insert/delete within array blocks to be a mixed blessing, would you continue that or allow inserts? maybe copying formulas into inserted range?
    Parameterised names also makes sense – as long as there is a decent interface to manage them (hence my thoughts of a new sort of sheet).
    I think a bit of tidying and reducing the scope and XLM could be the way to go. Maybe remove the command stuff, just leave the function stuff, only be able to create function names, better argument handling and we’d pretty much be there. I think they should be so intrinsic that no additional security/signing is needed.
    cheers
    Simon

  12. Harlan Grove Says:

    The equivalence is between a FORMULA and an expression. A function is a procedure that returns something, as opposed to a procedure that doesn’t return anything. [And in C, all procedures are functions, and the ones that return nothing are type void.] So ‘function’ without further qualification is a function definition. A ‘function call’ is a term that’s effectively equivalent to the value the function returns given its arguments. Single terms may be expressions, but expressions may involve many terms, so from little expressions big expressions grow.

    The following are all expressions.

    1
    1+1
    1+Abs(x)
    1+Abs(Sin(y))
    Exp(1+Abs(Sin(y)))

    The last 3 include function calls.

    As for paremetrized names vs special purpose sheets, please don’t give me special purpose sheets. On the other hand, another suggestion I made in a response in David Gainer’s blog was to provide a RESTRICTED mode for VBA which would allow macros to do anything WITHIN the Excel environment, but NOTHING outside it, so no file system operations, no API calls, nothing. Problematic whether restricted VBA should be able to open, save and close files – no clear harm opening files, no clear harm (other than wasted effort) closing them without saving, but overwriting existing files is borderline unsafe but often an obvious need in macros.

    Anyway, IF Microsoft were willing to invest in creating restricted mode XLM sheets, that might be best, but XLM is a very crude language. Parametrized names would likely cost less to implement than restricted mode XLM. And if it were nothing more than text replacement followed by evaluation of the resulting text, it could provide useful functionality like dynamic external references, e.g., extref(p,wb,ws,r) defined as

    =’p\[wb]ws’!r

    and similarly for DDE links. This assumes the evaluation mechanism would be able to evaluate external references. I suspect Excel caches external references, using the external reference tokens as associative indices into a table of cached values. If so, it’ll never provide dynamic external reference evaluation without the kludge of using a subordinate automated Excel application instance.

    TANGENT: udfs can instantiate other Excel application instances and launch other processes via Shell, but it’s Oh So Unsafe! for Evaluate to handle external references. More proof (if needed) that Microsoft has little if any respect for the intelligence of Excel developers. That, or Excel’s implementation for evaluating external references is so awkward that it’s cost-prohibitive to change.

  13. Idetrorce Says:

    very interesting, but I don’t agree with you
    Idetrorce

  14. acrortfrotret Says:

    Great web site, Will visit once again!!

Leave a Reply

Please log in using one of these methods to post your comment:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s


%d bloggers like this: