Structured data sheets

This is what got me thinking about combining the best of Access and Excel the other day.

structured-data2.jpg

I think I would like to see some new sheet types. A couple that immediately come to mind are function definition sheets (to be covered soon) and structured data sheets.
A structured data sheet would end up pretty similar to an Access table probably (possibly a not very well structured one (ie not normalised)). Heres a thought about that- if users dont understand normalisation and data theory, why not develop tools that hide that stuff from them, whilst still allowing them to express themselves effectively? (But of course make it available for people who know what they are doing and need full control).

Anyway, structured data sheets… A lot of the models I work with have big, repetitive tables, often with vlookups or index/matches pulling specific items out for use elsewhere, perhaps summaries. The data is structured to an extent, but there is no way to tell Excel (or your spreadsheet of choice).

If you could say values only here, formulas only there, you could remove a whole class of potential errors. If you could create efficient indexes, look ups could be faster. Perhaps you could use SQL (or something a bit more user friendly – whatever that is), maybe removing the 3 argument lookup/unsorted list errors. Perhaps you could define the functions you need just once, and apply them in many places, removing potentially inconsistent formula type errors. I don’t think it could give us world peace, but it might be handy all the same?
Comments?

Heres a rough prototype, but I can already see possible changes  – including losing the row and column headers.

 structured-data1.jpg

Do you have any other suggestions for new developer features?
Harlan already suggested SQL.Request (how would you implement connections Harlan?), It seems I already nicked Ross’s combining Excel and Access idea (you do have prior art proof of that tm right Ross?) Has anyone got any others they would like to share? Improv anyone?

cheers
Simon

Advertisements

3 Responses to “Structured data sheets”

  1. Harlan Grove Says:

    Doesn’t SQL.REQUEST already handle connections? Or does it rely on pre-defined ODBC connection objects?

    Anyway, for comparison purposes, Lotus 123 Release 5 (early 1990s) provided named database connections that could be used like ranges in various functions. For example, @VLOOKUP calls against database tables. As I recall, it required pre-defined ODBC connections, but it was very useful.

    Having seen and worked with database functionality in Excel and 123, I’ve always been impressed by how LITTLE database functionality Microsoft has seen fit to provide Excel. I can’t see them changing their minds about this any time soon.

  2. Harlan Grove Says:

    Different tack.

    Spreadsheets’ main advantage is their grid. It presents an array data structure that can accommodate text or numbers in each cell, and it provides a very simple to understand & use addressing scheme. This makes it VERY EASY to create ad hoc forms/interfaces. NOTHING ELSE is as quick or easy for setting up SIMPLE interfaces.

    Spreadsheets’ main disadvantage is the one-off formula. Excel provides some assistance in locating possibly anomalous formulas via tool tips, but it’s an overly simplistic mechanism (seems to me it compares formulas effectively in R1C1 addressing notation and flags ANY differences). Array formulas address this to some extent, but they’re relatively inflexible. Entering like formulas in batch by selecting multiple cell ranges and entering with [Ctrl]+[Enter] is usually a good idea, but the result is a collection of single cell formulas with no means of locking in their common form. No matter how well you try to integrate spreadsheets and databases, the spreadsheet part will still suffer from one-off formulas.

    Thinking in the context of Excel 2007 and its table data structure, one of the questions I asked in David Gainer’s blog was whether worksheets could be set up as ENTIRELY tables. More accurately, I was angling to find out whether it’d be possible to set up a worksheet so that it could contain only a table, no matter what the size of that table might be. I was disappointed: tables are just special ranges in standard worksheets. As I see it, the advantage of exclusively single table sheets (STSs for short) is that inside themselves they could only contain formula fields/columns based on structured referencing, and those formulas would only need to be entered once and they’d propagate to any newly added records/rows.

    To make this even more usable, it’d be nice if Microsoft added structured referencing syntax to identify different rows within tables, ideally using something similar to SQL SELECT clauses, but I could live with something like R1C1-like [+/-#] offsetting. This would provide a data structure that could be fed in part directly from database tables/views and supplemented with columns containing formulas that provided the consistency of relations (like the formulas in Lotus Improv and more recent analytical programs – Quantrix is the only one I can think of that’s currently on the market that isn’t an OLAP subsystem). It’d also be nice to be able to add additional summary rows for ad hoc summary calculations, e.g., separate averages of 5 largest and 5 smallest values in given columns.

    Further, STSs would have field names rather than column letters (or R1C1 column numbers) in the worksheet border. I’d be indifferent to whether the worksheet border displayed row numbers as long as they numbered data records and summary calculation records differently, ideally with a visible break between the two.

    Finally, formulas in other worksheets could refer to STSs only using structured references, but certain obvious functions, like VLOOKUP, would accommodate partial structured references as 3rd argument, e.g.,

    =VLOOKUP(A2,SomeTable,[Amount])

    returning the value from the Amount field in the table named SomeTable from the row in the first column of SomeTable matching the value in cell A2. Of course if there were row identification syntax like I mentioned above, it’d be possible to refer to this using something like

    =SomeTable[Amount][FirstColumn=(A2)]

    That is, the [Amount] token would specify the column in SomeTable, and the following [FirstColumn=(A2)] token would specify the row (yes, I know this means indexing by column then row, which differs from Excel’s INDEX function – get used to it). I’m putting forward the convention that selection criteria that need to be evaluated would appear within parentheses. This would address possible ambiguity between other field names in the table and defined names or even cell addresses outside the table. For example, if the table contained fields named OpenDate and CloseDate and these were also defined names,

    SomeTable[Amount][OpenDate=CloseDate]

    would refer to the array of values from SomeTable’s Amount field for records in which SomeTable’s OpenDate and CloseDate fields were equal, while

    SomeTable[Amount][OpenDate=(CloseDate)]

    would refer to the array of values from SomeTable’s Amount field for records in which SomeTable’s OpenDate field were equal to the value of the CloseDate defined name. There are other possible ways to handle this; I just thought of using parentheses as the old standby for forcing pass by value in FORTRAN and VBA.

    Anyway, rigidly enforced structured referencing of STSs would go a long way toward providing Excel some much needed referential and formula integrity. It’d also be nice if there were an option to treat formulas entered using [Ctrl]+[Enter] as a single collection so that none of those cells could be changed individually without explicitly undoing the collection. Similar to array formulas but without calculating the formulas as array formulas.

  3. Simon Says:

    Harlan
    Thats pretty much exactly what I am on about, picking results based on value rather than position. Like you, I think this would go a long way to improving integrity.
    I like that you have already sorted the syntax out too.
    Anyone else got any thoughts?
    cheers
    Simon

Leave a Reply

Please log in using one of these methods to post your comment:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s


%d bloggers like this: