Iteration on spreadsheets is great, and there's opportunity for tons of innovation in this space. But this suffers from the same problems that most spreadsheets have. Errors like not averaging over the right set of data (which you pointed out) don't come up because the code can't be diffed in git. They come up because there's not enough clarity in whether a set of cells are consistent or not. Making this text-based doesn't add clarity because ultimately there will be too much text to read.
Here's a test for the most common kind of error. If I add data in A5 and B5, will that data be represented in the chart? How about in the averages? Will I be able to even see that?
Here's the pivot: Break your data into regions, where regions have repeated elements. Something like this:
The dataset associated with that sub-block can be clearly annotated to an editing user as such, with simple tools for adding a row to that dataset, or sorting it without effecting the rest of the sheet. Within the dataset, there's no corruption of the formulas (the third row's C cant be different than the second's), you've still got your diffing (probably better because it's clear that data changed but formulas different) and it's extremely hard to make the calculated average come out wrong.
Yeah, that exact representation isn't great. Maybe a "view" metaphor, so that headers and footers can be attached to the dataset instead of floating outside it. But once you've gone this direction, there's all sorts of amazing things possible by sharing datasets, linking them, transforming them, etc.
The Numbers app on OSX does something quite like this. It's highly underrated, but having a number of smaller, floating spread sheets in one big sandbox is waaaaay better for many things than a single sheet. I think the concept should be pushed even further.
And of course, Excel can flag cells that, according to its heuristics, seem to use inconsistent formulas, such as using empty cells in a summation, or having one cell in a column use a different formula.
Hey, Stencila developer here. Yes, I agree. I recently came across the Numbers app when I was writing this. I haven't used it but from what I have read it seems to have some really good ideas for improving the spreadsheet interface.
Kayia (kayia.org) is a spreadsheet/reactive programming environment but where the code can be 'diff'ed. All values in the spreadsheet are versioned, so you can either look at the current value of, say, Cell B2 or its immutable value at a specific point in time, or see how its value has changed over time. And since the code is stored in the same way as the data, version control of the source is a first-class part of the language.
Yes, you're right this doesn't solve all the issues with spreadsheets as a programming environment. The "averaging over the right data" is a big one, and yes, being able to diff a text file won't help that.
Thanks a lot for the suggested syntax - it's really good to get some feedback. I need more time to think it through and work out exactly how this would work.
In the meantime, one solution to the problem, that I would like to implement soon, is cell mapping - in effect projecting native data structures onto grid cells. That way you can still display and modify your data in the spreadsheet but use a sensible formula like `mean(data$height)` instead of `mean(A2:A5)`. Would love comments on that idea - there's an issue on it https://github.com/stencila/stencila/issues/118
Here's a test for the most common kind of error. If I add data in A5 and B5, will that data be represented in the chart? How about in the averages? Will I be able to even see that?
Here's the pivot: Break your data into regions, where regions have repeated elements. Something like this:
This could be used to generate: The dataset associated with that sub-block can be clearly annotated to an editing user as such, with simple tools for adding a row to that dataset, or sorting it without effecting the rest of the sheet. Within the dataset, there's no corruption of the formulas (the third row's C cant be different than the second's), you've still got your diffing (probably better because it's clear that data changed but formulas different) and it's extremely hard to make the calculated average come out wrong.Yeah, that exact representation isn't great. Maybe a "view" metaphor, so that headers and footers can be attached to the dataset instead of floating outside it. But once you've gone this direction, there's all sorts of amazing things possible by sharing datasets, linking them, transforming them, etc.