Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

Iteration on spreadsheets is great, and there's opportunity for tons of innovation in this space. But this suffers from the same problems that most spreadsheets have. Errors like not averaging over the right set of data (which you pointed out) don't come up because the code can't be diffed in git. They come up because there's not enough clarity in whether a set of cells are consistent or not. Making this text-based doesn't add clarity because ultimately there will be too much text to read.

Here's a test for the most common kind of error. If I add data in A5 and B5, will that data be represented in the chart? How about in the averages? Will I be able to even see that?

Here's the pivot: Break your data into regions, where regions have repeated elements. Something like this:

    A1 = 'Height
    B1 = 'Width
    C1 = 'Area
    A2:C2 = Region {
      C1 = =A1*B1
    }{
      A1 = 2
      B1 = 3
      A2 = 4
      B2 = 5
    }
    B3 = 'Average
    C3 = =sum(C2)/count(C2)
This could be used to generate:

    Height  Width   Area
    2       3        6
    4       5       20
            Average 13
The dataset associated with that sub-block can be clearly annotated to an editing user as such, with simple tools for adding a row to that dataset, or sorting it without effecting the rest of the sheet. Within the dataset, there's no corruption of the formulas (the third row's C cant be different than the second's), you've still got your diffing (probably better because it's clear that data changed but formulas different) and it's extremely hard to make the calculated average come out wrong.

Yeah, that exact representation isn't great. Maybe a "view" metaphor, so that headers and footers can be attached to the dataset instead of floating outside it. But once you've gone this direction, there's all sorts of amazing things possible by sharing datasets, linking them, transforming them, etc.



The Numbers app on OSX does something quite like this. It's highly underrated, but having a number of smaller, floating spread sheets in one big sandbox is waaaaay better for many things than a single sheet. I think the concept should be pushed even further.


The UI is clunkier than that of Numbers, but you can do something similar in Excel with tables (blocks in a sheet that you somewhat enforce some structure on. See https://support.office.com/en-gb/article/Overview-of-Excel-t...).

And of course, Excel can flag cells that, according to its heuristics, seem to use inconsistent formulas, such as using empty cells in a summation, or having one cell in a column use a different formula.


Hey, Stencila developer here. Yes, I agree. I recently came across the Numbers app when I was writing this. I haven't used it but from what I have read it seems to have some really good ideas for improving the spreadsheet interface.


Kayia (kayia.org) is a spreadsheet/reactive programming environment but where the code can be 'diff'ed. All values in the spreadsheet are versioned, so you can either look at the current value of, say, Cell B2 or its immutable value at a specific point in time, or see how its value has changed over time. And since the code is stored in the same way as the data, version control of the source is a first-class part of the language.


Yes, you're right this doesn't solve all the issues with spreadsheets as a programming environment. The "averaging over the right data" is a big one, and yes, being able to diff a text file won't help that.

Thanks a lot for the suggested syntax - it's really good to get some feedback. I need more time to think it through and work out exactly how this would work.

In the meantime, one solution to the problem, that I would like to implement soon, is cell mapping - in effect projecting native data structures onto grid cells. That way you can still display and modify your data in the spreadsheet but use a sensible formula like `mean(data$height)` instead of `mean(A2:A5)`. Would love comments on that idea - there's an issue on it https://github.com/stencila/stencila/issues/118


An interesting extension would be to use the now easily parsable text-based format, and apply static analysis checks to it.

eg.

Warning: C10 has a range from A1:A9 but C9 only has B1:B8




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: