
Early benchmarks seem to support the claim that we can save a lot on JS parsing costs.

We are currently working on a more advanced prototype on which we will be able to accurately measure the performance impact, so we should have more hard data soon.



It seems like one big benefit of the binary format will be the ability to skip sections until they're needed, so the compilation can be done lazily.
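As a sketch of how that skipping could work, here is a hypothetical length-prefixed layout (not the actual Binary AST encoding, which is still being designed): each deferred section carries its byte length up front, so a lazy parser can jump past a function body in O(1) instead of scanning its tokens.

```javascript
// Hypothetical layout, for illustration only: each section is stored
// as [4-byte little-endian length][payload bytes].

function encodeSection(bytes) {
  const out = new Uint8Array(4 + bytes.length);
  new DataView(out.buffer).setUint32(0, bytes.length, true);
  out.set(bytes, 4);
  return out;
}

// Returns the offset just past the section, without reading its contents.
function skipSection(buf, offset) {
  const len = new DataView(buf.buffer, buf.byteOffset).getUint32(offset, true);
  return offset + 4 + len;
}
```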

But isn't it possible to get most of that benefit from the text format already? Is it really very expensive to scan through 10-20MB of text looking for block delimiters? You have to check for string escapes and the like, but it still doesn't seem very complicated.


Well, for one thing, a binary format’s inherent “obfuscatedness” actually works in its favor here. If Binary AST is adopted, I’d expect that in practice, essentially all files in that format will be generated by a tool specifically designed to work with Binary AST, that will never output an invalid file unless there’s a bug in the tool. From there, the file may still be vulnerable to random corruption at various points in the transit process, but a simple checksum in the header should catch almost all corruption. Thus, most developers should never have to worry about encountering lazy errors.
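For illustration, a header check along these lines would do; FNV-1a is an arbitrary choice here (the real format would define its own checksum), and it catches random corruption, not deliberate tampering.

```javascript
// FNV-1a (32-bit) over the payload bytes. Chosen only as an example of
// a cheap checksum; not part of any actual Binary AST specification.
function fnv1a(bytes) {
  let h = 0x811c9dc5;
  for (const b of bytes) {
    h ^= b;
    h = Math.imul(h, 0x01000193) >>> 0;
  }
  return h >>> 0;
}

// Compare the payload's checksum against the one stored in the header.
function verify(payload, storedChecksum) {
  return fnv1a(payload) === storedChecksum;
}
```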

By contrast, JS source files are frequently manipulated by hand, or with generic text processing tools that don’t understand JS syntax. In most respects, the ability to do that is a benefit of text formats - but it means that syntax errors can show up in browsers in practice, so the unpredictability and mysteriousness of lazy errors might be a bigger issue.

I suppose there could just be a little declaration at the beginning of the source file that means “I was made by a compiler/minifier, I promise I don’t have any syntax errors”…
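JS already has a mechanism shaped like this: the directive prologue, as in "use strict". Engines ignore directives they don't recognize (they're just expression statements), so a hypothetical marker could ride along the same way. The "use verified" string below is purely made up to illustrate the idea.

```javascript
// An unrecognized directive is harmless: it parses as a string literal
// expression statement and has no effect on evaluation.
const f = new Function('"use verified"; return 1;');
```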

In any case, parsing binary will still be faster, even if you add laziness to text parsing.


> a simple checksum in the header should catch almost all corruption

For JavaScript, you have to assume the script may be malicious, so it always has to be fully checked anyway.

It's true that the binary format could be more compact and a bit faster to parse. I just feel that the size difference isn't going to be that big of a deal after gzipping, and the parse time shouldn't be such a big deal. (Although JS engine creators say parse time is a problem, so it must be harder than I realise!)


> For JavaScript, you have to assume the script may be malicious, so it always has to be fully checked anyway.

The point I was trying to make isn't that a binary format wouldn't have to be validated, but that the unpredictability of lazy validation wouldn't harm developer UX. It's not a problem if malicious people get bad UX :)

Anyway, I think you're underestimating the complexity of identifying block delimiters while tolerating comments, string literals, regex literals, etc. I'm not sure it's all that much easier than doing a full parse, especially given the need to differentiate between regex literals and division...


I was figuring you could just parse string escapes and match brackets to identify all the block scopes very cheaply.

Regex literals seem like the main tricky bit. You're right, you definitely need a real expression parser to distinguish between "a / b" and "/regex/". That still doesn't seem very expensive though (as long as you're not actually building an AST structure, just scanning through the tokens).
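To make the context sensitivity concrete, here is a toy rule (an illustration, not a complete tokenizer): whether "/" begins a regex literal or a division depends on the token just before it.

```javascript
// Toy disambiguation rule: regex is allowed after punctuation and
// certain keywords, division after value-like tokens. This covers the
// common cases but is deliberately incomplete.
function slashStartsRegex(prevToken) {
  if (prevToken === null) return true;           // start of input
  if (/^[)\]}]$/.test(prevToken)) return false;  // after a value-ish close
  if (/^[a-zA-Z_$][\w$]*$/.test(prevToken)) {
    // After an identifier it's division, except after certain keywords
    // (incomplete list).
    return ['return', 'typeof', 'case', 'in', 'of', 'new',
            'delete', 'void', 'instanceof'].includes(prevToken);
  }
  if (/^\d/.test(prevToken)) return false;       // after a number
  return true;                                   // after operators, "(", ",", etc.
}
```

Even this isn't right in all cases: after the ")" closing an `if (...)` condition, a regex is legal (`if (x) /re/.test(s)`), so a real scanner needs parser-level context — which is exactly why this isn't much easier than a full parse.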

Automatic semicolon insertion also looks fiddly, but I don't think it affects bracket nesting at all (unlike regexes, where an orphaned bracket can hide inside the literal).
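The orphaned-bracket hazard in concrete form: both lines below are valid JS, and a naive brace counter would miscount on either of them.

```javascript
const s = "}";      // a brace inside a string literal, not a block delimiter
const re = /[}]/;   // a brace inside a regex literal, likewise
```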

Overall, digging into this, it definitely strikes me that JS's syntax is just as awkward and fiddly as its semantics. Not really surprising I guess!



