> In this post, we are going to build a generalization of Transformer models that can operate on (almost) arbitrary structures such as functions, graphs, and probability distributions, not just matrices and vectors.
We already know the keywords of the language, the symbols from the standard library and the other major ones, as well as the rules of the grammar. So the weights could be biased toward those. Not sure how that would work, though.
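One way this kind of biasing is sometimes done is at decoding time rather than in the weights themselves: add a constant bonus to the logits of tokens that appear in a known allow-list (keywords, stdlib names) before normalizing. A minimal sketch of that idea, where `KNOWN_TOKENS`, the bonus value, and the toy logits are all made-up illustrations, not anyone's actual implementation:

```python
import math

# Hypothetical allow-list: language keywords / stdlib names we "already know".
KNOWN_TOKENS = {"def", "return", "len", "range"}

def biased_softmax(logits: dict[str, float], bonus: float = 2.0) -> dict[str, float]:
    """Shift the logits of known tokens up by `bonus`, then softmax-normalize."""
    shifted = {tok: val + (bonus if tok in KNOWN_TOKENS else 0.0)
               for tok, val in logits.items()}
    z = sum(math.exp(v) for v in shifted.values())
    return {tok: math.exp(v) / z for tok, v in shifted.items()}

# Toy scores: "banana" starts out more likely than the keywords...
logits = {"def": 0.5, "banana": 1.0, "return": 0.2}
probs = biased_softmax(logits)
# ...but after the bonus, the known tokens outweigh it.
```

A hard grammar constraint (masking out tokens the parser would reject) would be the stricter version of the same idea; this soft bonus just nudges the distribution.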
I don’t think that would help with going from natural language to a programming language, but it could probably help with completing patterns, kinda like a powerful suggestion engine.