SOTA results are a happy byproduct of the core mission of our approach, which is to enable the simple and effective translation of policy documents into a model without fine-tuning or prompt engineering. This performance is somewhat unexpected, but it also makes sense, so we're still trying to figure out the best way to harness it. That may include releasing model artifacts in the future.
Here is a problem I've been noodling with. If you are a decent programmer, how does your LLM help you solve this problem?
Given a cheminformatics fingerprint definition based on SMARTS substructure patterns, come up with a screening filter, likely using a decision tree, which uses intermediate feature tests to prune search space faster than simply testing each pattern one-by-one.
which could be improved by an element count test - count the number of fluorines, and only do the test if there are enough atoms in the molecule to fingerprint.
So one stage might be to construct a list of element counts:
# mol is assumed to be an RDKit molecule
ele_counts = [0]*200
seen = set()
for atom in mol.GetAtoms():
    eleno = atom.GetAtomicNum()
    ele_counts[eleno] += 1
    seen.add(eleno)
then have a lookup table for each element, indexed by count, giving the patterns which require at most that many atoms of the given element type;
ele_patterns = [
    # max known count, list of set of matching patterns
    (0, [set()]),  # element 0
    (0, [set()]),  # hydrogen
    ..
    (20, [{all patterns which contain no carbon},
          {all patterns which require at most 1 carbon}, ...
          {all patterns which require at most 19 carbons}]),
    (10, [{all patterns which contain no fluorine}, ..
          {all patterns which contain at most 9 fluorines}]),
    ...]
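To make the lookup-table idea concrete, here is a rough sketch in plain Python, with no toolkit dependency. Everything in it is an assumption for illustration: the per-pattern minimum element counts are written by hand (deriving them from arbitrary SMARTS is the hard part and is not attempted), "FC(F)F" is a made-up extra pattern, and `build_element_table` and `candidate_patterns` are hypothetical names.

```python
# Hypothetical, hand-written minimum element counts per pattern,
# keyed by atomic number. Deriving these from SMARTS is not shown.
PATTERN_MIN_COUNTS = {
    "S(=O)(=O)": {16: 1, 8: 2},
    "CC(=NNC=O)C": {6: 4, 7: 2, 8: 1},
    "FC(F)F": {9: 3, 6: 1},  # illustrative pattern, not from the fingerprint
}


def build_element_table(pattern_min_counts):
    """Map element -> list indexed by count n; entry n is the set of
    patterns which require at most n atoms of that element."""
    elements = {e for req in pattern_min_counts.values() for e in req}
    table = {}
    for element in elements:
        max_needed = max(req.get(element, 0)
                         for req in pattern_min_counts.values())
        table[element] = [
            {p for p, req in pattern_min_counts.items()
             if req.get(element, 0) <= n}
            for n in range(max_needed + 1)
        ]
    return table


def candidate_patterns(mol_counts, table, all_patterns):
    """Intersect the per-element sets to get the patterns still worth
    testing with a full substructure match."""
    candidates = set(all_patterns)
    for element, by_count in table.items():
        # Counts above the largest requirement behave like the last entry.
        n = min(mol_counts.get(element, 0), len(by_count) - 1)
        candidates &= by_count[n]
    return candidates


table = build_element_table(PATTERN_MIN_COUNTS)
# Element counts for dimethyl sulfone, C2H6O2S (H=1, C=6, O=8, S=16).
dimethyl_sulfone = {1: 6, 6: 2, 8: 2, 16: 1}
print(candidate_patterns(dimethyl_sulfone, table, PATTERN_MIN_COUNTS))
```

The element counts are only a necessary condition, so anything that survives still needs the real pattern match.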
However, this is not sophisticated enough to identify other tests, like the "CC(=NNC=O)C" example I gave before, or "S(=O)(=O)", which might be good tests at a higher level than the element.
And clearly if there isn't a sulphur, or there aren't two oxygens, or there aren't two double bonds, then there's no need to test "S(=O)(=O)", suggesting a tree structure would be useful.
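A hand-built version of such a tree might look like the sketch below. It assumes the cheap features (element counts, number of double bonds) have already been computed into a dict, and the `Node` class, the feature names, and the tree layout are all hypothetical. It shows only the pruning; choosing good intermediate tests automatically, which is the actual problem, is not addressed.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Node:
    name: str
    test: Callable[[dict], bool]                   # cheap feature test
    patterns: list = field(default_factory=list)   # patterns unlocked here
    children: list = field(default_factory=list)


def surviving_patterns(node, features):
    """Return patterns whose cheap prerequisite tests all pass.
    A failed test prunes the whole subtree below it."""
    if not node.test(features):
        return []
    found = list(node.patterns)
    for child in node.children:
        found.extend(surviving_patterns(child, features))
    return found


# Hypothetical tree: S(=O)(=O) is only reached with a sulphur,
# two oxygens, and two double bonds.
tree = Node("root", lambda f: True, children=[
    Node("has S", lambda f: f.get("S", 0) >= 1, children=[
        Node(">=2 O", lambda f: f.get("O", 0) >= 2, children=[
            Node(">=2 double bonds",
                 lambda f: f.get("double_bonds", 0) >= 2,
                 patterns=["S(=O)(=O)"]),
        ]),
    ]),
    Node(">=2 N", lambda f: f.get("N", 0) >= 2,
         patterns=["CC(=NNC=O)C"]),
])

features = {"S": 1, "O": 2, "double_bonds": 2}
print(surviving_patterns(tree, features))
```

Each surviving pattern would then go to the full substructure matcher; the tree only decides which ones are worth the cost.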
New business models come around very rarely. In the meantime, existing companies relentlessly optimize everything they can - that's why people own their stock.
Minion AI | Fullstack Eng, ML Eng, Tools Eng | SF or Remote | Full-time
Creator of GitHub Copilot here <wave>.
Minion AI is on a mission to build a useful web agent that performs tasks for you. Our personal AI is at the forefront of rethinking human-computer interaction in light of AI advancements.
Join us to work at the bleeding edge of AI: prompting, fine-tuning, synthetic data, learning by example, codegen, planning, reasoning, and memory for embodied agents.