No time\* to try it unfortunately :( Sounds great, though, and Mac just kicks th...

No time* to try it unfortunately :( Sounds great, though, and Mac just kicks the pants of every other platform on local inference thanks to Metal, I imagine MLX must extend that lead to the point Qualcomm/Google has to a serious investment in open source acceleration. Cheapest iPhone from 2022 kicks the most expensive Android from 2023 (Pixel Fold) around the block, 2x on inference. 12 tkns/s vs. 6/s.

* it sounded like a great idea to do an OpenAI LLM x search app. Then it sounded like a great idea to add embeddings locally for privacy (thus, ONNX). Then it sounded like a great idea to do a local LLM (thus, llama.cpp). Then it sounded like a great idea to differentiate by being on all platforms, supported equally. Really taxing. Think I went too far this time. It works but, jeez, the workload...hopefully, after release, it turns out maintenance load is relatively low