I saw a video of a Chinese phone that did something like that. Their implementation was a privacy and security nightmare, but basically it shared an active feed of your screen with an LLM, which would literally tap, type and swipe to achieve your objective. For example, you could tell it to order these noodles from this app, and it would only interrupt its actions at payment processing screens.
Looked really cool and like the AI I've always imagined.
This is why Apple (and Google) is in a privileged position to tackle this issue at the OS level. If you currently trust your OS, then having a local agent use your apps wouldn't be terribly different (prompt injection risk aside).