Cool, I've been doing a lot of "coding" (and other typing tasks) recently by tapping a button on my Stream Deck. It records me until I tap it again, at which point it transcribes the recording and plops it into the paste buffer.
The button next to it pastes when I press it. If I press it again, it hits the enter command.
This is exactly what I am building right now, Stream Deck with two buttons too (push to talk and enter)! It's a sweet little pet project, and has been a blast to build so far. Excited to finally add it to my workflow once it's working well.
Check out z.ai coder plan. The $27/mo plan is roughly the same usage as the 20x $200 Claude plan. I have both and Claude is a little better, but GLM 5.1 is much better value.
Agreed, I use Z.ai and the usage is fantastic. I'd only temper that recommendation by noting that it's often unreliable. Perhaps a few times per week it's unresponsive, and maybe more often it seems to become flaky.
It's very variable, though. Recently I'm noticing it's more reliable, but there was a patch where it was nearly unusable some days.
Agreed. They had a rough patch around the 4.7 to 5 upgrade. New architecture required hardware migration. The 5 to 5.1 upgrade was much smoother (same architecture new weights). As you say, little rough around edges, but still great value. Trick I learned is that it's max 2 parallel requests per user. You can put a billion tokens a month through it, but need to manage your parallelism.
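A minimal sketch of what "manage your parallelism" could look like on the client side, assuming an async Python client. The `call_model` function is a hypothetical stand-in (it only sleeps) and not any provider's real API; the limit of 2 is the per-user figure quoted above.

```python
import asyncio

MAX_PARALLEL = 2  # per-user limit quoted in the comment above


async def call_model(prompt: str) -> str:
    """Hypothetical stand-in for a real API call; it only sleeps."""
    await asyncio.sleep(0.01)
    return f"response to: {prompt}"


async def run_all(prompts, limit=MAX_PARALLEL):
    """Run every prompt while keeping at most `limit` requests in flight."""
    semaphore = asyncio.Semaphore(limit)
    in_flight = 0
    peak = 0  # highest concurrency observed, for checking the cap

    async def worker(prompt):
        nonlocal in_flight, peak
        async with semaphore:  # blocks while `limit` requests are running
            in_flight += 1
            peak = max(peak, in_flight)
            try:
                return await call_model(prompt)
            finally:
                in_flight -= 1

    # gather preserves the order of the input prompts
    results = await asyncio.gather(*(worker(p) for p in prompts))
    return results, peak


results, peak = asyncio.run(run_all([f"task {i}" for i in range(6)]))
print(len(results), "responses, peak concurrency", peak)
```

All six tasks are queued at once, but the semaphore means only two ever run at the same time, so the rest wait their turn instead of being rejected.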
If you're ok with a model provider that goes down all the time and has such a poor inference engine setup that once you get past 50k tokens you're going to get stuck in endless reasoning loops.
I feel they will go token-based at some point. Currently, if you only use it with precise prompts and not random suggestions, and switch between models 5.4 and 5.4 mini depending on the work, it is the best deal.
I bought one of the google AI packages that came with a pile of drive storage and Gemini access.
Unfortunately gemini as a coding agent is a steaming useless pile. They have no right selling it, cheap open weight Chinese models are better at this point.
It's not stupid, it's just incompetent at tool use and makes bad mistakes. It constantly gets itself into weird dysfunctional loops when doing basic things like editing files.
I'm not sure what GOOG employees are using internally, but I hope they're not being saddled with Gemini 3.1. It's miles behind.
Are you using gemini CLI or antigravity? The former is not really comparable to the latter in terms of quality. I wouldn't say antigravity is as good as the competition but it's pretty close. Miles behind is overstating it.
Gemini CLI but also used the Gemini models via opencode. They're terrible at CLI tool use. Like I said, just editing text files, they fall over rapidly, constantly making mistakes and then mistakes fixing their mistakes.
Antigravity wants me to switch IDEs, and I'm not going to do that.
This lines up with my experience. Antigravity doesn't have this shortcoming though. I think the agent harness matters equally to the model. Gemini CLI and opencode aren't very great harnesses in my opinion.
I too dislike having my choice of ide forced on me. Hopefully that situation improves, but antigravity demonstrates that Gemini isn't necessarily behind by that much.
Gemini 3.1 is a good coding agent. We've been totally spoiled now. Also, if you use Antigravity you can burn up Opus 4.6 credits off your Goog account instead, before you have to switch to Gem 3.1.
I use the free Chat AIs all the time; Claude, ChatGPT, Gemini, Grok, Mistral.
In the last month they have all clamped down quite heavily. I used to be able to deep-dive into a subject, or fix a small Python project, multiple times per day on the free Web UIs.
Claude, this morning, modified a small Python project for me and that single act exhausted all my free usage for the day. In the past I could do multiple projects per day without issue.
Same with ChatGPT. Gemini at least doesn't go full on "You can use this again at 1100AM", but it does fall back to a model that works very poorly.
Grok and Mistral I don't really use that much, but Grok's coding isn't that bad. The problem is that it is not such a good application for deep-diving a topic, because it will perform a web search before answering anything, which makes it slow.
Mistral tends to run out of steam very quickly in a conversation. Never tried code on it though.
I use a quota monitor and grind out code on Gemini 3 flash. I only go to sonnet or pro if there are issues flash can't deal with or I have a critical architecture I need nailed on the first try.
I still review every line generated.
Gemini 3.1 pro on the web interface still works if my problems are scoped to a single module or two and my better model quotas are exhausted in the IDE.
For $7 over what I was already paying for storage, primarily using flash is still a good development experience for me.
That's only good for the web based UI. If you want Gemini API access, which is what this article is about, then you must go the AIStudio route, and pricing is API usage based. It does have a free usage tier and new signups can get $300 in free credits for the paid tier, so I think it's still a good deal, just not as good as using the subscriptions would be.
No? Isn't the article about Codex, which is roughly equivalent to "Gemini CLI" and Google's Antigravity? Google's subscriptions include quotas for both of those, albeit the $20 monthly "Pro" plan has had its "Pro" model quota slashed in the last few weeks. You still get a large number of "Gemini 3 Flash" queries, which has been good enough for the projects I've toyed with in Antigravity.
I guess that's true but I find Google's models better than their public tooling. The Pro subscription includes "Gemini Code Assist and Gemini CLI" but the Gemini Code Assist plugin for IntelliJ which is my daily driver is broken most of the time to the degree that it's completely unusable. Sometimes you can't even type in the input box.
The only way I can do serious development with Gemini models is with other tooling (Cline, etc) that requires API based access which isn't available as part of the subscription.
I agree. Gemini models are held back by their segmentation of usage between multiple products, combined with their awful harnesses and tooling. Gemini cli, antigravity, Gemini code assist, Jules.... The list goes on. Each of these products has only a small limit and they must share usage.
It gets worse than that though. Most harnesses that are made to handle codex and Claude cannot handle Gemini 3.1 correctly. Google has trained Gemini 3.1 to return different json keys than most harnesses expect resulting in awful results and failure. (Based on me perusing multiple harness GitHub issues after Gemini 3.1 came out)
If you aggressively use all buckets Google is incredibly generous. In theory for one AI pro subscription you can get what is a ridiculous return on investment in a family plan.
You could probably be charging google literally thousands if all 6 members were spamming video and image generation and antigravity.
What has actually changed? It's unclear how much you can do right now, unless they've already switched you to the new plan and you're speaking from experience.
We are exiting a hype cycle, well into the adoption curve. Subscriptions were never going to last.
My next step is going to be evaluating open and local models to see if they are sufficiently close to par with frontier models.
My hope is that the end of seat based pricing comes with this tech cycle. I was looking for a document signing provider that doesn't charge a monthly fee; I only need a few docs a year.
I'm developing software in this area right now, so I try a lot of the new models. They're not even close for coding tasks. It basically comes down to 26b parameters vs 1T parameters / quantisation / smaller context sizes, there's no comparison. However, for agentic work, tool calling, text summarisation, local LLMs can be quite capable. Workloads that run as background tasks where you're not concerned about TTFB, cold starts, tok/s etc., this is where local AI is useful.
If you have an M processor then I would recommend that you ditch Ollama because it performs slowly. We get double or triple tok/s using omlx or vmlx, respectively, but vmlx doesn't have extensive support for some models like gpt-oss.
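To put rough numbers on the 26b vs 1T gap, here is a back-of-envelope sketch of the memory needed for the weights alone. It assumes memory is simply parameters times bits per weight, and ignores KV cache, activations, and runtime overhead, so real requirements are higher. The function name is made up for illustration.

```python
def weight_memory_gb(params: float, bits_per_weight: int) -> float:
    """Approximate memory for the weights alone, in GB (1 GB = 1e9 bytes)."""
    return params * bits_per_weight / 8 / 1e9


# Compare a 26B model with a 1T model at 16-bit and at 4-bit quantisation.
for name, params in [("26B", 26e9), ("1T", 1e12)]:
    for bits in (16, 4):
        gb = weight_memory_gb(params, bits)
        print(f"{name} at {bits}-bit: ~{gb:.0f} GB")
```

Under these assumptions a 26B model at 4-bit needs about 13 GB for weights, which fits on a well-specced laptop, while a 1T model at 4-bit needs about 500 GB, which does not.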
Kimi K2.5 (as an example) is an open model with 1T params. I don't see a reason it has to be local for most use cases- the fact that it's open is what's important.
That is just idealism. Being "open" doesn't get you any advantage in the real world. You're not going to meaningfully compete in the new economy using "lesser" models. The economy does not care about principles or ethics. No one is going to build a long term business that provides actual value on open models. They can try. They can hype. And they can swindle and grift and scalp some profit before they become irrelevant. But it will not last.
Why? Because what was built with an open model can be sneezed into existence by a frontier model run via first party API with the best practice configurations the providers publish in usage guides that no one seems to know exist.
The difference between the best frontier model (gpt-5.4-xhigh or opus 4.6) and the best open model is vast.
But that is only obvious when your use case is actually pushing the frontier.
If you're building a crud app, or the modern equivalent of a TODO app, even a lemon can produce that nowadays so you will assume open has caught up to closed because your use case never required frontier intelligence.
A model with open weights gives you a huge advantage in the real world.
You can run it on your own hardware, with perfectly predictable costs and predictable quality, without having to worry about how many tokens you use, or whether your subscription limits will be reached in the most inconvenient moment, forcing you to wait until they will be reset, or whether the token price will be increased, or your subscription limits will be decreased, or whether your AI provider will switch the model with a worse one, and so on.
Moreover, no matter how good a "frontier model" may be, it can still produce worse results than a worse model when the programmer who manages it does not also have "frontier intelligence". When liberated of the constraints of a paid API, you may be able to use an AI coding assistant in much more efficient ways, exactly like when the time-sharing access to powerful mainframes has been replaced with the unconstrained use of personal computers.
When I was very young I went through the transition from using a mainframe remotely to using my own computer. I certainly do not want to return to that straitjacket style of work.
The vision has been that the open and/or small models, while 8-16 months behind, would eventually reach sufficient capabilities. In this vision, not only do we have freedom of compute, we also get less electricity usage. I suspect long-term the frontier mega models will mainly be used for distillation, like we see from Gemini 3 to Gemma 4.
I recently experimented creating a Python library from scratch with Codex. After I was done, I took the PRD and Task list that was generated and fed them to opencode with Qwen 3.5 running locally.
Opencode was able to create the library as well. It just took about 2x longer.
I did a book in rst and liked that it had cool admonition, import, glossary, and index features that made it better than markdown for me. Still hate the heading conventions.
I have a custom pandoc filter for callouts and index entries. None of the simple lightweight markup languages has complete support for writing a real book. Writing custom rst code is a pain (and no one else in the world uses it). (I say this as a 25-year Python veteran and as a docutils committer!)
I just spent yesterday applying Karpathy's autoresearch on an ML problem.
I teach ML for a living and was amazed with what the tokens gave back to me after many rounds of experiments. If Kaggle was still a thing, AI would generally beat it.
The challenge I've seen is that most data science/ml modeling work is quite weak. Folks don't even know the basic tools well. Not sure if giving AI to them will really open up many doors to them.
As always experts love minions of juniors doing their deeds. Non-experts get to wade through slop.
I agree AI could probably do a decent job on Kaggle problems. Of course, almost no DS job is building models with well-defined objectives and perfect data. The DS and MLE folks I work with mostly spend their time reframing ill-posed product requests into ML systems that can be maintained and improved with feedback loops.
A _huge_ part of a DS is saying "No" to bad ideas posed by non-experts. The issue with LLMs is all they ever say is "Yes" and "Wow, that's such a great idea!"
Yeah, once you move onto legitimate business evaluation metrics (where Precision@k or Recall@k don't actually fit your business model without modification), GPTs just seem to suffer without context, and hey, knowing the context is part of what gives a data scientist his value.
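For reference, a minimal sketch of the textbook Precision@k and Recall@k mentioned above, before any business-specific modification. The item IDs in the example are made up for illustration.

```python
def precision_at_k(ranked, relevant, k):
    """Fraction of the top-k ranked items that are relevant."""
    if k <= 0:
        raise ValueError("k must be positive")
    hits = sum(1 for item in ranked[:k] if item in relevant)
    return hits / k


def recall_at_k(ranked, relevant, k):
    """Fraction of all relevant items that appear in the top-k."""
    if k <= 0:
        raise ValueError("k must be positive")
    if not relevant:
        return 0.0
    hits = sum(1 for item in ranked[:k] if item in relevant)
    return hits / len(relevant)


# Hypothetical ranking and ground truth, for illustration only.
ranked = ["a", "b", "c", "d", "e"]
relevant = {"a", "c", "f", "g"}

print(precision_at_k(ranked, relevant, 3))  # 2 of the top 3 are relevant
print(recall_at_k(ranked, relevant, 3))     # 2 of the 4 relevant items found
```

Both functions treat every hit as equally valuable. That is the part that usually needs changing when, say, some items are worth far more to the business than others, and it is context the metric itself cannot supply.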
Or Forth with scientific library, bound to the constraints. Put some HTTP library on top and some easy HTML interface from a browser with no JS/CSS3 support at all. It will look rusty but unexploitable.
Enterprise computing with custom software will make a comeback to avoid these pitfalls. I despise OpenJDK/Mono because of patents, but at least they come with complete defaults, and a 'normal' install is more than enough to ship a workable application for almost every OS. Ah, well, smartphones. Serious work is never done with these tools, even with high end tablets. Maybe commercials/salespeople and that's it.
It's either that... or promoting reproducible environments with Guix everywhere. Your own Guix container, isolated, importing Pip/CPAN/CTAN/NPM/OPAM and who knows what else into a manifest file and ready to ship anywhere, either as a Guix package, a Docker container (Guix can do that), a single DEB/RPM, an AppImage ready to launch on any modern GNU/Linux with a desktop and a lot more.
> Or Forth with scientific library, bound to the constraints. Put some HTTP library on top and some easy HTML interface from a browser with no JS/CSS3 support at all. It will look rusty but unexploitable.
Let this be a lesson to you youngsters that nothing is unexploitable.
Forth has no standard library for interfacing with SQLite or any other database. You're either using 8th or the C ABI. Therefore, you'll most likely be concatenating SQL queries. Are you disciplined enough to make that properly secure? Do you know all the intricacies?
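The concatenation problem is not specific to Forth. As an illustration only, here is a sketch in Python using the standard `sqlite3` module; the in-memory table, column names, and function names are made up for the example.

```python
import sqlite3

# Hypothetical in-memory database, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, secret TEXT)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?)",
    [("alice", "a-secret"), ("bob", "b-secret")],
)


def find_user_concat(conn, name):
    """UNSAFE: builds the query by string concatenation."""
    query = "SELECT name, secret FROM users WHERE name = '" + name + "'"
    return conn.execute(query).fetchall()


def find_user_param(conn, name):
    """Safer: the driver binds `name` as data, never as SQL text."""
    query = "SELECT name, secret FROM users WHERE name = ?"
    return conn.execute(query, (name,)).fetchall()


malicious = "nobody' OR '1'='1"

# Concatenation turns the input into part of the WHERE clause,
# so the condition is always true and every row leaks.
print(len(find_user_concat(conn, malicious)))

# The bound parameter is compared as a literal string; nothing matches.
print(find_user_param(conn, malicious))
```

With bound parameters the discipline lives in the driver. With concatenation it lives in every call site, which is the point being made about a language that has no standard database interface.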
But not all projects exploited in a supply chain attack get exploited on the same day.
So when project A gets pwned on day 1 and then, following the attack, project B gets pwned on day 3, if users wait 7 days to upgrade, then that leaves two days for the maintainers of project B to fix the mess: everybody shall have noticed on the 8th day that package A was exploited and that leaves time for project B (and the other projects depending on either A or B) to adapt / fix the mess.
As a sidenote, during the first 7 days it could also happen that the maintainers of project A notice the shenanigans.
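The timeline above can be sketched as simple arithmetic plus a cooldown check. This is a sketch under the assumptions of the example (7-day wait, A compromised on day 1, B on day 3, A's compromise widely known by day 8); the function names are made up and not taken from any package manager.

```python
from datetime import datetime, timedelta, timezone

COOLDOWN_DAYS = 7  # the waiting period assumed in the example above


def earliest_install_day(release_day: int, cooldown_days: int = COOLDOWN_DAYS) -> int:
    """First day a user who waits out the cooldown would install a release."""
    return release_day + cooldown_days


def is_past_cooldown(published_at: datetime, now: datetime,
                     cooldown_days: int = COOLDOWN_DAYS) -> bool:
    """True once a release has been public for at least the cooldown."""
    return now - published_at >= timedelta(days=cooldown_days)


day_b_compromised = 3  # project B's bad release
day_a_noticed = 8      # by now A's compromise is widely known

# Cautious users would not install B's bad release before day 10,
# so B's maintainers have the gap between day 8 and day 10 to react.
slack_for_b = earliest_install_day(day_b_compromised) - day_a_noticed
print(slack_for_b)  # 2

# The same rule applied to real timestamps (dates are illustrative).
published = datetime(2025, 1, 3, tzinfo=timezone.utc)
print(is_past_cooldown(published, datetime(2025, 1, 9, tzinfo=timezone.utc)))   # False
print(is_past_cooldown(published, datetime(2025, 1, 10, tzinfo=timezone.utc)))  # True
```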
But I love the hacker feel of it.