The problem is that the well you are drinking from has in fact been poisoned. Maybe you think you can tolerate it, but some projects are making a policy decision that any exposure is too dangerous, and that is IMO perfectly reasonable.
The fact that the AI agent will just go and attempt to do whatever insane shit I can dream up is both the most fun thing about playing with it, and also terrifying enough to make me review its output carefully before it goes anywhere near production.
(Hot take: If you're not using --dangerously-skip-permissions, you don't have enough confidence in your sandbox and you probably shouldn't be using a coding agent in that environment)
That Terraform blast radius is exactly the problem I'm building Daedalab around: agents need hard approvals, scoped permissions, and an audit trail before prod is even reachable. If you're curious: www.daedalab.app
Hot take indeed. Unfortunately it's too blunt an instrument. I can't express a policy like "you may search for XYZ about my codebase, but not W, because W is IP-sensitive". So, to retain Web Search / Web Fetch for when they're useful, all such tool uses must be reviewed to ensure nothing sensitive goes outside the trust boundary.
Yes, I'm aware this implies differing levels of trust for data passing through Claude versus through public search. It's okay for everyone to have different policies on this depending on specific context, use-case and trust policies.
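For concreteness, the kind of control that is expressible today (sketched here from Claude Code's settings format; the exact entries are illustrative, not a recommendation) is per-tool or per-domain allow/deny, which is exactly why it feels blunt. A `.claude/settings.json` like this can turn the tools off entirely or pin fetches to named domains, but it has no notion of "IP-sensitive topic":

```json
{
  "permissions": {
    "allow": ["WebFetch(domain:docs.python.org)"],
    "deny": ["WebSearch"]
  }
}
```

Anything finer-grained than this tool/domain granularity currently has to go through human review of each tool call, as described above.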
From the article, it sounds like that engineer did a lot of other reckless things even before handing the tasks over to the AI agent to continue the recklessness with even more abandon.
This is a case study in "if you don't know what you're doing, the answer is not just to hand it over to some AI bot to do it for you."
The answer is to hire a professional. That is, if you care about your data, or even just your reputation.
> before handing the tasks over to the AI agent to continue the recklessness with even more abandon
Which is the funny part of this: apparently the AI agent (Claude) tried to talk him out of some of the crazy stuff he wanted to do! Not only did he make bad decisions before invoking the AI, he ignored and overruled the agent when it flagged problems with the approach.
> Unless your strategy is to create a photo-lab-like screen in pure black and red, or wear deep-red-tinted glasses, it’s unlikely that a pure colorshift strategy will cut out that big of a chunk of the spectrum.
The writer dismisses this out of hand, but to me it sounds like a great idea.
> And I'm guessing that the reason macOS doesn't give more details is because macOS is likely not involved in the step that fails
And I guess because of the wide variety of third-party hardware macOS has to support, it's not practical to write a pre-flight check into the update process either.
I've never tried it myself, but it's oft-repeated folk wisdom in Apple circles that enabling filesystem case-sensitivity breaks all manner of third-party software that has only ever been tested on the case-insensitive default.
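The folk wisdom is plausible precisely because case-insensitivity is a property of the volume, not the OS, so software can't assume it either way. A hypothetical probe (my sketch, not anything Apple or any installer actually ships) would be to create a mixed-case file and see whether the lowercased name resolves to it:

```python
import os
import tempfile

def fs_is_case_insensitive(path=None):
    """Probe whether the filesystem under `path` resolves names case-insensitively.

    Creates a file with a mixed-case name in a temporary directory, then
    checks whether the all-lowercase spelling refers to the same entry.
    True on default (case-insensitive, case-preserving) APFS/HFS+ volumes;
    False on typical Linux filesystems or a case-sensitive APFS volume.
    """
    with tempfile.TemporaryDirectory(dir=path) as d:
        open(os.path.join(d, "CaseProbe"), "w").close()
        return os.path.exists(os.path.join(d, "caseprobe"))

print(fs_is_case_insensitive())
```

Software that only ever ran with this returning True is exactly the software the folk wisdom says will break.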
Are you hosted on cloud platforms that are SOC2 compliant? Or have you achieved and been audited for SOC2 compliance yourself? I'm going to have to assume it's the former, because if it were the latter you would say so directly. To me that kind of sleight of hand inspires distrust, which is fatal to any prospect of me evaluating the product.
Beyond that, a key risk that has come into sharper focus lately is data portability and vendor lock-in. At this point I do not deploy a new vendor without documenting the exit strategy.
The best exit strategy you can offer is an open-source, self-hostable version of the product with a simple migration plan. Some of the existing competitors in the enterprise chat space already offer this. Even if no one uses it, by offering it you keep your priorities aligned with your customers'.
I have not been as aggressive as GP in trying new AI tools. But the last few months I have been trying more and more and I'm just not seeing it.
On one project I tried recently, I took a test-driven approach: I built out the test suite while asking the AI to do the actual implementation. This was one of my more successful attempts, and it may have saved me 20-30% of the time overall - but I still had to throw out 80% of what it built, because the agent just refused to implement the architecture I was describing.
It's at its most useful if I'm trying to bootstrap something new on a stack I barely know, OR if I decide I just don't care about the quality of the output.
I have tried various CLI and IDE tools. Overall I've had the best success with Claude Code, but I'm open to trying new things.
Do you have any good resources you would recommend for getting LLMs to perform better, or for staying up to date on the field in general?