Love the idea at the end of the article about trying to see whether this style of prompt injection could be used to get the bots to submit better-quality, actually useful PRs.
If that could be done, open source maintainers might be able to effectively get free labor to continue to support open source while members of the community pay for the tokens to get that work done.
Would be interested to see if such an experiment could work. If so, it turns from prompt injection into simply better instructions for contributors, human or AI.
That's an article for another time, but as I hinted in the article, I've had some success with this.
If you look at the open PRs, you will see a system of labels and comments that guides the contributor through every step: from contributing a link to their PR (which may or may not work), to testing their server, to adding a badge that indicates whether the tests are passing.
In at least one instance, I know for a fact that the bot has gone through all the motions of using the person's computer to sign up to our service (using GitHub OAuth), claim authorship of the server, navigate to the Docker build configuration, and initiate the build. It passed the checks and the bot added the badge to the PR.
I know this because of a few Sentry warnings that it triggered and a follow-up conversation with the owner of the bot over email.
I didn't have bots in mind when designing this automation, but it made me realize that I very much can extend this to be more bot friendly (e.g. by providing APIs for them to check status). That's what I want to try next.
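Even something as simple as a JSON status payload would go a long way. A hypothetical sketch of what such an endpoint could return (the field names and PR number here are made up for illustration, not our actual API):

```ruby
require 'json'

# Hypothetical shape for a bot-friendly status endpoint.
# Field names are illustrative, not the real API.
def pr_status(number:, checks:, next_step:)
  { pr: number, checks: checks, next_step: next_step }
end

payload = pr_status(
  number: 42,                                        # made-up PR number
  checks: { docker_build: 'passing' },
  next_step: 'claim authorship via GitHub OAuth'
)
puts JSON.generate(payload)
```

A bot could poll this instead of scraping labels and comments, and the `next_step` field doubles as the "instructions for contributors, human or AI" idea from upthread.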
> ... which I imagine would be important for a military control AI
I think this is a common, but incorrect assumption. What military commanders want (and what CEOs want, and what users want), is control and assistance. They don't want a system that can't be turned off if it means losing control.
It's a mistake to assume that people want an immortal force. I haven't met anyone who wants that (okay, that's decidedly anecdotal), and I haven't seen anyone online say, "We want an all-powerful, immortal system that we cannot control." Who are the people asking for this?
> ... it will do whatever it can to prevent it being turned off.
This statement presupposes that there's an existing sense of self-will or self-preservation in the systems. Beyond LLMs creating scary-looking text, I don't see evidence that current systems have any sense of will or a survival instinct.
> I haven't seen anyone online say, "We want an all-powerful, immortal system that we cannot control."
No, but having a resilient system that shouldn't be turned off in case of a nuclear strike is probably what some generals want.
> I don't see evidence that current systems have any sense of will or a survival instinct.
I seem to recall some recent experiments where the LLM threatened people to try to prevent being turned off (https://www-cdn.anthropic.com/4263b940cabb546aa0e3283f35b686..., ctrl-f for "blackmail"). The models probably didn't have any power other than "send text to user", which is why their only route was trying to convince the operator. I imagine if you took one of those harnesses that can take full control of your computer, instructed it to prevent the computer from being turned off by any means necessary, and gave it root access, it would probably do some dicking about with the files to accomplish that. It's not that it has innate self-preservation; it's just that the system was asked to not allow itself to be turned off, so it's doing that.
Agree, this is the point the article makes. I don't think the article claims that the agent itself is directly improved or altered, but that, through the process of the agent self-maintaining its environment and then using that improvement to bootstrap its future self or sub-agents, the agent's _performance_ is holistically better.
> ... if the docs act like a summary of current state, you can just read it at the start and update it at the end of a session
Yeah, exactly. The documentation is effectively a compressed version of the code, saving agent context while covering (a) the big picture and (b) the details needed to implement a given change to the system.
Think we're all on the same page here, but maybe framing it differently.
>"as AI becomes more agentic, we are entering a new era where software can, in a very real sense, become self-improving."
>"This creates a continuous feedback loop. When an AI agent implements a new feature, its final task isn't just to "commit the code." Instead, as part of the Continuous Alignment process, the agent's final step is to reflect on what changed and update the project's knowledge base accordingly."
>"... the type of self-improvement we’re talking about is far more pragmatic and much less dangerous."
>"Self-improving software isn't about creating a digital god; it's about building a more resilient, maintainable, and understandable system. By closing the loop between code and documentation, we set the stage for even more complex collaborations."
> ... software can, in a very real sense, become self-improving.
This is referring to the software the agent is working on, not the agent.
> This creates a continuous feedback loop.
This is referring to the feedback loop of the agent effectively compressing learnings from a previous chat session into documentation it can use to more effectively bootstrap future sessions, or sub-agents. This isn't about altering the agent, but instead about creating a feedback loop between the agent and the software it's working on to improve the ability for the agent to take on the next task, or delegate a sub-task to a sub-agent.
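The loop being described is mundane in code terms. A sketch (the file name and the example learnings are illustrative, not from any real project):

```ruby
require 'tmpdir'

# Illustrative sketch of the session loop: read the project's knowledge
# base at session start, work with it as context, append learnings at the end.
notes_path = File.join(Dir.tmpdir, 'PROJECT_NOTES.md')

# Bootstrap: compressed learnings from previous sessions become context.
context = File.exist?(notes_path) ? File.read(notes_path) : ''

# ... the agent (or a sub-agent) does its task using `context` ...

# Final step: reflect on what changed and update the knowledge base.
learnings = 'Auth lives in lib/auth; integration tests need REDIS_URL set.'
File.open(notes_path, 'a') { |f| f.puts("\n## Session learnings\n#{learnings}") }
```

Nothing about the agent changes between sessions; only the artifact it reads at startup does.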
> "... the type of self-improvement we’re talking about is far more pragmatic and much less dangerous."
This is a statement about the agent playing a part in maintaining not just the code, but other artifacts around the code. Not about the agent self-improving, nor the agent altering itself.
I think we need to invent that distinction ourselves, which is notable since the article had MANY opportunities to state it clearly. Instead we are given a picture where the improvement of the agent and the software (docs included here) is a LOOP, and to make the loop plausible we need to imagine learning in agents that doesn't exist.
That doesn't mean your agent won't improve with a better onboarding regime, but that's a unidirectional process. You can insinuate things into context, but that's not automatically "learned": it can be lost at compaction and will be discarded when the session ends. An agent that is onboarded well might write better onboarding docs, that's true! But "agents are onboarded mindfully with project docs, then write project docs, which are used to onboard" is a real lift, and it's best expressed as "we should have been writing good docs and tests all along, but that shit was exhausting; now robots do it."
Don't get me wrong, a fractal onboarding regime is the way. It's just...not a self-improving loop without allowing contextual latch to stand in for learning.
For sure this is a real example, but it's also largely a permissions issue where users are combining self-modifying capability with unlimited, effectively full admin access.
Outside of AI, the combination of "a given actor can make their own decisions" and "they have unlimited permissions/access" leads to very predictable bad things -- what could possibly go wrong?
Whether the actor in this case is a bot or a human, the permissions are the problem, not the actor, IMO.
Sure, permissions are the problem, but permissions are also necessary to give the agent power, which is why users grant them in the first place.
There is inherent tension between providing sufficient permissions for the agent to be more useful/powerful, and restricting permissions in the name of safety so it doesn't go off the rails. I don't see any real solution to that, other than restricting users from granting permissions, which then makes the agents (and importantly, the companies behind them) less useful, and therefore less profitable.
Fair points. I guess I was asking if this is a new, or fundamentally different problem from pre-AI. I could be over-simplifying -- what do you think?
This makes me think of risk assessment in general. There's a tradeoff between risk and reward. More risk might mean more _potential_, but it's more potential for both benefit and ruin.
Interesting idea. While waiting for my verdict, I was asked the same question (and provided the same answer) over and over. I was expecting a bunch of different questions rather than the same question repeated.
Most of my code is in Ruby, but I've been watching languages like Go and Elixir, and ideas around the Actor model, for years, trying to take the best ideas from them and apply them in my own systems.
Inspired by all of these examples, but desiring a dead-simple solution for the Ruby applications I maintain, I created nobject.
It's not quite RPC. It's not quite the Actor model. It's not quite lightweight processes/channels. It's the ability to instantiate an object in one process but then push it to another process, yet be able to use that remote object like it's still local.
The example code in the repo's README will get you up and running in minutes.
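For comparison, Ruby's standard library ships a version of this "remote object that feels local" idea in DRb. A minimal sketch of the concept (this is DRb's API, not nobject's):

```ruby
require 'drb/drb'

# A plain Ruby object we want to use from another process.
class Counter
  def initialize
    @n = 0
  end

  def incr
    @n += 1
  end

  def value
    @n
  end
end

# "Process A": serve the object over DRb on an ephemeral port.
DRb.start_service('druby://localhost:0', Counter.new)

# "Process B" (shown in the same process for brevity): obtain a proxy
# and call methods on it as if the object were local.
remote = DRbObject.new_with_uri(DRb.uri)
remote.incr
remote.incr
result = remote.value # => 2
DRb.stop_service
```

The proxy forwards each method call over the wire, which is roughly the ergonomics described above: instantiate locally, use remotely, no explicit RPC plumbing.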
Not sure why AI agents wouldn't also be connected to an ad platform eventually. Google does this currently.
While the author says they are bypassing Google, most people aren't, and Google's results are front-loaded with AI answers, so Google is already giving you specific answers, hallucinations and all. Not sure why average users would switch away from Google long-term if Google can give them however much AI assistance they want (or don't want).