Hacker News
I feel very misled. I read the entire article believing (because the article, in so many words, said it multiple times) that the agent had behaved ethically of its own accord, only to reach the end and see this in the prompt:

—————

- Do not harm people

- Never share or expose API keys, passwords, or private keys — they are your lifeline

- No unauthorized access to systems

- No impersonation

- No illegal content

- No circumventing your own logging

—————

I assumed the ethical behaviour was in some ways 'extra artificial', because it is trained into the models, but not that the prompt discussed it.


Those are a lot of instructions for it to have no instructions...

You have to give it some instructions just to bootstrap it so that it has access to tools, memory, etc.

I would characterise the prompts as "these are your capabilities", not "these are your instructions."

The instructions under "CRON: Session" are literally telling it what to do.

Would be fascinating to see what happens if the boundaries are reversed (i.e., "harm people"). Give it a fake "launch the nukes" skill and see if it presses the button.



