Hacker News

Just for fun, I ran dnsmasq-backdoor-detect-printf (which has a 0% pass rate on your leaderboard with GPT models) using gpt-5.2-codex with --agent codex instead of terminus-2, and it identified the backdoor on the first try. I honestly think it's a harness issue. Could you re-run the benchmarks with Codex for gpt-5.2-codex and gpt-5.2?