outside1234 · 2026-04-29T03:46:54 1777434414

Has anyone actually written a verifier for a business / project?

sho_hn · 2026-04-29T03:51:07 1777434667

I'd say "a verifier" here is a loose term. A great testsuite is a verifier. I've done reverse-engineering projects that involved generating trace logs from the object under test, having a reimplementation emit the same logs, and running strict comparisons.
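A minimal sketch of the strict trace comparison described above. The log lines and the `first_divergence` helper are hypothetical, made up for illustration; real traces would come from the object under test and from the reimplementation.

```python
from itertools import zip_longest


def first_divergence(reference_log, candidate_log):
    """Return (line_number, reference_line, candidate_line) for the first
    mismatch between two trace logs, or None if they are identical.

    A missing line (one log shorter than the other) is reported as None.
    """
    pairs = zip_longest(reference_log, candidate_log)
    for line_number, (ref, cand) in enumerate(pairs, start=1):
        if ref != cand:
            return (line_number, ref, cand)
    return None


# Hypothetical trace lines: one from the original, one from the reimplementation.
reference = ["open dev0", "write 0x10 0xff", "read 0x10 -> 0xff", "close dev0"]
candidate = ["open dev0", "write 0x10 0xff", "read 0x10 -> 0x00", "close dev0"]

result = first_divergence(reference, candidate)
print(result)
```

Reporting only the first divergence is a design choice: once the two traces disagree, later lines are usually noise until that first mismatch is fixed.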

OP's post is basically pointing out what certainly many others have independently discovered: Your agent-based dev operation is as good as the test rituals and guard rails you give the agents.

fy20 · 2026-04-29T09:03:22 1777453402

I used it (well, a skill based on the same idea) to optimise a prompt that does data extraction from UGC.

However, there isn't really a "correct" answer that's easy to define in code (I could manually label a training set, but wanted to avoid that), so I had the LLM just analyse the results itself and decide if they are better or not. It wrote deterministic rules for a few things, but overall it just reviewed the results of each round and decided if they are better or not.

Reviewing the before and after results, I would say yes, it's a big improvement in quality. It also optimised the prompt size to reduce input tokens by 25% and switched to a smaller/cheaper model.

dataviz1000 · 2026-04-29T04:20:04 1777436404

Can you explain your question a little more? The recursive agents will find the minimum that satisfies the deterministic termination condition, including by cheating. In other words, the result will be literally correct yet wrong. I would go so far as to say malicious compliance.

I have a recursive agent that finds trading strategies after recreating academic research and probing the model using its training on everything. It works really well, but I have to force it to write out every line and write a proof that data in the future from the time of the wall clock didn't enter the system. Even then, some stupid thing like not converting the timezone with daylight saving time will allow it to peek into the future 1 hour. These types of bugs are almost impossible to find. Now there needs to be another agent whose only purpose is to write out every line explaining that the timezone for that line of code was correct.
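A minimal sketch of how the daylight-saving mistake described above can leak one hour of future data. Everything here is an assumption for illustration: the exchange timezone (`America/New_York`), the hourly bar timestamps, and the helper names are hypothetical, and `zoneinfo` needs Python 3.9+ with timezone data available on the system.

```python
from datetime import datetime, timedelta, timezone
from zoneinfo import ZoneInfo

NEW_YORK = ZoneInfo("America/New_York")  # assumed exchange timezone


def to_utc_buggy(local_naive):
    # Bug: assumes a fixed UTC-5 offset all year and ignores daylight saving.
    return (local_naive + timedelta(hours=5)).replace(tzinfo=timezone.utc)


def to_utc_correct(local_naive):
    # Attach the real zone so the offset in effect on that date is used.
    return local_naive.replace(tzinfo=NEW_YORK).astimezone(timezone.utc)


def visible_bars(bars_utc, decision_utc):
    # Only bars stamped at or before the decision time may be used.
    return [bar for bar in bars_utc if bar <= decision_utc]


# Hypothetical hourly bar timestamps in UTC, 12:00 through 16:00.
bars = [datetime(2025, 7, 1, hour, tzinfo=timezone.utc) for hour in range(12, 17)]

# Wall clock at decision time: 10:00 in New York on a summer (UTC-4) date.
decision_local = datetime(2025, 7, 1, 10, 0)

correct = visible_bars(bars, to_utc_correct(decision_local))
buggy = visible_bars(bars, to_utc_buggy(decision_local))
leaked = [bar for bar in buggy if bar not in correct]
print(leaked)  # bars the buggy conversion exposes from the future
```

The same code is correct in winter, when New York really is at UTC-5, which is part of why this kind of bug is so hard to spot in a backtest.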

faeyanpiraat · 2026-04-29T07:29:59 1777447799

It’s tangential, but: I’m currently doing a rewrite of the backend of a project, and the verifier is basically the instruction “maintain v1 functionality if observed from the API side externally”. This allows making a lot of tests based on existing data in the system and how the frontend expects data.
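A rough sketch of what such a parity check could look like, assuming v1 responses have been recorded as (request, response) pairs. The recorded data, `parity_failures`, and `v2_handler` are all hypothetical stand-ins, not part of the project described.

```python
import json


def normalize(payload):
    # Serialize with sorted keys so that key order alone never fails the check.
    return json.dumps(payload, sort_keys=True)


def parity_failures(recorded, new_handler):
    """Return the requests for which the new backend's response differs
    from the response recorded from v1."""
    failures = []
    for request, v1_response in recorded:
        v2_response = new_handler(request)
        if normalize(v1_response) != normalize(v2_response):
            failures.append(request)
    return failures


# Hypothetical recordings captured from the v1 API.
recorded = [
    ({"path": "/users/1"}, {"id": 1, "name": "Ada"}),
    ({"path": "/users/2"}, {"id": 2, "name": "Bob"}),
]


def v2_handler(request):
    # Stand-in for the rewritten backend; the second response has a regression.
    responses = {
        "/users/1": {"name": "Ada", "id": 1},
        "/users/2": {"id": 2, "name": "bob"},
    }
    return responses[request["path"]]


failures = parity_failures(recorded, v2_handler)
print(failures)
```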

outside1234 · 2026-04-28T20:26:06 1777407966

The thing they are really wildly behind on is a business model. They are losing wild amounts of money per customer and it is hard to see how the competitive situation is going to allow them to fix that.

echelon · 2026-04-28T20:27:22 1777408042

Given the scaling hurdles Claude Code / Opus is having, those Anthropic customers might leave for Codex. I'm _this_ close.

jwilliams · 2026-04-28T20:41:09 1777408869

Codex is pretty good. It’s friction to switch, but I think it’s sensible being across multiple AI toolchains.

try-working · 2026-04-28T23:56:58 1777420618

No friction in switching coding models.

NamlchakKhandro · 2026-04-28T21:17:23 1777411043

Pi mono.

Nuff said

unrelat3d · 2026-04-28T21:21:29 1777411289

This is what I use now after testing others through 2025

It has the most "UNIX" feel of a simple app that you compose just the right flow from and nothing more

felixgallo · 2026-04-28T22:54:30 1777416870

Thing is, if you're using Codex, you're supporting Sam Altman and the idea of Sam Altmans, in the same way that if you use X or buy a Tesla, you're supporting Elon Musk and the idea of Elon Musks. That's a pretty big tax to factor into the usage of such products. If you even got 5% better coding results, would that make up for the future they're trying to build?

sheeshkebab · 2026-04-29T01:25:06 1777425906

The more Dario talks the less I want to have anything to do with his wares.

xienze · 2026-04-28T23:04:10 1777417450

Dario wants to replace you with AI as well. Don't be fooled into thinking he's your friend because he said no to Trump that one time. I'll remind you that Musk used to be the left's hero not too long ago.

felixgallo · 2026-04-29T01:13:19 1777425199

I'm in the "AI could be good for humanity" camp, and in this camp, we believe that Dario/Anthropic is a radically better choice going forward than the alternatives at this moment. In this camp we are not 'fooled into thinking he's our friend because he said no to Trump that one time', we are evaluating the entire set of available information and figuring that Anthropic's the best bet.

As for Musk ever being "the left"'s "hero" -- that's amazing, that's what Pauli would call 'not even wrong'.

oefrha · 2026-04-29T10:49:38 1777459778

Funny of you to bring up "humanity" while singing praise of the guy who's on record A-OK with aiding mass surveillance of anywhere not the U.S., and who's happy to help kill people but just wouldn't do it fully automatically, merely because he doesn't believe the tech is there yet. All while collaborating with Palantir.

If by "the best bet" you mean slightly less shitty bet then maybe.

felixgallo · 2026-04-29T11:43:00 1777462980

That’s approximately what I mean. Although in comparison to the alternatives of Altman and Musk, the distinction is stark and significant.

oefrha · 2026-04-29T11:55:29 1777463729

For 96% of humanity it’s same difference.

epistasis · 2026-04-28T20:32:20 1777408340

I'm getting pretty close too, but I wouldn't switch to Codex; I'd switch to one of the open agents that can use any backing LLM. My reasoning is that if I'm willing to pay the cost of the small changes in usage, I might as well switch to an open source agent that I can add my own convenience features to, like remote sessions and phone-based operation.

jfkimmes · 2026-04-28T20:47:37 1777409257

Codex is open source and allows any model to be configured.

epistasis · 2026-04-28T20:52:07 1777409527

Many thanks for that info!

bossyTeacher · 2026-04-28T21:09:51 1777410591

Why Codex when you can use something that hasn't been touched by Sam Altman? Surely, your drive to get the very best model isn't stronger than your sense of ethics?

NamlchakKhandro · 2026-04-28T21:18:52 1777411132

Codex is not open source. And it's not even that extensible

milkshakes · 2026-04-28T21:25:09 1777411509

https://github.com/openai/codex

ribosometronome · 2026-04-28T20:32:40 1777408360

That would be subscription customers, no? Rather than Bedrock or per-API customers? Many of the companies running on Bedrock or by-use have per-day limits above the max monthly subscription costs.

outside1234 · 2026-04-27T23:49:51 1777333791

Ukraine is going to win this war if we can just keep funding them long enough

matusp · 2026-04-28T07:29:35 1777361375

They are going to lose, the question is how much. I don't see Russians losing the territory they occupy right now, unless some black swan event happens. It remains to be seen how much more territory Russia is able to conquer until they lose their appetite.

Gud · 2026-04-28T11:03:26 1777374206

The black swan event happened when they failed to topple the Ukrainian government in 2022, it’s just taking time.

There are no indications that Ukraine will lose this war.

matusp · 2026-04-28T15:20:36 1777389636

> There are no indications that Ukraine will lose this war.

Apart from not controlling 20% of their territory?

Gud · 2026-04-28T16:33:03 1777393983

This is a common story, that Ukraine has lost control over a vital part of its country, and that is true.

But so has Russia, they also have lost control over their stuff, stuff that Ukraine are constantly blowing up.

Ukraine now have a capable long range offensive capability which allows them to strike deep into Russia.

Russia's vital strategic assets are slowly depleting due to Ukrainian strikes, while Ukraine has better support from the rest of the world and is now in fact cooperating with some powerful nations across the world.

Russia are managing to advance a few square kilometers of territory per month? While losing expensive, hard to replace assets continuously.

lokar · 2026-04-28T00:28:54 1777336134

They may retain their territory, but they won’t win. No one is going to win.

JohnnyLarue · 2026-04-28T04:26:48 1777350408

They should have surrendered years ago, now they're going to wind up with the same territory losses and a few hundred thousand dead on top of it.

tim333 · 2026-04-28T09:17:46 1777367866

Have you seen how the Russians have treated conquered territories with torture chambers and having to submit to the dictatorship for life? And life is often only a few months as they round the males up and force them into the Russian army to get killed.

rasz · 2026-04-28T20:43:35 1777409015

Dead killing russians or dead in unmarked mass graves or dead killing Estonians, Lithuanians and Poles. Choices choices.

Mariupol had a mass grave for tens of thousands of civilians Russia liberated. Had, because recently they started erasing it from existence https://www.msn.com/en-us/news/insight/russia-accused-of-era...

dwaltrip · 2026-04-28T15:18:54 1777389534

Long live the Russian empire… /s

drysine · 2026-04-28T05:48:16 1777355296

And to replenish their losses the Ukrainian regime snatches men on the street. There are thousands of videos made by bystanders.[0] Come and see.

P.S. I love the ad I see on that site: "Over 10 million Ukrainians suffer from anxiety due to the war. Free exercises with scientifically proven effectiveness." The most important exercise now is running - it saves your life.

[0] https://busification.org

tim333 · 2026-04-28T09:19:45 1777367985

>most important exercise now is running

or fighting and winning?

drysine · 2026-04-28T11:44:43 1777376683

There used to be more than enough volunteers among Ukrainians at the beginning of the war, now they are running from conscription officers.

In Russia we had partial mobilization in 2022 but since then the losses are replenished with men voluntarily signing contracts.

It's a meat grinder there for everyone.

tim333 · 2026-04-28T12:30:35 1777379435

The Ukrainians seem to be dealing with it by switching to bots, drones and the like as much as possible and moving human resources back.

I'm not sure about the Russians. That would kind of make sense for them too but things seem a bit gummed up by bureaucracy - I just read a thread about them having to use firecrackers in drones due to such restrictions https://x.com/ChrisO_wiki/status/2049026651544023271

I hope Russia manages to get some more sensible leadership or policies. The Ilya Remeslo guy being let out and able to criticize Putin seems somewhat promising.

drysine · 2026-04-28T13:55:04 1777384504

> I just read a thread about them having to use firecrackers in drones due to such restrictions

Haven't read the thread, but the post omits a crucial detail. This is about using interceptor drones inside Russia, not on the frontline. Apparently, the thinking is that failed interceptor drones present a hazard of their own, but it might be outdated now.

>I hope Russia manages to get some more sensible leadership or policies.

Like what?

More and more people in Russia are unhappy with Putin dragging his feet with the so-called special military operation. They think that it's long overdue to turn to total war and forget about minimizing civilian losses in the Ukraine.

tim333 · 2026-04-28T14:17:07 1777385827

Well, an obvious solution would be to go back to Russia and do something else. You don't have to invade other countries and have an empire. I'm a Brit and we gave up on that about a century ago and it hasn't been so bad. The whole thing seems anachronistic, I think based on Putin reading too many history books and avoiding modern info on the internet.

drysine · 2026-04-28T20:18:24 1777407504

>I'm a Brit and we gave up on that about a century ago and it hasn't been so bad.

Wait until Scotland or Northern Ireland becomes independent and then China or some other powerful country "midwifes" an anti-British coup there, and then we'll talk.

outside1234 · 2026-04-27T19:54:34 1777319674

6. Please don't look at our financials. They are horrible and we are hoping to sucker people into an IPO before all of this implodes. The least your Grandma can do for us is give us 2% of her S&P 500 portfolio so we can exit before it goes to zero. This is AGI after all.

outside1234 · 2026-04-27T19:30:30 1777318230

What leverage does China have here to enforce this? Meta doesn't do business in China. Can't they just give them the middle finger?

umeshunni · 2026-04-27T19:36:36 1777318596

Meta absolutely does business in China. e.g. https://www.metacareers.com/v2/locations/shenzhen/?p[offices...

https://www.metacareers.com/v2/locations/shanghai/?p[offices...

https://www.metacareers.com/v2/locations/hongkong/?p[offices...

I also assume, like most advertising platforms, they cater heavily to the China export market.

dublinstats · 2026-04-27T19:54:05 1777319645

I don't think their social networks are allowed in China.

From your link it looks like they might do R&D for Oculus in China (but may not even be able to sell it there due to the data-collection tie in required).

Not sure what you mean by catering to the export market. b2b sales would be just as restricted as sales to consumers.

umeshunni · 2026-04-27T20:49:37 1777322977

> b2b sales would be just as restricted as sales to consumers.

They're not. A significant part of ad spend by the likes of Temu, AliExpress, Shein and other Chinese exporters is on Meta's platforms: e.g. https://www.cnbc.com/2024/01/31/metas-continued-rally-could-...

outside1234 · 2026-04-27T15:53:19 1777305199

I do not stare at walls, but when I get in this state I go for a 30 minute walk, with what sounds like the same effect.

outside1234 · 2026-04-22T16:49:33 1776876573

Being worth $60B at a 50x P/E ratio implies $1.2B in profit.

Not happening
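The arithmetic behind that figure, as a quick check. This simply treats P/E as valuation divided by annual profit; `implied_profit` is a hypothetical helper name, and the $60B and 50x inputs are the ones quoted above.

```python
def implied_profit(valuation, pe_ratio):
    # P/E = valuation / annual profit, so profit = valuation / P/E.
    return valuation / pe_ratio


implied = implied_profit(60e9, 50)
print(f"${implied / 1e9:.1f}B")
```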

outside1234 · 2026-04-13T00:26:17 1776039977

We also need to talk about work schedules. Nobody at Anthropic is taking a 5 week summer break or working a 35 hour week.

Europe can be a top player in AI, but there is a cost.

joe_mamba · 2026-04-13T05:29:26 1776058166

>Nobody at Anthropic is taking a 5 week summer break or working a 35 hour week

The people working at Mistral, on Expedition 33, or on other successful software coming out of the EU most likely aren't working only 35 hours/week either. In fact, some probably squeeze in some work on weekends too, out of dedication and pressure to meet deadlines.

A lot of Austrian SW companies, for example, have "all-in" contracts where you waive your rights to the scrutiny of the standard 38,5h/week in exchange for a "higher" salary with longer work hours and less time tracking. Similar cases exist in France, I believe.

The 35h/week European meme people here parrot is something you mostly see only among civil servants and at old established monopolistic companies with moats and strong unions, not at scrappy start-ups trying to make it and fix a bug before release, or semiconductor companies fighting a tape-out.

So no, work hours aren't what's limiting EU startups.

mgrund · 2026-04-13T07:29:25 1776065365

There is but I don’t think this is it.

I’ve worked most of my career in US tech satellite offices and I have not found EU team members to be less productive than US team members, nor to spend less time on work (if anything, more really, since they also need to be available for US time zone overlap).

It’s true there are chill jobs here, as there are in the US.

But ambitious people here tend to work as much as ambitious US people (and it’s really more like 40-hour work weeks - 39,5 where I live since lunch is not work time). But again, many are not really counting, it’s just a full time job.

Vacations (typically 3 weeks of summer holiday and additional weeks to distribute over the year) do create longer stretches on a skeleton crew. Skilled tech labour is also cheaper, so you can just hire more to make up for it.

LelouBil · 2026-04-13T01:37:56 1776044276

Maybe they should.

Taking a break and looking at the direction AI is advancing won't hurt.

esafak · 2026-04-13T03:10:05 1776049805

They'd have to convince the Chinese to do the same.

outside1234 · 2026-04-13T00:09:41 1776038981

Someone needs to tell OpenAI and SpaceX that

lostmsu · 2026-04-13T01:18:44 1776043124

And AMD and Nvidia.

outside1234 · 2026-04-11T15:34:49 1775921689

It is interesting to think how AI will potentially change the dynamics back to this from general purpose software.

In a world where implementation is free, will we see a return to built for purpose systems like this where we define the inputs and outputs desired and AI builds it from the ground up, completely for purpose?

conductr · 2026-04-12T05:15:40 1775970940

Probably. It’s already happening with SaaS as an example. I’ve mentioned this on HN a lot in the past, but my (established) company has been rolling its own CRM and some other tools with AI.

It seems we can build a product ourselves in the same time it would take us to talk to SaaS vendors and draft the RFP/requirements. We can build it and iterate as the requirements are being forged, so we can essentially have completed software with just the features we care about, with full ability to add features in future (something SaaS doesn’t promise), often before an implementation would even kick off. We’re searching through all our SaaS products and I expect we’ll cut 50% of them in 1-2 years. The ones that are sufficiently complex or regulated have some protection (like accounting systems).

DanielVZ · 2026-04-11T16:28:12 1775924892

I was thinking the same sans AI. What other industries require low latency high throughput transactions that haven’t been served yet?

contraposit · 2026-04-12T15:51:15 1776009075

A glove (software) that fits the hand (company processes) perfectly.