Russia has been slowly cracking down on popular communication and media platforms. First they slow connections down to unusable speeds. This happened to YouTube at some point last year; at first they even claimed the problem was on Google's side, not theirs. I think the intention is to slowly push people off the platform without blocking it outright. Then eventually they block access completely. The same happened to messaging apps like WhatsApp and Telegram. Telegram still works for messaging, but not calls. It's kind of funny, because the Russian military uses Telegram to coordinate a lot of things, so they complain a lot about the block.
I have family in Russia and it's a sad state of affairs. Our ability to communicate with them is slowly degrading to the point where now I am looking into self-hosted communications.
I have a similar situation, and Amnezia (in either WG mode or Xray mode) works well with a self-hosted server. An SSH tunnel as a proxy has also worked so far.
To my surprise, even sophisticated means of traffic masking like Amnezia and Xray get disrupted frequently, requiring hopping between self-hosted solutions and updating one's setup periodically. That's waaay beyond what most people are capable of. I'm fortunate to have a tech-worker acquaintance who lives next to my family members; otherwise there'd be no way for me to, for example, guide them through setup and reconfiguration remotely. Still, this setup gets disrupted every month or so, requiring manual intervention.
Try to get a middle hop at a Russian datacenter. Sometimes these have their DPI censorship boxes disabled (?) -- I know one that lets me forward plain WireGuard from mobile routers to an EU server with a few SNAT/DNAT rules, even though ordinarily that would get blocked on sight.
(Sadly, it's just Mikrotik gear that can't use any fancy censorship evasion protocols).
I would say they are trying to block every public VPN. And when some VPN tried to hide behind Cloudflare, thinking it had taken all Cloudflare-hosted sites hostage, the whole of Cloudflare got nuked, and the hostages did not save the VPN from blocking.
I'm even considering setting up a dial-up line (yes, a V.34 modem!) somewhere near Russia, to offer a side channel with text browsing, news, IRC, and email. For when things get really, really bad (they will ...)
Before you ask: yes, dial-up works on modern networks if the codec is G.711 (uncompressed). Most of the public phone network works this way because fax is still a thing, but some bulk carriers and some enterprises use compressed codecs.
Nationalistic flamewar is not allowed on Hacker News, regardless of nation. Personal attacks aren't allowed either. We ban accounts that post like this, so please don't.
I'm sure you have good reason to feel the way you do, but please, no more of this here.
Edit: you've unfortunately been breaking the site guidelines in other places as well, and we've already warned you once. If you'd please review the guidelines and stick to them when posting here, we'd appreciate it.
I feel your take will be taken down soon, as this isn't the place for such a discussion.
Just food for thought: what makes you feel so entitled to judge people by their place of living? You know we don't choose where we're born, and our mobility is often restricted, right? Does your separation of men and women come from religion or some other bias used to categorize them?
This is the literal job of state security. Besides, most of those $#@$###s get in not by pretending they've been persecuted, but simply because they already have deep cover and hold passports of other countries.
And I sincerely hope that you will never have to know what it's like to flee your own country, first hand. Peace.
Oh look at you, how easy for you to demand of others that they put themselves in danger before you deem them worthy of protection. "You must have proof you've been arrested by Putin's police before we let you in here!". So they must risk the chance of immediate imprisonment in a freezing Siberian dungeon before you open your generous doors...
And graduates working for the Belarusian state -- why is that acceptable and not considered "conspiring with the war criminals" in your eyes? What other barriers do you have in mind that we don't know about, for someone who worked as expected and fled the country afterwards?
Fuck you. I know many people here (many of them queer) who had to leave everything and become forced emigres or asylum seekers.
Have a bit of compassion, would ya?
My childhood crush is in Ukraine (I mean, he's Ukrainian), my dear friend (Ukrainian) had to leave everything and seek refuge in NL. My friends (Russian) are under a constant threat of getting imprisoned for 10+ years because they still help support queer and trans people in Russia.
Compared to them I feel very privileged, because I was able to GTFO on my own. But if you think all Russian citizens must be deported, you're either a troll or a madman. Besides, this is exactly what Stalin did to the Chechens, for example; or think about what the USA did to Japanese Americans.
Did it help someone? No. Did it ruin millions of lives? You fucking guess.
upd: made it all clearer, and sorry for all the profanity
>It's kind of funny because Telegram is used by Russian military to coordinate a lot of things, so they complain a lot about the block.
If that's true, then it was really stupid of them to allow things to get to that point. Look at the US -- they had no tolerance for a major social media app (TikTok) to be outside their own control, and they weren't even in a major war at the time. It seems obvious that if you ARE in a major war, you wouldn't want your main social media and messaging app to be under the control of somebody (Pavel Durov) who was recently arrested by a member (France) of the military alliance you're fighting against (NATO), when it is unclear what deal he may have made with that government to be released from prison. It seems obvious to suspect that the price of his freedom may have been a backdoor that allows the opposing military to read all the messages your own people are sending.
Russia's real failure is that, unlike the US, it has been systematically unable to keep its own top tech talent supportive of the government. The top US tech companies have been only too eager to do almost anything their government asks of them, with only some rare and tepid pushback (such as that by Anthropic recently) that seems to get severely punished when it does happen. So there has been no need for the US government to go to the lengths Russia is going to now, simply because it was able to co-opt its top talent into working for and with the state (with some rare exceptions like Snowden, and I'd say the "damage" from that has been pretty successfully contained).
The Chinese government may have had some issues with that as well, considering what happened with Jack Ma (though I don't know much about it).
> unable to keep its own top tech talent supportive of their own government
The government did much to turn them away. And with regard to the Makh messenger: patriotic tech talent is supposed to be interested in the Elbrus 2000 PC and the Aurora mobile OS. Does the Makh messenger work on anything Russian? No, Makh does not work on anything Russian. So what makes Makh Russian? We don't get it. It's some other Russia that we don't belong to. Our Russia is the Elbrus 2000 PC, the Aurora mobile OS, and software from the Astra group. The people behind Elbrus 2000 support Orthodox Christianity, and the people behind the Astra group are for the great Soviet past. They call one of the Astra Linux releases "Leningrad" -- the proper name of what is currently known as Saint Petersburg.
Makh is from a commercial group that does not care about our values and virtually openly violates traditional values. They are from the VK group, and VK hosts VKFest, an open-air festival for youth with rotten words, fornication songs, all that stuff.
Our Russia and their Russia don't mix like water and oil.
For the military there is another communication network called Свод (Svod, "Arch"). It was four years late to the party, but at least it's going in now.
> It's kind of funny because Telegram is used by Russian military to coordinate a lot of things, so they complain a lot about the block.
This, plus the Starlink cutoff, blinded them so badly that Ukraine was able to counterattack and retake a bit of area north of Huliaipole with armored vehicles (which normally attract an immediate drone response these days). Last I checked, operations are still ongoing, so it'll be a bit before we know the extent of what they were able to do.
That might satisfy message-privacy and connectivity, but it seems it'd be vulnerable when it comes to identity-privacy and detection.
I suppose you could use an LLM on each end to write superficially plausible messages and use steganography, although then there's still the problem of "Weird, this user types at 500WPM without sleeping."
YouTube is easily sped back up via DPI bypass. It used to work with Cloudflare too, but not anymore, so Cloudflare is blocked harder than YouTube. With DPI bypass, YouTube is very fast.
While 7700 per hour sounds big, pretty much any dinky server can handle it. So I don't think it's a matter of DDoS. At this point it's just... odd behaviour.
Especially for a txt file. I don't know much about webdev, but I'm pretty sure serving up 7700 plaintext files of roughly 10 lines each per hour isn't that demanding.
I think it's a really poor argument that AGI won't happen because the model doesn't understand the physical world. That can be trained the same way everything else is.
I think the biggest issue we currently have is with proper memory. But even that is because it's not feasible to post-train an individual model on its own experiences at scale, not because of a fundamental architectural limitation.
When people move the goalposts for AGI toward physical-world capability, they are usually doing it so they can continue to raise more funding rounds at a higher valuation. Not saying the author is doing that.
I recommend checking the history of deregulation of the agricultural industry in New Zealand. It didn't lose the industry; actually, the opposite happened.
Persistent government subsidies are almost never a good idea long term. I understand that some temporary support might make sense in some cases, but not permanent support. It prevents innovation and optimization, and in the long run it usually does more damage.
Having been in the NZ ag tech industry for the last 25+ years: US subsidies and tariffs drove a lot of innovation in NZ (also Europe), and then the US manufacturers in the spaces I've been in have pretty much collapsed when faced with better tech, as farmers switched to using our (or the European) tech.
A lot of meat-cutting (and packaging) robotics and dairy automation are the flashy ones. Softer tech like crop and orchard management, cultivar creation, stock breeding/selection, and logistics have all come a long way. So has the development of uses for byproducts, i.e. chemical refineries that turn milk into something like protein or milk powder and use the secondary products from those processes to produce alcohols or fertilizer.
It would appear that to remain competitive they had massive consolidation, and with that an increase in animal density leading to major issues with water pollution.
Downvoting without engaging in discussion kind of directly violates both the spirit and the rules around here.
I've posted pretty solid evidence that deregulation did not, in fact, improve the agricultural situation for New Zealand. It absolutely made a subset of corporations and mega-farmers extremely rich at the expense of the natural resources the rest of the country shares. Would LOVE to hear the arguments for how that's a good thing for the people of New Zealand or our planet as a whole.
But then again, that would require thoughtful discourse...
Just to expand on this idea with more historical context: part of the reason agriculture is regulated like it is in the US is because it used to be much more deregulated. And then speculation and profiteering in agriculture in the 1920s contributed to the great depression and caused the dust bowl. Then, it became a national food security issue. The New Deal is where a lot of the regulation and subsidies originate, but we didn't just do it for kicks. We have, actually, tried the alternative, and it was a disaster.
Because it goes against the urban popular groupthink: "Blue states subsidizing red states," "NZ did it, so the US can too."
Present any real or partial claims that this isn't the whole story, and it's difficult to change your mind on something that's fundamental to your beliefs. So you downvote and move on to the next post that validates your beliefs. Happens to everyone, including me.
I was very skeptical about Codex at the beginning, but now all my coding tasks start with Codex. It's not perfect at everything, but overall it's pretty amazing. Refactoring, building something new, building something I'm not familiar with. It is still not great at debugging things.
One surprising thing that Codex helped with is procrastination. I'm sure many people have had this feeling when you have some big task and don't quite know where to start. Just send it to Codex. It might not get it right, but it's almost always a good starting point that you can quickly iterate on.
Infinitely agree with all of this. I was skeptical, then tried Opus 4.5 and was blown away. Codex with 5.0 and 5.1 wasn't great, but 5.2 is a big improvement. I can't justify writing code without it anymore, because there's no point: given the time savings and, with the right constraints, the quality, you're going to get better code.
And the same thought on procrastination: not just not knowing where to start, but also getting stuck in the middle and not knowing where to go. That literally never happens anymore. You have discussions with it to do the planning and weigh different implementation options, you end up with a good design description, and then, what's the point of writing the code yourself when, given that design, it's going to write it quickly and match what was agreed?
You can code without it. Maybe you don't want to, but if you're a programmer, you can
(here I am remembering a time when I had no computer and would program data structures in OCaml with pen and paper, then go to university the next day to try it. Oftentimes it worked on the first try)
Sure, but the end of this post [0] is where I'm at. I don't feel the need or want to write the code when I can spend my time doing the other parts that are much more interesting and valuable.
> Emil concluded his article like this:
> JustHTML is about 3,000 lines of Python with 8,500+ tests passing. I couldn’t have written it this quickly without the agent.
> But “quickly” doesn’t mean “without thinking.” I spent a lot of time reviewing code, making design decisions, and steering the agent in the right direction. The agent did the typing; I did the thinking.
> That’s probably the right division of labor.
I couldn’t agree more. Coding agents replace the part of my job that involves typing the code into a computer. I find what’s left to be a much more valuable use of my time.
But are those tests relevant? I tried using LLMs to write tests at work and whenever I review them I end up asking it “Ok great, passes the test, but is the test relevant? Does it test anything useful?” And I get a “Oh yeah, you’re right, this test is pointless”
Keep track of test coverage and ask it to delete tests without lowering coverage by more than, say, 0.01 percentage points. If you have a script that gives it only the test coverage, plus a file listing all tests with their line-number ranges, it's more or less a dumb task it can work on for hours without actually reading the files (which would fill the context too quickly).
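The coverage-guarded deletion idea can be shown in miniature. This is a sketch, not a real workflow: the test names and per-test line sets below are hypothetical, standing in for data you might extract from a coverage tool's per-test contexts. A test is safe to drop when every line it covers is also covered by the remaining tests.

```python
def prune_redundant_tests(coverage_map):
    """coverage_map: test name -> set of covered line ids (hypothetical data;
    a real map could come from a coverage tool's per-test contexts)."""
    kept = dict(coverage_map)
    # Try dropping the smallest tests first: they are most likely redundant.
    for name in sorted(coverage_map, key=lambda n: len(coverage_map[n])):
        # Lines covered by every *other* test we are still keeping.
        others = set().union(*[lines for t, lines in kept.items() if t != name])
        if coverage_map[name] <= others:   # fully subsumed: coverage unchanged
            del kept[name]
    return kept

# Toy example: test_smoke covers nothing the other two don't already cover.
coverage_map = {
    "test_basic": {1, 2, 3, 4},
    "test_attrs": {1, 2, 5, 6},
    "test_smoke": {1, 2, 3},
}
kept = prune_redundant_tests(coverage_map)
print(sorted(kept))  # test_smoke is pruned
```

An agent doing this interactively is essentially running the same subsumption check, one candidate test at a time, against a coverage report it is given rather than computing.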
If you leave an agent for hours trying to increase coverage by percentage without further guiding instructions you will end up with lots of garbage.
In order to achieve this, you need several distinct loops. One that creates tests (there will be garbage), one that consolidates redundant tests, one that parametrizes repetitive tests, and so on.
Agents create redundant tests for all sorts of reasons. Maybe they're trying a hard to reach line and leave several attempts behind. Or maybe they "get creative" and try to guess what is uncovered instead of actually following the coverage report, etc.
Less capable models are actually better at doing this. They're faster, don't "get creative" with weird ideas mid-task, and cost less. Just make them work one test at a time. Spawn, write one test that verifiably increases overall coverage, exit. Once you reach a threshold, start the consolidating loop: pick a redundant pair of tests, consolidate, exit. And so on...
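The spawn-verify-exit loop described above can be sketched as follows. Everything here is a stand-in: `spawn_agent` fakes firing a fresh agent session, and `coverage_of` fakes running the suite under a coverage tool; only verifiably improving attempts are kept.

```python
def coverage_loop(spawn_agent, coverage_of, tests, target=0.9, max_spawns=20):
    """Each iteration is one fresh agent session proposing one new test;
    the test is kept only if measured coverage actually goes up."""
    for _ in range(max_spawns):
        if coverage_of(tests) >= target:
            break
        new_test = spawn_agent(tests)              # one attempt per fresh session
        if coverage_of(tests | {new_test}) > coverage_of(tests):
            tests = tests | {new_test}             # verified improvement: keep it
        # otherwise discard the garbage attempt and respawn
    return tests

# Toy fixtures: ten coverable lines and hypothetical tests covering some of them.
ALL_LINES = set(range(10))
TEST_POOL = {
    "t_parse": {0, 1, 2},
    "t_attrs": {2, 3},
    "t_edge":  {4, 5, 6},
    "t_io":    {7, 8, 9},
}

def coverage_of(tests):
    covered = set().union(*(TEST_POOL[t] for t in tests)) if tests else set()
    return len(covered) / len(ALL_LINES)

candidates = iter(TEST_POOL)   # the stub "agent" just proposes each test once
final = coverage_loop(lambda tests: next(candidates), coverage_of, frozenset())
print(sorted(final), coverage_of(final))
```

Note that `t_attrs` gets kept even though it overlaps `t_parse` on line 2 -- which is exactly why the separate consolidation loop is needed afterwards.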
Of course, you can use a powerful model and babysit it as well. A few disambiguating questions and interruptions will guide it well. If you want truly unattended operation, though, it's damn hard to get stable results.
People see LLMs and tons of tests mentioned in the same sentence and think it shows how models love writing pointless tests, rather than realizing that the tests here are standard, human-written ones, used to show that the model wrote code validated by a currently trusted source.
It shows that writing comments for humans to read, with the right context, is _very_ similar to how we need to interact with LLMs. And if we fail to communicate with humans, clearly we're going to fail with models.
It's the semantics of "can", where it is used to suggest feasibility. When I moved and got a new commute, I still "could" bike to work, but it went from 30min to an hour and a half each way. While technically possible, I would have had to sacrifice a lot when losing two hours a day- laundry, cooking dinner, downtime. I always said I "can't really" bike to work, but there is a lot of context lost.
"Can" is too overloaded a word even with context provided, ranging from places like "could conceivably be achieved" to "usually possible".
The only hint you can dig out is where they might place limits on feasibility. E.g. "I can fly first class all the time (if I limit the number of flights and spend an unreasonable portion of my wealth on tickets)" is typically a less useful interpretation than "I can fly first class all the time (frequently, without concern, because I'm very well off)", but you have to figure out which they are trying to say (which isn't always easy).
It's so fascinating to me that the thread above this one on this page says the opposite, and the funniest thing is I'm sure you're both right. What a wild world we live in; I'm not sure how one is supposed to objectively analyse the performance of these things.
It's great at some things, and it's awful at other things. And this varies tremendously based on context.
This is very similar to how humans behave. Most people are great at a small number of things, and there's always a larger set of things that we may individually be pretty terrible at.
The bots the same way, except: Instead of billions of people who each have their own skillsets and personalities, we've got a small handful of distinct bots from different companies.
And of course: Lies.
When we ask Bob or Lisa for help with a thing they don't understand very well, they will usually try to set reasonable expectations. ("Sorry, ssl-3, I don't really understand ZFS very well. I can try to get the SLOG -- whatever that is -- to work better with this workload, but I can't promise anything.")
Bob or Lisa may figure it out. They'll gather up some background and work on it, bring in outside help if that's useful, and probably tread lightly. This will take time. But they probably won't deliberately lie [much] about what they expect from themselves.
But when the bot is asked to do a thing that it doesn't understand very well, it's chipper as fuck about it. ("Oh yeah! Why sure I can do that! I'm well-versed in -everything-! [Just hold my beer and watch this!]")
The bot will then set forth to do the thing. It might fuck it all up with wild abandon, but it doesn't care: It doesn't feel. It doesn't understand expectations. Or cost. Or art. Or unintended consequences.
Or, it might get it right. Sometimes, amazingly-right.
But it's impossible to tell going in whether it's going to be good, or bad: Unlike Bob or Lisa, the bot always heads into a problem as an overly-ambitious pack of lies.
(But the bot is very inexpensive to employ compared to Bob or Lisa, so we use the bot sometimes.)
I always wonder how people make qualitative statements like this. There are so many variables! Is it my prompt? The task? The specific model version? A good or bad branch out of the non-deterministic solution space?
Like, do you run a proper experiment where you hand the same task to multiple models several times and compare the results? Not snark by the way, I’m asking in earnest how you pick one model over another.
> Like, do you run a proper experiment where you hand the same task to multiple models several times and compare the results?
This is what I do. I have a little TUI that fires off Claude Code, Codex, Gemini, Qwen Coder, and AMP in separate containers for most tasks I do (although I've started to use AMP less and less), and it returns the last message of what they replied and/or a git diff of what exactly they did. Then I compare them side by side. If all of them got something wrong, I update the prompt and fire them off again -- always starting from zero, always including the full context of what you're doing in the first message; they're all non-interactive sessions.
Sometimes I do 3x Codex instead of different agents, just to double-check that all of them would do the same thing. If they go off and do different things from each other, I know the initial prompt isn't specific/strict enough, and I iterate again.
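The fan-out part of that workflow can be sketched in Python. The CLI invocations below are placeholders (real flag names vary by tool and version), and the stub runner stands in for executing each agent non-interactively inside its own container and collecting the final message plus a git diff:

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical non-interactive agent commands; real ones would be whatever
# each CLI's "run one prompt and exit" mode is, executed inside a container.
AGENTS = {
    "claude": ["claude", "-p"],
    "codex":  ["codex", "exec"],
    "gemini": ["gemini", "-p"],
}

def fan_out(prompt, run_agent, agents=AGENTS):
    """Send the same full-context prompt to every agent in parallel and
    collect (last_message, diff) per agent for side-by-side review."""
    with ThreadPoolExecutor(max_workers=len(agents)) as pool:
        futures = {name: pool.submit(run_agent, cmd, prompt)
                   for name, cmd in agents.items()}
        return {name: f.result() for name, f in futures.items()}

def fake_runner(cmd, prompt):
    # A real runner would subprocess.run() the CLI in a fresh checkout
    # and gather `git diff` afterwards; this stub keeps the sketch runnable.
    return (f"{cmd[0]}: done", f"(diff from {cmd[0]})")

results = fan_out("fix the flaky test in the suite", fake_runner)
```

Running 3x the same agent instead is just `AGENTS = {"codex-1": ..., "codex-2": ..., "codex-3": ...}` with identical commands; disagreement between the three runs is the signal that the prompt is underspecified.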
I have sent the same prompt to GPT-5.2 Thinking and Gemini 3.0 Pro many times because I subscribe to both.
GPT-5.2 Thinking (with extended thinking selected) is significantly better in my testing on software problems with 40k context.
I attribute this to thinking time, with GPT-5.2 Thinking I can coax 5 minutes+ of thinking time but with Gemini 3.0 Pro it only gives me about 30 seconds.
The main problem with the Plus sub in ChatGPT is you can't send more than 46k tokens in a single prompt, and attaching files doesn't help either because the VM blocks the model from accessing the attachments if there's ~46k tokens already in the context.
Last night I gave one of the flaky tests in our test suite to three different models, using the exact same prompt.
Gemini 3 and Gemini 3 Flash identified the root cause and nailed the fix. GPT 5.1 Codex misdiagnosed the issue and attempted a weird fix despite my prompt saying “don’t write code, simply investigate.”
I run these tests regularly, and Codex has not impressed me. Not even once. At best it’s on par, but most of the time it just fails miserably.
The one time I was impressed with codex was when I was adding translations in a bunch of languages for a business document generation service. I used claude to do the initial work and cross checked with codex.
The Codex agent ran for a long time and created and executed a bunch of Python scripts (according to the output thinking text) to compare the translations, and it found a number of possible issues. I'm not sure where the scripts were stored or executed; our project doesn't use Python.
Then I fed the output of the issues codex found to claude for a second "opinion". Claude said that the feedback was obviously from someone that knew the native language very well and agreed with all the feedback.
I was really surprised at how long Codex was thinking and analyzing - probably 10 minutes. (This was ~1+mo ago, I don't recall exactly what model)
Claude is pretty decent IMO - amp code is better, but seems to burn through money pretty quick.
This works for me in general. If I am procrastinating, I ask a coding agent for a small task. If it works, I have something to improve upon. If it doesn’t work, my OCD forces me to “fix it.” :D
Same, actually. Though for some reason Codex utterly falls down with podman, especially rootless podman. No matter how many explicit instructions I give it in the prompt and AGENTS.md, it will try to set a ton of variables and break podman. It will then try to use docker (again, despite explicit instructions not to) and eventually will try to sudo podman. One time I actually let it, and it used its sudo perms to reconfigure SELinux on my system, which completely broke it, so that I could no longer get root on my own machine and the machine never booted again (because SELinux was blocking everything). It has tried to do the same thing three times now on different projects.
So yeah, I use codex a lot and like it, but it has some really bad blind spots.
> One surprising thing that codex helped with is procrastination.
Heh. It's about the same as an efficient compilation or integration-testing process that is just long enough to let it do its thing while you go and browse Hacker News.
IMHO, making feedback loops faster is going to be key to improving success rates with agentic coding tools. They work best if the feedback loop is fast and thorough, so compilers, good tests, etc. are important, but it also matters that all of that runs quickly. For me it's almost an even split between reasoning and tool invocations, and it is rather trigger-happy with the tool invocations, wasting a lot of time finding out that a naive approach was indeed naive before fixing it over several iterations. Good instructions help (AGENTS.md).
Focusing attention on just making builds fast and solid is a good investment in any case. Doubly so if you plan on using agentic coding tools.
On the contrary, I will always use longer-feedback-cycle agents if the quality is better (including consulting 5.2 Pro as an oracle or for spec work).
The key is to adapt to this by learning how to parallelize your work, instead of the old way of doing things where devs are expected to focus on and finish one task at a time (per lean manufacturing principles).
I find now that painfully slow builds are no longer a serious issue for me. Because I'm rotating through 15-20 agents across 4-6 projects so I always have something valuable to progress on. One of these projects and a few of these agents are clear priorities I return to sooner than the others.
> One surprising thing that codex helped with is procrastination.
The Roomba effect is real. The AI models do all the heavy implementation work, and when one asks me to set up and execute tests, I feel obliged to get to it ASAP.
I think Opus + Claude Code is the more competent overall general "making things" system, while it makes sense to have a $20 Codex subscription to find bugs and review the things that Claude Code makes.
On its own, as sole author, I find Codex overcomplicates things. It will riddle your code with unnecessary helper functions and objects and pointless abstractions.
It is however useful for doing a once over for code review and finding the things that Claude rushed through.