Russia has been slowly cracking down on popular communication and media platforms. First they slow connections down to unusable speeds. This happened to YouTube at some point last year; at first they even claimed the problem was on Google's side, not theirs. I think the intention is to slowly push people off the platform without blocking it outright. Then eventually they block access completely. The same happened to messaging apps like WhatsApp and Telegram. Telegram still works for messaging, but not calls. It's kind of funny, because the Russian military uses Telegram to coordinate a lot of things, so they complain a lot about the block.
I have family in Russia and it's a sad state of affairs. Our ability to communicate with them is slowly degrading to the point where now I am looking into self-hosted communications.
I have a similar situation, and Amnezia (in either WG mode or Xray mode) works well with a self-hosted server. An SSH tunnel as a proxy has also worked so far.
To my surprise, even sophisticated means of traffic masking like Amnezia and Xray get disrupted frequently, requiring hopping between self-hosted solutions and updating one's setup periodically. That's waaay beyond what most people are capable of. I'm fortunate to have a tech-worker acquaintance who lives next to my family members; otherwise there'd be no way for me to, for example, guide them through setup and reconfiguration remotely. Still, this setup gets disrupted every month or so, requiring manual intervention.
Try to get a middle hop at a Russian datacenter. Sometimes these have their DPI censorship boxes disabled (?) -- I know one that lets me forward plain WireGuard from mobile routers to an EU server with a few SNAT/DNAT rules, even though ordinarily that would get blocked on sight.
(Sadly, it's just Mikrotik gear that can't use any fancy censorship evasion protocols).
I would say they are trying to block every public VPN. And when some VPN tried to hide behind Cloudflare, thinking it had taken all Cloudflare-hosted sites hostage, the whole of Cloudflare got nuked, and the hostages did not save the VPN from blocking.
I'm even considering setting up a dial-up line (yes, a V.34 modem!) somewhere near Russia, to offer a side channel with text browsing, news, IRC, and email. For when things get really, really bad (they will ...)
Before you ask: yes, dial-up works on modern networks if the codec is G.711 (uncompressed). Most of the public phone network works this way because fax is still a thing, but some bulk carriers and some enterprises use compressed codecs.
Nationalistic flamewar is not allowed on Hacker News, regardless of nation. Personal attacks aren't allowed either. We ban accounts that post like this, so please don't.
I'm sure you have good reason to feel the way you do, but please, no more of this here.
Edit: you've unfortunately been breaking the site guidelines in other places as well, and we've already warned you once. If you'd please review the guidelines and stick to them when posting here, we'd appreciate it.
I feel your take will be taken down soon, as this isn't the place for such a discussion.
Just food for thought: what makes you feel so entitled to judge people by their place of living? You know we don't choose where we're born, and our mobility is often restricted, right? Does your separation of men and women come from religion or some other bias used to categorize them?
This is the literal job of state security. Besides, most of those $#@$###s get in not by pretending they've been persecuted, but simply because they already have deep cover and hold passports of other countries.
And I sincerely hope that you will never have to know what it's like to flee your own country, first hand. Peace.
Oh look at you, how easy for you to demand of others that they put themselves in danger before you deem them worthy of protection. "You must have proof you've been arrested by Putin's police before we let you in here!". So they must risk the chance of immediate imprisonment in a freezing Siberian dungeon before you open your generous doors...
And graduates working for the Belarusian state -- why is that acceptable and not considered "conspiring with the war criminals" in your eyes? What other barriers do you have in mind that we don't know about, for someone who worked as expected and fled the country afterwards?
Fuck you. I know many people here (many of them queer) who had to leave everything and become forced emigres or asylum seekers.
Have a bit of compassion, would ya?
My childhood crush is in Ukraine (I mean, he's Ukrainian), my dear friend (Ukrainian) had to leave everything and seek refuge in NL. My friends (Russian) are under a constant threat of getting imprisoned for 10+ years because they still help support queer and trans people in Russia.
Compared to them I feel very privileged, because I was able to GTFO on my own. But if you think all Russian citizens must be deported, you're either a troll or a madman. Besides, this is exactly what Stalin did to the Chechens, for example; or think about what the USA did to Japanese Americans.
Did it help someone? No. Did it ruin millions of lives? You fucking guess.
upd: made it all clearer, and sorry for all the profanity
>It's kind of funny because Telegram is used by Russian military to coordinate a lot of things, so they complain a lot about the block.
If that's true, then it was really stupid of them to allow things to get to that point. Look at the US -- they had no tolerance for a major social media app (TikTok) to be outside their own control, and they weren't even in a major war at the time. It seems obvious that if you ARE in a major war, you wouldn't want your main social media and messaging app to be under the control of somebody (Pavel Durov) who was recently arrested by a member (France) of the military alliance you're fighting against (NATO), when it is unclear what deal he may have made with that government to be released from prison. It seems obvious to suspect that the price of his freedom may have been a backdoor that allows the opposing military to read all the messages your own people are sending.
Russia's real failure is that, unlike the US, it has been systematically unable to keep its own top tech talent supportive of the government. The top US tech companies have been only too eager to do almost anything their government asks of them, with only some rare and tepid pushback (such as that by Anthropic recently) that seems to get severely punished when it does happen. So there has been no need for the US government to go to the lengths Russia is going to now, simply because it was able to co-opt its top talent into working for and with the state (with some rare exceptions like Snowden, and I'd say the "damage" from that has been pretty successfully contained).
The Chinese government may have had some issues with that as well, considering what happened with Jack Ma (though I don't know much about it).
> unable to keep its own top tech talent supportive of their own government
The government did much to turn them away. And with regard to the Makh messenger: patriotic tech talent is supposed to be interested in the Elbrus 2000 PC and the Aurora mobile OS. Does the Makh messenger work on anything Russian? No, Makh does not work on anything Russian. So what makes Makh Russian? We don't get it. It's some other Russia that we don't belong to. Our Russia is the Elbrus 2000 PC, the Aurora mobile OS, and software from the Astra group. The people behind Elbrus 2000 support Orthodox Christianity, and the people behind the Astra group are for the great Soviet past. They call one of the Astra Linux releases "Leningrad" -- the proper name of what is currently known as Saint Petersburg.
Makh is from a commercial group that does not care about our values and virtually openly violates traditional values. They are from the VK group, and VK hosts VKFest, an open-air festival for youth with rotten words, fornication songs, all that stuff.
Our Russia and their Russia don't mix like water and oil.
For the military there is another communication network called Свод (Svod, "Arch"). It was four years late to the party, but at least it's going in now.
> It's kind of funny because Telegram is used by Russian military to coordinate a lot of things, so they complain a lot about the block.
This, plus the Starlink cutoff, blinded them so badly that Ukraine was able to counterattack and retake a bit of area north of Huliaipole with armored vehicles (which normally attract an immediate drone response these days). Last I checked, operations are still ongoing, so it'll be a bit before we know the extent of what they were able to do.
That might satisfy message-privacy and connectivity, but it seems it'd be vulnerable when it comes to identity-privacy and detection.
I suppose you could use an LLM on each end to write superficially plausible messages and use steganography, although then there's still the problem of "Weird, this user types at 500WPM without sleeping."
YouTube is easily sped back up via DPI bypass. It used to work with Cloudflare too, but not anymore, so Cloudflare is blocked harder than YouTube. With DPI bypass, YouTube is very fast.
While 7700 per hour sounds big, pretty much any dinky server can handle it. So I don't think it's a matter of DDoS. At this point it's just... odd behaviour.
Especially for a txt file. I don't know much about webdev, but I'm pretty sure serving up 7700 plaintext files of roughly 10 lines each per hour isn't that demanding.
I think it's a really poor argument that AGI won't happen because the model doesn't understand the physical world. That can be trained the same way everything else is.
I think the biggest issue we currently have is with proper memory. But even that is because it's not feasible to post-train an individual model on its own experiences at scale, not because of a fundamental architectural limitation.
When people move the goalposts for AGI toward physical-world capability, they are usually doing it so they can continue to raise more funding rounds at a higher valuation. Not saying the author is doing that.
I recommend checking the history of deregulation of the agricultural industry in New Zealand. It didn't lose the industry; actually, the opposite happened.
Persistent government subsidies are almost never a good idea long term. I understand that some temporary support might make sense in some cases, but not permanent support. It prevents innovation and optimization, and in the long run it usually does more damage.
Having been in the NZ ag tech industry for the last 25+ years: US subsidies and tariffs drove a lot of innovation in NZ (also Europe), and then the US manufacturers in the spaces I've been in have pretty much collapsed when faced with better tech, as farmers switched to using our (or the European) tech.
A lot of meat-cutting (and packaging) robotics and dairy automation are the flashy ones. Softer tech like crop and orchard management, cultivar creation, stock breeding/selection, and logistics have all come a long way. So has the development of uses for byproducts, i.e. chemical refineries that turn milk into something like protein or milk powder and use the secondary products from those processes to produce alcohols or fertilizer.
It would appear that to remain competitive they had massive consolidation, and with that an increase in animal density leading to major issues with water pollution.
Downvoting without engaging in discussion kind of directly violates both the spirit and the rules around here.
I've posted pretty solid evidence that deregulation did not, in fact, improve the agricultural situation for New Zealand. It absolutely made a subset of corporations and mega-farmers extremely rich at the expense of the natural resources the rest of the country shares. Would LOVE to hear the arguments for how that's a good thing for the people of New Zealand or our planet as a whole.
But then again, that would require thoughtful discourse...
Just to expand on this idea with more historical context: part of the reason agriculture is regulated like it is in the US is because it used to be much more deregulated. And then speculation and profiteering in agriculture in the 1920s contributed to the great depression and caused the dust bowl. Then, it became a national food security issue. The New Deal is where a lot of the regulation and subsidies originate, but we didn't just do it for kicks. We have, actually, tried the alternative, and it was a disaster.
Because it goes against the urban popular groupthink: "Blue states subsidizing red states," "NZ did it, so the US can too."
Present any real or partial claims that this isn't the whole story, and it's difficult to change your mind on something that's fundamental to your beliefs. So you downvote and move on to the next post that validates your beliefs. Happens to everyone, including me.
I was very skeptical about Codex at the beginning, but now all my coding tasks start with Codex. It's not perfect at everything, but overall it's pretty amazing. Refactoring, building something new, building something I'm not familiar with. It is still not great at debugging things.
One surprising thing that Codex helped with is procrastination. I'm sure many people have had this feeling when you have some big task and don't quite know where to start. Just send it to Codex. It might not get it right, but it's almost always a good starting point that you can quickly iterate on.
Infinitely agree with all of this. I was skeptical, then tried Opus 4.5 and was blown away. Codex with 5.0 and 5.1 wasn't great, but 5.2 is a big improvement. I can't justify writing code without it anymore, because there's no point: given the time savings and, with the right constraints, the quality, you're going to get better code.
And the same thought on procrastination: not just not knowing where to start, but also getting stuck in the middle and not knowing where to go. That literally never happens anymore. You have discussions with it to do the planning and weigh different implementation options, you end up with a good design description, and then, what's the point of writing the code yourself when, given that design, it's going to write it quickly and match what was agreed?
You can code without it. Maybe you don't want to, but if you're a programmer, you can
(here I am remembering a time when I had no computer and would program data structures in OCaml with pen and paper, then go to university the next day to try it. Oftentimes it worked on the first try)
Sure, but the end of this post [0] is where I'm at. I don't feel the need or want to write the code when I can spend my time doing the other parts that are much more interesting and valuable.
> Emil concluded his article like this:
> JustHTML is about 3,000 lines of Python with 8,500+ tests passing. I couldn’t have written it this quickly without the agent.
> But “quickly” doesn’t mean “without thinking.” I spent a lot of time reviewing code, making design decisions, and steering the agent in the right direction. The agent did the typing; I did the thinking.
> That’s probably the right division of labor.
I couldn’t agree more. Coding agents replace the part of my job that involves typing the code into a computer. I find what’s left to be a much more valuable use of my time.
But are those tests relevant? I tried using LLMs to write tests at work and whenever I review them I end up asking it “Ok great, passes the test, but is the test relevant? Does it test anything useful?” And I get a “Oh yeah, you’re right, this test is pointless”
Keep track of test coverage and ask it to delete tests without lowering coverage by more than, say, 0.01 percentage points. If you have a script that gives it only the test coverage, plus a file listing all tests with their line-number ranges, it's more or less a dumb task it can work on for hours without actually reading the files (which would fill the context too quickly).
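The coverage-guarded deletion idea can be shown in miniature. This is a sketch, not a real workflow: the test names and per-test line sets below are hypothetical, standing in for data you might extract from a coverage tool's per-test contexts. A test is safe to drop when every line it covers is also covered by the remaining tests.

```python
def prune_redundant_tests(coverage_map):
    """coverage_map: test name -> set of covered line ids (hypothetical data;
    a real map could come from a coverage tool's per-test contexts)."""
    kept = dict(coverage_map)
    # Try dropping the smallest tests first: they are most likely redundant.
    for name in sorted(coverage_map, key=lambda n: len(coverage_map[n])):
        # Lines covered by every *other* test we are still keeping.
        others = set().union(*[lines for t, lines in kept.items() if t != name])
        if coverage_map[name] <= others:   # fully subsumed: coverage unchanged
            del kept[name]
    return kept

# Toy example: test_smoke covers nothing the other two don't already cover.
coverage_map = {
    "test_basic": {1, 2, 3, 4},
    "test_attrs": {1, 2, 5, 6},
    "test_smoke": {1, 2, 3},
}
kept = prune_redundant_tests(coverage_map)
print(sorted(kept))  # test_smoke is pruned
```

An agent doing this interactively is essentially running the same subsumption check, one candidate test at a time, against a coverage report it is given rather than computing.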
If you leave an agent for hours trying to increase coverage by percentage without further guiding instructions you will end up with lots of garbage.
In order to achieve this, you need several distinct loops. One that creates tests (there will be garbage), one that consolidates redundant tests, one that parametrizes repetitive tests, and so on.
Agents create redundant tests for all sorts of reasons. Maybe they're trying a hard to reach line and leave several attempts behind. Or maybe they "get creative" and try to guess what is uncovered instead of actually following the coverage report, etc.
Less capable models are actually better at doing this. They're faster, don't "get creative" with weird ideas mid-task, and cost less. Just make them work one test at a time. Spawn, write one test that verifiably increases overall coverage, exit. Once you reach a threshold, start the consolidating loop: pick a redundant pair of tests, consolidate, exit. And so on...
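The spawn-verify-exit loop described above can be sketched as follows. Everything here is a stand-in: `spawn_agent` fakes firing a fresh agent session, and `coverage_of` fakes running the suite under a coverage tool; only verifiably improving attempts are kept.

```python
def coverage_loop(spawn_agent, coverage_of, tests, target=0.9, max_spawns=20):
    """Each iteration is one fresh agent session proposing one new test;
    the test is kept only if measured coverage actually goes up."""
    for _ in range(max_spawns):
        if coverage_of(tests) >= target:
            break
        new_test = spawn_agent(tests)              # one attempt per fresh session
        if coverage_of(tests | {new_test}) > coverage_of(tests):
            tests = tests | {new_test}             # verified improvement: keep it
        # otherwise discard the garbage attempt and respawn
    return tests

# Toy fixtures: ten coverable lines and hypothetical tests covering some of them.
ALL_LINES = set(range(10))
TEST_POOL = {
    "t_parse": {0, 1, 2},
    "t_attrs": {2, 3},
    "t_edge":  {4, 5, 6},
    "t_io":    {7, 8, 9},
}

def coverage_of(tests):
    covered = set().union(*(TEST_POOL[t] for t in tests)) if tests else set()
    return len(covered) / len(ALL_LINES)

candidates = iter(TEST_POOL)   # the stub "agent" just proposes each test once
final = coverage_loop(lambda tests: next(candidates), coverage_of, frozenset())
print(sorted(final), coverage_of(final))
```

Note that `t_attrs` gets kept even though it overlaps `t_parse` on line 2 -- which is exactly why the separate consolidation loop is needed afterwards.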
Of course, you can use a powerful model and babysit it as well. A few disambiguating questions and interruptions will guide it well. If you want truly unattended operation, though, it's damn hard to get stable results.
People see LLMs and tons of tests mentioned in the same sentence and think it shows how models love writing pointless tests, rather than realizing that the tests here are standard, human-written ones, used to show that the model wrote code validated by a currently trusted source.
It shows that writing comments for humans to read, with the right context, is _very_ similar to how we need to interact with LLMs. And if we fail to communicate with humans, clearly we're going to fail with models.
It's the semantics of "can", where it is used to suggest feasibility. When I moved and got a new commute, I still "could" bike to work, but it went from 30min to an hour and a half each way. While technically possible, I would have had to sacrifice a lot when losing two hours a day- laundry, cooking dinner, downtime. I always said I "can't really" bike to work, but there is a lot of context lost.
"Can" is too overloaded a word even with context provided, ranging from places like "could conceivably be achieved" to "usually possible".
The only hint you can dig out is where they might place limits on feasibility. E.g. "I can fly first class all the time (if I limit the number of flights and spend an unreasonable portion of my wealth on tickets)" is typically a less useful interpretation than "I can fly first class all the time (frequently, without concern, because I'm very well off)", but you have to figure out which they are trying to say (which isn't always easy).
It's so fascinating to me that the thread above this one on this page says the opposite, and the funniest thing is I'm sure you're both right. What a wild world we live in; I'm not sure how one is supposed to objectively analyse the performance of these things.
It's great at some things, and it's awful at other things. And this varies tremendously based on context.
This is very similar to how humans behave. Most people are great at a small number of things, and there's always a larger set of things that we may individually be pretty terrible at.
The bots the same way, except: Instead of billions of people who each have their own skillsets and personalities, we've got a small handful of distinct bots from different companies.
And of course: Lies.
When we ask Bob or Lisa for help with a thing they don't understand very well, they will usually try to set reasonable expectations. ("Sorry, ssl-3, I don't really understand ZFS very well. I can try to get the SLOG -- whatever that is -- to work better with this workload, but I can't promise anything.")
Bob or Lisa may figure it out. They'll gather up some background and work on it, bring in outside help if that's useful, and probably tread lightly. This will take time. But they probably won't deliberately lie [much] about what they expect from themselves.
But when the bot is asked to do a thing that it doesn't understand very well, it's chipper as fuck about it. ("Oh yeah! Why sure I can do that! I'm well-versed in -everything-! [Just hold my beer and watch this!]")
The bot will then set forth to do the thing. It might fuck it all up with wild abandon, but it doesn't care: It doesn't feel. It doesn't understand expectations. Or cost. Or art. Or unintended consequences.
Or, it might get it right. Sometimes, amazingly-right.
But it's impossible to tell going in whether it's going to be good, or bad: Unlike Bob or Lisa, the bot always heads into a problem as an overly-ambitious pack of lies.
(But the bot is very inexpensive to employ compared to Bob or Lisa, so we use the bot sometimes.)
I always wonder how people make qualitative statements like this. There are so many variables! Is it my prompt? The task? The specific model version? A good or bad branch out of the non-deterministic solution space?
Like, do you run a proper experiment where you hand the same task to multiple models several times and compare the results? Not snark by the way, I’m asking in earnest how you pick one model over another.
> Like, do you run a proper experiment where you hand the same task to multiple models several times and compare the results?
This is what I do. I have a little TUI that fires off Claude Code, Codex, Gemini, Qwen Coder, and AMP in separate containers for most tasks I do (although I've started to use AMP less and less), and it returns the last message of what they replied and/or a git diff of what exactly they did. Then I compare them side by side. If all of them got something wrong, I update the prompt and fire them off again -- always starting from zero, always including the full context of what you're doing in the first message; they're all non-interactive sessions.
Sometimes I do 3x Codex instead of different agents, just to double-check that all of them would do the same thing. If they go off and do different things from each other, I know the initial prompt isn't specific/strict enough, and I iterate again.
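The fan-out part of that workflow can be sketched in Python. The CLI invocations below are placeholders (real flag names vary by tool and version), and the stub runner stands in for executing each agent non-interactively inside its own container and collecting the final message plus a git diff:

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical non-interactive agent commands; real ones would be whatever
# each CLI's "run one prompt and exit" mode is, executed inside a container.
AGENTS = {
    "claude": ["claude", "-p"],
    "codex":  ["codex", "exec"],
    "gemini": ["gemini", "-p"],
}

def fan_out(prompt, run_agent, agents=AGENTS):
    """Send the same full-context prompt to every agent in parallel and
    collect (last_message, diff) per agent for side-by-side review."""
    with ThreadPoolExecutor(max_workers=len(agents)) as pool:
        futures = {name: pool.submit(run_agent, cmd, prompt)
                   for name, cmd in agents.items()}
        return {name: f.result() for name, f in futures.items()}

def fake_runner(cmd, prompt):
    # A real runner would subprocess.run() the CLI in a fresh checkout
    # and gather `git diff` afterwards; this stub keeps the sketch runnable.
    return (f"{cmd[0]}: done", f"(diff from {cmd[0]})")

results = fan_out("fix the flaky test in the suite", fake_runner)
```

Running 3x the same agent instead is just `AGENTS = {"codex-1": ..., "codex-2": ..., "codex-3": ...}` with identical commands; disagreement between the three runs is the signal that the prompt is underspecified.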
I have sent the same prompt to GPT-5.2 Thinking and Gemini 3.0 Pro many times because I subscribe to both.
GPT-5.2 Thinking (with extended thinking selected) is significantly better in my testing on software problems with 40k context.
I attribute this to thinking time, with GPT-5.2 Thinking I can coax 5 minutes+ of thinking time but with Gemini 3.0 Pro it only gives me about 30 seconds.
The main problem with the Plus sub in ChatGPT is you can't send more than 46k tokens in a single prompt, and attaching files doesn't help either because the VM blocks the model from accessing the attachments if there's ~46k tokens already in the context.
Last night I gave one of the flaky tests in our test suite to three different models, using the exact same prompt.
Gemini 3 and Gemini 3 Flash identified the root cause and nailed the fix. GPT 5.1 Codex misdiagnosed the issue and attempted a weird fix despite my prompt saying “don’t write code, simply investigate.”
I run these tests regularly, and Codex has not impressed me. Not even once. At best it’s on par, but most of the time it just fails miserably.
The one time I was impressed with codex was when I was adding translations in a bunch of languages for a business document generation service. I used claude to do the initial work and cross checked with codex.
The Codex agent ran for a long time and created and executed a bunch of Python scripts (according to the output thinking text) to compare the translations, and it found a number of possible issues. I'm not sure where the scripts were stored or executed; our project doesn't use Python.
Then I fed the output of the issues codex found to claude for a second "opinion". Claude said that the feedback was obviously from someone that knew the native language very well and agreed with all the feedback.
I was really surprised at how long Codex was thinking and analyzing - probably 10 minutes. (This was ~1+mo ago, I don't recall exactly what model)
Claude is pretty decent IMO - amp code is better, but seems to burn through money pretty quick.
This works for me in general. If I am procrastinating, I ask a coding agent for a small task. If it works, I have something to improve upon. If it doesn’t work, my OCD forces me to “fix it.” :D
Same, actually. Though for some reason Codex utterly falls down with podman, especially rootless podman. No matter how many explicit instructions I give it in the prompt and AGENTS.md, it will try to set a ton of variables and break podman. It will then try to use docker (again, despite explicit instructions not to) and eventually will try to sudo podman. One time I actually let it, and it used its sudo perms to reconfigure SELinux on my system, which completely broke it, so that I could no longer get root on my own machine and the machine never booted again (because SELinux was blocking everything). It has tried to do the same thing three times now on different projects.
So yeah, I use codex a lot and like it, but it has some really bad blind spots.
> One surprising thing that codex helped with is procrastination.
Heh. It's about the same as an efficient compilation or integration-testing process that is just long enough to let it do its thing while you go and browse Hacker News.
IMHO, making feedback loops faster is going to be key to improving success rates with agentic coding tools. They work best if the feedback loop is fast and thorough, so compilers, good tests, etc. are important, but it also matters that all of that runs quickly. For me it's almost an even split between reasoning and tool invocations, and it is rather trigger-happy with the tool invocations, wasting a lot of time finding out that a naive approach was indeed naive before fixing it over several iterations. Good instructions help (AGENTS.md).
Focusing attention on just making builds fast and solid is a good investment in any case. Doubly so if you plan on using agentic coding tools.
On the contrary, I will always use longer-feedback-cycle agents if the quality is better (including consulting 5.2 Pro as an oracle or for spec work).
The key is to adapt to this by learning how to parallelize your work, instead of the old way of doing things where devs are expected to focus on and finish one task at a time (per lean manufacturing principles).
I find now that painfully slow builds are no longer a serious issue for me. Because I'm rotating through 15-20 agents across 4-6 projects so I always have something valuable to progress on. One of these projects and a few of these agents are clear priorities I return to sooner than the others.
> One surprising thing that codex helped with is procrastination.
The Roomba effect is real. The AI models do all the heavy implementation work, and when one asks me to set up and execute tests, I feel obliged to get to it ASAP.
I think Opus + Claude Code is the more competent overall general "making things" system, while it makes sense to have a $20 Codex subscription to find bugs and review the things that Claude Code makes.
On its own, as sole author, I find Codex overcomplicates things. It will riddle your code with unnecessary helper functions and objects and pointless abstractions.
It is however useful for doing a once over for code review and finding the things that Claude rushed through.