Hacker News — 4corners4sides's comments

People have made cool racing education simulators with this too: https://github.com/FT-Autonomous/ft_grandprix.

This is a pretty unique aesthetic. Does the author have other works / a blog? Can v0 / Claude Design do this these days?

v0 / Claude Design do this these days. I may be wrong here, but this aesthetic is a good clue that the site was designed by AI. These models seem to love this brutalist, high-contrast look.


It's Claude Design. How do I know? A site I was working on has the same aesthetic... which is disappointing but not unexpected.

Ah okay, thanks. We've come a long way since the purple gradients and glowy buttons but I suppose there is still a tell.

Roblox is the elephant in the room here which fills the niche for freemium, fun 3D experiences that run on basically any platform or device.


Can confirm. There are even growing numbers of high-quality games/projects on the platform (I explored it recently), including a fully interactive+realistic nuclear reactor and 1:1 DCS-like Airbus A320 simulator.

I suspect the popularity and ease of distribution/development on the platform makes it very attractive for developers with a dream.


Many of the games that actual kids spend time on are the purest expression of gaming slop (half-broken microtransaction gambling hell with schizophrenic flashing colors). Roblox and Fortnite's Islands system are both guilty of this. The problem is kids don't know any better and don't yet understand the value of money. The obvious response is "parents should handle this" and while I agree, there is no system to let them say "here are Robux/V-Bucks you can spend on quality content (e.g., Fortnite's Battle Pass is very well designed, quality content), but gambling slop is disabled".


Roblox accounts for about 50% of total video game play time; it's insane how big Roblox is.

But not just Roblox. People are spending their time and money elsewhere too. Polymarket and sports betting for one.


Roblox is indeed eating the world


This article is a really good summary of current thinking on the “world model” conundrum that a lot of people are talking about, either directly or indirectly with respect to current day deployments of LLMs.

It synthesizes comments on “RL Environments” (https://ankitmaloo.com/rl-env/), “World Models” (https://ankitmaloo.com/world-models/) and the real reason that the “Google Game Arena” (https://blog.google/innovation-and-ai/models-and-research/go...) is so important to powering LLMs. In a sense it also relates to the notion of “taste” (https://wangcong.org/2026-01-13-personal-taste-is-the-moat.h...) and how / if its moat-worthiness can be eliminated by models.


The benchmarks look good. Slide decks and spreadsheets look better. People need to actually use Claude Cowork, have their Claude Code moment, and figure out the consequences. It will be really interesting to see articles like this (https://mitchellh.com/writing/my-ai-adoption-journey) written by people who actually care about accuracy in places like KPMG, to get their perspective on things.

I remember overhearing some normal people on the bus talking about how one of them had essentially orchestrated an agent scraper to pull and summarise news from 40 different sites he identified as important, which put him quite ahead of his peers. These were non-technical people orchestrating an agent workflow to make themselves better at work.
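The workflow they described might look roughly like the sketch below. Everything here is invented for illustration (the site list, the stub fetcher, the stub summariser); a real version would fetch the sites over HTTP and call an LLM API to do the summarising.

```python
# Hypothetical sketch of a "daily brief" agent workflow: pull headlines
# from a list of sites, then hand them all to a summariser.
# fetch_headlines and summarise are stubs standing in for real
# HTTP-scraping and LLM-summarisation steps.

SITES = ["news-a.example", "news-b.example"]  # the commuter's list had ~40


def fetch_headlines(site: str) -> list[str]:
    # Stub: a real implementation would fetch and parse the site's RSS/HTML.
    return [f"Top story from {site}", f"Second story from {site}"]


def summarise(headlines: list[str]) -> str:
    # Stub: a real implementation would call an LLM to summarise the text.
    return f"{len(headlines)} headlines collected"


def daily_brief(sites: list[str]) -> str:
    all_headlines = [h for site in sites for h in fetch_headlines(site)]
    return summarise(all_headlines)


print(daily_brief(SITES))  # → "4 headlines collected"
```

The notable part is how little code this is: the hard 90% (scraping and summarising) is delegated to an agent, which is exactly why non-technical people can now assemble workflows like this.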

There’s not much that tickles my software brain here, but the agents are coming for us all.


This is OpenAI taking the concept of AI coworkers seriously down to the level of “identity” for these agents.

This reminded me of Kairos, which came up a few days ago (https://www.kairos.computer/), though I find OpenAI’s angle much better and more inspiring than the one Kairos took. OpenAI’s genuinely feels like a platform for a coworker, while Kairos is yet another cool landing page, yet another agent platform with X amount of data integrations. The use cases in OpenAI’s article also felt more concrete and impressive, to be honest.

The claim that “as agents have gotten more capable, the opportunity gap between what models can do and what teams can actually deploy has grown” is definitely true. An analogy I’ve seen (I forget the source) is that we have F1 cars driving at 60 km/h: a lot of enterprises are not even at the deployment limit where improving benchmarks matters. They are still at the level of not being able to provide the right information, not having the right evaluation and improvement frameworks, etc.

Using “Opening the AI Frontier” as a heading would have been in really poor taste before OpenAI released their OSS models (back when they had earned the “ClosedAI” moniker), but I guess it’s a bit less offensive now. I think this product combined with OpenAI FDEs is going to make a lot of large industries inaccessible to startups, but there may still be value in companies like Kairos watching what OpenAI does in this space and copying them.


Large-scale AI deployment has led to a complete change in what signal code actually conveys and what it means for maintainers. Code is no longer a yardstick for effort, care or expertise. If anything, a large amount of it can signal the opposite.

I read an article a while ago about how “taste is a moat” (https://wangcong.org/2026-01-13-personal-taste-is-the-moat.h...) and it kind of applies here. In that article a technically correct kernel patch was rejected since it actually just re-implemented functionality that was available elsewhere. In the tldraw repo, users seem to clone the repo, spin up Claude and then make a PR without any kind of “taste” involved.

What confuses me is that tldraw is actually very good at getting the best out of models; indeed, internal to tldraw, models are expected to be used and the author gets value out of them. And yet people leave sloppy, unvetted PRs. This is a social issue we didn’t really have before, since producing code used to be the difficult part. Now that producing code and PRs is easy, the signal-to-noise ratio has collapsed completely and it’s just not worth it for people to actually review this stuff.

It would be better for people to leave one-line issues with video demonstrations and allow the internal team to /fix them: “In a world of AI coding assistants, is code from external contributors actually valuable at all? If writing the code is the easy part, why would I want someone else to write it?”. Is code really needed to convey problems with open source repos, or is it something unnecessary that we are now unshackled from? In the case of tldraw, a lot of the PRs are just the result of people running Claude on issues, and therefore they add absolutely zero value.


Compilers are another thing whose honor and pride the models have taken from the nerds. In the past, people would debate for hours about the “dragon book” vs. “writing interpreters” and present their cool bespoke compilers in Show HN posts. Now models can produce 100,000 lines of code over two weeks with no human intervention that actually works and can compile significant projects. Which way now, nerd? The models are getting better; are you?

The article has some really odd low-level descriptions of bash orchestration, which I suppose are important to illustrate how barebones it was. Still, I always find it odd that when we talk about agents lauded as borderline superintelligence, there is low-level bash being slung around – it feels like we’re talking about things at the wrong level.

The point about writing extremely high-quality tests reminds me a bit of the “hot mess theory of AI” (https://alignment.anthropic.com/2026/hot-mess-of-ai/), also from Anthropic, where they essentially say that long-horizon tasks are more likely to fail through incoherence than through a model purposefully pursuing incorrect results. This is phrased in the article as “Claude will work autonomously to solve whatever problem I give it. So it’s important that the task verifier is nearly perfect, otherwise Claude will solve the wrong problem”.
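The "nearly perfect verifier" idea can be sketched minimally as follows. The task and the individual checks are invented for illustration; the point is that the agent's output is accepted only if every independent check passes, since a single sloppy check is exactly the gap through which an agent "solves the wrong problem".

```python
# Hypothetical sketch of a task verifier for an agent loop.
# The agent keeps working until verify() accepts its artifact, so each
# check must be strict: any check that is too loose lets a wrong
# solution through.

from typing import Callable

Check = Callable[[str], bool]


def verify(artifact: str, checks: list[Check]) -> bool:
    """Accept the agent's output only if ALL independent checks pass."""
    return all(check(artifact) for check in checks)


# Invented example task: produce a CSV header line with three
# lowercase column names.
checks: list[Check] = [
    lambda s: s.endswith("\n"),   # newline-terminated
    lambda s: s.count(",") == 2,  # exactly three columns
    lambda s: s.lower() == s,     # lowercase column names
]

print(verify("id,name,email\n", checks))  # → True: every check passes
print(verify("id,name\n", checks))        # → False: wrong column count
```

If, say, the column-count check were dropped, the second artifact would be accepted too: the agent would happily converge on the wrong problem, which is the failure mode the article warns about.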

The author also observes something I realised after the initial joy of seeing an agent one-shot a task wore off: for a 30-minute agent task, 25 minutes may be spent exploring the environment. While it would be an offence to give a human unvetted model-generated documentation and runbooks (I’m looking at you, emoji-ridden README.md files becoming more common across Show HN), models should commit things like this to memory for themselves, to avoid repeatedly paying the “discovery tax” on every new action. Errors, hallucinations or changes that cause the generated docs to go stale create more busywork for the agent, but agent time is less valuable than finite human life.
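Committing discovery to memory can be as simple as caching the exploration phase to a notes file that later runs read instead of re-exploring. Everything here is a hypothetical sketch (the file name, the contents of the notes, the stand-in explore function); real agent harnesses each have their own memory mechanisms.

```python
# Hypothetical sketch of paying the "discovery tax" only once: cache the
# expensive environment-exploration step to a notes file so subsequent
# agent runs load it instead of re-exploring.

import json
import os
import tempfile


def explore_environment() -> dict:
    # Stands in for the expensive 25-minutes-of-30 exploration phase.
    return {"test_command": "pytest -q", "entrypoint": "app/main.py"}


def load_or_explore(notes_path: str) -> dict:
    if os.path.exists(notes_path):
        with open(notes_path) as f:
            return json.load(f)  # discovery tax already paid
    notes = explore_environment()
    with open(notes_path, "w") as f:
        json.dump(notes, f)      # commit the discovery to "memory"
    return notes


path = os.path.join(tempfile.mkdtemp(), "AGENT_NOTES.json")
first = load_or_explore(path)   # explores and writes the notes file
second = load_or_explore(path)  # reads the cached notes instead
print(first == second)  # → True
```

The staleness problem from the comment shows up here too: if the environment changes, the cached notes silently go wrong, and something has to invalidate or regenerate them.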


The author makes a point that you should redo every manual commit with AI, to align your mental model of your actions with how models work. This is something I’m going to need to try. It’s related to my desire to reduce things like the “discovery tax” (the phenomenon whereby a 5-minute agent task is 4 minutes of environment exploration and 1 minute of execution) and to make sure models get things right the first time around; however, my AI improvement plan didn’t really account for how to improve the model in cases where I ended up manually resolving issues or implementing features.

Some arguments are made about retaining focus and single-mindedness while working with AI. I think these points are important. They relate to the article on cutting out over-eager orchestration and focusing on validation work (https://sibylline.dev/articles/2026-01-27-stop-orchestrating...). There are a few sides to this covered in the article.

You should always have a high-value task to switch to while the agent is working (instead of scrolling TikTok, Instagram, X, YouTube, Facebook, Hacker News, etc.). In my case I might try to start reading some books I have on the backburner, like Ghost in the Wires.

You should disable agent notifications and take control of when you return to check the model’s context, so you’re less ADHD-ridden when programming with agents and actually make meaningful progress on the side task, since you only context-switch when you are satisfied.

The final one is to always have at least one agent, and preferably only one agent, running in the background. The idea is that always having an agent running gives you a slow burn of productivity improvements and a process through which you can gradually improve the background agent’s performance. Generally, it’s also a good way to stay on top of what current model capabilities are.

I also really liked the idea of overnight agents for library research, redevelopment of projects to test out new skills, tests and AGENTS.md modifications.


A 77% score on terminal-bench 2 is really impressive. I remember reading the article about the pi coding agent (https://mariozechner.at/posts/2025-11-30-pi-coding-agent/) getting into the top ten percent of agents on that benchmark with about 50%. While it may still be in the top ten, that category just turned into one champion and a long tail of inferior offerings.

I was shocked to see that in the prompt for one of the landing pages the text “lavender to blue gradient” was included as if that’s something that anybody actually wants. It’s like going to the barber and saying “just make me look awful”.

This was my first time actually seeing what the GDPval benchmark looks like. Essentially they benchmark all the artifacts that HR/finance might make or work on (onboarding documents, accounting spreadsheets, PowerPoint presentations, etc.). I think it’s good that models are trained to generate things like this well, since people are going to use AI for such work anyway. If the middlemen passing AI outputs around are going to be lazy, I’m grateful that at least OpenAI researchers are cooking something behind the scenes.

