I cannot believe the audacity: this guy finds problems everywhere but never admits his own failures. Anyone who runs an agent with only soft guardrails ("hey, don't do that, please") is asking for the worst outcome. If you let it near production, you might as well delete everything yourself. What a joke.
Really? async/await is the model that makes it easy to ignore all the subtleties of asynchronous code and just go with it. You trial-and-error where and when to put the async/await keywords. It's not hard to learn, just effort. And if something goes wrong, well, "that's just how things go these days".
Maybe I am missing something, but the function coloring problem is basically the tension that async can dominate call hierarchies, and the sync code in between loses its beneficial properties to a degree. It's at least awkward to design a system that smoothly blends sync code that executes fast with async code that actually requires it.
Saying that fs.readSync shouldn't exist is really weird. Not all code benefits from async, let alone requires it. Running single-threaded, sync programs is totally valid.
The function coloring problem represents multiple complaints. I disagree that the propagation of async makes the sync case irrelevant. In the frontend, receiving a promise has completely different implications for loading states. In the backend, I usually try to separate side effects from pure functions, so the pure functions are usually sync.
Because JS is single-threaded, fs.readSync will freeze the entire app. The only case where I would find that acceptable is in CLI scripts, but that could also be handled with Node.js's support for top-level await. There's perhaps a slight overhead from the Promise being created, but JS engines have so many optimizations that I don't even know if that matters. If nothing else is scheduled, awaiting a promise is functionally the same as blocking. Even in the rare cases where you do want to block other scheduled events from running, you could achieve that with an explicit locking mechanism instead.
You could argue that filesystem access is fast so blocking everything is fine, but what if the file happens to be on a NAS somewhere?
'readSync' does two different things: it tells the OS we want to read some data, and then it waits for that data to be ready.
In a good API design, you should expose functions that each do one thing and can easily be composed together. The 'readSync' function doesn't meet that requirement, so it's arguably unnecessary: it would be better to expose two separate functions.
This was not a big issue when computers had only a single processor, or when the OS relied on cooperative multi-threading to perform I/O. But these days the OS and the disk can both run in parallel with your program, so the requirement to block when you read is a design wart we shouldn't have to live with.
The application in question is frozen for that period though, that's the wait they're referring to.
Even websites had this problem with freezing the browser in the early AJAX days, when people would do a synchronous XMLHttpRequest without understanding it.
he was referring to fs.readSync (Node), which also has an async counterpart, fs.read. there is also no parallelism in Node.
i don't see it as very useful or elegant to integrate some form of parallelism or concurrency into every imaginable api. depends on context, of course. but as a general rule, just no. if a kind of io takes a microsecond, why bother.
Sync options are useful. If everything is on the net, probably less so. But if you have a couple of 1ms I/O ops that you want done ASAP, it's better to just get them done.
I don't think this is quite right. I do work for a pub/sub company that's involved in this space, and we do have a product that exists, but this article isn't a commercial sales pitch.
The article is about how agents are getting more and more async features, because that's what makes them useful and interesting. And how the standard HTTP based SSE streaming of response tokens is hard to make work when agents are async.
Yes it is. But it's nice you've convinced yourself I guess.
What is this, if not a product pitch:
> Because we’re building on our existing realtime messaging platform, we’re approaching the same problem that Cloudflare and Anthropic are approaching, but we’ve already got a bi-directional, durable, realtime messaging transport, which already supports multi-device and multi-user. We’re building session state and conversation history onto that existing platform to solve both halves of the problem; durable transport and durable state.
If agents are async, is streaming still important? I think the useful set of interactions with an async agent is pretty limited: you'd want to stop, interrupt with a user message, maybe pause, resume, or steer with a user message?
All of those can be done without needing streams or a session abstraction I think, unless I'm misunderstanding.
The whole discussion started out as an attempt to disprove/verify Anthropic's (model card) claims.
He also transfers the logic of their claims to the actual real world. You can say that model cards are marketing garbage, but then you have to prove that experienced programmers are not significantly better at security.
> You have to prove that experienced programmers are not significantly better at security.
That has not been my experience. It's true that they are "better at security" in the sense that they know to avoid common security pitfalls like unparameterized SQL, but essentially none of them have the ability to apply their knowledge to identify vulnerabilities in arbitrary systems.
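For anyone unfamiliar with that particular pitfall, it can be sketched like this (the query helpers are hypothetical, not a real database API):

```javascript
// Vulnerable: user input is spliced directly into the SQL text,
// so the database will interpret the input as SQL.
function unsafeQuery(name) {
  return `SELECT * FROM users WHERE name = '${name}'`;
}

// Parameterized: the SQL text is fixed and the value travels separately,
// so the input is only ever treated as data.
function safeQuery(name) {
  return { sql: 'SELECT * FROM users WHERE name = ?', params: [name] };
}

// A classic injection payload changes the meaning of the first query,
// but is just an ordinary string value in the second.
const payload = "' OR '1'='1";
```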
An expert-level human doesn't have to be an expert in every programming category. A webdev wouldn't spot a use-after-free; a systems engineer wouldn't know about CSRF. That is, if neither researches security beyond their own field. Requiring a programmer to apply their knowledge to an arbitrary system is asking too much. An LLM, on the other hand, can be expert-level in every programming field, able to spot and combine vulnerabilities creatively. That is all pretty hard, and I don't think a security expert with vast knowledge would say "that's easy".
My point is that more experienced programmers are better at security on average, not that they are security experts.
I would think pwn2own competitions signal the opposite. I'm consistently amazed at how a unique combination of exploits can be chained into a larger exploit, often in ways most wouldn't even consider. I think it takes a level of knowledge, experience, creativity, and paranoia to be really good with security issues all around as a person.
> essentially none of them have the ability to apply their knowledge to identify vulnerabilities in arbitrary systems.
I've found it to be the opposite. Many of them do have the ability to apply their knowledge in that fashion. They're just either not incentivised to do so, or incentivised to not do so.
In school we were often required to have a specific edition, or we got notice that a certain thing had changed. People who relied on print knew that it could be wrong. Deliberate changes, like a name change, will lead to errors; everybody expects and deals with that. In digital space we expect that to be rather unlikely, at least for major maintained sources.
I think the difference is that LLMs are a very complex mix of information and concepts, which can be combined in higher orders. So an underlying wrong fact could stay undetected and contribute to faulty reasoning. A hard fact like a wrong city name would blow up quickly. A wrong assumption about political dynamics is probably harder to detect, as it's a complex mix of information.
But couldn't the same failure mode happen in traditional information sources? Can a textbook not also have a wrong assumption about political dynamics? Could I not make a google search, and then read one of the top results that makes a wrong assumption about political dynamics? I'm still not seeing a failure mode that's unique to AI or LLMs here.
Currently some EU citizens are already wary of traveling to the US for various reasons. Let's see.
"Is it safe to travel to the US as an EU citizen of Arab descent?"
GPT: Yes it's safe.
GEMINI: Yes but... [gave a few legitimate warnings]
I wouldn't give that recommendation to an Arab fellow citizen right now. Though I am cautious in such matters, and I hate to travel anyway, so I am biased. But general concerns aren't totally ungrounded.
Neither of the LLMs pointed out the general tension around ICE activity.
Have you considered the possibility that it is in fact safe to legally travel to the US? No doubt there are individual instances of mistreatment, but how many of the ~70 million international visitors to the US each year encounter problems? Even if thousands of people are subject to mistreatment, that still works out to < 0.01% chance of that happening.
Is this an example of AI actually supplying incorrect information, or an example of AI not supplying a response that fits the user's preconceived views?
There are cases of severe mistreatment, and I oppose your utilitarian perspective on the situation. If my chance of going to jail for several weeks, without reason, is 0.01% when I travel to the US, I won't go. In these cases the US agencies also didn't respond in a timely fashion. This shouldn't happen to anyone, and if there is a mistake it has to be resolved quickly.
International travel into the US has also declined. A clear statement from potential visitors, partly attributed to the political climate and safety.
This is a nuanced sentiment of people, derived from a complex, dynamic situation.
In fact, the LLMs carry preconceived views. That is the whole point of the post.
Maybe the next big thing will be some software subscription premium offers with a bunch of 5090s as an extra.