Hacker News: tptacek's comments

Why is that nonsense? Do you think they exhausted all their compute finding just the few big vulnerabilities they've already discussed, and don't have a budget to just keep cranking the machine to generate more?

They're not publishing SHAs for things that aren't confirmed vulnerabilities. They're doing exactly the thing you'd want them to do: they claim to have vulnerabilities when they have actual vulnerabilities.


If I understand Anthropic's statements correctly, they've been cranking for a while, and what they have now is the results of Mythos-enabled vulnerability scans on every important piece of software they could find. (I do want to acknowledge how crazy it is that "vulnerability scan all important software repos in the world" is even an operation that can be performed.)

We talked to Nicholas Carlini on SCW and did not at all get the impression that they've hit everything they can possibly hit. They're still proving the concept one target at a time, last I heard.

which statement, specifically, led you to that interpretation?

> Over the past few weeks, we have used Claude Mythos Preview to identify thousands of zero-day vulnerabilities (that is, flaws that were previously unknown to the software’s developers), many of them critical, in every major operating system and every major web browser, along with a range of other important pieces of software.

They don’t explicitly rule out, I suppose, that these were only limited partial scans they did to find the vulnerabilities. But I don’t know why they’d do it that way; it’s not like they don’t have the resources to scan the entire Linux kernel.


i was trying to map "vulnerability scan all important software repos in the world" to an actual quote in their writing, but "every major operating system and every major web browser, along with a range of other important pieces of software" is not the same.

Important to understand it's not one-and-done; you can't "Mythos" Chrome and then put a checkmark next to it. It's a continuous process.

Can't you? My understanding is that that's exactly how security scans usually work - you run an analysis, find all the vulnerabilities, and then the continuous process is only there to check against the introduction of new vulnerabilities. Is that not the right mental model?

No, you cannot.

(A "security scanner" is a one-and-done proposition because it's deterministic and is going to find what it finds the first time you run it and nothing more. But a software security assessment project you run every year on the same target with different teams will turn up different stuff every year. I'm at pains to remind people how totally lame source code security scanners are. People keep saying "static analyzers already do this" and like, nobody in security takes those tools seriously.)


Interesting. Thanks for the info, I’m going to have to read up on this at some point.

This is obviously just cope (there's a long, strong-form argument for why LLM-agent vulnerability research is plausibly much more potent than fuzzing, but we don't have to reach it because you can dispose of the whole argument by noting that agents can build and drive fuzzers and triage their outputs), but what I'd really like to understand better is why? What's the impetus to come up with these weird rationalizations for why it's not a big deal that frontier models can identify bugs everyone else missed and then construct exploits for them?

I don't have an anti-AI stance. Maybe I should have spelled that out more clearly in my comment above. I'm as excited and terrified by this technology as everyone else. I think we're all in vicious agreement that we need defense-in-depth - including LLMs and fuzzing (and static analysis and so on).

An LLM can guide all of this work, but current models tend to slowly go off the rails if you don't keep a hand on the wheel. I suspect this new model will be the same. I've had Opus 4.6 write custom fuzzing tools from scratch, and I've gotten good results from that. But you just know people will prompt this new model with "make this software secure", and it'll forget fuzzing exists at all.
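(For the record, the kind of from-scratch tool I mean doesn't have to be exotic. Here's a toy mutation fuzzer, my own sketch rather than anything a model produced, just to show the shape of the thing:)

```python
import random

def mutate(data: bytes, rng: random.Random) -> bytes:
    # Flip one random byte of the input; real fuzzers layer many such strategies.
    if not data:
        return bytes([rng.randrange(256)])
    buf = bytearray(data)
    buf[rng.randrange(len(buf))] = rng.randrange(256)
    return bytes(buf)

def fuzz(target, seed: bytes, iterations: int = 20000, rng_seed: int = 0):
    """Feed mutated copies of `seed` to `target`; collect inputs that raise."""
    rng = random.Random(rng_seed)
    crashes = []
    for _ in range(iterations):
        data = mutate(seed, rng)
        try:
            target(data)
        except Exception as exc:
            crashes.append((data, repr(exc)))
    return crashes

# A deliberately buggy toy parser to fuzz: it chokes on one specific header byte.
def toy_parser(data: bytes) -> None:
    if data and data[0] == 0xFF:
        raise ValueError("unhandled header byte 0xFF")
```

Coverage guidance, corpus management, and sanitizers are where real fuzzers earn their keep; the point is just that the skeleton is simple enough for a model to write and then drive.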


Good lord, why such a virulent response to something that seems worth considering?

As someone in cybersecurity for 10+ years my immediate assumption is why not both? I don’t think considering that they could both have their uses is “cope”.


Again: LLM agents already are both. But it's also remarkable, and worth digging into, that LLM agents haven't needed fuzzers to produce many (any? in Anthropic Red's case?) of the vulnerabilities they're discussing.

Do we know that? I'd love to see some of the ways security researchers are using LLMs. We have no idea if Claude was using fuzzing here, or just reading the files and spotting bugs directly in the source code.

A few weeks ago someone described their method for finding bugs in Linux. They prompted Claude with "Find the security bug in this program. Hint: It is probably in file X." and did that for every file in the repo.
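That per-file loop is trivial to script. A minimal sketch of the same idea, with a hypothetical `ask_model` stub standing in for whatever LLM API you'd actually call:

```python
from pathlib import Path

# Hypothetical stand-in for a real LLM call (e.g. a wrapper around an API
# client); here it just returns a placeholder so the loop logic is visible.
def ask_model(prompt: str) -> str:
    return f"(model response to a {len(prompt)}-char prompt)"

def build_prompt(path: Path, source: str) -> str:
    # The prompt shape described above: one file at a time, with a hint.
    return (
        "Find the security bug in this program. "
        f"Hint: It is probably in file {path}.\n\n{source}"
    )

def scan_repo(repo_root: str, suffixes=(".c", ".h")) -> dict:
    """Walk the repo and ask the model about each source file individually."""
    findings = {}
    for path in sorted(Path(repo_root).rglob("*")):
        if path.suffix in suffixes and path.is_file():
            source = path.read_text(errors="replace")
            findings[path] = ask_model(build_prompt(path, source))
    return findings
```

Brute force, but it sidesteps context-window limits by never showing the model more than one file at a time.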


> Since then, this weakness has been missed by every fuzzer and human who has reviewed the code, and points to the qualitative difference that advanced language models provide. [^1]

> At no point in time does the program take some easy-to-identify action that should be prohibited, and so tools like fuzzers can’t easily identify such weaknesses. [^2]

[^1]: https://red.anthropic.com/2026/mythos-preview/#:~:text=Since...

[^2]: https://red.anthropic.com/2026/mythos-preview/#:~:text=At%20...


Are you saying that LLMs can use fuzzers or are you saying that they work like fuzzers? Because one of those is less…deterministic? Than the other.

Regardless and in the spirit of my original response my answer would be to give the LLM access to a fuzzer (plus other tools etc) but also have fuzzers in the pipeline. Partially because that increases the determinism in the mix and partially because why not? Layering is almost always better than not.
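To make "layering" concrete, one cheap shape for it: let the deterministic layer (the fuzzer plus crash dedup) run first, and only hand unique crash buckets to the model. A sketch, with a stub standing in for the actual LLM call:

```python
from collections import defaultdict

# Stub for an LLM triage call; a real pipeline would send the crashing input
# and its context to a model and get back an analysis.
def llm_triage(crash_input: bytes, stack_hash: str) -> str:
    return f"triage-note for bucket {stack_hash}"

def bucket_crashes(crashes):
    """Deterministic layer: dedupe fuzzer crashes by stack hash."""
    buckets = defaultdict(list)
    for crash_input, stack_hash in crashes:
        buckets[stack_hash].append(crash_input)
    return buckets

def pipeline(crashes):
    """Layered flow: dedupe deterministically, then have the LLM review one
    representative per bucket instead of every raw crash."""
    reports = {}
    for stack_hash, inputs in bucket_crashes(crashes).items():
        reports[stack_hash] = llm_triage(inputs[0], stack_hash)
    return reports
```

The deterministic stage does what it's good at (cheap, repeatable dedup); the expensive, non-deterministic stage only sees the distinct cases.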

But again more than anything I’m focusing on the accusations of cope. People SHOULD have measured reactions to claims about any product. People SHOULD be asking questions like this. I know that the LLM debate is often “spicy” but man let’s just try to lower the temperature a bit yeah?


LLMs can use fuzzers and also LLMs can explore the semantic space of a program in ways fuzzers can't.

You said it yourself. It's cope. That's all it is and all it ever was.

https://en.wikipedia.org/wiki/AI_effect

Every time an AI does something new, there's a human saying "it's not really doing that something", "it's doing that something in a fake way" or "that something was never important in the first place".


Alright, except that’s not what I was saying. I was just pointing out that LLMs don’t replace fuzzing or static analysis. They complement those techniques. And yes, LLMs may drive those techniques directly, but they often don’t. At least not yet.

For whatever it's worth I think I cosign all of this.

In the context of: a green username offering some salacious/conspiratorial things about djb around a topic I'm only a little familiar with... It's worth a lot. It's the difference between me writing it off as (at best) a poorly informed misunderstanding of a complex topic, and me choosing to spend some time learning more. Ty

None of this is really salacious or conspiratorial. I don't know how big a deal the attacks they're citing are. But this is directionally mostly stuff I've heard from lots of cryptography engineers over the last couple years. I know the comment is off base in comparing attacks on classical NTRU to SNTRUP, though!

As someone way out of the loop on pqc, this bit:

> anyway, someone popular among some people in tech (the cryptographer Dan Bernstein) has been trying (successfully) to slow the PQC transition for ~10 years

Sounds enough like throwing shade to make me doubt its value, in the absence of other signals.

My point was your history of posting knowledgeably about security and cryptography provides the credibility for me to go do more reading about the stuff in mswphd's post.


Oh, Bernstein is a vocal and relentless opponent of MLKEM. Both industry and research cryptography have settled on MLKEM. That's the subtext. You could word it differently and more charitably, but I wouldn't.

I don't think it matters one way or the other to your thesis but I'm skeptical that state-level CNE organizations were hoarding vulnerabilities before; my understanding is that at least on the NATO side of the board they were all basically carefully managing an enablement pipeline that would have put them N deep into reliable exploit packages, for some surprisingly small N. There are a bunch of little reasons why the economics of hoarding aren't all that great.

The economics would be different in say, North Korea, don't you think?

Why? What do you mean?

He really believes that exploits come out of North Korea (as per Daily Post reporting), not from other countries

North Korea uses a lot more of them, specifically to generate revenue.

That's exactly not what they're doing. They aren't creating operating system vulnerabilities. They're telling you about ones that already existed.

Well, in a slightly indirect manner. Claude is writing a ton of code, and therefore creating a lot of security vulnerabilities.

That's not what's happening here. This announcement is about the velocity with which Claude finds vulnerabilities in already-existing software.

Software already exists that has been written by Claude. They absolutely are selling the means to write software, and the means to securing the insecure software. At least for the time being. In the future Mythos will probably just make it possible to prompt good software from the start.

Ok. But mostly it's the old software, not the new software, that the bugs are being found in.

Maybe because there’s no critical and widely used software written by LLMs so far? Which says a lot about how LLMs are failing to even approach the level of capability you would expect from all the hype. The goal has always been, even before LLMs, to find something smarter than our smartest humans. So far the success at that is really minuscule. Humans are still the benchmark, all things considered. Now they’re saying LLMs are going to be better than our best vulnerability researchers in a few months (literally what an Anthropic researcher said at a conference). Ok, that might happen. But the funny part is that the LLMs will definitely be the ones writing most of these vulnerabilities. So, to hedge against LLMs you must use LLMs. And that is gonna cost you more.

So today, most of the vulnerabilities being found by these tools are in code written by humans. Your hypothesis is that down the road, most of the vulnerabilities will be in code written by LLMs.

What seems more probable is that the same advances that LLMs are shipping to find vulnerabilities will end up baked into developer tooling. So you'll be writing code and using an LLM that knows how to write secure code.


I don't think Claude wrote OpenBSD, but to be honest that was before my time, so I'm not sure.

If it’s very good at finding security vulnerabilities, I would assume that the code it generates is much more hardened than anything your average developer can put out.

Mythos aside, frontier LLMs can already be used to find exploits at faster pace than humans alone. Whether that knowledge gets used to patch them or exploit them is dependent on the user. Cybersecurity has always been an arms race and LLMs are rapidly becoming powerful arms. Whether they like it or not LLM providers are now important dealers in that arms race. I appreciate Anthropic trying to give “good guys” a leg up (if that is indeed their real main motivation which I do find credible but not certain). But it’s still a scary world we’re entering and I doubt the fierce competition will leave all labs acting benevolently.

Dario is big on beating China, and no doubt he believes cybersecurity is how to do that. You can tell, but Anthropic is shit at everything else. Nobody uses it for real research.

Everybody has been saying this for the last 15 years.

We're going to have to put all the bad code into a Wasm sandbox.

I don't think this is broadly true and to the extent it's true for cryptographic software, it's only relatively recently become true; in the 2000s and 2010s, if I was tasked with assessing software that "encrypted a file" (or more likely some kind of "message"), my bet would be on finding a game-over flaw in that.

Big open question what this will do to CNE vendors, who tend to recruit from the most talented vuln/exploit developer cohort. There's lots of interesting dynamics here; for instance, a lot of people's intuitions about how these groups operate (ie, that the USG "stockpiles" zero-days from them) weren't ever real. But maybe they become real now that maintenance prices will plummet. Who knows?

Which PQ algorithms would you be referring to here?


Why don't you go ahead and pick out the attacks in here that you think are relevant to this conversation? It can't be on me to do that, because obviously my subtext is that none of them are.

There are not in fact meaningful questions about whether the settled-on PQC constructions are secure, in the sense of "within the bounds of our current understanding of QC".

Didn't one of the PQC candidates get found to have a fatal classical vulnerability? Are we confident we won't find any future oopsies like that with the current PQC candidates?

The whole point of the competition is to see if anybody can cryptanalyze the contestants. I think part of what's happening here is that people have put all PQC constructions in one bucket, as if they shared an underlying technology or theory, so that a break in one calls all of them into question. That is in fact not at all the case. PQC is not a "kind" of cryptography. It's a functional attribute of many different kinds of cryptography.

The algorithm everyone tends to be thinking of when they bring this up has literally nothing to do with any cryptography used anywhere ever; it was wildly novel, and it was interesting only because it (1) had really nice ergonomics and (2) failed spectacularly.


SIKE made it all the way to round 3. It failed spectacularly, but it happened rather abruptly. In one sense it wasn't surprising because of its novelty, but the actual attack was somewhat surprising--nobody was predicting it would crumble so thoroughly so quickly. Notably, the approach undergirding it is still thought secure; it was the particular details that caused it to fail.

It's hubris to say there are no questions, especially for key exchange. The general classes of mathematical problems for PQC seem robust, but that's generally not how crypto systems fail. They fail in the details, both algorithmically and in implementation gotchas.

From a security engineering perspective, there's no persuasive reason to avoid general adoption of, e.g., the NIST selections and related approaches. But when people suggest not to use hybrid schemes because the PQC selections are clearly robust on their own, well then reasonable people can disagree. Because, again, the devil is in the details.

The need to proclaim "no questions" feels more like a reaction to lay skepticism and potential FUD, for fear it will slow the adoption of PQC. But that's a social issue, and imbibing that urge may cause security engineers to let their guard down.


What's your point? SIKE has literally nothing to do with MLKEM. There is no relationship between the algorithms. Essentially everybody working on PQC, including Bernstein himself, has converged on lattices, which, again, were a competitor to curves as a successor to RSA --- they are old.

SIKE: not lattices. Literally moon math. Do you understand how SIKE/SIDH works? It's fucking wild.

I'm going to keep saying this: you know the discussion is fully off the rails when people bring SIKE/SIDH into it as evidence against MLKEM.


You may not have any questions about the security of ML-KEM, but many people do. See, for example, DJB's compilation of such doubts from the IETF WG: https://blog.cr.yp.to/20260221-structure.html

DJB himself seems to prefer hybrid over non-hybrid precisely over concern about the unknowns: https://blog.cr.yp.to/20260219-obaa.html

These doubts may not be the kind curious onlookers have in mind, but to say there are no doubts among researchers and practitioners is a misrepresentation. In fact, you're flatly contradicting what DJB has said on the matter:

> SIKE is not an isolated example: https://cr.yp.to/papers.html#qrcsp shows that 48% of the 69 round-1 submissions to the NIST competition have been broken by now.

https://archive.cr.yp.to/2026-02-21/18:04:14/o2UJA4Um1j0ursy...

Unqualified assurances are what you hear from a salesman. You're trying to sell people on PQC. There's no reason to believe ML-KEM is a lemon, but you're effectively saying, "it's the last KEX scheme we'll ever need", and that's just not honest from an engineering point of view, even if it's what people need to hear.


I think you just gave away the game. To the extent I believe a CRQC is imminent, I suppose I am "trying to sell people on PQC". But then, so is Daniel Bernstein, your only cryptographically authoritative cite to your concern. Bernstein's problem isn't that we're rushing to PQC. It's that we didn't pick his personal lattice proposal.

And, if we're on the subject of how trustworthy Bernstein's concerns are, I'll note again: in his own writing about the potential frailty of MLKEM, he cites SIKE, because, again, he thinks you're too dumb to understand the difference between a module lattice and a generic lattice.

Finally, I'm going to keep saying this until I don't have to say it anymore: PQC is not a "kind" of cryptography. It doesn't mean anything that N% of the Round 1 submissions to the NIST PQC Contest were cryptanalyzed. Multivariate quadratic equation cryptography, supersingular isogeny cryptography, and F_2^128 code-based cryptography are not related to each other. The point of the contest was for that to happen.


Yeah I get that, what I am really asking is that I know in my field, I can quickly get a vibe as to whether certain new work is good or not so good, and where any bugaboos are likely to be. For those who know PQC like I know economics, do they believe at this point that the algorithms have been analyzed successfully to a level comparable to DH or RSA? Or is this really gonna be a rush job under the gun because we have no choice?

Lattice cryptography was a contender alongside curves as a successor to RSA. It's not new. The specific lattice constructions we looked at during NIST PQC were new iterations on it, but so was Curve25519 when it was introduced. It's extremely not a rush job.

The elephant in the room in these conversations is Daniel Bernstein and the shade he has been casting on MLKEM for the last few years. The things I think you should remember about that particular elephant are (1) that he's cited SIDH as a reason to be suspicious of MLKEM, which indicates that he thinks you're an idiot, and (2) that he himself participated in the NIST PQC KEM contest with a lattice construction.


Bernstein's ego is at a level where he thinks most other people are idiots (not without some justification), that's been clear for decades. What are you hinting at?

I'm not saying anything about his ego or trying to psychoanalyze him. I'm saying: he attempted to get a lattice scheme standardized under the NIST PQC contest, and now fiercely opposes the standard that was chosen instead.

It's the same situation with classical encryption. It's not uncommon for a candidate algorithm to be discovered to be broken during the selection process.

