Hacker News | apgwoz's comments

Yes! I’ve been trying (and failing!) to get people to understand this. Build the high leverage tools while the tokens are cheap. Unfortunately, I haven’t figured out the right set of high leverage tools. :)

As another data point, I pay for Pro for a personal account, and use no skills, do nothing fancy, use the default settings, and am out of tokens, with one terminal, after an hour. This is typically working on a < 5,000 line code base, sometimes in C, sometimes in Go. Not doing incredibly complicated things.

The benefit here is reducing the time to find vulnerabilities; faster than humans, right? So if you can rig a harness for each function in the system, by first finding where it’s used, its expected input, etc, and doing that for all functions, does it discover vulnerabilities faster than humans?
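A toy sketch of that per-function harness idea, in Python. Everything here is hypothetical illustration (the target function, the generator, the loop); a real setup would use a coverage-guided fuzzer rather than purely random inputs, but the shape is the same: for each function, generate inputs matching its expected shape and record what crashes.

```python
# Hypothetical per-function harness sketch: generate random inputs
# for a target function and collect the inputs that raise.
# A real system would derive input shapes from call sites and use
# coverage guidance instead of blind random generation.

import random
import string

def parse_flags(s: str) -> dict:
    # Stand-in for a function under test discovered in the codebase.
    key, _, value = s.partition("=")
    if not key:
        raise ValueError("empty key")
    return {key: value}

def harness(fn, gen_input, iterations=1000):
    # Drive fn with generated inputs; record (input, exception) pairs.
    failures = []
    for _ in range(iterations):
        arg = gen_input()
        try:
            fn(arg)
        except Exception as exc:
            failures.append((arg, exc))
    return failures

random.seed(0)  # deterministic for the sake of the example
gen = lambda: "".join(
    random.choices(string.printable, k=random.randint(0, 8))
)
crashes = harness(parse_flags, gen)
print(f"{len(crashes)} crashing inputs found")
```

The open question in the comment is whether doing this systematically, across every function, beats a human at finding *valid* vulnerabilities, not just crashes.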

Doesn’t matter that they isolated one thing. It matters that the context they provided was discoverable by the model.


There is absolutely zero reason to believe you could use this same approach to find and exploit vulns without Mythos finding them first. We already know that older LLMs can’t do what Mythos has done. Anthropic and others have been trying for years.

> There is absolutely zero reason to believe you could use this same approach to find and exploit vulns without Mythos finding them first.

There's one huge reason to believe it: we can actually use small models, but we can't use Anthropic's special marketing model that's too dangerous for mere mortals.


If all you have is a spade, that is _not_ evidence that spades are good for excavating an entire hill.

It takes longer, but a spade is better than bare hands. The goal is to speed up finding valid vulnerabilities, and be faster than humans can do it.

> If all you have is a spade, that is _not_ evidence that spades are good for excavating an entire hill.

If you have an automated spade, that's still often better for excavating that hill than you using a shovel by hand.


From the article:

>At AISLE, we've been running a discovery and remediation system against live targets since mid-2025: 15 CVEs in OpenSSL (including 12 out of 12 in a single security release, with bugs dating back 25+ years and a CVSS 9.8 Critical), 5 CVEs in curl, over 180 externally validated CVEs across 30+ projects spanning deep infrastructure, cryptography, middleware, and the application layer.

So there is pretty good evidence that yes, you can use this approach. In fact, I would wager that a more systematic approach will yield better results than brute-forcing by running the biggest model across everything. It will certainly be cheaper.


Why? They claim this small model found a bug given some context. I assume the context wasn’t “hey! There’s a very specific type of bug sitting in this function when certain conditions are met.”

We keep assuming that the models need to get bigger and better, when the reality is we haven't exhausted the ways to use the smaller models. It's like the PlayStation 2 games that came out ten years into the console's life: by then all the tricks had been found, and everything improved.


If this were true, we're essentially saying that no one tried to scan for vulnerabilities using existing models, despite vulnerability discovery being extremely lucrative and a large professional industry. Vulnerability research has been one of the single most talked-about risks of powerful AI, so it wasn't exactly a novel concept, either.

If it is true that existing models can do this, it would imply that LLMs are being under-marketed, not over-marketed, since the industry didn't think this was worth trying previously(?). Which I suspect is not the opinion of HN upvoters here.


I use the models to look for vulnerabilities all the time. I find stuff often. Have I tried to build a new harness, or develop more sophisticated techniques? No. I suspect there are some people spending lots of tokens developing more sophisticated strategies, in the same way software engineers are seeking magical one-shot harnesses.

...The absolute last thing I'd want to do is feed AI companies my proprietary codebase, which is exactly what using these things to scan for vulns requires. You want to hand me the weights, and let me set up the hardware to run and serve the thing inside my network boundary with no calling home to you? That'd be one thing. Literally handing you the family jewels? Hell no. Not given the lack of professional discretion demonstrated by the tech industry. No way, no how.

To be honest, this just sounds like a ploy to get their hands on more training data through fear. Not buying it, and they clearly ain't interested in selling in good faith either. So DOA from my point of view anyways.


I don’t think these companies are hurting for access to code.

I like your stuff! I’ve been coveting a plotter for a while, but I’m pretty sure it won’t get used enough to justify the expense. :/

I do find the term “printmaking” hilarious because there are just sooo many ways to make prints. I tried to get into linocut fairly recently, but the battleship grey linoleum I had wasn’t very good. It cracked and crumbled pretty easily. I did get some of the pink Speedball “blocks,” but it gets expensive pretty quickly. I guess more to the point is the feeling that I lack much to say. But, that’s an excuse. :)


Thanks for the compliment. Linocuts and monotypes have been considered printmaking for hundreds of years. Those pen plots are also collaged and modified by hand, so not only mechanical. I see all that as traditional printmaking. As for having nothing to say, there can be a lot of fulfillment in exploring basic themes, like geometric shapes or silhouettes of animals.

There’s no doubt that stuff is printmaking. My point is that within each of these there are multiple ways of working: relief, intaglio, lithography, screen printing, offset.

So if you say, “I’m a print maker,” it describes basically nothing. :)

This is just a general statement, not directed at you. Sorry it felt that way.


You _can_ do trampolines, but that is kind of infectious, or needs to be very explicit with extra code, etc.

Indeed. It's not very efficient though. If I remember correctly Scala does this.
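A minimal sketch of the trampoline idea in Python (Scala ships roughly this pattern in `scala.util.control.TailCalls`). It also shows why the style is "infectious": every function in the chain has to be rewritten to return a thunk instead of recursing directly.

```python
# Minimal trampoline: recursive steps return zero-argument thunks
# instead of calling themselves, and a driver loop unwinds them on
# the heap rather than the call stack.
# Simplification: we assume the final result is never itself callable.

def trampoline(result):
    while callable(result):
        result = result()
    return result

def countdown(n):
    # The "infection": this function must return a thunk for the
    # next step rather than just calling countdown(n - 1).
    if n == 0:
        return "done"
    return lambda: countdown(n - 1)

# Far beyond Python's default recursion limit of ~1000 frames.
print(trampoline(countdown(1_000_000)))
```

The trade-off mentioned above is visible here: each step allocates a closure and bounces through the driver loop, which is slower than a plain call, but it never grows the stack.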

It’s also deterministic, unlike llms…

> That's like a truck company using horses to transport parts. Weird choice.

Easy way to claim more “horse power.”


The point this article makes, that suddenly agents can do the work of customizing free software, completely makes sense. But, the reality is that the Free Software movement is opposed to the way Lemons are built today, and would not accept a world like this. (Rightfully!)

My belief is that Lemons effectively kill open source in the long run, and generally speaking, people forget that Free Software is even a thing. The reasoning for that is simple: it’s too easy to produce a “clean” derivative with just the parts you need. Lemons do much better with a fully Lemoned codebase than they do with a hybrid. Incentives to “rewrite” also free people from “licensing burdens” while the law is fuzzy.


wtf is lemon? can't you just write normally



He may be referring to the market for lemons:

https://en.wikipedia.org/wiki/The_Market_for_Lemons


llm


The key to this argument is that we won’t need to rely on Anthropic/OpenAI soon — will they exist in the same way they do today in 12-18 months? The “open” models are getting better and better, and people are figuring out ways to make inference run on lesser hardware. It already might be viable for people that don’t expect “instantaneous” and are doing more hybrid development.

But you’re also never going to convince the people who still only run vi on the Linux console, without Xorg…


I was hoping someone made this comment! It remains high on my list of Frontalot songs. Big fan of “I’ll Form the Head” and “Stoop Sale” also from that album as well.

