Great idea to make it just a simple URL change. Reminds me of the youtube download websites.
I made a similar CLI tool[0] with the added feature that you can pass `--outline` and it'll omit function bodies (while leaving their signatures). I've found it works really well for giving a high-level overview of huge repos.
You can then progressively expand specific functions as the LLM needs to see their implementation, without bloating up your context window.
You're right, it's likely that almost nobody is using `fmt.Sprintf` to build SQL queries in production.
Templating and `fmt.Sprintf` are essentially the same thing in this context - `Sprintf` just gets the point across in fewer lines of code, and allows people to come up with realistic scenarios themselves.
This is a great resource on fuzz testing algorithms & internals. I find myself coming back to it occasionally when building new fuzzing techniques at Fuzzbuzz.
I noticed that as well - most fuzzers will have a maximum duration or number of iterations they're allowed to attempt when minimizing so as not to starve out actual inputs. It could be that the fuzzer hit that limit, or potentially prioritizes readable inputs over small inputs.
For string inputs, some form of binary search ("Check if the bug exists in the first half or second half of the string") would be able to reduce this example to "Ö" in only a few iterations. Not sure if this just isn't implemented, or whether there's something more complex going on.
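A rough sketch of that bisection-style minimization (the `reproduces` predicate here is hypothetical - a real fuzzer would re-run the target on each candidate):

```go
package main

import (
	"fmt"
	"strings"
)

// reproduces stands in for re-running the target on a candidate input;
// here the hypothetical bug triggers whenever the input contains the
// two-byte rune 'Ö'.
func reproduces(s string) bool {
	return strings.ContainsRune(s, 'Ö')
}

// minimize keeps whichever half of the input still reproduces the bug,
// halving the candidate on each iteration.
func minimize(input string) string {
	rs := []rune(input) // slice runes, not bytes, to avoid splitting 'Ö'
	for len(rs) > 1 {
		mid := len(rs) / 2
		switch {
		case reproduces(string(rs[:mid])):
			rs = rs[:mid]
		case reproduces(string(rs[mid:])):
			rs = rs[mid:]
		default:
			// the bug needs parts of both halves; a real minimizer
			// would fall back to finer-grained trims here
			return string(rs)
		}
	}
	return string(rs)
}

func main() {
	fmt.Println(minimize("some long prefix Ö some long suffix"))
}
```

This reduces the whole input to "Ö" in a logarithmic number of iterations when the trigger is a single rune.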
There's also the fact that I'd expect a fuzzer that knows about Unicode and UTF-8 strings to have a known list of weird behavior hardcoded as seed values, and certainly two-byte runes would be on that list.
Of course, this is only the first release with the fuzzer, and it already looks really amazing - all I'm really saying here is that I can't wait for these to be features of the fuzzer in the future!
I agree - I took a look at the minimization algorithm[0] and it seems like it loops through a few basic options, with the last one essentially normalizing all possible bytes to something readable (like "0"). Part of the cost of being as generic as possible is that you sometimes can't find the best solution to every problem; this might be one of those situations.
I know the goal of 1.18 was to get the UX down, so I'm interested to see how it improves for 1.19.
The algorithm has similar complexity to binary search, but is a bit smarter about deciding how to split the test input at each iteration.
I’ve been studying this in my master’s, and we’ve recently had to write a Java implementation. I’m keen to start on a Go package soon that might work well with fuzz testing.
Thanks for the questions & feedback! Concise docs are really important so this is all super useful. To answer your questions one by one:
1) The BrokenMethods are simple examples of programs that crash on buffer overflows/index out of range errors. If you were to pass "FUZ" into the Go method, it would check Data[3], causing a panic since the string has only 3 bytes (indices 0 through 2).
ninja edit: that python method in your comment IS a valid method with no error - a bit of a brain fart on my end when writing out the docs. It's been changed :)
2) In general a failure is any non-zero exit. We do this to be flexible in the way you report bugs. For C/C++ and Python this is usually with assertions, and in Go you can achieve something similar with:
    if !x {
        panic("Error")
    }
We also have other checkers or "sanitizers" that run with your code to look for certain bugs. For C and C++ code we support tools like Address Sanitizer, which report memory bugs like Heap Buffer Overflows and UAFs, and for Golang you can choose to fuzz your code with a race condition checker. These are just some of the examples of more advanced fuzzing methods we support, and we'll be making nicer tutorials/screencasts to showcase those over the coming week.
3) Thanks for the fixes - much appreciated. And yeah, we know GitBook is pretty slow, and we're in the process of moving to another docs provider.
If you've got any more questions please let me know!
Great, thanks for the answers. Maybe in the broken examples, just put a comment on the line that's broken so the reader doesn't have to expend mental effort figuring out why it's broken.
Sure! Some of the classes of bugs that remain low-hanging fruit for languages like Python include slowness, hangs, panics, race conditions, assert failures, excessive resource consumption and other Denial of Service attacks.
Other use cases include using fuzzing to compare implementations of libraries that provide the same functionality, detecting invariant violations, and testing implementations that are meant to work together (i.e. serialize(deserialize(x)) == x).
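The round-trip invariant can be expressed as a plain panic-on-failure harness that a fuzzer then drives with arbitrary generated values - a minimal sketch, using JSON as a stand-in serializer and checking the deserialize(serialize(x)) == x direction:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Point is a stand-in for whatever type the library serializes.
type Point struct {
	X, Y int
}

// checkRoundTrip panics if the round-trip invariant is violated;
// the fuzzer's job is to find an input value that triggers it.
func checkRoundTrip(p Point) {
	data, err := json.Marshal(p)
	if err != nil {
		panic(err)
	}
	var got Point
	if err := json.Unmarshal(data, &got); err != nil {
		panic(err)
	}
	if got != p {
		panic(fmt.Sprintf("round-trip mismatch: %+v != %+v", got, p))
	}
}

func main() {
	checkRoundTrip(Point{X: 3, Y: -7})
	fmt.Println("round-trip holds")
}
```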
In general fuzzing C/C++ libraries for memory bugs is the most commonly described use-case, but I think there are tons of fuzzing use cases that haven't been thoroughly explored yet.
Thanks for the link! We've been looking at all the current AFL-like/AFL wrappers for Java as we decide how best to implement Java fuzzing in Fuzzbuzz, and yours looks pretty nice.
We have! afl.rs[1] is awesome, and seeing as it's found some interesting bugs, I think Rust would be a great addition to Fuzzbuzz. It's on our roadmap.
Yep, we're definitely going to integrate more automated analysis. As of now we do some rudimentary analysis based off the type of the bug (Heap buffer overflow, UAF), read/write size, and similar metrics, but we'll be adding more advanced methods of categorization as the platform matures.
We've been thinking about the best way to use Fuzzbuzz to benefit the OSS/bug hunting community, and the integration idea is a great one. We're also providing free plans with extra CPU power for security researchers & bounty hunters.
Nice. I'm sure you've looked into various backends to use (in addition to AFL). Just wanted to give a shoutout to radamsa[0]. My [somewhat out-of-date] experience has been that it sometimes produced findings that AFL didn't (because of e.g. a different approach to the infinite input space).
In regard to CPUs - my laptop reports 4 CPUs, my workstation 16. To my mind, the real value for someone invested in fuzzing would be taking away the hassle of scaling fuzzing 'transparently' to 100 or 1000 CPUs. What I'm suggesting is that your pricing page may be off by a factor of 100 in the number of CPUs that would make the offering compelling to someone considering outsourcing their fuzzing infrastructure.
Radamsa is awesome! Definitely agree, and one of the goals for Fuzzbuzz is to be able to hot-swap between fuzzing backends without any interface changes (or to use all backends at the same time, to account for differences in findings).
re: pricing, we do offer infinite scalability in terms of CPUs, but that might not be as clear as we'd like on our pricing page. Or maybe I'm misunderstanding you. Either way, if you have any more thoughts/suggestions on pricing I'd love to hear them.
[0] https://github.com/everestmz/llmcat