linux-kernel - Re: [PATCH v2 0/10] KFuzzTest: a new kernel fuzzing framework

Message-ID: <CAG_fn=XSUw=4tVpKE7Q+R2qsBzbA5+_XC1xH=goxAUZiRD7iyQ@mail.gmail.com>
Date: Tue, 28 Oct 2025 18:38:43 +0100
From: Alexander Potapenko <glider@...gle.com>
To: Johannes Berg <johannes@...solutions.net>
Cc: Ethan Graham <ethan.w.s.graham@...il.com>, ethangraham@...gle.com, 
	andreyknvl@...il.com, andy@...nel.org, brauner@...nel.org, 
	brendan.higgins@...ux.dev, davem@...emloft.net, davidgow@...gle.com, 
	dhowells@...hat.com, dvyukov@...gle.com, elver@...gle.com, 
	herbert@...dor.apana.org.au, ignat@...udflare.com, jack@...e.cz, 
	jannh@...gle.com, kasan-dev@...glegroups.com, kees@...nel.org, 
	kunit-dev@...glegroups.com, linux-crypto@...r.kernel.org, 
	linux-kernel@...r.kernel.org, linux-mm@...ck.org, lukas@...ner.de, 
	rmoar@...gle.com, shuah@...nel.org, sj@...nel.org, tarasmadan@...gle.com
Subject: Re: [PATCH v2 0/10] KFuzzTest: a new kernel fuzzing framework

On Fri, Oct 24, 2025 at 10:38 AM Johannes Berg
<johannes@...solutions.net> wrote:
>
> Hi Ethan, all,


Hi Johannes,

> > I would argue that it only depends on syzkaller because it is currently
> > the only fuzzer that implements support for KFuzzTest. The communication
> > interface itself is agnostic.
>
> Yeah I can see how you could argue that. However, syzkaller is also
> effectively the only fuzzer now that supports what you later call "smart
> input generation", and adding it to any other fuzzer is really not
> straight-forward, at least to me. No other fuzzer seems to really have
> felt a need to have this, and there are ... dozens?

Structure-aware fuzzing is not unique to syzkaller, nor are domain
constraints for certain values.
https://github.com/google/fuzztest is one example of a fuzzer that
supports both.
libFuzzer also supports structure-aware fuzzing through custom mutators
(https://github.com/google/fuzzing/blob/master/docs/structure-aware-fuzzing.md).
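
To illustrate what a custom mutator looks like, here is a minimal sketch.
The LLVMFuzzerCustomMutator() hook and its signature come from libFuzzer;
everything else (the length-prefixed layout, the next_rand() helper) is a
hypothetical example and not part of KFuzzTest. The mutator keeps a one-byte
length prefix consistent with the payload, so every generated input
satisfies that structural constraint.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout: data[0] is the payload length, data[1..] the payload. */

/* Tiny xorshift PRNG so the sketch has no external dependencies. */
static uint32_t next_rand(uint32_t *state)
{
	uint32_t x = *state ? *state : 1u; /* xorshift must not start at 0 */

	x ^= x << 13;
	x ^= x >> 17;
	x ^= x << 5;
	*state = x;
	return x;
}

/* When defined, libFuzzer calls this hook to mutate an input in place. */
size_t LLVMFuzzerCustomMutator(uint8_t *data, size_t size, size_t max_size,
			       unsigned int seed)
{
	uint32_t rng = seed;
	size_t payload, cap;
	uint32_t choice;

	if (max_size < 2)
		return 0; /* no room for the prefix plus one payload byte */

	/* Largest payload that fits both the buffer and a one-byte prefix. */
	cap = max_size - 1 < 255 ? max_size - 1 : 255;
	payload = size > 1 ? size - 1 : 0;
	if (payload > cap)
		payload = cap;

	choice = next_rand(&rng) % 3;
	if (choice == 0 && payload < cap) {
		/* Grow: append one random byte. */
		data[1 + payload] = (uint8_t)next_rand(&rng);
		payload++;
	} else if (choice == 1 && payload > 1) {
		/* Shrink: drop the last byte. */
		payload--;
	} else {
		/* Overwrite one payload byte (creating it if there is none). */
		size_t idx;

		if (payload == 0)
			payload = 1;
		idx = next_rand(&rng) % payload;
		data[1 + idx] = (uint8_t)next_rand(&rng);
	}

	data[0] = (uint8_t)payload; /* keep the prefix in sync */
	return payload + 1;
}
```

A real mutator would normally also call LLVMFuzzerMutate() on the payload to
reuse libFuzzer's built-in mutations; that is left out here to keep the
sketch standalone.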

> > Since a KFuzzTest target is
> > invoked when you write encoded data into its debugfs input file, any
> > fuzzer that is able to do this is able to fuzz it - this is what syzkaller
> > does. The bridge tool was added to provide an out-of-the-box tool
> > for fuzzing KFuzzTest targets with arbitrary data that doesn't depend
> > on syzkaller at all.
>
> Yes, I understand, I guess it just feels a bit like a fig-leaf to me to
> paper over "you need syzkaller" because there's no way to really
> (efficiently) use it for fuzzing.

When designing KFuzzTest, we anticipated two potential user scenarios:
1. The code author develops the fuzz test and runs it locally to
ensure its sanity and catch obvious errors.
2. The fuzz test lands upstream and syzkaller runs it continuously.

Ethan initially developed tools for both scenarios on the syzkaller
side, prioritizing simplicity of use over the diversity of potential
non-default fuzzing engines.
However, because smoke testing does not require a syzkaller
dependency, he added the bridge utility (I believe David Gow suggested
it).
That utility is easy to use for smoke testing, as it requires only a
one-line structure description.
I understand it may not be suitable for users who want to extensively
fuzz a particular test on their own machine without involving
syzkaller.
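
For reference, feeding a target from userspace boils down to a write into
its debugfs input file. Below is a minimal sketch of that step. The helper
name write_fuzz_input() is hypothetical, and the path in the usage note is
an assumption about the debugfs layout, so please check it against the
documentation in the series. The buffer must already be in whatever format
the target expects; this helper does no encoding.

```c
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Write one input to a fuzz target's input file.
 * Returns 0 if the whole buffer was accepted in a single write(), -1 otherwise.
 * A single write() is used on the assumption that one write is one test input.
 */
int write_fuzz_input(const char *path, const void *buf, size_t len)
{
	ssize_t n;
	int fd;

	fd = open(path, O_WRONLY);
	if (fd < 0)
		return -1;

	n = write(fd, buf, len);
	close(fd);

	return (n >= 0 && (size_t)n == len) ? 0 : -1;
}
```

Usage would be along the lines of
write_fuzz_input("/sys/kernel/debug/kfuzztest/<test-name>/input", buf, len),
with the path again being an assumption.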

I agree we can do a better job by implementing some of the following options:
1. For tests without nested structures, or for tests that request it
explicitly, allow a simpler input format via a separate debugfs file.
2. Export the constraints/annotations via debugfs in a string format
so that fuzzers do not need vmlinux access to obtain them.
3. Export the fuzz test input structure as a string. (We've looked
into this and deemed it infeasible because test inputs may reference C
structures, and we don't have a reflection mechanism that would allow
us to dump the contents of existing structs).


> > This is exactly right. It's not used by syzkaller, but this is how it's
> > intended to work when it's used as a standalone tool, or for bridging
> > between KFuzzTest targets and an arbitrary fuzzer that doesn't
> > implement the required encoding logic.
>
> Yeah I guess, but that still requires hand-coding the descriptions (or
> writing a separate parser), and notably doesn't work with a sort of
> in-process fuzzing I was envisioning for ARCH=um. Which ought to be much
> faster, and even combinable with fork() as I alluded to in earlier
> emails.

Can you describe the interface between the fuzz test and the fuzzing
engine that you have in mind?
For ARCH=um, if you don't need structure awareness, I think the
easiest solution would be to make FUZZ_TEST wrap the code into
something akin to LLVMFuzzerTestOneInput()
(https://llvm.org/docs/LibFuzzer.html) that would directly pass random
data into the function under test. The debugfs interface is probably
excessive in this case.
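
A rough sketch of the shape I mean is below. Only the
LLVMFuzzerTestOneInput() entry point is the real libFuzzer interface;
parse_tlv() is a made-up stand-in for the function under test, and how
FUZZ_TEST would actually generate such a wrapper is left open.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical function under test: checks a 2-byte header (type, length)
 * and verifies that the declared length fits into the buffer.
 * Returns 0 on success, -1 on malformed input.
 */
int parse_tlv(const uint8_t *buf, size_t len)
{
	if (len < 2)
		return -1;
	if ((size_t)buf[1] > len - 2)
		return -1;
	return 0;
}

/*
 * libFuzzer entry point: the engine calls this once per generated input.
 * The raw bytes go straight into the function under test, with no
 * serialization format and no debugfs round trip in between.
 */
int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size)
{
	(void)parse_tlv(data, size);
	return 0; /* libFuzzer expects 0 for a normally processed input */
}
```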

But let's say we want to run in-kernel fuzzing with e.g. AFL++ - will
a simplified debugfs interface solve the problem?
What special cases can we omit to simplify the interface?

> I mean, yeah, I guess but ... Is there a fuzzer that is able to generate
> such input? I haven't seen one. And running the bridge tool separately
> is going to be rather expensive (vs. in-process like I'm thinking
> about), and some form of data extraction is needed to make this scale at
> all.
>
> Sure, I can do it all manually for a single test, but is it really a
> good idea that syzkaller is the only thing that could possibly run this
> at scale?

Adding more fuzzing engines will not automatically allow us to run
this at scale.
For that, we'll need a continuous fuzzing system to manage the kernels
and corpora, report bugs, find reproducers, and bisect the causes.
I don't think building one for another fuzzing engine will be worth it.
That said, we can help developers better fuzz their code during local
runs by not always requiring the serialization format.

> > You're right that the provided examples don't leverage the feature of
> > being able to pass more complex nested data into the kernel. Perhaps
> > for a future iteration, it might be worth adding a target for a function
> > that takes more complex input. What do you think?
>
> Well, I guess my thought is that there isn't actually going to be a good
> example that really _requires_ all this flexibility. We're going to want
> to test (mostly?) functions that consume untrusted data, but untrusted
> data tends to come in the form of a linear blob, via the network, from a
> file, from userspace, etc. Pretty much only the syscall boundary has
> highly structured untrusted data, but syzkaller already fuzzes that and
> we're not likely to write special kfuzztests for syscalls?

We are not limited to fuzzing parsers of untrusted data. The idea
behind KFuzzTest is to validate that a piece of code can cope with any
input satisfying the constraints.
We could just as well fuzz a sorting algorithm or the bitops.
E.g. Will Deacon had the idea of fuzzing a hypervisor, which
potentially has many parameters, not all of which are necessarily
blobs.
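
As a sketch of the sorting case: the harness below rejects inputs that
violate a constraint (count <= MAX_ELEMS) and then asserts a property of
the output rather than merely looking for crashes. All names here
(struct sort_input, insertion_sort(), fuzz_sort_once()) are hypothetical,
and this is plain userspace C, not the KFuzzTest macro API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_ELEMS 64

/* Hypothetical structured input: an element count followed by the values. */
struct sort_input {
	uint32_t count;
	int32_t values[MAX_ELEMS];
};

/* Code under test: a plain insertion sort. */
void insertion_sort(int32_t *v, size_t n)
{
	for (size_t i = 1; i < n; i++) {
		int32_t key = v[i];
		size_t j = i;

		while (j > 0 && v[j - 1] > key) {
			v[j] = v[j - 1];
			j--;
		}
		v[j] = key;
	}
}

/*
 * Run one fuzz iteration.
 * Returns -1 if the input violates the constraints and was skipped,
 * 0 if the code under test ran and the post-condition held.
 */
int fuzz_sort_once(const uint8_t *data, size_t size)
{
	struct sort_input in;

	if (size < sizeof(in))
		return -1;
	memcpy(&in, data, sizeof(in));

	/* Domain constraint: count must not exceed the array capacity. */
	if (in.count > MAX_ELEMS)
		return -1;

	insertion_sort(in.values, in.count);

	/* Post-condition: the output is in non-decreasing order. */
	for (size_t i = 1; i < in.count; i++)
		assert(in.values[i - 1] <= in.values[i]);

	return 0;
}
```

A fuzzer that knows the constraint up front can avoid generating the
rejected inputs altogether, which is the point of exporting the
annotations.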

> > I'm not sure how much of the kernel complexity really could be reduced
> > if we decided to support only simpler inputs (e.g., linear buffers).
> > It would certainly simplify the fuzzer implementation, but the kernel
> > code would likely be similar if not the same.
>
> Well, you wouldn't need the whole custom serialization format and
> deserialization code for a start, nor the linker changes around
> KFUZZTEST_TABLE since run-time discovery would likely be sufficient,
> though of course those are trivial. And the deserialization is almost
> half of the overall infrastructure code?

We could indeed organize the code so that simpler test cases (e.g. the
examples provided in this series) do not require the custom
serialization format.
I am still not convinced the whole serialization idea is useless, but
perhaps having a simplified version will unblock more users.

>
> Anyway, I don't really know what to do. Maybe this has even landed by
> now ;-) I certainly would've preferred something that was easier to use
> with other fuzzers and in-process fuzzing in ARCH=um, but then that'd
> now mean I need to plug it in at a completely different level, or write
> a DWARF parser and serializer if I don't want to have to hand-code each
> target.
>
> I really do want to do fuzz testing on wifi, but with kfuzztest it
> basically means I rely on syzbot to actually run it or have to run
> syzkaller myself, rather than being able to integrate it with other
> fuzzers say in ARCH=um. Personally, I think it'd be worthwhile to have
> that, but I don't see how to integrate it well with this infrastructure.

Can you please share some potential entry points you have in mind?
Understanding which functions you want to fuzz will help us simplify the format.

Thank you for your input!

> Also, more generally, it seems unlikely that _anyone_ would ever do
> this, and then it's basically only syzbot that will ever run it.
>
> johannes



-- 
Alexander Potapenko
Software Engineer
