Message-ID: <438ff89e22a815c81406c3c8761a951b0c7e6916.camel@sipsolutions.net>
Date: Fri, 24 Oct 2025 10:37:57 +0200
From: Johannes Berg <johannes@...solutions.net>
To: Ethan Graham <ethan.w.s.graham@...il.com>
Cc: ethangraham@...gle.com, glider@...gle.com, andreyknvl@...il.com,
andy@...nel.org, brauner@...nel.org, brendan.higgins@...ux.dev,
davem@...emloft.net, davidgow@...gle.com, dhowells@...hat.com,
dvyukov@...gle.com, elver@...gle.com, herbert@...dor.apana.org.au,
ignat@...udflare.com, jack@...e.cz, jannh@...gle.com,
kasan-dev@...glegroups.com, kees@...nel.org, kunit-dev@...glegroups.com,
linux-crypto@...r.kernel.org, linux-kernel@...r.kernel.org,
linux-mm@...ck.org, lukas@...ner.de, rmoar@...gle.com, shuah@...nel.org,
sj@...nel.org, tarasmadan@...gle.com
Subject: Re: [PATCH v2 0/10] KFuzzTest: a new kernel fuzzing framework
Hi Ethan, all,
Sorry, my current foray into fuzzing got preempted by other things ...
> > So ... I guess I understand the motivation to make this easy for
> > developers, but I'm not sure I'm happy to have all of this effectively
> > depend on syzkaller.
>
> I would argue that it only depends on syzkaller because it is currently
> the only fuzzer that implements support for KFuzzTest. The communication
> interface itself is agnostic.
Yeah I can see how you could argue that. However, syzkaller is also
effectively the only fuzzer now that supports what you later call "smart
input generation", and adding it to any other fuzzer is really not
straight-forward, at least to me. No other fuzzer seems to really have
felt a need to have this, and there are ... dozens?
> > For the record, and everyone else who might be reading, here's my
> > understanding:
> >
> > - the FUZZ_TEST() macro declares some magic in the Linux binary,
> > including the name of the struct that describes the necessary input
> >
> > - there's a parser in syzkaller (and not really usable standalone) that
> > can parse the vmlinux binary (and doesn't handle modules) and
> > generates descriptions for the input from it
> >
> > - I _think_ that the bridge tool uses these descriptions, though the
> > example you have in the documentation just says "use this command for
> > this test" and makes no representation as to how the first argument
> > to the bridge tool is created, it just appears out of thin air
>
> syzkaller doesn't use the bridge tool at all.
Right.
> Since a KFuzzTest target is
> invoked when you write encoded data into its debugfs input file, any
> fuzzer that is able to do this is able to fuzz it - this is what syzkaller
> does. The bridge tool was added to provide an out-of-the-box tool
> for fuzzing KFuzzTest targets with arbitrary data that doesn't depend
> on syzkaller at all.
Yes, I understand, I guess it just feels a bit like a fig-leaf to me to
paper over "you need syzkaller" because there's no way to really
(efficiently) use it for fuzzing.
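To be concrete about the "any fuzzer can write to the file" part: mechanically it's about this much code. This is only a rough sketch. submit_input() and its input_path argument are names I made up, the debugfs path is left to the caller, and I'm assuming that each write() is taken as one complete input, so a short write is reported rather than retried. The hard part, producing the encoded payload in buf, isn't shown at all.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <unistd.h>

/*
 * Hypothetical helper: push one already-encoded input into a target's
 * input file. Returns 0 on success or a negative errno value.
 *
 * Assumption: one write() == one test invocation, so a short write is
 * treated as an error (-EIO) instead of being continued.
 */
static int submit_input(const char *input_path, const void *buf, size_t len)
{
	ssize_t n;
	int fd, ret = 0;

	assert(input_path != NULL && buf != NULL);

	fd = open(input_path, O_WRONLY);
	if (fd < 0)
		return -errno;

	n = write(fd, buf, len);
	if (n < 0)
		ret = -errno;
	else if ((size_t)n != len)
		ret = -EIO;

	close(fd);
	return ret;
}
```

So the transport is trivial. What's missing outside of syzkaller is everything that fills in buf.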
> In the provided examples, the kfuzztest-bridge descriptions were
> hand-written, but it's also feasible to generate them with the ELF
> metadata in vmlinux. It would be easy to implement support for
> this in syzkaller, but then we would depend on an external tool
> for autogenerating these descriptions which we wanted to avoid.
Oh, I get that you wouldn't necessarily want to have a dependency on
syzkaller in the kernel example code, but in a sense my argument is that
there's no such tool at all since syzkaller cannot output anything, and
then you need to write all the descriptions by hand. Which is fine for
an _example_ but really doesn't scale to actually running fuzzing.
So then we're mostly back to "you need syzkaller to run fuzzing against
this", which at least to me isn't a great situation.
> > - the bridge tool will then parse the description and use some random
> > data to create the serialised data that's deserialized in the kernel
> > and then passed to the test
>
> This is exactly right. It's not used by syzkaller, but this is how it's
> intended to work when it's used as a standalone tool, or for bridging
> between KFuzzTest targets and an arbitrary fuzzer that doesn't
> implement the required encoding logic.
Yeah I guess, but that still requires hand-coding the descriptions (or
writing a separate parser), and notably doesn't work with a sort of
in-process fuzzing I was envisioning for ARCH=um. Which ought to be much
faster, and even combinable with fork() as I alluded to in earlier
emails.
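Roughly what I mean by combining with fork(), as a plain userspace sketch and not actual ARCH=um code: run each input in a forked child so a crash is observed by the parent instead of killing the fuzzing loop. The names run_one_forked(), fuzz_target_fn and toy_target() are all made up for illustration, and a real setup would also want timeouts and coverage feedback, which I've left out.

```c
#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical target signature: consumes one linear buffer. */
typedef void (*fuzz_target_fn)(const uint8_t *data, size_t len);

/*
 * Run one input in a forked child. Returns 0 if the child exited
 * normally, the terminating signal number if it was killed by a
 * signal, and -1 if fork() or waitpid() failed.
 */
static int run_one_forked(fuzz_target_fn target, const uint8_t *data,
			  size_t len)
{
	int status;
	pid_t pid;

	assert(target != NULL);

	pid = fork();
	if (pid < 0)
		return -1;

	if (pid == 0) {
		/* Child: state is a copy-on-write snapshot of the parent. */
		target(data, len);
		_exit(0);
	}

	if (waitpid(pid, &status, 0) != pid)
		return -1;
	if (WIFSIGNALED(status))
		return WTERMSIG(status);
	return 0;
}

/* Toy target, purely illustrative: "crashes" on a magic first byte. */
static void toy_target(const uint8_t *data, size_t len)
{
	if (len > 0 && data[0] == 0x42)
		abort();
}
```

With that, run_one_forked(toy_target, buf, len) returns SIGABRT for an input starting with 0x42 and 0 otherwise, and the parent always starts the next input from the same state.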
> > I was really hoping to integrate this with ARCH=um and other fuzzers[1],
> > but ... I don't really think it's entirely feasible. I can basically
> > only require hard-coding the input description like the bridge tool
> > does, but that doesn't scale, or attempt to extract a few thousand lines
> > of code from syzkaller to extract the data...
>
> I would argue that integrating with other fuzzers is feasible, but it does
> require some if not a lot of work depending on the level of support. syzkaller
> already did most of the heavy lifting with smart input generation and mutation
> for kernel functions, so the changes needed for KFuzzTest were mainly:
>
> - Dynamically discovering targets, but you could just as easily write a
> syzkaller description for them.
> - Encoding logic for the input format.
>
> Assuming a fuzzer is able to generate C-struct inputs for a kernel function,
> the only further requirement is being able to encode the input and write
> it into the debugfs input file. The ELF data extraction is a nice-to-have
> for sure, but it's not a strict requirement.
I mean, yeah, I guess but ... Is there a fuzzer that is able to generate
such input? I haven't seen one. And running the bridge tool separately
is going to be rather expensive (vs. in-process like I'm thinking
about), and some form of data extraction is needed to make this scale at
all.
Sure, I can do it all manually for a single test, but is it really a
good idea that syzkaller is the only thing that could possibly run this
at scale?
> > I guess the biggest question to me is ultimately why all that is
> > necessary? Right now, there's only the single example kfuzztest that
> > even uses this infrastructure beyond a single linear buffer [2]. Where
> > is all that complexity even worth it? It's expressly intended for
> > simpler pieces of code that parse something ("data parsers, format
> > converters").
>
> You're right that the provided examples don't leverage the feature of
> being able to pass more complex nested data into the kernel. Perhaps
> for a future iteration, it might be worth adding a target for a function
> that takes more complex input. What do you think?
Well, I guess my thought is that there isn't actually going to be a good
example that really _requires_ all this flexibility. We're going to want
to test (mostly?) functions that consume untrusted data, but untrusted
data tends to come in the form of a linear blob, via the network, from a
file, from userspace, etc. Pretty much only the syscall boundary has
highly structured untrusted data, but syzkaller already fuzzes that and
we're not likely to write special kfuzztests for syscalls?
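For comparison, a linear-blob target is something practically every existing fuzzer already knows how to drive. Here's a minimal sketch using the libFuzzer-style entry point LLVMFuzzerTestOneInput(), which is a real convention. The parse_tlv() function is a made-up stand-in for whatever parser is under test, not anything from the kernel or from this series.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical parser under test: walks type/length/value elements
 * (1 byte type, 1 byte length, then the value) and returns the number
 * of well-formed elements, or -1 if an element overruns the buffer.
 */
static int parse_tlv(const uint8_t *data, size_t len)
{
	size_t pos = 0, elen;
	int count = 0;

	assert(data != NULL || len == 0);

	while (pos < len) {
		if (len - pos < 2)
			return -1;	/* truncated header */
		elen = data[pos + 1];
		pos += 2;
		if (elen > len - pos)
			return -1;	/* value runs past the end */
		pos += elen;
		count++;
	}
	return count;
}

/* libFuzzer-style entry point: one call per generated input. */
int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size)
{
	(void)parse_tlv(data, size);
	return 0;
}
```

No description language, no serialization format: the fuzzer only needs to hand over bytes and a length.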
> I'm not sure how much of the kernel complexity really could be reduced
> if we decided to support only simpler inputs (e.g., linear buffers).
> It would certainly simplify the fuzzer implementation, but the kernel
> code would likely be similar if not the same.
Well, you wouldn't need the whole custom serialization format and
deserialization code for a start, nor the linker changes around
KFUZZTEST_TABLE since run-time discovery would likely be sufficient,
though of course those are trivial. And the deserialization is almost
half of the overall infrastructure code?
Anyway, I don't really know what to do. Maybe this has even landed by
now ;-) I certainly would've preferred something that was easier to use
with other fuzzers and in-process fuzzing in ARCH=um, but then that'd
now mean I need to plug it in at a completely different level, or write
a DWARF parser and serializer if I don't want to have to hand-code each
target.
I really do want to do fuzz testing on wifi, but with kfuzztest it
basically means I rely on syzbot to actually run it or have to run
syzkaller myself, rather than being able to integrate it with other
fuzzers say in ARCH=um. Personally, I think it'd be worthwhile to have
that, but I don't see how to integrate it well with this infrastructure.
Also, more generally, it seems unlikely that _anyone_ would ever do
this, and then it's basically only syzbot that will ever run it.
johannes