linux-kernel - Re: [PATCH v2 0/10] KFuzzTest: a new kernel fuzzing framework

Message-ID: <CANgxf6xOJgP6254S8EgSdiivrfE-aJDEQbDdXzWi7K4BCTdrXg@mail.gmail.com>
Date: Thu, 25 Sep 2025 10:35:36 +0200
From: Ethan Graham <ethan.w.s.graham@...il.com>
To: Johannes Berg <johannes@...solutions.net>
Cc: ethangraham@...gle.com, glider@...gle.com, andreyknvl@...il.com, 
	andy@...nel.org, brauner@...nel.org, brendan.higgins@...ux.dev, 
	davem@...emloft.net, davidgow@...gle.com, dhowells@...hat.com, 
	dvyukov@...gle.com, elver@...gle.com, herbert@...dor.apana.org.au, 
	ignat@...udflare.com, jack@...e.cz, jannh@...gle.com, 
	kasan-dev@...glegroups.com, kees@...nel.org, kunit-dev@...glegroups.com, 
	linux-crypto@...r.kernel.org, linux-kernel@...r.kernel.org, 
	linux-mm@...ck.org, lukas@...ner.de, rmoar@...gle.com, shuah@...nel.org, 
	sj@...nel.org, tarasmadan@...gle.com
Subject: Re: [PATCH v2 0/10] KFuzzTest: a new kernel fuzzing framework

On Wed, Sep 24, 2025 at 2:52 PM Johannes Berg <johannes@...solutions.net> wrote:
>
> On Fri, 2025-09-19 at 14:57 +0000, Ethan Graham wrote:
> >
> > This patch series introduces KFuzzTest, a lightweight framework for
> > creating in-kernel fuzz targets for internal kernel functions.
> >
> > The primary motivation for KFuzzTest is to simplify the fuzzing of
> > low-level, relatively stateless functions (e.g., data parsers, format
> > converters) that are difficult to exercise effectively from the syscall
> > boundary. It is intended for in-situ fuzzing of kernel code without
> > requiring that it be built as a separate userspace library or that its
> > dependencies be stubbed out. Using a simple macro-based API, developers
> > can add a new fuzz target with minimal boilerplate code.
>
> So ... I guess I understand the motivation to make this easy for
> developers, but I'm not sure I'm happy to have all of this effectively
> depend on syzkaller.

I would argue that it only depends on syzkaller because syzkaller is
currently the only fuzzer that implements support for KFuzzTest. The
communication interface itself is fuzzer-agnostic.

> You spelled out the process to actually declare a fuzz test, but you
> never spelled out the process to actually run fuzzing against it. For

Running a fuzzer against a target is more of a tooling concern, so
instructions were left out here. For those interested, the syzkaller
workflow is described on GitHub:
https://github.com/google/syzkaller/blob/master/docs/kfuzztest.md

> the record, and everyone else who might be reading, here's my
> understanding:
>
>  - the FUZZ_TEST() macro declares some magic in the Linux binary,
>    including the name of the struct that describes the necessary input
>
>  - there's a parser in syzkaller (and not really usable standalone) that
>    can parse the vmlinux binary (and doesn't handle modules) and
>    generates descriptions for the input from it
>
>  - I _think_ that the bridge tool uses these descriptions, though the
>    example you have in the documentation just says "use this command for
>    this test" and makes no representation as to how the first argument
>    to the bridge tool is created, it just appears out of thin air

syzkaller doesn't use the bridge tool at all. Since a KFuzzTest target is
invoked when encoded data is written into its debugfs input file, any
fuzzer that can do this can fuzz it; this is what syzkaller does. The
bridge tool was added to provide an out-of-the-box way to fuzz KFuzzTest
targets with arbitrary data, without depending on syzkaller at all.
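For illustration, the userspace side of "write encoded data into the
debugfs input file" can be sketched in C as below. The helper name is
hypothetical. The sketch assumes that one write() call delivers one
complete input to the target, which is why a short write is treated as
an error.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (not part of KFuzzTest or syzkaller): deliver one
 * already-encoded input to a target by writing it to the target's debugfs
 * input file in a single write() call.
 * Returns 0 on success, -1 on failure with errno set.
 */
static int kfuzztest_write_input(const char *path, const void *buf, size_t len)
{
    int fd = open(path, O_WRONLY);
    if (fd < 0)
        return -1;

    ssize_t n = write(fd, buf, len);
    int saved_errno = errno; /* close() may clobber errno */
    close(fd);

    if (n < 0) {
        errno = saved_errno;
        return -1;
    }
    if ((size_t)n != len) {
        /* Assumed semantics: a short write means the target saw a truncated input. */
        errno = EIO;
        return -1;
    }
    return 0;
}
```

A fuzzer's main loop would then generate an input, encode it, and call
something like
`kfuzztest_write_input("/sys/kernel/debug/kfuzztest/<target>/input", buf, len)`.
That path is a placeholder assumption rather than a documented layout, so
the syzkaller documentation linked above is the place to check the real one.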

In the provided examples, the kfuzztest-bridge descriptions were
hand-written, but it's also feasible to generate them from the ELF
metadata in vmlinux. It would be easy to implement this in syzkaller,
but the bridge tool would then depend on an external tool to
autogenerate its descriptions, which we wanted to avoid.

>
>  - the bridge tool will then parse the description and use some random
>    data to create the serialised data that's deserialized in the kernel
>    and then passed to the test

This is exactly right. syzkaller doesn't use it, but this is how the
bridge tool is intended to work, either standalone or as a bridge
between KFuzzTest targets and an arbitrary fuzzer that doesn't
implement the required encoding logic.

>    - side note: did that really have to be a custom serialization
>      format? I don't see any discussion on that, there are different
>      formats that exist already, I'd think?
>
>  - the test runs now, and may or may not crash, as you'd expect

>
> I was really hoping to integrate this with ARCH=um and other fuzzers[1],
> but ... I don't really think it's entirely feasible. I can basically
> only require hard-coding the input description like the bridge tool
> does, but that doesn't scale, or attempt to extract a few thousand lines
> of code from syzkaller to extract the data...

I would argue that integrating with other fuzzers is feasible, though it
requires some work, possibly a lot, depending on the level of support
wanted. syzkaller already did most of the heavy lifting with smart input
generation and mutation for kernel functions, so the changes needed for
KFuzzTest were mainly:

- Dynamically discovering targets, although you could just as easily write
  syzkaller descriptions for them by hand.
- Encoding logic for the input format.

Assuming a fuzzer is able to generate C-struct inputs for a kernel function,
the only further requirement is being able to encode the input and write
it into the debugfs input file. The ELF data extraction is a nice-to-have
for sure, but it's not a strict requirement.
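To make "encoding the input" concrete: a struct that contains pointers
can't be passed in as raw bytes, because userspace addresses mean nothing
to the receiver. One common approach is to store offsets in place of
pointers and list the pointer fields in a relocation table, so the
receiver can patch them back into real pointers. The sketch below
illustrates that idea. The layout, struct and function names are all
hypothetical, and this is not KFuzzTest's actual wire format. Both halves
are plain userspace C so the sketch can run standalone.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical target input: a pointer to a buffer plus its length. */
struct example_input {
    const uint8_t *data;
    uint64_t len;
};

/*
 * Hypothetical flat encoding, for illustration only (NOT the KFuzzTest format):
 *
 *   u32 num_relocs | u32 reloc_slot | body
 *
 * The body is struct example_input followed by the payload bytes. The pointer
 * field named by reloc_slot holds a byte offset into the body instead of an
 * address; the receiver patches it into a real pointer.
 */
#define HDR_SIZE (2 * sizeof(uint32_t))

/* Encode `payload` into `out`. Returns bytes written, or 0 if `cap` is too small. */
static size_t encode_example(const uint8_t *payload, uint64_t len,
                             uint8_t *out, size_t cap)
{
    if (len > SIZE_MAX / 2)
        return 0;
    const size_t body_len = sizeof(struct example_input) + (size_t)len;
    if (cap < HDR_SIZE + body_len)
        return 0;

    const uint32_t num_relocs = 1;
    const uint32_t slot = offsetof(struct example_input, data);
    const uintptr_t payload_off = sizeof(struct example_input); /* payload follows struct */
    uint8_t *body = out + HDR_SIZE;

    memcpy(out, &num_relocs, sizeof(num_relocs));
    memcpy(out + sizeof(uint32_t), &slot, sizeof(slot));
    memset(body, 0, sizeof(struct example_input));
    memcpy(body + slot, &payload_off, sizeof(payload_off)); /* offset, not address */
    memcpy(body + offsetof(struct example_input, len), &len, sizeof(len));
    if (len)
        memcpy(body + payload_off, payload, (size_t)len);
    return HDR_SIZE + body_len;
}

/*
 * Decode in place: validate the relocation and payload bounds, then patch the
 * offset into a real pointer. `buf` must be aligned for struct example_input
 * (the body starts HDR_SIZE = 8 bytes in). Returns NULL on malformed input,
 * in which case `buf` is left unmodified.
 */
static struct example_input *decode_example(uint8_t *buf, size_t size)
{
    uint32_t num_relocs, slot;
    uintptr_t off;
    uint64_t len;

    if (size < HDR_SIZE + sizeof(struct example_input))
        return NULL;
    memcpy(&num_relocs, buf, sizeof(num_relocs));
    memcpy(&slot, buf + sizeof(uint32_t), sizeof(slot));
    if (num_relocs != 1 || slot != offsetof(struct example_input, data))
        return NULL; /* this sketch only knows about one pointer field */

    uint8_t *body = buf + HDR_SIZE;
    const size_t body_len = size - HDR_SIZE;
    memcpy(&off, body + slot, sizeof(off));
    memcpy(&len, body + offsetof(struct example_input, len), sizeof(len));
    if (off < sizeof(struct example_input) || off > body_len || len > body_len - off)
        return NULL; /* payload would fall outside the buffer */

    const uint8_t *target = body + off;
    memcpy(body + slot, &target, sizeof(target)); /* offset -> address */
    return (struct example_input *)body;
}
```

Even this single-pointer case needs a header, a relocation entry and
bounds checks on both sides. A plain linear buffer needs none of that.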

>
> [1] in particular honggfuzz as I wrote earlier, due to the coverage
>     feedback format issues with afl++, but if I were able to use clang
>     right now I could probably also make afl++ work in a similar way
>     by adding support for --fsanitize-coverage=trace-pc-guard first.
>
>
> I'm not even saying that you had many choices here, but it's definitely
> annoying, at least to me, that all this infrastructure is effectively
> dependent on syzkaller due to all of this. At the same time, yes, I get
> that parsing dwarf and getting a description out is not an easy feat,
> and without the infrastructure already in syzkaller it'd take more than
> the ~1.1kLOC (and even that is not small) it has now.
>
>
> I guess the biggest question to me is ultimately why all that is
> necessary? Right now, there's only the single example kfuzztest that
> even uses this infrastructure beyond a single linear buffer [2]. Where
> is all that complexity even worth it? It's expressly intended for
> simpler pieces of code that parse something ("data parsers, format
> converters").

You're right that the provided examples don't use the ability to pass
more complex, nested data into the kernel. For a future iteration, it
might be worth adding a target for a function that takes more complex
input. What do you think?

I'm not sure how much of the kernel-side complexity could really be
removed if we decided to support only simpler inputs (e.g., linear
buffers). It would certainly simplify the fuzzer implementation, but
the kernel code would likely be similar, if not identical.