Message-ID: <3562eeeb276dc9cc5f3b238a3f597baebfa56bad.camel@sipsolutions.net>
Date: Wed, 24 Sep 2025 14:52:32 +0200
From: Johannes Berg <johannes@...solutions.net>
To: Ethan Graham <ethan.w.s.graham@...il.com>, ethangraham@...gle.com,
glider@...gle.com
Cc: andreyknvl@...il.com, andy@...nel.org, brauner@...nel.org,
brendan.higgins@...ux.dev, davem@...emloft.net, davidgow@...gle.com,
dhowells@...hat.com, dvyukov@...gle.com, elver@...gle.com,
herbert@...dor.apana.org.au, ignat@...udflare.com, jack@...e.cz,
jannh@...gle.com, kasan-dev@...glegroups.com, kees@...nel.org,
kunit-dev@...glegroups.com, linux-crypto@...r.kernel.org,
linux-kernel@...r.kernel.org, linux-mm@...ck.org, lukas@...ner.de,
rmoar@...gle.com, shuah@...nel.org, sj@...nel.org, tarasmadan@...gle.com
Subject: Re: [PATCH v2 0/10] KFuzzTest: a new kernel fuzzing framework

On Fri, 2025-09-19 at 14:57 +0000, Ethan Graham wrote:
>
> This patch series introduces KFuzzTest, a lightweight framework for
> creating in-kernel fuzz targets for internal kernel functions.
>
> The primary motivation for KFuzzTest is to simplify the fuzzing of
> low-level, relatively stateless functions (e.g., data parsers, format
> converters) that are difficult to exercise effectively from the syscall
> boundary. It is intended for in-situ fuzzing of kernel code without
> requiring that it be built as a separate userspace library or that its
> dependencies be stubbed out. Using a simple macro-based API, developers
> can add a new fuzz target with minimal boilerplate code.

So ... I guess I understand the motivation to make this easy for
developers, but I'm not sure I'm happy to have all of this effectively
depend on syzkaller.

You spelled out the process to actually declare a fuzz test, but you
never spelled out the process to actually run fuzzing against it. For
the record, and everyone else who might be reading, here's my
understanding:

 - the FUZZ_TEST() macro declares some magic in the Linux binary,
   including the name of the struct that describes the necessary input

 - there's a parser in syzkaller (and not really usable standalone) that
   can parse the vmlinux binary (and doesn't handle modules) and
   generates descriptions for the input from it

 - I _think_ that the bridge tool uses these descriptions, though the
   example you have in the documentation just says "use this command for
   this test" and makes no representation as to how the first argument
   to the bridge tool is created, it just appears out of thin air

 - the bridge tool will then parse the description and use some random
   data to create the serialized data that's deserialized in the kernel
   and then passed to the test

   - side note: did that really have to be a custom serialization
     format? I don't see any discussion on that, there are different
     formats that exist already, I'd think?

 - the test runs now, and may or may not crash, as you'd expect
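
The serialize/deserialize step in the list above can be sketched for the simplest case, a single linear buffer. This is only an illustration under stated assumptions: the wire layout (a 4-byte little-endian length followed by the payload) and every name below (`linear_buf_arg`, `serialize_linear`, `deserialize_linear`) are hypothetical and are not KFuzzTest's actual format or API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical argument struct for a parser under test: one linear buffer. */
struct linear_buf_arg {
	const uint8_t *data;
	size_t len;
};

/*
 * "Bridge tool" side (hypothetical): pack the payload as
 * [u32 little-endian length][payload bytes].
 * Returns the number of bytes written, or 0 if `out` is too small.
 */
static size_t serialize_linear(const uint8_t *payload, uint32_t len,
			       uint8_t *out, size_t out_cap)
{
	assert(out != NULL);
	if (out_cap < 4 || len > out_cap - 4)
		return 0;
	out[0] = (uint8_t)(len & 0xff);
	out[1] = (uint8_t)((len >> 8) & 0xff);
	out[2] = (uint8_t)((len >> 16) & 0xff);
	out[3] = (uint8_t)((len >> 24) & 0xff);
	if (len)
		memcpy(out + 4, payload, len);
	return 4 + (size_t)len;
}

/*
 * "Kernel" side (hypothetical): validate the header and point the argument
 * struct into the input. Returns 0 on success, -1 on malformed input.
 */
static int deserialize_linear(const uint8_t *in, size_t in_len,
			      struct linear_buf_arg *arg)
{
	uint32_t len;

	if (in_len < 4)
		return -1;
	len = (uint32_t)in[0] | ((uint32_t)in[1] << 8) |
	      ((uint32_t)in[2] << 16) | ((uint32_t)in[3] << 24);
	if (len != in_len - 4)
		return -1;	/* declared length must match what was received */
	arg->data = in + 4;
	arg->len = len;
	return 0;
}
```

With a layout this small, any fuzzer that produces a byte blob could feed such a target directly, without needing a description extracted from vmlinux.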

I was really hoping to integrate this with ARCH=um and other fuzzers[1],
but ... I don't really think it's entirely feasible. My only options are
basically to hard-code the input description like the bridge tool does,
which doesn't scale, or to pull a few thousand lines of code out of
syzkaller to extract the data...

[1] in particular honggfuzz as I wrote earlier, due to the coverage
    feedback format issues with afl++, but if I were able to use clang
    right now I could probably also make afl++ work in a similar way
    by adding support for -fsanitize-coverage=trace-pc-guard first.
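
For readers unfamiliar with trace-pc-guard: with that clang option, the compiler inserts a call to `__sanitizer_cov_trace_pc_guard()` on every edge and expects the program to supply that callback plus an init callback. The two callback names and signatures are the ones clang documents; everything else below (the counter map, its size, the ID scheme) is a hypothetical userspace-style sketch, and how this would be wired into ARCH=um is not covered in this message.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define COV_MAP_SIZE 4096		/* hypothetical map size */

static uint8_t cov_map[COV_MAP_SIZE];	/* hypothetical per-edge hit counters */
static uint32_t cov_next_id;		/* last guard ID handed out */

/* Called once per instrumented module with the range of its guard slots. */
void __sanitizer_cov_trace_pc_guard_init(uint32_t *start, uint32_t *stop)
{
	assert(start <= stop);
	if (start == stop || *start)
		return;			/* empty range or already initialised */
	for (uint32_t *g = start; g < stop; g++)
		*g = ++cov_next_id;	/* nonzero ID marks the guard as active */
}

/* Called on every instrumented edge; `guard` identifies the edge. */
void __sanitizer_cov_trace_pc_guard(uint32_t *guard)
{
	uint8_t *slot;

	if (!*guard)
		return;			/* guard disabled */
	slot = &cov_map[*guard % COV_MAP_SIZE];
	if (*slot != UINT8_MAX)
		(*slot)++;		/* saturating hit counter */
}
```

A fuzzer-specific runtime would replace `cov_map` with whatever shared-memory feedback format that fuzzer expects.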

I'm not even saying that you had many choices here, but it's definitely
annoying, at least to me, that all this infrastructure is effectively
dependent on syzkaller due to all of this. At the same time, yes, I get
that parsing DWARF and getting a description out is not an easy feat,
and without the infrastructure already in syzkaller it'd take more than
the ~1.1kLOC (and even that is not small) it has now.

I guess the biggest question to me is ultimately why all that is
necessary. Right now, there's only a single example kfuzztest that
even uses this infrastructure beyond a single linear buffer [2]. Where
is all that complexity even worth it? It's expressly intended for
simpler pieces of code that parse something ("data parsers, format
converters").

[2] admittedly the auxdisplay one is slightly different and uses a
    string, but that's pretty much equivalent

johannes