linux-kernel - Re: [PATCH 2/2] moderr: add module error injection tool

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <3ehu3r4hlsf7cpptofz2y5aq2bazidq4buxbddqj6gzvzd3eh3@wzlnbvdsc6ty>
Date: Fri, 28 Feb 2025 10:27:17 +0100
From: Daniel Gomez <da.gomez@...nel.org>
To: Lucas De Marchi <lucas.demarchi@...el.com>
Cc: Luis Chamberlain <mcgrof@...nel.org>, 
	Alexei Starovoitov <alexei.starovoitov@...il.com>, Daniel Gomez <da.gomez@...sung.com>, 
	Petr Pavlu <petr.pavlu@...e.com>, Sami Tolvanen <samitolvanen@...gle.com>, 
	Alexei Starovoitov <ast@...nel.org>, Daniel Borkmann <daniel@...earbox.net>, 
	Andrii Nakryiko <andrii@...nel.org>, Martin KaFai Lau <martin.lau@...ux.dev>, 
	Eduard Zingerman <eddyz87@...il.com>, Song Liu <song@...nel.org>, 
	Yonghong Song <yonghong.song@...ux.dev>, John Fastabend <john.fastabend@...il.com>, 
	KP Singh <kpsingh@...nel.org>, Stanislav Fomichev <sdf@...ichev.me>, 
	Hao Luo <haoluo@...gle.com>, Jiri Olsa <jolsa@...nel.org>, 
	Nathan Chancellor <nathan@...nel.org>, Nick Desaulniers <ndesaulniers@...gle.com>, 
	Bill Wendling <morbo@...gle.com>, Justin Stitt <justinstitt@...gle.com>, 
	linux-modules@...r.kernel.org, LKML <linux-kernel@...r.kernel.org>, bpf <bpf@...r.kernel.org>, 
	clang-built-linux <llvm@...ts.linux.dev>, iovisor-dev <iovisor-dev@...ts.iovisor.org>, 
	gost.dev@...sung.com
Subject: Re: [PATCH 2/2] moderr: add module error injection tool

On Mon, Feb 24, 2025 at 08:43:45AM +0100, Lucas De Marchi wrote:
> On Sat, Feb 22, 2025 at 10:35:07PM +0100, Daniel Gomez wrote:
> > On Fri, Feb 21, 2025 at 12:15:40PM +0100, Luis Chamberlain wrote:
> > > On Wed, Feb 19, 2025 at 02:17:48PM -0600, Lucas De Marchi wrote:
> > > > On Tue, Jan 28, 2025 at 12:57:05PM -0800, Luis Chamberlain wrote:
> > > > > On Wed, Jan 22, 2025 at 09:02:19AM -0800, Alexei Starovoitov wrote:
> > > > > > On Wed, Jan 22, 2025 at 5:12 AM Daniel Gomez <da.gomez@...sung.com> wrote:
> > > > > > >
> > > > > > > Add support for a module error injection tool. The tool
> > > > > > > can inject errors in the annotated module kernel functions
> > > > > > > such as complete_formation(), do_init_module() and
> > > > > > > module_enable_rodata_after_init(). Module name and module function are
> > > > > > > required parameters to have control over the error injection.
> > > > > > >
> > > > > > > Example: Inject error -22 to module_enable_rodata_ro_after_init for
> > > > > > > brd module:
> > > > > > >
> > > > > > > sudo moderr --modname=brd --modfunc=module_enable_rodata_ro_after_init \
> > > > > > > --error=-22 --trace
> > > > > > > Monitoring module error injection... Hit Ctrl-C to end.
> > > > > > > MODULE     ERROR FUNCTION
> > > > > > > brd        -22   module_enable_rodata_after_init()
> > > > > > >
> > > > > > > Kernel messages:
> > > > > > > [   89.463690] brd: module loaded
> > > > > > > [   89.463855] brd: module_enable_rodata_ro_after_init() returned -22,
> > > > > > > ro_after_init data might still be writable
> > > > > > >
> > > > > > > Signed-off-by: Daniel Gomez <da.gomez@...sung.com>
> > > > > > > ---
> > > > > > >  tools/bpf/Makefile            |  13 ++-
> > > > > > >  tools/bpf/moderr/.gitignore   |   2 +
> > > > > > >  tools/bpf/moderr/Makefile     |  95 +++++++++++++++++
> > > > > > >  tools/bpf/moderr/moderr.bpf.c | 127 +++++++++++++++++++++++
> > > > > > >  tools/bpf/moderr/moderr.c     | 236 ++++++++++++++++++++++++++++++++++++++++++
> > > > > > >  tools/bpf/moderr/moderr.h     |  40 +++++++
> > > > > > >  6 files changed, 510 insertions(+), 3 deletions(-)
> > > > > >
> > > > > > The tool looks useful, but we don't add tools to the kernel repo.
> > > > > > It has to stay out of tree.
> > > > >
> > > > > For selftests we do add random tools.
> > > > >
> > > > > > The value of error injection is not clear to me.
> > > > >
> > > > > It is of great value, since it deals with corner cases which are
> > > > > otherwise hard to reproduce in places which a real error can be
> > > > > catostrophic.
> > > > >
> > > > > > Other places in the kernel use it to test paths in the kernel
> > > > > > that are difficult to do otherwise.
> > > > >
> > > > > Right.
> > > > >
> > > > > > These 3 functions don't seem to be in this category.
> > > > >
> > > > > That's the key here we should focus on. The problem is when a maintainer
> > > > > *does* agree that adding an error injection entry is useful for testing,
> > > > > and we have a developer willing to do the work to help test / validate
> > > > > it. In this case, this error case is rare but we do want to strive to
> > > > > test this as we ramp up and extend our modules selftests.
> > > > >
> > > > > Then there is the aspect of how to mitigate how instrusive code changes
> > > > > to allow error injection are. In 2021 we evaluated the prospect of error
> > > > > injection in-kernel long ago for other areas like the block layer for
> > > > > add_disk() failures [0] but the minimal interface to enable this from
> > > > > userspace with debugfs was considered just too intrusive.
> > > > >
> > > > > This effort tried to evaluate what this could look like with eBPF to
> > > > > mitigate the required in-kernel code, and I believe the light weight
> > > > > nature of it by just requiring a sprinkle with ALLOW_ERROR_INJECTION()
> > > > > suffices to my taste.
> > > > >
> > > > > So, perhaps the tools aspect can just go in:
> > > > >
> > > > > tools/testing/selftests/module/
> > > >
> > > > but why would it be module-specific?
> > > 
> > > Gotta start somewhere.
> > > 
> > > > Based on its current implementation
> > > > and discussion about inject.py it seems to be generic enough to be
> > > > useful to test any function annotated with ALLOW_ERROR_INJECTION().
> > > >
> > > > As xe driver maintainer, it may be interesting to use such a tool:
> > > >
> > > > 	$ git grep ALLOW_ERROR_INJECT -- drivers/gpu/drm/xe | wc -l  	23
> > > >
> > > > How does this approach compare to writing the function name on debugfs
> > > > (the current approach in xe's testsuite)?
> > > >
> > > > 	fail_function @ https://docs.kernel.org/fault-injection/fault-injection.html#fault-injection-capabilities-infrastructure
> > > > 	https://gitlab.freedesktop.org/drm/igt-gpu-tools/-/blob/master/tests/intel/xe_fault_injection.c?ref_type=heads#L108
> > > >
> > > > If you decide to have the tool to live somewhere else, then kmod repo
> > > > could be a candidate.
> > > 
> > > Would we install this upon install target?
> > > 
> > > Danny can decide on this :)
> > > 
> > > > Although I think having it in kernel tree is
> > > > simpler maintenance-wise.
> > > 
> > > I think we have at least two users upstream who can make use of it. If
> > > we end up going through tools/testing/selftests/module/ first, can't
> > > you make use of it later?
> > 
> > What are the features in debugfs required to be useful for xe that we can
> > port to an eBPF version? I see from the link provided the use of probability,
> > interval, times and space but these are configured to allways trigger the error.
> > Is that right?
> 
> I don't think we use them... we just set them to "always trigger" and
> then create the conditions for that to happen.  But my question still
> remains:  what is the benefit of using the bpf approach over writing
> these files?

The code to trigger the error injection still needs to exist with both
approaches. My understanding from debugfs and the comment from Luis earlier in
the thread is that with eBPF you can mitigate how intrusive in-kernel code
changes are to allow error injection. Luis added the example here [1] for
debugfs.

[1] https://lore.kernel.org/all/20210512064629.13899-9-mcgrof@kernel.org/

To compare patch 8 in [1] with eBPF approach: the patch describes
all the necessary changes required to allow error injection on the
add_disk() path. With eBPF one would simply annotate the function(s) with
ALLOW_ERROR_INJECTION(), e.g. device_add() and replace the return value
in eBPF code with bpf_override_return() as implemented in moderr tool for
module_enable_rdata_after_init() for example.

New error injection users such modules or block/disk would benefit of the eBPF
approach with having a more simpler in-kernel code for the same purpose. Current
users of debugfs/errorinj would have to remove all the upstream intrusive error
injection related code if they want to adopt the eBPF approach.

Daniel

> 
> Lucas De Marchi
> 
> > 
> > > 
> > >   Luis