linux-kernel - Re: [PATCH 2/2] moderr: add module error injection tool

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <vo4aommlbguz7kll5svvgbodykxx7chywhf2wz5bbz4mibvver@cxrceuzt3dzw>
Date: Sat, 22 Feb 2025 22:29:22 +0100
From: Daniel Gomez <da.gomez@...nel.org>
To: Lucas De Marchi <lucas.demarchi@...el.com>
Cc: Luis Chamberlain <mcgrof@...nel.org>, 
	Alexei Starovoitov <alexei.starovoitov@...il.com>, Daniel Gomez <da.gomez@...sung.com>, 
	Petr Pavlu <petr.pavlu@...e.com>, Sami Tolvanen <samitolvanen@...gle.com>, 
	Alexei Starovoitov <ast@...nel.org>, Daniel Borkmann <daniel@...earbox.net>, 
	Andrii Nakryiko <andrii@...nel.org>, Martin KaFai Lau <martin.lau@...ux.dev>, 
	Eduard Zingerman <eddyz87@...il.com>, Song Liu <song@...nel.org>, 
	Yonghong Song <yonghong.song@...ux.dev>, John Fastabend <john.fastabend@...il.com>, 
	KP Singh <kpsingh@...nel.org>, Stanislav Fomichev <sdf@...ichev.me>, 
	Hao Luo <haoluo@...gle.com>, Jiri Olsa <jolsa@...nel.org>, 
	Nathan Chancellor <nathan@...nel.org>, Nick Desaulniers <ndesaulniers@...gle.com>, 
	Bill Wendling <morbo@...gle.com>, Justin Stitt <justinstitt@...gle.com>, 
	linux-modules@...r.kernel.org, LKML <linux-kernel@...r.kernel.org>, bpf <bpf@...r.kernel.org>, 
	clang-built-linux <llvm@...ts.linux.dev>, iovisor-dev <iovisor-dev@...ts.iovisor.org>, 
	gost.dev@...sung.com
Subject: Re: [PATCH 2/2] moderr: add module error injection tool

On Wed, Feb 19, 2025 at 02:17:48PM +0100, Lucas De Marchi wrote:
> On Tue, Jan 28, 2025 at 12:57:05PM -0800, Luis Chamberlain wrote:
> > On Wed, Jan 22, 2025 at 09:02:19AM -0800, Alexei Starovoitov wrote:
> > > On Wed, Jan 22, 2025 at 5:12 AM Daniel Gomez <da.gomez@...sung.com> wrote:
> > > >
> > > > Add support for a module error injection tool. The tool
> > > > can inject errors in the annotated module kernel functions
> > > > such as complete_formation(), do_init_module() and
> > > > module_enable_rodata_after_init(). Module name and module function are
> > > > required parameters to have control over the error injection.
> > > >
> > > > Example: Inject error -22 to module_enable_rodata_ro_after_init for
> > > > brd module:
> > > >
> > > > sudo moderr --modname=brd --modfunc=module_enable_rodata_ro_after_init \
> > > > --error=-22 --trace
> > > > Monitoring module error injection... Hit Ctrl-C to end.
> > > > MODULE     ERROR FUNCTION
> > > > brd        -22   module_enable_rodata_after_init()
> > > >
> > > > Kernel messages:
> > > > [   89.463690] brd: module loaded
> > > > [   89.463855] brd: module_enable_rodata_ro_after_init() returned -22,
> > > > ro_after_init data might still be writable
> > > >
> > > > Signed-off-by: Daniel Gomez <da.gomez@...sung.com>
> > > > ---
> > > >  tools/bpf/Makefile            |  13 ++-
> > > >  tools/bpf/moderr/.gitignore   |   2 +
> > > >  tools/bpf/moderr/Makefile     |  95 +++++++++++++++++
> > > >  tools/bpf/moderr/moderr.bpf.c | 127 +++++++++++++++++++++++
> > > >  tools/bpf/moderr/moderr.c     | 236 ++++++++++++++++++++++++++++++++++++++++++
> > > >  tools/bpf/moderr/moderr.h     |  40 +++++++
> > > >  6 files changed, 510 insertions(+), 3 deletions(-)
> > > 
> > > The tool looks useful, but we don't add tools to the kernel repo.
> > > It has to stay out of tree.
> > 
> > For selftests we do add random tools.
> > 
> > > The value of error injection is not clear to me.
> > 
> > It is of great value, since it deals with corner cases which are
> > otherwise hard to reproduce in places which a real error can be
> > catostrophic.
> > 
> > > Other places in the kernel use it to test paths in the kernel
> > > that are difficult to do otherwise.
> > 
> > Right.
> > 
> > > These 3 functions don't seem to be in this category.
> > 
> > That's the key here we should focus on. The problem is when a maintainer
> > *does* agree that adding an error injection entry is useful for testing,
> > and we have a developer willing to do the work to help test / validate
> > it. In this case, this error case is rare but we do want to strive to
> > test this as we ramp up and extend our modules selftests.
> > 
> > Then there is the aspect of how to mitigate how instrusive code changes
> > to allow error injection are. In 2021 we evaluated the prospect of error
> > injection in-kernel long ago for other areas like the block layer for
> > add_disk() failures [0] but the minimal interface to enable this from
> > userspace with debugfs was considered just too intrusive.
> > 
> > This effort tried to evaluate what this could look like with eBPF to
> > mitigate the required in-kernel code, and I believe the light weight
> > nature of it by just requiring a sprinkle with ALLOW_ERROR_INJECTION()
> > suffices to my taste.
> > 
> > So, perhaps the tools aspect can just go in:
> > 
> > tools/testing/selftests/module/
> 
> but why would it be module-specific? Based on its current implementation
> and discussion about inject.py it seems to be generic enough to be
> useful to test any function annotated with ALLOW_ERROR_INJECTION().

Right, but inject.py is based on the old/deprecated Python eBPF/BCC
infrastructure (although I think it's still a working and maintained tool based
on commit history). However, as I noted in the cover letter, I think it would be
useful to port it to use libbpf instead.

> 
> As xe driver maintainer, it may be interesting to use such a tool:
> 
> 	$ git grep ALLOW_ERROR_INJECT -- drivers/gpu/drm/xe | wc -l  	23

I was wondering if users of ALLOW_ERROR_INJECTION() still depend on inject.py
tool, or other tools were used. Or perhaps all are using debugfs?

> 
> How does this approach compare to writing the function name on debugfs
> (the current approach in xe's testsuite)?
> 
> 	fail_function @ https://docs.kernel.org/fault-injection/fault-injection.html#fault-injection-capabilities-infrastructure
> 	https://gitlab.freedesktop.org/drm/igt-gpu-tools/-/blob/master/tests/intel/xe_fault_injection.c?ref_type=heads#L108

IMHO and looking at the references above, looks "simpler" to handle these cases
in eBPF. Specially because the error injection logic does not have to live along
with the code but in a separate tool. Link from Luis [0] (also [1]) is a good
example of when debugfs may be seen a bit too intrusive and how eBPF may resolve
the problems maintainers have.

[1] https://lore.kernel.org/all/20210512064629.13899-1-mcgrof@kernel.org/

In summary, a function annotated with the error injection tag can modify
its return value in eBPF code by using the bpf_override_return() helper. The
logic for deciding when to inject the error is likely similar to the current
implementation in debugfs. This can be seen in patch 2, file tools/bpf/
moderr/moderr.bpf.c where error is injected based on module name and target
function.

> 
> If you decide to have the tool to live somewhere else, then kmod repo
> could be a candidate. Although I think having it in kernel tree is

Thanks Lucas.

> simpler maintenance-wise.

I agree with you that having the tool (or maybe tools if we decide to split) in
tree it may be easier and may be a way to showcase its usage more effectively.

> 
> Lucas De Marchi
> 
> > 
> > [0] https://www.spinics.net/lists/linux-block/msg68159.html
> > 
> >  Luis