linux-kernel - Re: [PATCH 2/2] moderr: add module error injection tool

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <qnfhjhyqlagmrmk3dwfb2ay37ihi6dlkzs67bzxpu7izz6wqc5@aiohaxlgzx5r>
Date: Wed, 19 Feb 2025 14:17:48 -0600
From: Lucas De Marchi <lucas.demarchi@...el.com>
To: Luis Chamberlain <mcgrof@...nel.org>
CC: Alexei Starovoitov <alexei.starovoitov@...il.com>, Daniel Gomez
	<da.gomez@...sung.com>, Petr Pavlu <petr.pavlu@...e.com>, Sami Tolvanen
	<samitolvanen@...gle.com>, Alexei Starovoitov <ast@...nel.org>, "Daniel
 Borkmann" <daniel@...earbox.net>, Andrii Nakryiko <andrii@...nel.org>,
	"Martin KaFai Lau" <martin.lau@...ux.dev>, Eduard Zingerman
	<eddyz87@...il.com>, "Song Liu" <song@...nel.org>, Yonghong Song
	<yonghong.song@...ux.dev>, "John Fastabend" <john.fastabend@...il.com>, KP
 Singh <kpsingh@...nel.org>, Stanislav Fomichev <sdf@...ichev.me>, Hao Luo
	<haoluo@...gle.com>, Jiri Olsa <jolsa@...nel.org>, Nathan Chancellor
	<nathan@...nel.org>, Nick Desaulniers <ndesaulniers@...gle.com>, Bill
 Wendling <morbo@...gle.com>, Justin Stitt <justinstitt@...gle.com>,
	<linux-modules@...r.kernel.org>, LKML <linux-kernel@...r.kernel.org>, bpf
	<bpf@...r.kernel.org>, clang-built-linux <llvm@...ts.linux.dev>, iovisor-dev
	<iovisor-dev@...ts.iovisor.org>, <gost.dev@...sung.com>
Subject: Re: [PATCH 2/2] moderr: add module error injection tool

On Tue, Jan 28, 2025 at 12:57:05PM -0800, Luis Chamberlain wrote:
>On Wed, Jan 22, 2025 at 09:02:19AM -0800, Alexei Starovoitov wrote:
>> On Wed, Jan 22, 2025 at 5:12 AM Daniel Gomez <da.gomez@...sung.com> wrote:
>> >
>> > Add support for a module error injection tool. The tool
>> > can inject errors in the annotated module kernel functions
>> > such as complete_formation(), do_init_module() and
>> > module_enable_rodata_after_init(). Module name and module function are
>> > required parameters to have control over the error injection.
>> >
>> > Example: Inject error -22 to module_enable_rodata_ro_after_init for
>> > brd module:
>> >
>> > sudo moderr --modname=brd --modfunc=module_enable_rodata_ro_after_init \
>> > --error=-22 --trace
>> > Monitoring module error injection... Hit Ctrl-C to end.
>> > MODULE     ERROR FUNCTION
>> > brd        -22   module_enable_rodata_after_init()
>> >
>> > Kernel messages:
>> > [   89.463690] brd: module loaded
>> > [   89.463855] brd: module_enable_rodata_ro_after_init() returned -22,
>> > ro_after_init data might still be writable
>> >
>> > Signed-off-by: Daniel Gomez <da.gomez@...sung.com>
>> > ---
>> >  tools/bpf/Makefile            |  13 ++-
>> >  tools/bpf/moderr/.gitignore   |   2 +
>> >  tools/bpf/moderr/Makefile     |  95 +++++++++++++++++
>> >  tools/bpf/moderr/moderr.bpf.c | 127 +++++++++++++++++++++++
>> >  tools/bpf/moderr/moderr.c     | 236 ++++++++++++++++++++++++++++++++++++++++++
>> >  tools/bpf/moderr/moderr.h     |  40 +++++++
>> >  6 files changed, 510 insertions(+), 3 deletions(-)
>>
>> The tool looks useful, but we don't add tools to the kernel repo.
>> It has to stay out of tree.
>
>For selftests we do add random tools.
>
>> The value of error injection is not clear to me.
>
>It is of great value, since it deals with corner cases which are
>otherwise hard to reproduce in places which a real error can be
>catostrophic.
>
>> Other places in the kernel use it to test paths in the kernel
>> that are difficult to do otherwise.
>
>Right.
>
>> These 3 functions don't seem to be in this category.
>
>That's the key here we should focus on. The problem is when a maintainer
>*does* agree that adding an error injection entry is useful for testing,
>and we have a developer willing to do the work to help test / validate
>it. In this case, this error case is rare but we do want to strive to
>test this as we ramp up and extend our modules selftests.
>
>Then there is the aspect of how to mitigate how instrusive code changes
>to allow error injection are. In 2021 we evaluated the prospect of error
>injection in-kernel long ago for other areas like the block layer for
>add_disk() failures [0] but the minimal interface to enable this from
>userspace with debugfs was considered just too intrusive.
>
>This effort tried to evaluate what this could look like with eBPF to
>mitigate the required in-kernel code, and I believe the light weight
>nature of it by just requiring a sprinkle with ALLOW_ERROR_INJECTION()
>suffices to my taste.
>
>So, perhaps the tools aspect can just go in:
>
>tools/testing/selftests/module/

but why would it be module-specific? Based on its current implementation
and discussion about inject.py it seems to be generic enough to be
useful to test any function annotated with ALLOW_ERROR_INJECTION().

As xe driver maintainer, it may be interesting to use such a tool:

	$ git grep ALLOW_ERROR_INJECT -- drivers/gpu/drm/xe | wc -l  
	23

How does this approach compare to writing the function name on debugfs
(the current approach in xe's testsuite)?

	fail_function @ https://docs.kernel.org/fault-injection/fault-injection.html#fault-injection-capabilities-infrastructure
	https://gitlab.freedesktop.org/drm/igt-gpu-tools/-/blob/master/tests/intel/xe_fault_injection.c?ref_type=heads#L108

If you decide to have the tool to live somewhere else, then kmod repo
could be a candidate. Although I think having it in kernel tree is
simpler maintenance-wise.

Lucas De Marchi

>
>[0] https://www.spinics.net/lists/linux-block/msg68159.html
>
>  Luis