Message-ID: <CAADnVQKgEViz3gQ2QJzCmnm-ou-r-=_i3yLaW5JoKK9okVcGzA@mail.gmail.com>
Date: Fri, 2 May 2025 19:10:45 -0700
From: Alexei Starovoitov <alexei.starovoitov@...il.com>
To: Sergey Senozhatsky <senozhatsky@...omium.org>
Cc: Alexei Starovoitov <ast@...nel.org>, Daniel Borkmann <daniel@...earbox.net>,
Andrii Nakryiko <andrii@...nel.org>, bpf <bpf@...r.kernel.org>,
LKML <linux-kernel@...r.kernel.org>
Subject: Re: [RFC] BPF fault/jitter-injection framework
On Thu, May 1, 2025 at 9:10 PM Sergey Senozhatsky
<senozhatsky@...omium.org> wrote:
>
> Greetings,
>
> I've been thinking what if we had a BPF jitter/fault injection framework
> for more fine-grained and configurable kernel testing. Current fault
> injection doesn't support function arguments analysis, with BPF we
> can have something like
>
> // of course bpf_schedule_timeout() doesn't exist yet
> call bpf_schedule_timeout(120) in blk_execute_rq(rq) if
> rq->q->disk->major == 8 && rq->q->disk->first_minor == 0
>
> So that would introduce blk request execution timeouts/jitters for a
> particular gendisk only. And so on.
>
> Has this been discussed before? Does this approach even make sense
> or is there a better (another) way to do this?

I think it makes sense.
That was the motivation for us to do:

$ git grep ALLOW_ERROR_INJECTION fs/
fs/btrfs/ctree.c:ALLOW_ERROR_INJECTION(btrfs_cow_block, ERRNO);
fs/btrfs/ctree.c:ALLOW_ERROR_INJECTION(btrfs_search_slot, ERRNO);
fs/btrfs/disk-io.c:ALLOW_ERROR_INJECTION(open_ctree, ERRNO);
fs/btrfs/free-space-cache.c:ALLOW_ERROR_INJECTION(io_ctl_init, ERRNO);
fs/btrfs/relocation.c:ALLOW_ERROR_INJECTION(btrfs_should_cancel_balance, TRUE);
fs/btrfs/tree-checker.c:ALLOW_ERROR_INJECTION(btrfs_check_leaf, ERRNO);
fs/btrfs/tree-checker.c:ALLOW_ERROR_INJECTION(btrfs_check_node, ERRNO);
The one in open_ctree() actually found a few bugs.
It's a success story.
Targeted error injection works better than random fuzzing.

To call schedule_timeout(), the bpf program needs to be sleepable.
The majority of LSM and ALLOW_ERROR_INJECTION hooks are sleepable.
All syscalls are sleepable too.
So most of the infrastructure is already available.
Add a bpf_schedule_timeout() kfunc and ALLOW_ERROR_INJECTION marks
where it matters, and it's good to go.

The kfunc and the error-injection marks are non-binding.
We can remove them if this experiment doesn't work out.
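
For illustration only, the per-disk filter from the quoted proposal could be
modelled in plain userspace C roughly as below. This is a sketch under stated
assumptions, not kernel or BPF code: the *_model structs, struct delay_rule
and injected_delay() are hypothetical names invented for this example, and
the only fields mirrored from the proposal are major and first_minor.

```c
/*
 * Userspace model only. The *_model structs, struct delay_rule and
 * injected_delay() are hypothetical stand-ins, not kernel definitions.
 */
#include <stddef.h>

struct gendisk_model {
	int major;
	int first_minor;
};

struct request_queue_model {
	struct gendisk_model *disk;
};

struct request_model {
	struct request_queue_model *q;
};

/* Which disk to target and how long to stall requests sent to it. */
struct delay_rule {
	int major;
	int first_minor;
	long timeout;
};

/*
 * Return the delay to inject for this request, or 0 for none.
 * Mirrors the condition from the proposal:
 *   rq->q->disk->major == 8 && rq->q->disk->first_minor == 0
 * with the constants taken from the rule instead of being hard-coded.
 */
static long injected_delay(const struct request_model *rq,
			   const struct delay_rule *rule)
{
	/* A request without a queue or disk is never a target. */
	if (!rq || !rq->q || !rq->q->disk || !rule)
		return 0;
	if (rq->q->disk->major != rule->major)
		return 0;
	if (rq->q->disk->first_minor != rule->first_minor)
		return 0;
	return rule->timeout;
}
```

In an actual implementation this check would presumably live in a sleepable
bpf program attached to an ALLOW_ERROR_INJECTION hook, and a non-zero result
would be handed to the proposed bpf_schedule_timeout() kfunc, which does not
exist yet.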