netdev - Re: [BUG] possible deadlock in __schedule (with reproducer available)

Message-ID: <CAEf4BzYHeh_=iHOYL88pXXdHGZuAmQNM0jM+9iPUou+7+YLjjQ@mail.gmail.com>
Date: Tue, 26 Nov 2024 13:15:48 -0800
From: Andrii Nakryiko <andrii.nakryiko@...il.com>
To: Peter Zijlstra <peterz@...radead.org>
Cc: Ruan Bonan <bonan.ruan@...us.edu>, Steven Rostedt <rostedt@...dmis.org>, 
	Alexei Starovoitov <alexei.starovoitov@...il.com>, "mingo@...hat.com" <mingo@...hat.com>, 
	"will@...nel.org" <will@...nel.org>, "longman@...hat.com" <longman@...hat.com>, 
	"boqun.feng@...il.com" <boqun.feng@...il.com>, 
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>, "kpsingh@...nel.org" <kpsingh@...nel.org>, 
	"mattbobrowski@...gle.com" <mattbobrowski@...gle.com>, "ast@...nel.org" <ast@...nel.org>, 
	"daniel@...earbox.net" <daniel@...earbox.net>, "andrii@...nel.org" <andrii@...nel.org>, 
	"martin.lau@...ux.dev" <martin.lau@...ux.dev>, "eddyz87@...il.com" <eddyz87@...il.com>, 
	"song@...nel.org" <song@...nel.org>, "yonghong.song@...ux.dev" <yonghong.song@...ux.dev>, 
	"john.fastabend@...il.com" <john.fastabend@...il.com>, "sdf@...ichev.me" <sdf@...ichev.me>, 
	"haoluo@...gle.com" <haoluo@...gle.com>, "jolsa@...nel.org" <jolsa@...nel.org>, 
	"mhiramat@...nel.org" <mhiramat@...nel.org>, 
	"mathieu.desnoyers@...icios.com" <mathieu.desnoyers@...icios.com>, 
	"bpf@...r.kernel.org" <bpf@...r.kernel.org>, 
	"linux-trace-kernel@...r.kernel.org" <linux-trace-kernel@...r.kernel.org>, 
	"netdev@...r.kernel.org" <netdev@...r.kernel.org>, Fu Yeqi <e1374359@...us.edu>
Subject: Re: [BUG] possible deadlock in __schedule (with reproducer available)

On Mon, Nov 25, 2024 at 1:44 AM Peter Zijlstra <peterz@...radead.org> wrote:
>
> On Mon, Nov 25, 2024 at 05:24:05AM +0000, Ruan Bonan wrote:
>
> > From the discussion, it appears that the root cause might involve
> > specific printk or BPF operations in the given context. To clarify and
> > possibly avoid similar issues in the future, are there guidelines or
> > best practices for writing BPF programs/hooks that interact with
> > tracepoints, especially those related to scheduler events, to prevent
> > such deadlocks?
>
> The general guideline and recommendation for all tracepoints is to be
> wait-free. Typically all tracer code should be.
>
> Now, BPF (users) (ab)uses tracepoints to do all sorts and takes certain
> liberties with them, but it is very much at the discretion of the BPF
> user.

We do assume that tracepoints are just like kprobes and can run in NMI
context. In this case BPF is just a vehicle to trigger a
promised-to-be-wait-free strncpy_from_user_nofault(). That's as far as
BPF's involvement goes. We should stop discussing BPF in this context;
it's misleading.

As Alexei mentioned, this is a problem in the printk code, not in BPF.
I'll just copy-paste the relevant parts of the stack trace to make this
clear:

       console_trylock_spinning kernel/printk/printk.c:1990 [inline]
       vprintk_emit+0x414/0xb90 kernel/printk/printk.c:2406
       _printk+0x7a/0xa0 kernel/printk/printk.c:2432
       fail_dump lib/fault-inject.c:46 [inline]
       should_fail_ex+0x3be/0x570 lib/fault-inject.c:154
       strncpy_from_user+0x36/0x230 lib/strncpy_from_user.c:118
       strncpy_from_user_nofault+0x71/0x140 mm/maccess.c:186
       bpf_probe_read_user_str_common kernel/trace/bpf_trace.c:215 [inline]

>
> Slightly relaxed guideline would perhaps be to consider the context of
> the tracepoint, notably one of: NMI, IRQ, SoftIRQ or Task context -- and
> to not exceed the bounds of the given context.
>
> More specifically, when the tracepoint is inside critical sections of
> any sort (as is the case here) then it very much is on the BPF user to
> not cause inversions.
>
> At this point there really is no substitute for knowing what you're
> doing. Knowledge is key.
>
> In short; tracepoints should be wait-free, if you know what you're doing
> you can perhaps get away with a little more.

From the BPF perspective, tracepoints are wait-free, and we don't allow
any sleepable code to be called (until sleepable tracepoints are properly
supported, which is a separate "blessed" case of tracepoints).
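
To make the "wait-free" guideline above concrete, here is a minimal
userspace sketch in C. It is an analogy only, not kernel or BPF code:
struct trace_buf, trace_buf_init(), trace_hook() and TRACE_SLOTS are
hypothetical names invented for this illustration. The point is that the
hook records an event without taking a lock, spinning, or printing
anything, so it cannot wait on a lock held by whoever invoked it. This
assumes the platform implements the atomic add without a retry loop.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical capacity, kept small for illustration. */
#define TRACE_SLOTS 4u

struct trace_buf {
	atomic_uint next;     /* next slot index to hand out */
	atomic_uint dropped;  /* events discarded because the buffer was full */
	uint64_t slots[TRACE_SLOTS];
};

static void trace_buf_init(struct trace_buf *tb)
{
	assert(tb != NULL);
	atomic_init(&tb->next, 0);
	atomic_init(&tb->dropped, 0);
}

/*
 * Hook body: reserve a slot with a single atomic add and store the event.
 * There is no lock, no retry loop, and no logging on the failure path;
 * a full buffer is handled by counting the drop and returning false.
 * (The index counter is never reset, so this sketch ignores wraparound.)
 */
static bool trace_hook(struct trace_buf *tb, uint64_t event)
{
	unsigned int idx =
		atomic_fetch_add_explicit(&tb->next, 1, memory_order_relaxed);

	if (idx >= TRACE_SLOTS) {
		atomic_fetch_add_explicit(&tb->dropped, 1,
					  memory_order_relaxed);
		return false;
	}
	tb->slots[idx] = event;
	return true;
}
```

The design choice worth noting is the failure path: when something goes
wrong, the hook drops the event silently instead of reporting it through
a path that may take locks, which is the kind of path the printk frames
in the trace above go through.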