[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <Y2w9bNhVlAs/PcNV@krava>
Date: Thu, 10 Nov 2022 00:53:16 +0100
From: Jiri Olsa <olsajiri@...il.com>
To: Alexei Starovoitov <alexei.starovoitov@...il.com>
Cc: Jiri Olsa <olsajiri@...il.com>, Hao Sun <sunhao.th@...il.com>,
Alexei Starovoitov <ast@...nel.org>,
Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
Andrii Nakryiko <andrii@...nel.org>, bpf <bpf@...r.kernel.org>,
Daniel Borkmann <daniel@...earbox.net>,
Hao Luo <haoluo@...gle.com>,
John Fastabend <john.fastabend@...il.com>,
KP Singh <kpsingh@...nel.org>,
Martin KaFai Lau <martin.lau@...ux.dev>,
Stanislav Fomichev <sdf@...gle.com>,
Song Liu <song@...nel.org>, Yonghong Song <yhs@...com>
Subject: Re: WARNING in bpf_bprintf_prepare
On Wed, Nov 09, 2022 at 11:41:54AM -0800, Alexei Starovoitov wrote:
> On Wed, Nov 9, 2022 at 5:49 AM Jiri Olsa <olsajiri@...il.com> wrote:
> >
> > On Mon, Nov 07, 2022 at 12:49:01PM -0800, Alexei Starovoitov wrote:
> > > On Mon, Nov 7, 2022 at 4:31 AM Jiri Olsa <olsajiri@...il.com> wrote:
> > > >
> > > > On Wed, Nov 02, 2022 at 03:28:47PM +0100, Jiri Olsa wrote:
> > > > > On Thu, Oct 27, 2022 at 07:45:16PM +0800, Hao Sun wrote:
> > > > > > Jiri Olsa <olsajiri@...il.com> 于2022年10月27日周四 19:24写道:
> > > > > > >
> > > > > > > On Thu, Oct 27, 2022 at 10:27:28AM +0800, Hao Sun wrote:
> > > > > > > > Hi,
> > > > > > > >
> > > > > > > > The following warning can be triggered with the C reproducer in the link.
> > > > > > > > Syzbot also reported this several days ago, Jiri posted a patch that
> > > > > > > > uses bpf prog `active` field to fix this by 05b24ff9b2cfab (bpf:
> > > > > > > > Prevent bpf program recursion...) according to syzbot dashboard
> > > > > > > > (https://syzkaller.appspot.com/bug?id=179313fb375161d50a98311a28b8e2fc5f7350f9).
> > > > > > > > But this warning can still be triggered on 247f34f7b803
> > > > > > > > (Linux-v6.1-rc2) that already merged the patch, so it seems that this
> > > > > > > > still is an issue.
> > > > > > > >
> > > > > > > > HEAD commit: 247f34f7b803 Linux 6.1-rc2
> > > > > > > > git tree: upstream
> > > > > > > > console output: https://pastebin.com/raw/kNw8JCu5
> > > > > > > > kernel config: https://pastebin.com/raw/sE5QK5HL
> > > > > > > > C reproducer: https://pastebin.com/raw/X96ASi27
> > > > > > >
> > > > > > > hi,
> > > > > > > right, that fix addressed that issue for single bpf program,
> > > > > > > and it won't prevent if there are multiple programs hook on
> > > > > > > contention_begin tracepoint and calling bpf_trace_printk,
> > > > > > >
> > > > > > > I'm not sure we can do something there.. will check
> > > > > > >
> > > > > > > do you run just the reproducer, or you load the server somehow?
> > > > > > > I cannot hit the issue so far
> > > > > > >
> > > > > >
> > > > > > Hi,
> > > > > >
> > > > > > Last email has format issues, resend it here.
> > > > > >
> > > > > > I built the kernel with the config in the link, which contains
> > > > > > “CONFIG_CMDLINE="earlyprintk=serial net.ifnames=0
> > > > > > sysctl.kernel.hung_task_all_cpu_backtrace=1 panic_on_warn=1 …”, and
> > > > > > boot the kernel with normal qemu setup and then the warning can be
> > > > > > triggered by executing the reproducer.
> > > > > >
> > > > > > Also, I’m willing to test the proposed patch if any.
> > > > >
> > > > > fyi I reproduced that.. will check if we can do anything about that
> > > >
> > > > I reproduced this with set of 8 programs all hooked to contention_begin
> > > > tracepoint and here's what I think is happening:
> > > >
> > > > all programs (prog1 .. prog8) call just bpf_trace_printk helper and I'm
> > > > running 'perf bench sched messaging' to load the machine
> > > >
> > > > at some point some contended lock triggers trace_contention_begin:
> > > >
> > > > trace_contention_begin
> > > > __traceiter_contention_begin <-- iterates all functions attached to tracepoint
> > > > __bpf_trace_run(prog1)
> > > > prog1->active = 1
> > > > bpf_prog_run(prog1)
> > > > bpf_trace_printk
> > > > bpf_bprintf_prepare <-- takes buffer 1 out of 3
> > > > raw_spin_lock_irqsave(trace_printk_lock)
> > > >
> > > > # we have global single trace_printk_lock, so we will trigger
> > > > # its trace_contention_begin at some point
> > > >
> > > > trace_contention_begin
> > > > __traceiter_contention_begin
> > > > __bpf_trace_run(prog1)
> > > > prog1->active block <-- prog1 is already 'running', skipping the execution
> > > > __bpf_trace_run(prog2)
> > > > prog2->active = 1
> > > > bpf_prog_run(prog2)
> > > > bpf_trace_printk
> > > > bpf_bprintf_prepare <-- takes buffer 2 out of 3
> > > > raw_spin_lock_irqsave(trace_printk_lock)
> > > > trace_contention_begin
> > > > __traceiter_contention_begin
> > > > __bpf_trace_run(prog1)
> > > > prog1->active block <-- prog1 is already 'running', skipping the execution
> > > > __bpf_trace_run(prog2)
> > > > prog2->active block <-- prog2 is already 'running', skipping the execution
> > > > __bpf_trace_run(prog3)
> > > > prog3->active = 1
> > > > bpf_prog_run(prog3)
> > > > bpf_trace_printk
> > > > bpf_bprintf_prepare <-- takes buffer 3 out of 3
> > > > raw_spin_lock_irqsave(trace_printk_lock)
> > > > trace_contention_begin
> > > > __traceiter_contention_begin
> > > > __bpf_trace_run(prog1)
> > > > prog1->active block <-- prog1 is already 'running', skipping the execution
> > > > __bpf_trace_run(prog2)
> > > > prog2->active block <-- prog2 is already 'running', skipping the execution
> > > > __bpf_trace_run(prog3)
> > > > prog3->active block <-- prog3 is already 'running', skipping the execution
> > > > __bpf_trace_run(prog4)
> > > > prog4->active = 1
> > > > bpf_prog_run(prog4)
> > > > bpf_trace_printk
> > > > bpf_bprintf_prepare <-- tries to take buffer 4 out of 3 -> WARNING
> > > >
> > > >
> > > > the code path may vary based on the contention of the trace_printk_lock,
> > > > so I saw different nesting within 8 programs, but all eventually ended up
> > > > at 4 levels of nesting and hit the warning
> > > >
> > > > I think we could perhaps move the 'active' flag protection from program
> > > > to the tracepoint level (in the patch below), to prevent nesting execution
> > > > of the same tracepoint, so it'd look like:
> > > >
> > > > trace_contention_begin
> > > > __traceiter_contention_begin
> > > > __bpf_trace_run(prog1) {
> > > > contention_begin.active = 1
> > > > bpf_prog_run(prog1)
> > > > bpf_trace_printk
> > > > bpf_bprintf_prepare
> > > > raw_spin_lock_irqsave(trace_printk_lock)
> > > > trace_contention_begin
> > > > __traceiter_contention_begin
> > > > __bpf_trace_run(prog1)
> > > > blocked because contention_begin.active == 1
> > > > __bpf_trace_run(prog2)
> > > > blocked because contention_begin.active == 1
> > > > __bpf_trace_run(prog3)
> > > > ...
> > > > __bpf_trace_run(prog8)
> > > > blocked because contention_begin.active == 1
> > > >
> > > > raw_spin_unlock_irqrestore
> > > > bpf_bprintf_cleanup
> > > >
> > > > contention_begin.active = 0
> > > > }
> > > >
> > > > __bpf_trace_run(prog2) {
> > > > contention_begin.active = 1
> > > > bpf_prog_run(prog2)
> > > > ...
> > > > contention_begin.active = 0
> > > > }
> > > >
> > > > do we need bpf program execution in nested tracepoints?
> > > > we could actually allow 3 nesting levels for this case.. thoughts?
> > > >
> > > > thanks,
> > > > jirka
> > > >
> > > >
> > > > ---
> > > > diff --git a/include/trace/bpf_probe.h b/include/trace/bpf_probe.h
> > > > index 6a13220d2d27..5a354ae096e5 100644
> > > > --- a/include/trace/bpf_probe.h
> > > > +++ b/include/trace/bpf_probe.h
> > > > @@ -78,11 +78,15 @@
> > > > #define CAST_TO_U64(...) CONCATENATE(__CAST, COUNT_ARGS(__VA_ARGS__))(__VA_ARGS__)
> > > >
> > > > #define __BPF_DECLARE_TRACE(call, proto, args) \
> > > > +static DEFINE_PER_CPU(int, __bpf_trace_tp_active_##call); \
> > > > static notrace void \
> > > > __bpf_trace_##call(void *__data, proto) \
> > > > { \
> > > > struct bpf_prog *prog = __data; \
> > > > - CONCATENATE(bpf_trace_run, COUNT_ARGS(args))(prog, CAST_TO_U64(args)); \
> > > > + \
> > > > + if (likely(this_cpu_inc_return(__bpf_trace_tp_active_##call) == 1)) \
> > > > + CONCATENATE(bpf_trace_run, COUNT_ARGS(args))(prog, CAST_TO_U64(args)); \
> > > > + this_cpu_dec(__bpf_trace_tp_active_##call); \
> > > > }
> > >
> > > This approach will hurt real use cases where
> > > multiple and different raw_tp progs run on the same cpu.
> >
> > would the 2 levels of nesting help in here?
> >
> > I can imagine the change above would break use case where we want to
> > trigger tracepoints in irq context that interrupted task that's already
> > in the same tracepoint
> >
> > with 2 levels of nesting we would trigger that tracepoint from irq and
> > would still be safe with bpf_bprintf_prepare buffer
>
> How would these 2 levels work?
just using the active counter like below, but I haven't tested it yet
jirka
---
diff --git a/include/trace/bpf_probe.h b/include/trace/bpf_probe.h
index 6a13220d2d27..ca5dd34478b7 100644
--- a/include/trace/bpf_probe.h
+++ b/include/trace/bpf_probe.h
@@ -78,11 +78,15 @@
#define CAST_TO_U64(...) CONCATENATE(__CAST, COUNT_ARGS(__VA_ARGS__))(__VA_ARGS__)
#define __BPF_DECLARE_TRACE(call, proto, args) \
+static DEFINE_PER_CPU(int, __bpf_trace_tp_active_##call); \
static notrace void \
__bpf_trace_##call(void *__data, proto) \
{ \
struct bpf_prog *prog = __data; \
- CONCATENATE(bpf_trace_run, COUNT_ARGS(args))(prog, CAST_TO_U64(args)); \
+ \
+ if (likely(this_cpu_inc_return(__bpf_trace_tp_active_##call) < 3)) \
+ CONCATENATE(bpf_trace_run, COUNT_ARGS(args))(prog, CAST_TO_U64(args)); \
+ this_cpu_dec(__bpf_trace_tp_active_##call); \
}
#undef DECLARE_EVENT_CLASS
Powered by blists - more mailing lists