lists.openwall.net mailing list archives
 
Message-ID: <CAEf4Bzb4ywpMxchWcMfW9Lzh=re4x1zbMfz2aPRiUa29nUMB=g@mail.gmail.com>
Date: Thu, 24 Oct 2024 10:12:25 -0700
From: Andrii Nakryiko <andrii.nakryiko@...il.com>
To: Steven Rostedt <rostedt@...dmis.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@...icios.com>, 
	Alexei Starovoitov <alexei.starovoitov@...il.com>, Jordan Rife <jrife@...gle.com>, 
	Arnaldo Carvalho de Melo <acme@...nel.org>, Alexander Shishkin <alexander.shishkin@...ux.intel.com>, 
	Alexei Starovoitov <ast@...nel.org>, bpf <bpf@...r.kernel.org>, 
	Joel Fernandes <joel@...lfernandes.org>, LKML <linux-kernel@...r.kernel.org>, 
	Mark Rutland <mark.rutland@....com>, Masami Hiramatsu <mhiramat@...nel.org>, 
	Ingo Molnar <mingo@...hat.com>, Michael Jeanson <mjeanson@...icios.com>, 
	Namhyung Kim <namhyung@...nel.org>, "Paul E. McKenney" <paulmck@...nel.org>, 
	Peter Zijlstra <peterz@...radead.org>, 
	syzbot+b390c8062d8387b6272a@...kaller.appspotmail.com, 
	Yonghong Song <yhs@...com>
Subject: Re: [RFC PATCH] tracing: Fix syscall tracepoint use-after-free

On Wed, Oct 23, 2024 at 7:05 PM Steven Rostedt <rostedt@...dmis.org> wrote:
>
> On Wed, 23 Oct 2024 11:19:40 -0400
> Mathieu Desnoyers <mathieu.desnoyers@...icios.com> wrote:
> >
> > > Looks like Mathieu patch broke bpf program contract somewhere.
> >
> > My patch series introduced this in the probe:
> >
> > #define __BPF_DECLARE_TRACE_SYSCALL(call, proto, args)                  \
> > static notrace void                                                     \
> > __bpf_trace_##call(void *__data, proto)                                 \
> > {                                                                       \
> >          might_fault();                                                  \
> >          preempt_disable_notrace();                                      \
>
> Is the problem that we can call this function *after* the prog has been
> freed? That is, the preempt_disable_notrace() here is meaningless.
>

Yes, I think so.

> Is there a way to add something here to make sure the program is still
> valid? Like set a flag in the link structure?

So I think a big assumption right now is that waiting for an RCU grace
period is enough to make sure that the BPF link (and thus its
underlying BPF program) is no longer referenced from that tracepoint,
and so we can proceed with freeing the memory.

Now that some tracepoints are sleepable and are RCU Tasks Trace
protected, this assumption is wrong.

One solution might be to teach the BPF raw tracepoint link to
recognize sleepable tracepoints, and then go through a
call_rcu_tasks_trace -> call_rcu chain instead of a plain call_rcu.
Similarly, for such cases we'd need to do the same chain for the
underlying BPF program, even if the BPF program itself is not
sleepable.

Alternatively, we'd need to add synchronize_rcu() +
synchronize_rcu_tasks_trace() somewhere inside
tracepoint_probe_unregister() or bpf_probe_unregister() (which is just
a thin wrapper around the former), but that would make detaching
multiple tracepoints extremely slow (just like the problem we had with
kprobe detachment before multi-kprobes were added).
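For reference, the synchronous variant would put the waits inside
bpf_probe_unregister() itself. A rough, non-compilable sketch —
tracepoint_is_faultable() and the exact unregister signature/arguments
here are assumptions for illustration, not the current kernel API:

```c
/* Hedged sketch only: tracepoint_is_faultable() and the argument list
 * of tracepoint_probe_unregister() are assumed, not taken from the tree. */
int bpf_probe_unregister(struct bpf_raw_event_map *btp,
			 struct bpf_raw_tp_link *raw_tp)
{
	int err;

	err = tracepoint_probe_unregister(btp->tp,
					  (void *)btp->bpf_func, raw_tp);
	if (err)
		return err;

	if (tracepoint_is_faultable(btp->tp))	/* assumed helper */
		synchronize_rcu_tasks_trace();	/* wait out sleepable probes */
	synchronize_rcu();			/* wait out non-sleepable probes */
	return 0;
}
```

Every detach then pays one (or two) full grace-period waits, which is
exactly where the slowness for bulk detachment comes from.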

The fundamental reason for this is how we do lifetime tracking between
tracepoint object and bpf_link/bpf_program. tracepoint doesn't hold a
refcount on bpf_link. It's rather that when bpf_link's last refcount
drops to zero, we go and do this:

static void bpf_raw_tp_link_release(struct bpf_link *link)
{
        struct bpf_raw_tp_link *raw_tp =
                container_of(link, struct bpf_raw_tp_link, link);

        bpf_probe_unregister(raw_tp->btp, raw_tp);
        bpf_put_raw_tracepoint(raw_tp->btp);
}

And the assumption is that after bpf_probe_unregister() it should be
fine to free the BPF link and program after call_rcu(). So we either
make bpf_probe_unregister() synchronously wait via
synchronize_rcu[_tasks_trace](), or we make sure not to free the
link/prog until the call_rcu_tasks_trace -> call_rcu chain completes.

I'd prefer the latter (add call_rcu_tasks_trace for the sleepable BPF
raw tracepoint link).

You guys said you have a reproducer, right? Can you please share the
details (I know they're somewhere on another thread, but let's put all
of this in this thread)? And, as Alexei asked, where are the applied
patches adding faultable tracepoints?

>
> (I don't know how BPF works well enough to know what is involved here,
> so excuse me if this is totally off)

It's not off, but I don't think we want the tracepoint to hold a
refcount on the bpf_link (and thus the bpf_program), because that
would invalidate how we do detachment.

>
> -- Steve
>
>
> >          CONCATENATE(bpf_trace_run, COUNT_ARGS(args))(__data, CAST_TO_U64(args));        \
> >          preempt_enable_notrace();                                       \
> > }
> >
