[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAEf4BzbzXT6e-dKtxr6SDzekXC+Zu45uX10dL+DuTA8Xn=cgjw@mail.gmail.com>
Date: Mon, 9 Dec 2024 14:45:41 -0800
From: Andrii Nakryiko <andrii.nakryiko@...il.com>
To: Jann Horn <jannh@...gle.com>, Alexei Starovoitov <ast@...nel.org>,
Daniel Borkmann <daniel@...earbox.net>
Cc: John Fastabend <john.fastabend@...il.com>, Andrii Nakryiko <andrii@...nel.org>,
Martin KaFai Lau <martin.lau@...ux.dev>, Eduard Zingerman <eddyz87@...il.com>, Song Liu <song@...nel.org>,
Yonghong Song <yonghong.song@...ux.dev>, KP Singh <kpsingh@...nel.org>,
Stanislav Fomichev <sdf@...ichev.me>, Hao Luo <haoluo@...gle.com>, Jiri Olsa <jolsa@...nel.org>,
Steven Rostedt <rostedt@...dmis.org>, Masami Hiramatsu <mhiramat@...nel.org>,
Mathieu Desnoyers <mathieu.desnoyers@...icios.com>, Delyan Kratunov <delyank@...com>, bpf@...r.kernel.org,
linux-kernel@...r.kernel.org, linux-trace-kernel@...r.kernel.org,
stable@...r.kernel.org
Subject: Re: [PATCH bpf v2] bpf: Fix prog_array UAF in __uprobe_perf_func()
On Mon, Dec 9, 2024 at 2:14 PM Andrii Nakryiko
<andrii.nakryiko@...il.com> wrote:
>
> On Mon, Dec 9, 2024 at 10:22 AM Jann Horn <jannh@...gle.com> wrote:
> >
> > On Sat, Dec 7, 2024 at 12:15 AM Andrii Nakryiko
> > <andrii.nakryiko@...il.com> wrote:
> > > On Fri, Dec 6, 2024 at 3:14 PM Jann Horn <jannh@...gle.com> wrote:
> > > > On Fri, Dec 6, 2024 at 11:43 PM Jann Horn <jannh@...gle.com> wrote:
> > > > > On Fri, Dec 6, 2024 at 11:30 PM Andrii Nakryiko
> > > > > <andrii.nakryiko@...il.com> wrote:
> > > > > > On Fri, Dec 6, 2024 at 2:25 PM Jann Horn <jannh@...gle.com> wrote:
> > > > > > >
> > > > > > > On Fri, Dec 6, 2024 at 11:15 PM Andrii Nakryiko
> > > > > > > <andrii.nakryiko@...il.com> wrote:
> > > > > > > > On Fri, Dec 6, 2024 at 12:45 PM Jann Horn <jannh@...gle.com> wrote:
> > > > > > > > >
> > > > > > > > > Currently, the pointer stored in call->prog_array is loaded in
> > > > > > > > > __uprobe_perf_func(), with no RCU annotation and no RCU protection, so the
> > > > > > > > > loaded pointer can immediately be dangling. Later,
> > > > > > > > > bpf_prog_run_array_uprobe() starts a RCU-trace read-side critical section,
> > > > > > > > > but this is too late. It then uses rcu_dereference_check(), but this use of
> > > > > > > > > rcu_dereference_check() does not actually dereference anything.
> > > > > > > > >
> > > > > > > > > It looks like the intention was to pass a pointer to the member
> > > > > > > > > call->prog_array into bpf_prog_run_array_uprobe() and actually dereference
> > > > > > > > > the pointer in there. Fix the issue by actually doing that.
> > > > > > > > >
> > > > > > > > > Fixes: 8c7dcb84e3b7 ("bpf: implement sleepable uprobes by chaining gps")
> > > > > > > > > Cc: stable@...r.kernel.org
> > > > > > > > > Signed-off-by: Jann Horn <jannh@...gle.com>
> > > > > > > > > ---
> > > > > > > > > To reproduce, in include/linux/bpf.h, patch in a mdelay(10000) directly
> > > > > > > > > before the might_fault() in bpf_prog_run_array_uprobe() and add an
> > > > > > > > > include of linux/delay.h.
> > > > > > > > >
> > > > > > > > > Build this userspace program:
> > > > > > > > >
> > > > > > > > > ```
> > > > > > > > > $ cat dummy.c
> > > > > > > > > #include <stdio.h>
> > > > > > > > > int main(void) {
> > > > > > > > > printf("hello world\n");
> > > > > > > > > }
> > > > > > > > > $ gcc -o dummy dummy.c
> > > > > > > > > ```
> > > > > > > > >
> > > > > > > > > Then build this BPF program and load it (change the path to point to
> > > > > > > > > the "dummy" binary you built):
> > > > > > > > >
> > > > > > > > > ```
> > > > > > > > > $ cat bpf-uprobe-kern.c
> > > > > > > > > #include <linux/bpf.h>
> > > > > > > > > #include <bpf/bpf_helpers.h>
> > > > > > > > > #include <bpf/bpf_tracing.h>
> > > > > > > > > char _license[] SEC("license") = "GPL";
> > > > > > > > >
> > > > > > > > > SEC("uprobe//home/user/bpf-uprobe-uaf/dummy:main")
> > > > > > > > > int BPF_UPROBE(main_uprobe) {
> > > > > > > > > bpf_printk("main uprobe triggered\n");
> > > > > > > > > return 0;
> > > > > > > > > }
> > > > > > > > > $ clang -O2 -g -target bpf -c -o bpf-uprobe-kern.o bpf-uprobe-kern.c
> > > > > > > > > $ sudo bpftool prog loadall bpf-uprobe-kern.o uprobe-test autoattach
> > > > > > > > > ```
> > > > > > > > >
> > > > > > > > > Then run ./dummy in one terminal, and after launching it, run
> > > > > > > > > `sudo umount uprobe-test` in another terminal. Once the 10-second
> > > > > > > > > mdelay() is over, a use-after-free should occur, which may or may
> > > > > > > > > not crash your kernel at the `prog->sleepable` check in
> > > > > > > > > bpf_prog_run_array_uprobe() depending on your luck.
> > > > > > > > > ---
> > > > > > > > > Changes in v2:
> > > > > > > > > - remove diff chunk in patch notes that confuses git
> > > > > > > > > - Link to v1: https://lore.kernel.org/r/20241206-bpf-fix-uprobe-uaf-v1-1-6869c8a17258@google.com
> > > > > > > > > ---
> > > > > > > > > include/linux/bpf.h | 4 ++--
> > > > > > > > > kernel/trace/trace_uprobe.c | 2 +-
> > > > > > > > > 2 files changed, 3 insertions(+), 3 deletions(-)
> > > > > > > > >
> > > > > > > >
> > > > > > > > Looking at how similar in spirit bpf_prog_run_array() is meant to be
> > > > > > > > used, it seems like it is the caller's responsibility to
> > > > > > > > RCU-dereference array and keep RCU critical section before calling
> > > > > > > > into bpf_prog_run_array(). So I wonder if it's best to do this instead
> > > > > > > > (Gmail will butcher the diff, but it's about the idea):
> > > > > > >
> > > > > > > Yeah, that's the other option I was considering. That would be more
> > > > > > > consistent with the existing bpf_prog_run_array(), but has the
> > > > > > > downside of unnecessarily pushing responsibility up to the caller...
> > > > > > > I'm fine with either.
> > > > > >
> > > > > > there is really just one caller ("legacy" singular uprobe handler), so
> > > > > > I think this should be fine. Unless someone objects I'd keep it
> > > > > > consistent with other "prog_array_run" helpers
> > > > >
> > > > > Ack, I will make it consistent.
> > > > >
> > > > > > > > diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> > > > > > > > index eaee2a819f4c..4b8a9edd3727 100644
> > > > > > > > --- a/include/linux/bpf.h
> > > > > > > > +++ b/include/linux/bpf.h
> > > > > > > > @@ -2193,26 +2193,25 @@ bpf_prog_run_array(const struct bpf_prog_array *array,
> > > > > > > > * rcu-protected dynamically sized maps.
> > > > > > > > */
> > > > > > > > static __always_inline u32
> > > > > > > > -bpf_prog_run_array_uprobe(const struct bpf_prog_array __rcu *array_rcu,
> > > > > > > > +bpf_prog_run_array_uprobe(const struct bpf_prog_array *array,
> > > > > > > > const void *ctx, bpf_prog_run_fn run_prog)
> > > > > > > > {
> > > > > > > > const struct bpf_prog_array_item *item;
> > > > > > > > const struct bpf_prog *prog;
> > > > > > > > - const struct bpf_prog_array *array;
> > > > > > > > struct bpf_run_ctx *old_run_ctx;
> > > > > > > > struct bpf_trace_run_ctx run_ctx;
> > > > > > > > u32 ret = 1;
> > > > > > > >
> > > > > > > > might_fault();
> > > > > > > > + RCU_LOCKDEP_WARN(!rcu_read_lock_trace_held(), "no rcu lock held");
> > > > > > > > +
> > > > > > > > + if (unlikely(!array))
> > > > > > > > + goto out;
> > > > > > > >
> > > > > > > > - rcu_read_lock_trace();
> > > > > > > > migrate_disable();
> > > > > > > >
> > > > > > > > run_ctx.is_uprobe = true;
> > > > > > > >
> > > > > > > > - array = rcu_dereference_check(array_rcu, rcu_read_lock_trace_held());
> > > > > > > > - if (unlikely(!array))
> > > > > > > > - goto out;
> > > > > > > > old_run_ctx = bpf_set_run_ctx(&run_ctx.run_ctx);
> > > > > > > > item = &array->items[0];
> > > > > > > > while ((prog = READ_ONCE(item->prog))) {
> > > > > > > > @@ -2229,7 +2228,6 @@ bpf_prog_run_array_uprobe(const struct
> > > > > > > > bpf_prog_array __rcu *array_rcu,
> > > > > > > > bpf_reset_run_ctx(old_run_ctx);
> > > > > > > > out:
> > > > > > > > migrate_enable();
> > > > > > > > - rcu_read_unlock_trace();
> > > > > > > > return ret;
> > > > > > > > }
> > > > > > > >
> > > > > > > > diff --git a/kernel/trace/trace_uprobe.c b/kernel/trace/trace_uprobe.c
> > > > > > > > index fed382b7881b..87a2b8fefa90 100644
> > > > > > > > --- a/kernel/trace/trace_uprobe.c
> > > > > > > > +++ b/kernel/trace/trace_uprobe.c
> > > > > > > > @@ -1404,7 +1404,9 @@ static void __uprobe_perf_func(struct trace_uprobe *tu,
> > > > > > > > if (bpf_prog_array_valid(call)) {
> > > > > > > > u32 ret;
> > > > > > > >
> > > > > > > > + rcu_read_lock_trace();
> > > > > > > > ret = bpf_prog_run_array_uprobe(call->prog_array,
> > > > > > > > regs, bpf_prog_run);
> > > > > > >
> > > > > > > But then this should be something like this (possibly split across
> > > > > > > multiple lines with a helper variable or such):
> > > > > > >
> > > > > > > ret = bpf_prog_run_array_uprobe(rcu_dereference_check(call->prog_array,
> > > > > > > rcu_read_lock_trace_held()), regs, bpf_prog_run);
> > > > > >
> > > > > > Yeah, absolutely, forgot to move the RCU dereference part, good catch!
> > > > > > But I wouldn't do the _check() variant here, literally the previous
> > > > > > line does rcu_read_trace_lock(), so this check part seems like just
> > > > > > unnecessary verboseness, I'd go with a simple rcu_dereference().
> > > > >
> > > > > rcu_dereference() is not legal there - that asserts that we are in a
> > > > > normal RCU read-side critical section, which we are not.
> > > > > rcu_dereference_raw() would be, but I think it is nice to document the
> > > > > semantics to make it explicit under which lock we're operating.
> > >
> > > sure, I don't mind
> > >
> > > > >
> > > > > I'll send a v3 in a bit after testing it.
> > > >
> > > > Actually, now I'm still hitting a page fault with my WIP v3 fix
> > > > applied... I'll probably poke at this some more next week.
> > >
> > > OK, that's interesting, keep us posted!
> >
> > If I replace the "uprobe/" in my reproducer with "uprobe.s/", the
> > reproducer stops crashing even on bpf/master without this fix -
> > because it happens that handle_swbp() is already holding a
> > rcu_read_lock_trace() lock way up the stack. So I think this fix
> > should still be applied, but it probably doesn't need to go into
> > stable unless there is another path to the buggy code that doesn't
> > come from handle_swbp(). I guess I probably should resend my patch
> > with an updated commit message pointing out this caveat?
> >
> > The problem I'm actually hitting seems to be a use-after-free of a
> > "struct bpf_prog" because of mismatching RCU flavors. Uprobes always
> > use bpf_prog_run_array_uprobe() under tasks-trace-RCU protection. But
> > it is possible to attach a non-sleepable BPF program to a uprobe, and
> > non-sleepable BPF programs are freed via normal RCU (see
> > __bpf_prog_put_noref()). And that is what happens with the reproducer
> > from my initial post
> > (https://lore.kernel.org/all/20241206-bpf-fix-uprobe-uaf-v1-1-6869c8a17258@google.com/)
> > - I can see that __bpf_prog_put_noref runs with prog->sleepable==0.
> >
> > So I think that while I am delaying execution in
> > bpf_prog_run_array_uprobe(), perf_event_detach_bpf_prog() NULLs out
> > the event->tp_event->prog_array pointer and does
> > bpf_prog_array_free_sleepable() followed by bpf_prog_put(), and then
> > the BPF program can be freed since the reader doesn't hold an RCU read
> > lock. This seems a bit annoying to fix - there could legitimately be
> > several versions of the bpf_prog_array that are still used by
> > tasks-trace-RCU readers, so I think we can't just NULL out the array
> > entry and use RCU for the bpf_prog_array access on the reader side. I
> > guess we could add another flag on BPF programs that answers "should
> > this program be freed via tasks-trace-RCU" (separately from whether
> > the program is sleepable)?
>
> Yes, we shouldn't NULL anything out.
>
> This is the same issue we've been solving recently for sleepable
> tracepoints, see [0] and other patches in the same patch set. We
> solved it for sleepable (aka "faultable") tracepoints, but uprobes
> have the same problem where the attachment point is sleepable by
> nature (and thus protected by RCU Tasks Trace), but BPF program itself
> is non-sleepable (and thus we only wait for synchronize_rcu() before
> freeing), which causes a disconnect.
>
> We can easily fix this for BPF link-based uprobes, but legacy uprobes
> can be directly attached to perf event, so that's a bit more
> cumbersome. Let me think what should be the best way to handle this.
>
> Meanwhile, I agree, please send your original fix (with changes we
> discussed), it's good to have them, even if they don't fix your
> original issue. I'll CC you on fixes once I have them.
Ok, weeding through the perf/uprobe plumbing for BPF, I think we avoid
this problem with uprobe BPF link because uprobe_unregister_sync()
waits for RCU Tasks Trace GP, and so once we finish uprobe
unregistration, we have a guarantee that there is no more uprobe that
might dereference our BPF program. (I might have thought about this
problem when fixing BPF link for sleepable tracepoints, but I missed
the legacy perf_event attach/detach case).
With legacy perf event perf_event_detach_bpf_prog() we don't do any of
that, we just NULL out pointer and do bpf_prog_put(), not caring
whether this is uprobe, kprobe, or tracepoint...
So one way to solve this is to either teach
perf_event_detach_bpf_prog() to delay bpf_prog_put() until after RCU
Tasks Trace GP (which is what we do with bpf_prog_array, but not the
program itself), or add prog->aux->sleepable_hook flag in addition to
prog->aux->sleepable, and then inside bpf_prog_put() check
(prog->aux->sleepable || prog->aux->sleepable_hook) and do RCU Tasks
Trace delay (in addition to normal call_rcu()).
Third alternative would be to have something like
bpf_prog_put_sleepable() (just like we have
bpf_prog_array_free_sleepable()), where this would do additional
call_rcu_tasks_trace() even if BPF program itself isn't sleepable.
Alexei, Daniel, any thoughts or preferences?
>
> [0] https://lore.kernel.org/all/20241101181754.782341-1-andrii@kernel.org/
Powered by blists - more mailing lists