[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <CAEf4BzbCT5wmYWF2N8wU-FtujYvmQpb_3zN3v797KScUhxUFEw@mail.gmail.com>
Date: Fri, 25 Oct 2024 13:19:46 -0700
From: Andrii Nakryiko <andrii.nakryiko@...il.com>
To: Jordan Rife <jrife@...gle.com>
Cc: acme@...nel.org, alexander.shishkin@...ux.intel.com,
alexei.starovoitov@...il.com, ast@...nel.org, bpf@...r.kernel.org,
joel@...lfernandes.org, linux-kernel@...r.kernel.org, mark.rutland@....com,
mathieu.desnoyers@...icios.com, mhiramat@...nel.org, mingo@...hat.com,
mjeanson@...icios.com, namhyung@...nel.org, paulmck@...nel.org,
peterz@...radead.org, rostedt@...dmis.org,
syzbot+b390c8062d8387b6272a@...kaller.appspotmail.com, yhs@...com
Subject: Re: [RFC PATCH] tracing: Fix syscall tracepoint use-after-free
On Fri, Oct 25, 2024 at 8:01 AM Jordan Rife <jrife@...gle.com> wrote:
>
> > One solution might be to teach BPF raw tracepoint link to recognize
> > sleepable tracepoints, and then go through cal_rcu_task_trace ->
> > call_rcu chain instead of normal call_rcu. Similarly, for such cases
> > we'd need to do the same chain for underlying BPF program, even if BPF
> > program itself is not sleepable.
>
> I don't suppose that tracepoints could themselves be marked as sleepable
> (e.g. a new 'sleepable' member of `struct tracepoint`), which could be
> checked when initializing or freeing the link? Something like this,
>
> static void bpf_link_defer_bpf_prog_put(struct rcu_head *rcu)
> {
> struct bpf_prog_aux *aux = container_of(rcu, struct bpf_prog_aux, rcu);
> bpf_prog_put(aux->prog);
> }
>
> /* bpf_link_free is guaranteed to be called from process context */
> static void bpf_link_free(struct bpf_link *link)
> {
> const struct bpf_link_ops *ops = link->ops;
> bool sleepable = false;
>
> + if (ops->attachment_is_sleepable)
> + sleepable = ops->attachment_is_sleepable(link);
> +
> bpf_link_free_id(link->id);
> if (link->prog) {
> - sleepable = link->prog->sleepable;
> + sleepable = sleepable || link->prog->sleepable;
> /* detach BPF program, clean up used resources */
> ops->release(link);
> - bpf_prog_put(link->prog);
> + if (sleepable)
> + call_rcu_tasks_trace(&link->prog->aux->rcu,
> + bpf_link_defer_bpf_prog_put);
> + else
> + bpf_prog_put(link->prog);
> }
> if (ops->dealloc_deferred) {
> /* schedule BPF link deallocation; if underlying BPF program
> ...
> }
>
> static bool bpf_raw_tp_link_attachment_is_sleepable(struct bpf_link *link)
> {
> struct bpf_raw_tp_link *raw_tp =
> container_of(link, struct bpf_raw_tp_link, link);
>
> return raw_tp->btp->tp->sleepable;
> }
>
> where if the attachment point of the link is sleepable as with BPF raw
> syscall tracepoints then wait for the RCU tasks trace grace period
> to elapse before freeing up the program and link.
Yes, that's the direction I'm leaning towards (though implementation
details would be different, I don't think we need
attachment_is_sleepable callback). I replied to Mathieu's patch, I'll
try to do fixes for the BPF side next week, hope that works.
>
> -Jordan
Powered by blists - more mailing lists