Message-ID: <f63cc172-72a7-4666-a15f-c53d8562d7d7@efficios.com>
Date: Wed, 23 Oct 2024 11:13:53 -0400
From: Mathieu Desnoyers <mathieu.desnoyers@...icios.com>
To: Jordan Rife <jrife@...gle.com>
Cc: acme@...nel.org, alexander.shishkin@...ux.intel.com,
andrii.nakryiko@...il.com, ast@...nel.org, bpf@...r.kernel.org,
joel@...lfernandes.org, linux-kernel@...r.kernel.org, mark.rutland@....com,
mhiramat@...nel.org, mingo@...hat.com, mjeanson@...icios.com,
namhyung@...nel.org, paulmck@...nel.org, peterz@...radead.org,
rostedt@...dmis.org, syzbot+b390c8062d8387b6272a@...kaller.appspotmail.com,
yhs@...com
Subject: Re: [RFC PATCH] tracing: Fix syscall tracepoint use-after-free
On 2024-10-23 10:56, Jordan Rife wrote:
> Mathieu's patch alone does not seem to be enough to prevent the
> use-after-free issue reported by syzbot.
>
> Link: https://lore.kernel.org/bpf/67121037.050a0220.10f4f4.000f.GAE@google.com/T/#u
>
> I reran the repro script with his patch applied to my tree and was
> still able to get the same KASAN crash to occur.
>
> In this case, when bpf_link_free is invoked it kicks off three instances
> of call_rcu*.
>
> bpf_link_free()
> ops->release()
> bpf_raw_tp_link_release()
> bpf_probe_unregister()
> tracepoint_probe_unregister()
> tracepoint_remove_func()
> release_probes()
> call_rcu() [1]
> bpf_prog_put()
> __bpf_prog_put()
> bpf_prog_put_deferred()
> __bpf_prog_put_noref()
> call_rcu() [2]
> call_rcu() [3]
>
> With Mathieu's patch, [1] is chained with call_rcu_tasks_trace()
> making the grace period sufficiently long to safely free the probe itself.
> The callback for [2] and [3] may be invoked before the
> call_rcu_tasks_trace() grace period has elapsed leading to the link or
> program itself being freed while still in use. I was able to prevent
> any crashes with the patch below which also chains
> call_rcu_tasks_trace() and call_rcu() at [2] and [3].
Right, so removal of the tracepoint probe is done by
tracepoint_probe_unregister(), which effectively removes the
probe function from the funcs array. The read-side counterpart
of that is in __DO_TRACE(), where the RCU dereference of that
array is now protected by rcu_read_lock_trace() for syscall
tracepoints.
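
For reference, the read side looks roughly like this (a simplified
sketch of the syscall tracepoint case, paraphrasing __DO_TRACE(), not
the exact macro):

	/*
	 * Sketch: the funcs array is dereferenced under
	 * rcu_read_lock_trace(), so reclaiming it is only safe
	 * after a tasks-trace RCU grace period has elapsed.
	 */
	rcu_read_lock_trace();
	it_func_ptr = rcu_dereference_raw(tp->funcs);
	if (it_func_ptr) {
		do {
			/* invokes e.g. __bpf_trace_##call() below */
			((void (*)(void *, proto))it_func_ptr->func)(it_func_ptr->data, args);
		} while ((++it_func_ptr)->func);
	}
	rcu_read_unlock_trace();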
We cannot expect that surrounding the eBPF probe execution
with a preempt-disable section like so:
#define __BPF_DECLARE_TRACE_SYSCALL(call, proto, args)			\
static notrace void							\
__bpf_trace_##call(void *__data, proto)					\
{									\
	might_fault();							\
	preempt_disable_notrace();					\
	CONCATENATE(bpf_trace_run, COUNT_ARGS(args))(__data, CAST_TO_U64(args)); \
	preempt_enable_notrace();					\
}
is sufficient to delay reclaim with call_rcu() after a tracepoint
unregister, because the preempt-disable critical section does not
cover the RCU dereference done by the tracepoint itself.
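
To illustrate, here is a hypothetical interleaving (grace periods as
I understand them):

	CPU 0 (tracer)                       CPU 1 (unregister)
	--------------                       ------------------
	rcu_read_lock_trace();
	funcs = rcu_dereference(tp->funcs);
	                                     tracepoint_probe_unregister();
	                                     call_rcu(..., release);
	                                     /* a classic GP can complete
	                                        here: CPU 0 holds no
	                                        rcu_read_lock() */
	__bpf_trace_sys_enter():
	  preempt_disable_notrace();
	  /* too late: funcs was fetched
	     outside this section */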
So relying on call_rcu() alone to delay reclaim of the BPF objects
after unregistering their associated tracepoint is indeed not
enough. Chaining call_rcu() after call_rcu_tasks_trace() works, though.
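
The chaining pattern itself is simple (a minimal sketch with a
hypothetical struct obj; the patch quoted below applies the same idea
to the prog and link paths):

	static void obj_free_rcu(struct rcu_head *rcu)
	{
		kfree(container_of(rcu, struct obj, rcu));
	}

	static void obj_free_tasks_trace_rcu(struct rcu_head *rcu)
	{
		/* If the tasks-trace GP implied a classic GP too,
		 * free right away; otherwise chain a classic GP. */
		if (rcu_trace_implies_rcu_gp())
			obj_free_rcu(rcu);
		else
			call_rcu(rcu, obj_free_rcu);
	}

	/* Reclaim: tasks-trace GP first, then (if needed) classic GP. */
	call_rcu_tasks_trace(&obj->rcu, obj_free_tasks_trace_rcu);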
That question is relevant for ftrace and perf too: are there data
structures that are reclaimed with call_rcu() after being unregistered
from syscall tracepoints?
Thanks Jordan for your thorough analysis,
Mathieu
>
> ---
> kernel/bpf/syscall.c | 24 ++++++++++--------------
> 1 file changed, 10 insertions(+), 14 deletions(-)
>
> diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
> index 59de664e580d..5290eccb465e 100644
> --- a/kernel/bpf/syscall.c
> +++ b/kernel/bpf/syscall.c
> @@ -2200,6 +2200,14 @@ static void __bpf_prog_put_rcu(struct rcu_head *rcu)
> bpf_prog_free(aux->prog);
> }
>
> +static void __bpf_prog_put_tasks_trace_rcu(struct rcu_head *rcu)
> +{
> + if (rcu_trace_implies_rcu_gp())
> + __bpf_prog_put_rcu(rcu);
> + else
> + call_rcu(rcu, __bpf_prog_put_rcu);
> +}
> +
> static void __bpf_prog_put_noref(struct bpf_prog *prog, bool deferred)
> {
> bpf_prog_kallsyms_del_all(prog);
> @@ -2212,10 +2220,7 @@ static void __bpf_prog_put_noref(struct bpf_prog *prog, bool deferred)
> btf_put(prog->aux->attach_btf);
>
> if (deferred) {
> - if (prog->sleepable)
> - call_rcu_tasks_trace(&prog->aux->rcu, __bpf_prog_put_rcu);
> - else
> - call_rcu(&prog->aux->rcu, __bpf_prog_put_rcu);
> + call_rcu_tasks_trace(&prog->aux->rcu, __bpf_prog_put_tasks_trace_rcu);
> } else {
> __bpf_prog_put_rcu(&prog->aux->rcu);
> }
> @@ -2996,24 +3001,15 @@ static void bpf_link_defer_dealloc_mult_rcu_gp(struct rcu_head *rcu)
> static void bpf_link_free(struct bpf_link *link)
> {
> const struct bpf_link_ops *ops = link->ops;
> - bool sleepable = false;
>
> bpf_link_free_id(link->id);
> if (link->prog) {
> - sleepable = link->prog->sleepable;
> /* detach BPF program, clean up used resources */
> ops->release(link);
> bpf_prog_put(link->prog);
> }
> if (ops->dealloc_deferred) {
> - /* schedule BPF link deallocation; if underlying BPF program
> - * is sleepable, we need to first wait for RCU tasks trace
> - * sync, then go through "classic" RCU grace period
> - */
> - if (sleepable)
> - call_rcu_tasks_trace(&link->rcu, bpf_link_defer_dealloc_mult_rcu_gp);
> - else
> - call_rcu(&link->rcu, bpf_link_defer_dealloc_rcu_gp);
> + call_rcu_tasks_trace(&link->rcu, bpf_link_defer_dealloc_mult_rcu_gp);
> } else if (ops->dealloc)
> ops->dealloc(link);
> }
--
Mathieu Desnoyers
EfficiOS Inc.
https://www.efficios.com