Message-ID: <20200214191126.lbiusetaxecdl3of@localhost>
Date: Fri, 14 Feb 2020 14:11:26 -0500
From: Mathieu Desnoyers <mathieu.desnoyers@...icios.com>
To: Thomas Gleixner <tglx@...utronix.de>
Cc: LKML <linux-kernel@...r.kernel.org>,
David Miller <davem@...emloft.net>, bpf@...r.kernel.org,
netdev@...r.kernel.org, Alexei Starovoitov <ast@...nel.org>,
Daniel Borkmann <daniel@...earbox.net>,
Sebastian Sewior <bigeasy@...utronix.de>,
Peter Zijlstra <peterz@...radead.org>,
Clark Williams <williams@...hat.com>,
Steven Rostedt <rostedt@...dmis.org>,
Juri Lelli <juri.lelli@...hat.com>,
Ingo Molnar <mingo@...nel.org>
Subject: Re: [RFC patch 14/19] bpf: Use migrate_disable() in hashtab code
On 14-Feb-2020 02:39:31 PM, Thomas Gleixner wrote:
> The required protection is that the caller cannot be migrated to a
> different CPU as these places take either a hash bucket lock or might
> trigger a kprobe inside the memory allocator. Both scenarios can lead to
> deadlocks. The deadlock prevention is per CPU by incrementing a per CPU
> variable which temporarily blocks the invocation of BPF programs from perf
> and kprobes.
>
> Replace the preempt_disable/enable() pairs with migrate_disable/enable()
> pairs to prepare BPF to work on PREEMPT_RT enabled kernels. On a non-RT
> kernel this maps to preempt_disable/enable(), i.e. no functional change.
Will that _really_ work on RT?
I'm puzzled about what will happen in the following scenario on RT:
Thread A is preempted within e.g. htab_elem_free_rcu, and Thread B is
scheduled and runs through a bunch of tracepoints. Both are on the
same CPU's runqueue:
CPU 1

Thread A is scheduled
(Thread A) htab_elem_free_rcu()
(Thread A)   migrate disable
(Thread A)   __this_cpu_inc(bpf_prog_active); -> per-cpu variable for
                                               deadlock prevention.
Thread A is preempted
Thread B is scheduled
(Thread B) Runs through various tracepoints:
           trace_call_bpf()
           if (unlikely(__this_cpu_inc_return(bpf_prog_active) != 1)) {
               -> will skip any instrumentation that happens to be on
                  this CPU until...
Thread B is preempted
Thread A is scheduled
(Thread A)   __this_cpu_dec(bpf_prog_active);
(Thread A)   migrate enable
Having all those events randomly and silently discarded might be quite
unexpected from a user standpoint. This turns the deadlock prevention
mechanism into a random tracepoint-dropping facility, which is
unsettling. One alternative approach we could consider to solve this
is to make this deadlock prevention nesting counter per-thread rather
than per-cpu.
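
To make the difference concrete, here is a rough userspace sketch (not
kernel code) that replays the scenario above with both counter placements.
Everything in it is a hypothetical stand-in: cpu_active[] models the per-CPU
bpf_prog_active, struct task models a thread carrying its own nesting
counter, and the enter/exit helpers mimic the inc_return/dec pattern used by
trace_call_bpf().

```c
#include <assert.h>
#include <stdbool.h>

#define NR_CPUS 2

/* Hypothetical stand-ins for illustration, not the kernel's definitions. */
static int cpu_active[NR_CPUS];   /* models the per-CPU bpf_prog_active */
struct task { int bpf_active; };  /* models a per-thread nesting counter */

/* Per-CPU scheme: returns true if the BPF program is allowed to run. */
static bool percpu_enter(int cpu) { return ++cpu_active[cpu] == 1; }

static void percpu_exit(int cpu)
{
	assert(cpu_active[cpu] > 0);
	cpu_active[cpu]--;
}

/* Per-thread scheme: same check, but the counter follows the thread. */
static bool pertask_enter(struct task *t) { return ++t->bpf_active == 1; }

static void pertask_exit(struct task *t)
{
	assert(t->bpf_active > 0);
	t->bpf_active--;
}

/* Thread A holds the counter on CPU 1, is preempted, and thread B then
 * hits nr_events tracepoints on the same CPU. Returns B's dropped events. */
static int dropped_percpu(int nr_events)
{
	int cpu = 1, dropped = 0;

	(void)percpu_enter(cpu);              /* A: __this_cpu_inc() */
	for (int i = 0; i < nr_events; i++) { /* A preempted, B runs */
		if (!percpu_enter(cpu))
			dropped++;            /* B's event is skipped */
		percpu_exit(cpu);
	}
	percpu_exit(cpu);                     /* A resumes: __this_cpu_dec() */
	return dropped;
}

/* Same interleaving, but each thread checks only its own counter. */
static int dropped_pertask(int nr_events)
{
	struct task a = { 0 }, b = { 0 };
	int dropped = 0;

	(void)pertask_enter(&a);
	for (int i = 0; i < nr_events; i++) {
		if (!pertask_enter(&b))
			dropped++;
		pertask_exit(&b);
	}
	pertask_exit(&a);
	return dropped;
}
```

In this model dropped_percpu(5) loses all five of thread B's events, while
dropped_pertask(5) loses none, because B never observes A's nesting count.
The sketch says nothing about the cost or the other constraints of carrying
such a counter per thread in the kernel.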
Also, I don't think using __this_cpu_inc() is safe unless preemption is
disabled or interrupts are off. You'll probably want to move to
this_cpu_inc/dec instead, which can be heavier on some architectures.
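
For illustration only, the lost-update problem can be modelled in userspace
by splitting the increment into its load and store halves. This assumes an
architecture where the non-atomic per-CPU op ends up as a plain
load/modify/store; the names counter, racy_inc() and other_thread_inc() are
made up for the sketch, and the "preemption" is just a function call placed
between the two halves.

```c
#include <assert.h>

/* Hypothetical stand-in for one CPU's bpf_prog_active. */
static int counter;

/* Models another thread on the same CPU incrementing the counter. */
static void other_thread_inc(void)
{
	counter++;
}

/* Non-atomic increment, split so that preempt_hook() can run between the
 * load and the store, as a preemption on the same CPU could. */
static void racy_inc(void (*preempt_hook)(void))
{
	int tmp;

	assert(preempt_hook);   /* the hook is mandatory in this sketch */
	tmp = counter;          /* load */
	preempt_hook();         /* preempted here; the other thread runs */
	counter = tmp + 1;      /* store clobbers the other thread's update */
}

/* Two increments are attempted; returns the value that is left behind. */
static int lost_update_demo(void)
{
	counter = 0;
	racy_inc(other_thread_inc);
	return counter;
}
```

lost_update_demo() returns 1 rather than 2: one increment is lost, and a
nesting counter can be left unbalanced the same way.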
Thanks,
Mathieu
>
> Signed-off-by: Thomas Gleixner <tglx@...utronix.de>
> ---
> kernel/bpf/hashtab.c | 12 ++++++------
> 1 file changed, 6 insertions(+), 6 deletions(-)
>
> --- a/kernel/bpf/hashtab.c
> +++ b/kernel/bpf/hashtab.c
> @@ -698,11 +698,11 @@ static void htab_elem_free_rcu(struct rc
> * we're calling kfree, otherwise deadlock is possible if kprobes
> * are placed somewhere inside of slub
> */
> - preempt_disable();
> + migrate_disable();
> __this_cpu_inc(bpf_prog_active);
> htab_elem_free(htab, l);
> __this_cpu_dec(bpf_prog_active);
> - preempt_enable();
> + migrate_enable();
> }
>
> static void free_htab_elem(struct bpf_htab *htab, struct htab_elem *l)
> @@ -1327,7 +1327,7 @@ static int
> }
>
> again:
> - preempt_disable();
> + migrate_disable();
> this_cpu_inc(bpf_prog_active);
> rcu_read_lock();
> again_nocopy:
> @@ -1347,7 +1347,7 @@ static int
> raw_spin_unlock_irqrestore(&b->lock, flags);
> rcu_read_unlock();
> this_cpu_dec(bpf_prog_active);
> - preempt_enable();
> + migrate_enable();
> goto after_loop;
> }
>
> @@ -1356,7 +1356,7 @@ static int
> raw_spin_unlock_irqrestore(&b->lock, flags);
> rcu_read_unlock();
> this_cpu_dec(bpf_prog_active);
> - preempt_enable();
> + migrate_enable();
> kvfree(keys);
> kvfree(values);
> goto alloc;
> @@ -1406,7 +1406,7 @@ static int
>
> rcu_read_unlock();
> this_cpu_dec(bpf_prog_active);
> - preempt_enable();
> + migrate_enable();
> if (bucket_cnt && (copy_to_user(ukeys + total * key_size, keys,
> key_size * bucket_cnt) ||
> copy_to_user(uvalues + total * value_size, values,
>
--
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com