Message-ID: <874ke5we1j.fsf@toke.dk>
Date: Fri, 11 Jun 2021 00:27:52 +0200
From: Toke Høiland-Jørgensen <toke@...hat.com>
To: Daniel Borkmann <daniel@...earbox.net>,
Alexei Starovoitov <alexei.starovoitov@...il.com>
Cc: bpf <bpf@...r.kernel.org>,
Network Development <netdev@...r.kernel.org>,
Martin KaFai Lau <kafai@...com>,
Hangbin Liu <liuhangbin@...il.com>,
Jesper Dangaard Brouer <brouer@...hat.com>,
Magnus Karlsson <magnus.karlsson@...il.com>,
"Paul E . McKenney" <paulmck@...nel.org>
Subject: Re: [PATCH bpf-next 02/17] bpf: allow RCU-protected lookups to
happen from bh context

Daniel Borkmann <daniel@...earbox.net> writes:
> Hi Paul,
>
> On 6/10/21 8:38 PM, Alexei Starovoitov wrote:
>> On Wed, Jun 9, 2021 at 7:24 AM Toke Høiland-Jørgensen <toke@...hat.com> wrote:
>>>
>>> XDP programs are called from a NAPI poll context, which means the RCU
>>> reference liveness is ensured by local_bh_disable(). Add
>>> rcu_read_lock_bh_held() as a condition to the RCU checks for map lookups so
>>> lockdep understands that the dereferences are safe from inside *either* an
>>> rcu_read_lock() section *or* a local_bh_disable() section. This is done in
>>> preparation for removing the redundant rcu_read_lock()s from the drivers.
>>>
>>> Signed-off-by: Toke Høiland-Jørgensen <toke@...hat.com>
>>> ---
>>> kernel/bpf/hashtab.c | 21 ++++++++++++++-------
>>> kernel/bpf/helpers.c | 6 +++---
>>> kernel/bpf/lpm_trie.c | 6 ++++--
>>> 3 files changed, 21 insertions(+), 12 deletions(-)
>>>
>>> diff --git a/kernel/bpf/hashtab.c b/kernel/bpf/hashtab.c
>>> index 6f6681b07364..72c58cc516a3 100644
>>> --- a/kernel/bpf/hashtab.c
>>> +++ b/kernel/bpf/hashtab.c
>>> @@ -596,7 +596,8 @@ static void *__htab_map_lookup_elem(struct bpf_map *map, void *key)
>>> struct htab_elem *l;
>>> u32 hash, key_size;
>>>
>>> - WARN_ON_ONCE(!rcu_read_lock_held() && !rcu_read_lock_trace_held());
>>> + WARN_ON_ONCE(!rcu_read_lock_held() && !rcu_read_lock_trace_held() &&
>>> + !rcu_read_lock_bh_held());
>>
>> It's not clear to me whether rcu_read_lock_held() is still needed.
>> All the comments make it sound like rcu_read_lock_bh_held() is a
>> superset of RCU that includes BH.
>> But reading the RCU source code it looks like RCU_BH is its own RCU
>> flavor... which is confusing.
>
> The series is a bit confusing to me as well. I recall we had a discussion
> with Paul about the RCU vs RCU-bh flavours back in 2016, in the very early
> days of XDP, to get some clarification on this. Paul, given the series
> here, I assume the below is no longer true, and in this case (since we're
> removing rcu_read_lock() from drivers) RCU-bh acts as a real superset?
>
> Back then from your clarifications this was not the case:
>
> On Mon, Jul 25, 2016 at 11:26:02AM -0700, Alexei Starovoitov wrote:
> > On Mon, Jul 25, 2016 at 11:03 AM, Paul E. McKenney
> > <paulmck@...ux.vnet.ibm.com> wrote:
> [...]
> >>> The crux of the question is whether a particular driver rx handler, when
> >>> called from __do_softirq, needs to add an additional rcu_read_lock or
> >>> whether it can rely on the mechanics of softirq.
> >>
> >> If it was rcu_read_lock_bh(), you could.
> >>
> >> But you didn't say rcu_read_lock_bh(), you instead said rcu_read_lock(),
> >> which means that you absolutely cannot rely on softirq semantics.
> >>
> >> In particular, in CONFIG_PREEMPT=y kernels, rcu_preempt_check_callbacks()
> >> will notice that there is no rcu_read_lock() in effect and report
> >> a quiescent state for that CPU. Because rcu_preempt_check_callbacks()
> >> is invoked from the scheduling-clock interrupt, it absolutely can
> >> execute during do_softirq(), and therefore being in softirq context
> >> in no way provides rcu_read_lock()-style protection.
> >>
> >> Now, Alexei's question was for CONFIG_PREEMPT=n kernels. However, in
> >> that case, rcu_read_lock() and rcu_read_unlock() generate no code
> >> in recent production kernels, so there is no performance penalty for
> >> using them. (In older kernels, they implied a barrier().)
> >>
> >> So either way, with or without CONFIG_PREEMPT, you should use
> >> rcu_read_lock() to get RCU protection.
> >>
> >> One alternative might be to switch to rcu_read_lock_bh(), but that
> >> will add local_disable_bh() overhead to your read paths.
> >>
> >> Does that help, or am I missing the point of the question?
> >
> > thanks a lot for explanation.
>
> Glad you liked it!
>
> > I mistakenly assumed that the _bh variants are 'stronger' and act
> > inclusively, but it sounds like they're completely orthogonal,
> > especially with preempt_rcu=y.
>
> Yes, they are pretty much orthogonal.
>
> > With preempt_rcu=n and preempt=y, it would be the case, since
> > bh disables preemption and rcu_read_lock does the same as well,
> > right? Of course, the code shouldn't be relying on that, so we
> > have to fix our stuff.
>
> Indeed, especially given that the kernel currently won't allow you
> to configure CONFIG_PREEMPT_RCU=n and CONFIG_PREEMPT=y. If it does,
> please let me know, as that would be a bug that needs to be fixed.
> (For one thing, I do not test that combination.)
>
> Thanx, Paul
>
> And now, fast-forward again to 2021 ... :)

We covered this in the thread I linked from the cover letter.
Specifically, this seems to have changed with the RCU flavour
consolidation in v4.20; see Paul's reply here:
https://lore.kernel.org/bpf/20210417002301.GO4212@paulmck-ThinkPad-P17-Gen-1/
and the follow-up covering -rt here:
https://lore.kernel.org/bpf/20210419165837.GA975577@paulmck-ThinkPad-P17-Gen-1/
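
To make the effect concrete, here is a minimal illustrative sketch
(assumed names and simplified bodies, not the actual upstream code) of
the two halves: the driver's NAPI poll already runs with BH disabled,
and the map-side lockdep assertion now accepts that as sufficient
protection:

/* Illustrative sketch only -- simplified names and bodies, kernel
 * context assumed (<linux/rcupdate.h>, <linux/rcupdate_trace.h>,
 * <linux/netdevice.h>, <linux/bpf.h>).
 */

/* Driver side: napi->poll() is invoked from net_rx_action() with BH
 * disabled, so rcu_read_lock_bh_held() is already true here and the
 * explicit rcu_read_lock()/rcu_read_unlock() around XDP program
 * invocation is redundant under the consolidated RCU flavour (v4.20+).
 */
static int example_napi_poll(struct napi_struct *napi, int budget)
{
	/* ... receive packets and run the XDP program, which may do BPF
	 * map lookups, without taking rcu_read_lock() ... */
	return 0;
}

/* Map side: after this series the lookup only asserts that *some*
 * RCU-protecting context is active (cf. the hashtab.c hunk quoted
 * above).
 */
static void *example_map_lookup(struct bpf_map *map, void *key)
{
	WARN_ON_ONCE(!rcu_read_lock_held() && !rcu_read_lock_trace_held() &&
		     !rcu_read_lock_bh_held());
	/* ... actual element lookup elided ... */
	return NULL;
}
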
-Toke