[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <d22b1596-e9ba-307c-7033-20f6a074a9df@nvidia.com>
Date: Thu, 30 Sep 2021 00:21:55 +0300
From: Nikolay Aleksandrov <nikolay@...dia.com>
To: Vladimir Oltean <vladimir.oltean@....com>, netdev@...r.kernel.org
Cc: "Paul E . McKenney" <paulmck@...nel.org>,
rcu <rcu@...r.kernel.org>, Roopa Prabhu <roopa@...dia.com>,
"David S. Miller" <davem@...emloft.net>,
Jakub Kicinski <kuba@...nel.org>,
Tobias Waldekranz <tobias@...dekranz.com>,
Florian Fainelli <f.fainelli@...il.com>,
Andrew Lunn <andrew@...n.ch>,
Vivien Didelot <vivien.didelot@...il.com>,
Vladimir Oltean <olteanv@...il.com>
Subject: Re: [RFC PATCH net] net: dsa: tag_dsa: fix suspicious
rcu_dereference_check() with br_vlan_get_pvid_rcu
On 29/09/2021 02:37, Vladimir Oltean wrote:
> __dev_queue_xmit(), which is our caller, does run under rcu_read_lock_bh(),
> but in my foolishness I had thought this would be enough to make the
> access, lockdep complains that rcu_read_lock() is not held.
>
> Which it isn't - as it turns out, RCU preempt and RCU-bh are two
> different flavors, and although Paul McKenney has consolidated
> synchronize_rcu() to wait for both preempt as well as bh read-side
> critical sections [1], the reader-side API is different, the lockdep
> maps and keys are different.
>
> The bridge calls synchronize_rcu() in br_vlan_flush(), and this does
> wait for our TX fastpath reader of the br_vlan_group_rcu to complete
> even though it is in an rcu-bh read side section. So even though this is
> in premise safe, to lockdep this is a case of "who are you? I don't know
> you, you're suspicious".
>
> Side note, I still don't really understand the different RCU flavors.
> For example, as far as I can see, the core network stack has never
> directly called synchronize_rcu_bh, not even once. Just the initial
> synchronize_kernel(), replaced later with the RCU preempt variant -
> synchronize_rcu(). Very very long story short, dev_queue_xmit has
> started calling this exact variant - rcu_read_lock_bh() - since [2], to
> make dev_deactivate properly wait for network interfaces with
> NETIF_F_LLTX to finish their dev_queue_xmit(). But that relied on an
> existing synchronize_rcu(), not synchronize_rcu_bh(). So does this mean
> that synchronize_net() never really waited for the rcu-bh critical
> section in dev_queue_xmit to finish? I've no idea.
>
> So basically there are multiple options.
>
> First would be to duplicate br_vlan_get_pvid_rcu() into a new
> br_vlan_get_pvid_rcu_bh() to appease lockdep for the TX path case. But
> this function already has another brother, br_vlan_get_pvid(), which is
> protected by the update-side rtnl_mutex. We don't want to grow the
> family too big too, especially since br_vlan_get_pvid_rcu_bh() would not
> be a function used by the bridge at all, just exported by it and used by
> the DSA layer.
>
> The option of getting to the bottom of why does __dev_queue_xmit use
> rcu-bh, and splitting that into local_bh_disable + rcu_read_lock, as it
> was before [3], might be impractical. There have been 15 years of
> development since then, and there are lots of code paths that use
> rcu_dereference_bh() in the TX path. Plus, with the consolidation work
> done in [1], I'm not even sure what are the practical benefits of rcu-bh
> any longer, if the whole point was for synchronize_rcu() to wait for
> everything in sight - how can spammy softirqs like networking paint
> themselves red any longer, and how can certain RCU updaters not wait for
> them now, in order to avoid denial of service? It doesn't appear
> possible from the distance from which I'm looking at the problem.
> So the effort of converting __dev_queue_xmit from rcu-bh to rcu-preempt
> would only appear justified if it went together with the complete
> elimination of rcu-bh. Also, it would appear to be quite a strange and
> roundabout way to fix a "suspicious RCU usage" lockdep message.
>
> Last, it appears possible to just give lockdep what it wants, and hold
> an rcu-preempt read-side critical section when calling br_vlan_get_pvid_rcu
> from the TX path. In terms of lines of code and amount of thought needed
> it is certainly the easiest path forward, even though it incurs a small
> (negligible) performance overhead (and avoidable, at that). This is what
> this patch does, in lack of a deeper understanding of lockdep, RCU or
> the network transmission process.
>
> [1] https://lwn.net/Articles/777036/
> [2] commit d4828d85d188 ("[NET]: Prevent transmission after dev_deactivate")
> [3] commit 43da55cbd54e ("[NET]: Do less atomic count changes in dev_queue_xmit.")
>
> Fixes: d82f8ab0d874 ("net: dsa: tag_dsa: offload the bridge forwarding process")
> Signed-off-by: Vladimir Oltean <vladimir.oltean@....com>
> ---
> net/dsa/tag_dsa.c | 5 ++---
> 1 file changed, 2 insertions(+), 3 deletions(-)
To answer the question about the patch - it is correct and ok, the dereference needed
is done inside the function and after that the value is copied. I'd only check how the
DSA bridge pointer is handled, it seems it is dereferenced directly and I don't know
the DSA locking rules (i.e. if anything is held by that time to ensure the pointer is
valid). As for the patch:
Reviewed-by: Nikolay Aleksandrov <nikolay@...dia.com>
Cheers,
Nik
Powered by blists - more mailing lists