netdev - [RFC PATCH net] net: dsa: tag_dsa: fix suspicious rcu_dereference_check() with br_vlan_get_pvid

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [thread-next>] [day] [month] [year] [list]

Message-Id: <20210928233708.1246774-1-vladimir.oltean@nxp.com>
Date:   Wed, 29 Sep 2021 02:37:08 +0300
From:   Vladimir Oltean <vladimir.oltean@....com>
To:     netdev@...r.kernel.org
Cc:     "Paul E . McKenney" <paulmck@...nel.org>,
        rcu <rcu@...r.kernel.org>, Roopa Prabhu <roopa@...dia.com>,
        Nikolay Aleksandrov <nikolay@...dia.com>,
        "David S. Miller" <davem@...emloft.net>,
        Jakub Kicinski <kuba@...nel.org>,
        Tobias Waldekranz <tobias@...dekranz.com>,
        Florian Fainelli <f.fainelli@...il.com>,
        Andrew Lunn <andrew@...n.ch>,
        Vivien Didelot <vivien.didelot@...il.com>,
        Vladimir Oltean <olteanv@...il.com>
Subject: [RFC PATCH net] net: dsa: tag_dsa: fix suspicious rcu_dereference_check() with br_vlan_get_pvid_rcu

__dev_queue_xmit(), which is our caller, does run under rcu_read_lock_bh(),
but in my foolishness I had thought this would be enough to make the
access, lockdep complains that rcu_read_lock() is not held.

Which it isn't - as it turns out, RCU preempt and RCU-bh are two
different flavors, and although Paul McKenney has consolidated
synchronize_rcu() to wait for both preempt as well as bh read-side
critical sections [1], the reader-side API is different, the lockdep
maps and keys are different.

The bridge calls synchronize_rcu() in br_vlan_flush(), and this does
wait for our TX fastpath reader of the br_vlan_group_rcu to complete
even though it is in an rcu-bh read side section. So even though this is
in premise safe, to lockdep this is a case of "who are you? I don't know
you, you're suspicious".

Side note, I still don't really understand the different RCU flavors.
For example, as far as I can see, the core network stack has never
directly called synchronize_rcu_bh, not even once. Just the initial
synchronize_kernel(), replaced later with the RCU preempt variant -
synchronize_rcu(). Very very long story short, dev_queue_xmit has
started calling this exact variant - rcu_read_lock_bh() - since [2], to
make dev_deactivate properly wait for network interfaces with
NETIF_F_LLTX to finish their dev_queue_xmit(). But that relied on an
existing synchronize_rcu(), not synchronize_rcu_bh(). So does this mean
that synchronize_net() never really waited for the rcu-bh critical
section in dev_queue_xmit to finish? I've no idea.

So basically there are multiple options.

First would be to duplicate br_vlan_get_pvid_rcu() into a new
br_vlan_get_pvid_rcu_bh() to appease lockdep for the TX path case. But
this function already has another brother, br_vlan_get_pvid(), which is
protected by the update-side rtnl_mutex. We don't want to grow the
family too big too, especially since br_vlan_get_pvid_rcu_bh() would not
be a function used by the bridge at all, just exported by it and used by
the DSA layer.

The option of getting to the bottom of why does __dev_queue_xmit use
rcu-bh, and splitting that into local_bh_disable + rcu_read_lock, as it
was before [3], might be impractical. There have been 15 years of
development since then, and there are lots of code paths that use
rcu_dereference_bh() in the TX path. Plus, with the consolidation work
done in [1], I'm not even sure what are the practical benefits of rcu-bh
any longer, if the whole point was for synchronize_rcu() to wait for
everything in sight - how can spammy softirqs like networking paint
themselves red any longer, and how can certain RCU updaters not wait for
them now, in order to avoid denial of service? It doesn't appear
possible from the distance from which I'm looking at the problem.
So the effort of converting __dev_queue_xmit from rcu-bh to rcu-preempt
would only appear justified if it went together with the complete
elimination of rcu-bh. Also, it would appear to be quite a strange and
roundabout way to fix a "suspicious RCU usage" lockdep message.

Last, it appears possible to just give lockdep what it wants, and hold
an rcu-preempt read-side critical section when calling br_vlan_get_pvid_rcu
from the TX path. In terms of lines of code and amount of thought needed
it is certainly the easiest path forward, even though it incurs a small
(negligible) performance overhead (and avoidable, at that). This is what
this patch does, in lack of a deeper understanding of lockdep, RCU or
the network transmission process.

[1] https://lwn.net/Articles/777036/
[2] commit d4828d85d188 ("[NET]: Prevent transmission after dev_deactivate")
[3] commit 43da55cbd54e ("[NET]: Do less atomic count changes in dev_queue_xmit.")

Fixes: d82f8ab0d874 ("net: dsa: tag_dsa: offload the bridge forwarding process")
Signed-off-by: Vladimir Oltean <vladimir.oltean@....com>
---
 net/dsa/tag_dsa.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/net/dsa/tag_dsa.c b/net/dsa/tag_dsa.c
index 77d0ce89ab77..178464cd2bdb 100644
--- a/net/dsa/tag_dsa.c
+++ b/net/dsa/tag_dsa.c
@@ -150,10 +150,9 @@ static struct sk_buff *dsa_xmit_ll(struct sk_buff *skb, struct net_device *dev,
 		 * that's where the packets ingressed from.
 		 */
 		if (!br_vlan_enabled(br)) {
-			/* Safe because __dev_queue_xmit() runs under
-			 * rcu_read_lock_bh()
-			 */
+			rcu_read_lock();
 			err = br_vlan_get_pvid_rcu(br, &pvid);
+			rcu_read_unlock();
 			if (err)
 				return NULL;
 		}
-- 
2.25.1