lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [thread-next>] [day] [month] [year] [list]
Message-Id: <20220727152000.3616086-1-vladimir.oltean@nxp.com>
Date:   Wed, 27 Jul 2022 18:20:00 +0300
From:   Vladimir Oltean <vladimir.oltean@....com>
To:     netdev@...r.kernel.org
Cc:     "David S. Miller" <davem@...emloft.net>,
        Eric Dumazet <edumazet@...gle.com>,
        Jakub Kicinski <kuba@...nel.org>,
        Paolo Abeni <pabeni@...hat.com>, Andrew Lunn <andrew@...n.ch>,
        Vivien Didelot <vivien.didelot@...il.com>,
        Florian Fainelli <f.fainelli@...il.com>,
        Jonathan Toppins <jtoppins@...hat.com>,
        Jay Vosburgh <j.vosburgh@...il.com>,
        Veaceslav Falico <vfalico@...il.com>,
        Hangbin Liu <liuhangbin@...il.com>,
        Jamal Hadi Salim <jhs@...atatu.com>,
        Cong Wang <xiyou.wangcong@...il.com>,
        Jiri Pirko <jiri@...nulli.us>,
        Brian Hutchinson <b.hutchman@...il.com>
Subject: [PATCH v2 net] net/sched: make dev_trans_start() have a better chance of working with stacked interfaces

Documentation/networking/bonding.rst points out that for ARP monitoring
to work, dev_trans_start() must be able to verify the latest trans_start
update of any slave_dev TX queue. However, with NETIF_F_LLTX,
dev_trans_start() simply doesn't make much sense.

DSA has declared NETIF_F_LLTX to be in line with other stackable
interfaces, and this has introduced a regression in the form of breaking
ARP monitoring with bonding.

There is a workaround already in place in dev_trans_start() to fix just
this kind of breakage for non-stacked cases of vlan and macvlan. Since
DSA doesn't export any flag which says "this interface is DSA", or "this
interface's master is this device", we need to generalize this logic by
traversing the netdev adjacency lists, so that DSA is also covered.

Link to the discussion on a previous approach:
https://patchwork.kernel.org/project/netdevbpf/patch/20220715232641.952532-1-vladimir.oltean@nxp.com/

Fixes: 2b86cb829976 ("net: dsa: declare lockless TX feature for slave ports")
Reported-by: Brian Hutchinson <b.hutchman@...il.com>
Signed-off-by: Vladimir Oltean <vladimir.oltean@....com>
---
 net/sched/sch_generic.c | 37 +++++++++++++++++++++++++++++++++----
 1 file changed, 33 insertions(+), 4 deletions(-)

diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c
index cc6eabee2830..fb964e2b0436 100644
--- a/net/sched/sch_generic.c
+++ b/net/sched/sch_generic.c
@@ -427,20 +427,49 @@ void __qdisc_run(struct Qdisc *q)
 
 unsigned long dev_trans_start(struct net_device *dev)
 {
+	struct net_device *lower;
+	struct list_head *iter;
 	unsigned long val, res;
+	bool have_lowers;
 	unsigned int i;
 
-	if (is_vlan_dev(dev))
-		dev = vlan_dev_real_dev(dev);
-	else if (netif_is_macvlan(dev))
-		dev = macvlan_dev_real_dev(dev);
+	rcu_read_lock();
+
+	/* Stacked network interfaces usually have NETIF_F_LLTX so
+	 * netdev_start_xmit() -> txq_trans_update() fails to do anything,
+	 * because they don't lock the TX queue. Calling dev_trans_start() on a
+	 * virtual device makes little sense, since it is a mechanism intended
+	 * for the TX watchdog. That notwithstanding, layers such as the
+	 * bonding arp monitor may still use dev_trans_start() on slave
+	 * interfaces, probably to see if any transmission took place in the
+	 * last ARP interval. This use is antiquated, however we don't know
+	 * what to replace it with. While we can't solve the general case of
+	 * virtual interfaces, for stackable ones (vlan, macvlan, DSA or
+	 * potentially stacked combinations), we can work around by returning
+	 * the trans_start of the physical, real device backing them. In this
+	 * case, walk the adjacency lists all the way down, hoping that the
+	 * lower-most device won't have NETIF_F_LLTX.
+	 */
+	do {
+		have_lowers = false;
+
+		netdev_for_each_lower_dev(dev, lower, iter) {
+			have_lowers = true;
+			dev = lower;
+			break;
+		}
+	} while (have_lowers);
+
 	res = READ_ONCE(netdev_get_tx_queue(dev, 0)->trans_start);
+
 	for (i = 1; i < dev->num_tx_queues; i++) {
 		val = READ_ONCE(netdev_get_tx_queue(dev, i)->trans_start);
 		if (val && time_after(val, res))
 			res = val;
 	}
 
+	rcu_read_unlock();
+
 	return res;
 }
 EXPORT_SYMBOL(dev_trans_start);
-- 
2.34.1

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ