netdev - Re: [PATCH net] net: dsa: fix bonding with ARP monitoring by updating trans

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20220725203125.kaxokkhyrb4aerp5@skbuf>
Date:   Mon, 25 Jul 2022 20:31:25 +0000
From:   Vladimir Oltean <vladimir.oltean@....com>
To:     Jakub Kicinski <kuba@...nel.org>
CC:     "netdev@...r.kernel.org" <netdev@...r.kernel.org>,
        "David S. Miller" <davem@...emloft.net>,
        Eric Dumazet <edumazet@...gle.com>,
        Paolo Abeni <pabeni@...hat.com>, Andrew Lunn <andrew@...n.ch>,
        Vivien Didelot <vivien.didelot@...il.com>,
        Florian Fainelli <f.fainelli@...il.com>,
        Jonathan Toppins <jtoppins@...hat.com>,
        Jay Vosburgh <j.vosburgh@...il.com>,
        Veaceslav Falico <vfalico@...il.com>,
        Hangbin Liu <liuhangbin@...il.com>,
        Brian Hutchinson <b.hutchman@...il.com>
Subject: Re: [PATCH net] net: dsa: fix bonding with ARP monitoring by updating
 trans_start manually

Hi Jakub,

On Sat, Jul 16, 2022 at 04:33:38PM -0700, Jakub Kicinski wrote:
> On Sat, 16 Jul 2022 13:30:10 +0000 Vladimir Oltean wrote:
> > I would need some assistance from Jay or other people more familiar with
> > bonding to do that. I'm not exactly clear which packets the bonding
> > driver wants to check they have been transmitted in the last interval:
> > ARP packets? any packets?
> 
> And why - stack has queued a packet to the driver, how is that useful
> to assess the fact that the link is up? Can bonding check instead
> whether any queue is running? Or maybe the whole thing is just a remnant
> of times before the centralized watchdog tx and should be dropped?

Yes, indeed, why? I don't know.

> > With DSA and switchdev drivers in general,
> > they have an offloaded forwarding path as well, so expect that what you
> > get through ndo_get_stats64 may also report packets which egressed a
> > physical port but weren't originated by the network stack.
> > I simply don't know what is a viable substitute for dev_trans_start()
> > because I don't understand very well what it intends to capture.
> 
> Looking thru the code I stumbled on the implementation of
> dev_trans_start() - it already takes care of some upper devices.
> Adding DSA handling there would offend my sensibilities slightly
> less, if you want to do that. At least it's not on the fast path.

How are your sensibilities feeling about this change?

diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c
index cc6eabee2830..b844eb0bde7e 100644
--- a/net/sched/sch_generic.c
+++ b/net/sched/sch_generic.c
@@ -427,20 +427,43 @@ void __qdisc_run(struct Qdisc *q)
 
 unsigned long dev_trans_start(struct net_device *dev)
 {
+	struct net_device *lower;
+	struct list_head *iter;
 	unsigned long val, res;
+	bool have_lowers;
 	unsigned int i;
 
-	if (is_vlan_dev(dev))
-		dev = vlan_dev_real_dev(dev);
-	else if (netif_is_macvlan(dev))
-		dev = macvlan_dev_real_dev(dev);
+	rcu_read_lock();
+
+	/* Stacked network interfaces usually have NETIF_F_LLTX so
+	 * netdev_start_xmit() -> txq_trans_update() fails to do anything,
+	 * because they don't lock the TX queue. However, layers such as the
+	 * bonding arp monitor may still use dev_trans_start() on slave
+	 * interfaces such as vlan, macvlan, DSA (or potentially stacked
+	 * combinations of the above). In this case, walk the adjacency lists
+	 * all the way down, hoping that the lower-most device won't have
+	 * NETIF_F_LLTX.
+	 */
+	do {
+		have_lowers = false;
+
+		netdev_for_each_lower_dev(dev, lower, iter) {
+			have_lowers = true;
+			dev = lower;
+			break;
+		}
+	} while (have_lowers);
+
 	res = READ_ONCE(netdev_get_tx_queue(dev, 0)->trans_start);
+
 	for (i = 1; i < dev->num_tx_queues; i++) {
 		val = READ_ONCE(netdev_get_tx_queue(dev, i)->trans_start);
 		if (val && time_after(val, res))
 			res = val;
 	}
 
+	rcu_read_unlock();
+
 	return res;
 }
 EXPORT_SYMBOL(dev_trans_start);