[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20220725203125.kaxokkhyrb4aerp5@skbuf>
Date: Mon, 25 Jul 2022 20:31:25 +0000
From: Vladimir Oltean <vladimir.oltean@....com>
To: Jakub Kicinski <kuba@...nel.org>
CC: "netdev@...r.kernel.org" <netdev@...r.kernel.org>,
"David S. Miller" <davem@...emloft.net>,
Eric Dumazet <edumazet@...gle.com>,
Paolo Abeni <pabeni@...hat.com>, Andrew Lunn <andrew@...n.ch>,
Vivien Didelot <vivien.didelot@...il.com>,
Florian Fainelli <f.fainelli@...il.com>,
Jonathan Toppins <jtoppins@...hat.com>,
Jay Vosburgh <j.vosburgh@...il.com>,
Veaceslav Falico <vfalico@...il.com>,
Hangbin Liu <liuhangbin@...il.com>,
Brian Hutchinson <b.hutchman@...il.com>
Subject: Re: [PATCH net] net: dsa: fix bonding with ARP monitoring by updating
trans_start manually
Hi Jakub,
On Sat, Jul 16, 2022 at 04:33:38PM -0700, Jakub Kicinski wrote:
> On Sat, 16 Jul 2022 13:30:10 +0000 Vladimir Oltean wrote:
> > I would need some assistance from Jay or other people more familiar with
> > bonding to do that. I'm not exactly clear which packets the bonding
> > driver wants to check they have been transmitted in the last interval:
> > ARP packets? any packets?
>
> And why - stack has queued a packet to the driver, how is that useful
> to assess the fact that the link is up? Can bonding check instead
> whether any queue is running? Or maybe the whole thing is just a remnant
> of times before the centralized watchdog tx and should be dropped?
Yes, indeed, why? I don't know.
> > With DSA and switchdev drivers in general,
> > they have an offloaded forwarding path as well, so expect that what you
> > get through ndo_get_stats64 may also report packets which egressed a
> > physical port but weren't originated by the network stack.
> > I simply don't know what is a viable substitute for dev_trans_start()
> > because I don't understand very well what it intends to capture.
>
> Looking thru the code I stumbled on the implementation of
> dev_trans_start() - it already takes care of some upper devices.
> Adding DSA handling there would offend my sensibilities slightly
> less, if you want to do that. At least it's not on the fast path.
How are your sensibilities feeling about this change?
diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c
index cc6eabee2830..b844eb0bde7e 100644
--- a/net/sched/sch_generic.c
+++ b/net/sched/sch_generic.c
@@ -427,20 +427,43 @@ void __qdisc_run(struct Qdisc *q)
unsigned long dev_trans_start(struct net_device *dev)
{
+ struct net_device *lower;
+ struct list_head *iter;
unsigned long val, res;
+ bool have_lowers;
unsigned int i;
- if (is_vlan_dev(dev))
- dev = vlan_dev_real_dev(dev);
- else if (netif_is_macvlan(dev))
- dev = macvlan_dev_real_dev(dev);
+ rcu_read_lock();
+
+ /* Stacked network interfaces usually have NETIF_F_LLTX so
+ * netdev_start_xmit() -> txq_trans_update() fails to do anything,
+ * because they don't lock the TX queue. However, layers such as the
+ * bonding arp monitor may still use dev_trans_start() on slave
+ * interfaces such as vlan, macvlan, DSA (or potentially stacked
+ * combinations of the above). In this case, walk the adjacency lists
+ * all the way down, hoping that the lower-most device won't have
+ * NETIF_F_LLTX.
+ */
+ do {
+ have_lowers = false;
+
+ netdev_for_each_lower_dev(dev, lower, iter) {
+ have_lowers = true;
+ dev = lower;
+ break;
+ }
+ } while (have_lowers);
+
res = READ_ONCE(netdev_get_tx_queue(dev, 0)->trans_start);
+
for (i = 1; i < dev->num_tx_queues; i++) {
val = READ_ONCE(netdev_get_tx_queue(dev, i)->trans_start);
if (val && time_after(val, res))
res = val;
}
+ rcu_read_unlock();
+
return res;
}
EXPORT_SYMBOL(dev_trans_start);
Powered by blists - more mailing lists