Message-ID: <20220731191327.cey4ziiez5tvcxpy@skbuf>
Date: Sun, 31 Jul 2022 19:13:28 +0000
From: Vladimir Oltean <vladimir.oltean@....com>
To: Jay Vosburgh <jay.vosburgh@...onical.com>
CC: "netdev@...r.kernel.org" <netdev@...r.kernel.org>,
"David S. Miller" <davem@...emloft.net>,
Eric Dumazet <edumazet@...gle.com>,
Jakub Kicinski <kuba@...nel.org>,
Paolo Abeni <pabeni@...hat.com>, Andrew Lunn <andrew@...n.ch>,
Vivien Didelot <vivien.didelot@...il.com>,
Florian Fainelli <f.fainelli@...il.com>,
Jonathan Toppins <jtoppins@...hat.com>,
Veaceslav Falico <vfalico@...il.com>,
Andy Gospodarek <andy@...yhouse.net>,
Hangbin Liu <liuhangbin@...il.com>,
Jamal Hadi Salim <jhs@...atatu.com>,
Cong Wang <xiyou.wangcong@...il.com>,
Jiri Pirko <jiri@...nulli.us>,
Nikolay Aleksandrov <razor@...ckwall.org>,
Stephen Hemminger <stephen@...workplumber.org>
Subject: Re: [PATCH v3 net 1/4] net: bonding: replace dev_trans_start() with
 the jiffies of the last ARP/NS

Hello Jay,

On Sun, Jul 31, 2022 at 11:53:55AM -0700, Jay Vosburgh wrote:
> Vladimir Oltean <vladimir.oltean@....com> wrote:
>
> >The bonding driver piggybacks on time stamps kept by the network stack
> >for the purpose of the netdev TX watchdog, and this is problematic
> >because it does not work with NETIF_F_LLTX devices.
> >
> >It is hard to say why the driver looks at dev_trans_start() of the
> >slave->dev, considering that this is updated even by non-ARP/NS probes
> >sent by us, and even by traffic not sent by us at all (for example PTP
> >on physical slave devices). ARP monitoring in active-backup mode appears
> >to still work even if we track only the last TX time of actual ARP
> >probes.
>
> Because it's the closest it can get to "have we sent an ARP," really.

Does it really track that? To me it seems pretty easy to fool.
I don't see why keeping a last_tx the way my patch does wouldn't be
better.

> The issue with LLTX is relatively new (the bonding driver has worked
> this way for longer than I've been involved, so I don't know what the
> original design decisions were).
>
> FWIW, I've been working with the following, which is closer in
> spirit to what Jakub and I discussed previously (i.e., inspecting the
> device stats for virtual devices, relying on dev_trans_start for
> physical devices with ndo_tx_timeout).
>
> This WIP includes one unrelated change: including the ifindex in
> the route lookup; that would be a separate patch if it ends up being
> submitted (it handles the edge case of a route on an interface other
> than the bond matching before the bond itself).

The problem with dev_get_stats() is that it will contain hardware
statistics, which may be completely unrelated to the number of packets
software has sent. DSA can offload the Linux bridge and the bonding
driver as a bridge port, so dev_get_stats() on a physical port will
return the total number of packets that egressed that port, even without
CPU intervention. Again, even easier to fool if "have we sent an ARP"
is what the bonding driver actually wants to know.
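
To make the comparison concrete, here is a toy userspace model of the
three signals discussed in this thread. It is only a sketch and not kernel
code: every model_* name is hypothetical, and the fields merely stand in
for what dev_trans_start(), dev_get_stats() and a per-slave last_tx would
report. It assumes a non-LLTX device, so that any software transmit
updates the watchdog time stamp.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of one bonding slave; not a kernel structure. */
struct model_slave {
	uint64_t trans_start;   /* stands in for dev_trans_start(): last TX of any kind */
	uint64_t hw_tx_packets; /* stands in for a hardware egress counter in dev_get_stats() */
	uint64_t last_probe_tx; /* time of the last ARP/NS probe sent by the bond */
};

/* Software transmits a frame through the slave at time 'now'. */
static void model_sw_xmit(struct model_slave *s, uint64_t now, bool is_probe)
{
	s->trans_start = now;
	s->hw_tx_packets++;
	/* Only real ARP/NS probes update the probe time stamp. */
	if (is_probe)
		s->last_probe_tx = now;
}

/* Hardware forwards a frame out of the port with no CPU involvement. */
static void model_hw_forward(struct model_slave *s)
{
	s->hw_tx_packets++;
}

/* True if 'stamp' is no more than 'interval' ticks older than 'now'. */
static bool model_recent(uint64_t stamp, uint64_t now, uint64_t interval)
{
	return now - stamp <= interval;
}
```

In this model, a single non-probe transmit (PTP, say) followed by one
hardware-forwarded frame makes both trans_start and the packet counter
look alive, while last_probe_tx still correctly says that no ARP/NS probe
has gone out.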