Message-ID: <20220802014553.rtyzpkdvwnqje44l@skbuf>
Date: Tue, 2 Aug 2022 01:45:54 +0000
From: Vladimir Oltean <vladimir.oltean@....com>
To: Jay Vosburgh <jay.vosburgh@...onical.com>
CC: "netdev@...r.kernel.org" <netdev@...r.kernel.org>,
"David S. Miller" <davem@...emloft.net>,
Eric Dumazet <edumazet@...gle.com>,
Jakub Kicinski <kuba@...nel.org>,
Paolo Abeni <pabeni@...hat.com>, Andrew Lunn <andrew@...n.ch>,
Vivien Didelot <vivien.didelot@...il.com>,
Florian Fainelli <f.fainelli@...il.com>,
Jonathan Toppins <jtoppins@...hat.com>,
Veaceslav Falico <vfalico@...il.com>,
Andy Gospodarek <andy@...yhouse.net>,
Hangbin Liu <liuhangbin@...il.com>,
Jamal Hadi Salim <jhs@...atatu.com>,
Cong Wang <xiyou.wangcong@...il.com>,
Jiri Pirko <jiri@...nulli.us>,
Nikolay Aleksandrov <razor@...ckwall.org>,
Stephen Hemminger <stephen@...workplumber.org>
Subject: Re: [PATCH v3 net 1/4] net: bonding: replace dev_trans_start() with
the jiffies of the last ARP/NS
On Mon, Aug 01, 2022 at 06:04:55PM -0700, Jay Vosburgh wrote:
> The ARP monitor in general is pretty easy to fool (which was
> part of the impetus for adding the "arp_validate" logic). Ultimately it
> was simpler to have the ARP monitor logic be "interface sent something
> AND received an appropriate ARP" since the "sent something" came for
> free from ->trans_start (which over time became less useful for this
> purpose).
>
> And, I'm not saying your patch isn't better, rather that what I
> was intending to do is minimize the change in behavior. My concern is
> that some change in semantics will break existing configurations that
> rely on the old behavior. Then, the question becomes whether the broken
> configuration was reasonable or not.
>
> I haven't thought of anything that seems reasonable thus far;
> the major change looks to be that the new logic in your patch presumes
> that arp_xmit cannot fail, so if some device was discarding all TX
> packets, the new logic would update "last_rx" regardless.
I don't see why it matters that my logic presumes arp_xmit cannot fail.
The bonding driver's logic doesn't put an equality sign anywhere between
"arp_xmit succeeded" and "the packet actually reached the target".
If it did, it would have a very broken understanding of Ethernet networks.
For all practical purposes, updating last_tx in the bonding driver
rather than letting the qdisc do it should make no perceptible
difference at all, even if the driver were to drop all TX packets as you
say.
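To make the argument concrete, here is a minimal userspace sketch of the idea behind the patch (not the actual kernel code; struct and function names are illustrative stand-ins): the bond stamps last_tx itself at the point where it hands the probe to the stack, and the monitor only compares timestamps.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for kernel state; names are illustrative only. */
static unsigned long jiffies;          /* monotonic tick counter */

struct slave {
	unsigned long last_tx;         /* jiffies of the last ARP/NS we sent */
	unsigned long last_rx;         /* jiffies of the last validated reply */
};

/* The bond records that *it* attempted a probe, at the place where
 * arp_xmit()/ndisc_send_ns() would be called. Whether the device later
 * drops the packet is irrelevant to this timestamp, just as it was
 * irrelevant to dev_trans_start(). */
static void bond_send_probe(struct slave *slave)
{
	slave->last_tx = jiffies;
}

/* Monitor-side check: the slave is considered up only if we both sent a
 * probe recently AND received an appropriate reply since. */
static bool slave_link_ok(const struct slave *slave, unsigned long interval)
{
	return jiffies - slave->last_tx <= interval &&
	       jiffies - slave->last_rx <= interval;
}
```

The "sent something" half of the check carries no delivery guarantee in either scheme; the reply timestamp is what actually validates the link.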
> >The problem with dev_get_stats() is that it will contain hardware
> >statistics, which may be completely unrelated to the number of packets
> >software has sent. DSA can offload the Linux bridge and the bonding
> >driver as a bridge port, so dev_get_stats() on a physical port will
> >return the total number of packets that egressed that port, even without
> >CPU intervention. Again, even easier to fool if "have we sent an ARP"
> >is what the bonding driver actually wants to know.
>
> I'm not well versed in how DSA works, but if a physical port is,
> well, discrete from the system to a degree, why would it be part of a
> bond running on the host?
>
> Put another way, what sort of topology / plumbing makes sense
> for DSA in conjunction with bonding?
I won't insist on the details because it's going to be a never-ending
story otherwise. The point with DSA and switchdev in general is that,
even though we're talking about network-accelerating chips that are
"discrete from the system to a degree", they are integrated into the
network stack such that a user wouldn't know that they're running on
dedicated switching hardware rather than plain old software forwarding
between any other physical Ethernet cards. The deployment scripts would
look more or less the same; they would just work faster and with less
CPU involvement.
If you want to enable autonomous forwarding between 2 switch ports,
those 2 ports have net devices, which you put under the same plain old
bridge device. If you want to add one more logical port to the bridging
domain, comprised of 2 physical ports in a LAG, you create a bonding
interface and put the net devices of those 2 other physical ports under
the bond, then you put the bond under the bridge. The DSA or switchdev
driver monitors the netdev adjacency events that take place and takes
note, in a "transparent offload" manner.
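As a concrete illustration of the setup described above, using standard iproute2 commands (the swp0..swp3 interface names and the 802.3ad mode are hypothetical examples, not taken from the thread):

```shell
# Autonomous forwarding between two switch ports: put their net devices
# under a plain bridge.
ip link add br0 type bridge
ip link set swp0 master br0
ip link set swp1 master br0

# One more logical bridge port made of two physical ports in a LAG:
# create a bond, enslave the ports (down first), then put the bond
# under the bridge.
ip link add bond0 type bond mode 802.3ad
ip link set swp2 down
ip link set swp2 master bond0
ip link set swp3 down
ip link set swp3 master bond0
ip link set bond0 master br0

# Bring everything up; the DSA/switchdev driver observes the netdev
# adjacency events and offloads the forwarding transparently.
ip link set swp2 up
ip link set swp3 up
ip link set bond0 up
ip link set br0 up
```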
When a physical port is (directly or indirectly) under a bridge device,
traffic egressing it will be a mix of traffic originated by the host and
traffic forwarded from other ports in the same bridge/VLAN that must
reach it.
The same overall goal of "transparent offload" is what makes it
desirable for DSA and switchdev drivers to report, via dev_get_stats(),
port counters that are actually a mix of autonomously forwarded flows
and host-terminated flows.
The conclusion is that using dev_get_stats() is just about as
inconclusive as dropping the tracking of the last ARP TX time
altogether, since with offloading drivers, the hardware counters are
about as far from the answer as a metric can get (you can have tens of
millions of packets per second forwarded without software intervention,
and the bonding driver would think: oh, how many ARPs!).
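A toy model makes the false positive obvious. This is a sketch under the assumption stated above, that a switchdev port's tx_packets, as dev_get_stats() would report it, includes autonomously forwarded frames (the struct and function names are made up for illustration):

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of a switchdev port: its aggregate TX counter covers both
 * offloaded bridge traffic and frames the host software actually sent. */
struct port {
	unsigned long long hw_forwarded;   /* autonomously forwarded frames */
	unsigned long long sw_sent;        /* frames originated by the host */
};

/* What a dev_get_stats()-style query would see on such hardware. */
static unsigned long long tx_packets(const struct port *p)
{
	return p->hw_forwarded + p->sw_sent;
}

/* The counter-delta heuristic: "did we transmit since the last check?" */
static bool sent_something(unsigned long long prev, const struct port *p)
{
	return tx_packets(p) != prev;
}
```

If the hardware forwards a burst of traffic while the host sends nothing at all, the heuristic still answers "yes, we sent something", which is exactly the failure mode described above.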