Message-ID: <87blgnv4rt.fsf@waldekranz.com>
Date: Tue, 27 Oct 2020 20:37:58 +0100
From: Tobias Waldekranz <tobias@...dekranz.com>
To: Vladimir Oltean <olteanv@...il.com>
Cc: Andrew Lunn <andrew@...n.ch>, Marek Behun <marek.behun@....cz>,
vivien.didelot@...il.com, f.fainelli@...il.com,
netdev@...r.kernel.org
Subject: Re: [RFC PATCH 0/4] net: dsa: link aggregation support
On Tue, Oct 27, 2020 at 21:00, Vladimir Oltean <olteanv@...il.com> wrote:
> On Tue, Oct 27, 2020 at 07:25:16PM +0100, Tobias Waldekranz wrote:
>> > 1) trunk user ports, with team/bonding controlling it
>> > 2) trunk DSA ports, i.e. the ports between switches in a D in DSA setup
>> > 3) trunk CPU ports.
> [...]
>> I think that (2) and (3) are essentially the same problem, i.e. creating
>> LAGs out of DSA links, be they switch-to-switch or switch-to-cpu
>> connections. I think you are correct that the CPU port can not be a
>> LAG/trunk, but I believe that limitation only applies to TO_CPU packets.
>
> Which would still be ok? They are called "slow protocol PDUs" for a reason.
Oh yes, completely agree. That was the point I was trying to make :)
>> In order for this to work on transmit, we need to add forward offloading
>> to the bridge so that we can, for example, send one FORWARD from the CPU
>> to send an ARP broadcast to swp1..4 instead of four FROM_CPUs.
>
> That surely sounds like an interesting (and tough to implement)
> optimization to increase the throughput, but why would it be _needed_
> for things to work? What's wrong with 4 FROM_CPU packets?
We have internal patches that do this, and I can confirm that it is
tough :) I really would like to figure out a way to solve this that
would also be acceptable upstream. I have some ideas; it is on my TODO.
In a single-chip system I agree that it is not needed; the CPU can do
the load-balancing in software. But in order to have the hardware do
the load-balancing on a switch-to-switch LAG, you need to send a
FORWARD. FROM_CPUs would just follow whatever is in the device mapping
table. You essentially have the inverse of the TO_CPU problem, except
that on Tx, FROM_CPU would make up 100% of the traffic.
Other than that, there are some things that, while strictly speaking
possible to do without FORWARDs, become much easier to deal with:
- Multicast routing. This is one case where performance _really_ suffers
from having to skb_clone() to each recipient.
- Bridging between virtual interfaces and DSA ports. A typical example
  is an L2 VPN tunnel or one end of a veth pair. On FROM_CPUs, the
  switch cannot perform SA learning, which means that once you bridge
  traffic from the VPN out to a DSA port, the return traffic will be
  classified as unknown unicast by the switch and be flooded everywhere.