netdev - Re: [RFC PATCH 0/4] net: dsa: link aggregation support

Message-ID: <878sbrv19i.fsf@waldekranz.com>
Date:   Tue, 27 Oct 2020 21:53:45 +0100
From:   Tobias Waldekranz <tobias@...dekranz.com>
To:     Vladimir Oltean <olteanv@...il.com>
Cc:     Andrew Lunn <andrew@...n.ch>, Marek Behun <marek.behun@....cz>,
        vivien.didelot@...il.com, f.fainelli@...il.com,
        netdev@...r.kernel.org
Subject: Re: [RFC PATCH 0/4] net: dsa: link aggregation support

On Tue, Oct 27, 2020 at 22:02, Vladimir Oltean <olteanv@...il.com> wrote:
> On Tue, Oct 27, 2020 at 08:37:58PM +0100, Tobias Waldekranz wrote:
>> >> In order for this to work on transmit, we need to add forward offloading
>> >> to the bridge so that we can, for example, send one FORWARD from the CPU
>> >> to send an ARP broadcast to swp1..4 instead of four FROM_CPUs.
>
> [...]
>
>> In a single-chip system I agree that it is not needed, the CPU can do
>> the load-balancing in software. But in order to have the hardware do
>> load-balancing on a switch-to-switch LAG, you need to send a FORWARD.
>> 
>> FROM_CPUs would just follow whatever is in the device mapping table. You
>> essentially have the inverse of the TO_CPU problem, but on Tx FROM_CPU
>> would make up 100% of traffic.
>
> Woah, hold on, could you explain in more detail for non-expert people
> like myself to understand.
>
> So FROM_CPU frames (what tag_edsa.c uses now in xmit) can encode a
> _single_ destination port in the frame header.

Correct.

> Whereas the FORWARD frames encode a _source_ port in the frame header.
> You inject FORWARD frames from the CPU port, and you just let the L2
> forwarding process select the adequate destination ports (or LAG, if
> any ports are under one) _automatically_. The reason why you do this, is
> because you want to take advantage of the switch's flooding abilities in
> order to replicate the packet into 4 packets. So you will avoid cloning
> that packet in the bridge in the first place.

Exactly so.
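To make the difference concrete, here is a toy model of a single switch chip. The class, port numbers and FDB layout are assumptions for illustration, not anything from tag_edsa.c or mv88e6xxx. A FROM_CPU frame names one destination port, so a broadcast to swp1..4 costs the CPU four frames. A FORWARD frame goes through the normal L2 lookup and is replicated by the switch.

```python
from dataclasses import dataclass, field


@dataclass
class Switch:
    """Toy model of one switch chip; ports and FDB contents are hypothetical."""
    cpu_port: int
    user_ports: set[int]
    fdb: dict[str, int] = field(default_factory=dict)  # MAC -> port

    def xmit_from_cpu(self, dst_port: int) -> list[int]:
        # FROM_CPU: the tag names exactly one destination port.
        # No FDB lookup and no replication in hardware.
        return [dst_port]

    def xmit_forward(self, dst_mac: str) -> list[int]:
        # FORWARD: the tag only names the source (the CPU port here), and the
        # switch runs its normal lookup. Unknown or broadcast destinations are
        # flooded to all user ports (the CPU ingress port is not in that set).
        if dst_mac in self.fdb:
            return [self.fdb[dst_mac]]
        return sorted(self.user_ports)


sw = Switch(cpu_port=0, user_ports={1, 2, 3, 4})
BCAST = "ff:ff:ff:ff:ff:ff"

# FROM_CPU: the bridge has to clone the ARP broadcast once per port.
from_cpu_frames = [sw.xmit_from_cpu(p) for p in sorted(sw.user_ports)]
# FORWARD: one frame from the CPU, replicated by the switch.
forward_copies = sw.xmit_forward(BCAST)

print(len(from_cpu_frames))  # 4
print(forward_copies)        # [1, 2, 3, 4]
```

The same model also shows the concern raised next: the FORWARD result depends entirely on what `fdb` contains. If it is out of sync with the bridge, the hardware makes a different decision than software would.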

> But correct me if I'm wrong, sending a FORWARD frame from the CPU is a
> slippery slope, since you're never sure that the switch will perform the
> replication exactly as you intended to. The switch will replicate a
> FORWARD frame by looking up the FDB, and we don't even attempt in DSA to
> keep the FDB in sync between software and hardware. And that's why we
> send FROM_CPU frames in tag_edsa.c and not FORWARD frames.

I'm not sure if I agree that it's a slippery slope. The whole point of
the switchdev effort is to sync the switch with the bridge. We trust the
fabric to do all the steps you describe for _all_ other ports.

> What you are really looking for is hardware where the destination field
> for FROM_CPU packets is not a single port index, but a port mask.
>
> Right?

Sure, if that's available it's great. Chips from Marvell's Prestera line
can do this, and many others I'm sure. Alas, LinkStreet devices cannot,
and I still want the best performance I can get in that case.

> Also, this problem is completely orthogonal to LAG? Where does LAG even
> come into play here?

It matters if you set up switch-to-switch LAGs. FROM_CPU packets encode
the final device/port, and switches will forward those packets according
to their device mapping tables, which select a _single_ local port to
use to reach a remote device/port. So all FROM_CPU packets to a given
device/port will always travel through the same set of ports.

In the FORWARD case, you look up the destination in the FDB of each
device, find that it is located on the other side of a LAG, and the
hardware will perform load-balancing.
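The contrast can be sketched with a hypothetical two-switch topology. Switch 0 holds the CPU and reaches switch 1 over a LAG made of local ports 9 and 10. `DEVICE_MAP`, `LAG_MEMBERS` and the XOR hash are assumptions. The XOR is a toy stand-in, not the chip's real hash. The point is only that the device map is keyed on the target device, while the FDB path can hash per flow across LAG members.

```python
# Hypothetical topology: switch 0 (with the CPU) reaches switch 1 over a
# two-port LAG made of local ports 9 and 10.
DEVICE_MAP = {1: 9}              # target device -> the one local port used to reach it
LAG_MEMBERS = {"lag1": [9, 10]}  # LAG id -> member ports

# Stations on switch 1: MAC -> (device, port). Switch 0's FDB has them behind the LAG.
STATIONS = {f"00:00:00:00:01:{p:02x}": (1, p) for p in range(8)}
FDB = {mac: "lag1" for mac in STATIONS}
CPU_MAC = "00:00:00:00:00:00"


def last_octet(mac: str) -> int:
    return int(mac.split(":")[-1], 16)


def egress_from_cpu(target_dev: int, target_port: int) -> int:
    # FROM_CPU: the tag names the final device/port. Intermediate hops only
    # consult the device mapping table, keyed on the device, so target_port
    # plays no part in choosing the inter-switch link.
    return DEVICE_MAP[target_dev]


def egress_forward(src_mac: str, dst_mac: str) -> int:
    # FORWARD: an ordinary FDB lookup. When it resolves to a LAG, a member is
    # picked per flow using a toy hash (XOR of the last octets).
    members = LAG_MEMBERS[FDB[dst_mac]]
    flow_hash = last_octet(src_mac) ^ last_octet(dst_mac)
    return members[flow_hash % len(members)]


from_cpu_ports = {egress_from_cpu(*STATIONS[mac]) for mac in STATIONS}
forward_ports = {egress_forward(CPU_MAC, mac) for mac in STATIONS}
print(from_cpu_ports)  # {9}
print(forward_ports)   # {9, 10}
```

With the same eight destinations, every FROM_CPU frame crosses port 9. The FORWARD frames are spread over both LAG members.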

>> Other than that there are some things that, while strictly speaking
>> possible to do without FORWARDs, become much easier to deal with:
>> 
>> - Multicast routing. This is one case where performance _really_ suffers
>>   from having to skb_clone() to each recipient.
>> 
>> - Bridging between virtual interfaces and DSA ports. Typical example is
>>   an L2 VPN tunnel or one end of a veth pair. On FROM_CPUs, the switch
>>   can not perform SA learning, which means that once you bridge traffic
>>   from the VPN out to a DSA port, the return traffic will be classified
>>   as unknown unicast by the switch and be flooded everywhere.
>
> And how is this going to solve that problem? You mean that the switch
> learns only from FORWARD, but not from FROM_CPU?

Yes, so when you send the FORWARD the switch knows that the station is
located somewhere behind the CPU port. It does not know exactly where,
i.e. it has no knowledge of the VPN tunnel or anything. It just directs
it towards the CPU and the bridge's FDB will take care of the rest.
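A rough model of the learning difference follows. It assumes, as described above, that the switch learns source addresses from FORWARD frames but not from FROM_CPU frames. The class, port numbers and MAC addresses are made up for the example. A station that only exists behind a VPN tunnel is sent out of port 2, and the host on port 2 replies.

```python
CPU_PORT = 0  # hypothetical port numbering


class Switch:
    """Toy switch: learns source addresses only from frames it forwards itself."""

    def __init__(self, user_ports):
        self.ports = {CPU_PORT, *user_ports}
        self.fdb = {}  # MAC -> port

    def _lookup(self, in_port, dst_mac):
        # Known unicast goes to one port. Unknown unicast is flooded to every
        # port except the one it arrived on.
        if dst_mac in self.fdb:
            return [self.fdb[dst_mac]]
        return sorted(self.ports - {in_port})

    def from_cpu(self, dst_port):
        # FROM_CPU: destination fixed by the tag, no SA learning.
        return [dst_port]

    def forward_from_cpu(self, src_mac, dst_mac):
        # FORWARD: treated like any other ingress frame, so the source is
        # learned as reachable via the CPU port.
        self.fdb[src_mac] = CPU_PORT
        return self._lookup(CPU_PORT, dst_mac)

    def receive(self, in_port, src_mac, dst_mac):
        # Frame from a station on a user port: learn, then look up.
        self.fdb[src_mac] = in_port
        return self._lookup(in_port, dst_mac)


VPN_PEER, HOST = "02:00:00:00:00:aa", "02:00:00:00:00:02"

a = Switch(user_ports=[1, 2, 3, 4])
a.from_cpu(dst_port=2)                   # bridge sends VPN traffic out of port 2
flooded = a.receive(2, HOST, VPN_PEER)   # reply to an unknown station

b = Switch(user_ports=[1, 2, 3, 4])
b.forward_from_cpu(VPN_PEER, HOST)       # same traffic sent as FORWARD
directed = b.receive(2, HOST, VPN_PEER)  # switch knows the peer is behind the CPU

print(flooded)   # [0, 1, 3, 4]
print(directed)  # [0]
```

In the second case the switch only knows "behind the CPU port". The bridge's own FDB then decides that the frame belongs in the tunnel.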

> Why don't you attempt to solve this more generically somehow? Your
> switch is not the only one that can't perform source address learning
> for injected traffic, there are tons more, some are not even DSA. We
> can't have everybody roll their own solution.

Who said anything about rolling my own solution? I'm going for a generic
solution where a netdev can announce to the bridge it is being added to
that it can offload forwarding of packets for all ports belonging to the
same switchdev device. Most probably modeled after how the macvlan
offloading stuff is done.

In the case of mv88e6xxx that would kill two birds with one stone -
great! In other cases you might have to have the DSA subsystem listen to
new neighbors appearing on the bridge and sync those to hardware or
something. Hopefully someone working with that kind of hardware can
solve that problem.
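The bridge-side idea, where a port announces that its switchdev device can forward on behalf of all its ports, might look roughly like the following sketch. Everything here is hypothetical: `BridgePort`, the `fwd_offload` flag, `plan_flood` and the domain name `"switch0"` are illustrative and are not the kernel's bridge or switchdev API. For a locally originated flood, ports in an offloading hardware domain share one transmission (for example, a single FORWARD-tagged frame). All other ports still get their own clone.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class BridgePort:
    name: str
    hw_domain: Optional[str]   # switchdev device the port belongs to; None = software-only
    fwd_offload: bool = False  # hypothetical flag announced when the port joins the bridge


def plan_flood(ports):
    """Split a locally originated flood into per-domain offloads and per-port clones."""
    offloaded = defaultdict(list)  # hw_domain -> ports reached by one offloaded frame
    clones = []                    # ports that each need their own skb clone
    for p in ports:
        if p.hw_domain is not None and p.fwd_offload:
            offloaded[p.hw_domain].append(p.name)
        else:
            clones.append(p.name)
    return dict(offloaded), clones


ports = [BridgePort(f"swp{i}", "switch0", fwd_offload=True) for i in range(1, 5)]
ports.append(BridgePort("vpn0", None))

groups, clones = plan_flood(ports)
print(groups)                     # {'switch0': ['swp1', 'swp2', 'swp3', 'swp4']}
print(clones)                     # ['vpn0']
print(len(groups) + len(clones))  # 2
```

Two transmissions replace five. Hardware that cannot learn from injected traffic would still need the separate FDB-sync path mentioned above.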