netdev - Re: [RFC PATCH 0/4] net: dsa: link aggregation support

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Date:   Tue, 10 Nov 2020 20:28:57 -0800
From:   Florian Fainelli <f.fainelli@...il.com>
To:     Tobias Waldekranz <tobias@...dekranz.com>, andrew@...n.ch,
        vivien.didelot@...il.com, olteanv@...il.com
Cc:     netdev@...r.kernel.org
Subject: Re: [RFC PATCH 0/4] net: dsa: link aggregation support



On 10/27/2020 3:51 AM, Tobias Waldekranz wrote:
> This series starts by adding the generic support required to offload
> link aggregates to drivers built on top of the DSA subsystem. It then
> implements offloading for the mv88e6xxx driver, i.e. Marvell's
> LinkStreet family.
> 
> Posting this as an RFC as there are some topics that I would like
> feedback on before going further with testing. Thus far I've done some
> basic tests to verify that:
> 
> - A LAG can be used as a regular interface.
> - Bridging a LAG with other DSA ports works as expected.
> - Load balancing is done by the hardware, both in single- and
>   multi-chip systems.
> - Load balancing is dynamically reconfigured when the state of
>   individual links change.
> 
> Testing as been done on two systems:
> 
> 1. Single-chip system with one Peridot (6390X).
> 2. Multi-chip system with one Agate (6352) daisy-chained with an Opal
>    (6097F).
> 
> I would really appreciate feedback on the following:
> 
> All LAG configuration is cached in `struct dsa_lag`s. I realize that
> the standard M.O. of DSA is to read back information from hardware
> when required. With LAGs this becomes very tricky though. For example,
> the change of a link state on one switch will require re-balancing of
> LAG hash buckets on another one, which in turn depends on the total
> number of active links in the LAG. Do you agree that this is
> motivated?

Yes, this makes sense, I did something quite similar in this branch
nearly 3 years ago, it was tested to the point where the switch was
programmed correctly but I had not configured the CPU port to support
2Gbits/sec (doh) to verify that we got 2x the desired throughput:

https://github.com/ffainelli/linux/commits/b53-bond

Your patch looks definitively more complete.

> 
> The LAG driver ops all receive the LAG netdev as an argument when this
> information is already available through the port's lag pointer. This
> was done to match the way that the bridge netdev is passed to all VLAN
> ops even though it is in the port's bridge_dev. Is there a reason for
> this or should I just remove it from the LAG ops?
> 
> At least on mv88e6xxx, the exact source port is not available when
> packets are received on the CPU. The way I see it, there are two ways
> around that problem:
> 
> - Inject the packet directly on the LAG device (what this series
>   does). Feels right because it matches all that we actually know; the
>   packet came in on the LAG. It does complicate dsa_switch_rcv
>   somewhat as we can no longer assume that skb->dev is a DSA port.
> 
> - Inject the packet on "the designated port", i.e. some port in the
>   LAG. This lets us keep the current Rx path untouched. The problem is
>   that (a) the port would have to be dynamically updated to match the
>   expectations of the LAG driver (team/bond) as links are
>   enabled/disabled and (b) we would be presenting a lie because
>   packets would appear to ingress on netdevs that they might not in
>   fact have been physically received on.
> 
> (mv88e6xxx) What is the policy regarding the use of DSA vs. EDSA?  It
> seems like all chips capable of doing EDSA are using that, except for
> the Peridot.
> 
> (mv88e6xxx) The cross-chip PVT changes required to allow a LAG to
> communicate with the other ports do not feel quite right, but I'm
> unsure about what the proper way of doing it would be. Any ideas?
> 
> (mv88e6xxx) Marvell has historically used the idiosyncratic term
> "trunk" to refer to link aggregates. Somewhere around the Peridot they
> have switched and are now referring to the same registers/tables using
> the term "LAG". In this series I've stuck to using LAG for all generic
> stuff, and only used trunk for driver-internal functions. Do we want
> to rename everything to use the LAG nomenclature?

Yes please!
-- 
Florian