[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20220626141139.kbwhpgmwzp7rpxgy@skbuf>
Date: Sun, 26 Jun 2022 17:11:39 +0300
From: Vladimir Oltean <olteanv@...il.com>
To: Horatiu Vultur <horatiu.vultur@...rochip.com>
Cc: linux-kernel@...r.kernel.org, netdev@...r.kernel.org,
UNGLinuxDriver@...rochip.com, davem@...emloft.net,
edumazet@...gle.com, kuba@...nel.org, pabeni@...hat.com,
linux@...linux.org.uk
Subject: Re: [PATCH net-next 5/8] net: lan966x: Add lag support for lan966x.
Hi Horatiu,
Just casually browsing through the patches. A comment below.
On Sun, Jun 26, 2022 at 03:04:48PM +0200, Horatiu Vultur wrote:
> Add link aggregation hardware offload support for lan966x
>
> Signed-off-by: Horatiu Vultur <horatiu.vultur@...rochip.com>
> ---
> .../net/ethernet/microchip/lan966x/Makefile | 2 +-
> .../ethernet/microchip/lan966x/lan966x_lag.c | 296 ++++++++++++++++++
> .../ethernet/microchip/lan966x/lan966x_main.h | 28 ++
> .../microchip/lan966x/lan966x_switchdev.c | 78 ++++-
> 4 files changed, 388 insertions(+), 16 deletions(-)
> create mode 100644 drivers/net/ethernet/microchip/lan966x/lan966x_lag.c
>
> diff --git a/drivers/net/ethernet/microchip/lan966x/Makefile b/drivers/net/ethernet/microchip/lan966x/Makefile
> index fd2e0ebb2427..0c22c86bdaa9 100644
> --- a/drivers/net/ethernet/microchip/lan966x/Makefile
> +++ b/drivers/net/ethernet/microchip/lan966x/Makefile
> @@ -8,4 +8,4 @@ obj-$(CONFIG_LAN966X_SWITCH) += lan966x-switch.o
> lan966x-switch-objs := lan966x_main.o lan966x_phylink.o lan966x_port.o \
> lan966x_mac.o lan966x_ethtool.o lan966x_switchdev.o \
> lan966x_vlan.o lan966x_fdb.o lan966x_mdb.o \
> - lan966x_ptp.o lan966x_fdma.o
> + lan966x_ptp.o lan966x_fdma.o lan966x_lag.o
> diff --git a/drivers/net/ethernet/microchip/lan966x/lan966x_lag.c b/drivers/net/ethernet/microchip/lan966x/lan966x_lag.c
> new file mode 100644
> index 000000000000..c721a05d44d2
> --- /dev/null
> +++ b/drivers/net/ethernet/microchip/lan966x/lan966x_lag.c
> @@ -0,0 +1,296 @@
> +// SPDX-License-Identifier: GPL-2.0+
> +
> +#include "lan966x_main.h"
> +
> +static void lan966x_lag_set_aggr_pgids(struct lan966x *lan966x)
> +{
> + u32 visited = GENMASK(lan966x->num_phys_ports - 1, 0);
> + int p, lag, i;
> +
> + /* Reset destination and aggregation PGIDS */
> + for (p = 0; p < lan966x->num_phys_ports; ++p)
> + lan_wr(ANA_PGID_PGID_SET(BIT(p)),
> + lan966x, ANA_PGID(p));
> +
> + for (p = PGID_AGGR; p < PGID_SRC; ++p)
> + lan_wr(ANA_PGID_PGID_SET(visited),
> + lan966x, ANA_PGID(p));
> +
> + /* The visited ports bitmask holds the list of ports offloading any
> + * bonding interface. Initially we mark all these ports as unvisited,
> + * then every time we visit a port in this bitmask, we know that it is
> + * the lowest numbered port, i.e. the one whose logical ID == physical
> + * port ID == LAG ID. So we mark as visited all further ports in the
> + * bitmask that are offloading the same bonding interface. This way,
> + * we set up the aggregation PGIDs only once per bonding interface.
> + */
> + for (p = 0; p < lan966x->num_phys_ports; ++p) {
> + struct lan966x_port *port = lan966x->ports[p];
> +
> + if (!port || !port->bond)
> + continue;
> +
> + visited &= ~BIT(p);
> + }
> +
> + /* Now, set PGIDs for each active LAG */
> + for (lag = 0; lag < lan966x->num_phys_ports; ++lag) {
> + struct lan966x_port *port = lan966x->ports[lag];
> + int num_active_ports = 0;
> + struct net_device *bond;
> + unsigned long bond_mask;
> + u8 aggr_idx[16];
> +
> + if (!port || !port->bond || (visited & BIT(lag)))
> + continue;
> +
> + bond = port->bond;
> + bond_mask = lan966x_lag_get_mask(lan966x, bond, true);
> +
> + for_each_set_bit(p, &bond_mask, lan966x->num_phys_ports) {
> + lan_wr(ANA_PGID_PGID_SET(bond_mask),
> + lan966x, ANA_PGID(p));
> + aggr_idx[num_active_ports++] = p;
> + }
This incorrect logic seems to have been copied from ocelot from before
commit a14e6b69f393 ("net: mscc: ocelot: fix incorrect balancing with
down LAG ports").
The issue is that you calculate bond_mask with only_active_ports=true.
This means the for_each_set_bit() will not iterate through the inactive
LAG ports, and won't set the bond_mask as the PGID destination for those
ports.
That isn't what is desired; as explained in that commit, inactive LAG
ports should be removed via the aggregation PGIDs and not via the
destination PGIDs. Otherwise, an FDB entry targeted towards the
LAG (effectively towards the "primary" LAG port, whose logical port ID
gives the LAG ID) will not egress even the "secondary" LAG port if the
primary's link is down.
> +
> + for (i = PGID_AGGR; i < PGID_SRC; ++i) {
> + u32 ac;
> +
> + ac = lan_rd(lan966x, ANA_PGID(i));
> + ac &= ~bond_mask;
> + /* Don't do division by zero if there was no active
> + * port. Just make all aggregation codes zero.
> + */
> + if (num_active_ports)
> + ac |= BIT(aggr_idx[i % num_active_ports]);
> + lan_wr(ANA_PGID_PGID_SET(ac),
> + lan966x, ANA_PGID(i));
> + }
> +
> + /* Mark all ports in the same LAG as visited to avoid applying
> + * the same config again.
> + */
> + for (p = lag; p < lan966x->num_phys_ports; p++) {
> + struct lan966x_port *port = lan966x->ports[p];
> +
> + if (!port)
> + continue;
> +
> + if (port->bond == bond)
> + visited |= BIT(p);
> + }
> + }
> +}
Powered by blists - more mailing lists