Message-ID: <641d19c9-cd2d-fb63-de86-150d01bdb17e@gmail.com>
Date: Sat, 12 Dec 2020 19:48:59 -0800
From: Florian Fainelli <f.fainelli@...il.com>
To: Vladimir Oltean <vladimir.oltean@....com>,
Andrew Lunn <andrew@...n.ch>,
Vivien Didelot <vivien.didelot@...il.com>,
Jakub Kicinski <kuba@...nel.org>, netdev@...r.kernel.org,
linux-kernel@...r.kernel.org, bridge@...ts.linux-foundation.org,
Roopa Prabhu <roopa@...dia.com>,
Nikolay Aleksandrov <nikolay@...dia.com>,
"David S. Miller" <davem@...emloft.net>
Cc: DENG Qingfang <dqfext@...il.com>,
Tobias Waldekranz <tobias@...dekranz.com>,
Marek Behun <marek.behun@....cz>,
Russell King - ARM Linux admin <linux@...linux.org.uk>,
Alexandra Winter <wintera@...ux.ibm.com>,
Jiri Pirko <jiri@...nulli.us>,
Ido Schimmel <idosch@...sch.org>,
Claudiu Manoil <claudiu.manoil@....com>
Subject: Re: [PATCH v2 net-next 5/6] net: dsa: listen for
SWITCHDEV_{FDB,DEL}_ADD_TO_DEVICE on foreign bridge neighbors

On 12/12/2020 6:40 PM, Vladimir Oltean wrote:
> Some DSA switches (and not only) cannot learn source MAC addresses from
> packets injected from the CPU. They only perform hardware address
> learning from inbound traffic.
>
> This can be problematic when we have a bridge spanning some DSA switch
> ports and some non-DSA ports (which we'll call "foreign interfaces" from
> DSA's perspective).
>
> There are 2 classes of problems created by the lack of learning on
> CPU-injected traffic:
> - excessive flooding, due to the fact that DSA treats those addresses as
> unknown
> - the risk of stale routes, which can lead to temporary packet loss
>
> To illustrate the second class, consider the following situation, which
> is common in production equipment (wireless access points, where there
> is a WLAN interface and an Ethernet switch, and these form a single
> bridging domain).
>
> AP 1:
> +------------------------------------------------------------------------+
> |                                  br0                                   |
> +------------------------------------------------------------------------+
> +------------+ +------------+ +------------+ +------------+ +------------+
> |    swp0    | |    swp1    | |    swp2    | |    swp3    | |   wlan0    |
> +------------+ +------------+ +------------+ +------------+ +------------+
>       |                                                       ^        ^
>       |                                                       |        |
>       |                                                       |        |
>       |                                                   Client A Client B
>       |
>       |
>       |
> +------------+ +------------+ +------------+ +------------+ +------------+
> |    swp0    | |    swp1    | |    swp2    | |    swp3    | |   wlan0    |
> +------------+ +------------+ +------------+ +------------+ +------------+
> +------------------------------------------------------------------------+
> |                                  br0                                   |
> +------------------------------------------------------------------------+
> AP 2
>
> - br0 of AP 1 will know that Clients A and B are reachable via wlan0
> - the hardware fdb of a DSA switch driver today is not kept in sync with
> the software entries on other bridge ports, so it will not know that
> Clients A and B are reachable via the CPU port UNLESS the hardware
> switch itself performs SA learning from traffic injected from the CPU.
> Nonetheless, a substantial number of switches don't.
> - the hardware fdb of the DSA switch on AP 2 may autonomously learn that
> Clients A and B are reachable through swp0. Therefore, the software br0
> of AP 2 also may or may not learn this. In the example we're
> illustrating, some Ethernet traffic has been going on, and br0 from AP
> 2 has indeed learnt that it can reach Client B through swp0.
>
> One of the wireless clients, say Client B, disconnects from AP 1 and
> roams to AP 2. The topology now looks like this:
>
> AP 1:
> +------------------------------------------------------------------------+
> |                                  br0                                   |
> +------------------------------------------------------------------------+
> +------------+ +------------+ +------------+ +------------+ +------------+
> |    swp0    | |    swp1    | |    swp2    | |    swp3    | |   wlan0    |
> +------------+ +------------+ +------------+ +------------+ +------------+
>       |                                                           ^
>       |                                                           |
>       |                                                       Client A
>       |
>       |
>       |                                                       Client B
>       |                                                           |
>       |                                                           v
> +------------+ +------------+ +------------+ +------------+ +------------+
> |    swp0    | |    swp1    | |    swp2    | |    swp3    | |   wlan0    |
> +------------+ +------------+ +------------+ +------------+ +------------+
> +------------------------------------------------------------------------+
> |                                  br0                                   |
> +------------------------------------------------------------------------+
> AP 2
>
> - br0 of AP 1 still knows that Client A is reachable via wlan0 (no change)
> - br0 of AP 1 will (possibly) know that Client B has left wlan0. There
> are cases where it might never find out though. Either way, DSA today
> does not process that notification in any way.
> - the hardware FDB of the DSA switch on AP 1 may learn autonomously that
> Client B can be reached via swp0, if it receives any packet with
> Client B's source MAC address over Ethernet.
> - the hardware FDB of the DSA switch on AP 2 still thinks that Client B
> can be reached via swp0. It does not know that it has roamed to wlan0,
> because it doesn't perform SA learning from the CPU port.
>
> Now Client A contacts Client B.
> AP 1 routes the packet fine towards swp0 and delivers it on the Ethernet
> segment.
> AP 2 sees a frame on swp0 and its fdb says that the destination is swp0.
> Hairpinning is disabled => drop.
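
For illustration only, the drop on AP 2 can be modelled as a plain FDB
lookup followed by the hairpin check. This is a minimal userspace sketch,
not bridge or driver code: the types and the names fdb_lookup() and
frame_dropped() are hypothetical, and any MAC address used with it is
made up.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical port numbering for one AP in the example topology. */
enum port { SWP0, SWP1, SWP2, SWP3, CPU_PORT, PORT_NONE };

struct fdb_entry {
	unsigned char mac[6];
	enum port port;
};

/* Look up a destination MAC; PORT_NONE means "unknown, flood". */
static enum port fdb_lookup(const struct fdb_entry *fdb, size_t n,
			    const unsigned char *mac)
{
	for (size_t i = 0; i < n; i++)
		if (memcmp(fdb[i].mac, mac, 6) == 0)
			return fdb[i].port;
	return PORT_NONE;
}

/*
 * A frame is dropped when the FDB says the destination sits behind the
 * port the frame came in on and hairpinning is disabled.
 */
static bool frame_dropped(const struct fdb_entry *fdb, size_t n,
			  enum port ingress, const unsigned char *dst,
			  bool hairpin)
{
	enum port egress = fdb_lookup(fdb, n, dst);

	return egress == ingress && !hairpin;
}
```

With a stale entry that maps Client B to SWP0, a frame for Client B that
arrives on SWP0 is dropped. Once the entry points at CPU_PORT instead, the
same frame is forwarded.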
>
> This problem comes from the fact that these switches have a 'blind spot'
> for addresses coming from software bridging. The generic solution is not
> to assume that hardware learning can be enabled somehow, but to listen
> to more bridge learning events. It turns out that the bridge driver does
> learn in software from all inbound frames, in __br_handle_local_finish.
> A proper SWITCHDEV_FDB_ADD_TO_DEVICE notification is emitted for the
> addresses serviced by the bridge on 'foreign' interfaces. The software
> bridge also does the right thing on migration, by notifying that the old
> entry is deleted, so that does not need to be special-cased in DSA. When
> it is deleted, we just need to delete our static FDB entry towards the
> CPU too, and wait.
>
> The problem is that DSA currently only cares about SWITCHDEV_FDB_ADD_TO_DEVICE
> events received on its own interfaces, such as static FDB entries.
>
> Luckily we can change that, and DSA can listen to all switchdev FDB
> add/del events in the system and figure out if those events were emitted
> by a bridge that spans at least one of DSA's own ports. In case that is
> true, DSA will also offload that address towards its own CPU port, in
> the eventuality that there might be bridge clients attached to the DSA
> switch who want to talk to the station connected to the foreign
> interface.
>
> In terms of implementation, we need to keep the fdb_info->added_by_user
> check for the case where the switchdev event was targeted directly at a
> DSA switch port. But we don't need to look at that flag for snooped
> events. So the check currently happens too late; we need to move it earlier.
> This also simplifies the code a bit, since we avoid uselessly allocating
> and freeing switchdev_work.
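
As a rough sketch of the filtering described above (and only that: this
is not the code from the patch), the decision can be written as a pure
function that runs before any work item would be allocated. Everything
prefixed with sketch_ is hypothetical. Only the added_by_user flag and
the learning_broken_on_cpu_port opt-in are names taken from the patch
description.

```c
#include <stdbool.h>

/* Hypothetical, simplified stand-in for a switchdev FDB event. */
struct sketch_fdb_event {
	bool dev_is_dsa_port;       /* event targeted directly at a DSA port */
	bool bridge_spans_dsa_port; /* emitting bridge has >= 1 DSA port */
	bool added_by_user;         /* mirrors fdb_info->added_by_user */
};

enum sketch_action {
	SKETCH_IGNORE,
	SKETCH_OFFLOAD_TO_PORT,	/* entry on the DSA port itself */
	SKETCH_OFFLOAD_TO_CPU,	/* entry pointing at the CPU port */
};

static enum sketch_action
sketch_classify(const struct sketch_fdb_event *ev,
		bool learning_broken_on_cpu_port)
{
	/* Events on our own ports: keep honouring added_by_user. */
	if (ev->dev_is_dsa_port)
		return ev->added_by_user ? SKETCH_OFFLOAD_TO_PORT
					 : SKETCH_IGNORE;

	/* Snooped events only matter if the driver opted in. */
	if (!learning_broken_on_cpu_port)
		return SKETCH_IGNORE;

	/* Foreign interface in a bridge that has none of our ports. */
	if (!ev->bridge_spans_dsa_port)
		return SKETCH_IGNORE;

	/* added_by_user is deliberately not consulted here. */
	return SKETCH_OFFLOAD_TO_CPU;
}
```

Because the function returns SKETCH_IGNORE before anything is allocated,
it also shows why moving the check earlier avoids the useless
allocate-and-free of the work item.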
>
> We could probably do some improvements in the future. For example,
> multi-bridge support is rudimentary at the moment. If there are two
> bridges spanning a DSA switch's ports, and both of them need to service
> the same MAC address, then what will happen is that the migration of one
> of those stations will trigger the deletion of the FDB entry from the
> CPU port while it is still used by the other bridge. That could be improved
> with reference counting but is left for another time.
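
One possible shape for that reference counting, purely as a userspace
illustration and not something the patch implements: keep a count per
MAC address installed towards the CPU port, program the hardware on the
first reference, and delete only when the last one goes away. The
sketch_ names, the fixed-size table and the return conventions are all
assumptions made for this example.

```c
#include <stdbool.h>
#include <string.h>

#define SKETCH_MAX_ADDRS 8

/* Hypothetical table of addresses installed towards the CPU port. */
struct sketch_cpu_fdb {
	unsigned char mac[SKETCH_MAX_ADDRS][6];
	unsigned int refs[SKETCH_MAX_ADDRS];	/* 0 means the slot is free */
};

static int sketch_find(const struct sketch_cpu_fdb *t,
		       const unsigned char *mac)
{
	for (int i = 0; i < SKETCH_MAX_ADDRS; i++)
		if (t->refs[i] && memcmp(t->mac[i], mac, 6) == 0)
			return i;
	return -1;
}

/*
 * Take a reference. Returns 1 if the hardware entry must be installed
 * (first user), 0 if it already exists, -1 if the table is full.
 */
static int sketch_cpu_fdb_get(struct sketch_cpu_fdb *t,
			      const unsigned char *mac)
{
	int i = sketch_find(t, mac);

	if (i >= 0) {
		t->refs[i]++;
		return 0;
	}
	for (i = 0; i < SKETCH_MAX_ADDRS; i++) {
		if (!t->refs[i]) {
			memcpy(t->mac[i], mac, 6);
			t->refs[i] = 1;
			return 1;
		}
	}
	return -1;
}

/* Drop a reference. Returns true if the hardware entry must be deleted. */
static bool sketch_cpu_fdb_put(struct sketch_cpu_fdb *t,
			       const unsigned char *mac)
{
	int i = sketch_find(t, mac);

	if (i < 0)
		return false;	/* unknown address: nothing to delete */
	return --t->refs[i] == 0;
}
```

With two bridges servicing the same address, the first sketch_cpu_fdb_put()
returns false, so the entry stays in place for the bridge that still needs
it.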
>
> This behavior needs to be enabled at driver level by setting
> ds->learning_broken_on_cpu_port = true. This is because we don't want to
> inflict a potential performance penalty (accesses through MDIO/I2C/SPI
> are expensive) to hardware that really doesn't need it because address
> learning on the CPU port works there.
>
> Reported-by: DENG Qingfang <dqfext@...il.com>
> Signed-off-by: Vladimir Oltean <vladimir.oltean@....com>

Reviewed-by: Florian Fainelli <f.fainelli@...il.com>

The implementation is much simpler than I thought it would be, nice! Just
in case you need to spin a v2, I would probably name the flag
"learning_on_cpu_port_challenged", or preferably
"no_learning_on_cpu_port"; the term "broken" is a bit subjective IMHO
(although honestly, why not learn from the CPU port...)
--
Florian