Message-ID: <CALW65jY8vvent1KmAnv2a9BTbmW5C8CHK0DpRRs73yk3L1RXLQ@mail.gmail.com>
Date: Sun, 5 Apr 2020 20:23:36 +0800
From: DENG Qingfang <dqfext@...il.com>
To: netdev@...r.kernel.org
Cc: Vivien Didelot <vivien.didelot@...il.com>,
Andrew Lunn <andrew@...n.ch>,
Florian Fainelli <f.fainelli@...il.com>,
Russell King <linux@...linux.org.uk>,
Chuanhong Guo <gch981213@...il.com>,
René van Dorst <opensource@...rst.com>,
John Crispin <john@...ozen.org>,
Hauke Mehrtens <hauke@...ke-m.de>,
Stijn Segers <foss@...atilesystems.org>,
riddlariddla@...mail.com
Subject: DSA breaks clients' roaming between switch port and host interfaces
Hello,
I found a bug in DSA that breaks roaming of Wi-Fi clients.
I set up 2 Wi-Fi routers as APs; both of them run kernel 5.4.30 and use DSA.
+-------------------------+          +-----------------------------+
|                         |          |                             |
|           AP1           |          |             AP2             |
|                     LAN2+--------->|LAN1                         |
|       10.0.0.1/24       |          |         10.0.0.2/24         |
|                         |          |                             |
|      MV88E6XXX DSA      |          |         MT7530 DSA          |
|                         |          |                             |
+-------------------------+          +-----------------------------+
             ^                                      ^
             |                                      |
             |                Roams                 |
             |       -------------------------------+
             |
   +---------+---------+
   |       Wi-Fi       |
   |      Client       |
   |                   |
   |    10.0.0.3/24    |
   +-------------------+
When the client roams from AP1 to AP2, it cannot ping AP1 anymore for
a few minutes, and vice versa.
Using "bridge fdb", I narrowed down the part that causes the problem.
When the client is connected to AP1, bridge fdb on AP2 shows:
<client's mac> dev lan1 master br-lan
<client's mac> dev lan1 vlan 1 self
It means AP2 should talk to the client via lan1, which is correct.
After the client roams to AP2, the problem appears:
<client's mac> dev wlan0 master br-lan
<client's mac> dev lan1 vlan 1 self
From the iproute2 man page: "self" means the address is associated with
the port drivers fdb. Usually hardware.
The lan1 entry is still there, which means the kernel has updated the
forwarding table in br-lan, but did not delete the stale entry in the
switch hardware.
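To spot this condition without reading the table by eye, here is a minimal Python sketch. It assumes the text format of the "bridge fdb" lines quoted above; the function names and the MAC address 02:00:00:00:00:03 are made up for illustration. It reports every MAC whose "self" (hardware) entry points at a different device than its "master" (bridge) entry.

```python
def parse_fdb(text):
    """Parse lines shaped like '<mac> dev <dev> [vlan N] [master <br>] [self]'."""
    entries = []
    for line in text.splitlines():
        tokens = line.split()
        if len(tokens) < 3 or tokens[1] != "dev":
            continue  # not an fdb line in the assumed format
        flags = tokens[3:]
        entries.append({
            "mac": tokens[0],
            "dev": tokens[2],
            "self": "self" in flags,      # entry lives in the port driver's fdb
            "master": "master" in flags,  # entry lives in the software bridge
        })
    return entries


def find_stale_hw_entries(text):
    """Return (mac, hw_dev, bridge_dev) where hardware and bridge disagree."""
    entries = parse_fdb(text)
    bridge_dev = {e["mac"]: e["dev"] for e in entries if e["master"]}
    stale = []
    for e in entries:
        sw_dev = bridge_dev.get(e["mac"])
        if e["self"] and sw_dev is not None and sw_dev != e["dev"]:
            stale.append((e["mac"], e["dev"], sw_dev))
    return stale


# Hypothetical client MAC; the two tables mirror the ones shown on AP2.
BEFORE_ROAM = """\
02:00:00:00:00:03 dev lan1 master br-lan
02:00:00:00:00:03 dev lan1 vlan 1 self
"""
AFTER_ROAM = """\
02:00:00:00:00:03 dev wlan0 master br-lan
02:00:00:00:00:03 dev lan1 vlan 1 self
"""

if __name__ == "__main__":
    print(find_stale_hw_entries(BEFORE_ROAM))
    print(find_stale_hw_entries(AFTER_ROAM))
```

Before the roam the two tables agree and nothing is reported; after the roam the lan1 hardware entry is flagged because the bridge now has the client on wlan0.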
What happens when the client now tries to talk to AP1, such as ping
10.0.0.1? I debugged with tcpdump:
1. The client sends ARP request: who-has 10.0.0.1?
2. The software bridge on AP2 receives the ARP request, updates its
fdb, and sends the request to the switch through the CPU port.
3. The switch receives the client's ARP request from the CPU port, and
floods it out of the LAN1 port. Although the source MAC address of the
request is the client's, _auto learning of the CPU port is disabled in
DSA_, so the switch does not update the MAC table.
4. AP1 receives the ARP request, then responds: 10.0.0.1 is-at <AP1's MAC>.
5. AP2's switch receives the response on LAN1 and looks up the
destination (the client's MAC) in the MAC table. Because of the stale
entry, the egress port is the same as the ingress port (LAN1), so the
ARP response is discarded to avoid a loop.
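The steps above can be replayed with a toy model of AP2's switch. This is only a sketch in Python under simplifying assumptions: a single VLAN, two ports (the CPU port and lan1), and learning disabled on the CPU port. The class, port names, and MAC addresses are hypothetical and do not correspond to any DSA driver code.

```python
CPU, LAN1 = "cpu", "lan1"
CLIENT = "02:00:00:00:00:03"   # hypothetical client MAC
AP1 = "02:00:00:00:00:01"      # hypothetical AP1 MAC
BROADCAST = "ff:ff:ff:ff:ff:ff"


class ToySwitch:
    def __init__(self, ports, learning_ports):
        self.ports = list(ports)
        self.learning_ports = set(learning_ports)
        self.fdb = {}  # MAC -> port, stands in for the hardware MAC table

    def receive(self, ingress, src, dst):
        """Return the list of ports the frame is sent out of."""
        if ingress in self.learning_ports:
            self.fdb[src] = ingress          # source learning
        egress = self.fdb.get(dst)
        if egress is None:
            # Unknown or broadcast destination: flood to all other ports.
            return [p for p in self.ports if p != ingress]
        if egress == ingress:
            return []                        # would loop back, so drop
        return [egress]


if __name__ == "__main__":
    # Learning is off on the CPU port, as described in step 3.
    sw = ToySwitch(ports=[CPU, LAN1], learning_ports=[LAN1])
    sw.fdb[CLIENT] = LAN1  # learned while the client was still behind AP1

    # Step 3: ARP request from the client arrives via the CPU port.
    print(sw.receive(CPU, CLIENT, BROADCAST), sw.fdb[CLIENT])
    # Step 5: ARP reply from AP1 arrives on lan1, destined to the client.
    print(sw.receive(LAN1, AP1, CLIENT))
    # Removing the stale entry lets the reply reach the CPU port again.
    del sw.fdb[CLIENT]
    print(sw.receive(LAN1, AP1, CLIENT))
```

In this model the request is flooded out of lan1 while the client's entry stays on lan1, the reply is dropped, and after the stale entry is removed the reply is flooded to the CPU port as an unknown unicast.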
If I manually delete the leftover fdb entry in the hardware via
"bridge fdb del <client's MAC> dev lan1 vlan 1", the client can talk
to AP1 immediately.
The same happens in the other direction: the mv88e6xxx has the same
bug, so I think the problem is in the generic DSA code.
Does anyone know how to fix it?
Thanks.
Qingfang