Date:   Sun, 5 Apr 2020 20:23:36 +0800
From:   DENG Qingfang <dqfext@...il.com>
To:     netdev@...r.kernel.org
Cc:     Vivien Didelot <vivien.didelot@...il.com>,
        Andrew Lunn <andrew@...n.ch>,
        Florian Fainelli <f.fainelli@...il.com>,
        Russell King <linux@...linux.org.uk>,
        Chuanhong Guo <gch981213@...il.com>,
        René van Dorst <opensource@...rst.com>,
        John Crispin <john@...ozen.org>,
        Hauke Mehrtens <hauke@...ke-m.de>,
        Stijn Segers <foss@...atilesystems.org>,
        riddlariddla@...mail.com
Subject: DSA breaks clients' roaming between switch port and host interfaces

Hello,
I found a bug in DSA that breaks roaming for Wi-Fi clients.

I set up two Wi-Fi routers as APs. Both run kernel 5.4.30 and use DSA.

        +-------------------------+                            +-----------------------------+
        |                         |                            |                             |
        |                         |                            |                             |
        |       AP1               |                            |       AP2                   |
        |                     LAN2+--------------------------->|LAN1                         |
        |       10.0.0.1/24       |                            |       10.0.0.2/24           |
        |                         |                            |                             |
        |       MV88E6XXX DSA     |                            |       MT7530 DSA            |
        |                         |                            |                             |
        |                         |                            |                             |
        |                         |                            |                             |
        +-------------------------+                            +-----------------------------+
                     ^                                              ^
                     |                                              |
                     |                      Roams                   |
                     |                     -------------------------+
                     |
                     +------------    +-------------------+
                                      |     Wi-Fi         |
                                      |     Client        |
                                      |                   |
                                      |     10.0.0.3/24   |
                                      |                   |
                                      |                   |
                                      +-------------------+

When the client roams from AP1 to AP2, it cannot ping AP1 anymore for
a few minutes, and vice versa.

Using "bridge fdb", I tracked down the part that causes the problem.
When the client is connected to AP1, bridge fdb on AP2 shows:

<client's mac> dev lan1 master br-lan
<client's mac> dev lan1 vlan 1 self

It means AP2 should talk to the client via lan1, which is correct.

After the client roams to AP2, the problem appears:

<client's mac>  dev wlan0 master br-lan
<client's mac>  dev lan1 vlan 1 self

From the iproute2 man page: "self" means the address is associated with
the port driver's fdb, which is usually the hardware.

The lan1 entry is still there, which means the kernel has updated the
forwarding table in br-lan but did not delete the stale entry in the
switch hardware.
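
The mismatch described above can be checked mechanically. Below is a minimal
Python sketch that parses text shaped like the two "bridge fdb" lines quoted
earlier and reports hardware ("self") entries whose port disagrees with the
bridge ("master") entry for the same MAC. The MAC address, the function names
and the sample text are placeholders of mine, and real "bridge fdb" output
contains more fields than this parser looks at.

```python
import re

# Sample text in the shape of the "bridge fdb" lines quoted above.
# The MAC address is a made-up placeholder, not the real client's.
SAMPLE = """\
02:00:00:00:00:03 dev wlan0 master br-lan
02:00:00:00:00:03 dev lan1 vlan 1 self
"""

LINE_RE = re.compile(r"^(?P<mac>\S+)\s+dev\s+(?P<dev>\S+)(?P<rest>.*)$")


def parse_fdb(text):
    """Parse fdb lines into dicts with mac, dev, vlan and self/master flags."""
    entries = []
    for line in text.splitlines():
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # skip lines that do not look like fdb entries
        tokens = m.group("rest").split()
        vlan = None
        if "vlan" in tokens and tokens.index("vlan") + 1 < len(tokens):
            vlan = int(tokens[tokens.index("vlan") + 1])
        entries.append({
            "mac": m.group("mac").lower(),
            "dev": m.group("dev"),
            "vlan": vlan,
            "self": "self" in tokens,
            "master": "master" in tokens,
        })
    return entries


def find_stale_self_entries(entries):
    """Return 'self' entries whose port differs from the 'master' entry's port."""
    bridge_dev = {e["mac"]: e["dev"] for e in entries if e["master"]}
    return [
        e for e in entries
        if e["self"]
        and e["mac"] in bridge_dev
        and bridge_dev[e["mac"]] != e["dev"]
    ]


if __name__ == "__main__":
    for e in find_stale_self_entries(parse_fdb(SAMPLE)):
        print("stale hardware entry:", e["mac"], "on", e["dev"], "vlan", e["vlan"])
```

With the "before roaming" listing (both entries on lan1) the function returns
an empty list. With the "after roaming" listing it reports the lan1 entry.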

What happens when the client now tries to talk to AP1, such as ping
10.0.0.1? I debugged with tcpdump:

1. The client sends ARP request: who-has 10.0.0.1?
2. The software part of the bridge of AP2 receives the ARP request,
updates fdb, and sends it to the CPU port
3. The switch receives the client's ARP request from the CPU port, and
floods it out of the LAN1 port. Although the source MAC address of the
request is the client's, _auto learning of the CPU port is disabled in
DSA_, so the switch does not update the MAC table.
4. AP1 receives the ARP request, then responds: 10.0.0.1 is-at <AP1's MAC>.
5. AP2's switch receives the response from LAN1 and looks up its
destination (the client's MAC) in the MAC table. The egress port is the
same as the ingress port (LAN1), so the ARP response is discarded to
avoid a loop.
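
The five steps above can be replayed with a toy model of a learning switch.
This is only a sketch under simplifying assumptions: one MAC table, no VLANs,
no ageing, and unknown destinations are flooded to all other ports. The class,
the port names and the MAC placeholders are mine and do not correspond to any
kernel or driver code.

```python
CPU, LAN1 = "cpu", "lan1"
CLIENT, AP1, BROADCAST = "client-mac", "ap1-mac", "broadcast"  # placeholders


class ToySwitch:
    """Tiny learning switch; learning can be disabled per ingress port."""

    def __init__(self, ports, learning_disabled=()):
        self.ports = set(ports)
        self.no_learn = set(learning_disabled)
        self.mac_table = {}  # MAC -> port

    def receive(self, ingress, src, dst):
        """Return the set of egress ports; an empty set means 'dropped'."""
        if ingress not in self.no_learn:
            self.mac_table[src] = ingress  # source address learning
        egress = self.mac_table.get(dst)
        if egress is None:
            return self.ports - {ingress}  # unknown/broadcast: flood
        if egress == ingress:
            return set()  # would go back out the ingress port: discard
        return {egress}


def run_scenario():
    # As described in step 3: no learning on the CPU port.
    sw = ToySwitch([CPU, LAN1], learning_disabled=[CPU])

    # Client is on AP1: AP2's switch learns the client's MAC on lan1.
    sw.receive(LAN1, CLIENT, BROADCAST)

    # Client roams to AP2. Its ARP request now enters via the CPU port,
    # is flooded out of lan1, and the table is NOT updated (steps 1-3).
    request_out = sw.receive(CPU, CLIENT, BROADCAST)
    table_after_request = dict(sw.mac_table)

    # AP1's ARP reply arrives on lan1, destined to the client (steps 4-5).
    reply_out = sw.receive(LAN1, AP1, CLIENT)

    # Manually remove the stale entry, then retry the reply.
    del sw.mac_table[CLIENT]
    reply_out_after_del = sw.receive(LAN1, AP1, CLIENT)

    return request_out, table_after_request, reply_out, reply_out_after_del


if __name__ == "__main__":
    req, table, reply, reply_fixed = run_scenario()
    print("request egress:", req)
    print("table after request:", table)
    print("reply egress (stale entry present):", reply)
    print("reply egress (stale entry deleted):", reply_fixed)
```

In this model the reply is dropped while the stale entry points at lan1, and
it reaches the CPU port once that entry is removed, which matches the observed
behaviour.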

If I manually delete the leftover fdb entry in the hardware via
"bridge fdb del <client's MAC> dev lan1 vlan 1", the client can talk
to AP1 immediately.
The same happens in the other direction: the mv88e6xxx has the same bug,
so I think the problem is in the generic DSA code.
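
As a stopgap, the manual deletion can be scripted. The sketch below only
builds the argument list for the same "bridge fdb del" command quoted above
and does not run it. The helper name and the MAC address are placeholders,
and actually executing the command (for example via subprocess.run) would
need root on the AP.

```python
import shlex


def fdb_del_argv(mac, dev, vlan=None):
    """Build argv for 'bridge fdb del <mac> dev <dev> [vlan <vid>]'."""
    argv = ["bridge", "fdb", "del", mac, "dev", dev]
    if vlan is not None:
        argv += ["vlan", str(vlan)]
    return argv


if __name__ == "__main__":
    # Placeholder MAC; substitute the roaming client's address.
    argv = fdb_del_argv("02:00:00:00:00:03", "lan1", vlan=1)
    print(shlex.join(argv))
```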

Does anyone know how to fix it?

Thanks.
Qingfang
