netdev - Re: dsa/mv88e6xxx: leaking packets on MV88E6341 switch

Message-ID: <87y2in94o7.fsf@waldekranz.com>
Date:   Thu, 26 Nov 2020 22:41:44 +0100
From:   Tobias Waldekranz <tobias@...dekranz.com>
To:     Peter Vollmer <peter.vollmer@...il.com>,
        Andrew Lunn <andrew@...n.ch>
Cc:     Network Development <netdev@...r.kernel.org>
Subject: Re: dsa/mv88e6xxx: leaking packets on MV88E6341 switch

On Wed, Nov 25, 2020 at 15:09, Peter Vollmer <peter.vollmer@...il.com> wrote:
> Hi,
> I am still investigating the leaking-packets problem we are having
> with a setup of an Armada 3720 SoC and an 88E6341 switch (connected
> via CPU port 5, SGMII, C_MODE=0xB, 2500BASE-X). I have now moved to the
> net-next kernel (5.10.0-rc4) and can use the nice mv88e6xxx_dump
> tool for dumping switch registers.
>
> The described packet leaking still occurs in a setup where ports
> lan0-lan3 (switch ports 1-4) are joined in a bridge br0.
>
> Here is my setup: ports lan0-3 are DSA ports coming in through eth1;
> eth0 is a single 88E1512 PHY connected via RGMII.
> root@DUT:~# brctl show
> bridge name     bridge id               STP enabled     interfaces
> br0             8000.fafb2fbbd4c6       no              lan0
>                                                         lan1
>                                                         lan2
>                                                         lan3
> root@DUT:~# ip a
> 1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
>     link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
>     inet 127.0.0.1/8 scope host lo
>        valid_lft forever preferred_lft forever
> 2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1024
>     link/ether c2:49:bc:0d:a8:57 brd ff:ff:ff:ff:ff:ff
>     inet 192.168.90.100/24 brd 192.168.90.255 scope global eth0
>        valid_lft forever preferred_lft forever
> 3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1504 qdisc mq state UP group default qlen 1024
>     link/ether fa:fb:2f:bb:d4:c6 brd ff:ff:ff:ff:ff:ff
> 4: sit0@...E: <NOARP> mtu 1480 qdisc noop state DOWN group default qlen 1000
>     link/sit 0.0.0.0 brd 0.0.0.0
> 5: lan0@...1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master br0 state UP group default qlen 1000
>     link/ether fa:fb:2f:bb:d4:c6 brd ff:ff:ff:ff:ff:ff
> 6: lan1@...1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master br0 state UP group default qlen 1000
>     link/ether fa:fb:2f:bb:d4:c6 brd ff:ff:ff:ff:ff:ff
> 7: lan2@...1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master br0 state UP group default qlen 1000
>     link/ether fa:fb:2f:bb:d4:c6 brd ff:ff:ff:ff:ff:ff
> 8: lan3@...1: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue master br0 state LOWERLAYERDOWN group default qlen 1000
>     link/ether fa:fb:2f:bb:d4:c6 brd ff:ff:ff:ff:ff:ff
> 9: br0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
>     link/ether fa:fb:2f:bb:d4:c6 brd ff:ff:ff:ff:ff:ff
>     inet 172.16.4.1/16 brd 172.16.4.255 scope global br0
>        valid_lft forever preferred_lft forever
>
> - When pinging from client0 (connected to lan0) to the bridge IP, the
> ping requests (only the requests) are also seen on client1, which is
> connected to lan1.

This is, I am afraid, the expected behavior of the current
implementation. It stems from the fact that the CPU sends its response
to the echo request (or to any other request, for that matter) as a
FROM_CPU frame. The switch does not learn source addresses from FROM_CPU
frames, so the source address (SA) of br0 never reaches the switch's
FDB. So while client0 knows the MAC of br0, the switch (very
counter-intuitively) does not.

The result is that the unicast echo request sent by client0 is flooded
as unknown unicast by the switch. This way it reaches the CPU but also,
as you have discovered, all other ports that allow unknown unicast to
egress.
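To make the learning argument concrete, here is a minimal toy model in
Python. It is a sketch under simplifying assumptions, not the
mv88e6xxx logic: the class ToySwitch and its ingress() method are
hypothetical, FROM_CPU frames are assumed to skip SA learning, and every
port is assumed to allow unknown unicast egress. Port numbers follow
the setup above (lan0-lan3 on switch ports 1-4, CPU on port 5).

```python
from dataclasses import dataclass, field

CPU_PORT = 5  # CPU port in the setup above; lan0-lan3 are switch ports 1-4


@dataclass
class ToySwitch:
    """Hypothetical, simplified learning switch (not the mv88e6xxx driver)."""
    ports: list
    fdb: dict = field(default_factory=dict)  # MAC -> port it was learned on

    def ingress(self, in_port, src, dst, from_cpu=False, dst_port=None):
        """Return the set of ports a frame egresses on."""
        if from_cpu:
            # Assumption: a FROM_CPU frame names its destination port
            # explicitly, and its source address is NOT learned.
            return {dst_port}
        # Ordinary frame: learn the source address on the ingress port.
        self.fdb[src] = in_port
        if dst in self.fdb:
            return {self.fdb[dst]}
        # Unknown unicast: flood to every other port.
        return {p for p in self.ports if p != in_port}


if __name__ == "__main__":
    sw = ToySwitch(ports=[1, 2, 3, 4, CPU_PORT])
    # client0 on lan0 (port 1) pings br0: br0's MAC is unknown, so flood.
    print(sorted(sw.ingress(1, "client0", "br0")))  # [2, 3, 4, 5]
    # br0 replies with a FROM_CPU frame to port 1: nothing is learned.
    print(sw.ingress(CPU_PORT, "br0", "client0", from_cpu=True, dst_port=1))  # {1}
    # The next request is flooded again, reaching client1 on lan1 (port 2).
    print(sorted(sw.ingress(1, "client0", "br0")))  # [2, 3, 4, 5]
```

In this model, only an entry for br0's MAC behind the CPU port would
stop the flooding; with such an entry, the request egresses on port 5
alone.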

> - The other effect looks more suspicious: when pinging from br0 to the
> IP of client0 (connected to port lan0), after ~280 seconds client1
> (connected to lan1) also sees the ping replies of client0 (only the
> replies). After another ~300 seconds this stops again. This repeats
> in a cycle.

I cannot account for the oscillating effect. On my system I see a
continuous stream of responses from client0 when running tcpdump on
client1. That said, 300 s is the default ageing timeout, so I would
start by diffing the ATU (the switch's address database) as it looks
while client1 sees the replies against how it looks while it does not.
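A diff of two ATU snapshots can be as simple as comparing two mappings.
The sketch below assumes each dump has already been turned into a dict
of MAC address -> port; that parsing step is left out because the
dump format is not shown in this thread. The helper name diff_atu() is
hypothetical, and the example snapshots are made-up placeholders, not
measured data.

```python
def diff_atu(before, after):
    """Compare two ATU snapshots, each a dict mapping MAC address -> port.

    Returns (added, removed, moved): entries only in `after`, entries only
    in `before`, and entries present in both whose port changed.
    """
    added = {mac: after[mac] for mac in after.keys() - before.keys()}
    removed = {mac: before[mac] for mac in before.keys() - after.keys()}
    moved = {
        mac: (before[mac], after[mac])
        for mac in before.keys() & after.keys()
        if before[mac] != after[mac]
    }
    return added, removed, moved


if __name__ == "__main__":
    # Placeholder snapshots (hypothetical MACs), not real dumps.
    quiet = {"02:00:00:00:00:01": 1, "02:00:00:00:00:02": 5}
    leaking = {"02:00:00:00:00:01": 1}
    added, removed, moved = diff_atu(quiet, leaking)
    print("added:", added)
    print("removed:", removed)  # {'02:00:00:00:00:02': 5}
    print("moved:", moved)
```

An entry that appears in one state and is missing in the other, such
as a MAC behind the CPU port, is the kind of difference the ageing
timeout could explain.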

The echo responses reach client1 for the same reason as above. The only
difference is that client0 is now the pinged host, so its responses are
addressed to br0's MAC, which the switch classifies as unknown unicast.

> I have seen these problems since at least kernel version 5.4.y, but not
> with the old linux-marvell kernel sources
> (https://github.com/MarvellEmbeddedProcessors/linux-marvell.git).
> Can somebody using this switch in SGMII mode perhaps reproduce this?

My system is connected to the CPU over RGMII, but I would guess that
has no impact on this issue. The CPU is not responsible for flooding
the packets to client1; the switch does that autonomously. If you run
tcpdump with "-Q out" on your base interface, I bet you will only see
FROM_CPU frames destined for the port that client0 is connected to.
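To read such a capture by hand, here is a sketch of decoding the EDSA
header on frames seen on the base interface (eth1). It assumes the
layout used by the kernel's tag_dsa.c as I understand it: after DA and
SA comes EtherType 0xDADA (the "dada" value in the Port ether type row
for port 5 in the dump), two reserved bytes, then a 4-byte DSA tag
whose first byte carries the command in bits 7:6, the tagged flag in
bit 5 and the device in bits 4:0, and whose second byte carries the
port in bits 7:3. decode_edsa() is a hypothetical helper; check the
layout against the datasheet or tag_dsa.c before relying on it.

```python
import struct

ETH_P_EDSA = 0xDADA  # EtherType of the EDSA header (assumed, see lead-in)

# DSA command codes in bits 7:6 of the first tag byte (assumed layout).
DSA_CMDS = {0: "TO_CPU", 1: "FROM_CPU", 2: "TO_SNIFFER", 3: "FORWARD"}


def decode_edsa(frame: bytes) -> dict:
    """Decode the EDSA tag that follows DA/SA in a raw Ethernet frame."""
    if len(frame) < 20:
        raise ValueError("frame too short for an EDSA header")
    (ethertype,) = struct.unpack_from("!H", frame, 12)
    if ethertype != ETH_P_EDSA:
        raise ValueError(f"not EDSA: EtherType 0x{ethertype:04x}")
    # Bytes 14-15 are reserved; bytes 16-19 hold the 4-byte DSA tag.
    tag = frame[16:20]
    return {
        "cmd": DSA_CMDS[tag[0] >> 6],
        "tagged": bool(tag[0] & 0x20),
        "dev": tag[0] & 0x1F,
        "port": tag[1] >> 3,
    }


if __name__ == "__main__":
    # Hand-built FROM_CPU frame to device 0, port 1 (lan0), zeroed DA/SA.
    frame = bytes(12) + bytes([0xDA, 0xDA, 0, 0]) + bytes([0x40, 0x08, 0, 0])
    print(decode_edsa(frame))
```

With "-Q out" on eth1, each captured frame decoded this way should show
cmd FROM_CPU and the port of client0, if the explanation above holds.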

> One thing I noticed is that, due to .tag_protocol=DSA_TAG_PROTO_EDSA
> for the 88E6341 switch, EgressMode (Port Control register 0x04,
> bits 13:12) is set to an unsupported value of 0x3 ("reserved for future
> use" in the switch spec). See the value in row 04 Port control for
> port 5 = 0x373f in the following dump:
>
> root@...ard3:~# mv88e6xxx_dump --ports
> Using device <mdio_bus/d0032004.mdio-mii:01>
>                            0    1    2    3    4    5
> 00 Port status            0006 9e4f 9e4f 9e4f 100f 0f0b
> 01 Physical control       0003 0003 0003 0003 0003 20ff
> 02 Jamming control        ff00 0000 0000 0000 0000 0000
> 03 Switch ID              3410 3410 3410 3410 3410 3410
> 04 Port control           007c 043f 043f 043f 043c 373f
> 05 Port control 1         0000 0000 0000 0000 0000 0000
> 06 Port base VLAN map     007e 007c 007a 0076 006e 005f
> 07 Def VLAN ID & Prio     0001 0000 0000 0000 0000 0000
> 08 Port control 2         2080 0080 0080 0080 0080 0080
> 09 Egress rate control    0001 0001 0001 0001 0001 0001
> 0a Egress rate control 2  8000 0000 0000 0000 0000 0000
> 0b Port association vec   0001 0002 0004 0008 0010 0000
> 0c Port ATU control       0000 0000 0000 0000 0000 0000
> 0d Override               0000 0000 0000 0000 0000 0000
> 0e Policy control         0000 0000 0000 0000 0000 0000
> 0f Port ether type        9100 9100 9100 9100 9100 dada
> 10 Reserved               0000 0000 0000 0000 0000 0000
> 11 Reserved               0000 0000 0000 0000 0000 0000
> 12 Reserved               0000 0000 0000 0000 0000 0000
> 13 Reserved               0000 0000 0000 0000 0000 0000
> 14 Reserved               0000 0000 0000 0000 0000 0000
> 15 Reserved               0000 0000 0000 0000 0000 0000
> 16 LED control            0000 10eb 10eb 10eb 10eb 0000
> 17 Reserved               0000 0000 0000 0000 0000 0000
> 18 Tag remap low          3210 3210 3210 3210 3210 3210
> 19 Tag remap high         7654 7654 7654 7654 7654 7654
> 1a Reserved               0000 0000 0000 0000 5ea0 a100
> 1b Queue counters         8000 8000 8000 8000 8000 8000
> 1c Queue control          0000 0000 0000 0000 0000 0000
> 1d queue control 2        0000 0000 0000 0000 0000 0000
> 1e Cut through control    f000 f000 f000 f000 f000 f000
> 1f Debug counters         0000 0014 0015 0012 0000 0010
>
> I tested setting .tag_protocol=DSA_TAG_PROTO_DSA for the 6341 switch
> instead, resulting in a register setting of 04 Port control for port 5
> = 0x053f (i.e. EgressMode=Unmodified mode, frames are transmitted
> unmodified), which looks correct to me. It does not fix the above
> problem, but the change seems to make sense anyhow. Should I send a
> patch ?

This is not up to me, but my guess is that Andrew would like a patch,
yes. On the 6390X, I know for a fact that setting EgressMode to 3 does
indeed produce the EtherType DSA (EDSA) behavior that was supported in
older devices (like the 6352), but there is no reason not to change it
to regular DSA.
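The two Port Control values quoted above can be decoded with a few
shifts and masks. This sketch assumes the field layout used by the
mv88e6xxx driver's port definitions as I understand them (EgressMode in
bits 13:12, FrameMode in bits 9:8, PortState in bits 1:0); FrameMode and
PortState are not discussed in this thread and are included only as a
cross-check. decode_port_ctl0() and the name tables are hypothetical.

```python
# Field names assumed from the mv88e6xxx register layout (see lead-in).
EGRESS_MODES = {0: "unmodified", 1: "untagged", 2: "tagged",
                3: "ethertype-dsa"}  # 3 is "reserved" in the 88E6341 spec
FRAME_MODES = {0: "normal", 1: "dsa", 2: "provider", 3: "ethertype-dsa"}
PORT_STATES = {0: "disabled", 1: "blocking", 2: "learning", 3: "forwarding"}


def decode_port_ctl0(value: int) -> dict:
    """Split a Port Control (register 0x04) value into the fields above."""
    return {
        "egress_mode": EGRESS_MODES[(value >> 12) & 0x3],  # bits 13:12
        "frame_mode": FRAME_MODES[(value >> 8) & 0x3],     # bits 9:8
        "port_state": PORT_STATES[value & 0x3],            # bits 1:0
    }


if __name__ == "__main__":
    # Port 5 with DSA_TAG_PROTO_EDSA, then with DSA_TAG_PROTO_DSA.
    for value in (0x373F, 0x053F):
        print(f"0x{value:04x}", decode_port_ctl0(value))
```

Under these assumptions, 0x373f has both EgressMode and FrameMode set to
3 (the EDSA combination), while 0x053f is unmodified egress with plain
DSA framing; both leave port 5 forwarding.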