Message-ID: <87y2in94o7.fsf@waldekranz.com>
Date: Thu, 26 Nov 2020 22:41:44 +0100
From: Tobias Waldekranz <tobias@...dekranz.com>
To: Peter Vollmer <peter.vollmer@...il.com>,
Andrew Lunn <andrew@...n.ch>
Cc: Network Development <netdev@...r.kernel.org>
Subject: Re: dsa/mv88e6xxx: leaking packets on MV88E6341 switch
On Wed, Nov 25, 2020 at 15:09, Peter Vollmer <peter.vollmer@...il.com> wrote:
> Hi,
> I am still investigating the leaking packets problem we are having
> with a setup of an armada-3720 SOC and a 88E6341 switch ( connected
> via cpu port 5 , SGMII ,C_MODE=0xB, 2500 BASE-x). I now jumped to the
> net-next kernel (5.10.0-rc4) and can now use the nice mv88e6xxx_dump
> tool for switch register dumping.
>
> The described packet leaking still occurs, in a setup of ports
> lan0-lan3 (switch ports 1-4) joined in a bridge br0.
>
> Here is my setup, ports lan0-3 are DSA ports coming in through eth1,
> eth0 is a single 88E1512 phy connected to RGMII
> root@DUT:~# brctl show
> bridge name     bridge id               STP enabled     interfaces
> br0             8000.fafb2fbbd4c6       no              lan0
>                                                         lan1
>                                                         lan2
>                                                         lan3
> root@DUT:~# ip a
> 1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN
> group default qlen 1000
> link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
> inet 127.0.0.1/8 scope host lo
> valid_lft forever preferred_lft forever
> 2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP
> group default qlen 1024
> link/ether c2:49:bc:0d:a8:57 brd ff:ff:ff:ff:ff:ff
> inet 192.168.90.100/24 brd 192.168.90.255 scope global eth0
> valid_lft forever preferred_lft forever
> 3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1504 qdisc mq state UP
> group default qlen 1024
> link/ether fa:fb:2f:bb:d4:c6 brd ff:ff:ff:ff:ff:ff
> 4: sit0@...E: <NOARP> mtu 1480 qdisc noop state DOWN group default qlen 1000
> link/sit 0.0.0.0 brd 0.0.0.0
> 5: lan0@...1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue
> master br0 state UP group default qlen 1000
> link/ether fa:fb:2f:bb:d4:c6 brd ff:ff:ff:ff:ff:ff
> 6: lan1@...1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue
> master br0 state UP group default qlen 1000
> link/ether fa:fb:2f:bb:d4:c6 brd ff:ff:ff:ff:ff:ff
> 7: lan2@...1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue
> master br0 state UP group default qlen 1000
> link/ether fa:fb:2f:bb:d4:c6 brd ff:ff:ff:ff:ff:ff
> 8: lan3@...1: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc
> noqueue master br0 state LOWERLAYERDOWN group default qlen 1000
> link/ether fa:fb:2f:bb:d4:c6 brd ff:ff:ff:ff:ff:ff
> 9: br0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state
> UP group default qlen 1000
> link/ether fa:fb:2f:bb:d4:c6 brd ff:ff:ff:ff:ff:ff
> inet 172.16.4.1/16 brd 172.16.4.255 scope global br0
> valid_lft forever preferred_lft forever
>
> - pinging from client0 (connected to lan0 ) to the bridge IP, the ping
> requests (only the requests) are also seen on client1 connected to
> lan1
I am afraid this is the expected behavior of the current implementation.
It stems from the fact that the CPU responds to the echo request (or to
any other request for that matter) with a FROM_CPU frame. This means
that no learning takes place, and the SA of br0 will thus never reach
the switch's FDB. So while client0 knows the MAC of br0, the switch
(very counter-intuitively) does not.
The result is that the unicast echo request sent by client0 is flooded
as unknown unicast by the switch. This way it reaches the CPU but also,
as you have discovered, all other ports that allow unknown unicast to
egress.
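To make the mechanism concrete, here is a minimal toy model of it in Python. It is only a sketch: the ToySwitch class, its ingress() method and the client MAC are made up for illustration, and real hardware (VLANs, port-based VLAN maps, ageing) is ignored. The only behavior it encodes is what is described above: normal frames trigger SA learning, FROM_CPU frames do not, and unknown unicast is flooded.

```python
from dataclasses import dataclass, field

CPU_PORT = 5  # CPU port number as given in the thread


@dataclass
class ToySwitch:
    """Hypothetical, heavily simplified model of a learning switch."""
    ports: set
    fdb: dict = field(default_factory=dict)  # MAC address -> port

    def ingress(self, port, src, dst, from_cpu_target=None):
        """Return the set of ports the frame egresses on."""
        if from_cpu_target is not None:
            # FROM_CPU: the tag names the egress port directly,
            # and the source address is NOT learned.
            return {from_cpu_target}
        self.fdb[src] = port  # normal frame: learn the source address
        if dst in self.fdb:
            return {self.fdb[dst]}
        # Unknown unicast: flood to every port except the ingress port.
        return self.ports - {port}


BR0 = "fa:fb:2f:bb:d4:c6"      # br0 MAC from the thread
CLIENT0 = "02:00:00:00:00:01"  # hypothetical client0 MAC

sw = ToySwitch(ports={1, 2, 3, 4, CPU_PORT})
request = sw.ingress(1, CLIENT0, BR0)                          # echo request on lan0
reply = sw.ingress(CPU_PORT, BR0, CLIENT0, from_cpu_target=1)  # reply sent FROM_CPU
request2 = sw.ingress(1, CLIENT0, BR0)                         # next echo request
```

In this model both requests are flooded to ports 2, 3, 4 and the CPU port, because br0's MAC never enters the FDB, while the reply only egresses on port 1.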
> - the other effect looks more suspicious: when pinging from br0 to the
> IP of client0 connected to port lan0, after ~280 seconds client1
> connected to lan1 will also see the ping replies of client0 (only the
> replies). And after another ~300seconds this stops again. This repeats
> in a cycle .
I cannot account for the oscillating effect. In my system I see a
continuous stream of responses from client0 when tcpdumping on
client1. That said, 300s is the default age timeout, so I would start by
diffing the ATU when you are seeing replies on client1 and when you are
not.
The echo responses reach client1 for the same reason as above. It is
just that now that client0 is the pinged host, the responses are
addressed to br0's MAC, which will be classified as unknown unicast.
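For the ATU comparison suggested above, a small helper like the following could be used on two saved text dumps. This is a sketch under assumptions: diff_dumps() is a hypothetical helper, it compares plain text line by line, and the sample entries are invented placeholders rather than real ATU dump output.

```python
def diff_dumps(before: str, after: str):
    """Return (lines only in `before`, lines only in `after`).

    Whitespace runs are collapsed and blank lines dropped, so that
    pure alignment changes do not show up as differences.
    """
    def normalize(text):
        return {" ".join(line.split()) for line in text.splitlines() if line.strip()}

    b, a = normalize(before), normalize(after)
    return sorted(b - a), sorted(a - b)


# Invented placeholder dumps, NOT real tool output.
while_leaking = "entry mac-A port 1\nentry mac-B port 2\n"
while_quiet = "entry mac-B   port 2\nentry mac-C port 1\n"

gone, new = diff_dumps(while_leaking, while_quiet)
print("only while leaking:", gone)
print("only while quiet:  ", new)
```

Any entry that appears in only one of the two states is a candidate for explaining why the replies are flooded in one phase and not in the other.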
> I see these problems since at least kernel version 5.4.y, but not with
> the old linux-marvel kernel sources
> (https://github.com/MarvellEmbeddedProcessors/linux-marvell.git)
> Can somebody using this switch in SGMII mode perhaps reproduce this ?
My system is connected to the CPU over RGMII, but I would guess that
that has no impact on this issue. The CPU is not responsible for
flooding the packets to client1, the switch does that autonomously. If
you tcpdump with "-Q out" on your base interface, I bet you will only
see FROM_CPUs to the port that client0 is connected to.
> One thing I noticed is that due to .tag_protocol=DSA_TAG_PROTO_EDSA
> for the 88E6341 switch, EgressMode (port control 0x4 , bit13:12) is
> set to an unsupported value of 0x3 ("reserved for future use" in the
> switch spec). See the value in row 04 Port control for port 5 = 0x373f
> in the following dump:
>
> root@...ard3:~# mv88e6xxx_dump --ports
> Using device <mdio_bus/d0032004.mdio-mii:01>
>                           0    1    2    3    4    5
> 00 Port status            0006 9e4f 9e4f 9e4f 100f 0f0b
> 01 Physical control       0003 0003 0003 0003 0003 20ff
> 02 Jamming control        ff00 0000 0000 0000 0000 0000
> 03 Switch ID              3410 3410 3410 3410 3410 3410
> 04 Port control           007c 043f 043f 043f 043c 373f
> 05 Port control 1         0000 0000 0000 0000 0000 0000
> 06 Port base VLAN map     007e 007c 007a 0076 006e 005f
> 07 Def VLAN ID & Prio     0001 0000 0000 0000 0000 0000
> 08 Port control 2         2080 0080 0080 0080 0080 0080
> 09 Egress rate control    0001 0001 0001 0001 0001 0001
> 0a Egress rate control 2  8000 0000 0000 0000 0000 0000
> 0b Port association vec   0001 0002 0004 0008 0010 0000
> 0c Port ATU control       0000 0000 0000 0000 0000 0000
> 0d Override               0000 0000 0000 0000 0000 0000
> 0e Policy control         0000 0000 0000 0000 0000 0000
> 0f Port ether type        9100 9100 9100 9100 9100 dada
> 10 Reserved               0000 0000 0000 0000 0000 0000
> 11 Reserved               0000 0000 0000 0000 0000 0000
> 12 Reserved               0000 0000 0000 0000 0000 0000
> 13 Reserved               0000 0000 0000 0000 0000 0000
> 14 Reserved               0000 0000 0000 0000 0000 0000
> 15 Reserved               0000 0000 0000 0000 0000 0000
> 16 LED control            0000 10eb 10eb 10eb 10eb 0000
> 17 Reserved               0000 0000 0000 0000 0000 0000
> 18 Tag remap low          3210 3210 3210 3210 3210 3210
> 19 Tag remap high         7654 7654 7654 7654 7654 7654
> 1a Reserved               0000 0000 0000 0000 5ea0 a100
> 1b Queue counters         8000 8000 8000 8000 8000 8000
> 1c Queue control          0000 0000 0000 0000 0000 0000
> 1d queue control 2        0000 0000 0000 0000 0000 0000
> 1e Cut through control    f000 f000 f000 f000 f000 f000
> 1f Debug counters         0000 0014 0015 0012 0000 0010
>
> I tested setting .tag_protocol=DSA_TAG_PROTO_DSA for the 6341 switch
> instead, resulting in a register setting of 04 Port control for port 5
> = 0x053f (i.e. EgressMode=Unmodified mode, frames are transmitted
> unmodified), which looks correct to me. It does not fix the above
> problem, but the change seems to make sense anyhow. Should I send a
> patch ?
This is not up to me, but my guess is that Andrew would like a patch,
yes. On 6390X, I know for a fact that setting the EgressMode to 3 does
indeed produce the behavior that was supported in older devices (like
the 6352), but there is no reason not to change it to regular DSA.
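For reference, the EgressMode values discussed in this thread can be checked by extracting bits 13:12 from the Port control register values quoted above. The snippet is a small sketch; the constant and function names are made up for illustration, and only the bit position (13:12) and the two register values come from the thread.

```python
EGRESS_MODE_SHIFT = 12   # EgressMode occupies bits 13:12 of Port control (reg 0x04)
EGRESS_MODE_MASK = 0x3


def egress_mode(port_control: int) -> int:
    """Extract the 2-bit EgressMode field from a Port control value."""
    return (port_control >> EGRESS_MODE_SHIFT) & EGRESS_MODE_MASK


# 0x373f: port 5 with DSA_TAG_PROTO_EDSA; 0x053f: port 5 with DSA_TAG_PROTO_DSA
for value in (0x373F, 0x053F):
    print(f"{value:#06x}: EgressMode={egress_mode(value)}")
```

This gives EgressMode=3 for 0x373f (the value the 88E6341 spec marks as reserved) and EgressMode=0 for 0x053f (unmodified).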