Message-ID: <fd16ebb3-2435-ef01-d9f1-b873c9c0b389@gmail.com>
Date: Mon, 25 Jul 2022 14:35:40 -0700
From: Florian Fainelli <f.fainelli@...il.com>
To: Brian Hutchinson <b.hutchman@...il.com>, netdev@...r.kernel.org
Cc: andrew@...n.ch, Vladimir Oltean <olteanv@...il.com>,
woojung.huh@...rochip.com, UNGLinuxDriver@...rochip.com,
j.vosburgh@...il.com, vfalico@...il.com, andy@...yhouse.net,
davem@...emloft.net, kuba@...nel.org
Subject: Re: Bonded multicast traffic causing packet loss when using DSA with
Microchip KSZ9567 switch
On 7/25/22 08:12, Brian Hutchinson wrote:
> I'm experiencing large packet loss when using multicast with bonded
> DSA interfaces.
>
> I have the first two ports of a ksz9567 set up as individual network
> interfaces in the device tree, which show up in the system as lan1 and
> lan2, and I have those two interfaces bonded in an "active-backup"
> bond with the intent of having each slave interface go to a redundant
> switch. I've tried connecting both interfaces to the same switch and
> also to separate switches that are then connected together. In the
> latter setup, if I disconnect the two switches I don't see the
> problem.
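>
> For reference, here's roughly how the bond is brought up (a minimal
> sketch; the miimon value is an assumption, the addressing matches the
> ifconfig output below):
>
>   # create an active-backup bond and enslave the two switch ports
>   ip link add bond1 type bond mode active-backup miimon 100
>   ip link set lan1 down && ip link set lan1 master bond1
>   ip link set lan2 down && ip link set lan2 master bond1
>   ip link set bond1 up
>   ip addr add 192.168.1.6/24 dev bond1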
>
> The kernel bonding documentation says "active-backup" works with any
> layer-2 switch and doesn't require smart/managed switches configured
> in any particular way. I'm currently using dumb (unmanaged) switches.
>
> I can readily reproduce the packet loss issue by running iperf to
> generate multicast traffic.
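>
> Roughly like this (the group address and bandwidth here are
> placeholders, not necessarily my exact values):
>
>   # receiver: join/listen on a multicast group over UDP
>   iperf -s -u -B 239.255.1.1 -i 1
>   # sender: blast UDP traffic at the same group
>   iperf -c 239.255.1.1 -u -b 100M -t 600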
>
> If I ping my board with the ksz9567 from a PC while iperf is
> generating multicast packets, I get tons of packet loss. If I run
> heavily loaded iperf tests that are not multicast, I don't notice the
> packet loss problem.
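>
> For example, from the PC (pinging the bond1 address shown below):
>
>   # heavy loss while the multicast iperf test is running
>   ping -c 100 192.168.1.6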
>
> Here is the ifconfig view of the interfaces:
>
> bond1: flags=5187<UP,BROADCAST,RUNNING,MASTER,MULTICAST> mtu 1500 metric 1
> inet 192.168.1.6 netmask 255.255.255.0 broadcast 0.0.0.0
> inet6 fd1c:a799:6054:0:60e2:5ff:fe75:6716 prefixlen 64
> scopeid 0x0<global>
> inet6 fe80::60e2:5ff:fe75:6716 prefixlen 64 scopeid 0x20<link>
> ether 62:e2:05:75:67:16 txqueuelen 1000 (Ethernet)
> RX packets 1264782 bytes 84198600 (80.2 MiB)
> RX errors 0 dropped 40 overruns 0 frame 0
> TX packets 2466062 bytes 3705565532 (3.4 GiB)
> TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
>
> eth0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1506 metric 1
> inet6 fe80::f21f:afff:fe6b:b218 prefixlen 64 scopeid 0x20<link>
> ether f0:1f:af:6b:b2:18 txqueuelen 1000 (Ethernet)
> RX packets 1264782 bytes 110759022 (105.6 MiB)
> RX errors 0 dropped 0 overruns 0 frame 0
> TX packets 2466097 bytes 3710503019 (3.4 GiB)
> TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
>
> lan1: flags=6211<UP,BROADCAST,RUNNING,SLAVE,MULTICAST> mtu 1500 metric 1
> ether 62:e2:05:75:67:16 txqueuelen 1000 (Ethernet)
> RX packets 543771 bytes 37195218 (35.4 MiB)
> RX errors 0 dropped 20 overruns 0 frame 0
> TX packets 1058336 bytes 1593030865 (1.4 GiB)
> TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
>
> lan2: flags=6211<UP,BROADCAST,RUNNING,SLAVE,MULTICAST> mtu 1500 metric 1
> ether 62:e2:05:75:67:16 txqueuelen 1000 (Ethernet)
> RX packets 721011 bytes 47003382 (44.8 MiB)
> RX errors 0 dropped 0 overruns 0 frame 0
> TX packets 1407726 bytes 2112534667 (1.9 GiB)
> TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
>
> lo: flags=73<UP,LOOPBACK,RUNNING> mtu 65536 metric 1
> inet 127.0.0.1 netmask 255.0.0.0
> inet6 ::1 prefixlen 128 scopeid 0x10<host>
> loop txqueuelen 1000 (Local Loopback)
> RX packets 394 bytes 52052 (50.8 KiB)
> RX errors 0 dropped 0 overruns 0 frame 0
> TX packets 394 bytes 52052 (50.8 KiB)
> TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
>
> Is what I'm trying to do even valid with dumb switches, or is the
> bonding documentation wrong/outdated about active-backup bonds not
> needing smart switches?
>
> I know there's probably no one out there who can reproduce my setup to
> look at this problem, but I'm willing to run whatever tests and
> provide all the info/feedback I can.
>
> I'm running 5.10.69 on an iMX8MM with a custom Linux OS based on the
> Yocto Dunfell release.
>
> I know that the DSA master interface eth0 is not supposed to be
> accessed directly, yet I see eth0 getting an IPv6 address, and I'm
> wondering if that could cause the networking stack to attempt to use
> eth0 directly for traffic.

This is a red herring: without a lot of special casing, we cannot tell
the network stack that the DSA master network device must only
transport tagged traffic to/from the switch, so the IPv6 stack still
happily generates a link-local address for your adapter.
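
If the address itself bothers you, it can be suppressed with the
standard sysctls (just a sketch, not required for anything DSA does):

  # stop IPv6 link-local address generation on the DSA master
  sysctl -w net.ipv6.conf.eth0.addr_gen_mode=1
  # or turn IPv6 off on it entirely
  sysctl -w net.ipv6.conf.eth0.disable_ipv6=1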

Any chance of getting the output of ethtool -S for lan1, lan2, and
eth0, so we could possibly glean something from the hardware-maintained
statistics?
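
That is:

  # per-port counters maintained by the switch/MAC hardware
  ethtool -S lan1
  ethtool -S lan2
  ethtool -S eth0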
--
Florian