Date:   Mon, 25 Jul 2022 11:12:34 -0400
From:   Brian Hutchinson <b.hutchman@...il.com>
To:     netdev@...r.kernel.org
Cc:     andrew@...n.ch, f.fainelli@...il.com,
        Vladimir Oltean <olteanv@...il.com>, woojung.huh@...rochip.com,
        UNGLinuxDriver@...rochip.com, j.vosburgh@...il.com,
        vfalico@...il.com, andy@...yhouse.net, davem@...emloft.net,
        kuba@...nel.org
Subject: Bonded multicast traffic causing packet loss when using DSA with
 Microchip KSZ9567 switch

I'm experiencing large packet loss when using multicast with bonded
DSA interfaces.

I have the first two ports of a KSZ9567 set up as individual network
interfaces in the device tree; they show up in the system as lan1 and
lan2.  Those two interfaces are bonded in an "active-backup" bond,
with the intent of having each slave interface go to a redundant
switch.  I've tried connecting both interfaces to the same switch and
also to separate switches that are then connected together.  In the
latter setup, if I disconnect the two switches I don't see the problem.
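
For reference, the bond is created roughly like this (a sketch from
memory; the miimon value is illustrative, and 192.168.1.6 matches the
ifconfig output below):

# create the bond and enslave the two DSA user ports
ip link add bond1 type bond mode active-backup miimon 100
ip link set lan1 down
ip link set lan2 down
ip link set lan1 master bond1
ip link set lan2 master bond1
ip link set bond1 up
ip addr add 192.168.1.6/24 dev bond1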

The kernel bonding documentation says "active-backup" works with any
layer 2 switch and doesn't require smart/managed switches configured
in any particular way.  I'm currently using dumb switches.
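
For the record, the bond mode and the currently active slave can be
inspected via the standard bonding procfs interface:

cat /proc/net/bonding/bond1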

I can readily reproduce the packet loss issue running iperf to
generate multicast traffic.
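
Roughly like this (iperf 2.x syntax; the group address, TTL and
bandwidth below are just example values):

# on the board: join the group and listen
iperf -s -u -B 239.0.0.1 -i 1

# on the PC: generate multicast traffic toward the group
iperf -c 239.0.0.1 -u -T 4 -t 60 -b 50M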

If I ping my board (the one with the KSZ9567) from a PC while iperf is
generating multicast packets, I see heavy packet loss.  If I run
heavily loaded iperf tests that are not multicast, I don't see the
packet loss problem.
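
Concretely, something like this from the PC while the multicast iperf
test above is running (the interval is just chosen to make the loss
show up quickly):

ping -i 0.2 -c 100 192.168.1.6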

Here is the ifconfig view of the interfaces:

bond1: flags=5187<UP,BROADCAST,RUNNING,MASTER,MULTICAST>  mtu 1500  metric 1
       inet 192.168.1.6  netmask 255.255.255.0  broadcast 0.0.0.0
       inet6 fd1c:a799:6054:0:60e2:5ff:fe75:6716  prefixlen 64
scopeid 0x0<global>
       inet6 fe80::60e2:5ff:fe75:6716  prefixlen 64  scopeid 0x20<link>
       ether 62:e2:05:75:67:16  txqueuelen 1000  (Ethernet)
       RX packets 1264782  bytes 84198600 (80.2 MiB)
       RX errors 0  dropped 40  overruns 0  frame 0
       TX packets 2466062  bytes 3705565532 (3.4 GiB)
       TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

eth0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1506  metric 1
       inet6 fe80::f21f:afff:fe6b:b218  prefixlen 64  scopeid 0x20<link>
       ether f0:1f:af:6b:b2:18  txqueuelen 1000  (Ethernet)
       RX packets 1264782  bytes 110759022 (105.6 MiB)
       RX errors 0  dropped 0  overruns 0  frame 0
       TX packets 2466097  bytes 3710503019 (3.4 GiB)
       TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

lan1: flags=6211<UP,BROADCAST,RUNNING,SLAVE,MULTICAST>  mtu 1500  metric 1
       ether 62:e2:05:75:67:16  txqueuelen 1000  (Ethernet)
       RX packets 543771  bytes 37195218 (35.4 MiB)
       RX errors 0  dropped 20  overruns 0  frame 0
       TX packets 1058336  bytes 1593030865 (1.4 GiB)
       TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

lan2: flags=6211<UP,BROADCAST,RUNNING,SLAVE,MULTICAST>  mtu 1500  metric 1
       ether 62:e2:05:75:67:16  txqueuelen 1000  (Ethernet)
       RX packets 721011  bytes 47003382 (44.8 MiB)
       RX errors 0  dropped 0  overruns 0  frame 0
       TX packets 1407726  bytes 2112534667 (1.9 GiB)
       TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

lo: flags=73<UP,LOOPBACK,RUNNING>  mtu 65536  metric 1
       inet 127.0.0.1  netmask 255.0.0.0
       inet6 ::1  prefixlen 128  scopeid 0x10<host>
       loop  txqueuelen 1000  (Local Loopback)
       RX packets 394  bytes 52052 (50.8 KiB)
       RX errors 0  dropped 0  overruns 0  frame 0
       TX packets 394  bytes 52052 (50.8 KiB)
       TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

Is what I'm trying to do even valid with dumb switches, or is the
bonding documentation wrong/outdated about active-backup bonds not
needing smart switches?

I know it's unlikely anyone out there can reproduce my setup to look
at this problem, but I'm willing to run whatever tests are suggested
and provide all the info/feedback I can.

I'm running kernel 5.10.69 on an iMX8MM with a custom Linux OS based
on the Yocto Dunfell release.

I know that the DSA master interface eth0 is not meant to be used
directly, yet I see eth0 getting an IPv6 address, and I'm wondering
whether that could cause the networking stack to attempt to send
traffic out eth0 directly.
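
If that's plausible, I can retest with IPv6 disabled on the master to
rule it out, e.g. with the standard sysctl knob (assuming it's safe to
flip on a DSA master):

# rule out the stack using eth0's link-local address
sysctl -w net.ipv6.conf.eth0.disable_ipv6=1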

Regards,

Brian
