Date:   Thu, 28 Jul 2022 21:11:30 -0400
From:   Brian Hutchinson <b.hutchman@...il.com>
To:     Vladimir Oltean <olteanv@...il.com>
Cc:     Florian Fainelli <f.fainelli@...il.com>, netdev@...r.kernel.org,
        andrew@...n.ch, woojung.huh@...rochip.com,
        UNGLinuxDriver@...rochip.com, j.vosburgh@...il.com,
        vfalico@...il.com, andy@...yhouse.net, davem@...emloft.net,
        kuba@...nel.org
Subject: Re: Bonded multicast traffic causing packet loss when using DSA with
 Microchip KSZ9567 switch

Hi Vladimir,

On Thu, Jul 28, 2022 at 6:29 PM Vladimir Oltean <olteanv@...il.com> wrote:
>
> Hi Brian,
>
> On Thu, Jul 28, 2022 at 03:14:17PM -0400, Brian Hutchinson wrote:
> > > So I mentioned in a recent PM that I was looking at other vendor DSA
> > > drivers and I see code that smells like some of the concerns you have.
> > >
> > > I did some grepping on /drivers/net/dsa and while I get hits for
> > > things like 'flood', 'multicast', 'igmp' etc. in marvel and broadcom
> > > drivers ... I get nothing on microchip.  Hardware documentation has
> > > whole section on ingress and egress rate limiting and shaping but
> > > doesn't look like drivers use any of it.
> > >
> > > Example:
> > >
> > > /drivers/net/dsa/mv88e6xxx$ grep -i multicast *.c
> > > chip.c: { "in_multicasts",              4, 0x07, STATS_TYPE_BANK0, },
> > > chip.c: { "out_multicasts",             4, 0x12, STATS_TYPE_BANK0, },
> > > chip.c:                  is_multicast_ether_addr(addr))
> > > chip.c: /* Upstream ports flood frames with unknown unicast or multicast DA */
> > > chip.c:  * forwarding of unknown unicasts and multicasts.
> > > chip.c:         dev_err(ds->dev, "p%d: failed to load multicast MAC address\n",
> > > chip.c:                                  bool unicast, bool multicast)
> > > chip.c:                                                       multicast);
> > > global2.c:      /* Consider the frames with reserved multicast destination
> > > global2.c:      /* Consider the frames with reserved multicast destination
> > > port.c:                              bool unicast, bool multicast)
> > > port.c: if (unicast && multicast)
> > > port.c: else if (multicast)
> > > port.c:                                       int port, bool multicast)
> > > port.c: if (multicast)
> > > port.c:                              bool unicast, bool multicast)
> > > port.c: return mv88e6185_port_set_default_forward(chip, port, multicast);
> > >
> > > Wondering if some needed support is missing.
>
> I know it's tempting to look at other drivers and think "whoah, how much
> code these guys have! and I went for the cheaper switch!", but here it
> really does not matter in the slightest.
>
> Your application, as far as I understand it, requires the KSZ switch to
> operate as a simple port multiplexer, with no hardware offloading of
> packet processing (essentially all ports operate as what we call
> 'standalone'). It's quite sad that this mode didn't work with the KSZ
> driver. But what you're looking at, 'multicast', 'igmp', things like
> that, only matter if you instruct the switch to forward packets in
> hardware, trap packets for control protocols, things like that.
> Not applicable.

Our use case in this instance is basically to use the two ports the
hardware exposes as standalone NICs, bonded together for redundancy,
while also doing IEEE 1588 (PTP) and 1PPS.  Roughly something like the
sketch below.
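
From memory, the bonding setup is roughly like this (don't quote me on
the exact mode and miimon values, and 192.168.1.6 here is just the
board address from the iperf runs below):

  # assumption: active-backup bonding for redundancy (mode/miimon not
  # confirmed anywhere above)
  ip link add bond0 type bond mode active-backup miimon 100
  ip link set lan1 down
  ip link set lan2 down
  ip link set lan1 master bond0
  ip link set lan2 master bond0
  ip link set bond0 up
  ip addr add 192.168.1.6/24 dev bond0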

It's kinda funny: our hardware guy worked with Microchip before we
spun a board and they said to get this switch.  Later they told us we
picked the wrong switch, ha ha.  This was before I got involved ...
but we have another board that uses it too, as just a dumb switch, so
I think that influenced things ... or I probably would have suggested
a different one.

>
> > > Will try your patch and report back.
> >
> > I applied Vladimir's patch (had to edit it to change ksz9477.c to
> > ksz9477_main.c) ;)
> >
> > I did the same steps as before but ran multicast iperf a bit longer as
> > I wasn't noticing packet loss this time.  I also fat fingered options
> > on first iperf run so if you focus on the number of datagrams iperf
> > sent below, the RX counts won't match that.
> >
> > On PC ran: iperf -s -u -B 239.0.0.67%enp4s0 -i 1
> > On my board I ran: iperf -B 192.168.1.6 -c 239.0.0.67 -u --ttl 5 -t
> > 3600 -b 1M -i 1 (I noticed I had a copy/paste error in previous email
> > ... no I didn't use a -ttl of 3000!!!).  Again I didn't let iperf run
> > for 3600 sec., ctrl-c it early.
> >
> > Pings from external PC to board while iperf multicast test was going
> > on resulted in zero dropped packets.
>
> Can you please reword this so that I can understand beyond any doubt
> that you're saying that the patch has fixed the problem?

Sure, let me try again.  Prior to Vladimir's patch, I noticed that
running our application (which uses multicast sockets) or LinuxPTP
(ptp4l) would cause packet loss, but only when lan2 was connected.  We
didn't experience packet loss with just lan1 connected ... or with our
application and ptp4l stopped.  I watched Wireshark and noticed the
only difference in the traffic pattern was multicast.  I then did
iperf tests to see if it was truly a multicast issue or a loading
issue.  I could run iperf with multicast packets and cause pretty much
100% packet loss.

After applying Vladimir's patch to turn off STP, the same iperf test
with multicast packets didn't experience any packet loss.  It works
like a champ.  You're the greatest!  How's that?
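
For the record, the whole repro boils down to the same commands I
posted above, collected in one place (enp4s0 is the PC's NIC):

On the PC (multicast receiver):
  iperf -s -u -B 239.0.0.67%enp4s0 -i 1

On the board (multicast sender):
  iperf -B 192.168.1.6 -c 239.0.0.67 -u --ttl 5 -t 3600 -b 1M -i 1

From the PC, in parallel:
  ping 192.168.1.6

Before the patch that ping showed heavy loss; after it, 0% loss.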

>
> > .
> > .
> > .
> > 64 bytes from 192.168.1.6: icmp_seq=98 ttl=64 time=1.94 ms
> > 64 bytes from 192.168.1.6: icmp_seq=99 ttl=64 time=1.91 ms
> > 64 bytes from 192.168.1.6: icmp_seq=100 ttl=64 time=0.713 ms
> > 64 bytes from 192.168.1.6: icmp_seq=101 ttl=64 time=1.95 ms
> > 64 bytes from 192.168.1.6: icmp_seq=102 ttl=64 time=1.26 ms
> > ^C
> > --- 192.168.1.6 ping statistics ---
> > 102 packets transmitted, 102 received, 0% packet loss, time 101265ms
> > rtt min/avg/max/mdev = 0.253/1.451/2.372/0.414 ms
> >
> > ... I also noticed that the board's ping time greatly improved too.
> > I've noticed ping times are usually over 2ms and I'm not sure why or
> > what to do about it.
>
> So they're usually over 2 ms now, or were before? I see 1.95 ms, that's
> not too far.

So "before" patch pings in earlier post looks like avg time of 2.054ms
and "after" patch test looks like avg time of 1.451ms so that got my
attention.  I guess more stuff is rattling around inside this switch.

>
> I think "rteval" / "cyclictest" / "perf" are the kind of tools you need
> to look at, if you want to improve this RTT.

Indeed.  I have Ftrace/trace-cmd and LTTng working.
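
To get a baseline I'll probably start with something like this (just a
sketch; the priority and duration are arbitrary):

  # cyclictest from rt-tests: -m locks memory, -S runs one measurement
  # thread per core, -p90 is the RT priority, -i the interval in us,
  # -D the duration in seconds
  cyclictest -m -Sp90 -i 1000 -D 120

and then perf to see where the cycles go while the iperf test runs:

  perf record -a -g -- sleep 30
  perf report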

>
> > iperf on board sent 9901 datagrams:
> >
> > .
> > .
> > .
> > [  3] 108.0-109.0 sec   128 KBytes  1.05 Mbits/sec
> > [  3] 109.0-110.0 sec   129 KBytes  1.06 Mbits/sec
> > [  3] 110.0-111.0 sec   128 KBytes  1.05 Mbits/sec
> > ^C[  3]  0.0-111.0 sec  13.9 MBytes  1.05 Mbits/sec
> > [  3] Sent 9901 datagrams
> >
> > ethtool statistics:
> >
> > ethtool -S eth0 | grep -v ': 0'
> > NIC statistics:
> >     tx_packets: 32713
> >     tx_broadcast: 2
> >     tx_multicast: 32041
> >     tx_65to127byte: 719
> >     tx_128to255byte: 30
> >     tx_1024to2047byte: 31964
> >     tx_octets: 48598874
> >     IEEE_tx_frame_ok: 32713
> >     IEEE_tx_octets_ok: 48598874
> >     rx_packets: 33260
> >     rx_broadcast: 378
> >     rx_multicast: 32209
> >     rx_65to127byte: 1140
> >     rx_128to255byte: 136
> >     rx_256to511byte: 20
> >     rx_1024to2047byte: 31964
> >     rx_octets: 48624055
> >     IEEE_rx_frame_ok: 33260
> >     IEEE_rx_octets_ok: 48624055
> >     p06_rx_bcast: 2
> >     p06_rx_mcast: 32041
> >     p06_rx_ucast: 670
> >     p06_rx_65_127: 719
> >     p06_rx_128_255: 30
> >     p06_rx_1024_1522: 31964
> >     p06_tx_bcast: 378
> >     p06_tx_mcast: 32209
> >     p06_tx_ucast: 673
> >     p06_rx_total: 48598874
> >     p06_tx_total: 48624055
>
> (unrelated: the octet counts reported by the FEC match those of the KSZ switch; I'm impressed)
>

Always a plus!

> > # ethtool -S lan1 | grep -v ': 0'
> > NIC statistics:
> >     tx_packets: 32711
> >     tx_bytes: 48401459
> >     rx_packets: 1011
> >     rx_bytes: 84159
> >     rx_bcast: 207
> >     rx_mcast: 111
> >     rx_ucast: 697
> >     rx_64_or_less: 234
> >     rx_65_127: 699
> >     rx_128_255: 70
> >     rx_256_511: 12
> >     tx_bcast: 2
> >     tx_mcast: 32015
> >     tx_ucast: 694
> >     rx_total: 103241
> >     tx_total: 48532849
> >     rx_discards: 4
> >
> > # ethtool -S lan2 | grep -v ': 0'
> > NIC statistics:
> >     rx_packets: 32325
> >     rx_bytes: 47915110
> >     rx_bcast: 209
> >     rx_mcast: 32120
> >     rx_64_or_less: 212
> >     rx_65_127: 55
> >     rx_128_255: 86
> >     rx_256_511: 12
> >     rx_1024_1522: 31964
> >     rx_total: 48497844
> >     rx_discards: 4
>
> Still 4 rx_discards here and on lan1. Not sure exactly when those
> packets were discarded, or what those were.
>
> Generally what I do to observe this kind of thing is to run
> watch -n 1 "ethtool -S lan1 | grep -v ': 0'"
>
> and see what actually increments, in real time.
>
> It would be helpful if you could definitely say that those drops were
> there even prior to you running the test (packets received by MAC while
> port was down?), or if we need to look further into the problem there.
>

I'll do some more tests.  I'll commit it and run our application with
it and see if we are still getting packet loss.
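
To answer the question about whether the discards predate the test, my
rough plan (assuming the counters don't reset across a link down/up) is:

  # snapshot the counters right after boot, before any traffic
  ethtool -S lan1 | grep rx_discards
  ethtool -S lan2 | grep rx_discards
  # then run the iperf test and watch for increments, per your suggestion
  watch -n 1 "ethtool -S lan2 | grep -v ': 0'"

If the counts already read 4 before any traffic, that would point at
packets received by the MAC while the ports were down.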

> > ifconfig stats: (2 dropped packets on lan2.  Last time lan1 and lan2
> > had roughly the same RX counts; this time lan1 is significantly lower)
>
> I've no idea where the 'dropped' packets as reported by ifconfig come
> from. I'm almost certain it's not from DSA.

Thanks everyone!  Thank you again Vladimir.

Sorry I hosed up and posted HTML to the list once and got bounced.  I
was out of the office and trying to respond from my phone.

I'll keep testing and report back again after we get more stick time.

Regards,

Brian
