lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CACPK8Xd5BLiz1ePwzirtxLvSL8V8EGmJuxB0GmxyyqBRK9mSdQ@mail.gmail.com>
Date:   Mon, 23 May 2022 22:25:52 +0000
From:   Joel Stanley <joel@....id.au>
To:     Benjamin Herrenschmidt <benh@...nel.crashing.org>
Cc:     Andrew Lunn <andrew@...n.ch>,
        "David S . Miller" <davem@...emloft.net>,
        Jakub Kicinski <kuba@...nel.org>,
        Andrew Jeffery <andrew@...id.au>,
        Networking <netdev@...r.kernel.org>,
        David Wilder <wilder@...ibm.com>,
        linux-aspeed <linux-aspeed@...ts.ozlabs.org>
Subject: Re: [PATCH net v3] net: ftgmac100: Disable hardware checksum on AST2600

On Sat, 21 May 2022 at 02:53, Benjamin Herrenschmidt
<benh@...nel.crashing.org> wrote:
>
> On Tue, 2022-05-17 at 18:52 +0930, Joel Stanley wrote:
> > The AST2600 when using the i210 NIC over NC-SI has been observed to
> > produce incorrect checksum results with specific MTU values. This was
> > first observed when sending data across a long distance set of
> > networks.
> >
> > On a local network, the following test was performed using a 1MB file
> > of random data.
>
> Can you double check with Aspeed what's going on there and whether
> there's a way to instead, identify the bad case in the TX path and do
> on-demand SW checksuming only in those cases ?

Keep in mind this is only for the NC-SI case, where the link is
limited to 100Mbit anyway.

I did some tests with the openbmc kernel; a v5.15 tree with whatever
options we have enabled there.

Averaging a few iperf3 runs I see about 92Mbit/s with hardware
checksumming enabled, and 90Mbit/s with it disabled. So we can see the
difference, and it would be good if Aspeed could find the root cause
so this only needs to be disabled when hitting the problematic path as
you say.

> Because disabling HW checksum will kill performances afaik... (doesn't
> it also end up disabling zero-copy and SG ?)

Not sure?

>
> Cheers,
> Ben.
>
> > On the receiver run this script:
> >
> >  #!/bin/bash
> >  while [ 1 ]; do
> >         # Zero the stats
> >         nstat -r  > /dev/null
> >         nc -l 9899 > test-file
> >         # Check for checksum errors
> >         TcpInCsumErrors=$(nstat | grep TcpInCsumErrors)
> >         if [ -z "$TcpInCsumErrors" ]; then
> >                 echo No TcpInCsumErrors
> >         else
> >                 echo TcpInCsumErrors = $TcpInCsumErrors
> >         fi
> >  done
> >
> > On an AST2600 system:
> >
> >  # nc <IP of  receiver host> 9899 < test-file
> >
> > The test was repeated with various MTU values:
> >
> >  # ip link set mtu 1410 dev eth0
> >
> > The observed results:
> >
> >  1500 - good
> >  1434 - bad
> >  1400 - good
> >  1410 - bad
> >  1420 - good
> >
> > The test was repeated after disabling tx checksumming:
> >
> >  # ethtool -K eth0 tx-checksumming off
> >
> > And all MTU values tested resulted in transfers without error.
> >
> > An issue with the driver cannot be ruled out, however there has been
> > no
> > bug discovered so far.
> >
> > David has done the work to take the original bug report of slow data
> > transfer between long distance connections and triaged it down to
> > this
> > test case.
> >
> > The vendor suspects this this is a hardware issue when using NC-SI.
> > The
> > fixes line refers to the patch that introduced AST2600 support.
> >
> > Reported-by: David Wilder <wilder@...ibm.com>
> > Reviewed-by: Dylan Hung <dylan_hung@...eedtech.com>
> > Signed-off-by: Joel Stanley <joel@....id.au>
> > ---
> > v3 modifies the wrapping of the commit message.
> >
> > v2 updates the commit message with confirmation from the vendor that
> > this is a hardware issue, and clarifies why the commit used in the
> > fixes
> >
> >  drivers/net/ethernet/faraday/ftgmac100.c | 5 +++++
> >  1 file changed, 5 insertions(+)
> >
> > diff --git a/drivers/net/ethernet/faraday/ftgmac100.c
> > b/drivers/net/ethernet/faraday/ftgmac100.c
> > index caf48023f8ea..5231818943c6 100644
> > --- a/drivers/net/ethernet/faraday/ftgmac100.c
> > +++ b/drivers/net/ethernet/faraday/ftgmac100.c
> > @@ -1928,6 +1928,11 @@ static int ftgmac100_probe(struct
> > platform_device *pdev)
> >       /* AST2400  doesn't have working HW checksum generation */
> >       if (np && (of_device_is_compatible(np, "aspeed,ast2400-mac")))
> >               netdev->hw_features &= ~NETIF_F_HW_CSUM;
> > +
> > +     /* AST2600 tx checksum with NCSI is broken */
> > +     if (priv->use_ncsi && of_device_is_compatible(np,
> > "aspeed,ast2600-mac"))
> > +             netdev->hw_features &= ~NETIF_F_HW_CSUM;
> > +
> >       if (np && of_get_property(np, "no-hw-checksum", NULL))
> >               netdev->hw_features &= ~(NETIF_F_HW_CSUM |
> > NETIF_F_RXCSUM);
> >       netdev->features |= netdev->hw_features;
>

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ