Message-ID: <f7e1f498-d90b-1685-dc02-4c24273957a7@i2se.com>
Date:   Tue, 7 Jan 2020 18:30:00 +0100
From:   Stefan Wahren <stefan.wahren@...e.com>
To:     Eric Dumazet <eric.dumazet@...il.com>,
        RENARD Pierre-Francois <pfrenard@...il.com>,
        nsaenzjulienne@...e.de, woojung.huh@...rochip.com,
        UNGLinuxDriver@...rochip.com, netdev@...r.kernel.org,
        linux-usb@...r.kernel.org
Subject: Re: [RPI 3B+ / TSO / lan78xx ]

Hi Eric,

On 07.01.20 at 18:04, Eric Dumazet wrote:
>
> On 1/7/20 5:32 AM, RENARD Pierre-Francois wrote:
>> Hello all
>>
>> I am facing an issue with the Raspberry Pi 3B+ and its onboard Ethernet adapter.
>>
>> When doing one large transfer (more than 1GB), the transfer hangs and fails after a few minutes.
>>
>>
>> I have two ways to reproduce this issue
>>
>>
>> using NFS (v3 or v4)
>>
>>     dd if=/dev/zero of=/NFSPATH/file bs=4M count=1000 status=progress
>>
>>
>>     we can see that at some point dd hangs and becomes uninterruptible (no way to ctrl-c it or kill it)
>>
>>     after a few minutes, dd dies and a number of "NFS server not responding" / "NFS server is OK" messages appear in the journal
>>
>>
>> Using SCP
>>
>>     dd if=/dev/zero of=/tmp/file bs=4M count=1000
>>
>>     scp /tmp/file user@...ver:/directory
>>
>>
>>     scp hangs after 1GB, and after a few minutes it fails with the message "client_loop: send disconnect: Broken pipe lostconnection"
>>
>>
>>
>>
>> It appears that this is a known bug related to TCP Segmentation Offload (TSO) and Selective Acknowledgment (SACK).
>>
>> Disabling TSO (ethtool -K eth0 tso off and ethtool -K eth0 gso off) solves the issue.
>>
>> The Raspberry Pi team has created a patch that disables the feature by default, and Raspbian applies it by default.
>>
>> comment from the patch :
>>
>> /* TSO seems to be having some issue with Selective Acknowledge (SACK) that
>>  * results in lost data never being retransmitted.
>>  * Disable it by default now, but adds a module parameter to enable it for
>>  * debug purposes (the full cause is not currently understood).
>>  */
>>
>>
>> For reference you can find
>>
>> a link to the issue I created yesterday : https://github.com/raspberrypi/linux/issues/3395
>>
>> links to raspberry dev team : https://github.com/raspberrypi/linux/issues/2482 & https://github.com/raspberrypi/linux/issues/2449
>>
>>
>>
>> If you need me to test things or give you more information, I'll be pleased to help.
>>
>
> I doubt TSO and SACK have a serious generic bug like that.
>
> Most likely the TSO implementation in the driver/NIC has a bug.

Yes, the issue isn't reproducible on the Raspberry Pi 3B (the model
without the +) with the same kernel. The main difference between the two
boards is the USB Ethernet chip:

Raspberry Pi 3B: smsc95xx
Raspberry Pi 3B+: lan78xx
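
To confirm which of the two drivers a given board is actually using, a
minimal sketch like the following could help. It assumes the onboard
Ethernet interface is named eth0 (pass another name as the first argument)
and that sysfs is mounted at /sys; the helper name driver_of is made up
for this illustration.

```shell
#!/bin/sh
# Sketch: report which kernel driver is bound to a network interface.
# Assumption: the interface is called eth0 unless a name is given as $1.
iface="${1:-eth0}"

# driver_of IFACE (hypothetical helper): print the name of the bound driver.
# /sys/class/net/<iface>/device/driver is a symlink to the driver directory;
# virtual or missing interfaces have no such link, so print "unknown".
driver_of() {
    link="/sys/class/net/$1/device/driver"
    if [ -e "$link" ]; then
        basename "$(readlink -f "$link")"
    else
        echo "unknown"
    fi
}

echo "$iface: $(driver_of "$iface")"
```

`ethtool -i eth0` reports the same driver name together with the driver
and firmware versions.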

>
> Anyway, you did not provide a kernel version, so I am not sure what you expect from us.

It's Linux 5.4.7 (arm64), as stated in the provided GitHub link. I asked
Pierre-Francois to report this issue here so that it gets addressed
properly. Currently this very old bug is not fixed in mainline, and the
Raspberry Pi vendor tree uses a workaround (disabling TSO).
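
For anyone reproducing this, a rough sketch for checking the current
offload state before and after applying the workaround is shown below. It
only reads the state; the command that changes it is left as a comment
because it needs root. The interface name eth0 and the helper name
offload_state are assumptions made for this illustration, not part of the
vendor patch.

```shell
#!/bin/sh
# Sketch: show whether TSO and GSO are currently enabled on an interface.
# Assumption: the interface is eth0 unless IFACE is set in the environment.
iface="${IFACE:-eth0}"

# offload_state FEATURE (hypothetical helper): read `ethtool -k` output on
# stdin and print the state ("on"/"off") of the named feature, or "unknown"
# if the feature line is not present.
offload_state() {
    awk -v f="$1:" '
        $1 == f { print $2; found = 1; exit }
        END     { if (!found) print "unknown" }
    '
}

if command -v ethtool >/dev/null 2>&1; then
    for feature in tcp-segmentation-offload generic-segmentation-offload; do
        state=$(ethtool -k "$iface" 2>/dev/null | offload_state "$feature")
        echo "$iface $feature: $state"
    done
fi

# To apply the workaround described in the report (as root):
#   ethtool -K "$iface" tso off gso off
```

Running it once before and once after the ethtool -K call shows whether
the setting actually took effect on the lan78xx interface.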

Stefan

