lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <56D201E2.4060504@baker-net.org.uk>
Date:	Sat, 27 Feb 2016 20:06:58 +0000
From:	Adam Baker <linux@...er-net.org.uk>
To:	Ezequiel Garcia <ezequiel@...guardiasur.com.ar>,
	Andrew Lunn <andrew@...n.ch>,
	Thomas Schlöter <thomas@...loeter.net>
Cc:	Thomas Petazzoni <thomas.petazzoni@...e-electrons.com>,
	Karl Beldan <karl.beldan@...ierawaves.com>,
	netdev@...r.kernel.org, philipp@...ilie-kirchhofer.de,
	Martin Michlmayr <tbm@...ius.com>,
	Linux ARM Kernel <linux-arm-kernel@...ts.infradead.org>
Subject: Re: net: mv643xx: interface does not transmit after some time

On 11/02/16 14:38, Ezequiel Garcia wrote:
> (let's expand the Cc a bit)
>
> On 10 February 2016 at 19:57, Andrew Lunn <andrew@...n.ch> wrote:
>> On Wed, Feb 10, 2016 at 07:40:54PM +0100, Thomas Schlöter wrote:
>>>
>>>> Am 08.02.2016 um 19:49 schrieb Thomas Schlöter <thomas@...loeter.net>:
>>>>
>>>>
>>>>> Am 07.02.2016 um 22:07 schrieb Thomas Schlöter <thomas@...loeter.net>:
>>>>>
>>>>> Am 07.02.2016 um 21:35 schrieb Andrew Lunn <andrew@...n.ch>:
>>>>>>
>>>>>>>> FWIW, we had a similar bug report in Debian recently:
>>>>>>>> https://lists.debian.org/debian-arm/2016/01/msg00098.html
>>>>>>
>>>>>> Hi Thomas
>>>>>>
>>>>>> I this thread, Ian Campbell mentions a patch. Please could you try
>>>>>> that patch and see if it fixes your problem.
>>>>>>
>>>>>> Thanks
>>>>>>    Andrew
>>>>>
>>>>> Hi Andrew,
>>>>>
>>>>> I just applied the patch and the NAS is now running it. I???ll try to crash it tonight and keep you informed whether it worked.
>>>>>
>>>>> Thanks
>>>>>     Thomas
>>>>
>>>> Hi Andrew,
>>>>
>>>> the patch did not fix the problem. After 1.2 GiB RX and 950 MiB TX, the interface crashed again.
>>>>
>>>> Now I switched off RX/TX offload just to make sure we are talking about the same problem. If we are, the interface should be stable without offload, right?
>>>>
>>>>      Thomas
>>>
>>> Okay, so I have installed ethtool and switched off all offload features available. Now the NAS is running rock solid for two days. I backed up my Mac using Time Machine / netatalk (450 GiB transferred) and some Linux machines via NFS (100 GiB total) without a problem.
>>>
>>> How much code is used for mv643xx offload functionality?
>>> Is it possible to debug things in the driver and figure out what happens during the crash?
>>> Is the hardware offload interface proprietary or reverse engineered or is it a well known API that can be analyzed?
>>
>> Hi Thomas
>>
>> Ezequiel Garcia probably knows this part of the driver and hardware
>> the best...
>>
>
> The TCP segmentation offload (TSO) implemented in this driver is
> mostly a software thing.
>
> I'm CCing Karl and Philipp, who have fixed subtle issues in the TSO
> path, and may be able to help figure this one out.
>

Hi,

Had this issue occur again today. In my case it seems to be triggered by 
large NFSv4 transfers.

I'm running 4.4 plus Nicolas Schichan's patch at
https://patchwork.ozlabs.org/patch/573334/

There is a thread a http://forum.doozan.com/read.php?2,17404 suggesting 
that this has been broken since at least 3.16.

I first spotted the issue when upgrading from 3.11 to 4.4.

Looking at 
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/log/drivers/net/ethernet/marvell/mv643xx_eth.c 
I see 2014-05-22 as the date TSO support was first added which is 
shortly before the merge window opened for 3.16. I'm therefore guessing 
that TSO has been problematic since it's introduction.

Regards

Adam


Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ