Message-ID: <54F41F38.1060505@numascale.com>
Date:	Mon, 02 Mar 2015 16:28:40 +0800
From:	Daniel J Blueman <daniel@...ascale.com>
To:	"Fujinaka, Todd" <todd.fujinaka@...el.com>
CC:	Steffen Persvold <sp@...ascale.com>,
	"e1000-devel@...ts.sourceforge.net" 
	<e1000-devel@...ts.sourceforge.net>, netdev@...r.kernel.org
Subject: Re: [E1000-devel] Sporadic packet loss observed with newer in-kernel
 drivers (5.2.15-k)

Hi Todd,

Following up on this, since the packet loss doesn't occur when using the 
out-of-tree driver but does when using the mainline driver, it's more 
plausible that there's a driver behavioural difference causing this.

After instrumenting MDI activity, a number of the differences turn out 
to come from force_speed_duplex() being called when the hardware is 
first initialised. Only the mainline driver has hw->mac.autoneg == 0 at 
that point, along this path:

igb_setup_copper_link+0x2a5/0x2c0
igb_copper_link_setup_igp+0xb7/0x210
igb_setup_copper_link_82575+0xd4/0x180
igb_setup_link+0x36/0x1c0
igb_init_hw_82575+0xba/0x330
igb_reset+0x15f/0x5e0
igb_sriov_reinit+0x88/0xc0
igb_pci_enable_sriov+0x115/0x200
igb_probe+0x4ae/0x11a0
local_pci_probe+0x40/0xa0

The same six setup_copper_link() calls occur (three per on-board 
adapter) in the out-of-tree driver, but there hw->mac.autoneg is always 
set. This also fits our finding that triggering autoneg prevents the 
packet loss.
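
To make the suspected difference concrete, the behaviour inferred from 
the trace can be modelled as below. This is a simplified sketch and not 
the driver source: the names model_hw, model_mac, link_path and 
model_setup_copper_link() are made up for illustration. The only thing 
taken from the observations above is that the force_speed_duplex() path 
is reached when hw->mac.autoneg is 0.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the driver's structures; only the one
 * field discussed above (hw->mac.autoneg) is modelled. */
struct model_mac {
	bool autoneg;
};

struct model_hw {
	struct model_mac mac;
};

/* Which of the two link bring-up paths gets taken. */
enum link_path {
	LINK_PATH_AUTONEG,	/* negotiate speed/duplex with the link partner */
	LINK_PATH_FORCED	/* force_speed_duplex(): no negotiation */
};

/* Model of the decision at copper link setup: negotiate when
 * hw->mac.autoneg is set, otherwise force speed and duplex. */
static enum link_path model_setup_copper_link(const struct model_hw *hw)
{
	assert(hw != NULL);
	return hw->mac.autoneg ? LINK_PATH_AUTONEG : LINK_PATH_FORCED;
}
```

In this model the out-of-tree driver always yields LINK_PATH_AUTONEG, 
while the mainline driver yields LINK_PATH_FORCED on the probe path 
shown above.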

What value is hw->mac.autoneg expected to have here?

Many thanks!
   Daniel

On 30/12/2014 00:41, Fujinaka, Todd wrote:
> This could be a BIOS issue as well. If you can't track this down to a specific software bug, you'll have to file the issue with Supermicro and they'll contact us if they need our help.
>
> Todd Fujinaka
> Software Application Engineer
> Networking Division (ND)
> Intel Corporation
> todd.fujinaka@...el.com
> (503) 712-4565
>
> -----Original Message-----
> From: Steffen Persvold [mailto:sp@...ascale.com]
> Sent: Friday, December 26, 2014 11:14 AM
> To: Fujinaka, Todd
> Cc: e1000-devel@...ts.sourceforge.net; Daniel J Blueman
> Subject: Re: [E1000-devel] Sporadic packet loss observed with newer in-kernel drivers (5.2.15-k)
>
> Hi Todd,
>
> I don’t think it’s related to queues/settings in the OS per se. These machines use shared-mode PHY for BMC (IPMI) access also, and when we get packet loss in the OS driver, we also see packet loss on the BMC side.
>
> What we’ve discovered is that if we do “ethtool -s eth0 autoneg on” it fixes the issue on both sides. However, autonegotiation *is* already enabled in the NIC before we do this; it just seems that the “autoneg on” operation restarts something in the PHY.
>
> Weird.
>
> Cheers,
> --
> Steffen Persvold
> Chief Architect NumaChip, Numascale AS
> Tel: +47 23 16 71 88  Fax: +47 23 16 71 80 Skype: spersvold
>
>> On 19 Dec 2014, at 18:17, Fujinaka, Todd <todd.fujinaka@...el.com> wrote:
>>
>> Before you start, though, do the check for settings and number of queues being used. The issue may be as simple as that, and that shouldn't take more than a few ethtool commands.
>>
>> Todd Fujinaka
>> Software Application Engineer
>> Networking Division (ND)
>> Intel Corporation
>> todd.fujinaka@...el.com
>> (503) 712-4565
>>
>> -----Original Message-----
>> From: Steffen Persvold [mailto:sp@...ascale.com]
>> Sent: Friday, December 19, 2014 9:14 AM
>> To: Fujinaka, Todd
>> Cc: e1000-devel@...ts.sourceforge.net; Daniel J Blueman
>> Subject: Re: [E1000-devel] Sporadic packet loss observed with newer
>> in-kernel drivers (5.2.15-k)
>>
>> Hi Todd,
>>
>> Thanks for responding so quickly. It’s probably easier to bisect the changes to igb between the 3.10 kernel in-tree version (5.0.3-k) and the 3.14 kernel in-tree version (5.0.5-k), rather than diffing on out-of-tree 5.2.15 and in-kernel 5.2.15-k (I tried, the changes are huge, mostly because out-of-tree code has a lot of compatibility stuff in it naturally).
>>
>> I’ll let you know.
>>
>>
>> Cheers,
>> --
>> Steffen Persvold
>> Chief Architect NumaChip, Numascale AS
>> Tel: +47 23 16 71 88  Fax: +47 23 16 71 80 Skype: spersvold
>>
>>> On 19 Dec 2014, at 17:23, Fujinaka, Todd <todd.fujinaka@...el.com> wrote:
>>>
>>> The in-kernel and out-of-tree driver aren't exactly the same and there could be differences enforced by the community that create that difference. For example - and I'm just making this up - there could be a difference in the dropping or passing of packets with bad checksums.
>>>
>>> More likely are differences in the default settings of the two drivers. You may want to check that first.
>>>
>>> If you have a clearly reproducible use case, we can try looking into this, but we are a bit limited in the number of Opteron systems we have in-house.
>>>
>>> Todd Fujinaka
>>> Software Application Engineer
>>> Networking Division (ND)
>>> Intel Corporation
>>> todd.fujinaka@...el.com
>>> (503) 712-4565
>>>
>>> -----Original Message-----
>>> From: Steffen Persvold [mailto:sp@...ascale.com]
>>> Sent: Thursday, December 18, 2014 10:36 PM
>>> To: e1000-devel@...ts.sourceforge.net
>>> Cc: Daniel J Blueman
>>> Subject: [E1000-devel] Sporadic packet loss observed with newer
>>> in-kernel drivers (5.2.15-k)
>>>
>>> Hi,
>>>
>>> We’re currently working with a cluster of SuperMicro H8QGL (http://www.supermicro.com/Aplus/motherboard/Opteron6000/SR56x0/H8QGL-iF.cfm) based systems, each of which has two of the 82576 chips:
>>>
>>> 02:00.0 Ethernet controller: Intel Corporation 82576 Gigabit Network
>>> Connection (rev 01)
>>> 02:00.1 Ethernet controller: Intel Corporation 82576 Gigabit Network
>>> Connection (rev 01)
>>>
>>>
>>> Consequently the kernel uses the igb network driver for these.
>>>
>>> We have observed with kernels 3.14 and onwards that we sometimes get packet-loss (due to corrupted packets). 3.14 uses igb version 5.0.5-k :
>>>
>>> [    0.000000] Linux version 3.14.27-numascale27+ (sp@...ld-ubuntu) (gcc version 4.8.2 (Ubuntu 4.8.2-19ubuntu1) ) #2 SMP Thu Dec 18 08:00:08 CET 2014
>>> ...
>>> [    6.338430] igb: Intel(R) Gigabit Ethernet Network Driver - version 5.0.5-k
>>> [    6.345394] igb: Copyright (c) 2007-2013 Intel Corporation.
>>>
>>>
>>> If we revert back to 3.10 kernels (3.10.63), which uses the 5.0.3-k igb driver we have no packet loss scenarios :
>>>
>>> [    0.000000] Linux version 3.10.63-numascale27+ (sp@...ld-ubuntu) (gcc version 4.8.2 (Ubuntu 4.8.2-19ubuntu1) ) #1 SMP Wed Dec 17 15:56:25 CET 2014
>>> ...
>>> [    6.749783] igb: Intel(R) Gigabit Ethernet Network Driver - version 5.0.3-k
>>> [    6.756740] igb: Copyright (c) 2007-2013 Intel Corporation.
>>>
>>>
>>> I have also tested the most recent kernel; 3.18.1 :
>>>
>>> [    0.000000] Linux version 3.18.1-numascale27+ (sp@...ld-ubuntu) (gcc version 4.8.2 (Ubuntu 4.8.2-19ubuntu1) ) #1 SMP Thu Dec 18 08:36:03 CET 2014
>>> ...
>>> [    8.010000] igb: Intel(R) Gigabit Ethernet Network Driver - version 5.2.15-k
>>> [    8.010000] igb: Copyright (c) 2007-2014 Intel Corporation.
>>>
>>> Also in this version we observe packet loss/corrupted packets.
>>>
>>> While in the failed state we observe with ethtool -S (snapshot taken on 3.14 with igb-5.0.5-k) :
>>>
>>>     rx_short_length_errors: 235
>>>     rx_errors: 235
>>>     rx_length_errors: 235
>>>     rx_queue_6_csum_err: 256
>>>
>>>
>>> Now to the interesting part :) If I download igb-5.2.15.tar.gz from the sourceforge site (http://sourceforge.net/projects/e1000/files/igb%20stable/5.2.15/igb-5.2.15.tar.gz/download), and build this for 3.18.1, the packet loss is gone. Which doesn’t make sense at all since 3.18.1 already has 5.2.15 driver (albeit an in-kernel variant). This also applies if we apply the same driver version to the 3.14 kernel (replacing 5.0.5-k).
>>>
>>>
>>> Any idea what might be causing this ? Any insight you might have would be highly appreciated.
-- 
Daniel J Blueman
Principal Software Engineer, Numascale
