netdev - RE: 82571EB: Detected Hardware Unit Hang

Message-ID: <061C8A8601E8EE4CA8D8FD6990CEA8913348B0E7@ORSMSX102.amr.corp.intel.com>
Date:	Wed, 14 Nov 2012 03:43:33 +0000
From:	"Dave, Tushar N" <tushar.n.dave@...el.com>
To:	Li Yu <raise.sail@...il.com>
CC:	Joe Jin <joe.jin@...cle.com>,
	"e1000-devel@...ts.sf.net" <e1000-devel@...ts.sf.net>,
	"netdev@...r.kernel.org" <netdev@...r.kernel.org>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
	Mary Mcgrath <mary.mcgrath@...cle.com>
Subject: RE: 82571EB: Detected Hardware Unit Hang

>-----Original Message-----
>From: Li Yu [mailto:raise.sail@...il.com]
>Sent: Tuesday, November 13, 2012 7:37 PM
>To: Dave, Tushar N
>Cc: Joe Jin; e1000-devel@...ts.sf.net; netdev@...r.kernel.org; linux-
>kernel@...r.kernel.org; Mary Mcgrath
>Subject: Re: 82571EB: Detected Hardware Unit Hang
>
>On 2012-11-09 04:35, Dave, Tushar N wrote:
>>> -----Original Message-----
>>> From: netdev-owner@...r.kernel.org
>>> [mailto:netdev-owner@...r.kernel.org]
>>> On Behalf Of Joe Jin
>>> Sent: Wednesday, November 07, 2012 10:25 PM
>>> To: e1000-devel@...ts.sf.net
>>> Cc: netdev@...r.kernel.org; linux-kernel@...r.kernel.org; Mary
>>> Mcgrath
>>> Subject: 82571EB: Detected Hardware Unit Hang
>>>
>>> Hi list,
>>>
>>> A customer of ours reported "82571EB Detected Hardware Unit Hang" on an
>>> HP ProLiant DL360 G6, and has to reboot the server to recover:
>>>
>>> e1000e 0000:06:00.1: eth3: Detected Hardware Unit Hang:
>>>   TDH                  <1a>
>>>   TDT                  <1a>
>>>   next_to_use          <1a>
>>>   next_to_clean        <18>
>>> buffer_info[next_to_clean]:
>>>   time_stamp           <10047a74e>
>>>   next_to_watch        <18>
>>>   jiffies              <10047a88c>
>>>   next_to_watch.status <1>
>>> MAC Status             <80383>
>>> PHY Status             <792d>
>>> PHY 1000BASE-T Status  <3800>
>>> PHY Extended Status    <3000>
>>> PCI Status             <10>
>>>
>>> With the newer kernel 2.0.0.1 the issue is still reproducible.
>>>
>>> Device info:
>>> 06:00.1 Ethernet controller: Intel Corporation 82571EB Gigabit
>>> Ethernet Controller (Copper) (rev 06)
>>> 06:00.1 0200: 8086:10bc (rev 06)
>>>
>>> I compared the lspci output before and after the issue; it differs as below:
>>> 06:00.1 Ethernet controller: Intel Corporation 82571EB Gigabit
>>> Ethernet Controller (Copper) (rev 06)
>>> 	Subsystem: Hewlett-Packard Company NC364T PCI Express Quad Port
>>> Gigabit Server Adapter
>>> 	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr+
>>> Stepping- SERR- FastB2B- DisINTx-
>>> -	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort-
>>> <TAbort- <MAbort- >SERR- <PERR- INTx-
>>> +	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort-
>>> +<TAbort- <MAbort- >SERR- <PERR- INTx+
>>
>> Are you sure this is not similar to the issue you reported before?
>> i.e.
>> On Mon, 2012-07-09 at 16:51 +0800, Joe Jin wrote:
>>> I'm seeing a Unit Hang even with the latest e1000e driver 2.0.0 when
>>> doing an scp test. This issue is easy to reproduce on a SUN FIRE X2270
>>> M2: just copying a big file (>500M) from another server will hit it at
>>> once.
>>
>> All devices in the path from the root complex to the 82571 should have the
>> *same* max payload size; otherwise it can cause a hang.
>> Can you double check this?
>>
>
>We also found this kind of hang on the 82599EB (ixgbe driver) with the
>RHEL6.3 kernel. We tried upgrading to the latest version (3.8.21 or
>3.10.17), but it still happens.
>
>Could it also be due to a wrong "max payload size" set in the BIOS?
>
It may or may not be. I would suggest starting a separate thread for that issue, as these two devices are significantly different.

-Tushar
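
For anyone wanting to double check the max payload size as suggested above, here is a minimal sketch in Python. It assumes the text comes from `lspci -vv` (run as root so the PCI Express capability is shown) and that the programmed value appears as "MaxPayload N bytes" in the DevCtl block, which is how pciutils normally prints it. The function names and the idea of passing the path as a list of PCI addresses are illustrative only, not part of any driver or tool. The addresses on the path from the root complex to the NIC have to be picked by hand, for example from `lspci -t`.

```python
import re

# A device block starts at column 0 with a PCI address, e.g. "06:00.1 Ethernet ..."
DEVICE_RE = re.compile(r"^((?:[0-9a-f]{4}:)?[0-9a-f]{2}:[0-9a-f]{2}\.[0-7])\s", re.I)
# Indented "Label:" lines such as "DevCap:", "DevCtl:", "DevSta:" start a new field
LABEL_RE = re.compile(r"^\s+(\w+):")
PAYLOAD_RE = re.compile(r"MaxPayload (\d+) bytes")


def devctl_max_payload(lspci_text):
    """Map PCI address -> MaxPayload (bytes) as programmed in DevCtl.

    DevCap also reports a MaxPayload, but that is only what the device
    supports, so only the value inside the DevCtl field is collected.
    """
    payloads = {}
    device = None
    section = None
    for line in lspci_text.splitlines():
        dev = DEVICE_RE.match(line)
        if dev:
            device, section = dev.group(1), None
            continue
        label = LABEL_RE.match(line)
        if label:
            section = label.group(1)
        # DevCtl continues over unlabeled lines until the next label
        if device and section == "DevCtl":
            found = PAYLOAD_RE.search(line)
            if found:
                payloads[device] = int(found.group(1))
    return payloads


def payload_mismatch(payloads, path):
    """True if the devices listed in `path` do not all share one MaxPayload."""
    values = {payloads[addr] for addr in path if addr in payloads}
    return len(values) > 1
```

To use it, save the output of `lspci -vv` to a file, pass its contents to `devctl_max_payload()`, and then call `payload_mismatch()` with the addresses of the root port, any intermediate bridges or switch ports, and the 82571 function itself. A True result means the values differ somewhere along that path.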