Date:	Tue, 22 Jul 2014 08:25:13 -0700
From:	Alexander Duyck <alexander.h.duyck@...el.com>
To:	Andrew Cooks <acooks@...il.com>,
	"Fujinaka, Todd" <todd.fujinaka@...el.com>
CC:	Dmitry Lifshitz <lifshitz@...pulab.co.il>,
	netdev <netdev@...r.kernel.org>,
	"e1000-devel@...ts.sf.net" <e1000-devel@...ts.sf.net>,
	Igor Grinberg <grinberg@...pulab.co.il>,
	Linux NICS <Linux-nics@...tope.jf.intel.com>
Subject: Re: [E1000-devel] [linux-nics] Problem: 82574L device (e1000e driver):
 Reset adapter unexpectedly / transmit queue 0 timed out

>>>> # lspci -vvnnk:
>>>> 01:00.0 Ethernet controller [0200]: Intel Corporation 82574L Gigabit Network Connection [8086:10d3]
>>>>         Subsystem: Intel Corporation Device [8086:0000]
>>>>         Control: I/O- Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
>>>>         Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
>>>>         Interrupt: pin A routed to IRQ 16
>>>>         Region 0: [virtual] Memory at c1900000 (32-bit, non-prefetchable) [size=128K]
>>>>         Region 1: [virtual] Memory at c1800000 (32-bit, non-prefetchable) [size=1M]
>>>>         Region 2: I/O ports at 7000 [size=32]
>>>>         Region 3: [virtual] Memory at c1920000 (32-bit, non-prefetchable) [size=16K]
>>>>         [virtual] Expansion ROM at c1940000 [disabled] [size=256K]
>>>>         Capabilities: [c8] Power Management version 2
>>>>                 Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
>>>>                 Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=1 PME-
>>>>         Capabilities: [d0] MSI: Enable- Count=1/1 Maskable- 64bit+
>>>>                 Address: 0000000000000000  Data: 0000
>>>>         Capabilities: [e0] Express (v1) Endpoint, MSI 00
>>>>                 DevCap: MaxPayload 256 bytes, PhantFunc 0, Latency L0s <512ns, L1 <64us
>>>>                         ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset-
>>>>                 DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
>>>>                         RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop+
>>>>                         MaxPayload 128 bytes, MaxReadReq 512 bytes
>>>>                 DevSta: CorrErr+ UncorrErr+ FatalErr- UnsuppReq+ AuxPwr+ TransPend-
>>>>                 LnkCap: Port #1, Speed 2.5GT/s, Width x1, ASPM L0s L1, Latency L0 <128ns, L1 <64us
>>>>                         ClockPM- Surprise- LLActRep- BwNot-
>>>>                 LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- Retrain- CommClk-
>>>>                         ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
>>>>                 LnkSta: Speed 2.5GT/s, Width x1, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
>>>>         Capabilities: [a0] MSI-X: Enable- Count=5 Masked-
>>>>                 Vector table: BAR=3 offset=00000000
>>>>                 PBA: BAR=3 offset=00002000
>>>>         Capabilities: [100 v1] Advanced Error Reporting
>>>>                 UESta:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq+ ACSViol-
>>>>                 UEMsk:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
>>>>                 UESvrt: DLP+ SDES- TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
>>>>                 CESta:  RxErr+ BadTLP+ BadDLLP+ Rollover- Timeout- NonFatalErr+
>>>>                 CEMsk:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
>>>>                 AERCap: First Error Pointer: 14, GenCap- CGenEn- ChkCap- ChkEn-
>>>>         Capabilities: [140 v1] Device Serial Number 00-01-c0-ff-ff-12-8a-64
>>>>         Kernel driver in use: e1000e
>>>>
>>>>

It looks like something bad happened on the PCIe bus: the AER
correctable error status (CESta) shows RxErr, BadTLP, BadDLLP, and
NonFatalErr all set.  This could indicate a problem with the PCIe link
on the system.
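
For anyone who wants to check these bits on their own system, here is a
minimal sketch that pulls the set ("+") flags out of the CESta line of
"lspci -vv" output.  The function name aer_cesta_set_flags is made up
for illustration, and the parsing assumes the "Flag+ / Flag-" layout
shown in the quoted output above.

```shell
#!/bin/sh
# Hypothetical helper: read `lspci -vv` output on stdin and print, one per
# line, the correctable-error flags that are set ("+") on the CESta line.
# Leading quote markers such as ">>>>" are tolerated.
aer_cesta_set_flags() {
    awk '
        {
            for (i = 1; i <= NF; i++) {
                if ($i != "CESta:")
                    continue
                # Every field after "CESta:" is a flag ending in + or -.
                for (j = i + 1; j <= NF; j++) {
                    if ($j ~ /\+$/) {
                        flag = $j
                        sub(/\+$/, "", flag)
                        print flag
                    }
                }
            }
        }'
}
```

Usage would be something like "lspci -vv -s 01:00.0 |
aer_cesta_set_flags", run as root so that lspci can read the extended
capabilities.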

>>>> # ethtool -d eth2
>>>> MAC Registers
>>>> -------------
>>>> 0x00000: CTRL (Device control register)  0xFFFFFFFF
>>>>       Endian mode (buffers):             big
>>>>       Link reset:                        reset
>>>>       Set link up:                       1
>>>>       Invert Loss-Of-Signal:             yes
>>>>       Receive flow control:              enabled
>>>>       Transmit flow control:             enabled
>>>>       VLAN mode:                         enabled
>>>>       Auto speed detect:                 enabled
>>>>       Speed select:                      not used
>>>>       Force speed:                       yes
>>>>       Force duplex:                      yes
>>>> 0x00008: STATUS (Device status register) 0xFFFFFFFF
>>>>       Duplex:                            full
>>>>       Link up:                           link config
>>>>       TBI mode:                          enabled
>>>>       Link speed:                        not used
>>>>       Bus type:                          PCI-X
>>>>       Bus speed:                         133MHz
>>>>       Bus width:                         64-bit
>>>> 0x00100: RCTL (Receive control register) 0xFFFFFFFF
>>>>       Receiver:                          enabled
>>>>       Store bad packets:                 enabled
>>>>       Unicast promiscuous:               enabled
>>>>       Multicast promiscuous:             enabled
>>>>       Long packet:                       enabled
>>>>       Descriptor minimum threshold size: reserved
>>>>       Broadcast accept mode:             accept
>>>>       VLAN filter:                       enabled
>>>>       Canonical form indicator:          enabled
>>>>       Discard pause frames:              ignored
>>>>       Pass MAC control frames:           pass
>>>>       Receive buffer size:               4096
>>>> 0x02808: RDLEN (Receive desc length)     0xFFFFFFFF
>>>> 0x02810: RDH   (Receive desc head)       0xFFFFFFFF
>>>> 0x02818: RDT   (Receive desc tail)       0xFFFFFFFF
>>>> 0x02820: RDTR  (Receive delay timer)     0xFFFFFFFF
>>>> 0x00400: TCTL (Transmit ctrl register)   0xFFFFFFFF
>>>>       Transmitter:                       enabled
>>>>       Pad short packets:                 enabled
>>>>       Software XOFF Transmission:        enabled
>>>>       Re-transmit on late collision:     enabled
>>>> 0x03808: TDLEN (Transmit desc length)    0xFFFFFFFF
>>>> 0x03810: TDH   (Transmit desc head)      0xFFFFFFFF
>>>> 0x03818: TDT   (Transmit desc tail)      0xFFFFFFFF
>>>> 0x03820: TIDV  (Transmit delay timer)    0xFFFFFFFF
>>>> PHY type:                                unknown
>>>>
>>>>

The device doesn't appear to be responding to MMIO reads: every
register is returning all 1s (0xFFFFFFFF), so the decoded fields above
are not meaningful.
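
A quick way to spot this condition in a register dump is to count how
many register lines read back as 0xFFFFFFFF.  The following is only a
sketch; the function name all_ones_summary is hypothetical, and it
assumes the "ethtool -d" layout shown above, where each register line
ends in an 8-digit hex value.

```shell
#!/bin/sh
# Hypothetical helper: read `ethtool -d` output on stdin and print
# "<all-ones registers>/<total registers>".
all_ones_summary() {
    awk '
        # A register line ends in a 32-bit hex value such as 0xFFFFFFFF.
        length($NF) == 10 && $NF ~ /^0x[0-9A-Fa-f]+$/ {
            total++
            if (toupper($NF) == "0XFFFFFFFF")
                dead++
        }
        END { printf "%d/%d\n", dead, total }'
}
```

Usage would be "ethtool -d eth2 | all_ones_summary".  For the dump
quoted above, all 12 register lines read 0xFFFFFFFF, so it should report
12/12.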

You should be able to recover from this error by issuing a PCIe device
reset request via the sysfs interface:

    echo 1 > /sys/bus/pci/devices/0000\:01\:00.0/reset

However, that only resolves the issue after it has occurred.
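
If you want to script the reset, a minimal sketch could look like the
following.  The function name pci_function_reset and its optional second
argument (an alternate sysfs root, only there so the function can be
tried against a scratch directory) are assumptions for illustration.
The kernel only exposes the "reset" attribute when it has a reset method
for the device, so the sketch checks for it first.

```shell
#!/bin/sh
# Hypothetical helper: request a reset of one PCI function through sysfs.
#   $1  PCI address, e.g. 0000:01:00.0
#   $2  sysfs device directory (optional, defaults to /sys/bus/pci/devices)
pci_function_reset() {
    bdf=$1
    root=${2:-/sys/bus/pci/devices}
    reset_file="$root/$bdf/reset"

    # The attribute is absent if the kernel has no way to reset the device.
    if [ ! -w "$reset_file" ]; then
        echo "no writable reset attribute for $bdf" >&2
        return 1
    fi

    echo 1 > "$reset_file"
}
```

Run as root, for example "pci_function_reset 0000:01:00.0".  It is
probably safest to take the interface down or unload e1000e first so the
driver is not using the device while it is reset.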

It would also help if you could provide "lspci -vvv" output for the
entire system.  That would give us an idea of the PCIe hierarchy and
help tell us whether the problem is local to the device's part of the
hierarchy or closer to the root complex.
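
As a rough sketch of how to see which bridges sit between the device and
the root complex, the resolved sysfs path of the device can be split
into its PCI addresses.  The function name pci_upstream_chain is
hypothetical, and the bridge address in the example below is made up,
since the real hierarchy is not known from this thread.

```shell
#!/bin/sh
# Hypothetical helper: given a resolved sysfs device path, print each PCI
# address on it, from the bridge nearest the root complex down to the
# device itself.
#   $1  e.g. the output of: readlink -f /sys/bus/pci/devices/0000:01:00.0
pci_upstream_chain() {
    printf '%s\n' "$1" \
        | tr '/' '\n' \
        | grep -E '^[0-9a-fA-F]{4}:[0-9a-fA-F]{2}:[0-9a-fA-F]{2}\.[0-7]$'
}
```

For a path such as /sys/devices/pci0000:00/0000:00:1c.0/0000:01:00.0
this prints 0000:00:1c.0 followed by 0000:01:00.0.  Each address can
then be passed to "lspci -vvv -s <address>" to compare the link and
error status at every level.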

Thanks,

Alex


