Message-ID: <CAJtEV7Z=54cHbZ8SiYmgm7=osncs1SaQG=adhht-3NmWrkG1ww@mail.gmail.com>
Date:	Tue, 22 Jul 2014 10:05:23 +0800
From:	Andrew Cooks <acooks@...il.com>
To:	"Fujinaka, Todd" <todd.fujinaka@...el.com>
Cc:	netdev <netdev@...r.kernel.org>,
	Linux NICS <Linux-nics@...tope.jf.intel.com>,
	"e1000-devel@...ts.sf.net" <e1000-devel@...ts.sf.net>,
	Igor Grinberg <grinberg@...pulab.co.il>,
	Dmitry Lifshitz <lifshitz@...pulab.co.il>
Subject: Re: [linux-nics] Problem: 82574L device (e1000e driver): Reset
 adapter unexpectedly / transmit queue 0 timed out

Resending to fix the broken CC list. Sorry about that.

On Tue, Jul 22, 2014 at 9:58 AM, Andrew Cooks <acooks@...il.com> wrote:
> Hi
>
> I think the common mailing list etiquette is to reply below, so I've
> moved the reply and mine follows below.
>
> On Mon, Jul 21, 2014 at 11:22 PM, Fujinaka, Todd
> <todd.fujinaka@...el.com> wrote:
>>> -----Original Message-----
>>> From: linux-nics-bounces@...tope.jf.intel.com [mailto:linux-nics-bounces@...tope.jf.intel.com] On Behalf Of Andrew Cooks
>>> Sent: Sunday, July 20, 2014 8:01 PM
>>> To: netdev
>>> Cc: Dmitry Lifshitz; Linux NICS; Igor@...tope.jf.intel.com; e1000-devel@...ts.sf.net; Grinberg
>>> Subject: [linux-nics] Problem: 82574L device (e1000e driver): Reset adapter unexpectedly / transmit queue 0 timed out
>>>
>>> Hi
>>>
>>> The 82574L device, using the e1000e driver, is unstable on the fit-MultiLAN (aka CompuLab MultiLAN)[1] that I'm using and I need some help to understand what's causing it.
>>>
>>> The fit-MultiLAN has four 82574L devices[2]. On two different occasions I've seen a (different) device drop into an unusable state, while the rest of the 82574L devices continue to function normally.
>>> I'm not sure how to describe the state exactly, but it's as if the adapter can no longer detect a link and disconnecting/reconnecting the cable doesn't recover it.
>>>
>>> I don't think there is any transition to/from a lower power state involved, because this happens while the device is in use (shunting packets), but I'm not sure how to rule it out either.
>>>
>>> I can get the device working again without a power cycle, by doing the following (though I admit I'm not sure whether each of these is really needed):
>>> # echo 1 > /sys/bus/pci/devices/0000\:01\:00.0/remove
>>> # echo 1 > /sys/bus/pci/rescan
>>> # echo 1 > /sys/bus/pci/drivers_autoprobe
>>> # echo 1 > /sys/bus/pci/devices/0000\:01\:00.0/reset
>>>
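
The recovery sequence quoted above can be wrapped in a small script. This is only a sketch: it assumes that remove followed by rescan is enough, which the message itself leaves open, and it leaves out the drivers_autoprobe and reset writes. The names sysfs_write, pci_remove_rescan and DRY_RUN are hypothetical. By default it only prints the commands it would run.

```shell
#!/bin/sh
# Sketch of the remove/rescan sequence from the message above.
# DRY_RUN and both function names are hypothetical, not part of any tool.
DRY_RUN="${DRY_RUN:-1}"

# Write a value to a sysfs file, or only print the command when DRY_RUN=1.
sysfs_write() {
    if [ "$DRY_RUN" = "1" ]; then
        printf 'echo %s > %s\n' "$1" "$2"
    else
        printf '%s\n' "$1" > "$2"
    fi
}

# Remove the device from the PCI tree, then ask the kernel to rescan the bus.
pci_remove_rescan() {
    bdf="$1"    # full PCI address, e.g. 0000:01:00.0
    sysfs_write 1 "/sys/bus/pci/devices/$bdf/remove" &&
    sysfs_write 1 /sys/bus/pci/rescan
}
```

After sourcing the file, `pci_remove_rescan 0000:01:00.0` prints the two writes. Running it as root with `DRY_RUN=0` performs them.
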
>>> This problem occurs with versions 3.15.0 and 3.16.0-rc5. I haven't tested older versions in the current configuration.
>>>
>>> Hopefully the information below will help pin it down.
>>>
>>> Kernel log:
>>> [18439.527157] ------------[ cut here ]------------
>>> [18439.527177] WARNING: CPU: 0 PID: 0 at net/sched/sch_generic.c:264 dev_watchdog+0x266/0x270()
>>> [18439.527182] NETDEV WATCHDOG: eth2 (e1000e): transmit queue 0 timed out
>>> [18439.527185] Modules linked in: sch_cbq sch_netem nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack xt_LOG xt_comment xt_tcpudp iptable_filter ip_tables x_tables nfnetlink_queue nfnetlink_log nfnetlink arc4 rtl8723ae rtl_pci rtlwifi mac80211 cfg80211 rtl8723_common microcode sp5100_tco k10temp i2c_piix4 video hid_generic usbhid hid r8169 mii firmware_class ohci_pci ohci_hcd
>>> [18439.527231] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 3.16.0-rc5 #2
>>> [18439.527235] Hardware name: CompuLab fit-PC3i/SBC fit-PC3i, BIOS SBCFP3I_2.1.0.333_1 X64 11/26/2013
>>> [18439.527238]  0000000000000009 ffff88014ec03db0 ffffffff8164c94d ffff88014ec03df8
>>> [18439.527244]  ffff88014ec03de8 ffffffff810479dd 0000000000000000 ffff880091aec000
>>> [18439.527249]  0000000000000001 0000000000000000 ffff880091aec000 ffff88014ec03e48
>>> [18439.527254] Call Trace:
>>> [18439.527258]  <IRQ>  [<ffffffff8164c94d>] dump_stack+0x45/0x56
>>> [18439.527271]  [<ffffffff810479dd>] warn_slowpath_common+0x7d/0xa0
>>> [18439.527277]  [<ffffffff81047a4c>] warn_slowpath_fmt+0x4c/0x50
>>> [18439.527283]  [<ffffffff8156c276>] dev_watchdog+0x266/0x270
>>> [18439.527288]  [<ffffffff8156c010>] ? dev_graft_qdisc+0x80/0x80
>>> [18439.527294]  [<ffffffff81053c86>] call_timer_fn+0x36/0x100
>>> [18439.527298]  [<ffffffff8156c010>] ? dev_graft_qdisc+0x80/0x80
>>> [18439.527303]  [<ffffffff81053f4c>] run_timer_softirq+0x1fc/0x2e0
>>> [18439.527309]  [<ffffffff8104c97d>] __do_softirq+0xed/0x2d0
>>> [18439.527315]  [<ffffffff8104cdcd>] irq_exit+0xcd/0xe0
>>> [18439.527320]  [<ffffffff81655695>] smp_apic_timer_interrupt+0x45/0x60
>>> [18439.527325]  [<ffffffff81653cfa>] apic_timer_interrupt+0x6a/0x70
>>> [18439.527328]  <EOI>  [<ffffffff8100cb9c>] ? default_idle+0x1c/0xb0
>>> [18439.527339]  [<ffffffff810aab93>] ? rcu_eqs_enter+0x63/0x90
>>> [18439.527344]  [<ffffffff8100d44f>] arch_cpu_idle+0xf/0x20
>>> [18439.527350]  [<ffffffff8108da15>] cpu_startup_entry+0x355/0x420
>>> [18439.527355]  [<ffffffff8164ecdd>] ? __schedule+0x30d/0x780
>>> [18439.527361]  [<ffffffff8163fb77>] rest_init+0x77/0x80
>>> [18439.527367]  [<ffffffff81cf9fd4>] start_kernel+0x435/0x442
>>> [18439.527372]  [<ffffffff81cf99a6>] ? set_init_arg+0x53/0x53
>>> [18439.527378]  [<ffffffff81cf95ad>] x86_64_start_reservations+0x2a/0x2c
>>> [18439.527383]  [<ffffffff81cf96a0>] x86_64_start_kernel+0xf1/0xf4
>>> [18439.527386] ---[ end trace e1cdd13e14fbe306 ]---
>>> [18439.527405] e1000e 0000:01:00.0 eth2: Reset adapter unexpectedly
>>> [18439.548497] br_v401: port 1(eth2_v401) entered disabled state
>>> [18439.548616] br_v402: port 1(eth2_v402) entered disabled state
>>> [18439.548703] br_v403: port 1(eth2_v403) entered disabled state
>>> [18439.548776] br_v404: port 1(eth2_v404) entered disabled state
>>> [18439.548849] br_v405: port 1(eth2_v405) entered disabled state
>>> [18439.548929] br_v406: port 1(eth2_v406) entered disabled state
>>> [18439.548997] br_v407: port 1(eth2_v407) entered disabled state
>>> [18439.549067] br_v487: port 1(eth2_v487) entered disabled state
>>> [18439.549144] br_v600: port 1(eth2_v600) entered disabled state
>>> [18439.549216] br_v602: port 1(eth2_v602) entered disabled state
>>> [18439.549284] br_v603: port 1(eth2_v603) entered disabled state
>>> [18439.549362] br_v1010: port 1(eth2_v1010) entered disabled state
>>> [18439.767733] e1000e 0000:01:00.0 eth2: Timesync Tx Control register not set as expected
>>>
>>>
>>> # lspci -vvnnk:
>>> 01:00.0 Ethernet controller [0200]: Intel Corporation 82574L Gigabit Network Connection [8086:10d3]
>>>         Subsystem: Intel Corporation Device [8086:0000]
>>>         Control: I/O- Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
>>>         Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
>>>         Interrupt: pin A routed to IRQ 16
>>>         Region 0: [virtual] Memory at c1900000 (32-bit, non-prefetchable) [size=128K]
>>>         Region 1: [virtual] Memory at c1800000 (32-bit, non-prefetchable) [size=1M]
>>>         Region 2: I/O ports at 7000 [size=32]
>>>         Region 3: [virtual] Memory at c1920000 (32-bit, non-prefetchable) [size=16K]
>>>         [virtual] Expansion ROM at c1940000 [disabled] [size=256K]
>>>         Capabilities: [c8] Power Management version 2
>>>                 Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
>>>                 Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=1 PME-
>>>         Capabilities: [d0] MSI: Enable- Count=1/1 Maskable- 64bit+
>>>                 Address: 0000000000000000  Data: 0000
>>>         Capabilities: [e0] Express (v1) Endpoint, MSI 00
>>>                 DevCap: MaxPayload 256 bytes, PhantFunc 0, Latency L0s <512ns, L1 <64us
>>>                         ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset-
>>>                 DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
>>>                         RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop+
>>>                         MaxPayload 128 bytes, MaxReadReq 512 bytes
>>>                 DevSta: CorrErr+ UncorrErr+ FatalErr- UnsuppReq+ AuxPwr+ TransPend-
>>>                 LnkCap: Port #1, Speed 2.5GT/s, Width x1, ASPM L0s L1, Latency L0 <128ns, L1 <64us
>>>                         ClockPM- Surprise- LLActRep- BwNot-
>>>                 LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- Retrain- CommClk-
>>>                         ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
>>>                 LnkSta: Speed 2.5GT/s, Width x1, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
>>>         Capabilities: [a0] MSI-X: Enable- Count=5 Masked-
>>>                 Vector table: BAR=3 offset=00000000
>>>                 PBA: BAR=3 offset=00002000
>>>         Capabilities: [100 v1] Advanced Error Reporting
>>>                 UESta:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq+ ACSViol-
>>>                 UEMsk:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
>>>                 UESvrt: DLP+ SDES- TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
>>>                 CESta:  RxErr+ BadTLP+ BadDLLP+ Rollover- Timeout- NonFatalErr+
>>>                 CEMsk:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
>>>                 AERCap: First Error Pointer: 14, GenCap- CGenEn- ChkCap- ChkEn-
>>>         Capabilities: [140 v1] Device Serial Number 00-01-c0-ff-ff-12-8a-64
>>>         Kernel driver in use: e1000e
>>>
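
lspci marks each capability or status bit with a trailing + (set) or - (clear), which is hard to scan on long lines such as DevSta or CESta. A small filter can list only the bits that are set. This is a sketch: set_flags is a hypothetical name, and it keeps any whitespace-separated token ending in +, so it says nothing about what the bits mean.

```shell
#!/bin/sh
# Hypothetical helper: read lspci output on stdin and print, one per line,
# every whitespace-separated token that ends in '+', without the '+'.
set_flags() {
    awk '{
        for (i = 1; i <= NF; i++) {
            if ($i ~ /\+$/) {
                flag = $i
                sub(/\+$/, "", flag)
                print flag
            }
        }
    }'
}
```

For example, `lspci -vv -s 01:00.0 | grep DevSta: | set_flags` would list the device status bits currently set.
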
>>>
>>> # ethtool eth2
>>> Settings for eth2:
>>> Supported ports: [ TP ]
>>> Supported link modes:   10baseT/Half 10baseT/Full
>>>                        100baseT/Half 100baseT/Full
>>>                        1000baseT/Full
>>> Supported pause frame use: No
>>> Supports auto-negotiation: Yes
>>> Advertised link modes:  10baseT/Half 10baseT/Full
>>>                        100baseT/Half 100baseT/Full
>>>                        1000baseT/Full
>>> Advertised pause frame use: No
>>> Advertised auto-negotiation: Yes
>>> Speed: Unknown!
>>> Duplex: Unknown! (255)
>>> Port: Twisted Pair
>>> PHYAD: 1
>>> Transceiver: internal
>>> Auto-negotiation: on
>>> MDI-X: Unknown (auto)
>>> Supports Wake-on: pumbg
>>> Wake-on: g
>>> Current message level: 0x00000007 (7)
>>>       drv probe link
>>> Link detected: no
>>>
>>>
>>> # ethtool -d eth2
>>> MAC Registers
>>> -------------
>>> 0x00000: CTRL (Device control register)  0xFFFFFFFF
>>>       Endian mode (buffers):             big
>>>       Link reset:                        reset
>>>       Set link up:                       1
>>>       Invert Loss-Of-Signal:             yes
>>>       Receive flow control:              enabled
>>>       Transmit flow control:             enabled
>>>       VLAN mode:                         enabled
>>>       Auto speed detect:                 enabled
>>>       Speed select:                      not used
>>>       Force speed:                       yes
>>>       Force duplex:                      yes
>>> 0x00008: STATUS (Device status register) 0xFFFFFFFF
>>>       Duplex:                            full
>>>       Link up:                           link config
>>>       TBI mode:                          enabled
>>>       Link speed:                        not used
>>>       Bus type:                          PCI-X
>>>       Bus speed:                         133MHz
>>>       Bus width:                         64-bit
>>> 0x00100: RCTL (Receive control register) 0xFFFFFFFF
>>>       Receiver:                          enabled
>>>       Store bad packets:                 enabled
>>>       Unicast promiscuous:               enabled
>>>       Multicast promiscuous:             enabled
>>>       Long packet:                       enabled
>>>       Descriptor minimum threshold size: reserved
>>>       Broadcast accept mode:             accept
>>>       VLAN filter:                       enabled
>>>       Canonical form indicator:          enabled
>>>       Discard pause frames:              ignored
>>>       Pass MAC control frames:           pass
>>>       Receive buffer size:               4096
>>> 0x02808: RDLEN (Receive desc length)     0xFFFFFFFF
>>> 0x02810: RDH   (Receive desc head)       0xFFFFFFFF
>>> 0x02818: RDT   (Receive desc tail)       0xFFFFFFFF
>>> 0x02820: RDTR  (Receive delay timer)     0xFFFFFFFF
>>> 0x00400: TCTL (Transmit ctrl register)   0xFFFFFFFF
>>>       Transmitter:                       enabled
>>>       Pad short packets:                 enabled
>>>       Software XOFF Transmission:        enabled
>>>       Re-transmit on late collision:     enabled
>>> 0x03808: TDLEN (Transmit desc length)    0xFFFFFFFF
>>> 0x03810: TDH   (Transmit desc head)      0xFFFFFFFF
>>> 0x03818: TDT   (Transmit desc tail)      0xFFFFFFFF
>>> 0x03820: TIDV  (Transmit delay timer)    0xFFFFFFFF
>>> PHY type:                                unknown
>>>
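
Every register in the dump above reads 0xFFFFFFFF. A read from a PCI device that does not respond typically returns all ones, so this pattern suggests that the driver's register reads are not reaching the device, rather than that the registers really hold those values. That would also fit the Mem- and BusMaster- flags in the lspci output. This is an interpretation and is not confirmed anywhere in the thread. A quick check for the pattern could look like the sketch below; regs_all_ones is a hypothetical name, and it expects raw `ethtool -d` output without the quote prefixes.

```shell
#!/bin/sh
# Hypothetical helper: read `ethtool -d` output on stdin and print
# "<registers reading 0xFFFFFFFF>/<registers seen>".
# Only lines that start with a register offset (0x.....:) are counted.
regs_all_ones() {
    awk '
        /^0x[0-9A-Fa-f]+:/ {
            total++
            if ($NF == "0xFFFFFFFF") ones++
        }
        END { printf "%d/%d\n", ones, total }
    '
}
```

Usage would be `ethtool -d eth2 | regs_all_ones`. If both numbers are equal, every decoded register read back as all ones.
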
>>>
>>> # ethtool -t eth2
>>>
>>> The test result is FAIL
>>> The test extra info:
>>> Register test  (offline) 40
>>> Eeprom test    (offline) 2
>>> Interrupt test (offline) 4
>>> Loopback test  (offline) 0
>>> Link test   (on/offline) 0
>>>
>>> References:
>>> 1. http://www.fit-pc.com/web/solutions/multilan/
>>> 2. http://fit-pc.com/download/face-modules/documents/face-modules-hw-specifications.pdf (FM-XTDE4U2/4 FACE Module, p36)
>>>
>>> Any suggestions or help to pin down the problem would be much appreciated.
>>>
>>> Thanks.
>>>
>>
>> Need more -v's in the lspci. Also, what is the OS, kernel version, and driver version?
>>
>> Todd Fujinaka
>> Software Application Engineer
>> Networking Division (ND)
>> Intel Corporation
>> todd.fujinaka@...el.com
>> (503) 712-4565
>>
>
>
> Thanks for the response, Todd.
>
> Adding more -v's to lspci doesn't change the output for this device.
>
> This is on Linux Mint 16, without NetworkManager. Are there any
> particular packages that you think might be relevant?
>
> The kernel versions I've tested are 3.15.0 and 3.16.0-rc5 from
> kernel.org. Was that ambiguous in my previous message?
>
> Driver version:
> # ethtool -i eth2
> driver: e1000e
> version: 2.3.2-k
> firmware-version: 2.1-3
> bus-info: 0000:01:00.0
> supports-statistics: yes
> supports-test: yes
> supports-eeprom-access: yes
> supports-register-dump: yes
> supports-priv-flags: no
>
> Thanks,
>
> a.
