Message-ID: <CAJtEV7bERYgwtQv1zAvk+XzaywF6g7K-QqAq3VLErNU1YSgxYA@mail.gmail.com>
Date:	Mon, 21 Jul 2014 11:00:51 +0800
From:	Andrew Cooks <acooks@...il.com>
To:	netdev <netdev@...r.kernel.org>
Cc:	Bruce Allan <bruce.w.allan@...el.com>,
	Linux NICS <linux.nics@...el.com>, e1000-devel@...ts.sf.net,
	Dmitry Lifshitz <lifshitz@...pulab.co.il>,
	Igor Grinberg <grinberg@...pulab.co.il>
Subject: Problem: 82574L device (e1000e driver): Reset adapter unexpectedly /
 transmit queue 0 timed out

Hi,

The 82574L device, using the e1000e driver, is unstable on the
fit-MultiLAN (aka CompuLab MultiLAN)[1] that I'm using, and I need some
help understanding what's causing it.

The fit-MultiLAN has four 82574L devices[2]. On two separate
occasions, one of them (a different device each time) has dropped into
an unusable state while the other 82574L devices continued to function
normally. I'm not sure how to describe the state exactly, but it's as
if the adapter can no longer detect a link, and disconnecting and
reconnecting the cable doesn't recover it.

I don't think there is any transition to/from a lower power state
involved, because this happens while the device is in use (shunting
packets), but I'm not sure how to rule it out either.

I can get the device working again without a power cycle by doing the
following (though I admit I'm not sure whether every step is really
needed):
# echo 1 > /sys/bus/pci/devices/0000\:01\:00.0/remove
# echo 1 > /sys/bus/pci/rescan
# echo 1 > /sys/bus/pci/drivers_autoprobe
# echo 1 > /sys/bus/pci/devices/0000\:01\:00.0/reset
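
For repeat occurrences, the four writes above can be wrapped in a small
helper. This is only a sketch under assumptions: pci_recover and the
SYSFS_ROOT override are hypothetical names made up for this example
(the override exists only so the logic can be dry-run against a scratch
directory), and it replays the same steps in the same order without
establishing which of them are actually required.

```shell
# Sketch: replay the four sysfs writes listed above for one PCI device.
# pci_recover and SYSFS_ROOT are hypothetical names used for illustration.
pci_recover() {
    bdf=$1                       # PCI address, e.g. 0000:01:00.0
    root=${SYSFS_ROOT:-/sys}     # override only for dry runs on a scratch tree
    dev="$root/bus/pci/devices/$bdf"

    [ -e "$dev/remove" ] || { echo "no such device: $bdf" >&2; return 1; }

    echo 1 > "$dev/remove" || return 1                      # drop the device
    echo 1 > "$root/bus/pci/rescan" || return 1             # re-enumerate the bus
    echo 1 > "$root/bus/pci/drivers_autoprobe" || return 1  # allow automatic driver binding

    # The device directory only exists again if the rescan found the device.
    [ -e "$dev/reset" ] || { echo "$bdf has no reset node after rescan" >&2; return 1; }
    echo 1 > "$dev/reset"                                   # ask the kernel to reset the function
}
```

Run as root, it would be called as: pci_recover 0000:01:00.0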

This problem occurs with kernel versions 3.15.0 and 3.16.0-rc5. I
haven't tested older versions in the current configuration.

Hopefully the information below will help pin it down.

Kernel log:
[18439.527157] ------------[ cut here ]------------
[18439.527177] WARNING: CPU: 0 PID: 0 at net/sched/sch_generic.c:264 dev_watchdog+0x266/0x270()
[18439.527182] NETDEV WATCHDOG: eth2 (e1000e): transmit queue 0 timed out
[18439.527185] Modules linked in: sch_cbq sch_netem nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack xt_LOG xt_comment xt_tcpudp iptable_filter ip_tables x_tables nfnetlink_queue nfnetlink_log nfnetlink arc4 rtl8723ae rtl_pci rtlwifi mac80211 cfg80211 rtl8723_common microcode sp5100_tco k10temp i2c_piix4 video hid_generic usbhid hid r8169 mii firmware_class ohci_pci ohci_hcd
[18439.527231] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 3.16.0-rc5 #2
[18439.527235] Hardware name: CompuLab fit-PC3i/SBC fit-PC3i, BIOS SBCFP3I_2.1.0.333_1 X64 11/26/2013
[18439.527238]  0000000000000009 ffff88014ec03db0 ffffffff8164c94d ffff88014ec03df8
[18439.527244]  ffff88014ec03de8 ffffffff810479dd 0000000000000000 ffff880091aec000
[18439.527249]  0000000000000001 0000000000000000 ffff880091aec000 ffff88014ec03e48
[18439.527254] Call Trace:
[18439.527258]  <IRQ>  [<ffffffff8164c94d>] dump_stack+0x45/0x56
[18439.527271]  [<ffffffff810479dd>] warn_slowpath_common+0x7d/0xa0
[18439.527277]  [<ffffffff81047a4c>] warn_slowpath_fmt+0x4c/0x50
[18439.527283]  [<ffffffff8156c276>] dev_watchdog+0x266/0x270
[18439.527288]  [<ffffffff8156c010>] ? dev_graft_qdisc+0x80/0x80
[18439.527294]  [<ffffffff81053c86>] call_timer_fn+0x36/0x100
[18439.527298]  [<ffffffff8156c010>] ? dev_graft_qdisc+0x80/0x80
[18439.527303]  [<ffffffff81053f4c>] run_timer_softirq+0x1fc/0x2e0
[18439.527309]  [<ffffffff8104c97d>] __do_softirq+0xed/0x2d0
[18439.527315]  [<ffffffff8104cdcd>] irq_exit+0xcd/0xe0
[18439.527320]  [<ffffffff81655695>] smp_apic_timer_interrupt+0x45/0x60
[18439.527325]  [<ffffffff81653cfa>] apic_timer_interrupt+0x6a/0x70
[18439.527328]  <EOI>  [<ffffffff8100cb9c>] ? default_idle+0x1c/0xb0
[18439.527339]  [<ffffffff810aab93>] ? rcu_eqs_enter+0x63/0x90
[18439.527344]  [<ffffffff8100d44f>] arch_cpu_idle+0xf/0x20
[18439.527350]  [<ffffffff8108da15>] cpu_startup_entry+0x355/0x420
[18439.527355]  [<ffffffff8164ecdd>] ? __schedule+0x30d/0x780
[18439.527361]  [<ffffffff8163fb77>] rest_init+0x77/0x80
[18439.527367]  [<ffffffff81cf9fd4>] start_kernel+0x435/0x442
[18439.527372]  [<ffffffff81cf99a6>] ? set_init_arg+0x53/0x53
[18439.527378]  [<ffffffff81cf95ad>] x86_64_start_reservations+0x2a/0x2c
[18439.527383]  [<ffffffff81cf96a0>] x86_64_start_kernel+0xf1/0xf4
[18439.527386] ---[ end trace e1cdd13e14fbe306 ]---
[18439.527405] e1000e 0000:01:00.0 eth2: Reset adapter unexpectedly
[18439.548497] br_v401: port 1(eth2_v401) entered disabled state
[18439.548616] br_v402: port 1(eth2_v402) entered disabled state
[18439.548703] br_v403: port 1(eth2_v403) entered disabled state
[18439.548776] br_v404: port 1(eth2_v404) entered disabled state
[18439.548849] br_v405: port 1(eth2_v405) entered disabled state
[18439.548929] br_v406: port 1(eth2_v406) entered disabled state
[18439.548997] br_v407: port 1(eth2_v407) entered disabled state
[18439.549067] br_v487: port 1(eth2_v487) entered disabled state
[18439.549144] br_v600: port 1(eth2_v600) entered disabled state
[18439.549216] br_v602: port 1(eth2_v602) entered disabled state
[18439.549284] br_v603: port 1(eth2_v603) entered disabled state
[18439.549362] br_v1010: port 1(eth2_v1010) entered disabled state
[18439.767733] e1000e 0000:01:00.0 eth2: Timesync Tx Control register not set as expected
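
To check whether other ports have hit the same watchdog, the kernel log
can be filtered for the "NETDEV WATCHDOG" message shown above. A minimal
sketch, assuming the message keeps the format seen in this log;
watchdog_ifaces is a hypothetical helper name and it reads log text on
stdin.

```shell
# Sketch: print "interface driver queue" for each distinct watchdog timeout
# found in kernel log text on stdin. watchdog_ifaces is a hypothetical name.
watchdog_ifaces() {
    sed -n 's/.*NETDEV WATCHDOG: \([^ ]*\) (\([^)]*\)): transmit queue \([0-9][0-9]*\) timed out.*/\1 \2 \3/p' |
        sort -u
}
```

Usage: dmesg | watchdog_ifaces
For the log above this prints "eth2 e1000e 0".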


# lspci -vvnnk
01:00.0 Ethernet controller [0200]: Intel Corporation 82574L Gigabit Network Connection [8086:10d3]
        Subsystem: Intel Corporation Device [8086:0000]
        Control: I/O- Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Interrupt: pin A routed to IRQ 16
        Region 0: [virtual] Memory at c1900000 (32-bit, non-prefetchable) [size=128K]
        Region 1: [virtual] Memory at c1800000 (32-bit, non-prefetchable) [size=1M]
        Region 2: I/O ports at 7000 [size=32]
        Region 3: [virtual] Memory at c1920000 (32-bit, non-prefetchable) [size=16K]
        [virtual] Expansion ROM at c1940000 [disabled] [size=256K]
        Capabilities: [c8] Power Management version 2
                Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
                Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=1 PME-
        Capabilities: [d0] MSI: Enable- Count=1/1 Maskable- 64bit+
                Address: 0000000000000000  Data: 0000
        Capabilities: [e0] Express (v1) Endpoint, MSI 00
                DevCap: MaxPayload 256 bytes, PhantFunc 0, Latency L0s <512ns, L1 <64us
                        ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset-
                DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
                        RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop+
                        MaxPayload 128 bytes, MaxReadReq 512 bytes
                DevSta: CorrErr+ UncorrErr+ FatalErr- UnsuppReq+ AuxPwr+ TransPend-
                LnkCap: Port #1, Speed 2.5GT/s, Width x1, ASPM L0s L1, Latency L0 <128ns, L1 <64us
                        ClockPM- Surprise- LLActRep- BwNot-
                LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- Retrain- CommClk-
                        ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
                LnkSta: Speed 2.5GT/s, Width x1, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
        Capabilities: [a0] MSI-X: Enable- Count=5 Masked-
                Vector table: BAR=3 offset=00000000
                PBA: BAR=3 offset=00002000
        Capabilities: [100 v1] Advanced Error Reporting
                UESta:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq+ ACSViol-
                UEMsk:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
                UESvrt: DLP+ SDES- TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
                CESta:  RxErr+ BadTLP+ BadDLLP+ Rollover- Timeout- NonFatalErr+
                CEMsk:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
                AERCap: First Error Pointer: 14, GenCap- CGenEn- ChkCap- ChkEn-
        Capabilities: [140 v1] Device Serial Number 00-01-c0-ff-ff-12-8a-64
        Kernel driver in use: e1000e
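
One detail in the dump above that may be relevant: the Control line
shows I/O-, Mem- and BusMaster-, i.e. I/O decoding, memory decoding and
bus mastering are all switched off in the PCI command register of the
failed device. To compare that against the three healthy ports, the
flags can be pulled out of the lspci output. A minimal sketch;
pci_cmd_flags is a hypothetical helper name and it reads "lspci -vv"
text on stdin.

```shell
# Sketch: print the I/O, Mem and BusMaster flags from the first
# "Control:" line of lspci -vv output on stdin.
# pci_cmd_flags is a hypothetical helper name.
pci_cmd_flags() {
    sed -n 's|^[[:space:]]*Control: \(I/O[+-]\) \(Mem[+-]\) \(BusMaster[+-]\).*|\1 \2 \3|p' |
        head -n 1
}
```

Usage: lspci -vv -s 01:00.0 | pci_cmd_flags
For the dump above this prints "I/O- Mem- BusMaster-".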


# ethtool eth2
Settings for eth2:
Supported ports: [ TP ]
Supported link modes:   10baseT/Half 10baseT/Full
                       100baseT/Half 100baseT/Full
                       1000baseT/Full
Supported pause frame use: No
Supports auto-negotiation: Yes
Advertised link modes:  10baseT/Half 10baseT/Full
                       100baseT/Half 100baseT/Full
                       1000baseT/Full
Advertised pause frame use: No
Advertised auto-negotiation: Yes
Speed: Unknown!
Duplex: Unknown! (255)
Port: Twisted Pair
PHYAD: 1
Transceiver: internal
Auto-negotiation: on
MDI-X: Unknown (auto)
Supports Wake-on: pumbg
Wake-on: g
Current message level: 0x00000007 (7)
      drv probe link
Link detected: no


# ethtool -d eth2
MAC Registers
-------------
0x00000: CTRL (Device control register)  0xFFFFFFFF
      Endian mode (buffers):             big
      Link reset:                        reset
      Set link up:                       1
      Invert Loss-Of-Signal:             yes
      Receive flow control:              enabled
      Transmit flow control:             enabled
      VLAN mode:                         enabled
      Auto speed detect:                 enabled
      Speed select:                      not used
      Force speed:                       yes
      Force duplex:                      yes
0x00008: STATUS (Device status register) 0xFFFFFFFF
      Duplex:                            full
      Link up:                           link config
      TBI mode:                          enabled
      Link speed:                        not used
      Bus type:                          PCI-X
      Bus speed:                         133MHz
      Bus width:                         64-bit
0x00100: RCTL (Receive control register) 0xFFFFFFFF
      Receiver:                          enabled
      Store bad packets:                 enabled
      Unicast promiscuous:               enabled
      Multicast promiscuous:             enabled
      Long packet:                       enabled
      Descriptor minimum threshold size: reserved
      Broadcast accept mode:             accept
      VLAN filter:                       enabled
      Canonical form indicator:          enabled
      Discard pause frames:              ignored
      Pass MAC control frames:           pass
      Receive buffer size:               4096
0x02808: RDLEN (Receive desc length)     0xFFFFFFFF
0x02810: RDH   (Receive desc head)       0xFFFFFFFF
0x02818: RDT   (Receive desc tail)       0xFFFFFFFF
0x02820: RDTR  (Receive delay timer)     0xFFFFFFFF
0x00400: TCTL (Transmit ctrl register)   0xFFFFFFFF
      Transmitter:                       enabled
      Pad short packets:                 enabled
      Software XOFF Transmission:        enabled
      Re-transmit on late collision:     enabled
0x03808: TDLEN (Transmit desc length)    0xFFFFFFFF
0x03810: TDH   (Transmit desc head)      0xFFFFFFFF
0x03818: TDT   (Transmit desc tail)      0xFFFFFFFF
0x03820: TIDV  (Transmit delay timer)    0xFFFFFFFF
PHY type:                                unknown
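
Every register in this dump reads 0xFFFFFFFF, which is what reads from
a PCI device typically return when the device is not responding, so the
decoded fields above are probably meaningless. A quick sketch for
checking a dump for this condition; regs_all_ones is a hypothetical
helper name, it reads "ethtool -d" text on stdin, and it assumes
register lines have the "0xOFFSET: NAME ... 0xVALUE" shape seen here.

```shell
# Sketch: succeed only if at least one register line was seen and every
# register value in the ethtool -d text on stdin is 0xFFFFFFFF.
# regs_all_ones is a hypothetical helper name.
regs_all_ones() {
    awk '
        /^0x[0-9A-Fa-f]+:/ {
            seen++
            # The register value is the last field on the line.
            if (toupper($NF) != "0XFFFFFFFF") bad++
        }
        END { exit !(seen > 0 && bad == 0) }
    '
}
```

Usage: ethtool -d eth2 | regs_all_ones && echo "all registers read as all-ones"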


# ethtool -t eth2

The test result is FAIL
The test extra info:
Register test  (offline) 40
Eeprom test    (offline) 2
Interrupt test (offline) 4
Loopback test  (offline) 0
Link test   (on/offline) 0

References:
1. http://www.fit-pc.com/web/solutions/multilan/
2. http://fit-pc.com/download/face-modules/documents/face-modules-hw-specifications.pdf
(FM-XTDE4U2/4 FACE Module, p36)

Any suggestions or help to pin down the problem would be much appreciated.

Thanks.

a.
