Message-ID: <20150814024148.GA2813@codemonkey.org.uk>
Date: Thu, 13 Aug 2015 22:41:48 -0400
From: Dave Jones <davej@...emonkey.org.uk>
To: netdev@...r.kernel.org
Cc: Jeff Kirsher <jeffrey.t.kirsher@...el.com>,
intel-wired-lan@...ts.osuosl.org
Subject: I218 e1000e hangs.
I've got a machine with an onboard NIC that reproduces a hardware
hang every time I do an rsync to it.
[ 488.752630] e1000e 0000:00:19.0 eth0: Detected Hardware Unit Hang:
TDH <27>
TDT <34>
next_to_use <34>
next_to_clean <23>
buffer_info[next_to_clean]:
time_stamp <1000048b2>
next_to_watch <27>
jiffies <1000049d8>
next_to_watch.status <0>
MAC Status <80083>
PHY Status <796d>
PHY 1000BASE-T Status <7c00>
PHY Extended Status <3000>
PCI Status <10>
[ 490.751948] e1000e 0000:00:19.0 eth0: Detected Hardware Unit Hang:
TDH <27>
TDT <34>
next_to_use <34>
next_to_clean <23>
buffer_info[next_to_clean]:
time_stamp <1000048b2>
next_to_watch <27>
jiffies <100004aa0>
next_to_watch.status <0>
MAC Status <80083>
PHY Status <796d>
PHY 1000BASE-T Status <7c00>
PHY Extended Status <3000>
PCI Status <10>
[ 492.750447] e1000e 0000:00:19.0 eth0: Detected Hardware Unit Hang:
TDH <27>
TDT <34>
next_to_use <34>
next_to_clean <23>
buffer_info[next_to_clean]:
time_stamp <1000048b2>
next_to_watch <27>
jiffies <100004b68>
next_to_watch.status <0>
MAC Status <80083>
PHY Status <796d>
PHY 1000BASE-T Status <7c00>
PHY Extended Status <3000>
PCI Status <10>
[ 494.749507] e1000e 0000:00:19.0 eth0: Detected Hardware Unit Hang:
TDH <27>
TDT <34>
next_to_use <34>
next_to_clean <23>
buffer_info[next_to_clean]:
time_stamp <1000048b2>
next_to_watch <27>
jiffies <100004c30>
next_to_watch.status <0>
MAC Status <80083>
PHY Status <796d>
PHY 1000BASE-T Status <7c00>
PHY Extended Status <3000>
PCI Status <10>
[ 494.758881] ------------[ cut here ]------------
[ 494.759109] WARNING: CPU: 0 PID: 0 at net/sched/sch_generic.c:303 dev_watchdog+0x23a/0x250()
[ 494.759347] NETDEV WATCHDOG: eth0 (e1000e): transmit queue 0 timed out
[ 494.759585] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.2.0-rc6-backup-debug+ #1
[ 494.759841] ffffffffb0ddd622 0431bce15e8d04e9 ffff88043d803d08 ffffffffb097e15b
[ 494.760111] 0000000000000007 ffff88043d803d60 ffff88043d803d48 ffffffffb0076de5
[ 494.760392] 0000000000000000 0000000000000000 0000000000000000 ffff880427bb7d30
[ 494.760648] Call Trace:
[ 494.760896] <IRQ> [<ffffffffb097e15b>] dump_stack+0x4c/0x65
[ 494.761160] [<ffffffffb0076de5>] warn_slowpath_common+0x85/0xc0
[ 494.761423] [<ffffffffb0076ea5>] warn_slowpath_fmt+0x55/0x70
[ 494.761686] [<ffffffffb087b02a>] dev_watchdog+0x23a/0x250
[ 494.761949] [<ffffffffb087adf0>] ? qdisc_rcu_free+0x40/0x40
[ 494.762215] [<ffffffffb00e9703>] call_timer_fn+0xb3/0x420
[ 494.762483] [<ffffffffb00e9655>] ? call_timer_fn+0x5/0x420
[ 494.762753] [<ffffffffb00e9c02>] run_timer_softirq+0x192/0x3d0
[ 494.763025] [<ffffffffb007b6b5>] ? __do_softirq+0xb5/0x5d0
[ 494.763300] [<ffffffffb087adf0>] ? qdisc_rcu_free+0x40/0x40
[ 494.763570] [<ffffffffb007b6df>] __do_softirq+0xdf/0x5d0
[ 494.763838] [<ffffffffb007bd58>] ? irq_exit+0x78/0xc0
[ 494.764108] [<ffffffffb007bd98>] irq_exit+0xb8/0xc0
[ 494.764381] [<ffffffffb098bee6>] smp_apic_timer_interrupt+0x46/0x60
[ 494.764662] [<ffffffffb098a8ad>] apic_timer_interrupt+0x6d/0x80
[ 494.764943] <EOI> [<ffffffffb0815916>] ? cpuidle_enter_state+0x106/0x3a0
[ 494.765232] [<ffffffffb0815951>] ? cpuidle_enter_state+0x141/0x3a0
[ 494.765525] [<ffffffffb0815946>] ? cpuidle_enter_state+0x136/0x3a0
[ 494.765815] [<ffffffffb0815be7>] cpuidle_enter+0x17/0x20
[ 494.766105] [<ffffffffb00bca5c>] cpu_startup_entry+0x38c/0x500
[ 494.766396] [<ffffffffb0977988>] rest_init+0x138/0x140
[ 494.766692] [<ffffffffb0f91f23>] start_kernel+0x466/0x487
[ 494.766990] [<ffffffffb0f91495>] x86_64_start_reservations+0x2a/0x2c
[ 494.767292] [<ffffffffb0f91583>] x86_64_start_kernel+0xec/0xf0
Here's another instance after rebooting, with slightly different register state:
[ 2379.674285] e1000e 0000:00:19.0 eth0: Detected Hardware Unit Hang:
TDH <50>
TDT <5d>
next_to_use <5d>
next_to_clean <4d>
buffer_info[next_to_clean]:
time_stamp <100032c2d>
next_to_watch <50>
jiffies <100032ce8>
next_to_watch.status <0>
MAC Status <80083>
PHY Status <796d>
PHY 1000BASE-T Status <3c00>
PHY Extended Status <3000>
PCI Status <10>
[ 2381.672792] e1000e 0000:00:19.0 eth0: Detected Hardware Unit Hang:
TDH <50>
TDT <5d>
next_to_use <5d>
next_to_clean <4d>
buffer_info[next_to_clean]:
time_stamp <100032c2d>
next_to_watch <50>
jiffies <100032db0>
next_to_watch.status <0>
MAC Status <80083>
PHY Status <796d>
PHY 1000BASE-T Status <3c00>
PHY Extended Status <3000>
PCI Status <10>
[ 2383.671379] e1000e 0000:00:19.0 eth0: Detected Hardware Unit Hang:
TDH <50>
TDT <5d>
next_to_use <5d>
next_to_clean <4d>
buffer_info[next_to_clean]:
time_stamp <100032c2d>
next_to_watch <50>
jiffies <100032e78>
next_to_watch.status <0>
MAC Status <80083>
PHY Status <796d>
PHY 1000BASE-T Status <3c00>
PHY Extended Status <3000>
PCI Status <10>
[ 2385.669944] e1000e 0000:00:19.0 eth0: Detected Hardware Unit Hang:
TDH <50>
TDT <5d>
next_to_use <5d>
next_to_clean <4d>
buffer_info[next_to_clean]:
time_stamp <100032c2d>
next_to_watch <50>
jiffies <100032f40>
next_to_watch.status <0>
MAC Status <80083>
PHY Status <796d>
PHY 1000BASE-T Status <3c00>
PHY Extended Status <3000>
PCI Status <10>
[ 2387.668428] e1000e 0000:00:19.0 eth0: Detected Hardware Unit Hang:
TDH <50>
TDT <5d>
next_to_use <5d>
next_to_clean <4d>
buffer_info[next_to_clean]:
time_stamp <100032c2d>
next_to_watch <50>
jiffies <100033008>
next_to_watch.status <0>
MAC Status <80083>
PHY Status <796d>
PHY 1000BASE-T Status <3c00>
PHY Extended Status <3000>
PCI Status <10>
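As a reading aid for the dumps above, here is a rough Python sketch of the arithmetic behind them, using the first two reports from the first instance. The ring size of 256 is an assumption (the report does not state it), and HZ is inferred from the log timestamps rather than known. The helper name `ring_distance` is made up for illustration.

```python
# Values copied from the first two "Detected Hardware Unit Hang" reports.
reports = [
    {"t": 488.752630, "TDH": 0x27, "TDT": 0x34,
     "next_to_use": 0x34, "next_to_clean": 0x23,
     "time_stamp": 0x1000048b2, "jiffies": 0x1000049d8},
    {"t": 490.751948, "TDH": 0x27, "TDT": 0x34,
     "next_to_use": 0x34, "next_to_clean": 0x23,
     "time_stamp": 0x1000048b2, "jiffies": 0x100004aa0},
]

RING_SIZE = 256  # ASSUMPTION: TX ring size is not given in the report


def ring_distance(head, tail, size=RING_SIZE):
    """Number of descriptors from head up to (not including) tail, with wraparound."""
    return (tail - head) % size


first, second = reports

# Descriptors handed to the NIC (tail written) that the head has not reached.
unfetched = ring_distance(first["TDH"], first["TDT"])

# Descriptors the driver has queued but not yet reclaimed.
uncleaned = ring_distance(first["next_to_clean"], first["next_to_use"])

# How long the oldest uncleaned buffer has been waiting, in jiffies.
stalled_jiffies = first["jiffies"] - first["time_stamp"]

# Infer HZ from two consecutive reports (jiffies delta over wall-clock delta).
hz = round((second["jiffies"] - first["jiffies"]) / (second["t"] - first["t"]))

# The head pointer did not advance between the two reports.
head_stuck = first["TDH"] == second["TDH"]

print(f"unfetched={unfetched} uncleaned={uncleaned} "
      f"stalled={stalled_jiffies} jiffies (~{stalled_jiffies / hz:.2f}s at HZ={hz}) "
      f"head_stuck={head_stuck}")
```

In every report TDH stays put while jiffies keeps advancing, so the transmit head is not moving at all between reports rather than merely running slowly.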
The rsync on the other side then fails, reporting 'corrupted packets'.
The NIC in question is:
00:19.0 Ethernet controller: Intel Corporation Ethernet Connection (2) I218-V
If this is a software problem, it's not anything new. I tested as far back
as 3.16, which had the same problem.
Is there any hardware feature I can try disabling to see if that makes a difference?
Dave