lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [thread-next>] [day] [month] [year] [list]
Date:   Tue, 23 Jan 2018 15:46:05 -0800
From:   Ben Greear <greearb@...delatech.com>
To:     netdev <netdev@...r.kernel.org>
Subject: e1000e hardware unit hangs

Hello,

Anyone have any more suggestions for making e1000e work better?  This is from a 4.9.65+ kernel,
with these additional e1000e patches applied:

e1000e: Fix error path in link detection
e1000e: Fix wrong comment related to link detection
e1000e: Fix return value test
e1000e: Separate signaling for link check/link up
e1000e: Avoid receiver overrun interrupt bursts

Test case is simply to run 30000 tcp connections each trying to send 56Kbps of bi-directional
data between a pair of e1000e interfaces :)

No OOM related issues are seen on this kernel...similar test on 4.13 showed some OOM
issues, but I have not debugged that yet...


Jan 23 15:38:59 lf1003-e3v2-13100124-f20x64 kernel: NETDEV WATCHDOG: eth3 (e1000e): transmit queue 0 timed out, trans_start: 4294737199, wd-timeout: 5000 
jiffies: 4294745088 tx-queues: 1
Jan 23 15:38:59 lf1003-e3v2-13100124-f20x64 kernel: NETDEV WATCHDOG: eth2 (e1000e): transmit queue 0 timed out, trans_start: 4294737200, wd-timeout: 5000 
jiffies: 4294745088 tx-queues: 1
Jan 23 15:38:59 lf1003-e3v2-13100124-f20x64 kernel: ------------[ cut here ]------------
Jan 23 15:38:59 lf1003-e3v2-13100124-f20x64 kernel: WARNING: CPU: 7 PID: 0 at /home/greearb/git/linux-4.9.dev.y/net/sched/sch_generic.c:322 dev_watchdog+0x267/0x270
Jan 23 15:38:59 lf1003-e3v2-13100124-f20x64 kernel: Modules linked in: nf_conntrack_netlink nf_conntrack nfnetlink nf_defrag_ipv4 cfg80211 bnep bluetooth 
macvlan wanlink(O) pktgen fuse corete...sunrpc ipmi_d
Jan 23 15:38:59 lf1003-e3v2-13100124-f20x64 kernel: CPU: 7 PID: 0 Comm: swapper/7 Tainted: G           O    4.9.65+ #21
Jan 23 15:38:59 lf1003-e3v2-13100124-f20x64 kernel: Hardware name: Supermicro X9SCI/X9SCA/X9SCI/X9SCA, BIOS 2.0b 09/17/2012
Jan 23 15:38:59 lf1003-e3v2-13100124-f20x64 kernel:  ffff88042fdc3df0 ffffffff8142d791 0000000000000000 0000000000000000
Jan 23 15:38:59 lf1003-e3v2-13100124-f20x64 kernel:  ffff88042fdc3e30 ffffffff8110f266 000001422fdc3e08 0000000000000000
Jan 23 15:38:59 lf1003-e3v2-13100124-f20x64 kernel:  0000000000001388 00000000fffc7d30 ffff880417d0c000 00000000fffc9c00
Jan 23 15:38:59 lf1003-e3v2-13100124-f20x64 kernel: Call Trace:
Jan 23 15:38:59 lf1003-e3v2-13100124-f20x64 kernel:  <IRQ>
Jan 23 15:38:59 lf1003-e3v2-13100124-f20x64 kernel:  [<ffffffff8142d791>] dump_stack+0x63/0x82
Jan 23 15:38:59 lf1003-e3v2-13100124-f20x64 kernel:  [<ffffffff8110f266>] __warn+0xc6/0xe0
Jan 23 15:38:59 lf1003-e3v2-13100124-f20x64 kernel:  [<ffffffff8110f338>] warn_slowpath_null+0x18/0x20
Jan 23 15:38:59 lf1003-e3v2-13100124-f20x64 kernel:  [<ffffffff817da497>] dev_watchdog+0x267/0x270
Jan 23 15:38:59 lf1003-e3v2-13100124-f20x64 kernel:  [<ffffffff817da230>] ? qdisc_rcu_free+0x40/0x40
Jan 23 15:38:59 lf1003-e3v2-13100124-f20x64 kernel:  [<ffffffff8117bf70>] call_timer_fn+0x30/0x150
Jan 23 15:38:59 lf1003-e3v2-13100124-f20x64 kernel:  [<ffffffff817da230>] ? qdisc_rcu_free+0x40/0x40
Jan 23 15:38:59 lf1003-e3v2-13100124-f20x64 kernel:  [<ffffffff8117c350>] run_timer_softirq+0x1f0/0x450
Jan 23 15:38:59 lf1003-e3v2-13100124-f20x64 kernel:  [<ffffffff81051021>] ? lapic_next_deadline+0x21/0x30
Jan 23 15:38:59 lf1003-e3v2-13100124-f20x64 kernel:  [<ffffffff8118a54d>] ? clockevents_program_event+0x7d/0x120
Jan 23 15:38:59 lf1003-e3v2-13100124-f20x64 kernel:  [<ffffffff81115101>] __do_softirq+0xc1/0x2c0
Jan 23 15:38:59 lf1003-e3v2-13100124-f20x64 kernel:  [<ffffffff81115461>] irq_exit+0xb1/0xc0
Jan 23 15:38:59 lf1003-e3v2-13100124-f20x64 kernel:  [<ffffffff81051c9d>] smp_apic_timer_interrupt+0x3d/0x50
Jan 23 15:38:59 lf1003-e3v2-13100124-f20x64 kernel:  [<ffffffff81895842>] apic_timer_interrupt+0x82/0x90
Jan 23 15:38:59 lf1003-e3v2-13100124-f20x64 kernel:  <EOI>
Jan 23 15:38:59 lf1003-e3v2-13100124-f20x64 kernel:  [<ffffffff81726e46>] ? cpuidle_enter_state+0x126/0x300
Jan 23 15:38:59 lf1003-e3v2-13100124-f20x64 kernel:  [<ffffffff81727042>] cpuidle_enter+0x12/0x20
Jan 23 15:38:59 lf1003-e3v2-13100124-f20x64 kernel:  [<ffffffff811521ce>] call_cpuidle+0x1e/0x40
Jan 23 15:38:59 lf1003-e3v2-13100124-f20x64 kernel:  [<ffffffff8115240a>] cpu_startup_entry+0x13a/0x220
Jan 23 15:38:59 lf1003-e3v2-13100124-f20x64 kernel:  [<ffffffff8104fbd9>] start_secondary+0x149/0x170
Jan 23 15:38:59 lf1003-e3v2-13100124-f20x64 kernel: ---[ end trace 69e31de175b59d4f ]---
Jan 23 15:38:59 lf1003-e3v2-13100124-f20x64 kernel: e1000e 0000:06:00.0 eth2: Reset adapter unexpectedly
Jan 23 15:38:59 lf1003-e3v2-13100124-f20x64 kernel: e1000e 0000:07:00.0 eth3: Reset adapter unexpectedly
Jan 23 15:39:02 lf1003-e3v2-13100124-f20x64 kernel: e1000e: eth2 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
Jan 23 15:39:02 lf1003-e3v2-13100124-f20x64 kernel: e1000e 0000:06:00.0 eth2: Detected Hardware Unit Hang:
                                                       TDH                  <a8>
                                                       TDT                  <f3>...
Jan 23 15:39:02 lf1003-e3v2-13100124-f20x64 kernel: e1000e: eth3 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
Jan 23 15:39:13 lf1003-e3v2-13100124-f20x64 kernel: NETDEV WATCHDOG: eth3 (e1000e): transmit queue 0 timed out, trans_start: 4294748730, wd-timeout: 5000 
jiffies: 4294759424 tx-queues: 1
Jan 23 15:39:13 lf1003-e3v2-13100124-f20x64 kernel: NETDEV WATCHDOG: eth2 (e1000e): transmit queue 0 timed out, trans_start: 4294748730, wd-timeout: 5000 
jiffies: 4294759424 tx-queues: 1
Jan 23 15:39:13 lf1003-e3v2-13100124-f20x64 kernel: e1000e 0000:07:00.0 eth3: Reset adapter unexpectedly
Jan 23 15:39:13 lf1003-e3v2-13100124-f20x64 kernel: e1000e 0000:06:00.0 eth2: Reset adapter unexpectedly
Jan 23 15:39:20 lf1003-e3v2-13100124-f20x64 kernel: e1000e: eth2 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
Jan 23 15:39:20 lf1003-e3v2-13100124-f20x64 kernel: e1000e: eth3 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
Jan 23 15:39:25 lf1003-e3v2-13100124-f20x64 kernel: NETDEV WATCHDOG: eth2 (e1000e): transmit queue 0 timed out, trans_start: 4294766123, wd-timeout: 5000 
jiffies: 4294771200 tx-queues: 1
Jan 23 15:39:25 lf1003-e3v2-13100124-f20x64 kernel: NETDEV WATCHDOG: eth3 (e1000e): transmit queue 0 timed out, trans_start: 4294766125, wd-timeout: 5000 
jiffies: 4294771200 tx-queues: 1
Jan 23 15:39:25 lf1003-e3v2-13100124-f20x64 kernel: e1000e 0000:06:00.0 eth2: Reset adapter unexpectedly
Jan 23 15:39:25 lf1003-e3v2-13100124-f20x64 kernel: e1000e 0000:07:00.0 eth3: Reset adapter unexpectedly
Jan 23 15:39:28 lf1003-e3v2-13100124-f20x64 kernel: e1000e: eth2 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
Jan 23 15:39:28 lf1003-e3v2-13100124-f20x64 kernel: e1000e 0000:06:00.0 eth2: Detected Hardware Unit Hang:
                                                       TDH                  <c8>
                                                       TDT                  <f5>...
Jan 23 15:39:28 lf1003-e3v2-13100124-f20x64 kernel: e1000e: eth3 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx


Thanks,
Ben

-- 
Ben Greear <greearb@...delatech.com>
Candela Technologies Inc  http://www.candelatech.com

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ