Message-ID: <54F07630.1010802@kpanic.de>
Date: Fri, 27 Feb 2015 14:50:40 +0100
From: Stefan Assmann <sassmann@...nic.de>
To: netdev <netdev@...r.kernel.org>
CC: "e1000-devel@...ts.sourceforge.net"
<e1000-devel@...ts.sourceforge.net>,
"Brandeburg, Jesse" <jesse.brandeburg@...el.com>,
"Kirsher, Jeffrey T" <jeffrey.t.kirsher@...el.com>,
"Williams, Mitch A" <mitch.a.williams@...el.com>,
anjali.singhai@...el.com
Subject: i40e: crash on NMI by continuous module reload
When unloading and reloading the driver in a loop with
modprobe -r i40e ; modprobe i40e
the driver stops probing successfully after a few cycles and logs
the following.
[ 160.171944] i40e 0000:07:00.1 eth7: adding 68:05:ca:2a:3a:41 vid=0
[ 161.271487] i40e 0000:07:00.1: set phy mask fail, aq_err -54
[ 161.685505] i40e 0000:07:00.0 eth6: NIC Link is Down
[ 161.873172] i40e 0000:07:00.1: link restart failed, aq_err=0
[ 162.401255] i40e 0000:07:00.1: PCI-Express: Speed 8.0GT/s Width x8
[ 162.710082] i40e 0000:07:00.0: add filter failed, err -54, aq_err 0
[ 162.930801] i40e 0000:07:00.1: get phy abilities failed, aq_err -54, advertised speed settings may not be correct
[ 162.977599] i40e 0000:07:00.1: Features: PF-id[1] VFs: 32 VSIs: 34 QP: 32 RX: PS RSS FD_ATR FD_SB NTUPLE PTP
[ 163.238624] i40e 0000:07:00.0 eth6: NIC Link is Down
[ 163.244566] i40e 0000:07:00.2: Initial pf_reset failed: -15
[ 163.244607] i40e: probe of 0000:07:00.2 failed with error -15
[ 163.464911] i40e 0000:07:00.3: Initial pf_reset failed: -15
[ 163.490747] i40e: probe of 0000:07:00.3 failed with error -15
[ 163.518932] i40e 0000:07:00.1: i40e_ptp_stop: removed PHC on eth7
[ 163.746713] i40e 0000:07:00.1 eth7: NIC Link is Down
[ 164.270164] i40e 0000:07:00.1: add filter failed, err -54, aq_err 0
[...]
[ 184.462907] i40e: Copyright (c) 2013 - 2014 Intel Corporation.
[ 184.711290] i40e 0000:07:00.0: Initial pf_reset failed: -15
[ 184.736457] i40e: probe of 0000:07:00.0 failed with error -15
[ 184.983109] i40e 0000:07:00.1: Initial pf_reset failed: -15
[ 185.009354] i40e: probe of 0000:07:00.1 failed with error -15
[ 185.256612] i40e 0000:07:00.2: Initial pf_reset failed: -15
[ 185.281990] i40e: probe of 0000:07:00.2 failed with error -15
[ 185.529085] i40e 0000:07:00.3: Initial pf_reset failed: -15
[ 185.555094] i40e: probe of 0000:07:00.3 failed with error -15
Followed by
[ 188.178408] NMI: IOCK error (debug interrupt?) for reason 71 on CPU 0.
[ 188.214709] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 3.19.0+ #81
[ 188.245187] Hardware name: HP ProLiant DL360p Gen8, BIOS P71 08/02/2014
[ 188.276847] task: ffffffff81e13480 ti: ffffffff81e00000 task.ti: ffffffff81e00000
[ 188.313671] RIP: 0010:[<ffffffff8100d45b>] [<ffffffff8100d45b>] default_idle+0x1b/0xb0
[ 188.351779] RSP: 0018:ffffffff81e03ea8 EFLAGS: 00000246
[ 188.377118] RAX: 0000000000000000 RBX: ffffffff81e00010 RCX: 0000000000000000
[ 188.412311] RDX: ffffffff81e00000 RSI: 0000000000000000 RDI: 0000000000000000
[ 188.448563] RBP: ffffffff81e03eb8 R08: 0000000000000000 R09: 00000000fffe4047
[ 188.482137] R10: ffffffff81a0e045 R11: 0000000000000000 R12: 0000000000000000
[ 188.518089] R13: ffffffff81efd970 R14: ffffffff81e00010 R15: 0000000000000000
[ 188.553382] FS: 0000000000000000(0000) GS:ffff880237a00000(0000) knlGS:0000000000000000
[ 188.594583] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 188.621056] CR2: 00007fbcb561bc88 CR3: 0000000235966000 CR4: 00000000001406f0
[ 188.656549] Stack:
[ 188.665693] ffffffff81e00010 ffffffff81e00010 ffffffff81e03ec8 ffffffff8100cc3a
[ 188.700062] ffffffff81e03f48 ffffffff810884b7 ffffffff81e13480 ffff880236538910
[ 188.734638] ffffffff81e00000 ffffffff81e00010 ffffffff81e00010 ffffffff81e00000
[ 188.773067] Call Trace:
[ 188.784412] [<ffffffff8100cc3a>] arch_cpu_idle+0xa/0x10
[ 188.808717] [<ffffffff810884b7>] cpu_startup_entry+0x227/0x3b0
[ 188.837221] [<ffffffff819d0a52>] rest_init+0x72/0x80
[ 188.860698] [<ffffffff81f201bd>] start_kernel+0x41b/0x428
[ 188.887669] [<ffffffff81f1fbc0>] ? set_init_arg+0x5d/0x5d
[ 188.914359] [<ffffffff81f1f5ad>] x86_64_start_reservations+0x2a/0x2c
[ 188.945125] [<ffffffff81f1f700>] x86_64_start_kernel+0x151/0x158
[ 188.972480] Code: c0 48 83 c8 08 0f 22 c0 eb ce 66 0f 1f 44 00 00 55 8b 05 a1 a8 ec 00 48 89 e5 41 54 65 44 8b 25 cc cc ff 7e 85 c0 53 7f 19 fb f4 <8b> 05 87 a8 ec 00 65 44 8b 25 b7 cc ff 7e 85 c0 7f 44 5b 41 5c
I've tracked this down to the following hunk from this commit.
commit cafa2ee6fbb1bbc2fecdeef990858d56646fc1bd
Author: Anjali Singhai Jain <anjali.singhai@...el.com>
Date: Sat Sep 13 07:40:45 2014 +0000
i40e: Fix a bug where Rx would stop after some time
[...]
diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c b/drivers/net/ethernet/intel/i40e/i40e_main.c
index f7464e8..ff6d94d 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_main.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
[...]
@@ -9169,6 +9178,13 @@ static int i40e_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
 	if (err)
 		dev_info(&pf->pdev->dev, "set phy mask fail, aq_err %d\n", err);
+	msleep(75);
+	err = i40e_aq_set_link_restart_an(&pf->hw, true, NULL);
+	if (err) {
+		dev_info(&pf->pdev->dev, "link restart failed, aq_err=%d\n",
+			 pf->hw.aq.asq_last_status);
+	}
+
 	/* The main driver is (mostly) up and happy. We need to set this state
 	 * before setting up the misc vector or we get a race and the vector
 	 * ends up disabled forever.
With this hunk removed, the driver unloaded and reloaded successfully a
couple of hundred times. Would it be safe to just remove this hunk?
I haven't seen any negative effects from removing it so far.
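The reload loop from the top of this mail could be automated roughly as
follows. This is only a sketch, assuming root privileges and a util-linux
dmesg that supports -C (clear) and -c (read and clear). The names
failed_probes, run_cmd and reload_until_failure are made up for
illustration and are not part of the driver or any existing tool.

```python
import re
import subprocess

# Matches kernel log lines such as:
#   i40e: probe of 0000:07:00.2 failed with error -15
PROBE_FAIL_RE = re.compile(r"i40e: probe of (\S+) failed with error (-?\d+)")


def failed_probes(log_text):
    """Return (pci_address, error_code) for every failed i40e probe in log_text."""
    return [(addr, int(err)) for addr, err in PROBE_FAIL_RE.findall(log_text)]


def run_cmd(argv):
    """Run a command and return its stdout as text (hypothetical helper)."""
    return subprocess.run(argv, check=True, capture_output=True, text=True).stdout


def reload_until_failure(max_cycles=500, run=run_cmd):
    """Reload i40e until a probe fails.

    Returns (cycle_number, failures) for the first cycle that logs a failed
    probe, or None if max_cycles reloads complete without one.
    """
    run(["dmesg", "-C"])  # start from an empty kernel log buffer
    for cycle in range(1, max_cycles + 1):
        run(["modprobe", "-r", "i40e"])
        run(["modprobe", "i40e"])
        # Read and clear the log so each cycle only sees its own messages.
        failures = failed_probes(run(["dmesg", "-c"]))
        if failures:
            return cycle, failures
    return None
```

Calling reload_until_failure() as root on an affected machine should return
the cycle at which the first probe failure shows up in the log. It only
looks for the "probe of ... failed" lines, so it will not notice the NMI
itself.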
Stefan