[<prev] [next>] [thread-next>] [day] [month] [year] [list]
Message-ID: <FC41C24E35F18A40888AACA1A36F3E418ADF339C@fmsmsx115.amr.corp.intel.com>
Date: Fri, 27 Feb 2015 19:42:29 +0000
From: "Nelson, Shannon" <shannon.nelson@...el.com>
To: nick <xerofoify@...il.com>, Stefan Assmann <sassmann@...nic.de>,
netdev <netdev@...r.kernel.org>
CC: "e1000-devel@...ts.sourceforge.net"
<e1000-devel@...ts.sourceforge.net>,
"Brandeburg, Jesse" <jesse.brandeburg@...el.com>
Subject: RE: [E1000-devel] i40e: crash on NMI by continuous module reload
> From: nick [mailto:xerofoify@...il.com]
> On 2015-02-27 09:16 AM, Stefan Assmann wrote:
> > On 27.02.2015 15:02, nick wrote:
> >
> > [...]
> >
> >>> i40e: Fix a bug where Rx would stop after some time
> >>> [...]
> >>> diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c
> b/drivers/net/ethernet/intel/i40e/i40e_main.c
> >>> index f7464e8..ff6d94d 100644
> >>> --- a/drivers/net/ethernet/intel/i40e/i40e_main.c
> >>> +++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
> >>> [...]
> >>> @@ -9169,6 +9178,13 @@ static int i40e_probe(struct pci_dev *pdev,
> const struct pci_device_id *ent)
> >>> if (err)
> >>> dev_info(&pf->pdev->dev, "set phy mask fail, aq_err %d\n",
> err);
> >>>
> >>> + msleep(75);
> >>> + err = i40e_aq_set_link_restart_an(&pf->hw, true, NULL);
> >>> + if (err) {
> >>> + dev_info(&pf->pdev->dev, "link restart failed, aq_err=%d\n",
> >>> + pf->hw.aq.asq_last_status);
> >>> + }
> >>> +
> >>> /* The main driver is (mostly) up and happy. We need to set this
> state
> >>> * before setting up the misc vector or we get a race and the
> vector
> >>> * ends up disabled forever.
> >>>
> >>> With this hunk removed the driver successfully unloaded/reloaded a
> >>> couple of hundred times. Would it be safe to just remove this hunk?
> >>> I haven't seen any negative effects by removing this yet.
> >>>
> >>> Stefan
> >>>
> >> Stefan,
> >> I wouldn't remove them yet as this does look like a valid idea to
> check to see if the link is
> >> restarting successfully. On the other hand can you try removing the
> msleep line as this one is
> >> most likely causing the issue due to sleeping for some long in a
> probe function is generally a
> >> bad idea.
> >> Thanks,
> >> Nick
> >
> > Thanks Nick for the quick reply. I tested removing the msleep but that
> > didn't make a difference. You actually need to remove the complete
> hunk
> > to get a stable driver reload.
> >
> > Stefan
> >
> Stefan,
> Basically there are a few things that could be going wrong
> 1. You are getting a error return for the
> function,i40e_aq_set_link_restart_an
> 2. You are trying to re able the device again when not needed
> 3. You are sending a NULL value to a field for command arguments that
> takes a 0 and not NULL
> to take no arguments
> Nick
First of all, I would make sure you've got a short sleep in between each load and unload in this stress test. There's a lot going on under the covers in the Firmware that really should be allowed to settle out before jostling it again with another load/unload command.
It would help to know what Firmware you have on your NIC - can you give us the output from "ethtool -i <ethX>"?
The out-of-tree driver has just (finally!) been updated on SourceForge, so you might give this version 1.2.37 driver a try to see if it changes your result. That code still has the hunk in question, but protected by a FW version check. The related patch will be headed upstream to net-next very soon.
Firmware updates have also just been released, but I'm not sure they've made it to the Intel Downloads site yet. Updating your FW will make a difference.
sln
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Powered by blists - more mailing lists