Message-ID: <20140603084032.GA13874@richard>
Date: Tue, 3 Jun 2014 16:40:32 +0800
From: Wei Yang <weiyang@...ux.vnet.ibm.com>
To: Or Gerlitz <or.gerlitz@...il.com>
Cc: Bjorn Helgaas <bhelgaas@...gle.com>,
David Miller <davem@...emloft.net>,
Wei Yang <weiyang@...ux.vnet.ibm.com>,
netdev <netdev@...r.kernel.org>, Amir Vadai <amirv@...lanox.com>,
Jack Morgenstein <jackm@....mellanox.co.il>,
Tal Alon <talal@...lanox.com>,
Yevgeny Petrilin <yevgenyp@...lanox.com>
Subject: Re: [PATCH net] net/mlx4_core: Fix Oops on reboot when SRIOV VFs are
probed into the Host

On Tue, Jun 03, 2014 at 11:15:43AM +0300, Or Gerlitz wrote:
>On Mon, Jun 2, 2014 at 7:10 PM, Bjorn Helgaas <bhelgaas@...gle.com> wrote:
>> Writing a driver is not an empirical process of trying things to see
>> what works. You need to actively design a consistent structure so you
>> know why and when things are safe. I object to gratuitous "dev ==
>> NULL" checks because often they are just a way of patching up a driver
>> design that isn't well thought-out.
>
>Bjorn, first and foremost: agreed.
>
>Next, to be precise, the use case of rebooting the host while the
>driver was loaded in SRIOV mode and NO VFs probed to VMs worked before
>commit befdf89 and is now broken.
>
>Reading your response further, I understand that the code was probably
>using a sort of hackish branching to make that happen, and you
>suggest we rewrite that section properly so it can serve well when we
>(hopefully soon) implement sriov_configure and possibly also
>suspend/resume. Point taken.
>
>Dave, as for this patch: again, the regression (inability to reboot
>the host node while the driver is loaded) exists in the latest
>upstream code, as of befdf89 / 3.15-rc1.
>
>Now, taking into account that 3.15 is after rc8 and the IL devel team
>has a holiday this week, I don't see us coming in time with a
>deeper fix for 3.15, so maybe you can eventually go and merge this
>one-liner for 3.15?

I am glad to verify your patch, if you wish.

>
>Or.
>
>
>> As I wrote before:
>> From the PCI core's perspective, after .probe() returns successfully,
>> we can call any driver entry point and pass the pci_dev to it, and
>> expect it to work. Doing mlx4_remove_one() in mlx4_pci_err_detected()
>> sort of breaks that assumption because you clear out pci_drvdata().
>> Right now, the only other entry point mlx4 really implements is
>> mlx4_remove_one(), and it has a hack that tests whether pci_drvdata()
>> is NULL. But that's ... a hack, and you'll have to do the same
>> if/when you implement suspend/resume/sriov_configure/etc.
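
A minimal userspace sketch of the structure Bjorn describes above, under
stated assumptions: the private data stays attached from probe until
remove, and a state flag (rather than a NULL drvdata pointer) records
whether the device resources are currently torn down. Every name here
(mock_pci_dev, mock_priv, mock_probe, mock_err_detected, mock_remove,
mock_demo) is a hypothetical stand-in for illustration; this is not the
mlx4 or PCI core code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct pci_dev and its drvdata slot. */
struct mock_pci_dev {
	void *drvdata;
};

/* Hypothetical private data, valid from probe until remove. */
struct mock_priv {
	bool hw_loaded;   /* true while device resources are set up */
	int unload_count; /* number of real teardowns performed */
};

/* Tear down device resources; safe to call more than once. */
static void mock_unload(struct mock_priv *priv)
{
	if (!priv->hw_loaded)
		return;
	priv->hw_loaded = false;
	priv->unload_count++;
}

static int mock_probe(struct mock_pci_dev *pdev)
{
	struct mock_priv *priv = calloc(1, sizeof(*priv));

	if (!priv)
		return -1;
	priv->hw_loaded = true;
	pdev->drvdata = priv;
	return 0;
}

/* Error handler: unload the hardware but keep drvdata attached, so
 * later entry points still receive a valid private structure. */
static void mock_err_detected(struct mock_pci_dev *pdev)
{
	mock_unload(pdev->drvdata);
}

/* Remove: no NULL test needed, because drvdata is only cleared here. */
static void mock_remove(struct mock_pci_dev *pdev)
{
	struct mock_priv *priv = pdev->drvdata;

	mock_unload(priv);
	pdev->drvdata = NULL;
	free(priv);
}

/* Runs probe -> error -> error -> remove and returns how many real
 * teardowns happened before remove was called. */
static int mock_demo(void)
{
	struct mock_pci_dev pdev = { NULL };
	int count;

	if (mock_probe(&pdev) != 0)
		return -1;
	mock_err_detected(&pdev);
	mock_err_detected(&pdev);
	count = ((struct mock_priv *)pdev.drvdata)->unload_count;
	mock_remove(&pdev);
	return count;
}
```

In this sketch the invariant is that drvdata is non-NULL between a
successful probe and remove, so any future entry point (suspend, resume,
sriov_configure) would only need to consult the state flag.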
--
Richard Yang
Help you, Help me