Message-ID: <DM2PR0301MB123233EA695E505148665B98AB660@DM2PR0301MB1232.namprd03.prod.outlook.com>
Date: Fri, 29 Apr 2016 16:55:52 +0000
From: Jake Oshins <jakeo@...rosoft.com>
To: Vitaly Kuznetsov <vkuznets@...hat.com>,
"linux-pci@...r.kernel.org" <linux-pci@...r.kernel.org>
CC: "devel@...uxdriverproject.org" <devel@...uxdriverproject.org>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
KY Srinivasan <kys@...rosoft.com>,
Haiyang Zhang <haiyangz@...rosoft.com>,
Bjorn Helgaas <bhelgaas@...gle.com>
Subject: RE: [PATCH] PCI: hv: report resources release after stopping the bus
> -----Original Message-----
> From: Vitaly Kuznetsov [mailto:vkuznets@...hat.com]
> Sent: Friday, April 29, 2016 2:39 AM
> To: linux-pci@...r.kernel.org
> Cc: devel@...uxdriverproject.org; linux-kernel@...r.kernel.org; KY
> Srinivasan <kys@...rosoft.com>; Haiyang Zhang
> <haiyangz@...rosoft.com>; Bjorn Helgaas <bhelgaas@...gle.com>; Jake
> Oshins <jakeo@...rosoft.com>
> Subject: [PATCH] PCI: hv: report resources release after stopping the bus
>
> A kernel hang is observed when the pci-hyperv module is released with device
> drivers still attached. E.g. when I do 'rmmod pci_hyperv' with a BCM5720
> device passed through (tg3 module) I see the following:
>
> NMI watchdog: BUG: soft lockup - CPU#1 stuck for 22s! [rmmod:2104]
> ...
> Call Trace:
> [<ffffffffa0641487>] tg3_read_mem+0x87/0x100 [tg3]
> [<ffffffffa063f000>] ? 0xffffffffa063f000
> [<ffffffffa0644375>] tg3_poll_fw+0x85/0x150 [tg3]
> [<ffffffffa0649877>] tg3_chip_reset+0x357/0x8c0 [tg3]
> [<ffffffffa064ca8b>] tg3_halt+0x3b/0x190 [tg3]
> [<ffffffffa0657611>] tg3_stop+0x171/0x230 [tg3]
> ...
> [<ffffffffa064c550>] tg3_remove_one+0x90/0x140 [tg3]
> [<ffffffff813bee59>] pci_device_remove+0x39/0xc0
> [<ffffffff814a3201>] __device_release_driver+0xa1/0x160
> [<ffffffff814a32e3>] device_release_driver+0x23/0x30
> [<ffffffff813b794a>] pci_stop_bus_device+0x8a/0xa0
> [<ffffffff813b7ab6>] pci_stop_root_bus+0x36/0x60
> [<ffffffffa02c3f38>] hv_pci_remove+0x238/0x260 [pci_hyperv]
>
> The problem seems to be that we report the release of local resources
> before stopping the bus and removing devices from it, while device drivers
> may still try to use these resources on shutdown. Move the resources
> release report after we do pci_stop_root_bus().
>
> Signed-off-by: Vitaly Kuznetsov <vkuznets@...hat.com>
Acked-by: Jake Oshins <jakeo@...rosoft.com>
> ---
> drivers/pci/host/pci-hyperv.c | 10 +++++-----
> 1 file changed, 5 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/pci/host/pci-hyperv.c b/drivers/pci/host/pci-hyperv.c
> index f2559b6..c17e792 100644
> --- a/drivers/pci/host/pci-hyperv.c
> +++ b/drivers/pci/host/pci-hyperv.c
> @@ -2268,11 +2268,6 @@ static int hv_pci_remove(struct hv_device *hdev)
>
>  	hbus = hv_get_drvdata(hdev);
>
> -	ret = hv_send_resources_released(hdev);
> -	if (ret)
> -		dev_err(&hdev->device,
> -			"Couldn't send resources released packet(s)\n");
> -
>  	memset(&pkt.teardown_packet, 0, sizeof(pkt.teardown_packet));
>  	init_completion(&comp_pkt.host_event);
>  	pkt.teardown_packet.completion_func = hv_pci_generic_compl;
> @@ -2295,6 +2290,11 @@ static int hv_pci_remove(struct hv_device *hdev)
>  		pci_unlock_rescan_remove();
>  	}
>
> +	ret = hv_send_resources_released(hdev);
> +	if (ret)
> +		dev_err(&hdev->device,
> +			"Couldn't send resources released packet(s)\n");
> +
>  	vmbus_close(hdev->channel);
>
>  	/* Delete any children which might still exist. */
> --
> 2.5.5
This looks like the right fix to me. Thanks.
-- Jake Oshins
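
The ordering issue described in the commit message can be sketched with a
small user-space model. This is only an illustration under stated
assumptions, not kernel code: struct fake_host, report_resources_released(),
stop_bus() and teardown() are hypothetical names that stand in for the
resources-released report and for a driver touching its device while the
bus is being stopped.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the host's view of the passed-through device. */
struct fake_host {
	bool resources_valid;	/* false once the release has been reported */
	int bad_accesses;	/* device accesses made after the release */
};

/* Models the "resources released" report: the host may reclaim resources. */
static void report_resources_released(struct fake_host *h)
{
	h->resources_valid = false;
}

/* Models stopping the bus: the driver's remove path touches the device. */
static void stop_bus(struct fake_host *h)
{
	if (!h->resources_valid)
		h->bad_accesses++;
}

/* Returns how many device accesses hit already-released resources. */
static int teardown(bool release_first)
{
	struct fake_host h = { .resources_valid = true, .bad_accesses = 0 };

	if (release_first) {
		/* Old order: report the release, then stop the bus. */
		report_resources_released(&h);
		stop_bus(&h);
	} else {
		/* Patched order: stop the bus, then report the release. */
		stop_bus(&h);
		report_resources_released(&h);
	}
	return h.bad_accesses;
}
```

In this model teardown(true) counts one access to released resources, which
corresponds to the old order, while teardown(false) counts none. In the real
driver such an access apparently shows up as the tg3 soft lockup in the
trace above rather than as a counter.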