Message-ID: <MW2PR2101MB0892B264810E6E6E54A96C4DBF469@MW2PR2101MB0892.namprd21.prod.outlook.com>
Date: Thu, 22 Apr 2021 02:31:23 +0000
From: Dexuan Cui <decui@...rosoft.com>
To: Michael Kelley <mikelley@...rosoft.com>,
Long Li <longli@...rosoft.com>,
"longli@...uxonhyperv.com" <longli@...uxonhyperv.com>,
KY Srinivasan <kys@...rosoft.com>,
Haiyang Zhang <haiyangz@...rosoft.com>,
Stephen Hemminger <sthemmin@...rosoft.com>,
Wei Liu <wei.liu@...nel.org>,
Lorenzo Pieralisi <lorenzo.pieralisi@....com>,
Rob Herring <robh@...nel.org>,
Bjorn Helgaas <bhelgaas@...gle.com>,
"linux-hyperv@...r.kernel.org" <linux-hyperv@...r.kernel.org>,
"linux-pci@...r.kernel.org" <linux-pci@...r.kernel.org>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
Subject: RE: [PATCH] PCI: hv: Fix a race condition when removing the device
> From: Michael Kelley <mikelley@...rosoft.com>
> Sent: Wednesday, April 21, 2021 2:06 PM
> ...
> > Yes I think put_hvpcibus() and get_hvpcibus() can be removed, as we have
> > changed to use a dedicated workqueue for hbus since they were introduced.
> >
> > But we still need to call tasklet_disable/enable() the same way
> > hv_pci_suspend() does, the reason is that we need to protect hbus->state.
> > This value needs to be consistent for the driver. For example, a CPU may
> > decide to schedule a work on a work queue that we just flushed or
> > destroyed, by reading the wrong hbus->state.
> >
>
> Yes, I would agree the tasklet disable/enable are needed, especially since
> tasklet_disable() is what ensures that the tasklet is not currently running.
>
> If the hbus ref counting isn't needed any longer, I would strongly recommend
> adding a patch to the series that removes it. This synchronization stuff is
> hard enough to understand and reason about; having a leftover mechanism that
> doesn't really do anything useful makes it nearly impossible. :-)
>
> Dexuan -- I'm hoping you can take a look as well and see if you agree.
>
> Michael
I also think we can remove the reference counting.

But it looks like there is still a race in hv_pci_remove() even with Long's
patch. In hv_pci_remove(), we disable the tasklet, change hbus->state to
hv_pcibus_removing, re-enable the tasklet, flush hbus->wq, and then set
hbus->state to hv_pcibus_removed. What if the channel callback runs again
at that point? hbus->state is no longer hv_pcibus_removing, so
hv_pci_devices_present() -> hv_pci_start_relations_work() and
hv_pci_eject_device() can still add new work items to hbus->wq, and the new
work items may race with vmbus_close().

It looks like we should remove the state hv_pcibus_removed?
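
To make the sequence concrete, here is a minimal user-space sketch of the
ordering I have in mind. It is not the driver code: struct model_hbus and the
model_*() helpers are hypothetical names, hv_pcibus_installed stands in for
whatever state the bus is in before removal, and I am assuming the queuing
paths skip new work only when they see hbus->state == hv_pcibus_removing. The
late channel callback is simulated as a plain call after the flush.

```c
#include <assert.h>
#include <stdbool.h>

/* State names follow the discussion; everything else is a simplified model. */
enum hv_pcibus_state {
	hv_pcibus_installed,	/* assumed "normal" state before removal */
	hv_pcibus_removing,
	hv_pcibus_removed,
};

struct model_hbus {
	enum hv_pcibus_state state;
	int queued;		/* work items pending on the modelled hbus->wq */
};

/* Assumed check: the queuing paths bail out only while state is "removing". */
static bool model_queue_work(struct model_hbus *hbus)
{
	if (hbus->state == hv_pcibus_removing)
		return false;
	hbus->queued++;
	return true;
}

/* Stand-in for flushing hbus->wq: all pending items complete. */
static void model_flush(struct model_hbus *hbus)
{
	hbus->queued = 0;
}

/*
 * Models the removal ordering and returns how many work items are still
 * pending at the point where the channel would be closed.
 */
static int model_remove(struct model_hbus *hbus, bool use_removed_state)
{
	hbus->state = hv_pcibus_removing;
	model_flush(hbus);

	if (use_removed_state)
		hbus->state = hv_pcibus_removed;

	/* The channel callback runs once more and tries to queue work. */
	model_queue_work(hbus);

	return hbus->queued;
}
```

In this model, model_remove() with use_removed_state set leaves one work item
pending at close time, and with it cleared leaves none, which is the reason
for suggesting that the state stay at hv_pcibus_removing.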
Thanks,
-- Dexuan