Date:   Wed, 21 Apr 2021 21:05:46 +0000
From:   Michael Kelley <mikelley@...rosoft.com>
To:     Long Li <longli@...rosoft.com>,
        "longli@...uxonhyperv.com" <longli@...uxonhyperv.com>,
        KY Srinivasan <kys@...rosoft.com>,
        Haiyang Zhang <haiyangz@...rosoft.com>,
        Stephen Hemminger <sthemmin@...rosoft.com>,
        Wei Liu <wei.liu@...nel.org>,
        Lorenzo Pieralisi <lorenzo.pieralisi@....com>,
        Rob Herring <robh@...nel.org>,
        Bjorn Helgaas <bhelgaas@...gle.com>,
        "linux-hyperv@...r.kernel.org" <linux-hyperv@...r.kernel.org>,
        "linux-pci@...r.kernel.org" <linux-pci@...r.kernel.org>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        Dexuan Cui <decui@...rosoft.com>
Subject: RE: [PATCH] PCI: hv: Fix a race condition when removing the device

From: Long Li <longli@...rosoft.com> Sent: Wednesday, April 21, 2021 12:57 PM
> > From: longli@...uxonhyperv.com <longli@...uxonhyperv.com>  Sent:
> > Monday, April 19, 2021 12:21 PM
> > >
> > > On removing the device, any work item (hv_pci_devices_present() or
> > > hv_pci_eject_device()) scheduled on workqueue hbus->wq may still be
> > > running and race with hv_pci_remove().
> > >
> > > This can happen because the host may send PCI_EJECT or
> > > PCI_BUS_RELATIONS(2) and decide to rescind the channel immediately
> > > after that.
> > >
> > > Fix this by flushing/stopping the workqueue of hbus before doing hbus
> > > remove.
> >
> > I can see that this change follows the same pattern as in hv_pci_suspend().
> > The comments there give a full explanation of the issue and the solution.  But
> > interestingly, the current code also has a reference count mechanism on the
> > hbus.  And code near the end of hv_pci_remove() decrements the reference
> > count and then waits for all users to finish before destroying the workqueue.
> > With this change, is this reference counting mechanism still needed?   If the
> > workqueue has already been emptied, it seems like the
> > wait_for_completion() near the end of hv_pci_remove() would never be
> > waiting for anything.  It makes me wonder if moving the reference count
> > checking code from near the end of hv_pci_remove() up to near the beginning
> > would solve the problem as well (and maybe in hv_pci_suspend also?).
> 
> Yes I think put_hvpcibus() and get_hvpcibus() can be removed, as we have changed to use
> a dedicated workqueue for hbus since they were introduced.
> 
> But we still need to call tasklet_disable()/tasklet_enable() the same way hv_pci_suspend()
> does, because we need to protect hbus->state. This value needs to be consistent across the
> driver. For example, a CPU that reads a stale hbus->state may schedule work on a workqueue
> that we have just flushed or destroyed.
> 

Yes, I would agree the tasklet disable/enable are needed, especially since tasklet_disable()
is what ensures that the tasklet is not currently running.
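
To make the ordering concrete, here is a minimal single-threaded sketch in
user-space C. It is a model under stated assumptions, not the driver code: all
names (model_bus, model_callback, model_flush, model_remove) are hypothetical,
and the mutual exclusion that tasklet_disable() provides against a running
callback is not modeled. It only shows why the state change has to come before
the flush: once the state has left "installed", the callback path refuses to
queue new work, so the flush leaves nothing behind.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for hv_pcibus_installed / hv_pcibus_removing. */
enum bus_state { BUS_INSTALLED, BUS_REMOVING };

struct model_bus {
	enum bus_state state;
	int pending;	/* work items queued but not yet run */
	int completed;	/* work items that have run */
};

/* Callback path: queue work only while the bus is still installed. */
static bool model_callback(struct model_bus *b)
{
	if (b->state != BUS_INSTALLED)
		return false;
	b->pending++;
	return true;
}

/* Stand-in for flush_workqueue(): run everything already queued. */
static void model_flush(struct model_bus *b)
{
	b->completed += b->pending;
	b->pending = 0;
}

/* Removal path, ordered as in the patch: change state first, then flush. */
static void model_remove(struct model_bus *b)
{
	b->state = BUS_REMOVING;
	model_flush(b);
	/* Nothing is queued, and the callback can no longer queue more. */
	assert(b->pending == 0);
}
```

If the two steps in model_remove() were swapped, a callback arriving between
the flush and the state change could still queue work, which is the window the
patch is trying to close.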

If the hbus ref counting isn't needed any longer, I would strongly recommend adding
a patch to the series that removes it.  This synchronization stuff is hard enough to
understand and reason about; having a leftover mechanism that doesn't really do
anything useful makes it nearly impossible. :-)

Dexuan -- I'm hoping you can take a look as well and see if you agree.

Michael

> >
> > Michael
> >
> > >
> > > Signed-off-by: Long Li <longli@...rosoft.com>
> > > ---
> > >  drivers/pci/controller/pci-hyperv.c | 11 +++++++++++
> > >  1 file changed, 11 insertions(+)
> > >
> > > diff --git a/drivers/pci/controller/pci-hyperv.c b/drivers/pci/controller/pci-hyperv.c
> > > index 27a17a1e4a7c..116815404313 100644
> > > --- a/drivers/pci/controller/pci-hyperv.c
> > > +++ b/drivers/pci/controller/pci-hyperv.c
> > > @@ -3305,6 +3305,17 @@ static int hv_pci_remove(struct hv_device *hdev)
> > >
> > >  	hbus = hv_get_drvdata(hdev);
> > >  	if (hbus->state == hv_pcibus_installed) {
> > > +		tasklet_disable(&hdev->channel->callback_event);
> > > +		hbus->state = hv_pcibus_removing;
> > > +		tasklet_enable(&hdev->channel->callback_event);
> > > +
> > > +		flush_workqueue(hbus->wq);
> > > +		/*
> > > +		 * At this point, no work is running or can be scheduled
> > > +		 * on hbus->wq. We can't race with hv_pci_devices_present()
> > > +		 * or hv_pci_eject_device(), it's safe to proceed.
> > > +		 */
> > > +
> > >  		/* Remove the bus from PCI's point of view. */
> > >  		pci_lock_rescan_remove();
> > >  		pci_stop_root_bus(hbus->pci_bus);
> > > --
> > > 2.27.0
