lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <BYAPR21MB12718D0364930BACC703F659CE469@BYAPR21MB1271.namprd21.prod.outlook.com>
Date:   Thu, 22 Apr 2021 03:57:20 +0000
From:   Long Li <longli@...rosoft.com>
To:     Dexuan Cui <decui@...rosoft.com>,
        Michael Kelley <mikelley@...rosoft.com>,
        "longli@...uxonhyperv.com" <longli@...uxonhyperv.com>,
        KY Srinivasan <kys@...rosoft.com>,
        Haiyang Zhang <haiyangz@...rosoft.com>,
        Stephen Hemminger <sthemmin@...rosoft.com>,
        Wei Liu <wei.liu@...nel.org>,
        Lorenzo Pieralisi <lorenzo.pieralisi@....com>,
        Rob Herring <robh@...nel.org>,
        Bjorn Helgaas <bhelgaas@...gle.com>,
        "linux-hyperv@...r.kernel.org" <linux-hyperv@...r.kernel.org>,
        "linux-pci@...r.kernel.org" <linux-pci@...r.kernel.org>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
Subject: RE: [PATCH] PCI: hv: Fix a race condition when removing the device

> Subject: RE: [PATCH] PCI: hv: Fix a race condition when removing the device
> 
> > From: Michael Kelley <mikelley@...rosoft.com>
> > Sent: Wednesday, April 21, 2021 2:06 PM  ...
> > > Yes I think put_hvpcibus() and get_hvpcibus() can be removed, as we
> > > have changed to use a dedicated workqueue for hbus since they were
> > > introduced.
> > >
> > > But we still need to call tasklet_disable/enable() the same way
> > > hv_pci_suspend() does, the
> > > reason is that we need to protect hbus->state. This value needs to
> > > be
> > consistent for the
> > > driver. For example, a CPU may decide to schedule a work on a work
> > > queue
> > that we just
> > > flushed or destroyed, by reading the wrong hbus->state.
> > >
> >
> > Yes, I would agree the tasklet disable/enable are needed, especially
> > since
> > tasklet_disable()
> > is what ensures that the tasklet is not currently running.
> >
> > If the hbus ref counting isn't needed any longer, I would strongly
> > recommend adding a patch to the series that removes it.  This
> > synchronization stuff is hard enough to understand and reason about;
> > having a leftover mechanism that doesn't really do anything useful
> > makes it nearly impossible. :-)
> >
> > Dexuan -- I'm hoping you can take a look as well and see if you agree.
> >
> > Michael
> 
> I also think we can remove the reference counting.
> 
> But it looks like there is still race in hv_pci_remove() even with Long's
> patch: in hv_pci_remove(), we disable the tasklet, change hbus->state to
> hv_pcibus_removing, re-enable the tasklet and flush hbus->wq, and set
> hbus->state to hv_pcibus_removed -- what if the channel callback runs
> again? -- now hbus->state is no longer hv_pcibus_removing, so
> hv_pci_devices_present() -> hv_pci_start_relations_work() and
> hv_pci_eject_device() can still add new work items to hbus->wq, and the
> new work items may race with the vmbus_close().

Good catch, adding a check for hv_pcibus_removed should fix it. I'm sending a v2.

> 
> It looks like we should remove the state hv_pcibus_removed?
> 
> Thanks,
> -- Dexuan

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ