[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <BYAPR21MB13369B7B03B3CB7760D7D79ECAAC0@BYAPR21MB1336.namprd21.prod.outlook.com>
Date: Thu, 15 Aug 2019 16:55:15 +0000
From: Haiyang Zhang <haiyangz@...rosoft.com>
To: Lorenzo Pieralisi <lorenzo.pieralisi@....com>
CC: "sashal@...nel.org" <sashal@...nel.org>,
"bhelgaas@...gle.com" <bhelgaas@...gle.com>,
"linux-hyperv@...r.kernel.org" <linux-hyperv@...r.kernel.org>,
"linux-pci@...r.kernel.org" <linux-pci@...r.kernel.org>,
KY Srinivasan <kys@...rosoft.com>,
Stephen Hemminger <sthemmin@...rosoft.com>,
"olaf@...fle.de" <olaf@...fle.de>, vkuznets <vkuznets@...hat.com>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
Subject: RE: [PATCH v5,1/2] PCI: hv: Detect and fix Hyper-V PCI domain number
collision
> -----Original Message-----
> From: Lorenzo Pieralisi <lorenzo.pieralisi@....com>
> Sent: Thursday, August 15, 2019 12:11 PM
> To: Haiyang Zhang <haiyangz@...rosoft.com>
> Cc: sashal@...nel.org; bhelgaas@...gle.com; linux-
> hyperv@...r.kernel.org; linux-pci@...r.kernel.org; KY Srinivasan
> <kys@...rosoft.com>; Stephen Hemminger <sthemmin@...rosoft.com>;
> olaf@...fle.de; vkuznets <vkuznets@...hat.com>; linux-
> kernel@...r.kernel.org
> Subject: Re: [PATCH v5,1/2] PCI: hv: Detect and fix Hyper-V PCI domain
> number collision
>
> On Wed, Aug 14, 2019 at 03:52:15PM +0000, Haiyang Zhang wrote:
> > Currently in Azure cloud, for passthrough devices, the host sets the device
> > instance ID's bytes 8 - 15 to a value derived from the host HWID, which is
> > the same on all devices in a VM. So, the device instance ID's bytes 8 and 9
> > provided by the host are no longer unique. This affects all Azure hosts
> > since last year, and can cause device passthrough to VMs to fail because
>
> Bjorn already asked, can you be a bit more specific than "since last
> year" here please ?
>
> It would be useful to understand when/how this became an issue.
The host change happens around July 2018. The Azure roll out takes
multi weeks, so there is no specific date. I will include the Month
Year in the log.
>
> > the bytes 8 and 9 are used as PCI domain number. Collision of domain
> > numbers will cause the second device with the same domain number fail to
> > load.
> >
> > In the cases of collision, we will detect and find another number that is
> > not in use.
> >
> > Suggested-by: Michael Kelley <mikelley@...rosoft.com>
> > Signed-off-by: Haiyang Zhang <haiyangz@...rosoft.com>
> > Acked-by: Sasha Levin <sashal@...nel.org>
> > ---
> > drivers/pci/controller/pci-hyperv.c | 92
> +++++++++++++++++++++++++++++++------
> > 1 file changed, 79 insertions(+), 13 deletions(-)
> >
> > diff --git a/drivers/pci/controller/pci-hyperv.c b/drivers/pci/controller/pci-
> hyperv.c
> > index 40b6254..31b8fd5 100644
> > --- a/drivers/pci/controller/pci-hyperv.c
> > +++ b/drivers/pci/controller/pci-hyperv.c
> > @@ -2510,6 +2510,48 @@ static void put_hvpcibus(struct
> hv_pcibus_device *hbus)
> > complete(&hbus->remove_event);
> > }
> >
> > +#define HVPCI_DOM_MAP_SIZE (64 * 1024)
> > +static DECLARE_BITMAP(hvpci_dom_map, HVPCI_DOM_MAP_SIZE);
> > +
> > +/*
> > + * PCI domain number 0 is used by emulated devices on Gen1 VMs, so
> define 0
> > + * as invalid for passthrough PCI devices of this driver.
> > + */
> > +#define HVPCI_DOM_INVALID 0
> > +
> > +/**
> > + * hv_get_dom_num() - Get a valid PCI domain number
> > + * Check if the PCI domain number is in use, and return another number if
> > + * it is in use.
> > + *
> > + * @dom: Requested domain number
> > + *
> > + * return: domain number on success, HVPCI_DOM_INVALID on failure
> > + */
> > +static u16 hv_get_dom_num(u16 dom)
> > +{
> > + unsigned int i;
>
> > +
> > + if (test_and_set_bit(dom, hvpci_dom_map) == 0)
> > + return dom;
> > +
> > + for_each_clear_bit(i, hvpci_dom_map, HVPCI_DOM_MAP_SIZE) {
> > + if (test_and_set_bit(i, hvpci_dom_map) == 0)
> > + return i;
> > + }
>
> Don't you need locking around code reading/updating hvpci_dom_map ?
If the bit changes after for_each_clear_bit() considers it as a "clear bit" - the
test_and_set_bit() does test&set in an atomic operation - the return value
will be 1 instead of 0. Then the loop will continue to the next clear bit, until
the test_and_set_bit() is successful. So no locking is necessary here.
Thanks,
- Haiyang
Powered by blists - more mailing lists