Message-ID:
 <SN6PR02MB4157A7F41F299608E605E0C5D450A@SN6PR02MB4157.namprd02.prod.outlook.com>
Date: Fri, 18 Jul 2025 03:03:52 +0000
From: Michael Kelley <mhklinux@...look.com>
To: "dan.j.williams@...el.com" <dan.j.williams@...el.com>,
	"bhelgaas@...gle.com" <bhelgaas@...gle.com>
CC: "linux-pci@...r.kernel.org" <linux-pci@...r.kernel.org>, "lukas@...ner.de"
	<lukas@...ner.de>, "linux-kernel@...r.kernel.org"
	<linux-kernel@...r.kernel.org>, "Jonathan.Cameron@...wei.com"
	<Jonathan.Cameron@...wei.com>, Suzuki K Poulose <suzuki.poulose@....com>,
	Lorenzo Pieralisi <lpieralisi@...nel.org>, Rob Herring <robh@...nel.org>, "K.
 Y. Srinivasan" <kys@...rosoft.com>, Haiyang Zhang <haiyangz@...rosoft.com>,
	Wei Liu <wei.liu@...nel.org>, Dexuan Cui <decui@...rosoft.com>, "open
 list:Hyper-V/Azure CORE AND DRIVERS" <linux-hyperv@...r.kernel.org>
Subject: RE: [PATCH 2/3] PCI: Enable host bridge emulation for
 PCI_DOMAINS_GENERIC platforms

From: dan.j.williams@...el.com <dan.j.williams@...el.com> Sent: Thursday, July 17, 2025 5:23 PM
> 
> Michael Kelley wrote:
> > From: dan.j.williams@...el.com <dan.j.williams@...el.com> Sent: Thursday, July 17, 2025 12:59 PM
> > >
> > > Michael Kelley wrote:
> > > > From: Dan Williams <dan.j.williams@...el.com> Sent: Wednesday, July 16, 2025 9:09 AM
> > >
> > > Thanks for taking a look Michael!
> > >
> > > [..]
> > > > > diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> > > > > index e9448d55113b..833ebf2d5213 100644
> > > > > --- a/drivers/pci/pci.c
> > > > > +++ b/drivers/pci/pci.c
> > > > > @@ -6692,9 +6692,50 @@ static void pci_no_domains(void)
> > > > >  #endif
> > > > >  }
> > > > >
> > > > > +#ifdef CONFIG_PCI_DOMAINS
> > > > > +static DEFINE_IDA(pci_domain_nr_dynamic_ida);
> > > > > +
> > > > > +/*
> > > > > + * Find a free domain_nr either allocated by pci_domain_nr_dynamic_ida or
> > > > > + * fallback to the first free domain number above the last ACPI segment number.
> > > > > + * Caller may have a specific domain number in mind, in which case try to
> > > > > + * reserve it.
> > > > > + *
> > > > > + * Note that this allocation is freed by pci_release_host_bridge_dev().
> > > > > + */
> > > > > +int pci_bus_find_emul_domain_nr(int hint)
> > > > > +{
> > > > > +	if (hint >= 0) {
> > > > > +		hint = ida_alloc_range(&pci_domain_nr_dynamic_ida, hint, hint,
> > > > > +				       GFP_KERNEL);
> > > >
> > > > This almost preserves the existing functionality in pci-hyperv.c. But if the
> > > > "hint" passed in is zero, current code in pci-hyperv.c treats that as a
> > > > collision and allocates some other value. The special treatment of zero is
> > > > necessary per the comment with the definition of HVPCI_DOM_INVALID.
> > > >
> > > > I don't have an opinion on whether the code here should treat a "hint"
> > > > of zero as invalid, or whether that should be handled in pci-hyperv.c.
> > >
> > > Oh, I see what you are saying. I made the "hint == 0" case start working
> > > where previously it should have failed. I feel like that's probably best
> > > handled in pci-hyperv.c with something like the following which also
> > > fixes up a regression I caused with @dom being unsigned:
> > >
> > > diff --git a/drivers/pci/controller/pci-hyperv.c b/drivers/pci/controller/pci-hyperv.c
> > > index cfe9806bdbe4..813757db98d1 100644
> > > --- a/drivers/pci/controller/pci-hyperv.c
> > > +++ b/drivers/pci/controller/pci-hyperv.c
> > > @@ -3642,9 +3642,9 @@ static int hv_pci_probe(struct hv_device *hdev,
> > >  {
> > >  	struct pci_host_bridge *bridge;
> > >  	struct hv_pcibus_device *hbus;
> > > -	u16 dom_req, dom;
> > > +	int ret, dom = -EINVAL;
> > > +	u16 dom_req;
> > >  	char *name;
> > > -	int ret;
> > >
> > >  	bridge = devm_pci_alloc_host_bridge(&hdev->device, 0);
> > >  	if (!bridge)
> > > @@ -3673,7 +3673,8 @@ static int hv_pci_probe(struct hv_device *hdev,
> > >  	 * collisions) in the same VM.
> > >  	 */
> > >  	dom_req = hdev->dev_instance.b[5] << 8 | hdev->dev_instance.b[4];
> > > -	dom = pci_bus_find_emul_domain_nr(dom_req);
> > > +	if (dom_req)
> > > +		dom = pci_bus_find_emul_domain_nr(dom_req);
> >
> > No, I don't think this is right either. If dom_req is 0, we don't want
> > hv_pci_probe() to fail. We want the "collision" path to be taken so that
> > some other unused PCI domain ID is assigned. That could be done by
> > passing -1 as the hint to pci_bus_find_emul_domain_nr(). Or PCI
> > domain ID 0 could be pre-reserved in init_hv_pci_drv() like is done
> > with HVPCI_DOM_INVALID in current code.
> 
> Yeah, I realized that shortly after sending. I will slow down.
> 
> > >
> > > A couple observations:
> > >
> > > - I think it would be reasonable to not fall back in the hint case with
> > >   something like this:
> >
> > We *do* need the fallback in the hint case. If the hint causes a collision
> > (i.e., another device is already using the hinted PCI domain ID), then we
> > need to choose some other PCI domain ID. Again, we don't want hv_pci_probe()
> > to fail for the device because the value of bytes 4 and 5 chosen from the device's
> > GUID (as assigned by Hyper-V) accidentally matches bytes 4 and 5 of some other
> > device's GUID. Hyper-V guarantees the GUIDs are unique, but not bytes 4 and
> > 5 standing alone. Current code behaves like the acpi_disabled case in your
> > patch, and picks some other unused PCI domain ID in the 1 to 0xFFFF range.
> 
> Ok, that feels like "let the caller set the range in addition to the
> hint".
> 
> >
> > >
> > > diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> > > index 833ebf2d5213..0bd2053dbe8a 100644
> > > --- a/drivers/pci/pci.c
> > > +++ b/drivers/pci/pci.c
> > > @@ -6705,14 +6705,10 @@ static DEFINE_IDA(pci_domain_nr_dynamic_ida);
> > >   */
> > >  int pci_bus_find_emul_domain_nr(int hint)
> > >  {
> > > -	if (hint >= 0) {
> > > -		hint = ida_alloc_range(&pci_domain_nr_dynamic_ida, hint, hint,
> > > +	if (hint >= 0)
> > > +		return ida_alloc_range(&pci_domain_nr_dynamic_ida, hint, hint,
> > >  				       GFP_KERNEL);
> > >
> > > -		if (hint >= 0)
> > > -			return hint;
> > > -	}
> > > -
> > >  	if (acpi_disabled)
> > >  		return ida_alloc(&pci_domain_nr_dynamic_ida, GFP_KERNEL);
> > >
> > > - The VMD driver has been allocating 32-bit PCI domain numbers since
> > >   v4.5 185a383ada2e ("x86/PCI: Add driver for Intel Volume Management
> > >   Device (VMD)"). At a minimum, if it is still a problem, it is a shared
> > >   problem, but the significant deployment of VMD in that time likely
> > >   indicates it is ok. If not, the above change at least makes the
> > >   hyper-v case avoid 32-bit domain numbers.
> >
> > The problem we encountered in 2018/2019 was with graphics devices
> > and the Xorg X Server, specifically with the PCI domain ID stored in
> > xorg.conf to identify the graphics device that the X Server was to run
> > against. I don't recall ever seeing a similar problem with storage or NIC
> > devices, but my memory could be incomplete. It's plausible that user
> > space code accessing the VMD device correctly handled 32-bit domain
> > IDs, but that's not necessarily an indicator for user space graphics
> > software. The Xorg X Server issues would have started somewhere after
> > commit 4a9b0933bdfc in the 4.11 kernel, and were finally fixed in the 5.4
> > kernel with commits be700103efd10 and f73f8a504e279.
> >
> > All that said, I'm not personally averse to trying again at assigning a
> > domain ID > 0xFFFF. I do see a commit [1] to fix libpciaccess that was
> > made 7 years ago in response to the issues we were seeing on Hyper-V.
> > Assuming those fixes have propagated into consuming packages like the X Server,
> > then we're good. But someone from Microsoft should probably sign off
> > on taking this risk. I retired from Microsoft nearly two years ago, and
> > meddle in things from time to time without the burden of dealing
> > with customer support issues. ;-)
> 
> Living the dream! Extra thanks for taking a look.
> 
> > [1] https://gitlab.freedesktop.org/xorg/lib/libpciaccess/-/commit/a167bd6474522a709ff3cbb00476c0e4309cb66f
> 
> Thanks for this.
> 
> I would rather do the equivalent conversion for now because 7 years old
> is right on the cusp of "someone might still be running that with new
> kernels".

Works for me, and is a bit less risky.
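
Just to write down my reading of the fixup below (a rough sketch of the
call behavior, not part of the patch): as I understand it, ida_alloc_range()
returns the lowest free id in the range, so with max(hint, min) as the lower
bound a free hint is honored and a collision falls forward to the next free
id, still capped at U16_MAX for the pci-hyperv case:

	/* hint honored when free; on collision, next free id in [hint, U16_MAX] */
	dom = pci_bus_find_emul_domain_nr(dom_req, 1, U16_MAX);

	/* a dom_req of 0 degenerates to "lowest free id in [1, U16_MAX]" */
	dom = pci_bus_find_emul_domain_nr(0, 1, U16_MAX);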

> 
> Here is the replacement fixup that I will fold in if it looks good to
> you:
> 
> -- 8< --
> diff --git a/drivers/pci/controller/pci-hyperv.c b/drivers/pci/controller/pci-hyperv.c
> index cfe9806bdbe4..f1079a438bff 100644
> --- a/drivers/pci/controller/pci-hyperv.c
> +++ b/drivers/pci/controller/pci-hyperv.c
> @@ -3642,9 +3642,9 @@ static int hv_pci_probe(struct hv_device *hdev,
>  {
>  	struct pci_host_bridge *bridge;
>  	struct hv_pcibus_device *hbus;
> -	u16 dom_req, dom;
> +	int ret, dom;
> +	u16 dom_req;
>  	char *name;
> -	int ret;
> 
>  	bridge = devm_pci_alloc_host_bridge(&hdev->device, 0);
>  	if (!bridge)
> @@ -3673,8 +3673,7 @@ static int hv_pci_probe(struct hv_device *hdev,
>  	 * collisions) in the same VM.
>  	 */
>  	dom_req = hdev->dev_instance.b[5] << 8 | hdev->dev_instance.b[4];
> -	dom = pci_bus_find_emul_domain_nr(dom_req);
> -

As an additional paragraph in the larger comment block above, let's include a
massaged version of the comment associated with HVPCI_DOM_INVALID.
Perhaps:

 *
 * Because Gen1 VMs use domain 0, don't allow picking domain 0 here, even
 * if bytes 4 and 5 of the instance GUID are both zero.
 */
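
For context, roughly how that would read in hv_pci_probe(), with the
existing comment text elided rather than quoted verbatim:

	/*
	 * ... existing text explaining how dom_req is derived from bytes 4
	 * and 5 of the instance GUID ...
	 *
	 * Because Gen1 VMs use domain 0, don't allow picking domain 0 here,
	 * even if bytes 4 and 5 of the instance GUID are both zero.
	 */
	dom_req = hdev->dev_instance.b[5] << 8 | hdev->dev_instance.b[4];
	dom = pci_bus_find_emul_domain_nr(dom_req, 1, U16_MAX);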

> +	dom = pci_bus_find_emul_domain_nr(dom_req, 1, U16_MAX);
>  	if (dom < 0) {
>  		dev_err(&hdev->device,
>  			"Unable to use dom# 0x%x or other numbers", dom_req);
> diff --git a/drivers/pci/controller/vmd.c b/drivers/pci/controller/vmd.c
> index f60244ff9ef8..30935fe85af9 100644
> --- a/drivers/pci/controller/vmd.c
> +++ b/drivers/pci/controller/vmd.c
> @@ -881,7 +881,14 @@ static int vmd_enable_domain(struct vmd_dev *vmd, unsigned long features)
>  	pci_add_resource_offset(&resources, &vmd->resources[2], offset[1]);
> 
>  	sd->vmd_dev = vmd->dev;
> -	sd->domain = pci_bus_find_emul_domain_nr(PCI_DOMAIN_NR_NOT_SET);
> +
> +	/*
> +	 * Emulated domains start at 0x10000 to not clash with ACPI _SEG
> +	 * domains.  Per ACPI r6.0, sec 6.5.6,  _SEG returns an integer, of
> +	 * which the lower 16 bits are the PCI Segment Group (domain) number.
> +	 * Other bits are currently reserved.
> +	 */
> +	sd->domain = pci_bus_find_emul_domain_nr(0, 0x10000, INT_MAX);
>  	if (sd->domain < 0)
>  		return sd->domain;
> 
> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> index 833ebf2d5213..de42e53f07d0 100644
> --- a/drivers/pci/pci.c
> +++ b/drivers/pci/pci.c
> @@ -6695,34 +6695,15 @@ static void pci_no_domains(void)
>  #ifdef CONFIG_PCI_DOMAINS
>  static DEFINE_IDA(pci_domain_nr_dynamic_ida);
> 
> -/*
> - * Find a free domain_nr either allocated by pci_domain_nr_dynamic_ida or
> - * fallback to the first free domain number above the last ACPI segment number.
> - * Caller may have a specific domain number in mind, in which case try to
> - * reserve it.
> - *
> - * Note that this allocation is freed by pci_release_host_bridge_dev().
> +/**
> + * pci_bus_find_emul_domain_nr() - allocate a PCI domain number per constraints
> + * @hint: desired domain, 0 if any id in the range of @min to @max is acceptable
> + * @min: minimum allowable domain
> + * @max: maximum allowable domain, no ids higher than INT_MAX will be returned
>   */
> -int pci_bus_find_emul_domain_nr(int hint)
> +u32 pci_bus_find_emul_domain_nr(u32 hint, u32 min, u32 max)

Shouldn't the return type here still be "int"?  ida_alloc_range() can return a negative
errno if it fails. And the call sites in hv_pci_probe() and vmd_enable_domain()
store the return value into an "int".
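
In other words, keep the u32 constraints but return int so the errno
propagates. A sketch of what I have in mind, assuming the body is otherwise
unchanged from your fixup:

	int pci_bus_find_emul_domain_nr(u32 hint, u32 min, u32 max)
	{
		/* ida_alloc_range() returns a negative errno on failure */
		return ida_alloc_range(&pci_domain_nr_dynamic_ida, max(hint, min), max,
				       GFP_KERNEL);
	}

The static inline stub in include/linux/pci.h would want the same return
type.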

Other than that, and my suggested added comment, this looks good.

Michael

>  {
> -	if (hint >= 0) {
> -		hint = ida_alloc_range(&pci_domain_nr_dynamic_ida, hint, hint,
> -				       GFP_KERNEL);
> -
> -		if (hint >= 0)
> -			return hint;
> -	}
> -
> -	if (acpi_disabled)
> -		return ida_alloc(&pci_domain_nr_dynamic_ida, GFP_KERNEL);
> -
> -	/*
> -	 * Emulated domains start at 0x10000 to not clash with ACPI _SEG
> -	 * domains.  Per ACPI r6.0, sec 6.5.6,  _SEG returns an integer, of
> -	 * which the lower 16 bits are the PCI Segment Group (domain) number.
> -	 * Other bits are currently reserved.
> -	 */
> -	return ida_alloc_range(&pci_domain_nr_dynamic_ida, 0x10000, INT_MAX,
> +	return ida_alloc_range(&pci_domain_nr_dynamic_ida, max(hint, min), max,
>  			       GFP_KERNEL);
>  }
>  EXPORT_SYMBOL_GPL(pci_bus_find_emul_domain_nr);
> diff --git a/include/linux/pci.h b/include/linux/pci.h
> index f6a713da5c49..4aeabe8e2f1f 100644
> --- a/include/linux/pci.h
> +++ b/include/linux/pci.h
> @@ -1934,13 +1934,16 @@ DEFINE_GUARD(pci_dev, struct pci_dev *,
> pci_dev_lock(_T), pci_dev_unlock(_T))
>   */
>  #ifdef CONFIG_PCI_DOMAINS
>  extern int pci_domains_supported;
> -int pci_bus_find_emul_domain_nr(int hint);
> +u32 pci_bus_find_emul_domain_nr(u32 hint, u32 min, u32 max);
>  void pci_bus_release_emul_domain_nr(int domain_nr);
>  #else
>  enum { pci_domains_supported = 0 };
>  static inline int pci_domain_nr(struct pci_bus *bus) { return 0; }
>  static inline int pci_proc_domain(struct pci_bus *bus) { return 0; }
> -static inline int pci_bus_find_emul_domain_nr(int hint) { return 0; }
> +static inline u32 pci_bus_find_emul_domain_nr(u32 hint, u32 min, u32 max)
> +{
> +	return 0;
> +}
>  static inline void pci_bus_release_emul_domain_nr(int domain_nr) { }
>  #endif /* CONFIG_PCI_DOMAINS */
> 
