Message-ID:
 <GV2PR10MB66728BB22371B73CEECD4DB183DCA@GV2PR10MB6672.EURPRD10.PROD.OUTLOOK.COM>
Date: Fri, 28 Nov 2025 13:47:50 +0000
From: Hinko Kocevar <Hinko.Kocevar@....eu>
To: Ilpo Järvinen <ilpo.jarvinen@...ux.intel.com>
CC: "linux-pci@...r.kernel.org" <linux-pci@...r.kernel.org>,
	"bhelgaas@...gle.com" <bhelgaas@...gle.com>, "linux-kernel@...r.kernel.org"
	<linux-kernel@...r.kernel.org>
Subject: Re: [BUG] PCIe hotplug behind PEX8748: bridge window allocation
 failures when moving AMC between adjacent downstream ports

________________________________________
From: Ilpo Järvinen <ilpo.jarvinen@...ux.intel.com>
Sent: Monday, November 24, 2025 8:18 PM
To: Hinko Kocevar
Cc: linux-pci@...r.kernel.org; bhelgaas@...gle.com; linux-kernel@...r.kernel.org
Subject: Re: [BUG] PCIe hotplug behind PEX8748: bridge window allocation failures when moving AMC between adjacent downstream ports

On Mon, 24 Nov 2025, Hinko Kocevar wrote:

> Hello,
>
> I am observing reproducible PCIe hotplug resource allocation failures on
> Linux 6.18.0-rc7 in a MicroTCA system with an Intel Q170-based CPU board
> and a PLX PEX8725 / PEX8748 PCIe switch hierarchy. Earlier stock
> versions of the kernel (6.11, 6.8) fail with similar symptoms.
>
> An AMC card with a small 256 KiB BAR works correctly at boot, and also
> works when hot-removed and reinserted into the *same* slot. However,
> when reinserted into an adjacent slot, the kernel fails to assign even a
> 256 KiB BAR, with repeated messages of:
>
> bridge window [mem size X]: can't assign; no space
> pci <endpoint>: BAR 0 [...] failed to assign
>
>
> This occurs with vanilla Linux built from git, with no pci= cmdline
> options, and with Above-4G decoding enabled in CPU BIOS.
>
> I have (see attached file for details on PCI resources):
>
> * CPU board: Intel Q170 chipset
> * Root port → PEX8725 (01:00.0) → PEX8748 (03:00.0) → downstream ports 04:00.0 .. 04:12.0
> * AMC card under test:
>   * 10ee:7011, Xilinx 7-Series PCIe endpoint
>   * Single BAR0 of 0x40000 bytes
>
> This AMC works normally at boot and functions under the `mrf-pci` driver.
>
> Reproduce error sequence:
>
> 1. Remove AMC
>
> [  840.371432] pcieport 0000:04:0b.0: pciehp: Slot(12): Button press: will power off in 5 sec
> [  845.448242] mrf-pci 0000:0c:00.0: MRF Cleaned up
>
> 2. Reinsert into SAME slot (Slot 12) → SUCCESS
>
> The kernel cannot allocate IO windows, but *BAR 0 is successfully assigned*:
>
> [  865.689276] pcieport 0000:04:0b.0: pciehp: Slot(12): Link Up
> [  866.687797] pci 0000:0c:00.0: [10ee:7011]
> [  866.687952] pci 0000:0c:00.0: BAR 0 [mem 0x00000000-0x0003ffff]
>
> [  866.688528] pci 0000:0c:00.0: BAR 0 [mem 0xdf000000-0xdf03ffff]: assigned
>
> [  866.689539] mrf-pci 0000:0c:00.0: MRF Setup complete
>
> The device is operational.
>
> 3. Remove AMC and insert into ADJACENT slot (Slot 11) → FAILURE
>
> When moved to a neighboring downstream PEX8748 port, BAR assignment fails repeatedly:
>
> [  952.268260] pcieport 0000:04:09.0: pciehp: Slot(11): Card present
> [  953.367876] pci 0000:0a:00.0: [10ee:7011]
> [  953.368008] pci 0000:0a:00.0: BAR 0 [mem 0x00000000-0x0003ffff]
>
> [  953.368506] pcieport 0000:04:09.0: bridge window [mem size 0x00200000]: can't assign; no space
> [  953.368515] pcieport 0000:04:09.0: bridge window [mem size 0x00200000 64bit pref]: can't assign; no space
> [  953.368544] pci 0000:0a:00.0: BAR 0 [mem size 0x00040000]: can't assign; no space
> [  953.368553] pci 0000:0a:00.0: BAR 0 [mem size 0x00040000]: failed to assign
>
> [  953.369048] mrf-pci 0000:0a:00.0: can't ioremap BAR 0: [??? 0x00000000 flags 0x0]
> [  953.369054] mrf-pci 0000:0a:00.0: Failed to map BARS!
>
>
> The kernel repeatedly tries to reserve 2 MiB bridge windows for this
> port (size 0x00200000), even though the only required resource is a 256
> KiB EP BAR.
>
> Why this appears to be a kernel bug?
>
> * The endpoint BAR is small (256 KiB).
> * Hotplug into the same slot succeeds.
> * Hotplug into an adjacent slot fails, with oversized bridge windows requested.
> * Cold boot always succeeds.
> * The hotplug sizing logic seems to request windows much larger than necessary.

Hi,

There are two things that can make the kernel request more memory than
needed:

- window reserved for hotplug that can be controlled with pci=hpmmiosize=
on the kernel's command line (defaults to DEFAULT_HOTPLUG_MMIO_SIZE which is 2M)

- old_size in calculate_memsize().

I did a patch to remove old_size, it is here (not sure yet if it will go
mainline in this form as there's some regression potential):

https://lore.kernel.org/linux-pci/922b1f68-a6a2-269b-880c-d594f9ca6bde@linux.intel.com/

pci=realloc might help though (but it's also possible it breaks
things because its rollback isn't as robust as I'd like).
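
As a rough illustration of the first item, here is a toy model (not the
kernel's calculate_memsize(); the function names are made up for this
sketch) of why a 256 KiB BAR can turn into a 2 MiB window request. It
assumes only that bridge memory windows have 1 MiB granularity and that a
hotplug bridge asks for at least the hotplug reserve:

```c
#include <assert.h>
#include <stdint.h>

#define SZ_1M (1ULL << 20)

/* Round x up to a multiple of align; align must be a power of two. */
static uint64_t align_up(uint64_t x, uint64_t align)
{
	assert(align != 0 && (align & (align - 1)) == 0);
	return (x + align - 1) & ~(align - 1);
}

/*
 * Toy model (hypothetical helper): the window must cover the children,
 * rounded up to 1 MiB, and a hotplug bridge requests at least
 * hp_reserve bytes on top of that minimum.
 */
static uint64_t model_window_size(uint64_t children, uint64_t hp_reserve)
{
	uint64_t need = align_up(children, SZ_1M);

	return need > hp_reserve ? need : hp_reserve;
}
```

With children = 0x40000 and hp_reserve = 2 MiB this model yields 0x200000,
which matches the "bridge window [mem size 0x00200000]" requests in the
log; with hp_reserve = 0 it would yield 0x100000.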


Looking at your log, it's unclear why this allocation is so small:

[    0.424748] pci 0000:01:00.0: bridge window [mem 0x00100000-0x00cfffff 64bit pref] to [bus 02-13] add_size 1800000 add_align 100000
...
[    0.424811] pci 0000:01:00.0: bridge window [mem 0x90000000-0x90bfffff 64bit pref]: assigned

It seems not to include that add_size for some reason while making the
allocation (assignment). __assign_resources_sorted() should try to apply
the add_sizes to the resources (in its first loop) before assigning them.
It seems to work for this:

[    0.424780] pci 0000:00:01.0: bridge window [mem 0x90000000-0x923fffff 64bit pref]: assigned

But not for the 0000:01:00.0 for some reason. You might want to figure
that out somehow, e.g., by adding some pci_*() prints here and there.
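
The arithmetic behind that observation can be checked with a small sketch.
It assumes the add_size value in the log line is printed in hex, and the
helper names are hypothetical:

```c
#include <assert.h>
#include <stdint.h>

/* Size of an inclusive [start, end] range, as printed in the kernel log. */
static uint64_t range_size(uint64_t start, uint64_t end)
{
	assert(end >= start);
	return end - start + 1;
}

/* Size a window would have if add_size were applied on top of its base size. */
static uint64_t size_with_add(uint64_t base_size, uint64_t add_size)
{
	return base_size + add_size;
}
```

For 0000:01:00.0, range_size(0x00100000, 0x00cfffff) is 0xc00000 (12 MiB)
and the assigned window 0x90000000-0x90bfffff is also 0xc00000, so the
0x1800000 (24 MiB) add_size is missing. size_with_add(0xc00000, 0x1800000)
is 0x2400000 (36 MiB), which is the size of the window assigned to
0000:00:01.0 (0x90000000-0x923fffff).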

Hi,

Thanks for the pointers. I dug into this on the same PEX8725/8748 stack and found the hotplug sizing was asking for far more space than the endpoints needed. Even though the defaults in pci.c are 2 MiB, by the time pciehp rescanned a downstream port the globals used by pci_bus_size_bridges() had grown (pref headroom was 128 MiB, non-pref ~64 MiB), so hotplug tried to inflate the 04:xx bridge windows and failed.

What worked for me:

* Keep the downstream PEX8748 ports as hotplug bridges, but pre-size only the non-pref window to 32 MiB for the AMC slots (the number is this large because some AMCs in use here have a 16 MiB BAR).
* Zero the pref hotplug budget so pciehp doesn’t chase a larger pref window.
* Ensure the non-pref hotplug budget is at least 32 MiB so windows don’t collapse to 2 MiB.

With this, cold boot + hotplug across any downstream port succeeds for a 16 MiB non-pref BAR AMC (and smaller BARs). Upstream windows size correctly; pref windows stay disabled.

I tried using https://lore.kernel.org/linux-pci/922b1f68-a6a2-269b-880c-d594f9ca6bde@linux.intel.com/ but it did not help much with large BAR AMC; I was constantly seeing small windows.

If there’s a better upstreamable approach (e.g., per-device hotplug budgets instead of touching the globals), I’d appreciate feedback.

Testing so far has shown that I can boot without any AMCs inserted and then successfully hotplug an AMC with a large BAR. But I need more time, and cards that I do not have at the moment, to be more confident that this is solved for me.

The only kernel command-line option was: pci=realloc=on

This is what the memory windows look like with the quirks below:

Device: 00:01.0
	Memory behind bridge: 90000000-a81fffff [size=386M] [32-bit]
	Prefetchable memory behind bridge: [disabled] [64-bit]
Device: 01:00.0
	Memory behind bridge: 90000000-a80fffff [size=385M] [32-bit]
	Prefetchable memory behind bridge: [disabled] [64-bit]
Device: 03:00.0
	Memory behind bridge: 90000000-a5ffffff [size=352M] [32-bit]
	Prefetchable memory behind bridge: [disabled] [64-bit]
Device: 04:00.0
	Memory behind bridge: 90000000-91ffffff [size=32M] [32-bit]
	Prefetchable memory behind bridge: [disabled] [64-bit]
Device: 04:01.0
	Memory behind bridge: 92000000-93ffffff [size=32M] [32-bit]
	Prefetchable memory behind bridge: [disabled] [64-bit]
Device: 04:02.0
	Memory behind bridge: 94000000-95ffffff [size=32M] [32-bit]
	Prefetchable memory behind bridge: [disabled] [64-bit]
Device: 04:03.0
	Memory behind bridge: 96000000-97ffffff [size=32M] [32-bit]
	Prefetchable memory behind bridge: [disabled] [64-bit]
Device: 04:08.0
	Memory behind bridge: 98000000-99ffffff [size=32M] [32-bit]
	Prefetchable memory behind bridge: [disabled] [64-bit]
Device: 04:09.0
	Memory behind bridge: 9a000000-9bffffff [size=32M] [32-bit]
	Prefetchable memory behind bridge: [disabled] [64-bit]
Device: 04:0a.0
	Memory behind bridge: 9c000000-9dffffff [size=32M] [32-bit]
	Prefetchable memory behind bridge: [disabled] [64-bit]
Device: 04:0b.0
	Memory behind bridge: 9e000000-9fffffff [size=32M] [32-bit]
	Prefetchable memory behind bridge: [disabled] [64-bit]
Device: 04:10.0
	Memory behind bridge: a0000000-a1ffffff [size=32M] [32-bit]
	Prefetchable memory behind bridge: [disabled] [64-bit]
Device: 04:11.0
	Memory behind bridge: a2000000-a3ffffff [size=32M] [32-bit]
	Prefetchable memory behind bridge: [disabled] [64-bit]
Device: 04:12.0
	Memory behind bridge: a4000000-a5ffffff [size=32M] [32-bit]
	Prefetchable memory behind bridge: [disabled] [64-bit]
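
As a quick sanity check of the listing above, the sketch below (struct and
function names are hypothetical, values copied from the lspci output) sums
the eleven 04:xx windows and checks that they are packed back to back:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Inclusive address range, as lspci prints it. */
struct window {
	uint64_t start, end;
};

/* Non-pref windows of the PEX8748 downstream ports 04:00.0 .. 04:12.0. */
static const struct window amc_ports[] = {
	{ 0x90000000, 0x91ffffff }, { 0x92000000, 0x93ffffff },
	{ 0x94000000, 0x95ffffff }, { 0x96000000, 0x97ffffff },
	{ 0x98000000, 0x99ffffff }, { 0x9a000000, 0x9bffffff },
	{ 0x9c000000, 0x9dffffff }, { 0x9e000000, 0x9fffffff },
	{ 0xa0000000, 0xa1ffffff }, { 0xa2000000, 0xa3ffffff },
	{ 0xa4000000, 0xa5ffffff },
};

static uint64_t window_size(struct window w)
{
	assert(w.end >= w.start);
	return w.end - w.start + 1;
}

/* Return 1 and store the summed size if the windows are contiguous, else 0. */
static int windows_contiguous(const struct window *w, size_t n, uint64_t *total)
{
	uint64_t sum = 0;

	for (size_t i = 0; i < n; i++) {
		if (i > 0 && w[i].start != w[i - 1].end + 1)
			return 0;
		sum += window_size(w[i]);
	}
	*total = sum;
	return 1;
}
```

The eleven 32 MiB windows add up to 0x16000000 (352 MiB), which is exactly
the window on 03:00.0 (90000000-a5ffffff), so the downstream ports fill
the switch's upstream window with no gaps.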


Thanks for your help,
Hinko


Quirk diff (board-specific: PEX8748 downstream on bus 04):

diff --git a/arch/x86/kernel/quirks.c b/arch/x86/kernel/quirks.c
index a92f18db9..32801405b 100644
--- a/arch/x86/kernel/quirks.c
+++ b/arch/x86/kernel/quirks.c
@@ -664,6 +664,25 @@ DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_INTEL, 0x2083, quirk_intel_purley_xeon_ras
 bool x86_apple_machine;
 EXPORT_SYMBOL(x86_apple_machine);
 
+/*
+ * Force the 00:01.0 root port to be treated as hotplug-capable so the PCI
+ * sizing code allocates additional window space for the large downstream
+ * PLX fanout. Use the subsystem ID to avoid touching other root ports.
+ */
+static void quirk_intel_root_port_hotplug(struct pci_dev *dev)
+{
+	if (dev->subsystem_vendor != PCI_VENDOR_ID_INTEL ||
+	    dev->subsystem_device != 0x2015)
+		return;
+
+	dev_info(&dev->dev,
+	    "PCI quirk: root port is hotpluggable\n");
+
+	dev->is_hotplug_bridge = 1;
+}
+DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_INTEL, 0x1901,
+			 quirk_intel_root_port_hotplug);
+
 void __init early_platform_quirks(void)
 {
 	x86_apple_machine = dmi_match(DMI_SYS_VENDOR, "Apple Inc.") ||
diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
index b9c252aa6..c8627eab8 100644
--- a/drivers/pci/quirks.c
+++ b/drivers/pci/quirks.c
@@ -6341,3 +6341,59 @@ static void pci_mask_replay_timer_timeout(struct pci_dev *pdev)
 DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_GLI, 0x9750, pci_mask_replay_timer_timeout);
 DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_GLI, 0x9755, pci_mask_replay_timer_timeout);
 #endif
+
+/*
+ * AMC cards behind PLX PEX8748 have 16 MB non-prefetchable BAR.
+ * Size the memory window on PEX8748 downstream ports for proper operation.
+ */
+#define PLX_PEX8748        0x8748
+#define PLX_AMC_MIN_MMIO   (32UL * 1024 * 1024) /* 32 MiB non-pref */
+
+static void quirk_plx_amc_pref_32m(struct pci_dev *dev)
+{
+	struct resource *mmio;
+	resource_size_t mmio_have, mmio_want = PLX_AMC_MIN_MMIO;
+
+	/* Only bridges, and only PEX8748 downstream ports on bus 04:xx. */
+	if ((dev->class >> 16) != PCI_BASE_CLASS_BRIDGE ||
+	    dev->device != PLX_PEX8748 || dev->bus->number != 4)
+		return;
+
+	/* Treat these as hotplug bridges so sizing code keeps them large. */
+	dev->is_hotplug_bridge = 1;
+
+	/* We only want non-pref sizing; drop any pref hotplug ask. */
+	pci_hotplug_mmio_pref_size = 0;
+
+	/* Ensure the hotplug non-pref budget is at least our 32 MiB window. */
+	if (pci_hotplug_mmio_size < mmio_want)
+		pci_hotplug_mmio_size = mmio_want;
+
+	/* Non-prefetchable bridge window for 32-bit BARs. */
+	mmio = &dev->resource[PCI_BRIDGE_RESOURCES];
+
+	if (!(mmio->flags & IORESOURCE_MEM)) {
+		mmio->start = 0;
+		mmio->end   = mmio_want - 1;
+		mmio->flags = IORESOURCE_MEM | IORESOURCE_WINDOW | IORESOURCE_MEM_64;
+
+		dev_info(&dev->dev,
+			 "PLX AMC quirk: creating %llu MiB non-pref window template\n",
+			 (unsigned long long)(mmio_want >> 20));
+	} else {
+		mmio_have = resource_size(mmio);
+		if (mmio_have < mmio_want) {
+			mmio->end = mmio->start + mmio_want - 1;
+			dev_info(&dev->dev,
+				 "PLX AMC quirk: enlarging non-pref window from %llu to %llu MiB\n",
+				 (unsigned long long)(mmio_have >> 20),
+				 (unsigned long long)(mmio_want >> 20));
+		}
+	}
+}
+
+/* Run early enough that bridge sizing sees the 32 MiB windows. */
+DECLARE_PCI_FIXUP_HEADER(
+	PCI_VENDOR_ID_PLX,	/* 0x10b5 */
+	PLX_PEX8748,		/* 0x8748 */
+	quirk_plx_amc_pref_32m);


> * The switch hierarchy is complex but static and stable; only the endpoint moves.
>
> Given this pattern, it appears that the bridge-window sizing policy
> during hotplug is too conservative for switch-dense topologies like
> PEX8748, and the result is an inability to allocate resources for
> perfectly normal devices.
>
> I am happy to run further tests, enable kernel debug options, or try patches.
>
> I'm also attaching the full dmesg and lspci output.
>
> Thanks for any guidance or suggestions.


--
 i.
