Message-ID: <7468e936-1a16-3971-283a-9d4d77fe3b35@linux.intel.com>
Date: Mon, 24 Nov 2025 21:18:57 +0200 (EET)
From: Ilpo Järvinen <ilpo.jarvinen@...ux.intel.com>
To: Hinko Kocevar <Hinko.Kocevar@....eu>
cc: "linux-pci@...r.kernel.org" <linux-pci@...r.kernel.org>,
"bhelgaas@...gle.com" <bhelgaas@...gle.com>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
Subject: Re: [BUG] PCIe hotplug behind PEX8748: bridge window allocation
failures when moving AMC between adjacent downstream ports
On Mon, 24 Nov 2025, Hinko Kocevar wrote:
> Hello,
>
> I am observing reproducible PCIe hotplug resource allocation failures on
> Linux 6.18.0-rc7 in a MicroTCA system with an Intel Q170-based CPU board
> and a PLX PEX8725 / PEX8748 PCIe switch hierarchy. Earlier stock
> versions of the kernel (6.11, 6.8) fail with similar symptoms.
>
> An AMC card with a small 256 KiB BAR works correctly at boot, and also
> works when hot-removed and reinserted into the *same* slot. However,
> when reinserted into an adjacent slot, the kernel fails to assign even a
> 256 KiB BAR, with repeated messages of:
>
> bridge window [mem size X]: can't assign; no space
> pci <endpoint>: BAR 0 [...] failed to assign
>
>
> This occurs with vanilla Linux built from git, with no pci= cmdline
> options, and with Above-4G decoding enabled in CPU BIOS.
>
> I have (see attached file for details on PCI resources):
>
> * CPU board: Intel Q170 chipset
> * Root port → PEX8725 (01:00.0) → PEX8748 (03:00.0) → downstream ports 04:00.0 .. 04:12.0
> * AMC card under test:
> * 10ee:7011, Xilinx 7-Series PCIe endpoint
> * Single BAR0 of 0x40000 bytes
>
> This AMC works normally at boot and functions under the `mrf-pci` driver.
>
> Reproduce error sequence:
>
> 1. Remove AMC
>
> [ 840.371432] pcieport 0000:04:0b.0: pciehp: Slot(12): Button press: will power off in 5 sec
> [ 845.448242] mrf-pci 0000:0c:00.0: MRF Cleaned up
>
> 2. Reinsert into SAME slot (Slot 12) → SUCCESS
>
> The kernel cannot allocate IO windows, but *BAR 0 is successfully assigned*:
>
> [ 865.689276] pcieport 0000:04:0b.0: pciehp: Slot(12): Link Up
> [ 866.687797] pci 0000:0c:00.0: [10ee:7011]
> [ 866.687952] pci 0000:0c:00.0: BAR 0 [mem 0x00000000-0x0003ffff]
>
> [ 866.688528] pci 0000:0c:00.0: BAR 0 [mem 0xdf000000-0xdf03ffff]: assigned
>
> [ 866.689539] mrf-pci 0000:0c:00.0: MRF Setup complete
>
> The device is operational.
>
> 3. Remove AMC and insert into ADJACENT slot (Slot 11) → FAILURE
>
> When moved to a neighboring downstream PEX8748 port, BAR assignment fails repeatedly:
>
> [ 952.268260] pcieport 0000:04:09.0: pciehp: Slot(11): Card present
> [ 953.367876] pci 0000:0a:00.0: [10ee:7011]
> [ 953.368008] pci 0000:0a:00.0: BAR 0 [mem 0x00000000-0x0003ffff]
>
> [ 953.368506] pcieport 0000:04:09.0: bridge window [mem size 0x00200000]: can't assign; no space
> [ 953.368515] pcieport 0000:04:09.0: bridge window [mem size 0x00200000 64bit pref]: can't assign; no space
> [ 953.368544] pci 0000:0a:00.0: BAR 0 [mem size 0x00040000]: can't assign; no space
> [ 953.368553] pci 0000:0a:00.0: BAR 0 [mem size 0x00040000]: failed to assign
>
> [ 953.369048] mrf-pci 0000:0a:00.0: can't ioremap BAR 0: [??? 0x00000000 flags 0x0]
> [ 953.369054] mrf-pci 0000:0a:00.0: Failed to map BARS!
>
>
> The kernel repeatedly tries to reserve 2 MiB bridge windows for this
> port (size 0x00200000), even though the only required resource is a 256
> KiB EP BAR.
>
> Why this appears to be a kernel bug:
>
> * The endpoint BAR is small (256 KiB).
> * Hotplug into the same slot succeeds.
> * Hotplug into an adjacent slot fails, with oversized bridge windows requested.
> * Cold boot always succeeds.
> * The hotplug sizing logic seems to request windows much larger than necessary.
Hi,

Two things can make the kernel request more memory than needed:
- the window reserved for hotplug, which can be controlled with
  pci=hpmmiosize= on the kernel command line (it defaults to
  DEFAULT_HOTPLUG_MMIO_SIZE, which is 2M)
- old_size in calculate_memsize().

I wrote a patch to remove old_size; it is here (I'm not sure yet whether
it will go to mainline in this form, as there is some regression potential):
https://lore.kernel.org/linux-pci/922b1f68-a6a2-269b-880c-d594f9ca6bde@linux.intel.com/

pci=realloc might help, though it's also possible that it breaks
things, because its rollback isn't as robust as I'd like.
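
To illustrate how the two sources of extra size listed above can turn a 256 KiB BAR into a 2 MiB window request, here is a rough sketch. It is a simplified model and not the kernel's actual calculate_memsize(); the function name requested_window(), its parameters, and the assumption that the result is simply the largest of the three inputs rounded up to the 1 MiB bridge window granularity are all illustrative.

```python
MIB = 1 << 20


def align_up(value, alignment):
    """Round value up to the next multiple of a power-of-two alignment."""
    return (value + alignment - 1) & ~(alignment - 1)


def requested_window(children_size, hotplug_reserve=2 * MIB, old_size=0,
                     align=MIB):
    """Hypothetical, simplified model of a bridge window size request.

    children_size   -- bytes actually needed by devices below the bridge
    hotplug_reserve -- extra room kept for hotplug (pci=hpmmiosize=, 2M default)
    old_size        -- size the window had before, if it is being kept
    """
    size = max(children_size, hotplug_reserve)
    return align_up(max(size, old_size), align)


bar0 = 0x40000  # the 256 KiB endpoint BAR from the report

# With the default 2M hotplug reserve, the request matches the log (0x00200000).
print(hex(requested_window(bar0)))

# Without the reserve, only the 1 MiB window granularity remains.
print(hex(requested_window(bar0, hotplug_reserve=0)))

# A previous 2 MiB window (old_size) keeps the request large on its own.
print(hex(requested_window(bar0, hotplug_reserve=0, old_size=2 * MIB)))
```

In this model, either the hotplug reserve or old_size alone is enough to produce the 0x00200000 request seen in the failing case.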

Looking at your log, it's unclear why this allocation is so small:
[ 0.424748] pci 0000:01:00.0: bridge window [mem 0x00100000-0x00cfffff 64bit pref] to [bus 02-13] add_size 1800000 add_align 100000
...
[ 0.424811] pci 0000:01:00.0: bridge window [mem 0x90000000-0x90bfffff 64bit pref]: assigned

The allocation (assignment) does not seem to include that add_size for
some reason. __assign_resources_sorted() should try to apply the
add_sizes to the resources (in its first loop) before assigning them.

It seems to work for this:
[ 0.424780] pci 0000:00:01.0: bridge window [mem 0x90000000-0x923fffff 64bit pref]: assigned

But not for 0000:01:00.0, for some reason. You might want to figure out
why, e.g., by adding some pci_*() prints here and there.
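
The sizes behind that observation can be checked with a little arithmetic on the ranges quoted from the log. This is only a sketch: the helper name window_size() is made up for illustration, and the reading that the 0000:00:01.0 window equals the 0000:01:00.0 base size plus add_size is an inference from the numbers, not something the log states.

```python
def window_size(start, end):
    """Size in bytes of an inclusive [start, end] address range."""
    return end - start + 1


# 0000:01:00.0: sized window and the optional extra (add_size) from the log.
base_01 = window_size(0x00100000, 0x00cfffff)
add_size_01 = 0x1800000

# What was actually assigned to 0000:01:00.0 and to its parent 0000:00:01.0.
assigned_01 = window_size(0x90000000, 0x90bfffff)
assigned_00 = window_size(0x90000000, 0x923fffff)

print(f"01:00.0 base     {base_01:#x}")
print(f"01:00.0 assigned {assigned_01:#x}")
print(f"00:01.0 assigned {assigned_00:#x}")

# 01:00.0 received only the base size, so add_size was not applied there.
print("add_size applied to 01:00.0:", assigned_01 == base_01 + add_size_01)

# The parent's window is exactly base + add_size, so it was applied there.
print("add_size applied to 00:01.0:", assigned_00 == base_01 + add_size_01)
```

So 0000:01:00.0 ends up with 12 MiB while its parent has room for 36 MiB, which is consistent with add_size being dropped only at the 0000:01:00.0 level.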
> * The switch hierarchy is complex but static and stable; only the endpoint moves.
>
> Given this pattern, it appears that the bridge-window sizing policy
> during hotplug is too conservative for switch-dense topologies like
> PEX8748, and the result is an inability to allocate resources for
> perfectly normal devices.
>
> I am happy to run further tests, enable kernel debug options, or try patches.
>
> I'm also attaching the full dmesg and lspci output.
>
> Thanks for any guidance or suggestions.
--
i.