Message-ID: <GV2PR10MB6672E198F632F5903E27AEF283D0A@GV2PR10MB6672.EURPRD10.PROD.OUTLOOK.COM>
Date: Mon, 24 Nov 2025 14:32:08 +0000
From: Hinko Kocevar <Hinko.Kocevar@....eu>
To: "linux-pci@...r.kernel.org" <linux-pci@...r.kernel.org>
CC: "bhelgaas@...gle.com" <bhelgaas@...gle.com>,
	"ilpo.jarvinen@...ux.intel.com" <ilpo.jarvinen@...ux.intel.com>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
Subject: [BUG] PCIe hotplug behind PEX8748: bridge window allocation failures
 when moving AMC between adjacent downstream ports

Hello,

I am observing reproducible PCIe hotplug resource allocation failures on Linux 6.18.0-rc7 in a MicroTCA system with an Intel Q170-based CPU board and a PLX PEX8725 / PEX8748 PCIe switch hierarchy. Earlier stock versions of the kernel (6.11, 6.8) fail with similar symptoms.

An AMC card with a small 256 KiB BAR works correctly at boot, and also works when hot-removed and reinserted into the *same* slot. However, when it is reinserted into an adjacent slot, the kernel fails to assign even this 256 KiB BAR and repeatedly logs:

bridge window [mem size X]: can't assign; no space
pci <endpoint>: BAR 0 [...] failed to assign


This occurs with vanilla Linux built from git, with no pci= cmdline options, and with Above-4G decoding enabled in CPU BIOS.

Hardware setup (see the attached file for details on PCI resources):

* CPU board: Intel Q170 chipset
* Root port → PEX8725 (01:00.0) → PEX8748 (03:00.0) → downstream ports 04:00.0 .. 04:12.0
* AMC card under test:
  * 10ee:7011, Xilinx 7-Series PCIe endpoint
  * Single BAR0 of 0x40000 bytes
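
To compare what each port in the hierarchy above currently holds before and after a hotplug event, the per-device `resource` file in sysfs can be dumped. The following is a minimal sketch, not part of the original report. It assumes the standard sysfs layout (`/sys/bus/pci/devices/<BDF>/resource`, one `start end flags` triple per line) and the `IORESOURCE_*` flag values from the kernel's `include/linux/ioport.h`. Which line indices correspond to bridge windows depends on the kernel configuration, so the sketch simply reports every non-empty entry. The helper names are hypothetical.

```python
from pathlib import Path
from typing import List, NamedTuple

# Flag bits as defined in the kernel's include/linux/ioport.h
IORESOURCE_IO = 0x100
IORESOURCE_MEM = 0x200
IORESOURCE_PREFETCH = 0x2000
IORESOURCE_MEM_64 = 0x100000


class Resource(NamedTuple):
    index: int   # line number within the sysfs "resource" file
    start: int
    end: int
    size: int
    kind: str    # e.g. "mem", "mem 64bit pref", "io"


def parse_resource_table(text: str) -> List[Resource]:
    """Parse the contents of a sysfs 'resource' file, skipping empty slots."""
    rows = []
    for index, line in enumerate(text.splitlines()):
        fields = line.split()
        if len(fields) != 3:
            continue
        start, end, flags = (int(f, 16) for f in fields)
        if start == 0 and end == 0 and flags == 0:
            continue  # unused resource slot
        parts = []
        if flags & IORESOURCE_IO:
            parts.append("io")
        if flags & IORESOURCE_MEM:
            parts.append("mem")
        if flags & IORESOURCE_MEM_64:
            parts.append("64bit")
        if flags & IORESOURCE_PREFETCH:
            parts.append("pref")
        rows.append(Resource(index, start, end, end - start + 1, " ".join(parts)))
    return rows


def dump_device(bdf: str) -> None:
    """Print the non-empty resources of one device, e.g. '0000:04:09.0'."""
    path = Path("/sys/bus/pci/devices") / bdf / "resource"
    for r in parse_resource_table(path.read_text()):
        print(f"{bdf} [{r.index}] {r.kind:<14} "
              f"{r.start:#012x}-{r.end:#012x} size {r.size:#x}")


if __name__ == "__main__":
    # The two downstream ports named in the logs of this report.
    for port in ("0000:04:0b.0", "0000:04:09.0"):
        try:
            dump_device(port)
        except OSError as exc:
            print(f"{port}: {exc}")
```

Running it once after a cold boot and again after the failed insertion should show whether the port at 04:09.0 lost its memory window when the slot was empty.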

This AMC works normally at boot and functions under the `mrf-pci` driver.

Steps to reproduce:

1. Remove AMC

[  840.371432] pcieport 0000:04:0b.0: pciehp: Slot(12): Button press: will power off in 5 sec
[  845.448242] mrf-pci 0000:0c:00.0: MRF Cleaned up

2. Reinsert into SAME slot (Slot 12) → SUCCESS

The kernel cannot allocate IO windows, but *BAR 0 is successfully assigned*:

[  865.689276] pcieport 0000:04:0b.0: pciehp: Slot(12): Link Up
[  866.687797] pci 0000:0c:00.0: [10ee:7011]
[  866.687952] pci 0000:0c:00.0: BAR 0 [mem 0x00000000-0x0003ffff]

[  866.688528] pci 0000:0c:00.0: BAR 0 [mem 0xdf000000-0xdf03ffff]: assigned

[  866.689539] mrf-pci 0000:0c:00.0: MRF Setup complete

The device is operational.

3. Remove AMC and insert into ADJACENT slot (Slot 11) → FAILURE

When moved to a neighboring downstream PEX8748 port, BAR assignment fails repeatedly:

[  952.268260] pcieport 0000:04:09.0: pciehp: Slot(11): Card present
[  953.367876] pci 0000:0a:00.0: [10ee:7011]
[  953.368008] pci 0000:0a:00.0: BAR 0 [mem 0x00000000-0x0003ffff]

[  953.368506] pcieport 0000:04:09.0: bridge window [mem size 0x00200000]: can't assign; no space
[  953.368515] pcieport 0000:04:09.0: bridge window [mem size 0x00200000 64bit pref]: can't assign; no space
[  953.368544] pci 0000:0a:00.0: BAR 0 [mem size 0x00040000]: can't assign; no space
[  953.368553] pci 0000:0a:00.0: BAR 0 [mem size 0x00040000]: failed to assign

[  953.369048] mrf-pci 0000:0a:00.0: can't ioremap BAR 0: [??? 0x00000000 flags 0x0]
[  953.369054] mrf-pci 0000:0a:00.0: Failed to map BARS!
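
The failing requests in a long dmesg can be pulled out mechanically. The following is a minimal sketch, not part of the original report, that matches only the two message shapes quoted in this mail (`can't assign; no space` and `failed to assign`). The regular expression and the helper names are illustrative assumptions and may need adjusting for other kernel versions.

```python
import re
from typing import List, NamedTuple


class AssignFailure(NamedTuple):
    timestamp: float
    device: str     # PCI address, e.g. "0000:04:09.0"
    resource: str   # "bridge window" or "BAR <n>"
    size: int       # requested size in bytes
    reason: str


# Matches lines such as:
# [  953.368506] pcieport 0000:04:09.0: bridge window [mem size 0x00200000]: can't assign; no space
FAIL_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+\S+\s+"
    r"(?P<bdf>[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]):\s+"
    r"(?P<what>bridge window|BAR \d+)\s+"
    r"\[[^\]]*?size (?P<size>0x[0-9a-f]+)[^\]]*\]:\s+"
    r"(?P<why>can't assign; no space|failed to assign)"
)


def find_assign_failures(dmesg_text: str) -> List[AssignFailure]:
    """Return every resource assignment failure found in the given log text."""
    failures = []
    for line in dmesg_text.splitlines():
        m = FAIL_RE.search(line)
        if m:
            failures.append(AssignFailure(
                float(m["ts"]), m["bdf"], m["what"], int(m["size"], 16), m["why"]))
    return failures


# Lines copied from the failure excerpt in this report.
SAMPLE = """\
[  953.368506] pcieport 0000:04:09.0: bridge window [mem size 0x00200000]: can't assign; no space
[  953.368515] pcieport 0000:04:09.0: bridge window [mem size 0x00200000 64bit pref]: can't assign; no space
[  953.368544] pci 0000:0a:00.0: BAR 0 [mem size 0x00040000]: can't assign; no space
[  953.368553] pci 0000:0a:00.0: BAR 0 [mem size 0x00040000]: failed to assign
[  953.369048] mrf-pci 0000:0a:00.0: can't ioremap BAR 0: [??? 0x00000000 flags 0x0]
"""

if __name__ == "__main__":
    for f in find_assign_failures(SAMPLE):
        print(f"{f.timestamp:10.6f} {f.device} {f.resource:<13} "
              f"{f.size // 1024:>5} KiB  {f.reason}")
```

Feeding it the full `dmesg` output instead of `SAMPLE` gives a compact list of which ports and endpoints were refused, and for what size.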


The kernel repeatedly tries to reserve 2 MiB bridge windows for this port (size 0x00200000), even though the only required resource is a 256 KiB EP BAR.
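
The size mismatch can be made concrete with a little arithmetic. The following is a minimal sketch, not part of the original report. It relies on the fact that PCI-to-PCI bridge memory windows are aligned and sized in 1 MiB units, so the smallest window that can hold the 256 KiB BAR is 1 MiB. The 2 MiB figure happens to equal the default documented for the `pci=hpmmiosize=` and `pci=hpmmioprefsize=` kernel parameters; whether that hotplug reservation is what produces the request seen here is an assumption, not something the logs confirm. The helper names are hypothetical.

```python
from typing import Iterable

BRIDGE_MEM_GRANULE = 1 << 20  # bridge memory windows are 1 MiB aligned and sized


def align_up(value: int, alignment: int) -> int:
    """Round value up to the next multiple of a power-of-two alignment."""
    return (value + alignment - 1) & ~(alignment - 1)


def min_bridge_window(bar_sizes: Iterable[int]) -> int:
    """Smallest memory window that holds the given power-of-two BARs.

    BARs are placed largest first, each naturally aligned to its own size,
    and the total is rounded up to the 1 MiB bridge granule.
    """
    total = 0
    for size in sorted(bar_sizes, reverse=True):
        total = align_up(total, size) + size
    return align_up(total, BRIDGE_MEM_GRANULE)


BAR0_SIZE = 0x40000         # from the log: BAR 0 [mem size 0x00040000]
REQUESTED_WINDOW = 0x200000  # from the log: bridge window [mem size 0x00200000]

if __name__ == "__main__":
    minimum = min_bridge_window([BAR0_SIZE])
    print(f"BAR 0 size:               {BAR0_SIZE // 1024} KiB")
    print(f"minimum bridge window:    {minimum // (1 << 20)} MiB")
    print(f"requested bridge window:  {REQUESTED_WINDOW // (1 << 20)} MiB")
```

Under these assumptions the port needs 1 MiB of non-prefetchable space, while the log shows a 2 MiB non-prefetchable request plus a second 2 MiB 64-bit prefetchable request that the endpoint does not use at all.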

Why this appears to be a kernel bug:

* The endpoint BAR is small (256 KiB).
* Hotplug into the same slot succeeds.
* Hotplug into an adjacent slot fails, with oversized bridge windows requested.
* Cold boot always succeeds.
* The hotplug sizing logic seems to request windows much larger than necessary.
* The switch hierarchy is complex but static and stable; only the endpoint moves.

Given this pattern, it appears that the bridge-window sizing policy during hotplug reserves more space than switch-dense topologies like the PEX8748 can spare, and the result is an inability to allocate resources for perfectly normal devices.

I am happy to run further tests, enable kernel debug options, or try patches.

I'm also attaching the full dmesg and lspci output.

Thanks for any guidance or suggestions.

Best regards,
Hinko

View attachment "report.txt" of type "text/plain" (296640 bytes)
