Message-ID: <f1ba313f-74c2-ae63-ac80-4a35e53477b4@linux.intel.com>
Date: Wed, 17 Sep 2025 16:00:10 +0300 (EEST)
From: Ilpo Järvinen <ilpo.jarvinen@...ux.intel.com>
To: Lucas De Marchi <lucas.demarchi@...el.com>
cc: linux-pci@...r.kernel.org, Bjorn Helgaas <bhelgaas@...gle.com>,
Krzysztof Wilczyński <kw@...ux.com>,
Christian König <christian.koenig@....com>,
Michał Winiarski <michal.winiarski@...el.com>,
Alex Deucher <alexander.deucher@....com>, amd-gfx@...ts.freedesktop.org,
David Airlie <airlied@...il.com>, dri-devel@...ts.freedesktop.org,
intel-gfx@...ts.freedesktop.org, intel-xe@...ts.freedesktop.org,
Jani Nikula <jani.nikula@...ux.intel.com>,
Joonas Lahtinen <joonas.lahtinen@...ux.intel.com>,
Rodrigo Vivi <rodrigo.vivi@...el.com>, Simona Vetter <simona@...ll.ch>,
Tvrtko Ursulin <tursulin@...ulin.net>,
Thomas Hellström <thomas.hellstrom@...ux.intel.com>,
"Michael J . Ruhl" <mjruhl@...ana.ai>, LKML <linux-kernel@...r.kernel.org>,
linux-doc@...r.kernel.org
Subject: Re: [PATCH v2 00/11] PCI: Resizable BAR improvements
On Tue, 16 Sep 2025, Lucas De Marchi wrote:
> On Mon, Sep 15, 2025 at 08:24:06PM +0300, Ilpo Järvinen wrote:
> > On Mon, 15 Sep 2025, Lucas De Marchi wrote:
> >
> > > On Mon, Sep 15, 2025 at 12:13:47PM +0300, Ilpo Järvinen wrote:
> > > > pci.c has been used as a catch-all for what doesn't fit elsewhere
> > > > within PCI core and thus resizable BAR code has been placed there as
> > > > well. Move Resizable BAR related code to a newly introduced rebar.c to
> > > > reduce size of pci.c. After move, there are no pci_rebar_*() calls from
> > > > pci.c indicating this is indeed well-defined subset of PCI core.
> > > >
> > > > Endpoint drivers perform Resizable BAR related operations which could
> > > > well be performed by PCI core to simplify driver-side code. This
> > > > series adds a few new API functions to that effect and converts the
> > > > drivers to use the new APIs (in separate patches).
> > > >
> > > > While at it, also convert BAR sizes bitmask to u64 as PCIe spec already
> > > > specifies more sizes than what will fit u32 to make the API typing more
> > > > future-proof. The extra sizes beyond 128TB are not added at this point.
> > > >
> > > > These are based on pci/main plus a simple "adapter" patch to add the
> > > > include for xe_vram_types.h that was added by a commit in drm-tip.
> > > > Hopefully that is enough to avoid the within context conflict with
> > > > BAR_SIZE_SHIFT removal to let the xe CI tests to be run for this
> > > > series.
> > > >
> > > > There are two minor conflicts with the work in pci/resource but I'm
> > > > hesitant to base this on top of it as this is otherwise entirely
> > > > independent (and would likely prevent GPU CI tests as well). If we end
> > > > up having to pull the bridge window select changes, there should be no
> > > > reason why this has to become collateral damage (so my
> > > > suggestion, if this is good to go in this cycle, to take this into a
> > > > separate branch than pci/resource and deal with those small conflicts
> > > > while merging into pci/next).
> > > >
> > > > I've tested sysfs resize, i915, and xe BAR resizing functionality. In
> > > > the case of xe, I did small hack patch as its resize is anyway broken
> > > > as is because BAR0 pins the bridge window so resizing BAR2 fails. My
> > > > hack caused other problems further down the road (likely because BAR0
> > > > is in use by the driver so releasing it messed assumptions xe driver
> > > > has) but the BAR resize itself was working which was all I was
> > >
> > > is the hack you mention here to release all BARs before attempting the
> > > resize?
> >
> > Yes, the patch added release of BAR0 prior to resize. The existing xe code
> > in _resize_bar() already releases BAR2.
> >
> > During resize, if the first loop in pbus_reassign_bridge_resources()
> > (called from pci_resize_resource()) finds the bridge window closest to the
> > endpoint still has a child, it results in having empty saved list because
> > all upstream bridge windows will then have a child as well.
> >
> > Empty saved list is checked after the loop and
> > pbus_reassign_bridge_resources() returns -ENOENT without even trying to
> > assign the resources. The error is returned even if the actual bridge
> > window size is large enough to fit the resized resource.
> >
> > The logic in pci_resize_resource() and pbus_reassign_bridge_resources()
> > need some other improvements besides that problem, but I likely won't
> > have time to look at that until completing the fitting algorithm changes.
> > I'd actually want to add pci_release_and_resize_resource() which would
> > take care of releasing all the resources of the device (obviously driver
> > must have its hands off all those BARs when it calls that function). With
> > the current pci_resize_resource() API, handling the restore of BARs in
> > case of failure is not as robust as I'd like to make it.
> >
> > > > interested to know. I'm not planning to pursue fixing the pinning
> > > > problem within xe driver because the core changes to consider maximum
> > > > size of the resizable BARs should take care of the main problem by
> > > > different means.
> > >
> > > I'd actually like to pursue that myself as that could be propagated to
> > > stable since we do have some resize errors in xe with BMG that I wasn't
> > > understanding. It's likely due to xe_mmio_probe_early() taking a hold of
> > > BAR0 and not expecting it to be moved. We could either remap if we
> > > have to resize or just move the resize logic early on.
> >
> > Great. If you have any questions when it comes to the PCI core side code,
> > please let me know.
>
> I moved the resize to happen before anything else in xe. However when
> testing I noticed a scenario failing without involving the driver.
> With and without this series I still have the same pass/failure
> scenarios:
>
> Tests executed with a BMG. Just after boot, BAR2 is 16GB.
>
> 1) If I resize it via sysfs to 8GB and then load the driver, it resizes
> it back. Resize from sysfs works too. No change in behavior.
It's expected that resizing to a smaller size and then back to the
original works through sysfs because the upstream window pins won't
prevent reacquiring the same or less space.

But with the way resize is called from the current xe code, resizing even
to a smaller size fails because BAR0 pins the closest upstream window,
resulting in -ENOENT as explained above. I don't see fixing this on the
core side as a priority because I plan to rework the resizing code anyway
and resizing to a smaller size doesn't seem like an overly useful use case.
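
To illustrate the -ENOENT case, here is a toy Python model of the release
loop as I described it earlier in the thread. It is not the kernel code:
release_upstream_windows() and the dict layout are made up for this
sketch, and it only tracks whether each bridge window still has a child.

```python
import errno

def release_upstream_windows(windows):
    """Toy model of the first loop in pbus_reassign_bridge_resources().

    windows: list ordered from the bridge window closest to the endpoint
    up towards the root. Each entry's "children" lists resources still
    assigned inside that window, not counting the next-closer window.
    """
    saved = []
    downstream_released = True  # nothing sits below the closest window
    for w in windows:
        has_child = bool(w["children"]) or not downstream_released
        if has_child:
            # Window stays in place, so it remains a child of its parent.
            downstream_released = False
        else:
            saved.append(w["name"])
            downstream_released = True
    if not saved:
        return -errno.ENOENT, saved
    return 0, saved

# BAR2 released but BAR0 still assigned: nothing can be released.
pinned = [
    {"name": "bus 03 window", "children": ["BAR 0"]},
    {"name": "bus 02 window", "children": []},
    {"name": "bus 01 window", "children": []},
]
print(release_upstream_windows(pinned))

# With BAR0 released as well, at least the closest window is freed.
unpinned = [dict(w, children=[]) for w in pinned]
print(release_upstream_windows(unpinned))
```

In the first case the saved list ends up empty no matter how large the
windows are, which is why the size of the existing window never gets
considered.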
> 2) If I do "remove the bridge via sysfs and rescan the bus"[1], it fails to
> resize (either automatically, on rescan, via sysfs, or loading the xe
> driver). It just stays at 256M.
This is because the larger resource sizes are only calculated on the
actual resize call, which occurs after the bridge windows were already
sized on rescan to the smaller size. At that point, the critical bridge
windows are already pinned in place and thus cannot be relocated to the
free area that I assume exists somewhere within 4000000000-7fffffffff.
> The only thing that brings it back is a reboot. /proc/iomem shows this:
>
> 4000000000-7fffffffff : PCI Bus 0000:00
> 4000000000-44007fffff : PCI Bus 0000:01
> 4000000000-4017ffffff : PCI Bus 0000:02
> 4000000000-400fffffff : PCI Bus 0000:03 <<<< BMG
> 4000000000-400fffffff : 0000:03:00.0
> 4010000000-40100fffff : PCI Bus 0000:04
This pins 0000:01:00.0's window in place and also prevents enlarging the
siblings.

It would be possible, though, to release it and still use sysfs to perform
the resize on 0000:03:00.0 as removing 0000:04:00.0 doesn't require
removing 0000:03:00.0. But...
> 4018000000-40187fffff : 0000:01:00.0
...This resource pins 0000:00:01.0's window in place. AFAIK, it cannot be
released other than by removing 0000:01:00.0 which results in removing
0000:03:00.0 as well, thus making it impossible to perform the BAR resize
for 0000:03:00.0 through sysfs anymore. Catch-22.
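
The two pins can be checked with a bit of arithmetic on the /proc/iomem
ranges quoted above. A rough Python sketch (helper names are made up; it
assumes the listing is complete and that a BAR must be naturally aligned
to its size):

```python
# Ranges copied from the quoted /proc/iomem listing.
RANGES = {
    "root bus 00 window": "4000000000-7fffffffff",
    "bus 01 window": "4000000000-44007fffff",
    "bus 02 window": "4000000000-4017ffffff",
    "bus 03 window": "4000000000-400fffffff",
    "bus 04 window": "4010000000-40100fffff",
    "01:00.0 BAR 0": "4018000000-40187fffff",
}
MIB = 1 << 20
GIB = 1 << 30

def parse(rng):
    start, end = (int(x, 16) for x in rng.split("-"))
    return start, end

def size(rng):
    start, end = parse(rng)
    return end - start + 1

def overlaps(a, b):
    (a0, a1), (b0, b1) = parse(a), parse(b)
    return a0 <= b1 and b0 <= a1

def aligned_slots(window, nbytes):
    """Naturally aligned start addresses for an nbytes block in window."""
    start, end = parse(window)
    first = (start + nbytes - 1) // nbytes * nbytes
    return list(range(first, end - nbytes + 2, nbytes))

for name, rng in RANGES.items():
    print(f"{name:20s} {size(rng) // MIB:8d} MiB")

# Where could a 16 GiB BAR 2 go inside the root port's current window?
for slot in aligned_slots(RANGES["bus 01 window"], 16 * GIB):
    want = f"{slot:x}-{slot + 16 * GIB - 1:x}"
    pins = [n for n in ("bus 04 window", "01:00.0 BAR 0")
            if overlaps(want, RANGES[n])]
    print(f"16 GiB at {want}: overlaps {pins}")
```

The only 16 GiB aligned slot within the bus 01 window starts at
4000000000, and it overlaps both the bus 04 window and BAR 0 of
0000:01:00.0, so both have to go before the resize can fit there.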
Could you test if the attached quirk patch helps? Maybe it could be
considered as the interim solution until the bridge sizing logic becomes
aware of the resizable BARs. To use a quirk to do this feels hacky to me,
but then it's hard to point out any real downsides with that approach
(other than having to quirk it).
You'll still need to manually release 0000:04:00.0 but the BAR0 on the
switch should be gone thanks to the quirk. When both of the window pins
are gone, I think the resize through sysfs should work.
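
As a reminder for the sysfs test, the value written to resourceN_resize
is the Resizable BAR size encoding, i.e., n means 2^n MB. A small Python
sketch of the conversion (function names are made up for this example):

```python
MIB = 1 << 20

def rebar_size_to_bytes(size):
    """Resizable BAR size encoding: value n means 2**n MiB."""
    return MIB << size

def bytes_to_rebar_size(nbytes):
    """Smallest encoding whose size is >= nbytes (1 MiB maps to 0)."""
    if nbytes <= MIB:
        return 0
    return (nbytes - 1).bit_length() - 20

for label, nbytes in [("256M", 256 * MIB), ("8G", 8 << 30), ("16G", 16 << 30)]:
    print(label, "->", bytes_to_rebar_size(nbytes))

# The "echo 9" in the quoted test asks for this many MiB:
print(rebar_size_to_bytes(9) // MIB)
```

So going from the 256M after rescan back to the original 16GB means
writing 14, while 9 only requests 512M.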
> And dmesg shows this for the rescan:
>
> [ 1673.189737] pci 0000:01:00.0: [8086:e2ff] type 01 class 0x060400 PCIe
> Switch Upstream Port
> [ 1673.189794] pci 0000:01:00.0: BAR 0 [mem 0x00000000-0x007fffff 64bit pref]
> [ 1673.189808] pci 0000:01:00.0: PCI bridge to [bus 00]
> [ 1673.189824] pci 0000:01:00.0: bridge window [io 0x0000-0x0fff]
> [ 1673.189834] pci 0000:01:00.0: bridge window [mem 0x00000000-0x000fffff]
> [ 1673.189856] pci 0000:01:00.0: bridge window [mem 0x00000000-0x000fffff
> 64bit pref]
> [ 1673.189878] pci 0000:01:00.0: Max Payload Size set to 256 (was 128, max
> 256)
> [ 1673.190164] pci 0000:01:00.0: PME# supported from D0 D3hot D3cold
> [ 1673.193531] pci 0000:01:00.0: Adding to iommu group 16
> [ 1673.196997] pcieport 0000:00:01.0: ASPM: current common clock configuration
> is inconsistent, reconfiguring
> [ 1673.197061] pci 0000:01:00.0: bridge configuration invalid ([bus 00-00]),
> reconfiguring
> [ 1673.197421] pci 0000:02:01.0: [8086:e2f0] type 01 class 0x060400 PCIe
> Switch Downstream Port
> [ 1673.197452] pci 0000:02:01.0: PCI bridge to [bus 00]
> [ 1673.197463] pci 0000:02:01.0: bridge window [io 0x0000-0x0fff]
> [ 1673.197468] pci 0000:02:01.0: bridge window [mem 0x00000000-0x000fffff]
> [ 1673.197482] pci 0000:02:01.0: bridge window [mem 0x00000000-0x000fffff
> 64bit pref]
> [ 1673.197497] pci 0000:02:01.0: Max Payload Size set to 256 (was 128, max
> 256)
> [ 1673.197503] pci 0000:02:01.0: enabling Extended Tags
> [ 1673.197660] pci 0000:02:01.0: PME# supported from D0 D3hot D3cold
> [ 1673.198411] pci 0000:02:01.0: Adding to iommu group 17
> [ 1673.200258] pci 0000:02:02.0: [8086:e2f1] type 01 class 0x060400 PCIe
> Switch Downstream Port
> [ 1673.200289] pci 0000:02:02.0: PCI bridge to [bus 00]
> [ 1673.200299] pci 0000:02:02.0: bridge window [io 0x0000-0x0fff]
> [ 1673.200304] pci 0000:02:02.0: bridge window [mem 0x00000000-0x000fffff]
> [ 1673.200317] pci 0000:02:02.0: bridge window [mem 0x00000000-0x000fffff
> 64bit pref]
> [ 1673.200333] pci 0000:02:02.0: Max Payload Size set to 256 (was 128, max
> 256)
> [ 1673.200337] pci 0000:02:02.0: enabling Extended Tags
> [ 1673.200470] pci 0000:02:02.0: PME# supported from D0 D3hot D3cold
> [ 1673.201059] pci 0000:02:02.0: Adding to iommu group 18
> [ 1673.202761] pci 0000:01:00.0: PCI bridge to [bus 02-04]
> [ 1673.202774] pci 0000:02:01.0: bridge configuration invalid ([bus 00-00]),
> reconfiguring
> [ 1673.202782] pci 0000:02:02.0: bridge configuration invalid ([bus 00-00]),
> reconfiguring
> [ 1673.203024] pci 0000:03:00.0: [8086:e221] type 00 class 0x030000 PCIe
> Endpoint
> [ 1673.203060] pci 0000:03:00.0: BAR 0 [mem 0x00000000-0x00ffffff 64bit]
> [ 1673.203064] pci 0000:03:00.0: BAR 2 [mem 0x00000000-0x0fffffff 64bit pref]
> [ 1673.203069] pci 0000:03:00.0: ROM [mem 0x00000000-0x001fffff pref]
> [ 1673.203077] pci 0000:03:00.0: Max Payload Size set to 256 (was 128, max
> 256)
> [ 1673.203209] pci 0000:03:00.0: PME# supported from D0 D3hot
> [ 1673.203770] pci 0000:03:00.0: Adding to iommu group 19
> [ 1673.205451] pci 0000:03:00.0: vgaarb: setting as boot VGA device
> [ 1673.205454] pci 0000:03:00.0: vgaarb: bridge control possible
> [ 1673.205455] pci 0000:03:00.0: vgaarb: VGA device added:
> decodes=io+mem,owns=none,locks=none
> [ 1673.205534] pci 0000:02:01.0: PCI bridge to [bus 03-04]
> [ 1673.205543] pci_bus 0000:03: busn_res: [bus 03-04] end is updated to 03
> [ 1673.205787] pci 0000:04:00.0: [8086:e2f7] type 00 class 0x040300 PCIe
> Endpoint
> [ 1673.205848] pci 0000:04:00.0: BAR 0 [mem 0x00000000-0x00003fff 64bit]
> [ 1673.205867] pci 0000:04:00.0: Max Payload Size set to 256 (was 128, max
> 256)
> [ 1673.205872] pci 0000:04:00.0: enabling Extended Tags
> [ 1673.206012] pci 0000:04:00.0: PME# supported from D3hot D3cold
> [ 1673.206528] pci 0000:04:00.0: Adding to iommu group 20
> [ 1673.208271] pci 0000:02:02.0: PCI bridge to [bus 04]
> [ 1673.208284] pci_bus 0000:04: busn_res: [bus 04] end is updated to 04
> [ 1673.208291] pci_bus 0000:02: busn_res: [bus 02-04] end is updated to 04
> [ 1673.232003] pcieport 0000:00:01.0: Assigned bridge window [mem
> 0x83000000-0x840fffff] to [bus 01-04] cannot fit 0x2000000 required for
> 0000:02:01.0 bridging to [bus 03]
> [ 1673.232009] pci 0000:02:01.0: bridge window [mem 0x00000000-0x000fffff] to
> [bus 03] requires relaxed alignment rules
> [ 1673.232016] pci 0000:02:01.0: bridge window [mem 0x01000000-0x01ffffff] to
> [bus 03] add_size 200000 add_align 1000000
> [ 1673.232020] pcieport 0000:00:01.0: Assigned bridge window [mem
> 0x83000000-0x840fffff] to [bus 01-04] cannot fit 0x1800000 required for
> 0000:01:00.0 bridging to [bus 02-04]
> [ 1673.232025] pci 0000:01:00.0: bridge window [mem 0x00000000-0x000fffff] to
> [bus 02-04] requires relaxed alignment rules
> [ 1673.232027] pcieport 0000:00:01.0: Assigned bridge window [mem
> 0x83000000-0x840fffff] to [bus 01-04] cannot fit 0x2000000 required for
> 0000:01:00.0 bridging to [bus 02-04]
> [ 1673.232031] pci 0000:01:00.0: bridge window [mem 0x00000000-0x000fffff] to
> [bus 02-04] requires relaxed alignment rules
> [ 1673.232036] pci 0000:01:00.0: bridge window [mem 0x01000000-0x020fffff] to
> [bus 02-04] add_size 200000 add_align 1000000
> [ 1673.232077] pci 0000:01:00.0: bridge window [mem 0x4000000000-0x4017ffffff
> 64bit pref]: assigned
> [ 1673.232080] pci 0000:01:00.0: bridge window [mem size 0x01300000]: can't
> assign; no space
> [ 1673.232082] pci 0000:01:00.0: bridge window [mem size 0x01300000]: failed
> to assign
> [ 1673.232090] pci 0000:01:00.0: BAR 0 [mem 0x4018000000-0x40187fffff 64bit
> pref]: assigned
> [ 1673.232103] pci 0000:01:00.0: bridge window [io 0x8000-0x9fff]: assigned
> [ 1673.232129] pci 0000:01:00.0: bridge window [mem 0x83000000-0x840fffff]:
> assigned
> [ 1673.232131] pci 0000:01:00.0: bridge window [mem 0x83000000-0x840fffff]:
> failed to expand by 0x200000
> [ 1673.232136] pci 0000:01:00.0: bridge window [mem 0x83000000-0x840fffff]:
> failed to add optional 200000
> [ 1673.232192] pci 0000:02:01.0: bridge window [mem 0x4000000000-0x400fffffff
> 64bit pref]: assigned
> [ 1673.232196] pci 0000:02:01.0: bridge window [mem 0x83000000-0x83ffffff]:
> assigned
> [ 1673.232200] pci 0000:02:02.0: bridge window [mem 0x84000000-0x840fffff]:
> assigned
> [ 1673.232202] pci 0000:02:02.0: bridge window [mem 0x4010000000-0x40100fffff
> 64bit pref]: assigned
> [ 1673.232204] pci 0000:02:01.0: bridge window [io 0x8000-0x8fff]: assigned
> [ 1673.232206] pci 0000:02:02.0: bridge window [io 0x9000-0x9fff]: assigned
> [ 1673.232241] pci 0000:03:00.0: BAR 2 [mem 0x4000000000-0x400fffffff 64bit
> pref]: assigned
> [ 1673.232250] pci 0000:03:00.0: BAR 0 [mem 0x83000000-0x83ffffff 64bit]:
> assigned
> [ 1673.232259] pci 0000:03:00.0: ROM [mem size 0x00200000 pref]: can't assign;
> no space
> [ 1673.232261] pci 0000:03:00.0: ROM [mem size 0x00200000 pref]: failed to
> assign
> [ 1673.232272] pci 0000:03:00.0: BAR 2 [mem 0x4000000000-0x400fffffff 64bit
> pref]: assigned
> [ 1673.232280] pci 0000:03:00.0: BAR 0 [mem 0x83000000-0x83ffffff 64bit]:
> assigned
> [ 1673.232289] pci 0000:03:00.0: ROM [mem size 0x00200000 pref]: can't assign;
> no space
> [ 1673.232291] pci 0000:03:00.0: ROM [mem size 0x00200000 pref]: failed to
> assign
> [ 1673.232302] pci 0000:02:01.0: PCI bridge to [bus 03]
> [ 1673.232304] pci 0000:02:01.0: bridge window [io 0x8000-0x8fff]
> [ 1673.232309] pci 0000:02:01.0: bridge window [mem 0x83000000-0x83ffffff]
> [ 1673.232313] pci 0000:02:01.0: bridge window [mem
> 0x4000000000-0x400fffffff 64bit pref]
> [ 1673.232321] pci 0000:04:00.0: BAR 0 [mem 0x84000000-0x84003fff 64bit]:
> assigned
> [ 1673.232336] pci 0000:02:02.0: PCI bridge to [bus 04]
> [ 1673.232339] pci 0000:02:02.0: bridge window [io 0x9000-0x9fff]
> [ 1673.232345] pci 0000:02:02.0: bridge window [mem 0x84000000-0x840fffff]
> [ 1673.232349] pci 0000:02:02.0: bridge window [mem
> 0x4010000000-0x40100fffff 64bit pref]
> [ 1673.232356] pci 0000:01:00.0: PCI bridge to [bus 02-04]
> [ 1673.232359] pci 0000:01:00.0: bridge window [io 0x8000-0x9fff]
> [ 1673.232363] pci 0000:01:00.0: bridge window [mem 0x83000000-0x840fffff]
> [ 1673.232366] pci 0000:01:00.0: bridge window [mem
> 0x4000000000-0x4017ffffff 64bit pref]
> [ 1673.232471] pcieport 0000:01:00.0: enabling device (0000 -> 0003)
> [ 1673.233508] pcieport 0000:02:01.0: enabling device (0000 -> 0003)
> [ 1673.233692] pcieport 0000:02:02.0: enabling device (0000 -> 0003)
>
> # echo 9 > /sys/bus/pci/devices/0000\:03\:00.0/resource2_resize
> -bash: echo: write error: No space left on device
>
>
> [1] # echo 1 > /sys/bus/pci/devices/0000\:01\:00.0/remove
> # echo 0 > /sys/bus/pci/drivers_autoprobe
> # echo 1 > /sys/bus/pci/rescan
>
>
> I can share the xe patch so you check if it at least fixes it in your
> test scenario.
Ah, one thing I forgot to mention is that in my case the BAR is already
at its maximum size, so to test that the resize is working, I made the
target size smaller, not larger. (I understand this might not be very
helpful in your case, but I was only interested in confirming that the
resize code still works after this series.)
--
i.
View attachment "0001-PCI-Release-BAR0-of-an-integrated-bridge-to-allow-GP.patch" of type "text/x-diff" (3715 bytes)