Message-Id: <20251028173551.22578-1-ilpo.jarvinen@linux.intel.com>
Date: Tue, 28 Oct 2025 19:35:42 +0200
From: Ilpo Järvinen <ilpo.jarvinen@...ux.intel.com>
To: Alex Bennée <alex.bennee@...aro.org>,
Simon Richter <Simon.Richter@...yros.de>,
Lucas De Marchi <lucas.demarchi@...el.com>,
Alex Deucher <alexander.deucher@....com>,
amd-gfx@...ts.freedesktop.org,
Bjorn Helgaas <bhelgaas@...gle.com>,
David Airlie <airlied@...il.com>,
dri-devel@...ts.freedesktop.org,
intel-gfx@...ts.freedesktop.org,
intel-xe@...ts.freedesktop.org,
Jani Nikula <jani.nikula@...ux.intel.com>,
Joonas Lahtinen <joonas.lahtinen@...ux.intel.com>,
linux-pci@...r.kernel.org,
Rodrigo Vivi <rodrigo.vivi@...el.com>,
Simona Vetter <simona@...ll.ch>,
Tvrtko Ursulin <tursulin@...ulin.net>,
Christian König <christian.koenig@....com>,
Thomas Hellström <thomas.hellstrom@...ux.intel.com>,
Michał Winiarski <michal.winiarski@...el.com>
Cc: linux-kernel@...r.kernel.org,
Ilpo Järvinen <ilpo.jarvinen@...ux.intel.com>
Subject: [PATCH 0/9] PCI: BAR resizing fix/rework
Simon and Alex, could you please test if this series eliminates the
claim conflicts and makes the BAR resize either succeed or not break
things while rolling back resource changes? It should be tested without
other fix patches (from me; if you need some random unrelated fix,
that's okay).
Hi all,
Thanks to issue reports from Simon Richter and Alex Bennée, I
discovered that BAR resize rollback can corrupt the resource tree. As
fixing the corruption requires avoiding overlapping resource
assignments, the correct fix can unfortunately result in a worse user
experience: what appeared to be "working" previously might no longer
do so. Thus, I had to do a larger rework of pci_resize_resource() in
order to properly restore resource states to what they were prior to
the BAR resize.
This rework has been on my TODO list anyway but it wasn't the highest
prio item until pci_resize_resource() started to cause regressions due
to other resource assignment algorithm changes.
BAR resize rollback does not always restore BAR resources as they were
before the resize operation was started. Currently, when a driver calls
pci_resize_resource(), the driver must release the device resources
prior to the call. This is a design flaw in the pci_resize_resource()
API, as the PCI core then cannot save the state those resources had
prior to the release, so it cannot restore them later if the BAR size
change has to be rolled back.
PCI core's BAR resize operation currently doesn't even attempt to
restore the device resources when rolling back a BAR resize. If the
normal resource assignment algorithm assigned those resources, then
device resources might be assigned after the pci_resize_resource() call,
but that could also trigger the resource tree corruption issue, so what
appeared to a user as "working" might be a corrupted state.
With the new pci_resize_resource() interface, the driver calling
pci_resize_resource() should no longer release the device resources.
I've added WARN_ON_ONCE() to pick up similar bugs that cause resource
tree corruption. At least in my tests all looked clear on that front
after this series.
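To illustrate the save-then-restore idea described above, here is a minimal userspace sketch. It is not the kernel code from this series: struct model_bar, struct model_dev, model_assign() and model_resize_bar() are hypothetical names invented for this example, and the assignment logic is a naive packing into a single window. The only point it shows is that the resize routine itself snapshots the BAR state before touching anything, so a failed resize can put every BAR back exactly as it was.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MODEL_NUM_BARS 6

/* Hypothetical stand-in for a BAR resource; NOT the kernel's struct resource. */
struct model_bar {
	uint64_t start;
	uint64_t size;	/* power of two; 0 means the BAR is unused */
};

/* Hypothetical device with one bridge window that all BARs must fit into. */
struct model_dev {
	struct model_bar bar[MODEL_NUM_BARS];
	uint64_t window_start;
	uint64_t window_size;
};

/*
 * Naive assignment: place BARs in index order, each naturally aligned to
 * its size. On failure, earlier BARs may already have been moved, which is
 * exactly why the caller needs a saved copy to roll back to.
 */
bool model_assign(struct model_dev *dev)
{
	uint64_t next = dev->window_start;
	uint64_t end = dev->window_start + dev->window_size;

	for (size_t i = 0; i < MODEL_NUM_BARS; i++) {
		struct model_bar *b = &dev->bar[i];
		uint64_t aligned;

		if (!b->size)
			continue;
		aligned = (next + b->size - 1) & ~(b->size - 1);
		if (aligned + b->size > end)
			return false;
		b->start = aligned;
		next = aligned + b->size;
	}
	return true;
}

/*
 * Resize one BAR. The state of all BARs is saved here, before any change,
 * so the caller does not have to release anything beforehand.
 * Returns 0 on success, -1 after a complete rollback.
 */
int model_resize_bar(struct model_dev *dev, int idx, uint64_t new_size)
{
	struct model_bar saved[MODEL_NUM_BARS];

	assert(idx >= 0 && idx < MODEL_NUM_BARS);
	memcpy(saved, dev->bar, sizeof(saved));

	dev->bar[idx].size = new_size;
	if (model_assign(dev))
		return 0;

	/* Roll back: restore every BAR, including ones moved before the failure. */
	memcpy(dev->bar, saved, sizeof(saved));
	return -1;
}
```

In this model the caller only asks for the new size. If the caller had to release the BARs first, the saved[] copy would capture already-released state and the rollback would have nothing meaningful to restore.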
I was a bit on the fence about how to split this series. Between
patches 1 and 5-8, there might be cases where user experience is made
worse if only part of the series is applied. But at the same time I was
hesitant to merge all those changes together, as the changes are way
easier to understand when split properly. Personally I think the BAR
resize rollback code has not really functioned okay prior to this
series at all, because touching an assigned resource on the rollback
path is a bug, plain and simple. If that got things "working" it's
still a bad bug (that one can get lucky and the corruption results in
non-corrupted numbers doesn't make it any better). If those patches
need to be merged into one, just let me know and I can rearrange the
patch order to make it easier.
This series will conflict with what's in pci/rebar and likely with some
xe changes from Lucas De Marchi that might also be rendered in part
unnecessary by the pci_resize_resource() API change. My suggestion is
that this series takes precedence over what's in pci/rebar to make
things easier for stable people (I can rebase the pci/rebar patches on
top of these, so feel free to drop those other patches if needed).
Ilpo Järvinen (9):
PCI: Prevent resource tree corruption when BAR resize fails
PCI/IOV: Adjust ->barsz[] when changing BAR size
PCI: Change pci_dev variable from 'bridge' to 'dev'
PCI: Try BAR resize even when no window was released
PCI: Fix restoring BARs on BAR resize rollback path
drm/xe: Remove driver side BAR release before resize
drm/i915: Remove driver side BAR release before resize
drm/amdgpu: Remove driver side BAR release before resize
PCI: Prevent restoring assigned resources
drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 8 +-
drivers/gpu/drm/i915/gt/intel_region_lmem.c | 12 --
drivers/gpu/drm/xe/xe_vram.c | 3 -
drivers/pci/iov.c | 15 +--
drivers/pci/pci-sysfs.c | 15 +--
drivers/pci/pci.c | 4 +
drivers/pci/pci.h | 8 +-
drivers/pci/setup-bus.c | 119 ++++++++++++++------
drivers/pci/setup-res.c | 30 ++---
9 files changed, 108 insertions(+), 106 deletions(-)
base-commit: 3a8660878839faadb4f1a6dd72c3179c1df56787
--
2.39.5