[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <e20e3171-7aa5-0646-7934-e6b10cdefc4e@linux.intel.com>
Date: Wed, 25 Jun 2025 20:45:34 +0300 (EEST)
From: Ilpo Järvinen <ilpo.jarvinen@...ux.intel.com>
To: D Scott Phillips <scott@...amperecomputing.com>
cc: Bjorn Helgaas <bhelgaas@...gle.com>, linux-pci@...r.kernel.org,
Michał Winiarski <michal.winiarski@...el.com>,
Igor Mammedov <imammedo@...hat.com>, LKML <linux-kernel@...r.kernel.org>,
Mika Westerberg <mika.westerberg@...ux.intel.com>
Subject: Re: [PATCH 24/25] PCI: Perform reset_resource() and build fail list
in sync
On Wed, 18 Jun 2025, D Scott Phillips wrote:
> Ilpo Järvinen <ilpo.jarvinen@...ux.intel.com> writes:
>
> > Resetting resource is problematic as it prevent attempting to allocate
> > the resource later, unless something in between restores the resource.
> > Similarly, if fail_head does not contain all resources that were reset,
> > those resource cannot be restored later.
> >
> > The entire reset/restore cycle adds complexity and leaving resources
> > into reseted state causes issues to other code such as for checks done
> > in pci_enable_resources(). Take a small step towards not resetting
> > resources by delaying reset until the end of resource assignment and
> > build failure list (fail_head) in sync with the reset to avoid leaving
> > behind resources that cannot be restored (for the case where the caller
> > provides fail_head in the first place to allow restore somewhere in the
> > callchain, as is not all callers pass non-NULL fail_head).
> >
> > The Expansion ROM check is temporarily left in place while building the
> > failure list until the upcoming change which reworks optional resource
> > handling.
> >
> > Ideally, whole resource reset could be removed but doing that in a big
> > step would make the impact non-tractable due to complexity of all
> > related code.
> >
> > Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@...ux.intel.com>
>
> Hi Ilpo, I'm seeing a crash on arm64 at boot that I bisected to this
> change. I don't think it's the same as the other issues reported. I've
> confirmed the crash is still there after your follow up patches. The
> crash itself is below[1].
>
> It looks like the problem begins when:
>
> amdgpu_device_resize_fb_bar()
> pci_resize_resource()
> pci_reassign_bridge_resources()
> __pci_bus_size_bridges()
>
> adds pci_hotplug_io_size to `realloc_head`. The io resource allocation
> has failed earlier because the root port doesn't have an io window[2].
>
> Then with this patch, pci_reassign_bridge_resources()'s call to
> __pci_bridge_assign_resources() now returns the io added space for
> hotplug in the `failed` list where the old code dropped it and did not.
>
> That sends pci_reassign_bridge_resources() into the `cleanup:` path,
> where I think the cleanup code doesn't properly release the resources
> that were assigned by __pci_bridge_assign_resources() and there's a
> conflict reported in pci_claim_resource() where a restored resource is
> found as conflicting with itself:
>
> > pcieport 000d:00:01.0: bridge window [mem 0x340000000000-0x340017ffffff 64bit pref]: can't claim; address conflict with PCI Bus 000d:01 [mem 0x340000000000-0x340017ffffff 64bit pref]
>
> Setting `pci=hpiosize=0` avoids this crash, as does this change:
>
> diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c
> index 16d5d390599a..59ece11702da 100644
> --- a/drivers/pci/setup-bus.c
> +++ b/drivers/pci/setup-bus.c
> @@ -2442,7 +2442,7 @@ int pci_reassign_bridge_resources(struct pci_dev *bridge, unsigned long type)
> LIST_HEAD(saved);
> LIST_HEAD(added);
> LIST_HEAD(failed);
> - unsigned int i;
> + unsigned int i, relevant_fails;
> int ret;
>
> down_read(&pci_bus_sem);
> @@ -2490,7 +2490,16 @@ int pci_reassign_bridge_resources(struct pci_dev *bridge, unsigned long type)
> __pci_bridge_assign_resources(bridge, &added, &failed);
> BUG_ON(!list_empty(&added));
>
> - if (!list_empty(&failed)) {
> + relevant_fails = 0;
> + list_for_each_entry(dev_res, &failed, list) {
> + restore_dev_resource(dev_res);
> + if (((dev_res->res->flags ^ type) & PCI_RES_TYPE_MASK) == 0)
> + relevant_fails++;
> + }
> + free_list(&failed);
> +
> + /* Cleanup if we had failures in resources of interest */
> + if (relevant_fails != 0) {
> ret = -ENOSPC;
> goto cleanup;
> }
> @@ -2509,11 +2518,6 @@ int pci_reassign_bridge_resources(struct pci_dev *bridge, unsigned long type)
> return 0;
>
> cleanup:
> - /* Restore size and flags */
> - list_for_each_entry(dev_res, &failed, list)
> - restore_dev_resource(dev_res);
> - free_list(&failed);
> -
> /* Revert to the old configuration */
> list_for_each_entry(dev_res, &saved, list) {
> struct resource *res = dev_res->res;
>
> I don't know this code well enough to know if that changes is completely
> bonkers or what.
Hi again,
Thanks for all the details what you think went wrong, it was really
useful. I think you have it towards the right direction but a more
targetted seems enough to address this (this needs to be confirmed, please
test the patch below).
The most correct solution would be to make all the resource fitting code
to focus on the resources that match the type filter. However, that looks
way too scary change at the moment to implement, and especially, let it
end up into stable (to fix this issue). So it looks this somewhat band-aid
solution similar to your attempt might be better as a fix for now.
In medium term, I'd want to avoid using type as a filter and base all
such decisions on matching the bridge window resource the dev resource
belongs to. I've some work towards that direction already which removes
lots of complexity in which bridge window is going to be selected as
there will be a single place to make always the same decision. That change
is also going to simplify the internal interfaces between functions very
noticably (but the change require more testing before I've enough
confidence to submit it). That work doesn't cover this resize side yet but
it should be extended there as well.
So please test this somewhat band-aid patch:
From 971686ed85e341e7234f8fe8b666140187f63ad1 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Ilpo=20J=C3=A4rvinen?= <ilpo.jarvinen@...ux.intel.com>
Date: Wed, 25 Jun 2025 20:30:43 +0300
Subject: [PATCH 1/1] PCI: Fix failure detectiong during resource resize
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Since the commit 96336ec70264 ("PCI: Perform reset_resource() and build
fail list in sync") the failed list is always built and returned to let
the caller decide if what to do with the failures. The caller may want
to retry resource fitting and assignment and before that can happen,
the resources should be restored to their original state (a reset
effectively clears the struct resource), which requires returning them
on the failed list so that the original state remains stored in the
associated struct pci_dev_resource.
Resource resizing is different from the ordinary resource fitting and
assignment in that it only considers part of the resources. This means
failures for other resource types are not relevant at all and should be
ignored. As resize doesn't unassign such unrelated resources, those
resource ending up into the failed list implies assignment of that
resource must have failed before resize too. The check in
pci_reassign_bridge_resources() to decide if the whole assignment is
successful, however, is based on list emptiness which may cause false
negatives when the failed list resources with unrelated type.
If the failed list is not empty, call pci_required_resource_failed()
and extend it to be able to filter on specific resource types too (if
provided).
Calling pci_required_resource_failed() at this point is slightly
problematic because the resource itself is reset when the failed list
is constructed in __assign_resources_sorted(). As a result,
pci_resource_is_optional() does not have access to the original
resource flags. This could be worked around by restoring and
re-reseting the resource around the call to pci_resource_is_optional(),
however, it shouldn't cause issue as resource resizing is meant for
64-bit prefetchable resources according to Christian König (see the
Link which unfortunately doesn't point directly to Christian's reply
because lore didn't store that email at all).
Link: https://lore.kernel.org/all/c5d1b5d8-8669-5572-75a7-0b480f581ac1@linux.intel.com/
Reported-by: D Scott Phillips <scott@...amperecomputing.com>
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@...ux.intel.com>
Cc: Christian König <christian.koenig@....com>
---
drivers/pci/setup-bus.c | 26 ++++++++++++++++++--------
1 file changed, 18 insertions(+), 8 deletions(-)
diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c
index 07c3d021a47e..8284bbdc44b4 100644
--- a/drivers/pci/setup-bus.c
+++ b/drivers/pci/setup-bus.c
@@ -28,6 +28,10 @@
#include <linux/acpi.h>
#include "pci.h"
+#define PCI_RES_TYPE_MASK \
+ (IORESOURCE_IO | IORESOURCE_MEM | IORESOURCE_PREFETCH |\
+ IORESOURCE_MEM_64)
+
unsigned int pci_flags;
EXPORT_SYMBOL_GPL(pci_flags);
@@ -384,13 +388,19 @@ static bool pci_need_to_release(unsigned long mask, struct resource *res)
}
/* Return: @true if assignment of a required resource failed. */
-static bool pci_required_resource_failed(struct list_head *fail_head)
+static bool pci_required_resource_failed(struct list_head *fail_head,
+ unsigned long type)
{
struct pci_dev_resource *fail_res;
+ type &= ~PCI_RES_TYPE_MASK;
+
list_for_each_entry(fail_res, fail_head, list) {
int idx = pci_resource_num(fail_res->dev, fail_res->res);
+ if (type && (fail_res->flags & PCI_RES_TYPE_MASK) != type)
+ continue;
+
if (!pci_resource_is_optional(fail_res->dev, idx))
return true;
}
@@ -504,7 +514,7 @@ static void __assign_resources_sorted(struct list_head *head,
}
/* Without realloc_head and only optional fails, nothing more to do. */
- if (!pci_required_resource_failed(&local_fail_head) &&
+ if (!pci_required_resource_failed(&local_fail_head, 0) &&
list_empty(realloc_head)) {
list_for_each_entry(save_res, &save_head, list) {
struct resource *res = save_res->res;
@@ -1704,10 +1714,6 @@ static void __pci_bridge_assign_resources(const struct pci_dev *bridge,
}
}
-#define PCI_RES_TYPE_MASK \
- (IORESOURCE_IO | IORESOURCE_MEM | IORESOURCE_PREFETCH |\
- IORESOURCE_MEM_64)
-
static void pci_bridge_release_resources(struct pci_bus *bus,
unsigned long type)
{
@@ -2445,8 +2451,12 @@ int pci_reassign_bridge_resources(struct pci_dev *bridge, unsigned long type)
free_list(&added);
if (!list_empty(&failed)) {
- ret = -ENOSPC;
- goto cleanup;
+ if (pci_required_resource_failed(&failed, type)) {
+ ret = -ENOSPC;
+ goto cleanup;
+ }
+ /* Only resources with unrelated types failed (again) */
+ free_list(&failed);
}
list_for_each_entry(dev_res, &saved, list) {
base-commit: 19272b37aa4f83ca52bdf9c16d5d81bdd1354494
--
2.39.5
Powered by blists - more mailing lists