linux-kernel - Re: [PATCH 24/25] PCI: Perform reset_resource() and build fail list in sync

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <86plf0lgit.fsf@scott-ph-mail.amperecomputing.com>
Date: Wed, 18 Jun 2025 17:30:18 -0700
From: D Scott Phillips <scott@...amperecomputing.com>
To: Ilpo Järvinen <ilpo.jarvinen@...ux.intel.com>, Bjorn
 Helgaas
 <bhelgaas@...gle.com>, linux-pci@...r.kernel.org, Michał
 Winiarski
 <michal.winiarski@...el.com>, Igor Mammedov <imammedo@...hat.com>,
 linux-kernel@...r.kernel.org
Cc: Mika Westerberg <mika.westerberg@...ux.intel.com>, Ilpo Järvinen
 <ilpo.jarvinen@...ux.intel.com>
Subject: Re: [PATCH 24/25] PCI: Perform reset_resource() and build fail list
 in sync

Ilpo Järvinen <ilpo.jarvinen@...ux.intel.com> writes:

> Resetting resource is problematic as it prevent attempting to allocate
> the resource later, unless something in between restores the resource.
> Similarly, if fail_head does not contain all resources that were reset,
> those resource cannot be restored later.
>
> The entire reset/restore cycle adds complexity and leaving resources
> into reseted state causes issues to other code such as for checks done
> in pci_enable_resources(). Take a small step towards not resetting
> resources by delaying reset until the end of resource assignment and
> build failure list (fail_head) in sync with the reset to avoid leaving
> behind resources that cannot be restored (for the case where the caller
> provides fail_head in the first place to allow restore somewhere in the
> callchain, as is not all callers pass non-NULL fail_head).
>
> The Expansion ROM check is temporarily left in place while building the
> failure list until the upcoming change which reworks optional resource
> handling.
>
> Ideally, whole resource reset could be removed but doing that in a big
> step would make the impact non-tractable due to complexity of all
> related code.
>
> Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@...ux.intel.com>

Hi Ilpo, I'm seeing a crash on arm64 at boot that I bisected to this
change. I don't think it's the same as the other issues reported. I've
confirmed the crash is still there after your follow up patches.  The
crash itself is below[1].

It looks like the problem begins when:

amdgpu_device_resize_fb_bar()
 pci_resize_resource()
  pci_reassign_bridge_resources()
   __pci_bus_size_bridges()

adds pci_hotplug_io_size to `realloc_head`. The io resource allocation
has failed earlier because the root port doesn't have an io window[2].

Then with this patch, pci_reassign_bridge_resources()'s call to
__pci_bridge_assign_resources() now returns the io added space for
hotplug in the `failed` list where the old code dropped it and did not.

That sends pci_reassign_bridge_resources() into the `cleanup:` path,
where I think the cleanup code doesn't properly release the resources
that were assigned by __pci_bridge_assign_resources() and there's a
conflict reported in pci_claim_resource() where a restored resource is
found as conflicting with itself:

> pcieport 000d:00:01.0: bridge window [mem 0x340000000000-0x340017ffffff 64bit pref]: can't claim; address conflict with PCI Bus 000d:01 [mem 0x340000000000-0x340017ffffff 64bit pref]

Setting `pci=hpiosize=0` avoids this crash, as does this change:

diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c
index 16d5d390599a..59ece11702da 100644
--- a/drivers/pci/setup-bus.c
+++ b/drivers/pci/setup-bus.c
@@ -2442,7 +2442,7 @@ int pci_reassign_bridge_resources(struct pci_dev *bridge, unsigned long type)
 	LIST_HEAD(saved);
 	LIST_HEAD(added);
 	LIST_HEAD(failed);
-	unsigned int i;
+	unsigned int i, relevant_fails;
 	int ret;
 
 	down_read(&pci_bus_sem);
@@ -2490,7 +2490,16 @@ int pci_reassign_bridge_resources(struct pci_dev *bridge, unsigned long type)
 	__pci_bridge_assign_resources(bridge, &added, &failed);
 	BUG_ON(!list_empty(&added));
 
-	if (!list_empty(&failed)) {
+	relevant_fails = 0;
+	list_for_each_entry(dev_res, &failed, list) {
+		restore_dev_resource(dev_res);
+		if (((dev_res->res->flags ^ type) & PCI_RES_TYPE_MASK) == 0)
+			relevant_fails++;
+	}
+	free_list(&failed);
+
+	/* Cleanup if we had failures in resources of interest */
+	if (relevant_fails != 0) {
 		ret = -ENOSPC;
 		goto cleanup;
 	}
@@ -2509,11 +2518,6 @@ int pci_reassign_bridge_resources(struct pci_dev *bridge, unsigned long type)
 	return 0;
 
 cleanup:
-	/* Restore size and flags */
-	list_for_each_entry(dev_res, &failed, list)
-		restore_dev_resource(dev_res);
-	free_list(&failed);
-
 	/* Revert to the old configuration */
 	list_for_each_entry(dev_res, &saved, list) {
 		struct resource *res = dev_res->res;

I don't know this code well enough to know if that changes is completely
bonkers or what. Maybe a change that gets zero as pci_hotplug_io_size
for root ports that don't have an io window would be better? Any other
ideas, or other information about the crash that I could provide?
Thanks,

Scott

[1]: Crash:
> SError Interrupt on CPU0, code 0x00000000be000411 -- SError
> CPU: 0 UID: 0 PID: 9 Comm: kworker/0:0 Not tainted 6.15.2+ #16 PREEMPT(undef)
> Hardware name: Adlink Ampere Altra Developer Platform/Ampere Altra Developer Platform, BIOS TianoCore 2.09.100.00 (SYS: 2.10.20221028) 04/30/2
> Workqueue: events work_for_cpu_fn
> pstate: 204000c9 (nzCv daIF +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
> pc : _raw_spin_lock_irqsave+0x34/0xb0
> lr : __wake_up+0x30/0x80
> sp : ffff800080003e20
> x29: ffff800080003e20 x28: ffff07ff808b0000 x27: 0000000000050260
> x26: 0000000000000001 x25: ffffcc5decad6bd8 x24: ffffcc5decc8b5b8
> x23: ffff080138a10000 x22: ffff080138a00000 x21: 0000000000000003
> x20: ffff080138a14dc8 x19: 0000000000000000 x18: 006808018e775f03
> x17: ffff3bc1330d2000 x16: ffffcc5e3a520ed8 x15: 8daad055c6e77021
> x14: f231631cf9328575 x13: 0e51168d06a6cf91 x12: f5db8c23b764520c
> x11: 0000000000000040 x10: 0000000000000000 x9 : ffffcc5e3a520f08
> x8 : 0000000000002113 x7 : 0000000000000000 x6 : 0000000000000000
> x5 : 0000000000000000 x4 : ffff800080003e08 x3 : 00000000000000c0
> x2 : 0000000000000001 x1 : 0000000000000000 x0 : ffff080138a14dc8
> Kernel panic - not syncing: Asynchronous SError Interrupt
> CPU: 0 UID: 0 PID: 9 Comm: kworker/0:0 Not tainted 6.15.2+ #16 PREEMPT(undef)
> Hardware name: Adlink Ampere Altra Developer Platform/Ampere Altra Developer Platform, BIOS TianoCore 2.09.100.00 (SYS: 2.10.20221028) 04/30/2
> Workqueue: events work_for_cpu_fn
> Call trace:
>  show_stack+0x30/0x98 (C)
>  dump_stack_lvl+0x7c/0xa0
>  dump_stack+0x18/0x2c
>  panic+0x184/0x3c0
>  nmi_panic+0x90/0xa0
>  arm64_serror_panic+0x6c/0x90
>  do_serror+0x58/0x60
>  el1h_64_error_handler+0x38/0x60
>  el1h_64_error+0x84/0x88
>  _raw_spin_lock_irqsave+0x34/0xb0 (P)
>  amdgpu_ih_process+0xf0/0x150 [amdgpu]
>  amdgpu_irq_handler+0x34/0xa0 [amdgpu]
>  __handle_irq_event_percpu+0x60/0x248
>  handle_irq_event+0x4c/0xc0
>  handle_fasteoi_irq+0xa0/0x1c8
>  handle_irq_desc+0x3c/0x68
>  generic_handle_domain_irq+0x24/0x40
>  __gic_handle_irq_from_irqson.isra.0+0x15c/0x260
>  gic_handle_irq+0x28/0x80
>  call_on_irq_stack+0x24/0x30
>  do_interrupt_handler+0x88/0xa0
>  el1_interrupt+0x44/0xd0
>  el1h_64_irq_handler+0x18/0x28
>  el1h_64_irq+0x84/0x88
>  amdgpu_device_rreg.part.0+0x4c/0x190 [amdgpu] (P)
>  amdgpu_device_rreg+0x24/0x40 [amdgpu]
>  psp_wait_for+0x88/0xd8 [amdgpu]
>  psp_v11_0_bootloader_load_component+0x164/0x1b0 [amdgpu]
>  psp_v11_0_bootloader_load_kdb+0x20/0x40 [amdgpu]
>  psp_hw_start+0x5c/0x580 [amdgpu]
>  psp_load_fw+0x9c/0x280 [amdgpu]
>  psp_hw_init+0x44/0xa0 [amdgpu]
>  amdgpu_device_fw_loading+0xf8/0x358 [amdgpu]
>  amdgpu_device_ip_init+0x684/0x990 [amdgpu]
>  amdgpu_device_init+0xba4/0x1038 [amdgpu]
>  amdgpu_driver_load_kms+0x20/0xb8 [amdgpu]
>  amdgpu_pci_probe+0x1f8/0x7f8 [amdgpu]
>  local_pci_probe+0x44/0xb0
>  work_for_cpu_fn+0x24/0x40
>  process_one_work+0x17c/0x410
>  worker_thread+0x254/0x388
>  kthread+0x10c/0x128
>  ret_from_fork+0x10/0x20
> Kernel Offset: 0x4c5dba380000 from 0xffff800080000000
> PHYS_OFFSET: 0x80000000
> CPU features: 0x0800,000042e0,01202650,8241720b
> Memory Limit: none
> ---[ end Kernel panic - not syncing: Asynchronous SError Interrupt ]---

[2]: boot snippet
> ACPI: PCI Root Bridge [PCI1] (domain 000d [bus 00-ff])
> acpi PNP0A08:01: _OSC: OS supports [ExtendedConfig ASPM ClockPM Segments MSI EDR HPX-Type3]
> acpi PNP0A08:01: _OSC: platform does not support [PCIeHotplug PME LTR DPC]
> acpi PNP0A08:01: _OSC: OS now controls [PCIeCapability]
> acpi PNP0A08:01: FADT indicates ASPM is unsupported, using BIOS configuration
> acpi PNP0A08:01: MCFG quirk: ECAM at [mem 0x37fff0000000-0x37ffffffffff] for [bus 00-ff] with pci_32b_read_ops
> acpi PNP0A08:01: ECAM area [mem 0x37fff0000000-0x37ffffffffff] reserved by PNP0C02:00
> acpi PNP0A08:01: ECAM at [mem 0x37fff0000000-0x37ffffffffff] for [bus 00-ff]
> PCI host bridge to bus 000d:00
> pci_bus 000d:00: root bus resource [mem 0x50000000-0x5fffffff window]
> pci_bus 000d:00: root bus resource [mem 0x340000000000-0x37ffdfffffff window]
> pci_bus 000d:00: root bus resource [bus 00-ff]
> pci 000d:00:00.0: [1def:e100] type 00 class 0x060000 conventional PCI endpoint
> pci 000d:00:01.0: [1def:e101] type 01 class 0x060400 PCIe Root Port
> pci 000d:00:01.0: PCI bridge to [bus 01-03]
> pci 000d:00:01.0:   bridge window [io  0xe000-0xefff]
> pci 000d:00:01.0:   bridge window [mem 0x50000000-0x502fffff]
> pci 000d:00:01.0:   bridge window [mem 0x340000000000-0x3400101fffff 64bit pref]
> pci 000d:00:01.0: supports D1 D2
> pci 000d:00:01.0: PME# supported from D0 D1 D3hot
> ...
> pci 000d:00:01.0: bridge window [mem 0x340000000000-0x340017ffffff 64bit pref]: assigned
> pci 000d:00:01.0: bridge window [mem 0x50000000-0x502fffff]: assigned
> pci 000d:00:01.0: bridge window [io  size 0x1000]: can't assign; no space
> pci 000d:00:01.0: bridge window [io  size 0x1000]: failed to assign
> pci 000d:00:01.0: bridge window [io  size 0x1000]: can't assign; no space
> pci 000d:00:01.0: bridge window [io  size 0x1000]: failed to assign