linux-kernel - Re: [PATCH 24/25] PCI: Perform reset_resource() and build fail list in sync

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <78aed75b-137b-b48e-3699-20fcde248680@linux.intel.com>
Date: Tue, 24 Jun 2025 15:48:36 +0300 (EEST)
From: Ilpo Järvinen <ilpo.jarvinen@...ux.intel.com>
To: D Scott Phillips <scott@...amperecomputing.com>
cc: Bjorn Helgaas <bhelgaas@...gle.com>, linux-pci@...r.kernel.org, 
    Michał Winiarski <michal.winiarski@...el.com>, 
    Igor Mammedov <imammedo@...hat.com>, LKML <linux-kernel@...r.kernel.org>, 
    Mika Westerberg <mika.westerberg@...ux.intel.com>
Subject: Re: [PATCH 24/25] PCI: Perform reset_resource() and build fail list
 in sync

On Wed, 18 Jun 2025, D Scott Phillips wrote:

> Ilpo Järvinen <ilpo.jarvinen@...ux.intel.com> writes:
> 
> > Resetting resource is problematic as it prevent attempting to allocate
> > the resource later, unless something in between restores the resource.
> > Similarly, if fail_head does not contain all resources that were reset,
> > those resource cannot be restored later.
> >
> > The entire reset/restore cycle adds complexity and leaving resources
> > into reseted state causes issues to other code such as for checks done
> > in pci_enable_resources(). Take a small step towards not resetting
> > resources by delaying reset until the end of resource assignment and
> > build failure list (fail_head) in sync with the reset to avoid leaving
> > behind resources that cannot be restored (for the case where the caller
> > provides fail_head in the first place to allow restore somewhere in the
> > callchain, as is not all callers pass non-NULL fail_head).
> >
> > The Expansion ROM check is temporarily left in place while building the
> > failure list until the upcoming change which reworks optional resource
> > handling.
> >
> > Ideally, whole resource reset could be removed but doing that in a big
> > step would make the impact non-tractable due to complexity of all
> > related code.
> >
> > Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@...ux.intel.com>
> 
> Hi Ilpo, I'm seeing a crash on arm64 at boot that I bisected to this
> change. I don't think it's the same as the other issues reported. I've
> confirmed the crash is still there after your follow up patches.  The
> crash itself is below[1].

Thanks for the report. I'm sorry about the delay, I was away a few days
(and hoping all the gremlins in that resource change would already have 
been finally solved but it seems my hopes were premature).

> It looks like the problem begins when:
> 
> amdgpu_device_resize_fb_bar()
>  pci_resize_resource()
>   pci_reassign_bridge_resources()
>    __pci_bus_size_bridges()
> 
> adds pci_hotplug_io_size to `realloc_head`. The io resource allocation
> has failed earlier because the root port doesn't have an io window[2].
> 
> Then with this patch, pci_reassign_bridge_resources()'s call to
> __pci_bridge_assign_resources() now returns the io added space for
> hotplug in the `failed` list where the old code dropped it and did not.
> 
> That sends pci_reassign_bridge_resources() into the `cleanup:` path,
> where I think the cleanup code doesn't properly release the resources
> that were assigned by __pci_bridge_assign_resources() and there's a
> conflict reported in pci_claim_resource() where a restored resource is
> found as conflicting with itself:
> 
> > pcieport 000d:00:01.0: bridge window [mem 0x340000000000-0x340017ffffff 64bit pref]: can't claim; address conflict with PCI Bus 000d:01 [mem 0x340000000000-0x340017ffffff 64bit pref]
> 
> Setting `pci=hpiosize=0` avoids this crash, as does this change:
> 
> diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c
> index 16d5d390599a..59ece11702da 100644
> --- a/drivers/pci/setup-bus.c
> +++ b/drivers/pci/setup-bus.c
> @@ -2442,7 +2442,7 @@ int pci_reassign_bridge_resources(struct pci_dev *bridge, unsigned long type)
>  	LIST_HEAD(saved);
>  	LIST_HEAD(added);
>  	LIST_HEAD(failed);
> -	unsigned int i;
> +	unsigned int i, relevant_fails;
>  	int ret;
>  
>  	down_read(&pci_bus_sem);
> @@ -2490,7 +2490,16 @@ int pci_reassign_bridge_resources(struct pci_dev *bridge, unsigned long type)
>  	__pci_bridge_assign_resources(bridge, &added, &failed);
>  	BUG_ON(!list_empty(&added));
>  
> -	if (!list_empty(&failed)) {
> +	relevant_fails = 0;
> +	list_for_each_entry(dev_res, &failed, list) {
> +		restore_dev_resource(dev_res);
> +		if (((dev_res->res->flags ^ type) & PCI_RES_TYPE_MASK) == 0)
> +			relevant_fails++;
> +	}
> +	free_list(&failed);
> +
> +	/* Cleanup if we had failures in resources of interest */
> +	if (relevant_fails != 0) {
>  		ret = -ENOSPC;
>  		goto cleanup;
>  	}
> @@ -2509,11 +2518,6 @@ int pci_reassign_bridge_resources(struct pci_dev *bridge, unsigned long type)
>  	return 0;
>  
>  cleanup:
> -	/* Restore size and flags */
> -	list_for_each_entry(dev_res, &failed, list)
> -		restore_dev_resource(dev_res);
> -	free_list(&failed);
> -
>  	/* Revert to the old configuration */
>  	list_for_each_entry(dev_res, &saved, list) {
>  		struct resource *res = dev_res->res;
> 
> I don't know this code well enough to know if that changes is completely
> bonkers or what.

It will take me a bit of time to assimilate all the information too, this 
code has astronomical complexity and a failure due to resize is yet 
another angle to it. I'll get back to you once I've figured more out.

> Maybe a change that gets zero as pci_hotplug_io_size
> for root ports that don't have an io window would be better?

Similar to this, I personally don't like having a non-zero IO window for 
the cases where there is no io ports required for any devices. On dual 
socket systems here, it always leads to exhausting the io port space and 
the later probed devices that would actually have io port resources will 
not get them. While those failures are not critical/important in my case, 
it seems pointless to allocate IO window just for that sake of potential 
hotplug, hotplug could just get it when there's a true need for the IO
window (and nothing bogusly reserved all io port space).

So IMO, a patch to change the default hotplug IO window to zero would 
make sense which effectively matches what you provided on the cmdline. But 
this is said before fully understanding what goes wrong on the resize 
path, it may need corrections too. But regardless of that, I think zeroing 
HP IO window is worthwhile on its own right to avoid exhausting the IO 
port space for nothing.

> Any other
> ideas, or other information about the crash that I could provide?

I suppose the actual crash itself is because the driver does make some 
invalid assumption about the resources without checking the resources 
first (I've not had time to locate and check the driver code yet).

> [1]: Crash:
> > SError Interrupt on CPU0, code 0x00000000be000411 -- SError
> > CPU: 0 UID: 0 PID: 9 Comm: kworker/0:0 Not tainted 6.15.2+ #16 PREEMPT(undef)
> > Hardware name: Adlink Ampere Altra Developer Platform/Ampere Altra Developer Platform, BIOS TianoCore 2.09.100.00 (SYS: 2.10.20221028) 04/30/2
> > Workqueue: events work_for_cpu_fn
> > pstate: 204000c9 (nzCv daIF +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
> > pc : _raw_spin_lock_irqsave+0x34/0xb0
> > lr : __wake_up+0x30/0x80
> > sp : ffff800080003e20
> > x29: ffff800080003e20 x28: ffff07ff808b0000 x27: 0000000000050260
> > x26: 0000000000000001 x25: ffffcc5decad6bd8 x24: ffffcc5decc8b5b8
> > x23: ffff080138a10000 x22: ffff080138a00000 x21: 0000000000000003
> > x20: ffff080138a14dc8 x19: 0000000000000000 x18: 006808018e775f03
> > x17: ffff3bc1330d2000 x16: ffffcc5e3a520ed8 x15: 8daad055c6e77021
> > x14: f231631cf9328575 x13: 0e51168d06a6cf91 x12: f5db8c23b764520c
> > x11: 0000000000000040 x10: 0000000000000000 x9 : ffffcc5e3a520f08
> > x8 : 0000000000002113 x7 : 0000000000000000 x6 : 0000000000000000
> > x5 : 0000000000000000 x4 : ffff800080003e08 x3 : 00000000000000c0
> > x2 : 0000000000000001 x1 : 0000000000000000 x0 : ffff080138a14dc8
> > Kernel panic - not syncing: Asynchronous SError Interrupt
> > CPU: 0 UID: 0 PID: 9 Comm: kworker/0:0 Not tainted 6.15.2+ #16 PREEMPT(undef)
> > Hardware name: Adlink Ampere Altra Developer Platform/Ampere Altra Developer Platform, BIOS TianoCore 2.09.100.00 (SYS: 2.10.20221028) 04/30/2
> > Workqueue: events work_for_cpu_fn
> > Call trace:
> >  show_stack+0x30/0x98 (C)
> >  dump_stack_lvl+0x7c/0xa0
> >  dump_stack+0x18/0x2c
> >  panic+0x184/0x3c0
> >  nmi_panic+0x90/0xa0
> >  arm64_serror_panic+0x6c/0x90
> >  do_serror+0x58/0x60
> >  el1h_64_error_handler+0x38/0x60
> >  el1h_64_error+0x84/0x88
> >  _raw_spin_lock_irqsave+0x34/0xb0 (P)
> >  amdgpu_ih_process+0xf0/0x150 [amdgpu]
> >  amdgpu_irq_handler+0x34/0xa0 [amdgpu]
> >  __handle_irq_event_percpu+0x60/0x248
> >  handle_irq_event+0x4c/0xc0
> >  handle_fasteoi_irq+0xa0/0x1c8
> >  handle_irq_desc+0x3c/0x68
> >  generic_handle_domain_irq+0x24/0x40
> >  __gic_handle_irq_from_irqson.isra.0+0x15c/0x260
> >  gic_handle_irq+0x28/0x80
> >  call_on_irq_stack+0x24/0x30
> >  do_interrupt_handler+0x88/0xa0
> >  el1_interrupt+0x44/0xd0
> >  el1h_64_irq_handler+0x18/0x28
> >  el1h_64_irq+0x84/0x88
> >  amdgpu_device_rreg.part.0+0x4c/0x190 [amdgpu] (P)
> >  amdgpu_device_rreg+0x24/0x40 [amdgpu]
> >  psp_wait_for+0x88/0xd8 [amdgpu]
> >  psp_v11_0_bootloader_load_component+0x164/0x1b0 [amdgpu]
> >  psp_v11_0_bootloader_load_kdb+0x20/0x40 [amdgpu]
> >  psp_hw_start+0x5c/0x580 [amdgpu]
> >  psp_load_fw+0x9c/0x280 [amdgpu]
> >  psp_hw_init+0x44/0xa0 [amdgpu]
> >  amdgpu_device_fw_loading+0xf8/0x358 [amdgpu]
> >  amdgpu_device_ip_init+0x684/0x990 [amdgpu]
> >  amdgpu_device_init+0xba4/0x1038 [amdgpu]
> >  amdgpu_driver_load_kms+0x20/0xb8 [amdgpu]
> >  amdgpu_pci_probe+0x1f8/0x7f8 [amdgpu]
> >  local_pci_probe+0x44/0xb0
> >  work_for_cpu_fn+0x24/0x40
> >  process_one_work+0x17c/0x410
> >  worker_thread+0x254/0x388
> >  kthread+0x10c/0x128
> >  ret_from_fork+0x10/0x20
> > Kernel Offset: 0x4c5dba380000 from 0xffff800080000000
> > PHYS_OFFSET: 0x80000000
> > CPU features: 0x0800,000042e0,01202650,8241720b
> > Memory Limit: none
> > ---[ end Kernel panic - not syncing: Asynchronous SError Interrupt ]---
> 
> [2]: boot snippet
> > ACPI: PCI Root Bridge [PCI1] (domain 000d [bus 00-ff])
> > acpi PNP0A08:01: _OSC: OS supports [ExtendedConfig ASPM ClockPM Segments MSI EDR HPX-Type3]
> > acpi PNP0A08:01: _OSC: platform does not support [PCIeHotplug PME LTR DPC]
> > acpi PNP0A08:01: _OSC: OS now controls [PCIeCapability]
> > acpi PNP0A08:01: FADT indicates ASPM is unsupported, using BIOS configuration
> > acpi PNP0A08:01: MCFG quirk: ECAM at [mem 0x37fff0000000-0x37ffffffffff] for [bus 00-ff] with pci_32b_read_ops
> > acpi PNP0A08:01: ECAM area [mem 0x37fff0000000-0x37ffffffffff] reserved by PNP0C02:00
> > acpi PNP0A08:01: ECAM at [mem 0x37fff0000000-0x37ffffffffff] for [bus 00-ff]
> > PCI host bridge to bus 000d:00
> > pci_bus 000d:00: root bus resource [mem 0x50000000-0x5fffffff window]
> > pci_bus 000d:00: root bus resource [mem 0x340000000000-0x37ffdfffffff window]
> > pci_bus 000d:00: root bus resource [bus 00-ff]
> > pci 000d:00:00.0: [1def:e100] type 00 class 0x060000 conventional PCI endpoint
> > pci 000d:00:01.0: [1def:e101] type 01 class 0x060400 PCIe Root Port
> > pci 000d:00:01.0: PCI bridge to [bus 01-03]
> > pci 000d:00:01.0:   bridge window [io  0xe000-0xefff]
> > pci 000d:00:01.0:   bridge window [mem 0x50000000-0x502fffff]
> > pci 000d:00:01.0:   bridge window [mem 0x340000000000-0x3400101fffff 64bit pref]
> > pci 000d:00:01.0: supports D1 D2
> > pci 000d:00:01.0: PME# supported from D0 D1 D3hot
> > ...
> > pci 000d:00:01.0: bridge window [mem 0x340000000000-0x340017ffffff 64bit pref]: assigned
> > pci 000d:00:01.0: bridge window [mem 0x50000000-0x502fffff]: assigned
> > pci 000d:00:01.0: bridge window [io  size 0x1000]: can't assign; no space
> > pci 000d:00:01.0: bridge window [io  size 0x1000]: failed to assign
> > pci 000d:00:01.0: bridge window [io  size 0x1000]: can't assign; no space
> > pci 000d:00:01.0: bridge window [io  size 0x1000]: failed to assign
> 

-- 
 i.