lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <8f281667-b4ef-9385-868f-93893b9d6611@linux.intel.com>
Date: Tue, 6 May 2025 18:53:56 +0300 (EEST)
From: Ilpo Järvinen <ilpo.jarvinen@...ux.intel.com>
To: Tudor Ambarus <tudor.ambarus@...aro.org>
cc: Bjorn Helgaas <bhelgaas@...gle.com>, linux-pci@...r.kernel.org, 
    Michał Winiarski <michal.winiarski@...el.com>, 
    Igor Mammedov <imammedo@...hat.com>, LKML <linux-kernel@...r.kernel.org>, 
    Mika Westerberg <mika.westerberg@...ux.intel.com>, 
    William McVicker <willmcvicker@...gle.com>
Subject: Re: [PATCH 24/25] PCI: Perform reset_resource() and build fail list
 in sync

On Tue, 6 May 2025, Tudor Ambarus wrote:

> Hi!
> 
> On 12/16/24 5:56 PM, Ilpo Järvinen wrote:
> > Resetting resource is problematic as it prevent attempting to allocate
> > the resource later, unless something in between restores the resource.
> > Similarly, if fail_head does not contain all resources that were reset,
> > those resource cannot be restored later.
> > 
> > The entire reset/restore cycle adds complexity and leaving resources
> > into reseted state causes issues to other code such as for checks done
> > in pci_enable_resources(). Take a small step towards not resetting
> > resources by delaying reset until the end of resource assignment and
> > build failure list (fail_head) in sync with the reset to avoid leaving
> > behind resources that cannot be restored (for the case where the caller
> > provides fail_head in the first place to allow restore somewhere in the
> > callchain, as is not all callers pass non-NULL fail_head).
> > 
> > The Expansion ROM check is temporarily left in place while building the
> > failure list until the upcoming change which reworks optional resource
> > handling.
> > 
> > Ideally, whole resource reset could be removed but doing that in a big
> > step would make the impact non-tractable due to complexity of all
> > related code.
> > 
> > Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@...ux.intel.com>
> 
> I'm hitting the BUG_ON(!list_empty(&add_list)); in
> pci_assign_unassigned_bus_resources() [1] with 6.15-rc5 and the the
> pixel6 downstream pcie driver.
> 
> I saw the thread where "a34d74877c66 PCI: Restore assigned resources
> fully after release" fixes things for some other cases, but it's not the
> case here.
> 
> Reverting the following patches fixes the problem:
> a34d74877c66 PCI: Restore assigned resources fully after release
> 2499f5348431 PCI: Rework optional resource handling
> 96336ec70264 PCI: Perform reset_resource() and build fail list in sync

So it's confirmed that you needed to revert also this last commit 
96336ec70264, not just the rework change?

> In the working case the add_list list is empty throughout the entire
> body of pci_assign_unassigned_bus_resources().
> 
> In the failing case __pci_bus_size_bridges() leaves the add_list not
> empty and __pci_bus_assign_resources() does not consume the list, thus
> the BUG_ON. The failing case contains an extra print that's not shown
> when reverting the blamed commits:
> [   13.951185][ T1101] pcieport 0000:00:00.0: bridge window [mem
> 0x00100000-0x001fffff] to [bus 01-ff] add_size 100000 add_align 100000
> 
> I've added some prints trying to describe the code path, see
> https://paste.ofcode.org/Aeu2YBpLztc49ZDw3uUJmd#
> 
> Failing case:
> [   13.944231][ T1101] pci 0000:01:00.0: [144d:a5a5] type 00 class
> 0x000000 PCIe Endpoint
> [   13.944412][ T1101] pci 0000:01:00.0: BAR 0 [mem
> 0x00000000-0x000fffff 64bit]
> [   13.944532][ T1101] pci 0000:01:00.0: ROM [mem 0x00000000-0x0000ffff
> pref]
> [   13.944649][ T1101] pci 0000:01:00.0: enabling Extended Tags
> [   13.944844][ T1101] pci 0000:01:00.0: PME# supported from D0 D3hot D3cold
> [   13.945015][ T1101] pci 0000:01:00.0: 15.752 Gb/s available PCIe
> bandwidth, limited by 8.0 GT/s PCIe x2 link at 0000:00:00.0 (capable of
> 31.506 Gb/s with 16.0 GT/s PCIe x2 link)
> [   13.950616][ T1101] __pci_bus_size_bridges: before pbus_size_mem.
> list empty? 1
> [   13.950784][ T1101] pbus_size_mem: 2. list empty? 1
> [   13.950886][ T1101] pbus_size_mem: 1 list empty? 0
> [   13.950982][ T1101] pbus_size_mem: 3. list empty? 0
> [   13.951082][ T1101] pbus_size_mem: 4. list empty? 0
> [   13.951185][ T1101] pcieport 0000:00:00.0: bridge window [mem
> 0x00100000-0x001fffff] to [bus 01-ff] add_size 100000 add_align 100000
> [   13.951448][ T1101] __pci_bus_size_bridges: after pbus_size_mem. list
> empty? 0
> [   13.951643][ T1101] pci_assign_unassigned_bus_resources: before
> __pci_bus_assign_resources -> list empty? 0
> [   13.951924][ T1101] pcieport 0000:00:00.0: bridge window [mem
> 0x40000000-0x401fffff]: assigned
> [   13.952248][ T1101] pci_assign_unassigned_bus_resources: after
> __pci_bus_assign_resources -> list empty? 0
> [   13.952634][ T1101] ------------[ cut here ]------------
> [   13.952818][ T1101] kernel BUG at drivers/pci/setup-bus.c:2514!
> [   13.953045][ T1101] Internal error: Oops - BUG: 00000000f2000800 [#1]
>  SMP
> ...
> [   13.976086][ T1101] Call trace:
> [   13.976206][ T1101]  pci_assign_unassigned_bus_resources+0x110/0x114 (P)
> [   13.976462][ T1101]  pci_rescan_bus+0x28/0x48
> [   13.976628][ T1101]  exynos_pcie_rc_poweron
> 
> Working case:
> [   13.786961][ T1120] pci 0000:01:00.0: [144d:a5a5] type 00 class
> 0x000000 PCIe Endpoint
> [   13.787136][ T1120] pci 0000:01:00.0: BAR 0 [mem
> 0x00000000-0x000fffff 64bit]
> [   13.787280][ T1120] pci 0000:01:00.0: ROM [mem 0x00000000-0x0000ffff
> pref]
> [   13.787541][ T1120] pci 0000:01:00.0: enabling Extended Tags
> [   13.787808][ T1120] pci 0000:01:00.0: PME# supported from D0 D3hot D3cold
> [   13.787988][ T1120] pci 0000:01:00.0: 15.752 Gb/s available PCIe
> bandwidth, limited by 8.0 GT/s PCIe x2 link at 0000:00:00.0 (capable of
> 31.506 Gb/s with 16.0 GT/s PCIe x2 link)
> [   13.795279][ T1120] __pci_bus_size_bridges: before pbus_size_mem.
> list empty? 1
> [   13.795408][ T1120] pbus_size_mem: 2. list empty? 1
> [   13.795495][ T1120] pbus_size_mem: 2. list empty? 1
> [   13.795577][ T1120] __pci_bus_size_bridges: after pbus_size_mem. list
> empty? 1
> [   13.795692][ T1120] pci_assign_unassigned_bus_resources: before
> __pci_bus_assign_resources -> list empty? 1
> [   13.795849][ T1120] pcieport 0000:00:00.0: bridge window [mem
> 0x40000000-0x401fffff]: assigned
> [   13.796072][ T1120] pci_assign_unassigned_bus_resources: after
> __pci_bus_assign_resources -> list empty? 1
> [   13.796662][ T1120] cpif: s5100_poweron_pcie: DBG: MSI sfr not set
> up, yet(s5100_pdev is NULL)
> [   13.796666][ T1120] cpif: register_pcie: s51xx_pcie_init start
> 
> 
> Any hints are welcomed. Thanks,
> ta

Hi and thanks for the report.

The interesting part occurs inside reassign_resources_sorted() where most 
items are eliminated from realloc_head by the list_del().

My guess is that somehow, the change in 96336ec70264 from !res->flags
to the more complicated check somehow causes this. If the new check 
doesn't match and subsequently, no match is found from the head list, the 
loop will do continue and not remove the entry from realloc_head.

But it's hard to confirm without knowing what that resources realloc_head 
contains. Perhaps if you print the resources that are processed around 
that part of the code in reassign_resources_sorted(), comparing the log 
from the reverted code with the non-working case might help to understand 
what is different there and why. To understand better what is in the head 
list, it would be also useful to know from which device the resources were 
added into the head list in pdev_sort_resources().


In any case, that BUG_ON() seems a bit drastic action for what might be 
just a single resource allocation failure so it should be downgraded to:

if (WARN_ON(!list_empty(&add_list))
	free_list(&add_list);
	
... or WARN_ON_ONCE().

-- 
 i.

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ