Message-ID: <86a58291-b7f7-477d-89b5-39690b9ef371@linaro.org>
Date: Mon, 2 Jun 2025 13:32:34 +0100
From: Tudor Ambarus <tudor.ambarus@...aro.org>
To: Ilpo Järvinen <ilpo.jarvinen@...ux.intel.com>
Cc: Bjorn Helgaas <bhelgaas@...gle.com>, linux-pci@...r.kernel.org,
Michał Winiarski <michal.winiarski@...el.com>,
Igor Mammedov <imammedo@...hat.com>, LKML <linux-kernel@...r.kernel.org>,
Mika Westerberg <mika.westerberg@...ux.intel.com>,
William McVicker <willmcvicker@...gle.com>
Subject: Re: [PATCH 24/25] PCI: Perform reset_resource() and build fail list in sync
On 5/30/25 7:38 AM, Ilpo Järvinen wrote:
cut
>>>> Reverting the following patches fixes the problem:
>>>> a34d74877c66 PCI: Restore assigned resources fully after release
>>>> 2499f5348431 PCI: Rework optional resource handling
>>>> 96336ec70264 PCI: Perform reset_resource() and build fail list in sync
>>>
>>> So it's confirmed that you needed to revert also this last commit
>>> 96336ec70264, not just the rework change?
>>
>> I needed to revert 96336ec70264 as well; otherwise the build fails.
>
> Hi again,
Hi!
cut
>
> The missing helper is basically this:
cut
I used the following:
+static bool pci_resource_is_disabled_rom(const struct resource *res,
+					 int resno)
+{
+	return resno == PCI_ROM_RESOURCE &&
+	       !(res->flags & IORESOURCE_ROM_ENABLE);
+}
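For reference, a minimal userspace sketch of what this helper checks. It assumes mock definitions of struct resource, PCI_ROM_RESOURCE (6) and IORESOURCE_ROM_ENABLE (bit 0), written to mirror include/linux/pci.h and include/linux/ioport.h; check those headers rather than relying on these values. The helper only returns true for the expansion ROM slot while its enable bit is clear.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Userspace stand-ins (assumed values) for the kernel definitions in
 * include/linux/pci.h and include/linux/ioport.h.
 */
#define PCI_ROM_RESOURCE      6
#define IORESOURCE_ROM_ENABLE (1UL << 0)

/* Minimal mock of struct resource: only the flags field is needed here. */
struct resource {
	unsigned long flags;
};

/* True only for the expansion ROM slot when its enable bit is clear. */
static bool pci_resource_is_disabled_rom(const struct resource *res,
					 int resno)
{
	return resno == PCI_ROM_RESOURCE &&
	       !(res->flags & IORESOURCE_ROM_ENABLE);
}
```

With these mocks, BAR 0 and an enabled ROM report false, while a ROM resource without IORESOURCE_ROM_ENABLE set reports true.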
>
> Because of this, the actual culprit could be in 2499f5348431 rather than in
> 96336ec70264, which would make more sense as 2499f5348431 significantly
> reworks the assignment algorithm.
With the above helper, I can confirm that the problem is indeed in 2499f5348431.
cut
>> I added the suggested prints
>> (https://paste.ofcode.org/DgmZGGgS6D36nWEzmfCqMm) on top of v6.15 with
>> the downstream Pixel PCIe driver and got the following. Note that
>> all added prints contain "tudor" so they can be told apart.
>>
>> [ 15.211179][ T1107] pci 0001:01:00.0: [144d:a5a5] type 00 class 0x000000 PCIe Endpoint
>> [ 15.212248][ T1107] pci 0001:01:00.0: BAR 0 [mem 0x00000000-0x000fffff 64bit]
>> [ 15.212775][ T1107] pci 0001:01:00.0: ROM [mem 0x00000000-0x0000ffff pref]
>> [ 15.213195][ T1107] pci 0001:01:00.0: enabling Extended Tags
>> [ 15.213720][ T1107] pci 0001:01:00.0: PME# supported from D0 D3hot D3cold
>> [ 15.214035][ T1107] pci 0001:01:00.0: 15.752 Gb/s available PCIe bandwidth, limited by 8.0 GT/s PCIe x2 link at 0001:00:00.0 (capable of 31.506 Gb/s with 16.0 GT/s PCIe x2 link)
>> [ 15.222286][ T1107] pci 0001:01:00.0: tudor: 1: pbus_size_mem: BAR 0 [mem 0x00000000-0x000fffff 64bit] list empty? 1
>> [ 15.222813][ T1107] pci 0001:01:00.0: tudor: 1: pbus_size_mem: ROM [mem 0x00000000-0x0000ffff pref] list empty? 1
>> [ 15.224429][ T1107] pci 0001:01:00.0: tudor: 2: pbus_size_mem: ROM [mem 0x00000000-0x0000ffff pref] list empty? 0
>> [ 15.224750][ T1107] pcieport 0001:00:00.0: bridge window [mem 0x00100000-0x001fffff] to [bus 01-ff] add_size 100000 add_align 100000
>>
>> [ 15.225393][ T1107] tudor : pci_assign_unassigned_bus_resources: before __pci_bus_assign_resources -> list empty? 0
>> [ 15.225594][ T1107] pcieport 0001:00:00.0: tudor: pdev_sort_resources: bridge window [mem 0x00100000-0x001fffff] resource added in head list
>> [ 15.226078][ T1107] pcieport 0001:00:00.0: bridge window [mem 0x40000000-0x401fffff]: assigned
>
> So it ends up assigning the resource here, I think.
>
>
> That print isn't one of yours in reassign_resources_sorted(), so the
> assignment must have been made in assign_requested_resources_sorted(). But
> then nothing is printed from reassign_resources_sorted(), so I suspect
> __assign_resources_sorted() has short-circuited.
>
> We know that realloc_head is not empty, so that leaves the goto out from
> if (list_empty(&local_fail_head)), which kind of makes sense: all
> entries on the head list were assigned. But the code there tries to remove
> all head-list resources from realloc_head, so why this one doesn't get
> removed is still a mystery. assign_requested_resources_sorted() doesn't
> seem to remove anything from the head list, so AFAICT that resource should
> still be on the head list and it should call
> remove_from_list(realloc_head, dev_res->res) for it.
>
> So can you check whether that theory holds water, i.e. whether it
> short-circuits without removing the entry from realloc_head?
>
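To make the theory above concrete, here is a heavily simplified model of the early exit in __assign_resources_sorted(). It is not the kernel code: the array-based struct res_list, list_add_res(), list_contains() and assign_sorted_model() are hypothetical stand-ins for the list_head-based lists, and only the pointer comparison that remove_from_list() performs is modelled. When the fail list is empty, every head-list resource is dropped from realloc_head before returning early.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily simplified model; not the kernel's data structures. */
struct resource {
	const char *name;
};

#define MAX_ENTRIES 8

/* A list is modelled as a small array of resource pointers. */
struct res_list {
	struct resource *res[MAX_ENTRIES];
	size_t len;
};

static void list_add_res(struct res_list *l, struct resource *r)
{
	assert(l->len < MAX_ENTRIES);
	l->res[l->len++] = r;
}

/* Like remove_from_list(): drop the first entry whose pointer matches r. */
static void remove_from_list(struct res_list *l, const struct resource *r)
{
	for (size_t i = 0; i < l->len; i++) {
		if (l->res[i] == r) {
			/* Shift the tail down over the removed slot. */
			for (size_t j = i + 1; j < l->len; j++)
				l->res[j - 1] = l->res[j];
			l->len--;
			return;
		}
	}
}

static bool list_contains(const struct res_list *l, const struct resource *r)
{
	for (size_t i = 0; i < l->len; i++)
		if (l->res[i] == r)
			return true;
	return false;
}

/*
 * Model of the early exit: if nothing landed on the fail list, remove every
 * head-list entry from realloc_head and return true (short-circuited).
 * Otherwise return false; the real code would go on to reassign.
 */
static bool assign_sorted_model(const struct res_list *head,
				struct res_list *realloc_head,
				const struct res_list *fail_head)
{
	if (fail_head->len == 0) {
		for (size_t i = 0; i < head->len; i++)
			remove_from_list(realloc_head, head->res[i]);
		return true;
	}
	return false;
}
```

In this model, a bridge window that is on the head list is always removed from realloc_head on the early-exit path, so an entry that survives must point at a resource that is not on the head list (or the early exit is not the path actually taken).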
cut. I saw your other reply. I'll look into both a bit and respond there
directly.
>>
>>> In any case, that BUG_ON() seems a bit drastic for what might be
>>> just a single resource allocation failure, so it should be downgraded to:
>>>
>>> if (WARN_ON(!list_empty(&add_list)))
>>> 	free_list(&add_list);
>>>
>>> ... or WARN_ON_ONCE().
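A minimal userspace sketch of the suggested pattern: warn and clean up instead of crashing. The warn_on() function, the struct add_list counter and finish_assignment() are hypothetical mocks. The real WARN_ON() also dumps a stack trace and taints the kernel, and WARN_ON_ONCE() reports only the first hit per call site. Like the kernel macro, the mock returns the condition so it can guard the cleanup.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/*
 * Userspace stand-in for WARN_ON(): report the condition and return its
 * truth value so the caller can recover. This mock only prints.
 */
static bool warn_on(bool cond, const char *expr, const char *file, int line)
{
	if (cond)
		fprintf(stderr, "WARNING: %s at %s:%d\n", expr, file, line);
	return cond;
}
#define WARN_ON(cond) warn_on(!!(cond), #cond, __FILE__, __LINE__)

/* Hypothetical stand-in for add_list: just counts leftover entries. */
struct add_list {
	int pending;
};

static void free_list(struct add_list *l)
{
	l->pending = 0;	/* pretend to free every leftover entry */
}

/* Returns the number of leftover entries that had to be cleaned up. */
static int finish_assignment(struct add_list *l)
{
	int leftover = l->pending;

	/* Instead of BUG_ON(), warn, clean up, and carry on. */
	if (WARN_ON(l->pending != 0))
		free_list(l);
	return leftover;
}
```

An empty list passes silently; a non-empty one prints a warning, gets freed, and execution continues.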
>>
>> I saw your patch doing this. The phone now boots, but I obviously still
>> see the WARN, so maybe there's still something to be fixed.
>
cut
> Now that it boots, can you please check whether /proc/iomem is the same in
> both the non-working and working configs? If that resource got assigned
> successfully, it might well be that there are no actual differences in the
> assigned resources (which, again, doesn't mean there isn't a bug in the
> logic discussed above).
I confirm that /proc/iomem is identical between the no-revert case with
WARN_ON_ONCE() and the case with the blamed commit reverted.