Message-ID: <61e8c93c-d096-4807-b2dd-a22657f2e06a@samsung.com>
Date: Thu, 15 Jan 2026 12:14:49 +0100
From: Marek Szyprowski <m.szyprowski@...sung.com>
To: Brian Norris <briannorris@...omium.org>
Cc: Bjorn Helgaas <helgaas@...nel.org>, Bjorn Helgaas <bhelgaas@...gle.com>,
Lukas Wunner <lukas@...ner.de>, linux-kernel@...r.kernel.org,
linux-pci@...r.kernel.org, linux-pm@...r.kernel.org, "Rafael J . Wysocki"
<rafael@...nel.org>, Ilpo Järvinen
<ilpo.jarvinen@...ux.intel.com>
Subject: Re: [PATCH v4] PCI/PM: Prevent runtime suspend before devices are
 fully initialized

Hi Brian,

On 14.01.2026 21:10, Brian Norris wrote:
> On Wed, Jan 14, 2026 at 10:46:41AM +0100, Marek Szyprowski wrote:
>> On 06.01.2026 23:27, Bjorn Helgaas wrote:
>>> On Thu, Oct 23, 2025 at 02:09:01PM -0700, Brian Norris wrote:
>>>> Today, it's possible for a PCI device to be created and
>>>> runtime-suspended before it is fully initialized. When that happens, the
>>>> device will remain in D0, but the suspend process may save an
>>>> intermediate version of that device's state -- for example, without
>>>> appropriate BAR configuration. When the device later resumes, we'll
>>>> restore invalid PCI state and the device may not function.
>>>>
>>>> Prevent runtime suspend for PCI devices by deferring pm_runtime_enable()
>>>> until we've fully initialized the device.
> ...
>> This patch landed recently in linux-next as commit c796513dc54e
>> ("PCI/PM: Prevent runtime suspend until devices are fully initialized").
>> In my tests I found that it sometimes causes the "pci 0000:01:00.0:
>> runtime PM trying to activate child device 0000:01:00.0 but parent
>> (0000:00:00.0) is not active" warning on Qualcomm Robotics RB5 board
>> (arch/arm64/boot/dts/qcom/qrb5165-rb5.dts). This in turn causes a
>> lockdep warning about console lock, but this is just a consequence of
>> the runtime pm warning. Reverting $subject patch on top of current
>> linux-next hides this warning.
>>
>> Here is a kernel log:
>>
>> pci 0000:01:00.0: [17cb:1101] type 00 class 0xff0000 PCIe Endpoint
>> pci 0000:01:00.0: BAR 0 [mem 0x00000000-0x000fffff 64bit]
>> pci 0000:01:00.0: PME# supported from D0 D3hot D3cold
>> pci 0000:01:00.0: 4.000 Gb/s available PCIe bandwidth, limited by 5.0
>> GT/s PCIe x1 link at 0000:00:00.0 (capable of 7.876 Gb/s with 8.0 GT/s
>> PCIe x1 link)
>> pci 0000:01:00.0: Adding to iommu group 13
>> pci 0000:01:00.0: ASPM: default states L0s L1
>> pcieport 0000:00:00.0: bridge window [mem 0x60400000-0x604fffff]: assigned
>> pci 0000:01:00.0: BAR 0 [mem 0x60400000-0x604fffff 64bit]: assigned
>> pci 0000:01:00.0: runtime PM trying to activate child device
>> 0000:01:00.0 but parent (0000:00:00.0) is not active
> Thanks for the report. I'll try to look at reproducing this, or at least
> getting a better mental model of exactly why this might fail (or,
> "warn") this way. But if you have the time and desire to try things out
> for me, can you give v1 a try?
>
> https://lore.kernel.org/all/20251016155335.1.I60a53c170a8596661883bd2b4ef475155c7aa72b@changeid/
>
> I'm pretty sure it would not invoke the same problem.
Right, this one works fine.
> I also suspect v3
> might not, but I'm less sure:
>
> https://lore.kernel.org/all/20251022141434.v3.1.I60a53c170a8596661883bd2b4ef475155c7aa72b@changeid/
This one works too; at least I was not able to reproduce any failure.
>> ======================================================
>> WARNING: possible circular locking dependency detected
>> 6.19.0-rc1+ #16398 Not tainted
>> ------------------------------------------------------
>> kworker/3:0/33 is trying to acquire lock:
>> ffffcd182ff1ae98 (console_owner){..-.}-{0:0}, at:
>> console_lock_spinning_enable+0x44/0x78
>>
>> but task is already holding lock:
>> ffff0000835c5238 (&dev->power.lock/1){....}-{3:3}, at:
>> __pm_runtime_set_status+0x240/0x384
>>
>> which lock already depends on the new lock.
> The lockdep warning is a bit messier, and I'd also have to take some
> more time to be sure, but in principle, this sounds like a totally
> orthogonal problem. It seems like simply performing printk() to a qcom
> UART in the "wrong" context is enough to cause this. If so, that's
> definitely a console/UART bug (or maybe a lockdep false positive) and
> not a PCI/runtime-PM bug.
Yes, the lockdep warning is not really a problem; it is just a
consequence of printing the "runtime PM trying to activate child
device 0000:01:00.0 but parent (0000:00:00.0) is not active" message.
However, that message is itself a problem, imho.
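
For reference, a minimal userspace sketch of the condition that, as far
as I understand it, produces this message in __pm_runtime_set_status().
This is a simplified model and not the kernel code: the model_* names are
hypothetical, the fields only loosely mirror struct dev_pm_info, and all
locking is left out.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Userspace model only; not the kernel's types. */
enum model_rpm_status { MODEL_RPM_ACTIVE, MODEL_RPM_SUSPENDED };

struct model_dev {
	const char *name;
	struct model_dev *parent;
	int disable_depth;		/* 0 means runtime PM is enabled */
	bool ignore_children;
	enum model_rpm_status status;
	int child_count;		/* number of active children */
};

/*
 * Model of marking a device active. The request is refused when the
 * parent has runtime PM enabled, does not ignore its children, and is
 * not active itself. Returns 0 on success or -EBUSY on refusal.
 */
static int model_set_active(struct model_dev *dev)
{
	struct model_dev *parent;

	assert(dev != NULL);
	parent = dev->parent;

	if (dev->status == MODEL_RPM_ACTIVE)
		return 0;

	if (parent) {
		if (!parent->disable_depth && !parent->ignore_children &&
		    parent->status != MODEL_RPM_ACTIVE) {
			fprintf(stderr,
				"%s: runtime PM trying to activate child device %s but parent (%s) is not active\n",
				dev->name, dev->name, parent->name);
			return -EBUSY;
		}
		/* The parent now has one more active child. */
		parent->child_count++;
	}

	dev->status = MODEL_RPM_ACTIVE;
	return 0;
}
```

In this model, a suspended port with runtime PM enabled makes the child's
activation fail with -EBUSY, which corresponds to the pcieport/endpoint
pair in the log above.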
>> (...)
>>
>> This looks a bit similar to the issue reported some time ago on a
>> different board:
>>
>> https://lore.kernel.org/all/6d438995-4d6d-4a21-9ad2-8a0352482d44@samsung.com/
> Huh, yeah, the lockdep warning is rather similar looking. So that bug
> (whether real or false positive) may have been around a while.
>
> And the "Enabling runtime PM for inactive device with active children"
> log is similar, but it involves a different set of devices -- now we're
> dealing with the PCIe port and child device, whereas that report was
> about the host bridge/controller device.
Okay, so this is a slightly different case. At least it confirms that
the lockdep issue is not really a problem.
Best regards
--
Marek Szyprowski, PhD
Samsung R&D Institute Poland