lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20250119152940.6yum3xnrvqx2xjme@thinkpad>
Date: Sun, 19 Jan 2025 20:59:40 +0530
From: Manivannan Sadhasivam <manivannan.sadhasivam@...aro.org>
To: Bjorn Helgaas <helgaas@...nel.org>
Cc: Krishna Chaitanya Chundru <quic_krichai@...cinc.com>, rafael@...nel.org,
	ulf.hansson@...aro.org, Johan Hovold <johan@...nel.org>,
	Kevin Xie <kevin.xie@...rfivetech.com>,
	Lorenzo Pieralisi <lpieralisi@...nel.org>,
	Krzysztof Wilczyński <kw@...ux.com>,
	Rob Herring <robh@...nel.org>, Bjorn Helgaas <bhelgaas@...gle.com>,
	Markus.Elfring@....de, quic_mrana@...cinc.com,
	m.szyprowski@...sung.com, linux-pm@...r.kernel.org,
	linux-pci@...r.kernel.org, linux-kernel@...r.kernel.org,
	regressions@...ts.linux.dev
Subject: Re: [PATCH v7 2/2] PCI: Enable runtime pm of the host bridge

On Tue, Jan 14, 2025 at 03:16:53PM -0600, Bjorn Helgaas wrote:
> On Mon, Jan 13, 2025 at 09:55:49PM +0530, Manivannan Sadhasivam wrote:
> > On Tue, Jan 07, 2025 at 03:27:59PM +0100, Johan Hovold wrote:
> > > On Tue, Jan 07, 2025 at 07:40:39PM +0530, Krishna Chaitanya Chundru wrote:
> > > > On 1/7/2025 6:49 PM, Johan Hovold wrote:
> > > 
> > > > >> @@ -3106,6 +3106,17 @@ int pci_host_probe(struct pci_host_bridge *bridge)
> > > > >>   		pcie_bus_configure_settings(child);
> > > > >>   
> > > > >>   	pci_bus_add_devices(bus);
> > > > >> +
> > > > >> +	/*
> > > > >> +	 * Ensure pm_runtime_enable() is called for the controller drivers,
> > > > >> +	 * before calling pci_host_probe() as pm frameworks expects if the
> > > > >> +	 * parent device supports runtime pm then it needs to enabled before
> > > > >> +	 * child runtime pm.
> > > > >> +	 */
> > > > >> +	pm_runtime_set_active(&bridge->dev);
> > > > >> +	pm_runtime_no_callbacks(&bridge->dev);
> > > > >> +	devm_pm_runtime_enable(&bridge->dev);
> > > > >> +
> > > > >>   	return 0;
> > > > >>   }
> > > > >>   EXPORT_SYMBOL_GPL(pci_host_probe);
> > > > > 
> > > > > I just noticed that this change in 6.13-rc1 is causing the
> > > > > following warning on resume from suspend on machines like the
> > > > > Lenovo ThinkPad X13s:
> > > 
> > > > Can you confirm if you are seeing this issue is seen in the
> > > > boot-up case also. As this part of the code executes only at the
> > > > boot time and will not have effect in resume from suspend.
> > > 
> > > No, I only see it during resume. And enabling runtime PM can (and
> > > in this case, obviously does) impact system suspend as well. 
> > > 
> > > > > 	pci0004:00: pcie4: Enabling runtime PM for inactive device with active children
> > > 
> > > > I believe this is not causing any functional issues.
> > > 
> > > It still needs to be fixed.
> > > 
> > > > > which may have unpopulated ports (this laptop SKU does not
> > > > > have a modem).
> > > 
> > > > Can you confirm if this warning goes away if there is some
> > > > endpoint connected to it.
> > > 
> > > I don't have anything to connect to the slot in this machine, but
> > > this seems to be the case as I do not see this warning for the
> > > populated slots, nor on the CRD reference design which has a modem
> > > on PCIe4.
> > 
> > Yes, this is only happening for unpopulated slots and the warning
> > shows up only if runtime PM is enabled for both PCI bridge and host
> > bridge. This patch enables the runtime PM for host bridge and if the
> > PCI bridge runtime PM is also enabled (only happens now for
> > ACPI/BIOS based platforms), then the warning shows up only if the
> > PCI bridge was RPM suspended (mostly happens if there was no device
> > connected) during the system wide resume time.
> > 
> > For the sake of reference, PCI host bridge is the parent of PCI
> > bridge.
> > 
> > Looking at where the warning gets triggered (in
> > pm_runtime_enable()), we have the below checks:
> > 
> > dev->power.runtime_status == RPM_SUSPENDED
> > !dev->power.ignore_children
> > atomic_read(&dev->power.child_count) > 0
> > 
> > When pm_runtime_enable() gets called for PCI host bridge:
> > 
> > dev->power.runtime_status = RPM_SUSPENDED
> > dev->power.ignore_children = 0
> > dev->power.child_count = 1
> > 
> > First 2 passes seem legit, but the issue is with the 3rd one. Here,
> > the child_count of 1 means that the PCI host bridge has an 'active'
> > child (which is the PCI bridge). The PCI bridge was supposed to be
> > RPM_SUSPENDED as the resume process should first resume the parent
> > (PCI host bridge). But this is not the case here.
> > 
> > Then looking at where the child_count gets incremented, it leads to
> > pm_runtime_set_active() of device_resume_noirq().
> > pm_runtime_set_active() is only called for a device if
> > dev_pm_skip_suspend() succeeds, which requires
> > DPM_FLAG_SMART_SUSPEND flag to be set and the device to be runtime
> > suspended.
> > 
> > This criteria matches for PCI bridge. So its status was set to
> > 'RPM_ACTIVE' even though the parent PCI host bridge was still in the
> > RPM_SUSPENDED state. I don't think this is a valid condition as seen
> > from the warning triggered for PCI host bridge when
> > pm_runtime_enable() is called from device_resume_early():
> > 
> > pci0004:00: pcie4: Enabling runtime PM for inactive device with
> > active children
> > 
> > I'm not sure of what the fix is in this case. But removing the
> > DPM_FLAG_SMART_SUSPEND flag from PCI bridge driver (portdrv) makes
> > the warning go away. This indicates that something is wrong with the
> > DPM_FLAG_SMART_SUSPEND flag handling in PM core.
> > 
> > Ulf/Rafael, thoughts?
> 
> What's the plan for this?  Does anybody have a proposal?
> 

TBH, I don't know how to fix this issue in a proper way. I need inputs from
Rafael/Ulf.

> IIUC there is no functional issue, but the new warning must be fixed,
> and it would sure be nice to do it before v6.13.  If there *is* a
> functional problem, we need to consider a revert ASAP.
> 

There is no functional problem that I'm aware of, so revert is not warranted.

- Mani

-- 
மணிவண்ணன் சதாசிவம்

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ