[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <ZLqwkvoOszggGhLK@lizhi-Precision-Tower-5810>
Date: Fri, 21 Jul 2023 12:21:38 -0400
From: Frank Li <Frank.li@....com>
To: Manivannan Sadhasivam <mani@...nel.org>
Cc: Shawn Lin <shawn.lin@...k-chips.com>,
Manivannan Sadhasivam <manivannan.sadhasivam@...aro.org>,
helgaas@...nel.org, imx@...ts.linux.dev, bhelgaas@...gle.com,
devicetree@...r.kernel.org, gustavo.pimentel@...opsys.com,
kw@...ux.com, leoyang.li@....com,
linux-arm-kernel@...ts.infradead.org, linux-imx@....com,
linux-kernel@...r.kernel.org, linux-pci@...r.kernel.org,
lorenzo.pieralisi@....com, minghuan.lian@....com,
mingkai.hu@....com, robh+dt@...nel.org, roy.zang@....com,
shawnguo@...nel.org, zhiqiang.hou@....com
Subject: Re: [PATCH v3 1/2] PCI: dwc: Implement general suspend/resume
functionality for L2/L3 transitions
On Fri, Jul 21, 2023 at 09:37:55PM +0530, Manivannan Sadhasivam wrote:
> On Fri, Jul 21, 2023 at 10:10:11AM -0400, Frank Li wrote:
> > On Fri, Jul 21, 2023 at 10:09:18AM +0800, Shawn Lin wrote:
> > >
> > > On 2023/7/21 0:07, Manivannan Sadhasivam wrote:
> > > > On Thu, Jul 20, 2023 at 10:37:36AM -0400, Frank Li wrote:
> > > > > On Thu, Jul 20, 2023 at 07:55:09PM +0530, Manivannan Sadhasivam wrote:
> > > > > > On Tue, Jul 18, 2023 at 03:34:26PM +0530, Manivannan Sadhasivam wrote:
> > > > > > > On Mon, Jul 17, 2023 at 02:36:19PM -0400, Frank Li wrote:
> > > > > > > > On Mon, Jul 17, 2023 at 10:15:26PM +0530, Manivannan Sadhasivam wrote:
> > > > > > > > > On Wed, Apr 19, 2023 at 12:41:17PM -0400, Frank Li wrote:
> > > > > > > > > > Introduced helper function dw_pcie_get_ltssm to retrieve SMLH_LTSS_STATE.
> > > > > > > > > > Added API pme_turn_off and exit_from_l2 for managing L2/L3 state transitions.
> > > > > > > > > >
> > > > > > > > > > Typical L2 entry workflow:
> > > > > > > > > >
> > > > > > > > > > 1. Transmit PME turn off signal to PCI devices.
> > > > > > > > > > 2. Await link entering L2_IDLE state.
> > > > > > > > >
> > > > > > > > > AFAIK, typical workflow is to wait for PME_To_Ack.
> > > > > > > >
> > > > > > > > 1 Already wait for PME_to_ACK, 2, just wait for link actual enter L2.
> > > > > > > > I think PCI RC needs some time to set link enter L2 after get ACK from
> > > > > > > > PME.
> > > > > > > >
> > > > > >
> > > > > > One more comment. If you transition the device to L2/L3, then it can loose power
> > > > > > if Vaux was not provided. In that case, can all the devices work after resume?
> > > > > > Most notably NVMe?
> > > > >
> > > > > I have not hardware to do such test, NVMe driver will reinit everything after
> > > > > resume if no L1.1\L1.2 support. If there are L1.1\L1.2, NVME expect it leave
> > > > > at L1.2 at suspend to get better resume latency.
> > > > >
> > > >
> > > > To be precise, NVMe driver will shutdown the device if there is no ASPM support
> > > > and keep it in low power mode otherwise (there are other cases as well but we do
> > > > not need to worry).
> > > >
> > > > But here you are not checking for ASPM state in the suspend path, and just
> > > > forcing the link to be in L2/L3 (thereby D3Cold) even though NVMe driver may
> > > > expect it to be in low power state like ASPM/APST.
> > > >
> > > > So you should only put the link to L2/L3 if there is no ASPM support. Otherwise,
> > > > you'll ending up with bug reports when users connect NVMe to it.
> > > >
> > >
> > >
> > > At this topic, it's very interesting to look at
> > >
> > > drivers/pci/controller/dwc/pcie-tegra194.c
> > >
> > >
> > > static int tegra_pcie_dw_suspend_noirq(struct device *dev)
> > > {
> > > struct tegra_pcie_dw *pcie = dev_get_drvdata(dev);
> > >
> > > if (!pcie->link_state)
> > > return 0;
> > >
> > > tegra_pcie_downstream_dev_to_D0(pcie);
> > > tegra_pcie_dw_pme_turnoff(pcie);
> > > tegra_pcie_unconfig_controller(pcie);
> > >
> > > return 0;
> > > }
> > >
> > > It brings back all the downstream components to D0, as I assumed it was L0
> > > indeed, before sending PME aiming to enter L2.
> >
> > If current state is L1.1 or L1.2, hardware can auto enter to D0\L0 when
> > there are any PCI bus activity, include PME. I supposed
> > tegra_pcie_downstream_dev_to_D0() just make sure come back from L2/L3,
> > which may enter by runtime PM previously, or other reason.
> >
> > NVME ASPM problem is (at least when I debug at other platform about 1 year
> > ago):
> >
> > 1. NVME will not release MSI interrupt during suspsend.
> > 2. PCI controler enter L2 at suspned_noirq();
> > 3. CPU hot plug try to down second core (CORE1, CORE2, ...)
> > 4. GIC try to disable MSI irq by write config space.
>
> Just for the record, this will only happen during deep sleep (s2ram) where the
> CPUs are powered down (including the boot CPU).
>
> > 5. panic here because config space can't be access at L2.
> >
> > I suposed tegra should have problem when ASPM enable with NVME devices.
> >
>
> NVMe suspend issue has several faces:
>
> If NVMe is powered down during suspend, it will result in considerable power
> savings. But at the same time, the suspend should not happen too frequently as
> it may deteriorate the lifetime of the device. Most of the recent NVMe devices
> have 2M power cycles (only).
>
> We can workaround the above lifetime issue by powering down the device only
> during s2ram. It will work great for Laptop use cases if s2ram is supported.
> Unfortunately, not all Laptops support s2ram though. And if the device is
> powered down during s2idle, it will hit the above life time issue when it is
> used in Android platforms such as Tablets (even future mobile phones?) which
> doesn't support s2ram.
>
> So I'm thinking of the following options to address the issue(s):
>
> 1. Modify the NVMe driver to power down the device during s2ram (the driver can
> use the global pm_suspend_target_state flag to detect the suspend state) and use
> the same logic to put the link into L2/L3 state in the PCIe controller drivers.
> For s2idle, maintain both the device and link in low power states.
>
> 2. Get the power management decision from the userspace and use that to decide
> the power down logic in s2idle for both NVMe and PCIe drivers. This will ensure
> that the NVMe device will be powered down in suitable usecases like Laptop
> without s2ram and kept in low power states for Android platforms. But this is
> quite an involved task and I don't know if it is possible at all.
>
> I'm just dumping my thoughts here. And I plan to intiate a discussion with NVMe/
> power folks soon. Maybe during Plumbers?
Yes, it is big work. Anyway, if DWC can use common suspend/resume function,
it will be easy to be updated when a better policy applied at future.
Frank Li
>
> - Mani
>
> > Frank
> > >
> > > > - Mani
> > > >
> > > > > This API help remove duplicate codes and it can be improved gradually.
> > > > >
> > > > >
> > > > > >
> > > > > > - Mani
> > > > > >
> > > > > >
> > > > > > --
> > > > > > மணிவண்ணன் சதாசிவம்
> > > >
>
> --
> மணிவண்ணன் சதாசிவம்
Powered by blists - more mailing lists