Message-ID: <5z7c25nkb35prvax6vq6ud7eaeuhzsswbf7fqvmlgys3xftgwb@odocboejrdrv>
Date: Thu, 22 Jan 2026 22:31:50 +0530
From: Manivannan Sadhasivam <mani@...nel.org>
To: Bjorn Helgaas <helgaas@...nel.org>, Jon Hunter <jonathanh@...dia.com>
Cc: manivannan.sadhasivam@....qualcomm.com,
Bjorn Helgaas <bhelgaas@...gle.com>, Lorenzo Pieralisi <lpieralisi@...nel.org>,
Krzysztof Wilczyński <kwilczynski@...nel.org>, Rob Herring <robh@...nel.org>, linux-pci@...r.kernel.org,
linux-kernel@...r.kernel.org, linux-arm-msm@...r.kernel.org,
"David E. Box" <david.e.box@...ux.intel.com>, Kai-Heng Feng <kai.heng.feng@...onical.com>,
"Rafael J. Wysocki" <rafael@...nel.org>, Heiner Kallweit <hkallweit1@...il.com>,
Chia-Lin Kao <acelan.kao@...onical.com>, "linux-tegra@...r.kernel.org" <linux-tegra@...r.kernel.org>,
Keith Busch <kbusch@...nel.org>, Jens Axboe <axboe@...nel.dk>, Christoph Hellwig <hch@....de>,
Sagi Grimberg <sagi@...mberg.me>, linux-nvme@...ts.infradead.org
Subject: Re: [PATCH v2 1/2] PCI/ASPM: Override the ASPM and Clock PM states
set by BIOS for devicetree platforms
On Thu, Jan 22, 2026 at 09:29:03AM -0600, Bjorn Helgaas wrote:
> [+cc NVMe folks]
>
> On Thu, Jan 22, 2026 at 12:12:42PM +0000, Jon Hunter wrote:
> > ...
>
> > Since this commit was added in Linux v6.18, I have been observing
> > suspend test failures on some of our boards. The suspend test suspends
> > the devices for 20 secs, and before this change the board would resume
> > in ~27 secs (including the 20 sec sleep). After this change the board
> > takes over 80 secs to resume, which triggers the failure.
> >
> > Looking at the logs, I can see it is the NVMe device on the board that is
> > having an issue, and I see the reset failing ...
> >
> > [ 945.754939] r8169 0007:01:00.0 enP7p1s0: Link is Up - 1Gbps/Full -
> > flow control rx/tx
> > [ 1002.467432] nvme nvme0: I/O tag 12 (400c) opcode 0x9 (Admin Cmd) QID
> > 0 timeout, reset controller
> > [ 1002.493713] nvme nvme0: 12/0/0 default/read/poll queues
> > [ 1003.050448] nvme nvme0: ctrl state 1 is not RESETTING
> > [ 1003.050481] OOM killer enabled.
> > [ 1003.054035] nvme nvme0: Disabling device after reset failure: -19
> >
> > From the above timestamps the delay is coming from the NVMe. I see this
> > issue on several boards with different NVMe devices, and I can work
> > around it by disabling ASPM L0s/L1 for these devices ...
> >
> > DECLARE_PCI_FIXUP_HEADER(0x15b7, 0x5011, quirk_disable_aspm_l0s_l1);
> > DECLARE_PCI_FIXUP_HEADER(0x15b7, 0x5036, quirk_disable_aspm_l0s_l1);
> > DECLARE_PCI_FIXUP_HEADER(0x1b4b, 0x1322, quirk_disable_aspm_l0s_l1);
> > DECLARE_PCI_FIXUP_HEADER(0xc0a9, 0x540a, quirk_disable_aspm_l0s_l1);
> >
> > I am curious if you have seen any similar issues?
> >
> > Other PCIe devices seem to be OK (like the Realtek r8169); only the
> > NVMe device is having issues. So I am trying to figure out the best
> > way to resolve this.
>
> For context, "this commit" refers to f3ac2ff14834, modified by
> df5192d9bb0e:
>
> f3ac2ff14834 ("PCI/ASPM: Enable all ClockPM and ASPM states for devicetree platforms")
> df5192d9bb0e ("PCI/ASPM: Enable only L0s and L1 for devicetree platforms")
>
> The fact that this suspend issue only affects NVMe reminds me of the
> code in dw_pcie_suspend_noirq() [1] that bails out early if L1 is
> enabled because of some NVMe expectation:
>
> dw_pcie_suspend_noirq()
> {
> ...
> /*
> * If L1SS is supported, then do not put the link into L2 as some
> * devices such as NVMe expect low resume latency.
> */
> if (dw_pcie_readw_dbi(pci, offset + PCI_EXP_LNKCTL) & PCI_EXP_LNKCTL_ASPM_L1)
> return 0;
> ...
>
> That suggests there's some NVMe/ASPM interaction that the PCI core
> doesn't understand yet.
>
We have this check in place because the NVMe driver keeps the device in D0 and
expects the link to stay in L1SS on platforms that do not pass the below
checks:
if (pm_suspend_via_firmware() || !ctrl->npss ||
!pcie_aspm_enabled(pdev) ||
(ndev->ctrl.quirks & NVME_QUIRK_SIMPLE_SUSPEND))
Since the majority of the DWC platforms do not pass the above checks, we don't
transition the device to D3cold or the link to L2/L3 in dw_pcie_suspend_noirq()
if the link is in L1SS. Though I think we should be checking for the D0 state
instead of L1SS here.
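FWIW, what I mean by checking for D0 is something like the below. This is only
a rough sketch and not tested; the pci_walk_bus() scan and the reliance on
'current_state' are my assumptions about how the check could be done, not
existing DWC code:

	/*
	 * Hypothetical helper: report whether any device below the host
	 * bridge was left in D0 (as the NVMe driver does when it skips
	 * the simple suspend path).
	 */
	static int dw_pcie_dev_in_d0(struct pci_dev *pdev, void *data)
	{
		bool *in_d0 = data;

		if (pdev->current_state == PCI_D0)
			*in_d0 = true;

		return 0;
	}

	/* In dw_pcie_suspend_noirq(), instead of testing LNKCTL for L1: */
	bool in_d0 = false;

	pci_walk_bus(pci->pp.bridge->bus, dw_pcie_dev_in_d0, &in_d0);
	if (in_d0)
		return 0;	/* keep the link up, skip L2/L3 entry */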
I think what is going on here is that before commits f3ac2ff14834 and
df5192d9bb0e, the !pcie_aspm_enabled() check passed because ASPM was not
enabled for the device (and its upstream port). After those commits the check
no longer passes, so the NVMe driver does not shut down the controller and
expects the link to stay in L0/L1SS. But the Tegra controller driver initiates
the L2/L3 transition and also turns off the device. So all the NVMe context is
lost during suspend, and on resume the NVMe driver gets confused by the lost
context.
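As an interim workaround, the affected devices could also be marked with
NVME_QUIRK_SIMPLE_SUSPEND so that nvme_suspend() shuts the controller down
cleanly before the link goes down. Just as an illustration (using one device
ID from your quirk list above, not as a proposed fix), the nvme_id_table
entry in drivers/nvme/host/pci.c would look like:

	{ PCI_DEVICE(0x15b7, 0x5011),	/* from Jon's quirk list above */
		.driver_data = NVME_QUIRK_SIMPLE_SUSPEND, },

But first, let's confirm the theory.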
Jon, could you please try the below hack and see if it fixes the issue?
diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index 0e4caeab739c..4b8d261117f5 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -3723,7 +3723,7 @@ static int nvme_suspend(struct device *dev)
* state (which may not be possible if the link is up).
*/
if (pm_suspend_via_firmware() || !ctrl->npss ||
- !pcie_aspm_enabled(pdev) ||
+ pcie_aspm_enabled(pdev) ||
(ndev->ctrl.quirks & NVME_QUIRK_SIMPLE_SUSPEND))
return nvme_disable_prepare_reset(ndev, true);
This will confirm whether the issue is due to the Tegra controller driver
breaking the NVMe driver's assumption.
- Mani
--
மணிவண்ணன் சதாசிவம்