linux-kernel - Re: [PATCH v2 1/2] PCI/ASPM: Override the ASPM and Clock PM states set by BIOS for devicetree platforms

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <e44c330b-778b-4fa1-b678-fa909cc05b03@nvidia.com>
Date: Thu, 22 Jan 2026 19:14:04 +0000
From: Jon Hunter <jonathanh@...dia.com>
To: Manivannan Sadhasivam <mani@...nel.org>,
 Bjorn Helgaas <helgaas@...nel.org>
Cc: manivannan.sadhasivam@....qualcomm.com,
 Bjorn Helgaas <bhelgaas@...gle.com>,
 Lorenzo Pieralisi <lpieralisi@...nel.org>,
 Krzysztof Wilczyński <kwilczynski@...nel.org>,
 Rob Herring <robh@...nel.org>, linux-pci@...r.kernel.org,
 linux-kernel@...r.kernel.org, linux-arm-msm@...r.kernel.org,
 "David E. Box" <david.e.box@...ux.intel.com>,
 Kai-Heng Feng <kai.heng.feng@...onical.com>,
 "Rafael J. Wysocki" <rafael@...nel.org>,
 Heiner Kallweit <hkallweit1@...il.com>,
 Chia-Lin Kao <acelan.kao@...onical.com>,
 "linux-tegra@...r.kernel.org" <linux-tegra@...r.kernel.org>,
 Keith Busch <kbusch@...nel.org>, Jens Axboe <axboe@...nel.dk>,
 Christoph Hellwig <hch@....de>, Sagi Grimberg <sagi@...mberg.me>,
 linux-nvme@...ts.infradead.org
Subject: Re: [PATCH v2 1/2] PCI/ASPM: Override the ASPM and Clock PM states
 set by BIOS for devicetree platforms


On 22/01/2026 17:01, Manivannan Sadhasivam wrote:
> On Thu, Jan 22, 2026 at 09:29:03AM -0600, Bjorn Helgaas wrote:
>> [+cc NVMe folks]
>>
>> On Thu, Jan 22, 2026 at 12:12:42PM +0000, Jon Hunter wrote:
>>> ...
>>
>>> Since this commit was added in Linux v6.18, I have been observing a suspend
>>> test failures on some of our boards. The suspend test suspends the devices
>>> for 20 secs and before this change the board would resume in about ~27 secs
>>> (including the 20 sec sleep). After this change the board would take over 80
>>> secs to resume and this triggered a failure.
>>>
>>> Looking at the logs, I can see it is the NVMe device on the board that is
>>> having an issue, and I see the reset failing ...
>>>
>>>   [  945.754939] r8169 0007:01:00.0 enP7p1s0: Link is Up - 1Gbps/Full -
>>>    flow control rx/tx
>>>   [ 1002.467432] nvme nvme0: I/O tag 12 (400c) opcode 0x9 (Admin Cmd) QID
>>>    0 timeout, reset controller
>>>   [ 1002.493713] nvme nvme0: 12/0/0 default/read/poll queues
>>>   [ 1003.050448] nvme nvme0: ctrl state 1 is not RESETTING
>>>   [ 1003.050481] OOM killer enabled.
>>>   [ 1003.054035] nvme nvme0: Disabling device after reset failure: -19
>>>
>>>  From the above timestamps the delay is coming from the NVMe. I see this
>>> issue on several boards with different NVMe devices and I can workaround
>>> this by disabling ASPM L0/L1 for these devices ...
>>>
>>>   DECLARE_PCI_FIXUP_HEADER(0x15b7, 0x5011, quirk_disable_aspm_l0s_l1);
>>>   DECLARE_PCI_FIXUP_HEADER(0x15b7, 0x5036, quirk_disable_aspm_l0s_l1);
>>>   DECLARE_PCI_FIXUP_HEADER(0x1b4b, 0x1322, quirk_disable_aspm_l0s_l1);
>>>   DECLARE_PCI_FIXUP_HEADER(0xc0a9, 0x540a, quirk_disable_aspm_l0s_l1);
>>>
>>> I am curious if you have seen any similar issues?
>>>
>>> Other PCIe devices seem to be OK (like the realtek r8169) but just
>>> the NVMe is having issues. So I am trying to figure out the best way
>>> to resolve this?
>>
>> For context, "this commit" refers to f3ac2ff14834, modified by
>> df5192d9bb0e:
>>
>>    f3ac2ff14834 ("PCI/ASPM: Enable all ClockPM and ASPM states for devicetree platforms")
>>    df5192d9bb0e ("PCI/ASPM: Enable only L0s and L1 for devicetree platforms")
>>
>> The fact that this suspend issue only affects NVMe reminds me of the
>> code in dw_pcie_suspend_noirq() [1] that bails out early if L1 is
>> enabled because of some NVMe expectation:
>>
>>    dw_pcie_suspend_noirq()
>>    {
>>      ...
>>      /*
>>       * If L1SS is supported, then do not put the link into L2 as some
>>       * devices such as NVMe expect low resume latency.
>>       */
>>      if (dw_pcie_readw_dbi(pci, offset + PCI_EXP_LNKCTL) & PCI_EXP_LNKCTL_ASPM_L1)
>>        return 0;
>>      ...
>>
>> That suggests there's some NVMe/ASPM interaction that the PCI core
>> doesn't understand yet.
>>
> 
> We have this check in place since NVMe driver keeps the device in D0 and expects
> the link to be in L1ss on platforms not passing below checks:
> 
>          if (pm_suspend_via_firmware() || !ctrl->npss ||
>              !pcie_aspm_enabled(pdev) ||
>              (ndev->ctrl.quirks & NVME_QUIRK_SIMPLE_SUSPEND))
> 
> Since the majority of the DWC platforms do not pass the above checks, we don't
> transition the device to D3Cold or link to L2/L3 in dw_pcie_suspend_noirq() if
> the link is in L1ss. Though I think we should be checking for D0 state instead
> of L1ss here.
> 
> I think what is going on here is that since before commits f3ac2ff14834 and
> df5192d9bb0e, !pcie_aspm_enabled() check was passing as ASPM was not enabled for
> the device (and upstream port) and after those commits, this check is not
> passing and the NVMe driver is not shutting down the controller and expects the
> link to be in L0/L1ss. But the Tegra controller driver initiates L2/L3
> transition, and also turns off the device. So all the NVMe context is lost
> during suspend and while resuming, the NVMe driver got confused due to lost
> context.
> 
> Jon, could you please try the below hack and see if it fixes the issue?
> 
> diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
> index 0e4caeab739c..4b8d261117f5 100644
> --- a/drivers/nvme/host/pci.c
> +++ b/drivers/nvme/host/pci.c
> @@ -3723,7 +3723,7 @@ static int nvme_suspend(struct device *dev)
>           * state (which may not be possible if the link is up).
>           */
>          if (pm_suspend_via_firmware() || !ctrl->npss ||
> -           !pcie_aspm_enabled(pdev) ||
> +           pcie_aspm_enabled(pdev) ||
>              (ndev->ctrl.quirks & NVME_QUIRK_SIMPLE_SUSPEND))
>                  return nvme_disable_prepare_reset(ndev, true);
>   
> This will confirm whether the issue is due to Tegra controller driver breaking
> the NVMe driver assumption or not.

Yes that appears to be working! I will test some more boards to confirm.

Cheers
Jon

-- 
nvpublic