Message-ID: <bf37b6a5-268d-4c07-a536-a826b3d5953b@nvidia.com>
Date: Fri, 23 Jan 2026 10:55:28 +0000
From: Jon Hunter <jonathanh@...dia.com>
To: Manivannan Sadhasivam <mani@...nel.org>,
 Bjorn Helgaas <helgaas@...nel.org>
Cc: manivannan.sadhasivam@....qualcomm.com,
 Bjorn Helgaas <bhelgaas@...gle.com>,
 Lorenzo Pieralisi <lpieralisi@...nel.org>,
 Krzysztof Wilczyński <kwilczynski@...nel.org>,
 Rob Herring <robh@...nel.org>, linux-pci@...r.kernel.org,
 linux-kernel@...r.kernel.org, linux-arm-msm@...r.kernel.org,
 "David E. Box" <david.e.box@...ux.intel.com>,
 Kai-Heng Feng <kai.heng.feng@...onical.com>,
 "Rafael J. Wysocki" <rafael@...nel.org>,
 Heiner Kallweit <hkallweit1@...il.com>,
 Chia-Lin Kao <acelan.kao@...onical.com>,
 "linux-tegra@...r.kernel.org" <linux-tegra@...r.kernel.org>,
 Keith Busch <kbusch@...nel.org>, Jens Axboe <axboe@...nel.dk>,
 Christoph Hellwig <hch@....de>, Sagi Grimberg <sagi@...mberg.me>,
 linux-nvme@...ts.infradead.org
Subject: Re: [PATCH v2 1/2] PCI/ASPM: Override the ASPM and Clock PM states
 set by BIOS for devicetree platforms


On 22/01/2026 19:14, Jon Hunter wrote:

...

>> I think what is going on here is that since before commits f3ac2ff14834 and
>> df5192d9bb0e, !pcie_aspm_enabled() check was passing as ASPM was not enabled for
>> the device (and upstream port) and after those commits, this check is not
>> passing and the NVMe driver is not shutting down the controller and expects the
>> link to be in L0/L1ss. But the Tegra controller driver initiates L2/L3
>> transition, and also turns off the device. So all the NVMe context is lost
>> during suspend and while resuming, the NVMe driver got confused due to lost
>> context.
>>
>> Jon, could you please try the below hack and see if it fixes the issue?
>>
>> diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
>> index 0e4caeab739c..4b8d261117f5 100644
>> --- a/drivers/nvme/host/pci.c
>> +++ b/drivers/nvme/host/pci.c
>> @@ -3723,7 +3723,7 @@ static int nvme_suspend(struct device *dev)
>>           * state (which may not be possible if the link is up).
>>           */
>>          if (pm_suspend_via_firmware() || !ctrl->npss ||
>> -           !pcie_aspm_enabled(pdev) ||
>> +           pcie_aspm_enabled(pdev) ||
>>              (ndev->ctrl.quirks & NVME_QUIRK_SIMPLE_SUSPEND))
>>                  return nvme_disable_prepare_reset(ndev, true);
>>
>> This will confirm whether the issue is due to Tegra controller driver breaking
>> the NVMe driver assumption or not.
> 
> Yes that appears to be working! I will test some more boards to confirm.

So yes, with the above change all boards appear to be working fine.

How is this usually coordinated between the NVMe driver and the host
controller driver? It is not clear to me exactly where the problem is.
If the NVMe driver is not shutting down the controller, then what should
be preventing the host controller from shutting down?
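
To make the effect of the hack concrete, here is a minimal standalone
sketch of the condition in nvme_suspend() as it appears in the diff
above. It is a model for discussion only, not the kernel code: the
struct and the two function names are made up for illustration, and
each field simply stands in for the kernel call or field named in its
comment.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the inputs tested in nvme_suspend(). */
struct suspend_inputs {
	bool suspend_via_firmware;	/* models pm_suspend_via_firmware() */
	unsigned int npss;		/* models ctrl->npss */
	bool aspm_enabled;		/* models pcie_aspm_enabled(pdev) */
	bool simple_suspend_quirk;	/* models NVME_QUIRK_SIMPLE_SUSPEND */
};

/*
 * Condition as in the unpatched code: true means the driver takes the
 * nvme_disable_prepare_reset() path and fully shuts down the controller.
 */
bool full_shutdown_upstream(const struct suspend_inputs *in)
{
	return in->suspend_via_firmware || !in->npss ||
	       !in->aspm_enabled || in->simple_suspend_quirk;
}

/* Condition with the test hack applied: the ASPM check is inverted. */
bool full_shutdown_hack(const struct suspend_inputs *in)
{
	return in->suspend_via_firmware || !in->npss ||
	       in->aspm_enabled || in->simple_suspend_quirk;
}
```

With ASPM enabled, a non-zero npss, no firmware suspend and no quirk,
the first function returns false (the controller is left up and the
link is expected to stay in L0/L1ss), while the second returns true
(full shutdown). That matches the behaviour seen on the boards with and
without the hack.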

Thanks
Jon

-- 
nvpublic

