Message-ID: <bf37b6a5-268d-4c07-a536-a826b3d5953b@nvidia.com>
Date: Fri, 23 Jan 2026 10:55:28 +0000
From: Jon Hunter <jonathanh@...dia.com>
To: Manivannan Sadhasivam <mani@...nel.org>,
Bjorn Helgaas <helgaas@...nel.org>
Cc: manivannan.sadhasivam@....qualcomm.com,
Bjorn Helgaas <bhelgaas@...gle.com>,
Lorenzo Pieralisi <lpieralisi@...nel.org>,
Krzysztof Wilczyński <kwilczynski@...nel.org>,
Rob Herring <robh@...nel.org>, linux-pci@...r.kernel.org,
linux-kernel@...r.kernel.org, linux-arm-msm@...r.kernel.org,
"David E. Box" <david.e.box@...ux.intel.com>,
Kai-Heng Feng <kai.heng.feng@...onical.com>,
"Rafael J. Wysocki" <rafael@...nel.org>,
Heiner Kallweit <hkallweit1@...il.com>,
Chia-Lin Kao <acelan.kao@...onical.com>,
"linux-tegra@...r.kernel.org" <linux-tegra@...r.kernel.org>,
Keith Busch <kbusch@...nel.org>, Jens Axboe <axboe@...nel.dk>,
Christoph Hellwig <hch@....de>, Sagi Grimberg <sagi@...mberg.me>,
linux-nvme@...ts.infradead.org
Subject: Re: [PATCH v2 1/2] PCI/ASPM: Override the ASPM and Clock PM states
set by BIOS for devicetree platforms
On 22/01/2026 19:14, Jon Hunter wrote:
...
>> I think what is going on here is that, before commits f3ac2ff14834 and
>> df5192d9bb0e, the !pcie_aspm_enabled() check was passing because ASPM was
>> not enabled for the device (and its upstream port). After those commits,
>> this check no longer passes, so the NVMe driver does not shut down the
>> controller and expects the link to stay in L0/L1ss. But the Tegra
>> controller driver initiates the L2/L3 transition and also powers off the
>> device, so all of the NVMe context is lost during suspend, and on resume
>> the NVMe driver gets confused because of that lost context.
>>
>> Jon, could you please try the below hack and see if it fixes the issue?
>>
>> diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
>> index 0e4caeab739c..4b8d261117f5 100644
>> --- a/drivers/nvme/host/pci.c
>> +++ b/drivers/nvme/host/pci.c
>> @@ -3723,7 +3723,7 @@ static int nvme_suspend(struct device *dev)
>> * state (which may not be possible if the link is up).
>> */
>> if (pm_suspend_via_firmware() || !ctrl->npss ||
>> - !pcie_aspm_enabled(pdev) ||
>> + pcie_aspm_enabled(pdev) ||
>> (ndev->ctrl.quirks & NVME_QUIRK_SIMPLE_SUSPEND))
>> return nvme_disable_prepare_reset(ndev, true);
>>
>> This will confirm whether or not the issue is caused by the Tegra
>> controller driver breaking the NVMe driver's assumption.
>
> Yes that appears to be working! I will test some more boards to confirm.
So yes, with the above change all boards appear to be working fine.
How is this usually coordinated between the NVMe driver and the host
controller driver? It is not clear to me exactly where the problem lies:
if the NVMe driver is not shutting the controller down, then what should
be preventing the host controller from shutting down?
Thanks
Jon
--
nvpublic