[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20241209143821.m4dahsaqeydluyf3@thinkpad>
Date: Mon, 9 Dec 2024 20:08:21 +0530
From: Manivannan Sadhasivam <manivannan.sadhasivam@...aro.org>
To: Christoph Hellwig <hch@....de>
Cc: Bjorn Helgaas <helgaas@...nel.org>, kbusch@...nel.org, axboe@...nel.dk,
sagi@...mberg.me, linux-nvme@...ts.infradead.org,
linux-kernel@...r.kernel.org, linux-pci@...r.kernel.org,
andersson@...nel.org, konradybcio@...nel.org,
"Rafael J. Wysocki" <rjw@...ysocki.net>,
Ulf Hansson <ulf.hansson@...aro.org>
Subject: Re: [PATCH] nvme-pci: Shutdown the device if D3Cold is allowed by
the user
On Mon, Dec 09, 2024 at 02:36:06PM +0100, Christoph Hellwig wrote:
> On Thu, Dec 05, 2024 at 07:49:34PM -0600, Bjorn Helgaas wrote:
> > Oops, I think I got this part backwards. The patch uses PCI PM if
> > d3cold_allowed is set, and it's set by default, so it does what you
> > need for the Qualcomm platform *without* user intervention.
> >
> > But I guess using the flag allows users in other situations to force
> > use of NVMe power management by clearing d3cold_allowed via sysfs.
> > Does that mean some unspecified other platforms might only work
> > correctly with that user intervention?
>
> Still seems awkward to overload fields like this.
>
> The istory here is the the NVMe internal power states are significantly
> better for the SSDs. It avoid shutting down the SSD frequently, which
> creates a lot of extra erase cycles and reduces life time. It also
> prevents the SSD from performing maintainance operations while the host
> system is idle, which is the perfect time for them. But the idea of
> putting all periphals into D3 is gaining a lot of ground because it
> makes the platform vendors life a lot simpler at the cost of others.
No, I disagree with the last comment. When the system goes to low power mode
(like S2R/hibernate), it *does* makes a lot of sense to put the devices into
D3Cold to save power. Using NVMe or other endpoint devices internal power states
only matters when the system is idle (which is not S2R/hibernate). This is what
ACPI is doing currently.
Then one might suggest to check the suspend state using
'pm_suspend_target_state' in device drivers and decide to keep the devices in
D3Cold. Unfortunately, it won't work for Qcom platforms as most of them (except
chromebooks) don't have S2R (a.k.a PSCI_SYSTEM_SUSPEND), but only S2Idle.
When it comes to non-Qcom platforms based on devicetree, they also cannot power
down the NVMe device during suspend (as they cannot satisfy existing checks in
nvme_suspend()).
The current reality is that most of the devicetree platforms *do* want to power
down the NVMe during suspend. Only when NVMe is used in an OS like Android, we
might not want to do so (that's something for future, but still a possibility).
So this is how I ended up using the existing 'd3cold_allowed' attribute as it
allows the devices to be powered down by default and if the OS doesn't want to
do so, it can tweak the attribute from userspace (similar to UFS in Android).
The flexibility of this attribute is that it just acts as a hint for the device
drivers. If the device driver doesn't want to do D3Cold (when used as a wakeup
source etc...) it can still opt out (assuming that the platform would also honor
the wakeup capability of the device).
- Mani
--
மணிவண்ணன் சதாசிவம்
Powered by blists - more mailing lists