lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-Id: <54E4999C-DBE8-4FC1-8E82-17FEDFDA9CA6@canonical.com>
Date:   Fri, 10 May 2019 23:15:05 +0800
From:   Kai Heng Feng <kai.heng.feng@...onical.com>
To:     "Rafael J. Wysocki" <rafael@...nel.org>
Cc:     Mario Limonciello <Mario.Limonciello@...l.com>,
        Keith Busch <kbusch@...nel.org>,
        Christoph Hellwig <hch@....de>, Jens Axboe <axboe@...com>,
        Sagi Grimberg <sagi@...mberg.me>,
        Linux PM <linux-pm@...r.kernel.org>,
        Rafael Wysocki <rafael.j.wysocki@...el.com>,
        Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
        linux-nvme <linux-nvme@...ts.infradead.org>,
        Keith Busch <keith.busch@...el.com>
Subject: Re: [PATCH] nvme-pci: Use non-operational power state instead of D3
 on Suspend-to-Idle



> On May 10, 2019, at 4:23 PM, Rafael J. Wysocki <rafael@...nel.org> wrote:
> 
> On Fri, May 10, 2019 at 8:08 AM Kai-Heng Feng
> <kai.heng.feng@...onical.com> wrote:
>> 
>> at 06:19, <Mario.Limonciello@...l.com> <Mario.Limonciello@...l.com> wrote:
>> 
>>>> -----Original Message-----
>>>> From: Keith Busch <kbusch@...nel.org>
>>>> Sent: Thursday, May 9, 2019 4:54 PM
>>>> To: Limonciello, Mario
>>>> Cc: kai.heng.feng@...onical.com; hch@....de; axboe@...com;
>>>> sagi@...mberg.me; rafael@...nel.org; linux-pm@...r.kernel.org;
>>>> rafael.j.wysocki@...el.com; linux-kernel@...r.kernel.org; linux-
>>>> nvme@...ts.infradead.org; keith.busch@...el.com
>>>> Subject: Re: [PATCH] nvme-pci: Use non-operational power state instead
>>>> of D3 on
>>>> Suspend-to-Idle
>>>> 
>>>> 
>>>> [EXTERNAL EMAIL]
>>>> 
>>>> On Thu, May 09, 2019 at 09:37:58PM +0000, Mario.Limonciello@...l.com
>>>> wrote:
>>>>>> +int nvme_set_power(struct nvme_ctrl *ctrl, unsigned npss)
>>>>>> +{
>>>>>> +  int ret;
>>>>>> +
>>>>>> +  mutex_lock(&ctrl->scan_lock);
>>>>>> +  nvme_start_freeze(ctrl);
>>>>>> +  nvme_wait_freeze(ctrl);
>>>>>> +  ret = nvme_set_features(ctrl, NVME_FEAT_POWER_MGMT, npss, NULL, 0,
>>>>>> +                          NULL);
>>>>>> +  nvme_unfreeze(ctrl);
>>>>>> +  mutex_unlock(&ctrl->scan_lock);
>>>>>> +
>>>>>> +  return ret;
>>>>>> +}
>>>>>> +EXPORT_SYMBOL_GPL(nvme_set_power);
>>>>> 
>>>>> I believe without memory barriers at the end disks with HMB this will
>>>>> still kernel panic (Such as Toshiba BG3).
>>>> 
>>>> Well, the mutex has an implied memory barrier, but your HMB explanation
>>>> doesn't make much sense to me anyway. The "mb()" in this thread's original
>>>> patch is a CPU memory barrier, and the CPU had better not be accessing
>>>> HMB memory. Is there something else going on here?
>>> 
>>> Kai Heng will need to speak up a bit in his time zone as he has this disk
>>> on hand,
>>> but what I recall from our discussion was that DMA operation MemRd after
>>> resume was the source of the hang.
>> 
>> Yes, that’ what I was told by the NVMe vendor, so all I know is to impose a
>> memory barrier.
>> If mb() shouldn’t be used here, what’s the correct variant to use in this
>> context?
>> 
>>> 
>>>>> This still allows D3 which we found at least failed to go into deepest
>>>>> state and
>>>> blocked
>>>>> platform s0ix for the following SSDs (maybe others):
>>>>> Hynix PC601
>>>>> LiteOn CL1
>>>> 
>>>> We usually write features to spec first, then quirk non-compliant
>>>> devices after.
>>> 
>>> NVME spec doesn't talk about a relationship between SetFeatures w/
>>> NVME_FEAT_POWER_MGMGT and D3 support, nor order of events.
>>> 
>>> This is why we opened a dialog with storage vendors, including
>>> contrasting the behavior
>>> of Microsoft Windows inbox NVME driver and Intel's Windows RST driver.
>>> 
>>> Those two I mention that come to mind immediately because they were most
>>> recently
>>> tested to fail.  Our discussion with storage vendors overwhelmingly
>>> requested
>>> that we don't use D3 under S2I because their current firmware
>>> architecture won't
>>> support it.
>>> 
>>> For example one vendor told us with current implementation that receiving
>>> D3hot
>>> after NVME shutdown will prevent being able to enter L1.2.  D3hot entry
>>> was supported
>>> by an IRQ handler that isn't serviced in NVME shutdown state.
>>> 
>>> Another vendor told us that with current implementation it's impossible
>>> to transition
>>> to PS4 (at least via APST) while L1.2 D3hot is active.
>> 
>> I tested the patch from Keith and it has two issues just as simply skipping
>> nvme_dev_disable():
>> 1) It consumes more power in S2I
>> 2) System freeze after resume
> 
> Well, the Keith's patch doesn't prevent pci_pm_suspend_noirq() from
> asking for D3 and both of the symptoms above may be consequences of
> that in principle.

Sorry, I should mention that I use a slightly modified drivers/nvme/host/pci.c:

diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index 3e4fb891a95a..ece428ce6876 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -18,6 +18,7 @@
 #include <linux/mutex.h>
 #include <linux/once.h>
 #include <linux/pci.h>
+#include <linux/suspend.h>
 #include <linux/t10-pi.h>
 #include <linux/types.h>
 #include <linux/io-64-nonatomic-lo-hi.h>
@@ -2833,6 +2834,11 @@ static int nvme_suspend(struct device *dev)
        struct pci_dev *pdev = to_pci_dev(dev);
        struct nvme_dev *ndev = pci_get_drvdata(pdev);
 
+       if (!pm_suspend_via_firmware()) {
+               nvme_set_power(&ndev->ctrl, ndev->ctrl.npss);
+               pci_save_state(pdev);
+       }
+
        nvme_dev_disable(ndev, true);
        return 0;
 }
@@ -2842,6 +2848,10 @@ static int nvme_resume(struct device *dev)
        struct pci_dev *pdev = to_pci_dev(dev);
        struct nvme_dev *ndev = pci_get_drvdata(pdev);
 
+       if (!pm_resume_via_firmware()) {
+               return nvme_set_power(&ndev->ctrl, 0);
+       }
+
        nvme_reset_ctrl(&ndev->ctrl);
        return 0;
}

Does pci_save_state() here enough to prevent the device enter to D3?

Kai-Heng

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ