Date:   Wed, 24 May 2017 15:06:30 -0700
From:   Andy Lutomirski <luto@...nel.org>
To:     Jens Axboe <axboe@...nel.dk>, Christoph Hellwig <hch@....de>,
        Sagi Grimberg <sagi@...mberg.me>,
        Keith Busch <keith.busch@...el.com>
Cc:     "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        Kai-Heng Feng <kai.heng.feng@...onical.com>,
        linux-nvme <linux-nvme@...ts.infradead.org>,
        Andy Lutomirski <luto@...nel.org>
Subject: [PATCH 1/2] nvme: Wait at least 6000ms before entering the deepest idle state

This should at least make vendors less nervous about Linux's APST
policy.  I'm not aware of any concrete bugs it would fix (although I
was hoping it would fix the Samsung/Dell quirk).

Cc: stable@...r.kernel.org # v4.11
Cc: Kai-Heng Feng <kai.heng.feng@...onical.com>
Cc: Mario Limonciello <mario_limonciello@...l.com>
Signed-off-by: Andy Lutomirski <luto@...nel.org>
---
 drivers/nvme/host/core.c | 38 +++++++++++++++++++++++++++++++-------
 1 file changed, 31 insertions(+), 7 deletions(-)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index d5e0906262ea..381e9f813385 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -1325,13 +1325,7 @@ static void nvme_configure_apst(struct nvme_ctrl *ctrl)
 	/*
 	 * APST (Autonomous Power State Transition) lets us program a
 	 * table of power state transitions that the controller will
-	 * perform automatically.  We configure it with a simple
-	 * heuristic: we are willing to spend at most 2% of the time
-	 * transitioning between power states.  Therefore, when running
-	 * in any given state, we will enter the next lower-power
-	 * non-operational state after waiting 50 * (enlat + exlat)
-	 * microseconds, as long as that state's total latency is under
-	 * the requested maximum latency.
+	 * perform automatically.
 	 *
 	 * We will not autonomously enter any non-operational state for
 	 * which the total latency exceeds ps_max_latency_us.  Users
@@ -1405,9 +1399,39 @@ static void nvme_configure_apst(struct nvme_ctrl *ctrl)
 			/*
 			 * This state is good.  Use it as the APST idle
 			 * target for higher power states.
+			 *
+			 * Intel RSTe supposedly uses the following algorithm:
+			 * 60ms delay to transition to the first
+			 * non-operational state and 1000*exlat to each
+			 * additional state.  This is problematic.  60ms is
+			 * too short if the first non-operational state has
+			 * high latency, and 1000*exlat into a state is
+			 * absurdly slow.  (exlat=22ms seems typical for the
+			 * deepest state.  A delay of 22 seconds to enter that
+			 * state means that it will almost never be entered at
+			 * all, wasting power and, worse, turning otherwise
+			 * easy-to-detect hardware/firmware bugs into sporadic
+			 * problems.)
+			 *
+			 * Linux is willing to spend at most 2% of the time
+			 * transitioning between power states.  Therefore,
+			 * when running in any given state, we will enter the
+			 * next lower-power non-operational state after
+			 * waiting 50 * (enlat + exlat) microseconds, as long
+			 * as that state's total latency is under the
+			 * requested maximum latency.
 			 */
 			transition_ms = total_latency_us + 19;
 			do_div(transition_ms, 20);
+
+			/*
+			 * Some vendors have expressed nervousness about
+			 * entering the deepest state after less than six
+			 * seconds.
+			 */
+			if (state == ctrl->npss && transition_ms < 6000)
+				transition_ms = 6000;
+
 			if (transition_ms > (1 << 24) - 1)
 				transition_ms = (1 << 24) - 1;
 
-- 
2.9.4
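For readers following along, the heuristic in the patch can be sketched outside the kernel. This is an illustrative Python model (the function name `apst_transition_ms` is hypothetical, not from the patch): wait 50 * (enlat + exlat) microseconds, which is total latency / 20 in milliseconds rounded up, floor the deepest state at 6000 ms, and clamp to the 24-bit idle-time field.

```python
def apst_transition_ms(total_latency_us: int, is_deepest: bool) -> int:
    """Idle time (ms) before autonomously entering a non-operational
    state: at most 2% of time spent transitioning means waiting
    50 * (enlat + exlat) us, i.e. total_latency_us / 20 in ms,
    rounded up (mirroring `total_latency_us + 19` then `do_div(..., 20)`)."""
    ms = (total_latency_us + 19) // 20
    # Some vendors are nervous about entering the deepest state in
    # under six seconds, so floor it there.
    if is_deepest and ms < 6000:
        ms = 6000
    # The idle-time field is 24 bits wide; clamp to its maximum.
    return min(ms, (1 << 24) - 1)

# Example: a deepest state with enlat + exlat = 22 ms (22000 us total)
# would get a 1100 ms delay from the 2% rule alone; the floor raises
# it to 6000 ms.
```

Under this model, the 6000 ms floor only ever lengthens the delay for the deepest state; shallower states keep the pure 2% heuristic.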
