Message-ID: <20240103144607.46369-1-jalliste@amazon.com>
Date: Wed, 3 Jan 2024 14:46:04 +0000
From: Jack Allister <jalliste@...zon.com>
To:
CC: Jack Allister <jalliste@...zon.com>, "Rafael J . Wysocki"
	<rafael@...nel.org>, Paul Durrant <pdurrant@...zon.com>, Jue Wang
	<juew@...zon.com>, Usama Arif <usama.arif@...edance.com>, Jonathan Corbet
	<corbet@....net>, Thomas Gleixner <tglx@...utronix.de>, Ingo Molnar
	<mingo@...hat.com>, Borislav Petkov <bp@...en8.de>, Dave Hansen
	<dave.hansen@...ux.intel.com>, <x86@...nel.org>, "H. Peter Anvin"
	<hpa@...or.com>, "Paul E. McKenney" <paulmck@...nel.org>, Randy Dunlap
	<rdunlap@...radead.org>, Tejun Heo <tj@...nel.org>, Peter Zijlstra
	<peterz@...radead.org>, Yan-Jie Wang <yanjiewtw@...il.com>, Hans de Goede
	<hdegoede@...hat.com>, <linux-doc@...r.kernel.org>,
	<linux-kernel@...r.kernel.org>
Subject: [PATCH v5] x86: intel_epb: Add earlyparam option to keep bias at performance

Buggy BIOSes may not set a sane boot-time Energy Performance Bias (EPB).
A result of this may be overheating or excess power usage. The kernel
overrides any boot-time EPB "performance" bias to "normal" to avoid this.

In data centers it is preferable to keep the EPB at "performance" when
performing a live-update of the host kernel via a kexec into the new
kernel. Boot time is critical during the kexec because running guest VMs
perceive it as latency or downtime.

On Intel Xeon Ice Lake platforms it has been observed that EPB being set to
"normal" while HWP (Intel Hardware P-states) is enabled/configured during
or close to the kexec increases the live-update/kexec downtime by 7 times
compared to when the EPB is set to "performance".

Introduce a command-line parameter, "intel_epb=preserve", to skip the
"performance" -> "normal" override/workaround. Behaviour is unchanged when
the parameter is not set; when it is set, the EPB stays at "performance"
for a speedy kexec.

Signed-off-by: Jack Allister <jalliste@...zon.com>
Acked-by: Rafael J. Wysocki <rafael@...nel.org>
Cc: Paul Durrant <pdurrant@...zon.com>
Cc: Jue Wang <juew@...zon.com>
Cc: Usama Arif <usama.arif@...edance.com>
---
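
Note (illustration only, not part of the patch): the effect of the new flag
on the first-boot path can be sketched in user space as below. This is a
minimal model, not the kernel code: epb_parse_preserve() and
epb_boot_value() are hypothetical names, the constants are copied here on
the assumption that they match the kernel's EPB_MASK and
ENERGY_PERF_BIAS_* definitions, and the saved_epb restore path is left out.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Assumed to mirror the kernel definitions; copied for illustration. */
#define EPB_MASK                      0x0fULL
#define ENERGY_PERF_BIAS_PERFORMANCE  0
#define ENERGY_PERF_BIAS_NORMAL       6

/* Hypothetical stand-in for parse_intel_epb(): only "preserve" sets the flag. */
static bool epb_parse_preserve(const char *str)
{
	return str && strcmp(str, "preserve") == 0;
}

/*
 * Hypothetical stand-in for the first-boot branch of intel_epb_restore():
 * a boot-time value of 'performance' becomes 'normal' unless the override
 * is disabled; every other value is passed through untouched.
 */
static uint8_t epb_boot_value(uint64_t raw_msr, bool no_override)
{
	uint8_t val = raw_msr & EPB_MASK;

	if (!no_override && val == ENERGY_PERF_BIAS_PERFORMANCE)
		val = ENERGY_PERF_BIAS_NORMAL;

	return val;
}
```

With the flag clear, a boot-time value of 0 is rewritten to 6; with the flag
set, it stays at 0. Non-zero values are unaffected in both cases.
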
 .../admin-guide/kernel-parameters.txt         | 12 +++++++++++
 arch/x86/kernel/cpu/intel_epb.c               | 21 +++++++++++++++++--
 2 files changed, 31 insertions(+), 2 deletions(-)
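
Note (illustration only, not part of the patch): one way to check the
result after boot is to read the per-CPU sysfs attribute
/sys/devices/system/cpu/cpuN/power/energy_perf_bias and interpret the
number. The helpers below are a hypothetical user-space sketch of that
interpretation; the names epb_parse_sysfs() and epb_name() are made up
here, and the value-to-name mapping is assumed to match the strings the
intel_epb driver accepts.

```c
#include <errno.h>
#include <stdlib.h>

/* Map an EPB value to the name assumed to be used by the driver. */
static const char *epb_name(unsigned long val)
{
	switch (val) {
	case 0:  return "performance";
	case 4:  return "balance-performance";
	case 6:  return "normal";
	case 8:  return "balance-power";
	case 15: return "power";
	default: return "other";
	}
}

/*
 * Parse the text read from the sysfs attribute (e.g. "0\n").
 * Returns 0 and stores the value on success, -1 on malformed or
 * out-of-range input (EPB is a 4-bit field, so 0..15).
 */
static int epb_parse_sysfs(const char *buf, unsigned long *val)
{
	char *end;
	unsigned long v;

	errno = 0;
	v = strtoul(buf, &end, 10);
	if (errno || end == buf || v > 15)
		return -1;
	if (*end != '\0' && *end != '\n')
		return -1;

	*val = v;
	return 0;
}
```

After a kexec with intel_epb=preserve on a machine whose firmware left the
EPB at 0, the attribute would be expected to parse to 0 ("performance")
rather than 6 ("normal").
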

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 65731b060e3f..5602ee213115 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -2148,6 +2148,18 @@
 			0	disables intel_idle and fall back on acpi_idle.
 			1 to 9	specify maximum depth of C-state.
 
+	intel_epb=	[X86]
+			auto
+			  Same as not passing a parameter to intel_epb. This will
+			  ensure that the intel_epb module will restore the energy
+			  performance bias to "normal" at boot-time. This workaround
+			  is for buggy BIOSes which may not set this value and cause
+			  either overheating or excess power usage.
+			preserve
+			  At kernel boot-time, if the EPB value is read as "performance",
+			  keep it at this value. This skips the "performance" -> "normal"
+			  transition, which is the workaround described above.
+
 	intel_pstate=	[X86]
 			disable
 			  Do not enable intel_pstate as the default
diff --git a/arch/x86/kernel/cpu/intel_epb.c b/arch/x86/kernel/cpu/intel_epb.c
index e4c3ba91321c..419e699a43e6 100644
--- a/arch/x86/kernel/cpu/intel_epb.c
+++ b/arch/x86/kernel/cpu/intel_epb.c
@@ -50,7 +50,8 @@
  * the OS will do that anyway.  That sometimes is problematic, as it may cause
  * the system battery to drain too fast, for example, so it is better to adjust
  * it on CPU bring-up and if the initial EPB value for a given CPU is 0, the
- * kernel changes it to 6 ('normal').
+ * kernel changes it to 6 ('normal'). However, if it is desirable to retain the
+ * original initial EPB value, intel_epb=preserve can be set to enforce it.
  */
 
 static DEFINE_PER_CPU(u8, saved_epb);
@@ -75,6 +76,8 @@ static u8 energ_perf_values[] = {
 	[EPB_INDEX_POWERSAVE] = ENERGY_PERF_BIAS_POWERSAVE,
 };
 
+static bool intel_epb_no_override __read_mostly;
+
 static int intel_epb_save(void)
 {
 	u64 epb;
@@ -106,7 +109,7 @@ static void intel_epb_restore(void)
 		 * ('normal').
 		 */
 		val = epb & EPB_MASK;
-		if (val == ENERGY_PERF_BIAS_PERFORMANCE) {
+		if (!intel_epb_no_override && val == ENERGY_PERF_BIAS_PERFORMANCE) {
 			val = energ_perf_values[EPB_INDEX_NORMAL];
 			pr_warn_once("ENERGY_PERF_BIAS: Set to 'normal', was 'performance'\n");
 		}
@@ -213,6 +216,20 @@ static const struct x86_cpu_id intel_epb_normal[] = {
 	{}
 };
 
+static __init int parse_intel_epb(char *str)
+{
+	if (!str)
+		return 0;
+
+	/* "intel_epb=preserve" prevents PERFORMANCE->NORMAL on restore. */
+	if (!strcmp(str, "preserve"))
+		intel_epb_no_override = true;
+
+	return 0;
+}
+
+early_param("intel_epb", parse_intel_epb);
+
 static __init int intel_epb_init(void)
 {
 	const struct x86_cpu_id *id = x86_match_cpu(intel_epb_normal);
-- 
2.40.1

