lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <8fc7d9236bfa968b5f202d458e99e2e729fed343.camel@intel.com>
Date:   Wed, 6 Sep 2023 02:20:34 +0000
From:   "Zhang, Rui" <rui.zhang@...el.com>
To:     "srinivas.pandruvada@...ux.intel.com" 
        <srinivas.pandruvada@...ux.intel.com>,
        "rafael@...nel.org" <rafael@...nel.org>
CC:     "linux-pm@...r.kernel.org" <linux-pm@...r.kernel.org>,
        "Pawnikar, Sumeet R" <sumeet.r.pawnikar@...el.com>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        "stable@...r.kernel.org" <stable@...r.kernel.org>
Subject: Re: [PATCH] powercap: intel_rapl: Fix setting of Power Limit 4 to 0

Hi, Srinivas,

Thanks for the patch.

On Tue, 2023-09-05 at 17:06 -0700, Srinivas Pandruvada wrote:
> System runs at minimum performance, once powercap RAPL package domain
> "enabled" flag is toggled.
> 
> Setting RAPL package domain enabled flag to 0, results in setting of
> power limit 4 (PL4) MSR 0x601 to 0.

This is the bug. Setting enabled flag should only affect the Enable bit
but not the other bits.

>  This implies disabling PL4 limit.
> The PL4 limit controls the peak power. This can significantly change
> the performance. Even worse, when the enabled flag is set to 1 again.
> This will set PL4 MSR value to 0x01, which means reduce peak power to
> 0.125W. This will force the system to run at the lowest possible
> performance.
> 
> This is caused by a change which assumes that there is an enable bit
> in the PL4 MSR like other power limits.

MSR RAPL doesn't have PL4 enable bit, but TPMI RAPL does have per power
limit enable bit.

> 
> In functions set_floor_freq_default() and rapl_remove_package(), call
> rapl_write_pl_data with PL_ENABLE and PL_CLAMP for only power limit 1
> and 2. Similarly don't read PL_ENABLE for PL4 to check the presence
> of
> power limit 4.

IMO, the rootcause is that we expose an non-exits PL4_ENABLE primitive
for MSR Interface, and even worse we expose it wrongly that the
primitive uses the power limit bits.

Commit 9050a9cd5e4c ("powercap: intel_rapl: Cleanup Power Limits
support") pokes the MSR interface PL4_ENABLE primitive and exposes this
bug.

Sumeet is testing the below fix right now, and I suppose he will give
some update soon.

thanks,
rui

diff --git a/drivers/powercap/intel_rapl_common.c b/drivers/powercap/intel_rapl_common.c
index 8fac57b28f8a..d407f918876f 100644
--- a/drivers/powercap/intel_rapl_common.c
+++ b/drivers/powercap/intel_rapl_common.c
@@ -658,8 +658,6 @@ static struct rapl_primitive_info rpi_msr[NR_RAPL_PRIMITIVES] = {
 			    RAPL_DOMAIN_REG_LIMIT, ARBITRARY_UNIT, 0),
 	[PL2_CLAMP] = PRIMITIVE_INFO_INIT(PL2_CLAMP, POWER_LIMIT2_CLAMP, 48,
 			    RAPL_DOMAIN_REG_LIMIT, ARBITRARY_UNIT, 0),
-	[PL4_ENABLE] = PRIMITIVE_INFO_INIT(PL4_ENABLE, POWER_LIMIT4_MASK, 0,
-				RAPL_DOMAIN_REG_PL4, ARBITRARY_UNIT, 0),
 	[TIME_WINDOW1] = PRIMITIVE_INFO_INIT(TIME_WINDOW1, TIME_WINDOW1_MASK, 17,
 			    RAPL_DOMAIN_REG_LIMIT, TIME_UNIT, 0),
 	[TIME_WINDOW2] = PRIMITIVE_INFO_INIT(TIME_WINDOW2, TIME_WINDOW2_MASK, 49,



Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ