Message-ID: <002701d100cc$98cb8c60$ca62a520$@net>
Date: Tue, 6 Oct 2015 23:51:28 -0700
From: "Doug Smythies" <dsmythies@...us.net>
To: "'Prarit Bhargava'" <prarit@...hat.com>
Cc: "'Kristen Carlson Accardi'" <kristen@...ux.intel.com>,
	<linux-kernel@...r.kernel.org>,
	"'Viresh Kumar'" <viresh.kumar@...aro.org>,
	<linux-pm@...r.kernel.org>,
	"'Rafael J. Wysocki'" <rjw@...ysocki.net>,
	"Doug Smythies" <dsmythies@...us.net>
Subject: RE: [PATCH] cpufreq, intel_pstate, set max_sysfs_pct and min_sysfs_pct on governor switch

On 2015.09.06 16:48 Rafael J. Wysocki wrote:
> On Wednesday, October 07, 2015 12:43:55 AM Rafael J. Wysocki wrote:
>> On Tuesday, October 06, 2015 05:49:07 PM Prarit Bhargava wrote:

>>> Intel CPUs will not enter higher p-states when after switching from the
>>> performance governor to the powersave governor, until
>>> /sys/devices/system/cpu/intel_pstate/min_perf_pct is set to a low value.

It works properly for me.
Isn't the root issue here an incompatibility between
tools/power/cpupower/utils/cpufreq-set.c and
drivers/cpufreq/intel_pstate.c?
(see experiment results below, where I do not use "cpupower")
I am not familiar with tools/power/cpupower/utils/cpufreq-set.c,
but will look at it more tomorrow.

>>> This differs from previous behaviour in which a switch to the powersave
>>> governor would result in a low default value for min_perf_pct.
>>>
>>> The behavior of the powersave governor changed after commit a04759924e25
>>> ("[cpufreq] intel_pstate: honor user space min_perf_pct override on
>>> resume"). The commit introduced tracking of performance percentage
>>> changes via sysfs in order to restore userspace changes during
>>> suspend/resume. The problem occurs because the global values of the newly
>>> introduced max_sysfs_pct and min_sysfs_pct are not reset on a governor
>>> change and this causes the new governor to inherit the previous governor's
>>> settings.
>>>
>>> This patch sets max_sysfs_pct to 100 and min_sysfs_pct to 0 on a governor
>>> change which fixes the problem with governor switching. These changes
>>> also make the initial calculations for max_perf_pct and min_perf_pct
>>> slightly simpler.
>>>
>>> Before patch:
>>> [root@...el-skylake-y-01 power]# cpupower frequency-set -g performance
>>> [root@...el-skylake-y-01 power]# cat /sys/devices/system/cpu/intel_pstate/min_perf_pct
>>> 100
>>> [root@...el-skylake-y-01 power]# cat /sys/devices/system/cpu/intel_pstate/max_perf_pct
>>> 100
>>> [root@...el-skylake-y-01 power]# cpupower frequency-set -g powersave
>>> [root@...el-skylake-y-01 power]# cat /sys/devices/system/cpu/intel_pstate/min_perf_pct
>>> 100
>>> [root@...el-skylake-y-01 power]# cat /sys/devices/system/cpu/intel_pstate/max_perf_pct
>>> 100

And before patch I get, using primitives and not cpupower:
Executive Summary: Everything works fine (or at least as I thought it was supposed to).

root@s15:/home/doug/temp# grep . /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
/sys/devices/system/cpu/cpu0/cpufreq/scaling_governor:powersave
...
/sys/devices/system/cpu/cpu7/cpufreq/scaling_governor:powersave
root@s15:/home/doug/temp# grep . /sys/devices/system/cpu/intel_pstate/*_perf_*
/sys/devices/system/cpu/intel_pstate/max_perf_pct:100
/sys/devices/system/cpu/intel_pstate/min_perf_pct:42
root@s15:/home/doug/temp# echo 50 > /sys/devices/system/cpu/intel_pstate/min_perf_pct
root@s15:/home/doug/temp# echo 80 > /sys/devices/system/cpu/intel_pstate/max_perf_pct
root@s15:/home/doug/temp# grep . /sys/devices/system/cpu/intel_pstate/*_perf_*
/sys/devices/system/cpu/intel_pstate/max_perf_pct:80
/sys/devices/system/cpu/intel_pstate/min_perf_pct:50
root@s15:/home/doug/temp# for file in /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor; do echo "performance" > $file; done
root@s15:/home/doug/temp# grep . /sys/devices/system/cpu/intel_pstate/*_perf_*
/sys/devices/system/cpu/intel_pstate/max_perf_pct:100
/sys/devices/system/cpu/intel_pstate/min_perf_pct:100
root@s15:/home/doug/temp# grep . /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
/sys/devices/system/cpu/cpu0/cpufreq/scaling_governor:performance
...
/sys/devices/system/cpu/cpu7/cpufreq/scaling_governor:performance
root@s15:/home/doug/temp# for file in /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor; do echo "powersave" > $file; done
root@s15:/home/doug/temp# grep . /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
/sys/devices/system/cpu/cpu0/cpufreq/scaling_governor:powersave
...
/sys/devices/system/cpu/cpu7/cpufreq/scaling_governor:powersave
root@s15:/home/doug/temp# grep . /sys/devices/system/cpu/intel_pstate/*_perf_*
/sys/devices/system/cpu/intel_pstate/max_perf_pct:80
/sys/devices/system/cpu/intel_pstate/min_perf_pct:50

>>>
>>> After patch:
>>> [root@...el-skylake-y-01 power]# cpupower frequency-set -g performance
>>> [root@...el-skylake-y-01 power]# cat /sys/devices/system/cpu/intel_pstate/min_perf_pct
>>> 100
>>> [root@...el-skylake-y-01 power]# cat /sys/devices/system/cpu/intel_pstate/max_perf_pct
>>> 100
>>> [root@...el-skylake-y-01 power]# cpupower frequency-set -g powersave
>>> [root@...el-skylake-y-01 power]# cat /sys/devices/system/cpu/intel_pstate/min_perf_pct
>>> 14
>>> [root@...el-skylake-y-01 power]# cat /sys/devices/system/cpu/intel_pstate/max_perf_pct
>>> 100
>>>

And after the patch I get, using primitives and not cpupower:
Executive Summary: Settings go back to default, and user settings are lost.
This is not how I thought things were supposed to behave, but I'm not actually sure.

root@s15:/home/doug/temp# grep . /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
/sys/devices/system/cpu/cpu0/cpufreq/scaling_governor:powersave
...
/sys/devices/system/cpu/cpu7/cpufreq/scaling_governor:powersave
root@s15:/home/doug/temp# grep . /sys/devices/system/cpu/intel_pstate/*_perf_*
/sys/devices/system/cpu/intel_pstate/max_perf_pct:100
/sys/devices/system/cpu/intel_pstate/min_perf_pct:42
root@s15:/home/doug/temp# echo 50 > /sys/devices/system/cpu/intel_pstate/min_perf_pct
root@s15:/home/doug/temp# echo 80 > /sys/devices/system/cpu/intel_pstate/max_perf_pct
root@s15:/home/doug/temp# grep . /sys/devices/system/cpu/intel_pstate/*_perf_*
/sys/devices/system/cpu/intel_pstate/max_perf_pct:80
/sys/devices/system/cpu/intel_pstate/min_perf_pct:50
root@s15:/home/doug/temp# for file in /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor; do echo "performance" > $file; done
root@s15:/home/doug/temp# grep . /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
/sys/devices/system/cpu/cpu0/cpufreq/scaling_governor:performance
...
/sys/devices/system/cpu/cpu7/cpufreq/scaling_governor:performance
root@s15:/home/doug/temp# grep . /sys/devices/system/cpu/intel_pstate/*_perf_*
/sys/devices/system/cpu/intel_pstate/max_perf_pct:100
/sys/devices/system/cpu/intel_pstate/min_perf_pct:100
root@s15:/home/doug/temp# for file in /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor; do echo "powersave" > $file; done
root@s15:/home/doug/temp# grep . /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
/sys/devices/system/cpu/cpu0/cpufreq/scaling_governor:powersave
...
/sys/devices/system/cpu/cpu7/cpufreq/scaling_governor:powersave
root@s15:/home/doug/temp# grep . /sys/devices/system/cpu/intel_pstate/*_perf_*
/sys/devices/system/cpu/intel_pstate/max_perf_pct:100
/sys/devices/system/cpu/intel_pstate/min_perf_pct:42

>>> Also note that I have tested suspend/resume (using CONFIG_PM_DEBUG):
>>> [root@...el-skylake-y-01 power]# echo 50 > /sys/devices/system/cpu/intel_pstate/min_perf_pct
>>> [root@...el-skylake-y-01 power]# cat /sys/devices/system/cpu/intel_pstate/*_perf_pct
>>> 100
>>> 50
>>> [root@...el-skylake-y-01 power]# echo devices > /sys/power/pm_test
>>> [root@...el-skylake-y-01 power]# echo platform > /sys/power/disk
>>> [root@...el-skylake-y-01 power]# echo disk > /sys/power/state
>>> [root@...el-skylake-y-01 power]# cat /sys/devices/system/cpu/intel_pstate/*_perf_pct
>>> 100
>>> 50

Before Patch, I get:

root@s15:/home/doug/temp# grep . /sys/devices/system/cpu/intel_pstate/*_perf_*
/sys/devices/system/cpu/intel_pstate/max_perf_pct:80
/sys/devices/system/cpu/intel_pstate/min_perf_pct:50
root@s15:/home/doug/temp# pm-suspend
...
root@s15:/home/doug/temp# grep . /sys/devices/system/cpu/intel_pstate/*_perf_*
/sys/devices/system/cpu/intel_pstate/max_perf_pct:80
/sys/devices/system/cpu/intel_pstate/min_perf_pct:50

After Patch, I get:

root@s15:/home/doug/temp# grep . /sys/devices/system/cpu/intel_pstate/*_perf_*
/sys/devices/system/cpu/intel_pstate/max_perf_pct:80
/sys/devices/system/cpu/intel_pstate/min_perf_pct:50
root@s15:/home/doug/temp# pm-suspend
...
root@s15:/home/doug/temp# grep . /sys/devices/system/cpu/intel_pstate/*_perf_*
/sys/devices/system/cpu/intel_pstate/max_perf_pct:100
/sys/devices/system/cpu/intel_pstate/min_perf_pct:42

>>>
>>> Fixes: a04759924e25 ("[cpufreq] intel_pstate: honor user space min_perf_pct override on resume")
>>> Cc: Kristen Carlson Accardi <kristen@...ux.intel.com>
>>> Cc: "Rafael J. Wysocki" <rjw@...ysocki.net>
>>> Cc: Viresh Kumar <viresh.kumar@...aro.org>
>>> Cc: linux-pm@...r.kernel.org
>>> Signed-off-by: Prarit Bhargava <prarit@...hat.com>
>>> ---
>>>  drivers/cpufreq/intel_pstate.c | 7 +++++--
>>>  1 file changed, 5 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/drivers/cpufreq/intel_pstate.c b/drivers/cpufreq/intel_pstate.c
>>> index 3af9dd7..bb24458 100644
>>> --- a/drivers/cpufreq/intel_pstate.c
>>> +++ b/drivers/cpufreq/intel_pstate.c
>>> @@ -986,6 +986,9 @@ static int intel_pstate_set_policy(struct cpufreq_policy *policy)
>>>  	if (!policy->cpuinfo.max_freq)
>>>  		return -ENODEV;
>>>
>>> +	limits.min_sysfs_pct = 0;
>>> +	limits.max_sysfs_pct = 100;
>>> +
>>>  	if (policy->policy == CPUFREQ_POLICY_PERFORMANCE &&
>>>  	    policy->max >= policy->cpuinfo.max_freq) {
>>>  		limits.min_policy_pct = 100;
>>> @@ -1004,9 +1007,9 @@ static int intel_pstate_set_policy(struct cpufreq_policy *policy)
>>>  	limits.max_policy_pct = clamp_t(int, limits.max_policy_pct, 0 , 100);
>>>
>>>  	/* Normalize user input to [min_policy_pct, max_policy_pct] */
>>> -	limits.min_perf_pct = max(limits.min_policy_pct, limits.min_sysfs_pct);
>>> +	limits.min_perf_pct = limits.min_policy_pct;
>>>  	limits.min_perf_pct = min(limits.max_policy_pct, limits.min_perf_pct);
>>> -	limits.max_perf_pct = min(limits.max_policy_pct, limits.max_sysfs_pct);
>>> +	limits.max_perf_pct = limits.max_sysfs_pct;
>
> On a second thought, isn't that always 100? If so, doesn't it basically discard
> limits.max_policy_pct?
>

Yes, I think so, see above.

>>>  	limits.max_perf_pct = max(limits.min_policy_pct, limits.max_perf_pct);
>>>
>>>  	/* Make sure min_perf_pct <= max_perf_pct */
>>>

Kernels used: 4.3-rc4 and same plus this patch.
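
For readers following the arithmetic, the normalization step touched by the diff can be sketched as a small userspace C model. This is only an illustration, not the driver code: the names `pct_limits`, `normalize_before`, `normalize_after`, `min_int`, `max_int` and `check_pct` are hypothetical, the final "min_perf_pct <= max_perf_pct" clamp is assumed from the comment visible in the diff, and the early return taken for the performance policy is not modeled. Treating 42 as the powersave `min_policy_pct` is likewise an assumption based on the default `min_perf_pct` observed on the s15 test machine.

```c
#include <assert.h>

/* Hypothetical userspace model of the percentage limits in the diff. */
struct pct_limits {
	int min_policy_pct, max_policy_pct; /* derived from the cpufreq policy */
	int min_sysfs_pct, max_sysfs_pct;   /* last values written via sysfs */
	int min_perf_pct, max_perf_pct;     /* effective limits */
};

static int min_int(int a, int b) { return a < b ? a : b; }
static int max_int(int a, int b) { return a > b ? a : b; }

/* Percentages are expected to stay within 0..100. */
static void check_pct(int pct) { assert(pct >= 0 && pct <= 100); }

/* Normalization as shown on the '-' and context lines (before the patch). */
static void normalize_before(struct pct_limits *l)
{
	check_pct(l->min_policy_pct);
	check_pct(l->max_policy_pct);
	l->min_perf_pct = max_int(l->min_policy_pct, l->min_sysfs_pct);
	l->min_perf_pct = min_int(l->max_policy_pct, l->min_perf_pct);
	l->max_perf_pct = min_int(l->max_policy_pct, l->max_sysfs_pct);
	l->max_perf_pct = max_int(l->min_policy_pct, l->max_perf_pct);
	/* Assumed from the "Make sure min_perf_pct <= max_perf_pct" comment. */
	l->min_perf_pct = min_int(l->max_perf_pct, l->min_perf_pct);
}

/* Normalization as shown on the '+' and context lines (after the patch). */
static void normalize_after(struct pct_limits *l)
{
	check_pct(l->min_policy_pct);
	check_pct(l->max_policy_pct);
	l->min_sysfs_pct = 0;   /* user-written values are reset ... */
	l->max_sysfs_pct = 100; /* ... on every policy update */
	l->min_perf_pct = l->min_policy_pct;
	l->min_perf_pct = min_int(l->max_policy_pct, l->min_perf_pct);
	l->max_perf_pct = l->max_sysfs_pct; /* always 100 at this point */
	l->max_perf_pct = max_int(l->min_policy_pct, l->max_perf_pct);
	/* Assumed from the "Make sure min_perf_pct <= max_perf_pct" comment. */
	l->min_perf_pct = min_int(l->max_perf_pct, l->min_perf_pct);
}
```

With policy limits of 42 and 100 and sysfs values of 50 and 80, `normalize_before` leaves min/max at 50/80, while `normalize_after` yields 42/100, which lines up with the two powersave transcripts above. With a hypothetical `max_policy_pct` of 70, `normalize_before` gives a maximum of 70 but `normalize_after` gives 100, which is the discarded-`max_policy_pct` effect raised in the review comment.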