lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <3623107.tlAuqH4F7s@vostro.rjw.lan>
Date:	Fri, 01 Apr 2016 14:40:15 +0200
From:	"Rafael J. Wysocki" <rjw@...ysocki.net>
To:	Jörg Otte <jrg.otte@...il.com>
Cc:	"Rafael J. Wysocki" <rafael@...nel.org>,
	Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
	Linux PM list <linux-pm@...r.kernel.org>,
	Srinivas Pandruvada <srinivas.pandruvada@...ux.intel.com>,
	Doug Smythies <dsmythies@...us.net>
Subject: Re: [intel-pstate driver regression] processor frequency very high even if in idle

On Friday, April 01, 2016 11:20:42 AM Jörg Otte wrote:
> 2016-03-31 17:43 GMT+02:00 Rafael J. Wysocki <rjw@...ysocki.net>:
> > On Thursday, March 31, 2016 05:25:18 PM Jörg Otte wrote:
> >> 2016-03-31 13:42 GMT+02:00 Rafael J. Wysocki <rjw@...ysocki.net>:
> >> > On Thursday, March 31, 2016 11:05:56 AM Jörg Otte wrote:
> >> >
> >> > [cut]
> >> >
> >> >> >
> >> >>
> >> >> Yes, works for me.
> >> >>
> >> >> CPUID(7): No-SGX
> >> >>      CPU Avg_MHz   Busy% Bzy_MHz TSC_MHz
> >> >>        -      11    0.66 1682 2494
> >> >>        0      11    0.60 1856 2494
> >> >>        1       6    0.34    1898    2494
> >> >>        2      13    0.82    1628    2494
> >> >>        3      13    0.87    1528    2494
> >> >>      CPU Avg_MHz   Busy% Bzy_MHz TSC_MHz
> >> >>        -       6    0.58     963    2494
> >> >>        0       8    0.83     957    2494
> >> >>        1       1    0.08     984    2494
> >> >>        2      10    1.04     975    2494
> >> >>        3       3    0.35     934    2494
> >> >>
> >> >
> >
> > [cut]
> >
> >> >
> >>
> >> No, this patch doesn't help.
> >
> > Well, more work to do then.
> >
> > I've just noticed a bug in this patch, which is not relevant for the results,
> > but below is a new version.
> >
> >> CPUID(7): No-SGX
> >>       CPU Avg_MHz   Busy% Bzy_MHz TSC_MHz
> >>        -       8    0.32    2507    2495
> >>        0      13    0.53    2505    2495
> >>        1       3    0.11    2523    2495
> >>        2       1    0.06    2555    2495
> >>        3      15    0.59    2500    2495
> >>      CPU Avg_MHz   Busy% Bzy_MHz TSC_MHz
> >>        -       8    0.33    2486    2495
> >>        0      12    0.50    2482    2495
> >>        1       5    0.22    2489    2495
> >>        2       1    0.04    2492    2495
> >>        3      15    0.59    2487    2495
> >

[cut]

> 
> here they are.
> 

Thanks!

First of all, the sampling mechanics works as expected in the failing case,
which is the most important thing I wanted to know.  However, there are anomalies
in the failing case trace.  The core_busy column is clearly suspicious and it
looks like CPUs 2 and 3 never really go idle.  I guess we'll need to find out
why they don't go idle to get to the bottom of this, but it firmly falls into
the weird stuff territory already.

In the meantime, below is one more patch to test, on top of the previous one
(that is, https://patchwork.kernel.org/patch/8714401/).

Again, this is a change I'd like to make regardless, so it would be good to
know if anything more has to be done before we go further.

---
From: Rafael J. Wysocki <rafael.j.wysocki@...el.com>
Subject: [PATCH] intel_pstate: Avoid extra invocation of intel_pstate_sample()

The initialization of intel_pstate for a given CPU involves populating
the fields of its struct cpudata that represent the previous sample,
but currently that is done in a problematic way.

Namely, intel_pstate_init_cpu() makes an extra call to
intel_pstate_sample() so it reads the current register values that
will be used to populate the "previous sample" record during the
next invocation of intel_pstate_sample().  However, after commit
a4675fbc4a7a (cpufreq: intel_pstate: Replace timers with utilization
update callbacks) that doesn't work for last_sample_time, because
the time value is passed to intel_pstate_sample() as an argument now.
Passing 0 to it from intel_pstate_init_cpu() is problematic, because
that causes cpu->last_sample_time == 0 to be visible in
get_target_pstate_use_performance() (and hence the extra
cpu->last_sample_time > 0 check in there) and effectively allows
the first invocation of intel_pstate_sample() from
intel_pstate_update_util() to happen immediately after the
initialization which may lead to a significant "turn on"
effect in the governor algorithm.

To mitigate that issue, rework the initialization to avoid the
extra intel_pstate_sample() call from intel_pstate_init_cpu().
Instead, make intel_pstate_sample() return false if it has been
called with cpu->sample.time equal to zero, which will make
intel_pstate_update_util() skip the sample in that case, and
reset cpu->sample.time from intel_pstate_set_update_util_hook()
to make the algorithm start properly every time the hook is set.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@...el.com>
---
 drivers/cpufreq/intel_pstate.c |   21 +++++++++++++++------
 1 file changed, 15 insertions(+), 6 deletions(-)

Index: linux-pm/drivers/cpufreq/intel_pstate.c
===================================================================
--- linux-pm.orig/drivers/cpufreq/intel_pstate.c
+++ linux-pm/drivers/cpufreq/intel_pstate.c
@@ -910,7 +910,14 @@ static inline bool intel_pstate_sample(s
 	cpu->prev_aperf = aperf;
 	cpu->prev_mperf = mperf;
 	cpu->prev_tsc = tsc;
-	return true;
+	/*
+	 * First time this function is invoked in a given cycle, all of the
+	 * previous sample data fields are equal to zero or stale and they must
+	 * be populated with meaningful numbers for things to work, so assume
+	 * that sample.time will always be reset before setting the utilization
+	 * update hook and make the caller skip the sample then.
+	 */
+	return !!cpu->last_sample_time;
 }
 
 static inline int32_t get_avg_frequency(struct cpudata *cpu)
@@ -984,8 +991,7 @@ static inline int32_t get_target_pstate_
 	 * enough period of time to adjust our busyness.
 	 */
 	duration_ns = cpu->sample.time - cpu->last_sample_time;
-	if ((s64)duration_ns > pid_params.sample_rate_ns * 3
-	    && cpu->last_sample_time > 0) {
+	if ((s64)duration_ns > pid_params.sample_rate_ns * 3) {
 		sample_ratio = div_fp(int_tofp(pid_params.sample_rate_ns),
 				      int_tofp(duration_ns));
 		core_busy = mul_fp(core_busy, sample_ratio);
@@ -1100,7 +1106,6 @@ static int intel_pstate_init_cpu(unsigne
 	intel_pstate_get_cpu_pstates(cpu);
 
 	intel_pstate_busy_pid_reset(cpu);
-	intel_pstate_sample(cpu, 0);
 
 	cpu->update_util.func = intel_pstate_update_util;
 
@@ -1121,9 +1126,13 @@ static unsigned int intel_pstate_get(uns
 	return get_avg_frequency(cpu);
 }
 
-static void intel_pstate_set_update_util_hook(unsigned int cpu)
+static void intel_pstate_set_update_util_hook(unsigned int cpu_num)
 {
-	cpufreq_set_update_util_data(cpu, &all_cpu_data[cpu]->update_util);
+	struct cpudata *cpu = all_cpu_data[cpu_num];
+
+	/* Prevent intel_pstate_update_util() from using stale data. */
+	cpu->sample.time = 0;
+	cpufreq_set_update_util_data(cpu_num, &cpu->update_util);
 }
 
 static void intel_pstate_clear_update_util_hook(unsigned int cpu)

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ