lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Date:	Tue, 27 Nov 2012 19:02:50 +0000
From:	Vladimir Davydov <VDavydov@...allels.com>
To:	Peter Zijlstra <a.p.zijlstra@...llo.nl>,
	Ingo Molnar <mingo@...e.hu>,
	Arjan van de Ven <arjan@...ux.intel.com>
CC:	Len Brown <lenb@...nel.org>,
	Andrew Morton <akpm@...ux-foundation.org>,
	Nikhil Rao <ncrao@...gle.com>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH] cpuidle: menu: use nr_running instead of cpuload for
 calculating perf mult

Hello again

Playing with cpu cgroups, I've found that performance of various workloads running inside cgroups can be significantly improved by increasing sched load resolution and that this improvement has already been committed to the kernel (c8b281161dfa4bb5d5be63fb036ce19347b88c63). Unfortunately, it turned out that the commit triggered power usage regression and as a result was reverted until the root cause of the regression was found (see commit e4c2fb0d5776b58049d2556b456144a4db3fe5a9). And the root cause is still unknown...

I guess that the behavior of the cpuidle menu governor, which is stated in the patch below, can explain the regression. The governor uses this_cpu_load() to estimate idleness of the system. After commit c8b281161dfa4bb5d5be63fb036ce19347b88c63, which increases sched load resolution, this_cpu_load() seems to return large values even for small loads, which would probably lead to the cpuidle governor making wrong decisions due to overestimating the system load.

So, this seems to be another reason to use some different performance multiplier in cpuidle governor.

On Jun 4, 2012, at 2:24 PM, Vladimir Davydov <VDavydov@...allels.com> wrote:

> rq->cpuload strongly depends on cgroup hierarchy. For example, if hundreds of
> tasks are running inside cpu:/test cgroup, the sum of cpuload over all cpus
> won't exceed 1024 (by default). That makes the cpuidle menu governor take wrong
> decisions, which can negatively affect overall performance.
> 
> To cope this, use nr_running last seen in __update_cpu_load() instead of
> cpuload for calculating performance multiplier.
> ---
> drivers/cpuidle/governors/menu.c |   17 +++--------------
> include/linux/sched.h            |    2 +-
> kernel/sched/core.c              |    7 ++++---
> kernel/sched/sched.h             |    3 +++
> 4 files changed, 11 insertions(+), 18 deletions(-)
> 
> diff --git a/drivers/cpuidle/governors/menu.c b/drivers/cpuidle/governors/menu.c
> index 0633575..2aa2625 100644
> --- a/drivers/cpuidle/governors/menu.c
> +++ b/drivers/cpuidle/governors/menu.c
> @@ -123,17 +123,6 @@ struct menu_device {
> };
> 
> 
> -#define LOAD_INT(x) ((x) >> FSHIFT)
> -#define LOAD_FRAC(x) LOAD_INT(((x) & (FIXED_1-1)) * 100)
> -
> -static int get_loadavg(void)
> -{
> -	unsigned long this = this_cpu_load();
> -
> -
> -	return LOAD_INT(this) * 10 + LOAD_FRAC(this) / 10;
> -}
> -
> static inline int which_bucket(unsigned int duration)
> {
> 	int bucket = 0;
> @@ -170,13 +159,13 @@ static inline int which_bucket(unsigned int duration)
> static inline int performance_multiplier(void)
> {
> 	int mult = 1;
> +	int this_cpu = smp_processor_id();
> 
> 	/* for higher loadavg, we are more reluctant */
> -
> -	mult += 2 * get_loadavg();
> +	mult += 10 * nr_active_cpu(this_cpu);
> 
> 	/* for IO wait tasks (per cpu!) we add 5x each */
> -	mult += 10 * nr_iowait_cpu(smp_processor_id());
> +	mult += 10 * nr_iowait_cpu(this_cpu);
> 
> 	return mult;
> }
> diff --git a/include/linux/sched.h b/include/linux/sched.h
> index f45c0b2..fb83d22 100644
> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -141,7 +141,7 @@ extern unsigned long nr_running(void);
> extern unsigned long nr_uninterruptible(void);
> extern unsigned long nr_iowait(void);
> extern unsigned long nr_iowait_cpu(int cpu);
> -extern unsigned long this_cpu_load(void);
> +extern unsigned long nr_active_cpu(int cpu);
> 
> 
> extern void calc_global_load(unsigned long ticks);
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index 39eb601..8cc2011 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -2155,10 +2155,10 @@ unsigned long nr_iowait_cpu(int cpu)
> 	return atomic_read(&this->nr_iowait);
> }
> 
> -unsigned long this_cpu_load(void)
> +unsigned long nr_active_cpu(int cpu)
> {
> -	struct rq *this = this_rq();
> -	return this->cpu_load[0];
> +	struct rq *this = cpu_rq(cpu);
> +	return this->nr_active;
> }
> 
> 
> @@ -2494,6 +2494,7 @@ static void __update_cpu_load(struct rq *this_rq, unsigned long this_load,
> 	this_rq->nr_load_updates++;
> 
> 	/* Update our load: */
> +	this_rq->nr_active = this_rq->nr_running;
> 	this_rq->cpu_load[0] = this_load; /* Fasttrack for idx 0 */
> 	for (i = 1, scale = 2; i < CPU_LOAD_IDX_MAX; i++, scale += scale) {
> 		unsigned long old_load, new_load;
> diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
> index ba9dccf..2564712 100644
> --- a/kernel/sched/sched.h
> +++ b/kernel/sched/sched.h
> @@ -379,6 +379,9 @@ struct rq {
> 	struct list_head leaf_rt_rq_list;
> #endif
> 
> +	/* nr_running last seen in __update_cpu_load() */
> +	unsigned long nr_active;
> +
> 	/*
> 	 * This is part of a global counter where only the total sum
> 	 * over all CPUs matters. A task can increase this counter on
> -- 
> 1.7.1
> 

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ