linux-kernel - Re: [PATCH v1 00/19] Increase resolution of load weights

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20110506065959.GB23166@elte.hu>
Date:	Fri, 6 May 2011 08:59:59 +0200
From:	Ingo Molnar <mingo@...e.hu>
To:	Nikhil Rao <ncrao@...gle.com>
Cc:	Peter Zijlstra <peterz@...radead.org>,
	Mike Galbraith <efault@....de>, linux-kernel@...r.kernel.org,
	"Nikunj A. Dadhania" <nikunj@...ux.vnet.ibm.com>,
	Srivatsa Vaddagiri <vatsa@...ux.vnet.ibm.com>,
	Stephan Barwolf <stephan.baerwolf@...ilmenau.de>
Subject: Re: [PATCH v1 00/19] Increase resolution of load weights


* Nikhil Rao <ncrao@...gle.com> wrote:

> On Wed, May 4, 2011 at 4:13 AM, Ingo Molnar <mingo@...e.hu> wrote:
> >
> > * Nikhil Rao <ncrao@...gle.com> wrote:
> >
> >> Also, out of curiosity, what's an acceptable tolerance level for a 
> >> performance hit on 32-bit?
> >
> > It's a cost/benefit analysis and for 32-bit systems the benefits seem to be 
> > rather small, right?
> 
> Yes, that's right. The benefits for 32-bit systems do seem to be limited.
> 
> When I initially posted this patchset, I expected much larger benefits for 
> 32-bit systems. I ran some experiments yesterday and found negligible gains 
> for 32-bit systems. I think two aspects of this patchset are relevant for 
> 32-bit:
> 
> 1. Better distribution of weight for low-weight task groups. For example, 
> when an autogroup gets niced to +19, the task group is assigned a weight of 
> 15. Since 32-bit systems are only restricted to 8 cpus at most, I think we 
> can manage to handle this case without the need for more resolution. The 
> results for this experiment were not statistically significant.
> 
> 2. You could also run out of resolution with deep hierarchies on 32-bit 
> systems, but you need pretty complex cgroup hierarchies. Let's assume you 
> have a hierarchy with n levels and a branching factor of b at each level. 
> Let's also assume each leaf node has at least one running task and users 
> don't change any of the weights. You will need approx log_b(1024/NR_CPUS) 
> levels to run out of resolution in this setup... so, b=2 needs 7 levels, b=3 
> needs 5 levels, b=4 needs 4 levels, ... and so on. These are a pretty 
> elaborate hierarchy and I'm not sure if there are use cases for these (yet!).

Btw., the "take your patch" and "do not take your patch" choice is a false 
dichotomy, there's a third option we should consider seriously: we do not *have 
to* act for 32-bit systems, if we decide that the benefits are not clear yet.

I.e. we can act on 64-bit systems (there increasing resolution should be near 
zero cost as we have 64-bit ops), but delay any decision for 32-bit systems.

If 32-bit systems evolve in a direction (lots of cores, lots of cgroups 
complexity) that makes the increase in resolution inevitable, we can 
reconsider.

If on the other hand they are replaced more and more by 64-bit systems in all 
segments and become a niche then not acting will be the right decision. There's 
so many other reasons why 64-bit is better, better resolution and more 
scheduling precision in highly parallel systems/setups will just be another 
reason.

Personally i think this latter scenario is a lot more likely.

> > Can we make the increase in resolution dependent on max CPU count or such 
> > and use cheaper divides on 32-bit in that case, while still keeping the 
> > code clean?
> 
> Sure. Is this what you had in mind?

Yes, almost:

> commit 860030069190e3d6e1983cc77c936f7ccdaf7cff
> Author: Nikhil Rao <ncrao@...gle.com>
> Date:   Mon Apr 11 15:16:08 2011 -0700
> 
>     sched: increase SCHED_LOAD_SCALE resolution
> 
> diff --git a/include/linux/sched.h b/include/linux/sched.h
> index 8d1ff2b..f92353c 100644
> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -794,7 +794,21 @@ enum cpu_idle_type {
>  /*
>   * Increase resolution of nice-level calculations:
>   */
> -#define SCHED_LOAD_SHIFT	10
> +#if CONFIG_NR_CPUS > 32
> +#define SCHED_LOAD_RESOLUTION	10
> +
> +#define scale_up_load_resolution(w)	w << SCHED_LOAD_RESOLUTION
> +#define scale_down_load_resolution(w)	w >> SCHED_LOAD_RESOLUTION;
> +
> +#else
> +#define SCHED_LOAD_RESOLUTION	0
> +
> +#define scale_up_load_resolution(w)	w
> +#define scale_down_load_resolution(w)	w
> +
> +#endif
> +
> +#define SCHED_LOAD_SHIFT	(10 + (SCHED_LOAD_RESOLUTION))
>  #define SCHED_LOAD_SCALE	(1L << SCHED_LOAD_SHIFT)

I'd suggest to make the resolution dependent on BITS_PER_LONG. That way we have 
just two basic variants of resolution (CONFIG_NR_CPUS can vary a lot). This 
will be a *lot* more testable, and on 32-bit we will maintain the status quo.

But yes, look at how much nicer the patch looks like now.

A few small comments:

> diff --git a/kernel/sched.c b/kernel/sched.c
> index f4b4679..3dae6c5 100644
> --- a/kernel/sched.c
> +++ b/kernel/sched.c
> @@ -293,7 +293,7 @@ static DEFINE_SPINLOCK(task_group_lock);
>   *  limitation from this.)
>   */
>  #define MIN_SHARES	2
> -#define MAX_SHARES	(1UL << 18)
> +#define MAX_SHARES	(1UL << (18 + SCHED_LOAD_RESOLUTION))
> 
>  static int root_task_group_load = ROOT_TASK_GROUP_LOAD;
>  #endif
> @@ -1307,14 +1307,18 @@ calc_delta_mine(unsigned long delta_exec, u64
> weight, struct load_weight *lw)
>  	u64 tmp;
> 
>  	if (!lw->inv_weight) {
> -		if (BITS_PER_LONG > 32 && unlikely(lw->weight >= WMULT_CONST))
> +		unsigned long w = scale_down_load_resolution(lw->weight);
> +		if (BITS_PER_LONG > 32 && unlikely(w >= WMULT_CONST))
>  			lw->inv_weight = 1;
>  		else
> -			lw->inv_weight = 1 + (WMULT_CONST-lw->weight/2)
> -				/ (lw->weight+1);
> +			lw->inv_weight = 1 + (WMULT_CONST - w/2) / (w + 1);
>  	}
> 
> -	tmp = (u64)delta_exec * weight;
> +	if (likely(weight > (1UL << SCHED_LOAD_RESOLUTION)))
> +		tmp = (u64)delta_exec * scale_down_load_resolution(weight);
> +	else
> +		tmp = (u64)delta_exec;

Couldnt the compiler figure out that on 32-bit, this:

> +		tmp = (u64)delta_exec * scale_down_load_resolution(weight);

is equivalent to:

> +		tmp = (u64)delta_exec;

?

I.e. it would be nice to check whether a reasonably recent version of GCC 
figures out this optimization by itself - in that case we can avoid the 
branching ugliness, right?

Also, the above (and the other scale-adjustment changes) probably explains why 
the instruction count went up on 64-bit.

> @@ -1758,12 +1762,13 @@ static void set_load_weight(struct task_struct *p)
>  	 * SCHED_IDLE tasks get minimal weight:
>  	 */
>  	if (p->policy == SCHED_IDLE) {
> -		p->se.load.weight = WEIGHT_IDLEPRIO;
> +		p->se.load.weight = scale_up_load_resolution(WEIGHT_IDLEPRIO);
>  		p->se.load.inv_weight = WMULT_IDLEPRIO;
>  		return;
>  	}
> 
> -	p->se.load.weight = prio_to_weight[p->static_prio - MAX_RT_PRIO];
> +	p->se.load.weight = scale_up_load_resolution(
> +			prio_to_weight[p->static_prio - MAX_RT_PRIO]);
>  	p->se.load.inv_weight = prio_to_wmult[p->static_prio - MAX_RT_PRIO];

Please create a local 'load' variable that is equal to &p->se.load, and also
create a 'prio = p->static_prio - MAX_RT_PRIO' variable.

Furthermore, please rename 'scale_up_load_resolution' to something shorter: 
scale_load() is not used within the scheduler yet so it's free for taking.

Then a lot of the above repetitious code could be written as a much nicer:

	load->weight = scale_load(prio_to_weight[prio]);
  	load->inv_weight = prio_to_wmult[prio];

... and the logic becomes a *lot* more readable and the ugly linebreak is gone 
as well.

Please make such a set_load_weight() cleanup patch separate from the main 
patch, so that your main patch can still be reviewed in separation.

Thanks,

	Ingo
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/