Message-ID: <20110215170127.GA28865@dirshya.in.ibm.com>
Date:	Tue, 15 Feb 2011 22:31:27 +0530
From:	Vaidyanathan Srinivasan <svaidy@...ux.vnet.ibm.com>
To:	Venkatesh Pallipadi <venki@...gle.com>
Cc:	Suresh Siddha <suresh.b.siddha@...el.com>,
	Peter Zijlstra <peterz@...radead.org>,
	Ingo Molnar <mingo@...e.hu>, linux-kernel@...r.kernel.org,
	Paul Turner <pjt@...gle.com>, Mike Galbraith <efault@....de>,
	Nick Piggin <npiggin@...il.com>,
	Tim Chen <tim.c.chen@...el.com>, Alex Shi <alex.shi@...el.com>
Subject: Re: [PATCH] sched: Wholesale removal of sd_idle logic

* Venkatesh Pallipadi <venki@...gle.com> [2011-02-14 14:38:50]:

> sd_idle logic was introduced way back in 2005 (commit 5969fe06),
> as an HT optimization.
> 
> As per the discussion in the thread here
> lkml subject - sched: Resolve sd_idle and first_idle_cpu Catch-22 - v1
> https://patchwork.kernel.org/patch/532501/
> 
> the capacity based logic in the load balancer right now handles this
> in a much cleaner way, handling more than 2 SMT siblings etc, and sd_idle
> does not seem to bring any additional benefits. sd_idle logic also has
> some bugs that have a performance impact. Here is the patch that removes
> the sd_idle logic altogether.
> 
> The initial patch here - https://patchwork.kernel.org/patch/532501/
> applies cleanly over the below change and provides a micro-optimization
> for a specific case, where an idle core can pull tasks instead of a
> core with one thread being idle and other thread being busy.
> It will be good to get some data on whether this micro-optimization
> matters or not.
> 
> Also, there was a dependency between sched_mc_power_savings == 2 and the
> sd_idle logic. Copying Vaidy to assess the impact of this change there.

Hi Venki,

The dependency is there to avoid active balancing when there is a busy
sibling and power-save balance is not set.

The other piece of logic propagates/forces sd_idle=1 to induce more
frequent balancing for an idle sibling in the power-save balance case.
Removing sd_idle makes this the default.
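
For reference, here is a from-memory sketch (illustration only, not part
of this patch) of how that parent-domain dependency is wired up: the
sched_mc/sched_smt power-savings tunables are folded into the parent
domain's flags as SD_POWERSAVINGS_BALANCE, and test_sd_parent() is what
the removed checks use to look the flag up.

static inline int sd_power_saving_flags(void)
{
	/* either tunable being non-zero asks for power-save balancing */
	if (sched_mc_power_savings | sched_smt_power_savings)
		return SD_POWERSAVINGS_BALANCE;
	return 0;
}

static inline int test_sd_parent(struct sched_domain *sd, int flag)
{
	/* walk up to the parent domain and test its flags */
	sd = sd->parent;
	if (sd && (sd->flags & flag))
		return 1;
	return 0;
}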

Your changes look good.  I will test and report.

> Signed-off-by: Venkatesh Pallipadi <venki@...gle.com>

Acked-by: Vaidyanathan Srinivasan <svaidy@...ux.vnet.ibm.com>

> ---
>  kernel/sched_fair.c |   53 ++++++++++----------------------------------------
>  1 files changed, 11 insertions(+), 42 deletions(-)
> 
> diff --git a/kernel/sched_fair.c b/kernel/sched_fair.c
> index 0c26e2d..932dc13 100644
> --- a/kernel/sched_fair.c
> +++ b/kernel/sched_fair.c
> @@ -2610,7 +2610,6 @@ fix_small_capacity(struct sched_domain *sd, struct sched_group *group)
>   * @this_cpu: Cpu for which load balance is currently performed.
>   * @idle: Idle status of this_cpu
>   * @load_idx: Load index of sched_domain of this_cpu for load calc.
> - * @sd_idle: Idle status of the sched_domain containing group.
>   * @local_group: Does group contain this_cpu.
>   * @cpus: Set of cpus considered for load balancing.
>   * @balance: Should we balance.
> @@ -2618,7 +2617,7 @@ fix_small_capacity(struct sched_domain *sd, struct sched_group *group)
>   */
>  static inline void update_sg_lb_stats(struct sched_domain *sd,
>  			struct sched_group *group, int this_cpu,
> -			enum cpu_idle_type idle, int load_idx, int *sd_idle,
> +			enum cpu_idle_type idle, int load_idx,
>  			int local_group, const struct cpumask *cpus,
>  			int *balance, struct sg_lb_stats *sgs)
>  {
> @@ -2638,9 +2637,6 @@ static inline void update_sg_lb_stats(struct sched_domain *sd,
>  	for_each_cpu_and(i, sched_group_cpus(group), cpus) {
>  		struct rq *rq = cpu_rq(i);
> 
> -		if (*sd_idle && rq->nr_running)
> -			*sd_idle = 0;
> -
>  		/* Bias balancing toward cpus of our domain */
>  		if (local_group) {
>  			if (idle_cpu(i) && !first_idle_cpu) {
> @@ -2755,15 +2751,13 @@ static bool update_sd_pick_busiest(struct sched_domain *sd,
>   * @sd: sched_domain whose statistics are to be updated.
>   * @this_cpu: Cpu for which load balance is currently performed.
>   * @idle: Idle status of this_cpu
> - * @sd_idle: Idle status of the sched_domain containing sg.
>   * @cpus: Set of cpus considered for load balancing.
>   * @balance: Should we balance.
>   * @sds: variable to hold the statistics for this sched_domain.
>   */
>  static inline void update_sd_lb_stats(struct sched_domain *sd, int this_cpu,
> -			enum cpu_idle_type idle, int *sd_idle,
> -			const struct cpumask *cpus, int *balance,
> -			struct sd_lb_stats *sds)
> +			enum cpu_idle_type idle, const struct cpumask *cpus,
> +			int *balance, struct sd_lb_stats *sds)
>  {
>  	struct sched_domain *child = sd->child;
>  	struct sched_group *sg = sd->groups;
> @@ -2781,7 +2775,7 @@ static inline void update_sd_lb_stats(struct sched_domain *sd, int this_cpu,
> 
>  		local_group = cpumask_test_cpu(this_cpu, sched_group_cpus(sg));
>  		memset(&sgs, 0, sizeof(sgs));
> -		update_sg_lb_stats(sd, sg, this_cpu, idle, load_idx, sd_idle,
> +		update_sg_lb_stats(sd, sg, this_cpu, idle, load_idx,
>  				local_group, cpus, balance, &sgs);
> 
>  		if (local_group && !(*balance))
> @@ -3033,7 +3027,6 @@ static inline void calculate_imbalance(struct sd_lb_stats *sds, int this_cpu,
>   * @imbalance: Variable which stores amount of weighted load which should
>   *		be moved to restore balance/put a group to idle.
>   * @idle: The idle status of this_cpu.
> - * @sd_idle: The idleness of sd
>   * @cpus: The set of CPUs under consideration for load-balancing.
>   * @balance: Pointer to a variable indicating if this_cpu
>   *	is the appropriate cpu to perform load balancing at this_level.
> @@ -3046,7 +3039,7 @@ static inline void calculate_imbalance(struct sd_lb_stats *sds, int this_cpu,
>  static struct sched_group *
>  find_busiest_group(struct sched_domain *sd, int this_cpu,
>  		   unsigned long *imbalance, enum cpu_idle_type idle,
> -		   int *sd_idle, const struct cpumask *cpus, int *balance)
> +		   const struct cpumask *cpus, int *balance)
>  {
>  	struct sd_lb_stats sds;
> 
> @@ -3056,8 +3049,7 @@ find_busiest_group(struct sched_domain *sd, int this_cpu,
>  	 * Compute the various statistics relavent for load balancing at
>  	 * this level.
>  	 */
> -	update_sd_lb_stats(sd, this_cpu, idle, sd_idle, cpus,
> -					balance, &sds);
> +	update_sd_lb_stats(sd, this_cpu, idle, cpus, balance, &sds);
> 
>  	/* Cases where imbalance does not exist from POV of this_cpu */
>  	/* 1) this_cpu is not the appropriate cpu to perform load balancing
> @@ -3193,7 +3185,7 @@ find_busiest_queue(struct sched_domain *sd, struct sched_group *group,
>  /* Working cpumask for load_balance and load_balance_newidle. */
>  static DEFINE_PER_CPU(cpumask_var_t, load_balance_tmpmask);
> 
> -static int need_active_balance(struct sched_domain *sd, int sd_idle, int idle,
> +static int need_active_balance(struct sched_domain *sd, int idle,
>  			       int busiest_cpu, int this_cpu)
>  {
>  	if (idle == CPU_NEWLY_IDLE) {
> @@ -3225,10 +3217,6 @@ static int need_active_balance(struct sched_domain *sd, int sd_idle, int idle,
>  		 * move_tasks() will succeed.  ld_moved will be true and this
>  		 * active balance code will not be triggered.
>  		 */
> -		if (!sd_idle && sd->flags & SD_SHARE_CPUPOWER &&
> -		    !test_sd_parent(sd, SD_POWERSAVINGS_BALANCE))
> -			return 0;
> -

This condition nacks active balancing for a semi-idle core when
sched_smt_powersavings is not set.  f_b_g() itself should have
returned NULL if there is no power-savings opportunity.
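
To spell out what was being vetoed, a toy model (userspace, hypothetical
names, illustration only) of the removed condition:

#include <stdbool.h>
#include <stdio.h>

/* The old check: a busy sibling (sd_idle == 0) on an SMT domain whose
 * parent has no power-savings balancing nacks active balancing. */
static bool old_active_balance_veto(bool sd_idle, bool smt_domain,
				    bool parent_powersave)
{
	return !sd_idle && smt_domain && !parent_powersave;
}

int main(void)
{
	/* semi-idle core, sched_smt_powersavings not set -> vetoed */
	printf("veto=%d\n", old_active_balance_veto(false, true, false));
	/* same core with power-save balancing set -> allowed */
	printf("veto=%d\n", old_active_balance_veto(false, true, true));
	return 0;
}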

>  		if (sched_mc_power_savings < POWERSAVINGS_BALANCE_WAKEUP)
>  			return 0;
>  	}
> @@ -3246,7 +3234,7 @@ static int load_balance(int this_cpu, struct rq *this_rq,
>  			struct sched_domain *sd, enum cpu_idle_type idle,
>  			int *balance)
>  {
> -	int ld_moved, all_pinned = 0, active_balance = 0, sd_idle = 0;
> +	int ld_moved, all_pinned = 0, active_balance = 0;
>  	struct sched_group *group;
>  	unsigned long imbalance;
>  	struct rq *busiest;
> @@ -3255,20 +3243,10 @@ static int load_balance(int this_cpu, struct rq *this_rq,
> 
>  	cpumask_copy(cpus, cpu_active_mask);
> 
> -	/*
> -	 * When power savings policy is enabled for the parent domain, idle
> -	 * sibling can pick up load irrespective of busy siblings. In this case,
> -	 * let the state of idle sibling percolate up as CPU_IDLE, instead of
> -	 * portraying it as CPU_NOT_IDLE.
> -	 */
> -	if (idle != CPU_NOT_IDLE && sd->flags & SD_SHARE_CPUPOWER &&
> -	    !test_sd_parent(sd, SD_POWERSAVINGS_BALANCE))
> -		sd_idle = 1;

This effectively becomes the default once sd_idle is removed.  When
power-save balance is set, we want to run the load balancer more
frequently.
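
The frequency effect comes from the caller: roughly (from memory, not
part of this patch) rebalance_domains() keeps the short per-domain
interval only when the CPU is classified idle.

	interval = sd->balance_interval;
	if (idle != CPU_IDLE)
		interval *= sd->busy_factor;	/* busy CPUs balance much less often */

With sd_idle gone the idle classification is left alone here, which is
what makes the more frequent balancing the default.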

> -
>  	schedstat_inc(sd, lb_count[idle]);
> 
>  redo:
> -	group = find_busiest_group(sd, this_cpu, &imbalance, idle, &sd_idle,
> +	group = find_busiest_group(sd, this_cpu, &imbalance, idle,
>  				   cpus, balance);
> 
>  	if (*balance == 0)
> @@ -3330,8 +3308,7 @@ redo:
>  		if (idle != CPU_NEWLY_IDLE)
>  			sd->nr_balance_failed++;
> 
> -		if (need_active_balance(sd, sd_idle, idle, cpu_of(busiest),
> -					this_cpu)) {
> +		if (need_active_balance(sd, idle, cpu_of(busiest), this_cpu)) {
>  			raw_spin_lock_irqsave(&busiest->lock, flags);
> 
>  			/* don't kick the active_load_balance_cpu_stop,
> @@ -3386,10 +3363,6 @@ redo:
>  			sd->balance_interval *= 2;
>  	}
> 
> -	if (!ld_moved && !sd_idle && sd->flags & SD_SHARE_CPUPOWER &&
> -	    !test_sd_parent(sd, SD_POWERSAVINGS_BALANCE))
> -		ld_moved = -1;

I have not figured out where ld_moved is checked for -1 and why we
need to treat this as a special case.

Your bug fix in idle_balance() for if (pulled_task) {...} is a good
catch.

> -
>  	goto out;
> 
>  out_balanced:
> @@ -3403,11 +3376,7 @@ out_one_pinned:
>  			(sd->balance_interval < sd->max_interval))
>  		sd->balance_interval *= 2;
> 
> -	if (!sd_idle && sd->flags & SD_SHARE_CPUPOWER &&
> -	    !test_sd_parent(sd, SD_POWERSAVINGS_BALANCE))
> -		ld_moved = -1;
> -	else
> -		ld_moved = 0;

Ack.  But why did we have to flag this case earlier?

> +	ld_moved = 0;
>  out:
>  	return ld_moved;
>  }

--Vaidy

