linux-kernel - Re: [git] CFS-devel, group scheduler, fixes

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20070919193557.GD20863@linux-os.sc.intel.com>
Date:	Wed, 19 Sep 2007 12:35:57 -0700
From:	"Siddha, Suresh B" <suresh.b.siddha@...el.com>
To:	Tong Li <tong.n.li@...el.com>
Cc:	Ingo Molnar <mingo@...e.hu>, dimm <dmitry.adamushko@...il.com>,
	linux-kernel@...r.kernel.org,
	Srivatsa Vaddagiri <vatsa@...ux.vnet.ibm.com>,
	Peter Zijlstra <a.p.zijlstra@...llo.nl>,
	Mike Galbraith <efault@....de>
Subject: Re: [git] CFS-devel, group scheduler, fixes

On Tue, Sep 18, 2007 at 11:03:59PM -0700, Tong Li wrote:
> This patch attempts to improve CFS's SMP global fairness based on the new 
> virtual time design.
> 
> Removed vruntime adjustment in set_task_cpu() as it skews global fairness.
> 
> Modified small_imbalance logic in find_busiest_group(). If there's small 
> imbalance, move tasks from busiest to local sched_group only if the local 
> group contains a CPU whose min_vruntime is the maximum among all CPUs in 
> the same sched_domain. This prevents any CPU from advancing too far ahead 
> in virtual time and avoids tasks thrashing between two CPUs without 
> utilizing other CPUs in the system. For example, for 10 tasks on 8 CPUs, 
> since the load is not evenly divisible by the number of CPUs, we want the 
> extra load to have a fair use of every CPU in the system.
> 
> Tested with a microbenchmark running 10 nice-0 tasks on 8 CPUs. Each task 

Just as an experiment, can you run 82 tasks on 8 CPUs. Current
imbalance_pct logic will not detect and fix the global fairness issue
even with this patch.

>  	if (*imbalance < busiest_load_per_task) {
> -		unsigned long tmp, pwr_now, pwr_move;
> -		unsigned int imbn;
> -
>  small_imbalance:
> -		pwr_move = pwr_now = 0;
> -		imbn = 2;
> -		if (this_nr_running) {
> -			this_load_per_task /= this_nr_running;
> -			if (busiest_load_per_task > this_load_per_task)
> -				imbn = 1;
> -		} else
> -			this_load_per_task = SCHED_LOAD_SCALE;
> -
> -		if (max_load - this_load + SCHED_LOAD_SCALE_FUZZ >=
> -					busiest_load_per_task * imbn) {
> -			*imbalance = busiest_load_per_task;
> -			return busiest;
> -		}

This patch removes quite a few lines and all this is logic is not for
fairness :( Especially the above portion handles some of the HT/MC
optimizations.

> -
> -		/*
> -		 * OK, we don't have enough imbalance to justify moving 
> tasks,
> -		 * however we may be able to increase total CPU power used by
> -		 * moving them.
> +		/* 
> +		 * When there's small imbalance, move tasks only if this
> +		 * sched_group contains a CPU whose min_vruntime is the 
> +		 * maximum among all CPUs in the same domain.
>  		 */
> -
> -		pwr_now += busiest->__cpu_power *
> -				min(busiest_load_per_task, max_load);
> -		pwr_now += this->__cpu_power *
> -				min(this_load_per_task, this_load);
> -		pwr_now /= SCHED_LOAD_SCALE;
> -
> -		/* Amount of load we'd subtract */
> -		tmp = sg_div_cpu_power(busiest,
> -				busiest_load_per_task * SCHED_LOAD_SCALE);
> -		if (max_load > tmp)
> -			pwr_move += busiest->__cpu_power *
> -				min(busiest_load_per_task, max_load - tmp);
> -
> -		/* Amount of load we'd add */
> -		if (max_load * busiest->__cpu_power <
> -				busiest_load_per_task * SCHED_LOAD_SCALE)
> -			tmp = sg_div_cpu_power(this,
> -					max_load * busiest->__cpu_power);
> -		else
> -			tmp = sg_div_cpu_power(this,
> -				busiest_load_per_task * SCHED_LOAD_SCALE);
> -		pwr_move += this->__cpu_power *
> -				min(this_load_per_task, this_load + tmp);
> -		pwr_move /= SCHED_LOAD_SCALE;
> -
> -		/* Move if we gain throughput */
> -		if (pwr_move > pwr_now)
> +		if (max_vruntime_group == this)
>  			*imbalance = busiest_load_per_task;
> +		else
> +			*imbalance = 0;

Not sure how this all interacts when some of the cpu's are idle. I have to
look more closely.

thanks,
suresh
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/