linux-kernel - Re: [patch v3 0/8] sched: use runnable avg in load balance


Message-ID: <515BF847.1000808@linux.vnet.ibm.com>
Date:	Wed, 03 Apr 2013 17:37:11 +0800
From:	Michael Wang <wangyun@...ux.vnet.ibm.com>
To:	Alex Shi <alex.shi@...el.com>
CC:	mingo@...hat.com, peterz@...radead.org, tglx@...utronix.de,
	akpm@...ux-foundation.org, arjan@...ux.intel.com, bp@...en8.de,
	pjt@...gle.com, namhyung@...nel.org, efault@....de,
	morten.rasmussen@....com, vincent.guittot@...aro.org,
	gregkh@...uxfoundation.org, preeti@...ux.vnet.ibm.com,
	viresh.kumar@...aro.org, linux-kernel@...r.kernel.org,
	len.brown@...el.com, rafael.j.wysocki@...el.com, jkosina@...e.cz,
	clark.williams@...il.com, tony.luck@...el.com,
	keescook@...omium.org, mgorman@...e.de, riel@...hat.com
Subject: Re: [patch v3 0/8] sched: use runnable avg in load balance

On 04/03/2013 04:46 PM, Alex Shi wrote:
> On 04/02/2013 03:23 PM, Michael Wang wrote:
>> | 15 GB   |      12 | 45393 |   | 43986 |
>> | 15 GB   |      16 | 45110 |   | 45719 |
>> | 15 GB   |      24 | 41415 |   | 36813 |	-11.11%
>> | 15 GB   |      32 | 35988 |   | 34025 |
>>
>> The reason may be wake_affine()'s higher overhead, and pgbench is
>> really sensitive to this stuff...
> 
> Michael:
> I changed the threshold to 0.1 ms; it has the same effect on aim7.
> So could you try the following on pgbench?

Hi, Alex

I've done some rough tests, and the change point should be somewhere in
60000~120000. I'm currently running an automated test with the values
500000, 250000, 120000, 60000, 30000, 15000, 6000, 3000 and 1500. It will
take some time to finish, and then we will have detailed data for analysis.
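
A sweep like this can be driven by a small helper that writes each candidate
value into the sysctl before a benchmark run. The following is only a sketch
under stated assumptions: it assumes the quoted patch is applied with
CONFIG_SCHED_DEBUG, so that the kern_table entry shows up as
/proc/sys/kernel/sched_burst_threshold_ns. The names write_threshold() and
run_sweep() and the bench callback are hypothetical and are not the script
used for the test described above.

```c
#include <stddef.h>
#include <stdio.h>

/* Candidate thresholds in nanoseconds, copied from the list above. */
static const unsigned int sweep_ns[] = {
	500000, 250000, 120000, 60000, 30000, 15000, 6000, 3000, 1500
};

/*
 * Hypothetical helper: write one threshold value to the sysctl file at
 * 'path'. Returns 0 on success, -1 on failure.
 */
int write_threshold(const char *path, unsigned int ns)
{
	FILE *f = fopen(path, "w");

	if (!f)
		return -1;
	if (fprintf(f, "%u\n", ns) < 0) {
		fclose(f);
		return -1;
	}
	return fclose(f) == 0 ? 0 : -1;
}

/*
 * Hypothetical driver: for each candidate value, set the threshold and
 * then call 'bench' (may be NULL) to run one benchmark pass.
 * Returns 0 if every step succeeded, -1 otherwise.
 */
int run_sweep(const char *path, int (*bench)(unsigned int ns))
{
	size_t i;

	for (i = 0; i < sizeof(sweep_ns) / sizeof(sweep_ns[0]); i++) {
		if (write_threshold(path, sweep_ns[i]) != 0)
			return -1;
		if (bench && bench(sweep_ns[i]) != 0)
			return -1;
	}
	return 0;
}
```

Under those assumptions, run_sweep("/proc/sys/kernel/sched_burst_threshold_ns",
my_bench) would need root, and my_bench would be whatever wrapper starts one
pgbench pass and records its result.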

BTW, as you know, we have a festival in China, so the report may be delayed;
please forgive me for that ;-)

Regards,
Michael Wang

> 
> 
> diff --git a/include/linux/sched/sysctl.h b/include/linux/sched/sysctl.h
> index bf8086b..a3c3d43 100644
> --- a/include/linux/sched/sysctl.h
> +++ b/include/linux/sched/sysctl.h
> @@ -53,6 +53,7 @@ extern unsigned int sysctl_numa_balancing_settle_count;
> 
>  #ifdef CONFIG_SCHED_DEBUG
>  extern unsigned int sysctl_sched_migration_cost;
> +extern unsigned int sysctl_sched_burst_threshold;
>  extern unsigned int sysctl_sched_nr_migrate;
>  extern unsigned int sysctl_sched_time_avg;
>  extern unsigned int sysctl_timer_migration;
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index dbaa8ca..dd5a324 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -91,6 +91,7 @@ unsigned int sysctl_sched_wakeup_granularity = 1000000UL;
>  unsigned int normalized_sysctl_sched_wakeup_granularity = 1000000UL;
> 
>  const_debug unsigned int sysctl_sched_migration_cost = 500000UL;
> +const_debug unsigned int sysctl_sched_burst_threshold = 100000UL;
> 
>  /*
>   * The exponential sliding  window over which load is averaged for shares
> @@ -3103,12 +3104,24 @@ static int wake_affine(struct sched_domain *sd, struct task_struct *p, int sync)
>  	unsigned long weight;
>  	int balanced;
>  	int runnable_avg;
> +	int burst = 0;
> 
>  	idx	  = sd->wake_idx;
>  	this_cpu  = smp_processor_id();
>  	prev_cpu  = task_cpu(p);
> -	load	  = source_load(prev_cpu, idx);
> -	this_load = target_load(this_cpu, idx);
> +
> +	if (cpu_rq(this_cpu)->avg_idle < sysctl_sched_burst_threshold ||
> +		cpu_rq(prev_cpu)->avg_idle < sysctl_sched_burst_threshold)
> +		burst = 1;
> +
> +	/* use instant load for bursty waking up */
> +	if (!burst) {
> +		load = source_load(prev_cpu, idx);
> +		this_load = target_load(this_cpu, idx);
> +	} else {
> +		load = cpu_rq(prev_cpu)->load.weight;
> +		this_load = cpu_rq(this_cpu)->load.weight;
> +	}
> 
>  	/*
>  	 * If sync wakeup then subtract the (maximum possible)
> diff --git a/kernel/sysctl.c b/kernel/sysctl.c
> index afc1dc6..1f23457 100644
> --- a/kernel/sysctl.c
> +++ b/kernel/sysctl.c
> @@ -327,6 +327,13 @@ static struct ctl_table kern_table[] = {
>  		.proc_handler	= proc_dointvec,
>  	},
>  	{
> +		.procname	= "sched_burst_threshold_ns",
> +		.data		= &sysctl_sched_burst_threshold,
> +		.maxlen		= sizeof(unsigned int),
> +		.mode		= 0644,
> +		.proc_handler	= proc_dointvec,
> +	},
> +	{
>  		.procname	= "sched_nr_migrate",
>  		.data		= &sysctl_sched_nr_migrate,
>  		.maxlen		= sizeof(unsigned int),
> 
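
The load selection in the wake_affine() hunk above can be modelled in
userspace to make the branch easier to follow. This is a minimal sketch, not
kernel code: struct rq_model and its fields are hypothetical stand-ins for
rq->avg_idle, rq->load.weight and the values that source_load() and
target_load() would return, which are passed in as plain numbers rather than
computed.

```c
#include <stdint.h>

/* Hypothetical stand-in for the per-CPU runqueue state the patch reads. */
struct rq_model {
	uint64_t avg_idle_ns;		/* models rq->avg_idle */
	unsigned long inst_load;	/* models rq->load.weight */
	unsigned long decayed_load;	/* models source_load()/target_load() */
};

/*
 * A wakeup is treated as bursty when either CPU's average idle time is
 * below the threshold, mirroring the condition added by the patch.
 */
int is_burst(const struct rq_model *this_rq, const struct rq_model *prev_rq,
	     unsigned int threshold_ns)
{
	return this_rq->avg_idle_ns < threshold_ns ||
	       prev_rq->avg_idle_ns < threshold_ns;
}

/*
 * Pick the load values to compare: instantaneous load for bursty
 * wakeups, the decayed load otherwise.
 */
void pick_loads(const struct rq_model *this_rq, const struct rq_model *prev_rq,
		unsigned int threshold_ns,
		unsigned long *load, unsigned long *this_load)
{
	if (is_burst(this_rq, prev_rq, threshold_ns)) {
		*load = prev_rq->inst_load;
		*this_load = this_rq->inst_load;
	} else {
		*load = prev_rq->decayed_load;
		*this_load = this_rq->decayed_load;
	}
}
```

With the patch default of 100000 ns, a CPU pair whose avg_idle values are
50000 and 200000 takes the instantaneous-load branch, because one of the two
values is below the threshold.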
