linux-kernel - Re: [patch v3 0/8] sched: use runnable avg in load balance

Message-ID: <515BAFE6.1020804@intel.com>
Date:	Wed, 03 Apr 2013 12:28:22 +0800
From:	Alex Shi <alex.shi@...el.com>
To:	Michael Wang <wangyun@...ux.vnet.ibm.com>
CC:	mingo@...hat.com, peterz@...radead.org, tglx@...utronix.de,
	akpm@...ux-foundation.org, arjan@...ux.intel.com, bp@...en8.de,
	pjt@...gle.com, namhyung@...nel.org, efault@....de,
	morten.rasmussen@....com, vincent.guittot@...aro.org,
	gregkh@...uxfoundation.org, preeti@...ux.vnet.ibm.com,
	viresh.kumar@...aro.org, linux-kernel@...r.kernel.org,
	len.brown@...el.com, rafael.j.wysocki@...el.com, jkosina@...e.cz,
	clark.williams@...il.com, tony.luck@...el.com,
	keescook@...omium.org, mgorman@...e.de, riel@...hat.com
Subject: Re: [patch v3 0/8] sched: use runnable avg in load balance

On 04/03/2013 11:23 AM, Michael Wang wrote:
> On 04/03/2013 10:56 AM, Alex Shi wrote:
>> On 04/03/2013 10:46 AM, Michael Wang wrote:
>>> | 15 GB   |      16 | 45110 |   | 48091 |
>>> | 15 GB   |      24 | 41415 |   | 47415 |
>>> | 15 GB   |      32 | 35988 |   | 45749 |	+27.12%
>>>
>>> Very nice improvement, I'd like to test it with the wake-affine throttle
>>> patch later, let's see what will happen ;-)
>>>
>>> Any idea on why the last one caused the regression?
>>
>> you can change the burst threshold: sysctl_sched_migration_cost, to see
>> what's happen with different value. create a similar knob and tune it.
>> +
>> +	if (cpu_rq(this_cpu)->avg_idle < sysctl_sched_migration_cost)
>> +		burst_this = 1;
>> +	if (cpu_rq(prev_cpu)->avg_idle < sysctl_sched_migration_cost)
>> +		burst_prev = 1;
>> +
>>
>>
> 
> This changing the rate of adopt cpu_rq(cpu)->load.weight, correct?
> 
> So if rq is busy, cpu_rq(cpu)->load.weight is capable enough to stand
> for the load status of rq? what's the really idea here?

This patch tries to resolve the regression seen on aim7-like benchmarks.
If many tasks sleep for a long time, their runnable load decays to zero. If
they are then woken up in a burst, the too-light runnable load causes a big
imbalance in select_task_rq. So such benchmarks, like aim9, drop 5~7%.
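
To make the "runnable load goes to zero" point concrete, here is a rough
user-space sketch of the decay. It assumes the per-entity load tracking
half-life of 32 periods of about 1 ms each (y^32 = 0.5); the helper name
decayed_contrib and the floating-point arithmetic are only for illustration,
the kernel uses fixed-point math instead.

```c
/* Assumption: y is chosen so that y^32 == 0.5 (load-tracking half-life). */
#define PELT_Y 0.97857206

/*
 * Hypothetical helper: decay a load contribution over `periods`
 * periods (about 1 ms each) during which the task was asleep.
 */
static double decayed_contrib(double contrib, unsigned int periods)
{
	/* Every full 32 periods halves the contribution exactly. */
	while (periods >= 32) {
		contrib *= 0.5;
		periods -= 32;
	}
	/* Apply the remaining periods one at a time. */
	while (periods--)
		contrib *= PELT_Y;
	return contrib;
}
```

Under these assumptions a task that contributed 1024 before sleeping about
320 periods wakes up contributing about 1, so a burst of such wakeups looks
almost weightless to the balancer.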

This patch tries to detect the burst; if it finds one, it uses the load
weight directly instead of the zero runnable load avg, to avoid the imbalance.

But the patch may cause some unfairness if this cpu and prev cpu are not in
a burst at the same time. So could you try the following patch?


From 4722a7567dccfb19aa5afbb49982ffb6d65e6ae5 Mon Sep 17 00:00:00 2001
From: Alex Shi <alex.shi@...el.com>
Date: Tue, 2 Apr 2013 10:27:45 +0800
Subject: [PATCH] sched: use instant load for burst wake up

If many tasks sleep for a long time, their runnable load decays to zero.
If they are then woken up in a burst, the too-light runnable load causes a
big imbalance among CPUs. So such benchmarks, like aim9, drop 5~7%.

With this patch the loss is recovered, and the result is even slightly better.

Signed-off-by: Alex Shi <alex.shi@...el.com>
---
 kernel/sched/fair.c | 16 ++++++++++++++--
 1 file changed, 14 insertions(+), 2 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index dbaa8ca..25ac437 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -3103,12 +3103,24 @@ static int wake_affine(struct sched_domain *sd, struct task_struct *p, int sync)
 	unsigned long weight;
 	int balanced;
 	int runnable_avg;
+	int burst = 0;
 
 	idx	  = sd->wake_idx;
 	this_cpu  = smp_processor_id();
 	prev_cpu  = task_cpu(p);
-	load	  = source_load(prev_cpu, idx);
-	this_load = target_load(this_cpu, idx);
+
+	if (cpu_rq(this_cpu)->avg_idle < sysctl_sched_migration_cost ||
+		cpu_rq(prev_cpu)->avg_idle < sysctl_sched_migration_cost)
+		burst = 1;
+
+	/* use instant load for bursty waking up */
+	if (!burst) {
+		load = source_load(prev_cpu, idx);
+		this_load = target_load(this_cpu, idx);
+	} else {
+		load = cpu_rq(prev_cpu)->load.weight;
+		this_load = cpu_rq(this_cpu)->load.weight;
+	}
 
 	/*
 	 * If sync wakeup then subtract the (maximum possible)
-- 
1.7.12
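
For anyone who wants to poke at the selection logic above outside the
kernel, here is a minimal user-space sketch. struct rq_sample and both
helper names are hypothetical; decayed_load merely stands in for whatever
source_load()/target_load() would return, and none of this is kernel code.

```c
#include <stdint.h>

/* Hypothetical stand-in for the runqueue fields the patch reads. */
struct rq_sample {
	uint64_t avg_idle;          /* average idle time, in ns */
	unsigned long load_weight;  /* instant load, like rq->load.weight */
	unsigned long decayed_load; /* stand-in for source/target_load() */
};

/* A CPU looks bursty when its average idle time is below the threshold. */
static int rq_is_bursty(const struct rq_sample *rq, uint64_t migration_cost)
{
	return rq->avg_idle < migration_cost;
}

/*
 * Pick the loads to compare for a wakeup. If either CPU is in a burst,
 * both sides use the instant load weight, so the two CPUs are always
 * measured with the same metric.
 */
static void pick_wake_loads(const struct rq_sample *this_rq,
			    const struct rq_sample *prev_rq,
			    uint64_t migration_cost,
			    unsigned long *this_load, unsigned long *load)
{
	int burst = rq_is_bursty(this_rq, migration_cost) ||
		    rq_is_bursty(prev_rq, migration_cost);

	if (burst) {
		*load = prev_rq->load_weight;
		*this_load = this_rq->load_weight;
	} else {
		*load = prev_rq->decayed_load;
		*this_load = this_rq->decayed_load;
	}
}
```

Feeding it one sample with avg_idle below the threshold and one above
should return the load_weight of both; two samples above the threshold
should return the decayed values.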
