lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-Id: <1297188819-19999-4-git-send-email-venki@google.com>
Date:	Tue,  8 Feb 2011 10:13:39 -0800
From:	Venkatesh Pallipadi <venki@...gle.com>
To:	Suresh Siddha <suresh.b.siddha@...el.com>
Cc:	Peter Zijlstra <peterz@...radead.org>, Ingo Molnar <mingo@...e.hu>,
	linux-kernel@...r.kernel.org, Paul Turner <pjt@...gle.com>,
	Mike Galbraith <efault@....de>,
	Nick Piggin <npiggin@...il.com>,
	Venkatesh Pallipadi <venki@...gle.com>
Subject: [PATCH 3/3] sched: newidle balance set idle_timestamp only on successful pull

load_balance() could return a negative value in the case of
SMT sibling CPU being busy. Code in idle_balance() though, uses this
return value as an indicator of successful task pull, ignoring the
-1 return value.

This has two problems:
1) Resets idle_stamp even when this return value is -1.
Specific case is on SMT capable system, CPU A is idle and its sibling
CPU B is busy. In this case, CPU A avg_idle will not depend on
a task sleeping/waking up on it. Instead it will continue to hold stale
avg_idle value for extended period of time.

Simple test case of driving avg_idle on a CPU to desired value by using
a usleep loop and then starting a 100% busy loop on its sibling and
changing the usleep rate on original CPU (or removing it completely),
I see the avg_idle on this CPU not updating at all in this case.

2) Breaks out of idle_balance, skipping all higher level domains.
Case can be made that breaking out here is a 'feature' and not a 'bug'.
Periodic balance uses this signal to drop down to busy balance for
higher level domains. But, is simple break out of balancing good for
newidle case?

We do see results in our workload, that this break increases idle time
and reduces both throughput and latency measurably. Also, as newidle
balance itself is ratelimited with avg_idle, would it be OK to continue
balancing upper domains in this case of SMT sibling busy? Or may be
reduce it to CPU_NOT_IDLE from CPU_NEWLY_IDLE? Change here goes with
the former option.

Signed-off-by: Venkatesh Pallipadi <venki@...gle.com>
---
 kernel/sched_fair.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/kernel/sched_fair.c b/kernel/sched_fair.c
index 91227d9..56ab3a6 100644
--- a/kernel/sched_fair.c
+++ b/kernel/sched_fair.c
@@ -3483,7 +3483,7 @@ static void idle_balance(int this_cpu, struct rq *this_rq)
 		interval = msecs_to_jiffies(sd->balance_interval);
 		if (time_after(next_balance, sd->last_balance + interval))
 			next_balance = sd->last_balance + interval;
-		if (pulled_task) {
+		if (pulled_task > 0) {
 			this_rq->idle_stamp = 0;
 			break;
 		}
-- 
1.7.3.1

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ