Message-ID: <20140514114037.2d93266f@annuminas.surriel.com>
Date:	Wed, 14 May 2014 11:40:37 -0400
From:	Rik van Riel <riel@...hat.com>
To:	Mike Galbraith <umgwanakikbuti@...il.com>
Cc:	Peter Zijlstra <peterz@...radead.org>,
	linux-kernel@...r.kernel.org, morten.rasmussen@....com,
	mingo@...nel.org, george.mccollister@...il.com,
	ktkhai@...allels.com, Mel Gorman <mgorman@...e.de>,
	"Vinod, Chegu" <chegu_vinod@...com>, Suresh Siddha <suresh.b.siddha@...el.com>
Subject: [PATCH] sched: call select_idle_sibling when not affine_sd

On Wed, 14 May 2014 06:08:09 +0200
Mike Galbraith <umgwanakikbuti@...il.com> wrote:
> On Tue, 2014-05-13 at 10:08 -0400, Rik van Riel wrote:

> > 1) If the node_distance between nodes on a NUMA system is
> >    <= RECLAIM_DISTANCE, we will call select_idle_sibling for
> >    a wakeup of a previously existing task (SD_BALANCE_WAKE)
> > 
> > 2) If the node distance exceeds RECLAIM_DISTANCE, we will
> >    wake up a task on prev_cpu, even if it is not currently
> >    idle
> > 
> >    This behaviour only happens on certain large NUMA systems,
> >    and is different from the behaviour on small systems.
> >    I suspect we will want to call select_idle_sibling with
> >    prev_cpu in case target and prev_cpu are not in the same
> >    SD_WAKE_AFFINE domain.
> 
> Sometimes.  It's the same can of worms remote as it is local.. latency
> gain may or may not outweigh cache miss pain.

Ahh, but it is a DIFFERENT can of worms. If the distance between
cpu and prev_cpu exceeds RECLAIM_DISTANCE, we will not look for 
an idle sibling in the same LLC domain as prev_cpu.

If the distance is smaller, and we decide not to do an affine
wakeup, then we do look for an idle sibling of prev_cpu.

This patch makes sure that both types of systems have the same
can of worms :)
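
To make the difference concrete, here is a rough user-space sketch of
the two decision paths. It is only a model, not the kernel code: the
struct names, fields, and the old_path()/new_path() helpers are made up
for illustration, the wake_affine() result is passed in as a plain
boolean, and the fall-through into the slow path (when no idle sibling
search happens) is not modelled at all.

```c
#include <stdbool.h>

/* Hypothetical, simplified inputs to the wakeup decision. */
struct wakeup {
	int cpu;             /* CPU of the waker */
	int prev_cpu;        /* CPU the wakee last ran on */
	bool has_affine_sd;  /* both CPUs share an SD_WAKE_AFFINE domain */
	bool wake_affine;    /* what a wake_affine()-style check would say */
	bool sd_wake_affine; /* stands in for the patch's sd_flag test */
};

/* Hypothetical result: was an idle sibling search done, and around which CPU. */
struct decision {
	bool idle_search;
	int target;          /* -1 when no search was done */
};

/* Before the patch: the idle sibling search is nested under affine_sd. */
static struct decision old_path(struct wakeup w)
{
	struct decision d = { false, -1 };

	if (w.has_affine_sd) {
		if (w.cpu != w.prev_cpu && w.wake_affine)
			w.prev_cpu = w.cpu;
		d.idle_search = true;
		d.target = w.prev_cpu;
	}
	return d;
}

/* After the patch: the search no longer depends on having an affine_sd. */
static struct decision new_path(struct wakeup w)
{
	struct decision d = { false, -1 };

	if (w.has_affine_sd && w.cpu != w.prev_cpu && w.wake_affine)
		w.prev_cpu = w.cpu;

	if (w.sd_wake_affine) {
		d.idle_search = true;
		d.target = w.prev_cpu;
	}
	return d;
}
```

With has_affine_sd false (waker and wakee too far apart), old_path()
does no idle sibling search, while new_path() searches around prev_cpu
without pulling the task to the waker's node. With has_affine_sd true,
both paths pick the same target.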

---8<---

Subject: sched: call select_idle_sibling when not affine_sd

On smaller systems, the top level sched domain will be an affine
domain, and select_idle_sibling is invoked for every SD_WAKE_AFFINE
wakeup. This seems to be working well.

On larger systems, where the node distance between far-apart NUMA nodes
exceeds RECLAIM_DISTANCE, select_idle_sibling is only called if the
waker and the wakee are on nodes no more than RECLAIM_DISTANCE apart.

This patch leaves in place the policy of not pulling the task across
nodes on such systems, while fixing the issue that select_idle_sibling
is not called at all in certain circumstances.

The code will look for an idle CPU in the same CPU package as the
CPU where the task ran previously.

Signed-off-by: Rik van Riel <riel@...hat.com>
---
 kernel/sched/fair.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 39b63d0..1e58159 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -4423,10 +4423,10 @@ select_task_rq_fair(struct task_struct *p, int prev_cpu, int sd_flag, int wake_f
 			sd = tmp;
 	}
 
-	if (affine_sd) {
-		if (cpu != prev_cpu && wake_affine(affine_sd, p, sync))
-			prev_cpu = cpu;
+	if (affine_sd && cpu != prev_cpu && wake_affine(affine_sd, p, sync))
+		prev_cpu = cpu;
 
+	if (sd_flag & SD_WAKE_AFFINE) {
 		new_cpu = select_idle_sibling(p, prev_cpu);
 		goto unlock;
 	}
