linux-kernel - Re: [patch v2 1/2] sched: check for prev_cpu == this_cpu before calling wake

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-Id: <1270189239.6513.78.camel@marge.simson.net>
Date:	Fri, 02 Apr 2010 08:20:39 +0200
From:	Mike Galbraith <efault@....de>
To:	Suresh Siddha <suresh.b.siddha@...el.com>
Cc:	Peter Zijlstra <peterz@...radead.org>, Ingo Molnar <mingo@...e.hu>,
	Arjan van de Ven <arjan@...ux.jf.intel.com>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
	Vaidyanathan Srinivasan <svaidy@...ux.vnet.ibm.com>,
	Yanmin Zhang <yanmin_zhang@...ux.jf.intel.com>,
	Gautham R Shenoy <ego@...ibm.com>
Subject: Re: [patch v2 1/2] sched: check for prev_cpu == this_cpu before
 calling wake_affine()

On Thu, 2010-04-01 at 14:04 -0700, Suresh Siddha wrote:

> Consider this scenario. Today we do balance on fork() and exec(). This
> will cause the tasks to start far away. On systems like NHM-EP, tasks
> will start on two different sockets/nodes(as each socket is a numa node)
> and allocate their memory locally etc. Task A starting on Node-0 and
> Task B starting on Node-1. Once task B sleeps and if Task A or something
> else wakes up task B on Node-0, (with the recent change) just because
> there is an idle HT sibling on node-0 we endup waking the task on
> node-0. This is wrong. We should first atleast go through wake_affine()
> and if wake_affine() says ok to move the task to node-0, then we can
> look at the cache siblings for node-0 and select an appropriate cpu.

Yes, if task A and task B are more or less unrelated, you'd want them to
stay in separate domains, you'd not want some random event to pull.  The
other side of the coin is tasks which fork off partners that they will
talk to at high frequency.  They land just as far away, and desperately
need to move into a shared cache domain.  There's currently no
discriminator, so while always asking wake_affine() may reduce the risk
of moving a task with a large footprint, it also increases the risk of
leaving buddies jabbering cross cache.  You can tweak it in either
direction, and neither can be called "wrong", it's all compromise.

Do you have a compute load bouncing painfully which this patch cures?

I have no strong objections, and the result is certainly easier on the
eye.  If I were making the decision, I'd want to see some numbers.

	-Mike

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/