linux-kernel - Re: [PATCH 23/35] autonuma: core

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <1338308875.26856.121.camel@twins>
Date:	Tue, 29 May 2012 18:27:55 +0200
From:	Peter Zijlstra <a.p.zijlstra@...llo.nl>
To:	Andrea Arcangeli <aarcange@...hat.com>
Cc:	linux-kernel@...r.kernel.org, linux-mm@...ck.org,
	Hillf Danton <dhillf@...il.com>, Dan Smith <danms@...ibm.com>,
	Linus Torvalds <torvalds@...ux-foundation.org>,
	Andrew Morton <akpm@...ux-foundation.org>,
	Thomas Gleixner <tglx@...utronix.de>,
	Ingo Molnar <mingo@...e.hu>, Paul Turner <pjt@...gle.com>,
	Suresh Siddha <suresh.b.siddha@...el.com>,
	Mike Galbraith <efault@....de>,
	"Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>,
	Lai Jiangshan <laijs@...fujitsu.com>,
	Bharata B Rao <bharata.rao@...il.com>,
	Lee Schermerhorn <Lee.Schermerhorn@...com>,
	Rik van Riel <riel@...hat.com>,
	Johannes Weiner <hannes@...xchg.org>,
	Srivatsa Vaddagiri <vatsa@...ux.vnet.ibm.com>,
	Christoph Lameter <cl@...ux.com>
Subject: Re: [PATCH 23/35] autonuma: core

On Fri, 2012-05-25 at 19:02 +0200, Andrea Arcangeli wrote:
> This implements knuma_scand, the numa_hinting faults started by
> knuma_scand, the knuma_migrated that migrates the memory queued by the
> NUMA hinting faults, the statistics gathering code that is done by
> knuma_scand for the mm_autonuma and by the numa hinting page faults
> for the sched_autonuma, and most of the rest of the AutoNUMA core
> logics like the false sharing detection, sysfs and initialization
> routines.
> 
> The AutoNUMA algorithm when knuma_scand is not running is a full
> bypass and it must not alter the runtime of memory management and
> scheduler.
> 
> The whole AutoNUMA logic is a chain reaction as result of the actions
> of the knuma_scand. The various parts of the code can be described
> like different gears (gears as in glxgears).
> 
> knuma_scand is the first gear and it collects the mm_autonuma per-process
> statistics and at the same time it sets the pte/pmd it scans as
> pte_numa and pmd_numa.
> 
> The second gear are the numa hinting page faults. These are triggered
> by the pte_numa/pmd_numa pmd/ptes. They collect the sched_autonuma
> per-thread statistics. They also implement the memory follow CPU logic
> where we track if pages are repeatedly accessed by remote nodes. The
> memory follow CPU logic can decide to migrate pages across different
> NUMA nodes by queuing the pages for migration in the per-node
> knuma_migrated queues.
> 
> The third gear is knuma_migrated. There is one knuma_migrated daemon
> per node. Pages pending for migration are queued in a matrix of
> lists. Each knuma_migrated (in parallel with each other) goes over
> those lists and migrates the pages queued for migration in round robin
> from each incoming node to the node where knuma_migrated is running
> on.
> 
> The fourth gear is the NUMA scheduler balancing code. That computes
> the statistical information collected in mm->mm_autonuma and
> p->sched_autonuma and evaluates the status of all CPUs to decide if
> tasks should be migrated to CPUs in remote nodes. 

IOW:

"knuma_scand 'unmaps' ptes and collects mm stats, this triggers
numa_hinting pagefaults, using these we collect per task stats.

knuma_migrated migrates pages to their destination node. Something
queues pages.

The numa scheduling code uses the gathered stats to place tasks."


That covers just about all you said, now the interesting bits are still
missing:

 - how do you do false sharing;

 - what stats do you gather, and how are they used at each stage;

 - what's your balance goal, and how is that expressed and 
   converged upon.

Also, what I've not seen anywhere are scheduling stats, what if, despite
you giving a hint a particular process should run on a particular node
it doesn't and sticks to where its at (granted with strict this can't
happen -- but it should).


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/