Message-ID: <20121113072441.GA21386@gmail.com>
Date: Tue, 13 Nov 2012 08:24:41 +0100
From: Ingo Molnar <mingo@...nel.org>
To: Christoph Lameter <cl@...ux.com>
Cc: Peter Zijlstra <a.p.zijlstra@...llo.nl>,
linux-kernel@...r.kernel.org, linux-mm@...ck.org,
Paul Turner <pjt@...gle.com>,
Lee Schermerhorn <Lee.Schermerhorn@...com>,
Rik van Riel <riel@...hat.com>, Mel Gorman <mgorman@...e.de>,
Andrew Morton <akpm@...ux-foundation.org>,
Andrea Arcangeli <aarcange@...hat.com>,
Linus Torvalds <torvalds@...ux-foundation.org>,
Thomas Gleixner <tglx@...utronix.de>
Subject: Re: [PATCH 0/8] Announcement: Enhanced NUMA scheduling with adaptive
affinity

* Christoph Lameter <cl@...ux.com> wrote:
> On Mon, 12 Nov 2012, Peter Zijlstra wrote:
>
> > The biggest conceptual addition, beyond the elimination of
> > the home node, is that the scheduler is now able to
> > recognize 'private' versus 'shared' pages, by carefully
> > analyzing the pattern of how CPUs touch the working set
> > pages. The scheduler automatically recognizes tasks that
> > share memory with each other (and make dominant use of that
> > memory) - versus tasks that allocate and use their working
> > set privately.
>
> That is a key distinction to make and if this really works
> then that is major progress.

I posted updated benchmark results yesterday, and the approach
is indeed a performance breakthrough:

  http://lkml.org/lkml/2012/11/12/330

It also made the code more generic and more maintainable from a
scheduler POV.
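
To make the heuristic concrete, here is a minimal user-space
sketch - not the patch code, all names are made up for
illustration - of the idea Peter describes: remember which task
last faulted each page, and classify the next fault on that page
as 'private' if the same task comes back, 'shared' otherwise:

	/*
	 * Toy model of private-vs-shared fault classification.
	 * Not the actual scheduler/mm code.
	 */
	#include <stdio.h>

	#define NR_PAGES 8

	struct page_info {
		int last_tid;		/* task that last faulted this page, -1 = none */
		unsigned private_faults;
		unsigned shared_faults;
	};

	static struct page_info pages[NR_PAGES];

	static void record_fault(int page, int tid)
	{
		struct page_info *p = &pages[page];

		if (p->last_tid == tid)
			p->private_faults++;	/* same task as last time: private use */
		else if (p->last_tid != -1)
			p->shared_faults++;	/* a different task touched it: shared use */
		p->last_tid = tid;
	}

	int main(void)
	{
		int i;

		for (i = 0; i < NR_PAGES; i++)
			pages[i].last_tid = -1;

		/* task 100 works on pages 0-3 privately */
		for (i = 0; i < 4; i++) {
			record_fault(i, 100);
			record_fault(i, 100);
		}
		/* tasks 100 and 200 both touch pages 4-7: shared working set */
		for (i = 4; i < 8; i++) {
			record_fault(i, 100);
			record_fault(i, 200);
		}

		for (i = 0; i < NR_PAGES; i++)
			printf("page %d: %u private, %u shared faults\n",
			       i, pages[i].private_faults, pages[i].shared_faults);
		return 0;
	}

The real code works on NUMA hinting faults and per-task state
rather than a toy array, but the classification principle is the
same.
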
> > This new scheduler code is then able to group tasks that are
> > "memory related" via their memory access patterns together:
> > in the NUMA context moving them on the same node if
> > possible, and spreading them amongst nodes if they use
> > private memory.
>
> What happens if the processes' memory accesses are related but the
> common set of data does not fit into the memory provided by a
> single node?

The other (very common) node-overload case is when there are
more tasks working on a shared piece of memory than fit on a
single node.

I have measured two such workloads. One is the Java SPEC
benchmark:

  v3.7-vanilla:  494828 transactions/sec
  v3.7-NUMA:     627228 transactions/sec   [ +26.7% ]

The other is the 'numa01' testcase of autonumabench:

  v3.7-vanilla:  340.3 seconds
  v3.7-NUMA:     216.9 seconds             [ +56% ]

> The correct resolution in that case is usually to interleave
> the pages over both nodes in use.

I'd not go so far as to claim that as a general rule: the
correct placement depends on the specifics of the system and the
workload: how much memory is on each node, how many tasks run on
each node, and whether the access patterns and working sets of
the tasks are symmetric amongst each other - which is not a
given at all.

Consider, say, a database server that executes small and large
queries over a large database shared in memory, with worker
tasks serving the queries of individual clients. Depending on
the nature of the queries, interleaving can easily be the wrong
thing to do.
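
For reference, this is the kind of interleaving being talked
about, shown as a minimal user-space illustration via libnuma
(illustration only - the patch set under discussion is about
making such placement decisions automatically, without any
application changes or explicit policy calls):

	/* Explicitly interleave a shared region across all nodes.
	 * Build with: gcc interleave.c -lnuma
	 */
	#include <numa.h>
	#include <stdio.h>
	#include <string.h>

	int main(void)
	{
		size_t size = 64UL << 20;	/* a 64 MB "shared database" region */
		void *buf;

		if (numa_available() < 0) {
			fprintf(stderr, "no NUMA support on this system\n");
			return 1;
		}

		/* spread the pages of the region round-robin over all nodes */
		buf = numa_alloc_interleaved(size);
		if (!buf)
			return 1;

		memset(buf, 0, size);		/* touch the pages so they get allocated */

		numa_free(buf, size);
		return 0;
	}

Whether such round-robin spreading beats keeping the accessing
tasks and their pages on one node is exactly the workload-
dependent question above.
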
Thanks,
Ingo