Message-ID: <20121008163424.335ea7ec@annuminas.surriel.com>
Date: Mon, 8 Oct 2012 16:34:24 -0400
From: Rik van Riel <riel@...hat.com>
To: Andi Kleen <andi@...stfloor.org>
Cc: Andrew Morton <akpm@...ux-foundation.org>,
Andrea Arcangeli <aarcange@...hat.com>,
linux-kernel@...r.kernel.org, linux-mm@...ck.org,
Linus Torvalds <torvalds@...ux-foundation.org>,
Peter Zijlstra <pzijlstr@...hat.com>,
Ingo Molnar <mingo@...e.hu>, Mel Gorman <mel@....ul.ie>,
Hugh Dickins <hughd@...gle.com>,
Johannes Weiner <hannes@...xchg.org>,
Hillf Danton <dhillf@...il.com>,
Andrew Jones <drjones@...hat.com>,
Dan Smith <danms@...ibm.com>,
Thomas Gleixner <tglx@...utronix.de>,
Paul Turner <pjt@...gle.com>, Christoph Lameter <cl@...ux.com>,
Suresh Siddha <suresh.b.siddha@...el.com>,
Mike Galbraith <efault@....de>,
"Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>,
Lai Jiangshan <laijs@...fujitsu.com>,
Bharata B Rao <bharata.rao@...il.com>,
Lee Schermerhorn <Lee.Schermerhorn@...com>,
Srivatsa Vaddagiri <vatsa@...ux.vnet.ibm.com>,
Alex Shi <alex.shi@...el.com>,
Mauricio Faria de Oliveira <mauricfo@...ux.vnet.ibm.com>,
Konrad@...ux.intel.com, dshaks@...hat.com
Subject: Re: [PATCH 00/33] AutoNUMA27
On Fri, 05 Oct 2012 16:14:44 -0700
Andi Kleen <andi@...stfloor.org> wrote:
> IMHO needs a performance shoot-out. Run both on the same 10 workloads
> and see who wins. Just a lot of work. Any volunteers?
Here are some preliminary results from simple benchmarks on a
4-node, 32 CPU core (4x8 core) Dell PowerEdge R910 system.
For the simple linpack streams benchmark, both sched/numa and
autonuma are within the margin of error compared to manual
tuning of task affinity. This is a big win, since the current
upstream scheduler has regressions of 10-20% when the system
runs 4 through 16 streams processes.
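For anyone unfamiliar with what manual tuning of task affinity involves, here is a minimal Python sketch. It assumes a hypothetical topology in which each of the 4 nodes owns 8 consecutively numbered CPUs; the real numbering on this box may differ (check `numactl --hardware`). The helper names are made up for illustration, and this is not the procedure used to produce the numbers in this mail.

```python
import os

NODES = 4            # assumption: 4 NUMA nodes, as on the test system
CPUS_PER_NODE = 8    # assumption: 8 cores per node, numbered contiguously


def node_cpus(node, cpus_per_node=CPUS_PER_NODE):
    """Return the set of CPU ids assumed to belong to `node`."""
    first = node * cpus_per_node
    return set(range(first, first + cpus_per_node))


def plan_pinning(nprocs, nodes=NODES, cpus_per_node=CPUS_PER_NODE):
    """Spread nprocs processes round-robin over the nodes.

    Entry i of the returned list is the CPU set for process i.
    """
    return [node_cpus(i % nodes, cpus_per_node) for i in range(nprocs)]


def pin(pid, cpus):
    """Restrict pid to the requested CPUs that it is allowed to use (Linux only)."""
    usable = cpus & os.sched_getaffinity(pid)
    if usable:
        os.sched_setaffinity(pid, usable)
    return usable
```

Affinity alone only constrains where a task runs. Keeping its memory on the same node needs a memory policy as well, for example via `numactl --membind`.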
For specjbb, the story is more complicated. After Larry, Peter and I
fixed the obvious bugs in sched/numa and got some basic
cpu-follows-memory code working (not yet in -tip AFAIK), the averaged
results look like this:
baseline:        246019
manual pinning:  285481 (+16%)
autonuma:        266626 (+8%)
sched/numa:      226540 (-8%)
This is with newer sched/numa code than what is in -tip right now.
Once Peter pushes the fixes by Larry and me into -tip, as well as
his cpu-follows-memory code, others should be able to run tests
like this as well.
Now for some other workloads, and tests on 8 node systems, etc...
Full results for the specjbb run below:
BASELINE - disabling auto numa (matches RHEL6 within 1%)
[root@...f74 SPECjbb]# cat r7_36_auto27_specjbb4_noauto.txt
spec1.txt: throughput = 243639.70 SPECjbb2005 bops
spec2.txt: throughput = 249186.20 SPECjbb2005 bops
spec3.txt: throughput = 247216.72 SPECjbb2005 bops
spec4.txt: throughput = 244035.60 SPECjbb2005 bops
Manual NUMACTL results are:
[root@...f74 SPECjbb]# more r7_36_numactl_specjbb4.txt
spec1.txt: throughput = 291430.22 SPECjbb2005 bops
spec2.txt: throughput = 283550.85 SPECjbb2005 bops
spec3.txt: throughput = 284028.71 SPECjbb2005 bops
spec4.txt: throughput = 282919.37 SPECjbb2005 bops
AUTONUMA27 - 3.6.0-0.24.autonuma27.test.x86_64
[root@...f74 SPECjbb]# more r7_36_auto27_specjbb4.txt
spec1.txt: throughput = 261835.01 SPECjbb2005 bops
spec2.txt: throughput = 269053.06 SPECjbb2005 bops
spec3.txt: throughput = 261230.50 SPECjbb2005 bops
spec4.txt: throughput = 274386.81 SPECjbb2005 bops
Tuned SCHED_NUMA from Friday 10/4/2012 with fixes from Peter, Rik and
Larry:
[root@...f74 SPECjbb]# more r7_36_schednuma_specjbb4.txt
spec1.txt: throughput = 222349.74 SPECjbb2005 bops
spec2.txt: throughput = 232988.59 SPECjbb2005 bops
spec3.txt: throughput = 223386.03 SPECjbb2005 bops
spec4.txt: throughput = 227438.11 SPECjbb2005 bops
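The averages in the summary can be rechecked from the per-instance numbers above. Here is a small Python sketch; the dictionary keys and the function name are labels chosen for illustration. It computes the mean of each set of four instances and its change relative to the baseline mean. Rounding here may differ by a bop or two from the figures quoted in the summary.

```python
from statistics import mean

# Per-instance SPECjbb2005 bops, copied from the result files above.
runs = {
    "baseline":       [243639.70, 249186.20, 247216.72, 244035.60],
    "manual pinning": [291430.22, 283550.85, 284028.71, 282919.37],
    "autonuma":       [261835.01, 269053.06, 261230.50, 274386.81],
    "sched/numa":     [222349.74, 232988.59, 223386.03, 227438.11],
}


def summarize(runs, reference="baseline"):
    """Map each configuration to (mean bops, percent change vs. reference)."""
    base = mean(runs[reference])
    return {
        name: (mean(values), 100.0 * (mean(values) / base - 1.0))
        for name, values in runs.items()
    }


summary = summarize(runs)
for name, (avg, pct) in summary.items():
    print(f"{name:15s} {avg:10.0f} ({pct:+.0f}%)")
```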
--
All rights reversed.