Message-ID: <20121014045716.GE11663@redhat.com>
Date: Sun, 14 Oct 2012 06:57:16 +0200
From: Andrea Arcangeli <aarcange@...hat.com>
To: Srikar Dronamraju <srikar@...ux.vnet.ibm.com>
Cc: linux-kernel@...r.kernel.org, linux-mm@...ck.org,
torvalds@...ux-foundation.org, akpm@...ux-foundation.org,
pzijlstr@...hat.com, mingo@...e.hu, mel@....ul.ie,
hughd@...gle.com, riel@...hat.com, hannes@...xchg.org,
dhillf@...il.com, drjones@...hat.com, tglx@...utronix.de,
pjt@...gle.com, cl@...ux.com, suresh.b.siddha@...el.com,
efault@....de, paulmck@...ux.vnet.ibm.com, alex.shi@...el.com,
konrad.wilk@...cle.com, benh@...nel.crashing.org
Subject: Re: [PATCH 00/33] AutoNUMA27
Hi Srikar,
On Sun, Oct 14, 2012 at 12:10:19AM +0530, Srikar Dronamraju wrote:
> * Andrea Arcangeli <aarcange@...hat.com> [2012-10-04 01:50:42]:
>
> > Hello everyone,
> >
> > This is a new AutoNUMA27 release for Linux v3.6.
> >
>
>
> Here results of autonumabenchmark on a 328GB 64 core with ht disabled
> comparing v3.6 with autonuma27.
*snip*
> numa01: 1805.19 1907.11 1866.39 -3.88%
Interesting. So numa01 should be improved in autonuma28fast. Not sure
why the hard binds show any difference, but I'm more concerned with
optimizing numa01. I get the same results from hard bindings on
upstream and on autonuma, so that's strange.
Could you repeat only numa01 with the origin/autonuma28fast branch?
Also, if you could post the two PDF convergence charts generated by
numa01 on autonuma27 and autonuma28fast, I think it would be
interesting to see the full effect and why it is faster.
I only had time for a quick push after adding the idea to
autonuma28fast (which is a further improvement over autonuma28), but
I've already been told that it deals with numa01 on the 8-node system
very well, as expected.
numa01 on the 8-node system is a workload without a perfect solution (other
than MADV_INTERLEAVE). Full convergence that prevents cross-node traffic
is impossible because there are 2 processes spanning 8 nodes and
all process memory is touched by all threads constantly. Yet
autonuma28fast should deal optimally with that scenario too.
As a side note: numa01 on the 2-node system instead converges fully (2
processes + 2 nodes = full convergence). numa01 on 2 nodes and on >2 nodes
are very different kinds of test.
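To make the convergence argument concrete, here is a rough back-of-the-envelope
sketch in Python. It is not AutoNUMA code. It assumes a simplified model in
which each process's threads are spread evenly over an equal share of the
nodes, every thread touches all of its process's memory uniformly, and each
page lives on exactly one node. The function name best_local_fraction is
hypothetical and exists only for this illustration.

```python
from fractions import Fraction


def best_local_fraction(nodes: int, processes: int) -> Fraction:
    """Upper bound on the share of memory accesses that can be node-local.

    Simplified model (an assumption, not taken from the AutoNUMA code):
    each process's threads are spread evenly over nodes // processes nodes,
    every thread touches all of its process's memory uniformly, and each
    page lives on exactly one node.
    """
    if nodes <= 0 or processes <= 0 or nodes % processes != 0:
        raise ValueError("model assumes nodes divide evenly among processes")
    nodes_per_process = nodes // processes
    # A page sits on a single node, so only the threads running on that
    # node (1 out of nodes_per_process equal groups) reach it locally.
    return Fraction(1, nodes_per_process)


print(best_local_fraction(2, 2))  # 1   -> full convergence is possible
print(best_local_fraction(8, 2))  # 1/4 -> most accesses stay cross-node
```

Under these assumptions the 2-node case can keep every access local, while in
the 8-node case no page placement gets above a quarter of accesses being
local, which is why the two setups behave like different tests.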
I'll release an autonuma29 behaving like 28fast if there are no
surprises. The new algorithm change in 28fast will also save memory
once I rewrite it properly.
Thanks!
Andrea