linux-kernel - Re: numa/core regressions fixed

Date:	Tue, 20 Nov 2012 22:22:58 -0500
From:	Rik van Riel <riel@...hat.com>
To:	habanero@...ux.vnet.ibm.com
CC:	Ingo Molnar <mingo@...nel.org>,
	Linus Torvalds <torvalds@...ux-foundation.org>,
	David Rientjes <rientjes@...gle.com>,
	Mel Gorman <mgorman@...e.de>,
	Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
	linux-mm <linux-mm@...ck.org>,
	Peter Zijlstra <a.p.zijlstra@...llo.nl>,
	Paul Turner <pjt@...gle.com>,
	Lee Schermerhorn <Lee.Schermerhorn@...com>,
	Christoph Lameter <cl@...ux.com>,
	Andrew Morton <akpm@...ux-foundation.org>,
	Andrea Arcangeli <aarcange@...hat.com>,
	Thomas Gleixner <tglx@...utronix.de>,
	Johannes Weiner <hannes@...xchg.org>,
	Hugh Dickins <hughd@...gle.com>
Subject: Re: numa/core regressions fixed - more testers wanted

On 11/20/2012 08:54 PM, Andrew Theurer wrote:

> I can confirm single JVM JBB is working well for me.  I see a 30%
> improvement over autoNUMA.  What I can't make sense of is some perf
> stats (taken at 80 warehouses on 4 x WST-EX, 512GB memory):

AutoNUMA does not have native THP migration, which may explain some
of the difference.

> tips numa/core:
>
>       5,429,632,865 node-loads
>       3,806,419,082 node-load-misses(70.1%)
>       2,486,756,884 node-stores
>       2,042,557,277 node-store-misses(82.1%)
>       2,878,655,372 node-prefetches
>       2,201,441,900 node-prefetch-misses
>
> autoNUMA:
>
>       4,538,975,144 node-loads
>       2,666,374,830 node-load-misses(58.7%)
>       2,148,950,354 node-stores
>       1,682,942,931 node-store-misses(78.3%)
>       2,191,139,475 node-prefetches
>       1,633,752,109 node-prefetch-misses
>
> The percentage of misses is higher for numa/core.  I would have expected
> the performance increase be due to lower "node-misses", but perhaps I am
> misinterpreting the perf data.

Lack of native THP migration may be enough to explain the
performance difference, despite AutoNUMA having better node
locality.
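
For anyone who wants to double-check the locality comparison, the
percentages in the quoted perf output are consistent with misses
divided by total accesses for each event pair. Below is a minimal
sketch in Python that recomputes them from the raw counters. The
dict layout and the names COUNTERS, EVENTS, miss_pct and report are
illustrative assumptions of this sketch, not anything perf provides.

```python
# Raw counters copied from the perf output quoted above.
# COUNTERS, EVENTS, miss_pct and report are hypothetical names used
# only for this illustration.
COUNTERS = {
    "numa/core": {
        "node-loads": 5_429_632_865,
        "node-load-misses": 3_806_419_082,
        "node-stores": 2_486_756_884,
        "node-store-misses": 2_042_557_277,
        "node-prefetches": 2_878_655_372,
        "node-prefetch-misses": 2_201_441_900,
    },
    "autoNUMA": {
        "node-loads": 4_538_975_144,
        "node-load-misses": 2_666_374_830,
        "node-stores": 2_148_950_354,
        "node-store-misses": 1_682_942_931,
        "node-prefetches": 2_191_139_475,
        "node-prefetch-misses": 1_633_752_109,
    },
}

# Map each access kind to its (total, misses) counter names.
EVENTS = {
    "load": ("node-loads", "node-load-misses"),
    "store": ("node-stores", "node-store-misses"),
    "prefetch": ("node-prefetches", "node-prefetch-misses"),
}


def miss_pct(counters, kind):
    """Return misses as a percentage of total accesses for one kind."""
    total_name, miss_name = EVENTS[kind]
    return 100.0 * counters[miss_name] / counters[total_name]


def report():
    """Build one formatted line per (tree, access kind) pair."""
    lines = []
    for tree, counters in COUNTERS.items():
        for kind in EVENTS:
            pct = miss_pct(counters, kind)
            lines.append(f"{tree:10s} node-{kind}-misses {pct:5.1f}%")
    return lines


if __name__ == "__main__":
    print("\n".join(report()))
```

The same helper also gives the prefetch miss ratio, which the quoted
output lists as raw counts only. Note that a lower miss ratio only
says a larger share of accesses were node-local; it says nothing
about what each migration cost, which is where THP migration comes
in.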

>> Next I'll work on making multi-JVM more of an improvement, and
>> I'll also address any incoming regression reports.
>
> I have issues with multiple KVM VMs running either JBB or
> dbench-in-tmpfs, and I suspect whatever I am seeing is similar to
> whatever multi-jvm in baremetal is.  What I typically see is no real
> convergence of a single node for resource usage for any of the VMs.  For
> example, when running 8 VMs, 10 vCPUs each, a VM may have the following
> resource usage:

This is an issue.  I have tried understanding the new local/shared
and shared task grouping code, but have not wrapped my mind around
that code yet.

I will have to look at that code a few more times, and ask more
questions of Ingo and Peter (and maybe ask some of the same questions
again - I see that some of my comments were addressed in the next
version of the patch, but the email never got a reply).

-- 
All rights reversed