lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:	Tue, 13 Nov 2007 13:52:17 -0800 (PST)
From:	Christoph Lameter <clameter@....com>
To:	Jörn Engel <joern@...fs.org>
cc:	Ray Lee <ray-lk@...rabbit.org>, akpm@...ux-foundation.org,
	linux-kernel@...r.kernel.org, linux-mm@...ck.org,
	Mel Gorman <mel@....ul.ie>, Andi Kleen <ak@...e.de>,
	Andy Whitcroft <apw@...dowen.org>
Subject: Re: x86_64: Make sparsemem/vmemmap the default memory model

On Tue, 13 Nov 2007, Jörn Engel wrote:

> On Mon, 12 November 2007 20:41:10 -0800, Christoph Lameter wrote:
> > On Mon, 12 Nov 2007, Ray Lee wrote:
> > 
> > > Discontig obviously needs to die. However, FlatMem is consistently
> > > faster, averaging about 2.1% better overall for your numbers above. Is
> > > the page allocator not, erm, a fast path, where that matters?
> > > 
> > > Order	Flat	Sparse	% diff
> > > 0	639	641	0.3
> > 
> > IMHO Order 0 currently matters most and the difference is negligible 
> > there.
> 
> Is it?  I am a bit concerned about the non-monotonic distribution.
> Difference starts a near-0, grows to 4.4, drops to near-0, grows to 4.9,
> drops to near-0.

The problem also is that the comparison here is between a SMP config for 
flatmem vs a NUMA config for sparsemem. There is additional overhead in 
the NUMA config. 

The effect may also be due to the system being able to place 
some pages in the same 2MB section as the memmap with flatmem. However, 
that is only feasable immeidately after bootup. In regular operations this 
should vanish.

Could you run your own test to verify?

> Is there an explanation for this behaviour?  More to the point, could
> repeated runs also return 4% difference for order-0?

I hope I have given some above. The number of the page allocator suggests 
that we have far too much fat in the allocation paths. IMHO reasonable 
numbers for an order-0 alloc should be ~100 cycles.

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ