Message-Id: <200902051459.30064.nickpiggin@yahoo.com.au>
Date:	Thu, 5 Feb 2009 14:59:29 +1100
From:	Nick Piggin <nickpiggin@...oo.com.au>
To:	Mel Gorman <mel@....ul.ie>
Cc:	Pekka Enberg <penberg@...helsinki.fi>,
	Nick Piggin <npiggin@...e.de>,
	Linux Memory Management List <linux-mm@...ck.org>,
	Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
	Andrew Morton <akpm@...ux-foundation.org>,
	Lin Ming <ming.m.lin@...el.com>,
	"Zhang, Yanmin" <yanmin_zhang@...ux.intel.com>,
	Christoph Lameter <cl@...ux-foundation.org>
Subject: Re: [patch] SLQB slab allocator (try 2)

On Thursday 05 February 2009 02:27:10 Mel Gorman wrote:
> On Wed, Feb 04, 2009 at 05:48:40PM +1100, Nick Piggin wrote:

> > It couldn't hurt, but it's usually tricky to read anything out of these
> > from CPU cycle profiles. Especially if they are due to cache or tlb
> > effects (which tend to just get spread out all over the profile).
>
> Indeed. To date, I've used them for comparing relative counts of things
> like TLB and cache misses on the basis "relatively more misses running test
> X is bad" or working out things like tlb-misses-per-instructions but it's a
> bit vague. We might notice if one of the allocators is being particularly
> cache unfriendly due to a spike in cache misses.

Very true. Total counts of TLB and cache misses could show some insight.


> PPC64 Test Machine
> Sysbench-Postgres
> -----------------
> Client           slab  slub-default  slub-minorder            slqb
>      1         1.0000        1.0153         1.0179          1.0051
>      2         1.0000        1.0273         1.0181          1.0269
>      3         1.0000        1.0299         1.0195          1.0234
>      4         1.0000        1.0159         1.0130          1.0146
>      5         1.0000        1.0232         1.0192          1.0264
>      6         1.0000        1.0238         1.0142          1.0088
>      7         1.0000        1.0240         1.0063          1.0076
>      8         1.0000        1.0134         0.9842          1.0024
>      9         1.0000        1.0154         1.0152          1.0077
>     10         1.0000        1.0126         1.0018          1.0009
>     11         1.0000        1.0100         0.9971          0.9933
>     12         1.0000        1.0112         0.9985          0.9993
>     13         1.0000        1.0131         1.0060          1.0035
>     14         1.0000        1.0237         1.0074          1.0071
>     15         1.0000        1.0098         0.9997          0.9997
>     16         1.0000        1.0110         0.9899          0.9994
> Geo. mean      1.0000        1.0175         1.0067          1.0078
>
> The order SLUB uses does not make much of a difference to SPEC CPU on
> either test machine or sysbench on x86-64. However, on the ppc64 machine,
> the performance advantage SLUB has over SLAB appears to be eliminated if
> high-order pages are not used. I think I might run SLUB again in case the
> higher average performance was a coincidence due to lucky cache layout.
> Otherwise, Christoph can probably put together a plausible theory on this
> result faster than I can.

It's interesting, thanks. It's a good result for SLQB I guess. 1% is fairly
large here (if it is statistically significant), but I don't think the
drawbacks of using higher order pages warrant changing anything by default
in SLQB. It does encourage me to add a boot or runtime parameter, though
(even if just for testing purposes).
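
For reference, the "Geo. mean" row of the quoted table can be recomputed from
the per-client ratios. The following is a minimal sketch, assuming the
slub-default column exactly as quoted above; the helper name geo_mean is
illustrative only and is not part of any benchmark tooling. It says nothing
about statistical significance, which would need repeated runs.

```python
import math

def geo_mean(values):
    """Geometric mean: exp of the arithmetic mean of the logs."""
    return math.exp(sum(math.log(v) for v in values) / len(values))

# slub-default column from the quoted ppc64 sysbench-postgres table,
# clients 1..16, each value normalised to slab = 1.0000.
slub_default = [
    1.0153, 1.0273, 1.0299, 1.0159, 1.0232, 1.0238, 1.0240, 1.0134,
    1.0154, 1.0126, 1.0100, 1.0112, 1.0131, 1.0237, 1.0098, 1.0110,
]

if __name__ == "__main__":
    # Compare the printed value against the "Geo. mean" row of the table.
    print(f"slub-default geometric mean: {geo_mean(slub_default):.4f}")
```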


> On the TLB front, it is perfectly possible that the workloads on x86-64 are
> not allocator or memory intensive enough to take advantage of fewer calls
> to the page allocator or potentially reduced TLB pressure. As the kernel
> portion of the address space already uses huge pages, slab objects may have
> to occupy a very large percentage of memory before TLB pressure becomes an
> issue. The L1 TLBs on both test machines are fully associative making
> testing reduced TLB pressure practically impossible. For bonus points, 1G
> pages are being used on the x86-64 so I have nowhere near enough memory to
> put that under TLB pressure.

TLB pressure... I would be interested in. I'm not exactly sold on the idea
that higher order allocations will give a significant TLB improvement.
Although for benchmark runs, maybe it is more likely (ie. if memory hasn't
been too fragmented).

Suppose you have a million slab objects scattered all over memory. The fact
that you might have them clumped into 64K regions rather than 4K regions...
is it going to be significant? How many access patterns are likely to soon
touch exactly those objects that are in the same page?

Sure it is possible to come up with a scenario where it does help. But also
others where it will not.
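
A rough way to put numbers on that question is to count how many distinct 4K
pages and 64K regions a set of accesses lands in. The sketch below is a toy
model only: it assumes objects are placed uniformly at random and accessed
uniformly at random, and the memory size, object size and number of accesses
are made-up parameters rather than anything measured. It ignores the kernel
linear mapping's huge pages mentioned above.

```python
import random

def distinct_regions(addresses, region_size):
    """Number of distinct aligned regions of region_size bytes touched."""
    return len({addr // region_size for addr in addresses})

def simulate(num_objects=1_000_000, mem_bytes=4 << 30, touched=1000,
             object_size=256, seed=0):
    """Scatter objects at random, touch a random subset, and return
    (distinct 4K pages, distinct 64K regions) covered by the touched set.
    All parameters are hypothetical."""
    rng = random.Random(seed)
    slots = mem_bytes // object_size
    # Random, non-overlapping object slots across the address range.
    objects = rng.sample(range(slots), num_objects)
    # Accesses with no locality: a uniform random subset of the objects.
    accessed = rng.sample(objects, touched)
    addrs = [slot * object_size for slot in accessed]
    return distinct_regions(addrs, 4096), distinct_regions(addrs, 65536)

if __name__ == "__main__":
    pages_4k, regions_64k = simulate()
    print(f"4K pages touched:    {pages_4k}")
    print(f"64K regions touched: {regions_64k}")
```

If the two counts come out close, larger regions buy little for that access
pattern; a pattern with strong locality between neighbouring objects would
need a different model of which objects get touched.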

OTOH, if it is a win on ppc but not x86-64, then that may point to TLB...

