linux-kernel - Re: [UnifiedV4 00/16] The Unified slab allocator (V4)

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20101006123753.GA17674@localhost>
Date:	Wed, 6 Oct 2010 20:37:53 +0800
From:	Wu Fengguang <fengguang.wu@...el.com>
To:	Pekka Enberg <penberg@...nel.org>
Cc:	Christoph Lameter <cl@...ux.com>,
	Pekka Enberg <penberg@...helsinki.fi>, linux-mm@...ck.org,
	linux-kernel@...r.kernel.org, David Rientjes <rientjes@...gle.com>,
	Mel Gorman <mel@....ul.ie>, npiggin@...nel.dk,
	yanmin_zhang@...ux.intel.com, "Shi, Alex" <alex.shi@...el.com>
Subject: Re: [UnifiedV4 00/16] The Unified slab allocator (V4)

[add CC to Alex: he is now in charge of kernel performance tests]

On Wed, Oct 06, 2010 at 11:01:35AM +0300, Pekka Enberg wrote:
> (Adding more people who've taken interest in slab performance in the
> past to CC.)
> 
> On Tue, Oct 5, 2010 at 9:57 PM, Christoph Lameter <cl@...ux.com> wrote:
> > V3->V4:
> > - Lots of debugging
> > - Performance optimizations (more would be good)...
> > - Drop per slab locking in favor of per node locking for
> >  partial lists (queuing implies freeing large amounts of objects
> >  to per node lists of slab).
> > - Implement object expiration via reclaim VM logic.
> >
> > The following is a release of an allocator based on SLAB
> > and SLUB that integrates the best approaches from both allocators. The
> > per cpu queuing is like in SLAB whereas much of the infrastructure
> > comes from SLUB.
> >
> > After this patches SLUB will track the cpu cache contents
> > like SLAB attemped to. There are a number of architectural differences:
> >
> > 1. SLUB accurately tracks cpu caches instead of assuming that there
> >   is only a single cpu cache per node or system.
> >
> > 2. SLUB object expiration is tied into the page reclaim logic. There
> >   is no periodic cache expiration.
> >
> > 3. SLUB caches are dynamically configurable via the sysfs filesystem.
> >
> > 4. There is no per slab page metadata structure to maintain (aside
> >   from the object bitmap that usually fits into the page struct).
> >
> > 5. Has all the resiliency and diagnostic features of SLUB.
> >
> > The unified allocator is a merging of SLUB with some queuing concepts from
> > SLAB and a new way of managing objects in the slabs using bitmaps. Memory
> > wise this is slightly more inefficient than SLUB (due to the need to place
> > large bitmaps --sized a few words--in some slab pages if there are more
> > than BITS_PER_LONG objects in a slab) but in general does not increase space
> > use too much.
> >
> > The SLAB scheme of not touching the object during management is adopted.
> > The unified allocator can efficiently free and allocate cache cold objects
> > without causing cache misses.
> >
> > Some numbers using tcp_rr on localhost
> >
> >
> > Dell R910 128G RAM, 64 processors, 4 NUMA nodes
> >
> > threads unified         slub            slab
> > 64      4141798         3729037         3884939
> > 128     4146587         3890993         4105276
> > 192     4003063         3876570         4110971
> > 256     3928857         3942806         4099249
> > 320     3922623         3969042         4093283
> > 384     3827603         4002833         4108420
> > 448     4140345         4027251         4118534
> > 512     4163741         4050130         4122644
> > 576     4175666         4099934         4149355
> > 640     4190332         4142570         4175618
> > 704     4198779         4173177         4193657
> > 768     4662216         4200462         4222686
> 
> Are there any stability problems left? Have you tried other benchmarks
> (e.g. hackbench, sysbench)? Can we merge the series in smaller
> batches? For example, if we leave out the NUMA parts in the first
> stage, do we expect to see performance regressions?
> 
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majordomo@...ck.org.  For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: <a href=mailto:"dont@...ck.org"> email@...ck.org </a>
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/