Message-ID: <20190521011525.GA25898@eros.localdomain>
Date:   Tue, 21 May 2019 11:15:25 +1000
From:   "Tobin C. Harding" <me@...in.cc>
To:     Roman Gushchin <guro@...com>
Cc:     "Tobin C. Harding" <tobin@...nel.org>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Matthew Wilcox <willy@...radead.org>,
        Alexander Viro <viro@....linux.org.uk>,
        Christoph Hellwig <hch@...radead.org>,
        Pekka Enberg <penberg@...helsinki.fi>,
        David Rientjes <rientjes@...gle.com>,
        Joonsoo Kim <iamjoonsoo.kim@....com>,
        Christopher Lameter <cl@...ux.com>,
        Miklos Szeredi <mszeredi@...hat.com>,
        Andreas Dilger <adilger@...ger.ca>,
        Waiman Long <longman@...hat.com>,
        Tycho Andersen <tycho@...ho.ws>, Theodore Ts'o <tytso@....edu>,
        Andi Kleen <ak@...ux.intel.com>,
        David Chinner <david@...morbit.com>,
        Nick Piggin <npiggin@...il.com>,
        Rik van Riel <riel@...hat.com>,
        Hugh Dickins <hughd@...gle.com>,
        Jonathan Corbet <corbet@....net>,
        "linux-mm@...ck.org" <linux-mm@...ck.org>,
        "linux-fsdevel@...r.kernel.org" <linux-fsdevel@...r.kernel.org>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
Subject: Re: [RFC PATCH v5 04/16] slub: Slab defrag core

On Tue, May 21, 2019 at 12:51:57AM +0000, Roman Gushchin wrote:
> On Mon, May 20, 2019 at 03:40:05PM +1000, Tobin C. Harding wrote:
> > Internal fragmentation can occur within pages used by the slub
> > allocator.  Under some workloads large numbers of pages can be tied
> > up in partially filled slab pages.  This under-utilisation is bad not
> > only because it wastes memory but also because, if the system is
> > under memory pressure, higher order allocations may become difficult
> > to satisfy.  If we can defrag slab caches we can alleviate these
> > problems.
> > 
> > Implement Slab Movable Objects in order to defragment slab caches.
> > 
> > Slab defragmentation may occur:
> > 
> > 1. Unconditionally when __kmem_cache_shrink() is called on a slab cache
> >    by the kernel calling kmem_cache_shrink().
> > 
> > 2. Unconditionally through the use of the slabinfo command.
> > 
> > 	slabinfo <cache> -s
> > 
> > 3. Conditionally via the use of kmem_defrag_slabs()
> > 
> > - Use Slab Movable Objects when shrinking cache.
> > 
> > Currently when the kernel calls kmem_cache_shrink() we curate the
> > partial slabs list.  If object migration is not enabled for the cache
> > we still do this; if, however, SMO is enabled we attempt to move
> > objects in partially full slabs in order to defragment the cache.
> > Shrink attempts to move all objects in order to reduce the cache to a
> > single partial slab for each node.
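> > 
> > For illustration (the 'foo_cache' pointer here is hypothetical), any
> > code holding a live kmem_cache pointer can trigger this path directly:
> > 
> > 	/* With SMO enabled for foo_cache this not only curates the
> > 	 * partial list but also migrates objects out of partial slabs.
> > 	 */
> > 	kmem_cache_shrink(foo_cache);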
> > 
> > - Add conditional per node defrag via new function:
> > 
> > 	kmem_defrag_slabs(int node).
> > 
> > kmem_defrag_slabs() attempts to defragment all slab caches for the
> > given node.  Defragmentation is done conditionally, dependent on
> > MAX_PARTIAL _and_ defrag_used_ratio.
> > 
> >    Caches are only considered for defragmentation if the number of
> >    partial slabs exceeds MAX_PARTIAL (per node).
> > 
> >    Also, defragmentation only occurs if the usage ratio of the slab is
> >    lower than the configured percentage (sysfs field added in this
> >    patch).  The usage ratio is calculated as the percentage of objects
> >    in use relative to the total number of objects that the slab page
> >    can accommodate.
> > 
> >    The scanning of slab caches is optimized because the defragmentable
> >    caches come first on the slab cache list.  Thus we can terminate
> >    scans at the first cache encountered that does not support
> >    defragmentation.
> > 
> >    kmem_defrag_slabs() takes a node parameter. This can either be -1 if
> >    defragmentation should be performed on all nodes, or a node number.
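> > 
> >    For example (the call sites here are hypothetical, not added by
> >    this patch):
> > 
> > 	/* Conditionally defragment caches on the local node only ... */
> > 	kmem_defrag_slabs(numa_node_id());
> > 
> > 	/* ... or on every node. */
> > 	kmem_defrag_slabs(-1);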
> > 
> >    Defragmentation may be disabled by setting defrag_used_ratio to 0:
> > 
> > 	echo 0 > /sys/kernel/slab/<cache>/defrag_used_ratio
> > 
> > - Add a 'defrag_used_ratio' sysfs field and set it to 30% by default.
> >   A limit of 30% specifies that more than 3 out of 10 available slots
> >   for objects need to be in use; otherwise slab defragmentation will
> >   be attempted on the remaining objects.
> > 
> > In order for a cache to be defragmentable the cache must support object
> > migration (SMO).  Enabling SMO for a cache is done via a call to the
> > recently added function:
> > 
> > 	void kmem_cache_setup_mobility(struct kmem_cache *,
> > 	                               kmem_cache_isolate_func,
> > 	                               kmem_cache_migrate_func);
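> > 
> > For illustration, a cache might wire this up roughly as follows.  This
> > is only a sketch: the 'foo' names are made up, and the callback
> > signatures shown are indicative rather than definitive (see the
> > earlier patches in this series for the real ones).
> > 
> > 	/* Sketch only: 'foo' is a hypothetical cache user. */
> > 	static struct kmem_cache *foo_cache;
> > 
> > 	static void *foo_isolate(struct kmem_cache *s, void **objs, int nr)
> > 	{
> > 		/* Pin the nr objects in objs[] so they cannot go away
> > 		 * while migration runs; return private data that will
> > 		 * be handed to the migrate callback.
> > 		 */
> > 		return NULL;
> > 	}
> > 
> > 	static void foo_migrate(struct kmem_cache *s, void **objs, int nr,
> > 				int node, void *private)
> > 	{
> > 		/* Allocate replacements (preferably on 'node'), copy the
> > 		 * object state, repoint all references, then free the
> > 		 * old objects so the source slab can be emptied.
> > 		 */
> > 	}
> > 
> > 	static int __init foo_init(void)
> > 	{
> > 		foo_cache = kmem_cache_create("foo", 64, 0, 0, NULL);
> > 		if (!foo_cache)
> > 			return -ENOMEM;
> > 
> > 		kmem_cache_setup_mobility(foo_cache, foo_isolate,
> > 					  foo_migrate);
> > 		return 0;
> > 	}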
> > 
> > Co-developed-by: Christoph Lameter <cl@...ux.com>
> > Signed-off-by: Tobin C. Harding <tobin@...nel.org>
> > ---
> >  Documentation/ABI/testing/sysfs-kernel-slab |  14 +
> >  include/linux/slab.h                        |   1 +
> >  include/linux/slub_def.h                    |   7 +
> >  mm/slub.c                                   | 385 ++++++++++++++++----
> >  4 files changed, 334 insertions(+), 73 deletions(-)
> 
> Hi Tobin!
> 
> Overall looks very good to me! I'll take another look when you post
> a non-RFC version, but so far I can't find any issues.

Thanks for the reviews.

> A general question: as I understand it, you only support root
> kmem_caches now.  Is kmemcg support planned?

I know very little about cgroups, so I have no plans for it in this work.
However, I'm not the architect behind this - Christoph is guiding the
direction on this one.  Perhaps he will comment.

> Without it the patchset isn't as attractive to anyone using cgroups as
> it could be.  Also, I hope it can solve (or mitigate) the memcg-specific
> problem of scattering the vfs cache working set over multiple
> generations of the same cgroup (their kmem_caches).

I'm keen to work on anything that makes this more useful, so I'll do
some research.  Thanks for the idea.

Regards,
Tobin.
