Message-ID: <20110816021154.GI26978@dastard>
Date:	Tue, 16 Aug 2011 12:11:54 +1000
From:	Dave Chinner <david@...morbit.com>
To:	Pekka Enberg <penberg@...nel.org>
Cc:	Pavel Emelyanov <xemul@...allels.com>,
	Glauber Costa <glommer@...allels.com>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
	"linux-fsdevel@...r.kernel.org" <linux-fsdevel@...r.kernel.org>,
	"containers@...ts.linux-foundation.org" 
	<containers@...ts.linux-foundation.org>,
	Al Viro <viro@...iv.linux.org.uk>,
	Hugh Dickins <hughd@...gle.com>,
	Nick Piggin <npiggin@...nel.dk>,
	Andrea Arcangeli <aarcange@...hat.com>,
	Rik van Riel <riel@...hat.com>,
	Dave Hansen <dave@...ux.vnet.ibm.com>,
	James Bottomley <jbottomley@...allels.com>,
	Eric Dumazet <eric.dumazet@...il.com>,
	Christoph Lameter <cl@...ux.com>,
	David Rientjes <rientjes@...gle.com>
Subject: Re: [PATCH v3 3/4] limit nr_dentries per superblock

On Mon, Aug 15, 2011 at 02:14:39PM +0300, Pekka Enberg wrote:
> Hi Pavel,
> 
> On Mon, Aug 15, 2011 at 2:05 PM, Pavel Emelyanov <xemul@...allels.com> wrote:
> > This will make sense, since per-cgroup kernel memory management is one of the
> > things we'd love to have, but this particular idea will definitely not work in the
> > case where we keep the containers' files on one partition, keeping each container
> > in its own chroot environment.
> 
> And you want a per-container dcache limit? Will the containers share
> the same superblock?

Yes, and that's one of the problems with the "arbitrary container"
approach to controlling the dentry cache size. Arbitrary containers
don't map easily to predictable and scalable LRU and reclaim
implementations. Hence right now the container scope is limited to
per-superblock.

> Couldn't you simply do per-container "struct
> kmem_accounted_cache" in struct superblock?

Probably could do it that way, but it's still not really an
integrated solution. What we'll end up with is this LRU structure:

struct lru_node {
	struct list_head		lru;		/* per-node LRU list */
	spinlock_t			lock;		/* protects lru and nr_items */
	long				nr_items;	/* objects on this node's list */
} ____cacheline_aligned_in_smp;

struct lru {
	struct kmem_accounted_cache	*cache;		/* only used for item counts */
	struct lru_node			lru_node[MAX_NUMNODES];
	nodemask_t			active_nodes;	/* nodes with items to reclaim */
	int (*isolate_item)(struct list_head *item);	/* pick an object off the LRU */
	int (*dispose)(struct list_head *list);		/* free the isolated objects */
};

Where the only thing that the lru->cache is used for is getting the
number of items allocated to the cache. Seems relatively pointless
to make that statistic abstraction for just a single value that we
can get via a simple per-cpu counter...
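
As a rough illustration of that point, the count could live in the LRU
itself as a per-cpu counter. This is a sketch only: it uses the kernel's
existing percpu_counter API, but assumes struct lru gains a
"struct percpu_counter nr_items" member in place of the
kmem_accounted_cache back-pointer, and the wrapper names are hypothetical.

/*
 * Sketch only: keep the item count directly in the LRU instead of
 * reaching through the accounted cache.
 */
#include <linux/percpu_counter.h>

static inline void lru_count_add(struct lru *lru)
{
	percpu_counter_inc(&lru->nr_items);
}

static inline long lru_count_read(struct lru *lru)
{
	/* a cheap, approximate read is good enough for limit checks */
	return percpu_counter_read_positive(&lru->nr_items);
}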

Then, when you consider SLUB has this structure for every individual
slab cache:

struct kmem_cache_node {
        spinlock_t list_lock;   /* Protect partial list and nr_partial */
        unsigned long nr_partial;
        struct list_head partial;
#ifdef CONFIG_SLUB_DEBUG
        atomic_long_t nr_slabs;
        atomic_long_t total_objects;
        struct list_head full;
#endif
};

you can see why tight integration of the per-node LRU infrastructure
is appealing - there's no unnecessary duplication and the accounting
is done in the right spot. It also means there is only one shrinker
implementation for all slabs, with a couple of simple per-slab
callbacks to isolate objects for disposal and then to dispose of
them. This would mean that most slab caches that use shrinkers would
no longer need to implement their own LRU, get LRU scalability and
node-aware reclaim for free, have built-in size limits, etc.
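
To make those per-slab callbacks concrete, here is a rough sketch of how
the dentry cache might plug into such a generic LRU. The helper names and
the omitted locking/refcount checks are hypothetical, not the actual
patch:

/*
 * Hypothetical dcache callbacks for the generic LRU shrinker.
 * Locking and reference checks are elided for brevity.
 */
static int dentry_isolate_item(struct list_head *item)
{
	struct dentry *dentry = container_of(item, struct dentry, d_lru);

	/* skip dentries that are still in use (hypothetical helper) */
	if (dentry_is_busy(dentry))
		return -EBUSY;
	return 0;	/* ok to move to the dispose list */
}

static int dentry_dispose(struct list_head *list)
{
	struct dentry *dentry, *next;

	/* free everything the isolate pass handed us */
	list_for_each_entry_safe(dentry, next, list, d_lru)
		dentry_free_one(dentry);	/* hypothetical helper */
	return 0;
}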

And FWIW, integrating the LRU shrinker mechanism into the slab cache
also provides the mechanisms needed for capping the size of the
cache as well as slab defragmentation.  Much smarter things can be
done when you know both the age and the locality of objects. For
example, there's no point preventing allocation from a slab due to
maximum object count limits if there are partial pages in the slab
cache, because the allocation can be done without increasing the
memory footprint.
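
A sketch of that last check (the helper and its arguments are
hypothetical): the object-count cap only needs to bite when satisfying
the allocation would require a new slab page.

/*
 * Hypothetical: allow an allocation at the cap as long as it can be
 * satisfied from an existing partial page, since that does not grow
 * the cache's memory footprint.
 */
static bool cache_alloc_allowed(struct kmem_cache_node *n,
				long nr_items, long limit)
{
	if (nr_items < limit)
		return true;
	return n->nr_partial > 0;
}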

Cheers,

Dave.
-- 
Dave Chinner
david@...morbit.com
