linux-kernel - Re: [PATCH v2 21/28] mm: memcg/slab: use a single set of kmem

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20200203205834.GA6781@xps.dhcp.thefacebook.com>
Date:   Mon, 3 Feb 2020 12:58:34 -0800
From:   Roman Gushchin <guro@...com>
To:     Johannes Weiner <hannes@...xchg.org>
CC:     <linux-mm@...ck.org>, Andrew Morton <akpm@...ux-foundation.org>,
        Michal Hocko <mhocko@...nel.org>,
        Shakeel Butt <shakeelb@...gle.com>,
        Vladimir Davydov <vdavydov.dev@...il.com>,
        <linux-kernel@...r.kernel.org>, <kernel-team@...com>,
        Bharata B Rao <bharata@...ux.ibm.com>,
        Yafang Shao <laoar.shao@...il.com>
Subject: Re: [PATCH v2 21/28] mm: memcg/slab: use a single set of kmem_caches
 for all memory cgroups

On Mon, Feb 03, 2020 at 02:50:48PM -0500, Johannes Weiner wrote:
> On Mon, Jan 27, 2020 at 09:34:46AM -0800, Roman Gushchin wrote:
> > This is fairly big but mostly red patch, which makes all non-root
> > slab allocations use a single set of kmem_caches instead of
> > creating a separate set for each memory cgroup.
> > 
> > Because the number of non-root kmem_caches is now capped by the number
> > of root kmem_caches, there is no need to shrink or destroy them
> > prematurely. They can be perfectly destroyed together with their
> > root counterparts. This allows to dramatically simplify the
> > management of non-root kmem_caches and delete a ton of code.
> 
> This is definitely going in the right direction. But it doesn't quite
> explain why we still need two sets of kmem_caches?
> 
> In the old scheme, we had completely separate per-cgroup caches with
> separate slab pages. If a cgrouped process wanted to allocate a slab
> object, we'd go to the root cache and used the cgroup id to look up
> the right cgroup cache. On slab free we'd use page->slab_cache.
> 
> Now we have slab pages that have a page->objcg array. Why can't all
> allocations go through a single set of kmem caches? If an allocation
> is coming from a cgroup and the slab page the allocator wants to use
> doesn't have an objcg array yet, we can allocate it on the fly, no?

Well, arguably it can be done, but there are few drawbacks:

1) On the release path you'll need to make some extra work even for
   root allocations: calculate the offset only to find the NULL objcg pointer.

2) There will be a memory overhead for root allocations
   (which might or might not be compensated by the increase
   of the slab utilization).

3) I'm working on percpu memory accounting that resembles the same scheme,
   except that obj_cgroups vector is created for the whole percpu block.
   There will be root- and memcg-blocks, and it will be expensive to merge them.
   I kinda like using the same scheme here and there.

Upsides?

1) slab utilization might increase a little bit (but I doubt it will have
   a huge effect, because both merging sets should be relatively big and well
   utilized)
2) eliminate memcg kmem_cache dynamic creation/destruction. it's nice,
   but there isn't so much code left anyway.

So IMO it's an interesting direction to explore, but not something
that necessarily has to be done in the context of this patchset.