linux-kernel - Re: [PATCH 0/2] Fix memcg/memory.high in case kmem accounting is enabled


Message-ID: <alpine.DEB.2.11.1509021307280.14827@east.gentwo.org>
Date:	Wed, 2 Sep 2015 13:16:47 -0500 (CDT)
From:	Christoph Lameter <cl@...ux.com>
To:	Vladimir Davydov <vdavydov@...allels.com>
cc:	Michal Hocko <mhocko@...nel.org>,
	Johannes Weiner <hannes@...xchg.org>,
	Andrew Morton <akpm@...ux-foundation.org>,
	Pekka Enberg <penberg@...nel.org>,
	David Rientjes <rientjes@...gle.com>,
	Joonsoo Kim <iamjoonsoo.kim@....com>,
	Tejun Heo <tj@...nel.org>, linux-mm@...ck.org,
	linux-kernel@...r.kernel.org
Subject: Re: [PATCH 0/2] Fix memcg/memory.high in case kmem accounting is
 enabled

On Wed, 2 Sep 2015, Vladimir Davydov wrote:

> Slab is a kind of abnormal alloc_pages user. By calling alloc_pages_node
> with __GFP_THISNODE and w/o __GFP_WAIT before falling back to
> alloc_pages with the caller's context, it does the job normally done by
> alloc_pages itself. It's not what is done massively.
>
> Leaving slab charge path as is looks really ugly to me. Look, slab
> iterates over all nodes, inspecting if they have free pages and fails
> even if they do due to the memcg constraint...

Well yes, it needs to do that because of the way NUMA support was designed
into it. SLAB has to check whether objects are present in the per node
caches before going to more remote nodes. Sorry about this. I realized the
design issue in 2006, and SLUB was the result in 2007: an alternate design
that lets the page allocator do its proper job.
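
For illustration only, the per node lookup described above can be modelled
in a few lines of userspace C. This is a toy sketch, not kernel code: the
names (node_state, slab_like_alloc, NR_NODES) are hypothetical, ring order
stands in for NUMA distance, and memcg charging is left out entirely. It
only shows the allocator walking the nodes itself for cached objects before
it asks for fresh pages.

```c
#include <assert.h>

#define NR_NODES 4 /* hypothetical node count for this toy model */

/* Toy per-node state: objects cached by the slab layer, free pages below it. */
struct node_state {
	int cached_objects;
	int free_pages;
};

enum alloc_source { FROM_NODE_CACHE, FROM_NODE_PAGES, ALLOC_FAILED };

struct alloc_result {
	enum alloc_source source;
	int node; /* node that satisfied the request, or -1 on failure */
};

/*
 * SLAB-like policy (simplified): the slab layer itself visits every node,
 * starting at the local one, looking for an already cached object. Only
 * when no node has one does it fall back to taking a fresh page.
 * Ring order (local, local + 1, ...) is an assumption standing in for
 * real NUMA distance ordering.
 */
static struct alloc_result slab_like_alloc(struct node_state nodes[NR_NODES],
					   int local)
{
	assert(local >= 0 && local < NR_NODES);

	/* Pass 1: cached objects, local node first, then more remote nodes. */
	for (int i = 0; i < NR_NODES; i++) {
		int n = (local + i) % NR_NODES;

		if (nodes[n].cached_objects > 0) {
			nodes[n].cached_objects--;
			return (struct alloc_result){ FROM_NODE_CACHE, n };
		}
	}

	/* Pass 2: nothing cached anywhere, so take a fresh page. */
	for (int i = 0; i < NR_NODES; i++) {
		int n = (local + i) % NR_NODES;

		if (nodes[n].free_pages > 0) {
			nodes[n].free_pages--;
			return (struct alloc_result){ FROM_NODE_PAGES, n };
		}
	}

	return (struct alloc_result){ ALLOC_FAILED, -1 };
}
```

In this model a request from node 0 is served from a remote node's cache
even while node 0 still has free pages, which is the kind of node walking
the slab layer ends up doing on its own instead of leaving placement to the
page allocator.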

> To sum it up. Basically, there are two ways of handling kmemcg charges:
>
>  1. Make the memcg try_charge mimic alloc_pages behavior.
>  2. Make API functions (kmalloc, etc) work in memcg as if they were
>     called from the root cgroup, while keeping interactions between the
>     low level subsys (slab) and memcg private.
>
> Way 1 might look appealing at first glance, but at the same time it
> is much more complex, because alloc_pages has grown over the years to
> handle a lot of subtle situations that may arise on global memory
> pressure but are impossible in memcg. What does way 1 give us then? We
> can't insert try_charge directly to alloc_pages and have to spread its
> calls all over the code anyway, so why is it better? Easier to use it in
> places where users depend on buddy allocator peculiarities? There are
> not many such users.

Would it be possible to have a special alloc_pages_memcg with different
semantics?

On the other hand, alloc_pages() has grown to handle all the special cases.
Why can't it also handle the special memcg case? There are numerous other
allocators that cache memory in the kernel, from networking to the bizarre
compressed swap approaches. How does memcg handle those? Isn't that
situation similar to what the slab allocators do?

> exists solely for memcg-vs-list_lru and memcg-vs-slab interactions. We
> even handle kmem_cache destruction on memcg offline differently for SLAB
> and SLUB for performance reasons.

Ugly. Internal allocator design impacts container handling.

> Way 2 gives us more space to maneuver IMO. SLAB/SLUB may do weird tricks
> for optimization, but their API is well defined, so we just make kmalloc
> work as expected while providing inter-subsys calls, like
> memcg_charge_slab, for SLAB/SLUB that have their own conventions. You
> mentioned kmem users that allocate memory using alloc_pages. There is an
> API function for them too, alloc_kmem_pages. Everything behind the API
> is hidden and may be done in such a way to achieve optimal performance.

Can we also hide cgroup memory handling behind the page-based schemes,
without having extra handling for the slab allocators?
