Date:	Thu, 3 Sep 2015 12:32:43 -0400
From:	Tejun Heo <tj@...nel.org>
To:	Vladimir Davydov <vdavydov@...allels.com>
Cc:	Michal Hocko <mhocko@...nel.org>,
	Johannes Weiner <hannes@...xchg.org>,
	Andrew Morton <akpm@...ux-foundation.org>,
	Christoph Lameter <cl@...ux.com>,
	Pekka Enberg <penberg@...nel.org>,
	David Rientjes <rientjes@...gle.com>,
	Joonsoo Kim <iamjoonsoo.kim@....com>, linux-mm@...ck.org,
	linux-kernel@...r.kernel.org
Subject: Re: [PATCH 0/2] Fix memcg/memory.high in case kmem accounting is
 enabled

Hello, Vladimir.

On Wed, Sep 02, 2015 at 12:30:39PM +0300, Vladimir Davydov wrote:
...
> To sum it up. Basically, there are two ways of handling kmemcg charges:
> 
>  1. Make the memcg try_charge mimic alloc_pages behavior.
>  2. Make API functions (kmalloc, etc) work in memcg as if they were
>     called from the root cgroup, while keeping interactions between the
>     low level subsys (slab) and memcg private.
> 
> Way 1 might look appealing at first glance, but at the same time it
> is much more complex, because alloc_pages has grown over the years to
> handle a lot of subtle situations that may arise under global memory
> pressure but are impossible in memcg. What does way 1 give us then? We
> can't insert try_charge directly into alloc_pages and have to spread
> its calls all over the code anyway, so why is it better? Is it easier
> to use in places where users depend on buddy allocator peculiarities?
> There are not many such users.

Maybe this is from inexperience, but wouldn't 1 also be simpler than
the global case, for the same reasons that make 2 simpler?  The fact
that a memory shortage inside a memcg usually doesn't mean a global
shortage stays true whichever of 1 or 2 we take.

That said, it is true that slab is an integral part of kmemcg and I
can't see how it can be made oblivious to memcg operations, so yeah
one way or the other slab has to know the details and we may have to
do some unusual things at that layer.
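
Just to make that concrete, here is a toy user-space model (all the
model_* names are invented for the sketch; it is nothing like the real
code) of slab calling a private memcg charge hook while the public
allocation API stays the same for every caller:

/*
 * Toy model of the "way 2" split: a private slab<->memcg charge hook
 * behind an ordinary-looking allocation API.  Not kernel code.
 */
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

struct model_memcg {
	long usage;	/* pages charged */
	long limit;	/* pages allowed */
};

/* Private slab<->memcg hook: charge the page-sized backing allocation. */
static bool model_memcg_charge_slab(struct model_memcg *memcg, long pages)
{
	if (memcg->usage + pages > memcg->limit)
		return false;		/* over limit: fail the charge */
	memcg->usage += pages;
	return true;
}

/* Public API: callers just see kmalloc-like behaviour. */
static void *model_kmalloc(struct model_memcg *memcg, size_t size)
{
	long pages = (long)((size + 4095) / 4096);

	if (!model_memcg_charge_slab(memcg, pages))
		return NULL;		/* charge failed, allocation fails */
	return malloc(size);
}

int main(void)
{
	struct model_memcg memcg = { .usage = 0, .limit = 4 };
	void *p = model_kmalloc(&memcg, 8192);

	printf("alloc %s, usage now %ld pages\n", p ? "ok" : "failed",
	       memcg.usage);
	free(p);
	return 0;
}

Where the charge call sits is the whole argument; the hook itself can
be as slab-specific as it needs to be.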

> I understand that the idea of way 1 is to provide a well-defined memcg
> API independent of the rest of the code, but that's just impossible. You
> need special casing anyway. E.g. you need those get/put_kmem_cache
> helpers, which exist solely for SLAB/SLUB. You need all the special
> machinery for growing the per-memcg arrays in list_lru and kmem_cache,
> which exists solely for the memcg-vs-list_lru and memcg-vs-slab
> interactions. We even handle kmem_cache destruction on memcg offline
> differently for SLAB and SLUB for performance reasons.

It isn't a black or white thing.  Sure, slab should be involved in
kmemcg but at the same time if we can keep the amount of exposure in
check, that's the better way to go.
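
Something like the per-memcg cache arrays you mention is the kind of
detail I mean.  Roughly, as a user-space toy (names invented, and none
of the RCU or resizing the real kmem_cache and list_lru arrays need):

#include <stdio.h>
#include <stdlib.h>

#define MODEL_MAX_MEMCGS 4

struct model_cache {
	const char *name;
	/* the root cache keeps one clone slot per memcg id */
	struct model_cache *memcg_caches[MODEL_MAX_MEMCGS];
};

/* Look up (or lazily create) the clone used for a given memcg id. */
static struct model_cache *model_cache_for_memcg(struct model_cache *root,
						 int memcg_id)
{
	if (memcg_id < 0 || memcg_id >= MODEL_MAX_MEMCGS)
		return root;		/* fall back to the root cache */
	if (!root->memcg_caches[memcg_id]) {
		struct model_cache *c = calloc(1, sizeof(*c));

		if (!c)
			return root;
		c->name = root->name;	/* share the name in the model */
		root->memcg_caches[memcg_id] = c;
	}
	return root->memcg_caches[memcg_id];
}

int main(void)
{
	struct model_cache root = { .name = "dentry" };
	struct model_cache *clone = model_cache_for_memcg(&root, 2);

	printf("memcg 2 allocates from the %s clone at %p (root at %p)\n",
	       clone->name, (void *)clone, (void *)&root);
	return 0;
}

This kind of bookkeeping has to live somewhere; the question is only
how much of it leaks outside the slab/memcg boundary.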

> Way 2 gives us more space to maneuver IMO. SLAB/SLUB may do weird tricks
> for optimization, but their API is well defined, so we just make kmalloc
> work as expected while providing inter-subsys calls, like
> memcg_charge_slab, for SLAB/SLUB, which have their own conventions. You
> mentioned kmem users that allocate memory using alloc_pages. There is an
> API function for them too, alloc_kmem_pages. Everything behind the API
> is hidden and may be done in such a way as to achieve optimal
> performance.

Ditto.  Nobody is arguing that we can get it out completely, but at
the same time handling of GFP_NOWAIT seems like a pretty fundamental
property that we'd wanna maintain at the memcg boundary.
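
To spell out what I mean by handling GFP_NOWAIT at the boundary, a
rough user-space sketch (model_try_charge and friends are made up,
this is not the real try_charge): a charge that can't sleep either
fits under the limit or fails; it never falls back to reclaim the way
a sleepable charge can.

#include <stdbool.h>
#include <stdio.h>

struct model_memcg {
	long usage;	/* pages charged */
	long limit;	/* pages allowed */
};

/* Pretend reclaim always frees what was asked for. */
static void model_reclaim(struct model_memcg *memcg, long want)
{
	memcg->usage -= want;
	if (memcg->usage < 0)
		memcg->usage = 0;
}

static bool model_try_charge(struct model_memcg *memcg, long pages,
			     bool can_sleep)
{
	if (memcg->usage + pages <= memcg->limit) {
		memcg->usage += pages;
		return true;
	}
	if (!can_sleep)
		return false;	/* GFP_NOWAIT-like: no reclaim, just fail */
	model_reclaim(memcg, pages);
	memcg->usage += pages;
	return true;
}

int main(void)
{
	struct model_memcg memcg = { .usage = 7, .limit = 8 };

	printf("NOWAIT charge of 4: %s\n",
	       model_try_charge(&memcg, 4, false) ? "ok" : "fails");
	printf("sleepable charge of 4: %s\n",
	       model_try_charge(&memcg, 4, true) ? "ok" : "fails");
	return 0;
}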

You said elsewhere that back-to-back GFP_NOWAIT allocations are
unlikely.  I'm not sure how much we can commit to that statement.
GFP_KERNEL allocating a huge amount of memory in a single go is a
kernel bug.  A GFP_NOWAIT optimization in a hot path that is
accessible to userland isn't, and we'll be growing more and more of
them.  We need to be protected against back-to-back GFP_NOWAIT
allocations.
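
To put a number on it, here is a toy loop (the limit, the charge size
and the always-overshoot policy are all made up for the illustration,
not kernel behaviour) showing how a userland-reachable hot path whose
NOWAIT charges are each allowed to overshoot "just this once" walks
usage arbitrarily far past the limit:

#include <stdio.h>

int main(void)
{
	long limit = 100;	/* pages */
	long usage = 100;	/* already at the limit */
	long charge = 8;	/* per NOWAIT allocation */
	int i;

	for (i = 0; i < 5; i++) {
		usage += charge;	/* NOWAIT charge allowed to overshoot */
		printf("after charge %d: usage = %ld (limit %ld)\n",
		       i + 1, usage, limit);
	}
	return 0;
}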

Thanks.

-- 
tejun