Message-ID: <20180829212422.GA13097@castle>
Date: Wed, 29 Aug 2018 14:24:25 -0700
From: Roman Gushchin <guro@...com>
To: Shakeel Butt <shakeelb@...gle.com>
CC: Linux MM <linux-mm@...ck.org>, LKML <linux-kernel@...r.kernel.org>,
<kernel-team@...com>, Johannes Weiner <hannes@...xchg.org>,
Michal Hocko <mhocko@...nel.org>, <luto@...nel.org>,
Konstantin Khlebnikov <koct9i@...il.com>,
Tejun Heo <tj@...nel.org>
Subject: Re: [PATCH v2 1/3] mm: rework memcg kernel stack accounting
On Tue, Aug 21, 2018 at 03:10:52PM -0700, Shakeel Butt wrote:
> On Tue, Aug 21, 2018 at 2:36 PM Roman Gushchin <guro@...com> wrote:
> >
> > If CONFIG_VMAP_STACK is set, kernel stacks are allocated
> > using __vmalloc_node_range() with __GFP_ACCOUNT. So kernel
> > stack pages are charged against the corresponding memory
> > cgroup on allocation and uncharged when they are released.
> >
> > The problem is that we cache kernel stacks in small per-cpu
> > caches and reuse them for new tasks, which can belong to
> > different memory cgroups.
> >
> > Each stack page still holds a reference to the original cgroup,
> > so the cgroup can't be released until the vmap area is released.
> >
> > For that to happen, we need more than two subsequent exits
> > without forks in between on the current cpu, which makes it
> > very unlikely. As a result, I saw a significant number of
> > dying cgroups (in theory, up to 2 * number_of_cpus +
> > number_of_tasks), which can't be released even under
> > significant memory pressure.
> >
> > As a cgroup structure can take a significant amount of memory
> > (first of all, per-cpu data like memcg statistics), it leads
> > to a noticeable waste of memory.
> >
> > Signed-off-by: Roman Gushchin <guro@...com>
>
> Reviewed-by: Shakeel Butt <shakeelb@...gle.com>
>
> BTW this makes a very good use-case for optimizing kmem uncharging
> similar to what you did for skmem uncharging.
The only thing I'm slightly worried about here is that it can
make reclaiming memory cgroups harder. It's probably still ok,
but let me first finish the work I'm doing on optimizing the
whole memcg reclaim process, and then return to this case.
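
For reference, the caching the changelog is talking about lives in
kernel/fork.c. Below is a simplified sketch of the CONFIG_VMAP_STACK
path before this patch (details may differ a bit between kernel
versions):

#define NR_CACHED_STACKS 2
static DEFINE_PER_CPU(struct vm_struct *, cached_stacks[NR_CACHED_STACKS]);

static void *alloc_thread_stack_node(struct task_struct *tsk, int node)
{
	void *stack;
	int i;

	/*
	 * Try to reuse a cached stack first. Its pages are still
	 * charged to whatever memcg the previous owner belonged to.
	 */
	for (i = 0; i < NR_CACHED_STACKS; i++) {
		struct vm_struct *s = this_cpu_xchg(cached_stacks[i], NULL);

		if (!s)
			continue;

		/* Clear stale pointers from the reused stack. */
		memset(s->addr, 0, THREAD_SIZE);

		tsk->stack_vm_area = s;
		tsk->stack = s->addr;
		return s->addr;
	}

	/*
	 * No cached stack: allocate a fresh one. THREADINFO_GFP
	 * includes __GFP_ACCOUNT, so the pages are charged to the
	 * current memcg here.
	 */
	stack = __vmalloc_node_range(THREAD_SIZE, THREAD_ALIGN,
				     VMALLOC_START, VMALLOC_END,
				     THREADINFO_GFP, PAGE_KERNEL,
				     0, node, __builtin_return_address(0));
	if (stack) {
		tsk->stack_vm_area = find_vm_area(stack);
		tsk->stack = stack;
	}
	return stack;
}

static void free_thread_stack(struct task_struct *tsk)
{
	int i;

	/*
	 * Park the stack in the per-cpu cache instead of freeing it.
	 * The pages keep their page->mem_cgroup references, pinning
	 * the (possibly dying) cgroup until the vmap area is freed.
	 */
	for (i = 0; i < NR_CACHED_STACKS; i++) {
		if (this_cpu_cmpxchg(cached_stacks[i], NULL,
				     tsk->stack_vm_area) != NULL)
			continue;
		return;
	}

	vfree_atomic(tsk->stack);
}

So after a couple of exits without forks on a cpu, up to
NR_CACHED_STACKS vmap areas per cpu can sit in the cache holding
references to memory cgroups which may already be dead, hence the
2 * number_of_cpus bound above.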
Thanks!