linux-kernel - Re: [PATCH 4.20 71/92] Revert "mm: slowly shrink slabs with a relatively small number of objects"


Message-ID: <20190218185738.GA3745@castle.DHCP.thefacebook.com>
Date:   Mon, 18 Feb 2019 18:57:45 +0000
From:   Roman Gushchin <guro@...com>
To:     Michal Hocko <mhocko@...nel.org>
CC:     Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
        Rik van Riel <riel@...riel.com>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        "stable@...r.kernel.org" <stable@...r.kernel.org>,
        Dave Chinner <dchinner@...hat.com>,
        Wolfgang Walter <linux@...m.de>, Spock <dairinin@...il.com>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Linus Torvalds <torvalds@...ux-foundation.org>
Subject: Re: [PATCH 4.20 71/92] Revert "mm: slowly shrink slabs with a
 relatively small number of objects"

On Mon, Feb 18, 2019 at 06:38:25PM +0100, Michal Hocko wrote:
> On Mon 18-02-19 17:16:34, Greg KH wrote:
> > On Mon, Feb 18, 2019 at 10:30:44AM -0500, Rik van Riel wrote:
> > > On Mon, 2019-02-18 at 14:43 +0100, Greg Kroah-Hartman wrote:
> > > > 4.20-stable review patch.  If anyone has any objections, please let
> > > > me know.
> > > > 
> > > > ------------------
> > > > 
> > > > From: Dave Chinner <dchinner@...hat.com>
> > > > 
> > > > commit a9a238e83fbb0df31c3b9b67003f8f9d1d1b6c96 upstream.
> > > > 
> > > > This reverts commit 172b06c32b9497 ("mm: slowly shrink slabs with a
> > > > relatively small number of objects").
> > > 
> > > This revert will result in the slab caches of dead
> > > cgroups with a small number of remaining objects never
> > > getting reclaimed, which can be a memory leak in some
> > > configurations.
> > > 
> > > But hey, that's your tradeoff to make.
> > 
> > That's what is in Linus's tree.  Should we somehow diverge from that?
> 
> I believe we should start working on a memcg specific solution to
> minimize regressions for others and start a more complex solution from
> there.
> 
> Can we special case dead memcgs in the slab reclaim and reclaim more
> aggressively?

It's probably better to start a new thread to discuss this issue (btw,
doesn't LSF/MM look like the best place to do it? I can send a proposal).

But I don't think dead cgroups are special in any way here. At the moment
a cgroup is deleted, its associated slab objects may well still be in use
by processes in other cgroups, so we can't reclaim them. Slab objects (vfs
objects first of all) are quite often shared between cgroups, and we can't
just ignore that.

So in order to avoid leaks we'll need to apply some artificial pressure
constantly, and then it's not clear why we need to do it separately
for dead and living cgroups.

So I still believe that Rik's and my approach is the right thing to do;
we just need to apply the pressure gently and handle all corner cases
(e.g. the concurrency issues spotted by Dave).

Generally speaking, the problem occurs because the lifecycle of a
slab object can be much longer than the lifecycle of the corresponding memory
cgroup. And because the object pins the memcg, we're wasting a lot of
memory. Right now we allow a certain number of vfs objects to reside in
memory pretty much forever unless we have really strong memory pressure.
That's arguably fine because inodes and dentries are relatively small, but
if each of them holds a 200kb+ dead memcg, it becomes very noticeable.
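
To get a feel for the scale, here is a back-of-the-envelope sketch. Only the
200kb figure comes from the text above; the per-inode size and the number of
dead memcgs are made-up illustrative assumptions, and the function name is
hypothetical.

```python
# Rough estimate of memory pinned by dead memcgs.
# All inputs are illustrative assumptions, not measurements.

DEAD_MEMCG_BYTES = 200 * 1024  # "200kb+" per dead memcg, from the text above
OBJECT_BYTES = 600             # ASSUMED rough size of one cached vfs object


def pinned_bytes(dead_memcgs, objects_per_memcg=1):
    """Compare the size of the lingering objects with the memcgs they pin."""
    objects = dead_memcgs * objects_per_memcg * OBJECT_BYTES
    memcgs = dead_memcgs * DEAD_MEMCG_BYTES
    return {"objects": objects, "memcgs": memcgs, "ratio": memcgs / objects}


# ASSUMED workload: 10,000 dead memcgs, each kept alive by a single object.
estimate = pinned_bytes(dead_memcgs=10_000)
print(f"objects: {estimate['objects'] / 2**20:.1f} MiB")
print(f"memcgs:  {estimate['memcgs'] / 2**20:.1f} MiB")
print(f"ratio:   {estimate['ratio']:.0f}x")
```

Under these assumptions the objects themselves are a few megabytes, while the
memcgs they keep alive add up to roughly two gigabytes.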

So we either have to apply the memory pressure more evenly (which is what
Rik and I are proposing), or completely reparent slab objects on cgroup
removal.
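
The first option can be illustrated with a simplified model of how a
shrinker might size its scan target. This is a sketch, not the kernel code:
the function name, the constants and the exact formula are assumptions
chosen to show why a small cache can see zero pressure at the default
priority, and how a minimum-pressure floor changes that.

```python
# Simplified model of a slab shrinker's scan target.
# NOT kernel code: constants and formula are assumptions for illustration.

DEF_PRIORITY = 12  # ASSUMED default reclaim priority (higher = gentler)
SEEKS = 2          # ASSUMED relative cost of recreating an object
BATCH = 128        # ASSUMED scan batch size


def scan_delta(freeable, priority=DEF_PRIORITY, min_pressure=False):
    """Return how many objects to scan out of `freeable` candidates."""
    # Proportional pressure: small caches round down to zero.
    delta = (freeable >> priority) * 4 // SEEKS
    if min_pressure:
        # Floor: always scan at least one batch (or everything, if smaller).
        delta = max(delta, min(freeable, BATCH))
    return delta


for freeable in (50, 1_000, 1_000_000):
    print(freeable,
          scan_delta(freeable),
          scan_delta(freeable, min_pressure=True))
```

In this model a cache with 1,000 freeable objects is never scanned at the
default priority, so a handful of objects left behind in a dead cgroup can
stay there indefinitely. With the floor enabled the same cache is scanned a
batch at a time, while large caches are unaffected.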

Thanks!