Message-ID: <20180904175243.GA4889@tower.DHCP.thefacebook.com>
Date:   Tue, 4 Sep 2018 10:52:46 -0700
From:   Roman Gushchin <guro@...com>
To:     Michal Hocko <mhocko@...nel.org>
CC:     Rik van Riel <riel@...riel.com>, <linux-mm@...ck.org>,
        <linux-kernel@...r.kernel.org>, <kernel-team@...com>,
        Josef Bacik <jbacik@...com>,
        Johannes Weiner <hannes@...xchg.org>,
        Andrew Morton <akpm@...ux-foundation.org>
Subject: Re: [PATCH] mm: slowly shrink slabs with a relatively small number
 of objects

On Tue, Sep 04, 2018 at 06:14:31PM +0200, Michal Hocko wrote:
> On Tue 04-09-18 08:34:49, Roman Gushchin wrote:
> > On Tue, Sep 04, 2018 at 09:00:05AM +0200, Michal Hocko wrote:
> > > On Mon 03-09-18 13:28:06, Roman Gushchin wrote:
> > > > On Mon, Sep 03, 2018 at 08:29:56PM +0200, Michal Hocko wrote:
> > > > > On Fri 31-08-18 14:31:41, Roman Gushchin wrote:
> > > > > > On Fri, Aug 31, 2018 at 05:15:39PM -0400, Rik van Riel wrote:
> > > > > > > On Fri, 2018-08-31 at 13:34 -0700, Roman Gushchin wrote:
> > > > > > > 
> > > > > > > > diff --git a/mm/vmscan.c b/mm/vmscan.c
> > > > > > > > index fa2c150ab7b9..c910cf6bf606 100644
> > > > > > > > --- a/mm/vmscan.c
> > > > > > > > +++ b/mm/vmscan.c
> > > > > > > > @@ -476,6 +476,10 @@ static unsigned long do_shrink_slab(struct
> > > > > > > > shrink_control *shrinkctl,
> > > > > > > >  	delta = freeable >> priority;
> > > > > > > >  	delta *= 4;
> > > > > > > >  	do_div(delta, shrinker->seeks);
> > > > > > > > +
> > > > > > > > +	if (delta == 0 && freeable > 0)
> > > > > > > > +		delta = min(freeable, batch_size);
> > > > > > > > +
> > > > > > > >  	total_scan += delta;
> > > > > > > >  	if (total_scan < 0) {
> > > > > > > >  		pr_err("shrink_slab: %pF negative objects to delete
> > > > > > > > nr=%ld\n",
> > > > > > > 
> > > > > > > I agree that we need to shrink slabs with fewer than
> > > > > > > 4096 objects, but do we want to put more pressure on
> > > > > > > a slab the moment it drops below 4096 than we applied
> > > > > > > when it had just over 4096 objects on it?
> > > > > > > 
> > > > > > > With this patch, a slab with 5000 objects on it will
> > > > > > > get 1 item scanned, while a slab with 4000 objects on
> > > > > > > it will see shrinker->batch or SHRINK_BATCH objects
> > > > > > > scanned every time.
> > > > > > > 
> > > > > > > I don't know if this would cause any issues, just
> > > > > > > something to ponder.
> > > > > > 
> > > > > > Hm, fair enough. So, basically we can always do
> > > > > > 
> > > > > >     delta = max(delta, min(freeable, batch_size));
> > > > > > 
> > > > > > Does it look better?
> > > > > 
> > > > > > Why don't you use the same heuristic we use for the normal LRU reclaim?
> > > > 
> > > > Because we do reparent kmem lru lists on offlining.
> > > > Take a look at memcg_offline_kmem().
> > > 
> > > Then I must be missing something. Why are we growing the number of dead
> > > cgroups then?
> > 
> > We do reparent LRU lists, but not objects. Objects (or, more precisely, pages)
> > are still holding a reference to the memcg.
> 
> OK, this is what I missed. I thought that the reparenting includes all
> the pages as well. Is there any strong reason that we cannot do that?
> Performance/Locking/etc.?
> 
> Or maybe do not reparent at all and rely on the same reclaim heuristic
> we do for normal pages?
> 
> I am not opposing your patch but I am trying to figure out whether that
> is the best approach.

I don't think the current logic makes sense. Why should cgroups
with fewer than 4k kernel objects be excluded from being scanned?

Reparenting all pages is definitely an option to consider,
but it's not free in any case, so if there is no problem,
why should we? Let's keep it as a last resort. In my case,
the proposed patch works perfectly: the number of dying cgroups
fluctuates around 100, whereas it grew steadily to 2k and more before.

I believe that reparenting of LRU lists is required to minimize
the number of LRU lists to scan, but I'm not sure.
