Message-ID: <20180904153445.GA22328@tower.DHCP.thefacebook.com>
Date:   Tue, 4 Sep 2018 08:34:49 -0700
From:   Roman Gushchin <guro@...com>
To:     Michal Hocko <mhocko@...nel.org>
CC:     Rik van Riel <riel@...riel.com>, <linux-mm@...ck.org>,
        <linux-kernel@...r.kernel.org>, <kernel-team@...com>,
        Josef Bacik <jbacik@...com>,
        Johannes Weiner <hannes@...xchg.org>,
        Andrew Morton <akpm@...ux-foundation.org>
Subject: Re: [PATCH] mm: slowly shrink slabs with a relatively small number
 of objects

On Tue, Sep 04, 2018 at 09:00:05AM +0200, Michal Hocko wrote:
> On Mon 03-09-18 13:28:06, Roman Gushchin wrote:
> > On Mon, Sep 03, 2018 at 08:29:56PM +0200, Michal Hocko wrote:
> > > On Fri 31-08-18 14:31:41, Roman Gushchin wrote:
> > > > On Fri, Aug 31, 2018 at 05:15:39PM -0400, Rik van Riel wrote:
> > > > > On Fri, 2018-08-31 at 13:34 -0700, Roman Gushchin wrote:
> > > > > 
> > > > > > diff --git a/mm/vmscan.c b/mm/vmscan.c
> > > > > > index fa2c150ab7b9..c910cf6bf606 100644
> > > > > > --- a/mm/vmscan.c
> > > > > > +++ b/mm/vmscan.c
> > > > > > @@ -476,6 +476,10 @@ static unsigned long do_shrink_slab(struct
> > > > > > shrink_control *shrinkctl,
> > > > > >  	delta = freeable >> priority;
> > > > > >  	delta *= 4;
> > > > > >  	do_div(delta, shrinker->seeks);
> > > > > > +
> > > > > > +	if (delta == 0 && freeable > 0)
> > > > > > +		delta = min(freeable, batch_size);
> > > > > > +
> > > > > >  	total_scan += delta;
> > > > > >  	if (total_scan < 0) {
> > > > > >  		pr_err("shrink_slab: %pF negative objects to delete
> > > > > > nr=%ld\n",
> > > > > 
> > > > > I agree that we need to shrink slabs with fewer than
> > > > > 4096 objects, but do we want to put more pressure on
> > > > > a slab the moment it drops below 4096 than we applied
> > > > > when it had just over 4096 objects on it?
> > > > > 
> > > > > With this patch, a slab with 5000 objects on it will
> > > > > get 1 item scanned, while a slab with 4000 objects on
> > > > > it will see shrinker->batch or SHRINK_BATCH objects
> > > > > scanned every time.
> > > > > 
> > > > > I don't know if this would cause any issues, just
> > > > > something to ponder.
> > > > 
> > > > Hm, fair enough. So, basically we can always do
> > > > 
> > > >     delta = max(delta, min(freeable, batch_size));
> > > > 
> > > > Does it look better?
> > > 
> > > > Why don't you use the same heuristic we use for the normal LRU reclaim?
> > 
> > Because we do reparent kmem lru lists on offlining.
> > Take a look at memcg_offline_kmem().
> 
> Then I must be missing something. Why are we growing the number of dead
> cgroups then?

We do reparent LRU lists, but not objects. Objects (or, more precisely, pages)
are still holding a reference to the memcg.

Let's say we have systemd periodically restarting some service in system.slice.
After the service's memcg is removed, all of its accounted objects are placed
into system.slice's LRU. But under small/moderate memory pressure we won't
scan that LRU at all, unless it gets really big. If there are "only" a couple
of thousand objects, we don't scan it, and can easily end up with several
hundred pinned dying cgroups.
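
To make the arithmetic in this thread concrete, here is a minimal userspace
sketch of the three delta calculations discussed above: the unpatched one, the
one from the patch, and the max() variant. It is not kernel code. The helper
names are hypothetical, plain division stands in for do_div(), and the
constants are assumed to match the kernel's defaults (DEF_PRIORITY = 12,
DEFAULT_SEEKS = 2, SHRINK_BATCH = 128).

```c
#include <stdint.h>

/* Assumed to mirror the kernel defaults; not taken from the patch itself. */
#define DEF_PRIORITY   12
#define DEFAULT_SEEKS  2
#define SHRINK_BATCH   128

static inline uint64_t min_u64(uint64_t a, uint64_t b) { return a < b ? a : b; }
static inline uint64_t max_u64(uint64_t a, uint64_t b) { return a > b ? a : b; }

/* Unpatched calculation: delta = (freeable >> priority) * 4 / seeks. */
static inline uint64_t delta_unpatched(uint64_t freeable, int priority,
				       unsigned int seeks)
{
	uint64_t delta = freeable >> priority;

	delta *= 4;
	return delta / seeks;	/* stands in for do_div(delta, seeks) */
}

/* Patch as posted: only fall back to a batch when delta rounds down to 0. */
static inline uint64_t delta_patched(uint64_t freeable, int priority,
				     unsigned int seeks, uint64_t batch_size)
{
	uint64_t delta = delta_unpatched(freeable, priority, seeks);

	if (delta == 0 && freeable > 0)
		delta = min_u64(freeable, batch_size);
	return delta;
}

/* Follow-up suggestion: delta = max(delta, min(freeable, batch_size)). */
static inline uint64_t delta_max_variant(uint64_t freeable, int priority,
					 unsigned int seeks, uint64_t batch_size)
{
	return max_u64(delta_unpatched(freeable, priority, seeks),
		       min_u64(freeable, batch_size));
}
```

Under these assumptions, at priority 12 the unpatched calculation yields 2 for
5000 objects and 0 for 4000. The patch as posted yields 2 and 128
respectively, which is the discontinuity Rik pointed out. The max() variant
yields 128 in both cases. The exact count for the 5000-object case depends on
shrinker->seeks. For a list of 2000 objects the unpatched delta stays 0 at
priorities 12 and 11 and only becomes non-zero at priority 10.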
