lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Mon, 11 May 2020 11:56:46 -0400
From:   Johannes Weiner <hannes@...xchg.org>
To:     Shakeel Butt <shakeelb@...gle.com>
Cc:     Michal Hocko <mhocko@...nel.org>, Roman Gushchin <guro@...com>,
        Greg Thelen <gthelen@...gle.com>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Linux MM <linux-mm@...ck.org>,
        Cgroups <cgroups@...r.kernel.org>,
        LKML <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH] memcg: effective memory.high reclaim for remote charging

On Thu, May 07, 2020 at 10:00:07AM -0700, Shakeel Butt wrote:
> On Thu, May 7, 2020 at 9:47 AM Michal Hocko <mhocko@...nel.org> wrote:
> >
> > On Thu 07-05-20 09:33:01, Shakeel Butt wrote:
> > [...]
> > > @@ -2600,8 +2596,23 @@ static int try_charge(struct mem_cgroup *memcg, gfp_t gfp_mask,
> > >                               schedule_work(&memcg->high_work);
> > >                               break;
> > >                       }
> > > -                     current->memcg_nr_pages_over_high += batch;
> > > -                     set_notify_resume(current);
> > > +
> > > +                     if (gfpflags_allow_blocking(gfp_mask))
> > > +                             reclaim_over_high(memcg, gfp_mask, batch);
> > > +
> > > +                     if (page_counter_read(&memcg->memory) <=
> > > +                         READ_ONCE(memcg->high))
> > > +                             break;
> >
> > I am half way to a long weekend so bear with me. Shouldn't this be continue? The
> > parent memcg might be still in excess even the child got reclaimed,
> > right?
> >
> 
> The reclaim_high() actually already does this walk up to the root and
> reclaim from ones who are still over their high limit. Though having
> 'continue' here is correct too.

If reclaim was weak and failed to bring the child back in line, we
still do set_notify_resume(). We should do that for ancestors too.

But it seems we keep adding hierarchy walks and it's getting somewhat
convoluted: page_counter does it, then we check high overage
recursively, and now we add the call to reclaim which itself is a walk
up the ancestor line.

Can we hitchhike on the page_counter_try_charge() walk, which already
has the concept of identifying counters with overage? Rename the @fail
to @limited and return the first counter that is in excess of its high
as well, even when the function succeeds?

Then we could ditch the entire high checking loop here and simply
replace it with

done_restock:
	...

	if (*limited) {
		if (gfpflags_allow_blocking())
			reclaim_over_high(memcg_from_counter(limited));
		/* Reclaim may not be able to do much, ... */
		set_notify_resume(); // or schedule_work()
	};

In the long-term, the best thing might be to integrate memory.high
reclaim with the regular reclaim that try_charge() is already
doing. Especially the part where it retries several times - we
currently give up on memory.high unnecessarily early. Make
page_counter_try_charge() fail on high and max equally, and after
several reclaim cycles, instead of invoking the OOM killer, inject the
penalty sleep and force the charges. OOM killing and throttling is
supposed to be the only difference between the two, anyway, and yet
the code diverges far more than that for no apparent reason.

But I also appreciate that this is a cleanup beyond the scope of this
patch here, so it's up to you how far you want to take it.

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ