Message-ID: <YSPfe4yf2fRdzijh@cmpxchg.org>
Date: Mon, 23 Aug 2021 13:48:43 -0400
From: Johannes Weiner <hannes@...xchg.org>
To: Michal Koutný <mkoutny@...e.com>
Cc: Andrew Morton <akpm@...ux-foundation.org>,
Leon Yang <lnyng@...com>, Chris Down <chris@...isdown.name>,
Roman Gushchin <guro@...com>, Michal Hocko <mhocko@...e.com>,
linux-mm@...ck.org, cgroups@...r.kernel.org,
linux-kernel@...r.kernel.org, kernel-team@...com
Subject: Re: [PATCH] mm: memcontrol: fix occasional OOMs due to proportional
memory.low reclaim
Hi Michal,
On Mon, Aug 23, 2021 at 06:09:29PM +0200, Michal Koutný wrote:
> Hello
>
> (and sorry for a belated reply).
It's never too late, thanks for taking a look.
> On Tue, Aug 17, 2021 at 02:05:06PM -0400, Johannes Weiner <hannes@...xchg.org> wrote:
> > @@ -2576,6 +2578,15 @@ static void get_scan_count(struct lruvec *lruvec, struct scan_control *sc,
> > [...]
> > +	/* memory.low scaling, make sure we retry before OOM */
> > +	if (!sc->memcg_low_reclaim && low > min) {
> > +		protection = low;
> > +		sc->memcg_low_skipped = 1;
>
> IIUC, this won't result in memory.events:low increment although the
> effect is similar (breaching (partial) memory.low protection) and signal
> to the user is comparable (overcommitted memory.low).
Good observation. I think you're right, we should probably count such
partial breaches as LOW events as well.
Note that this isn't new behavior. My patch merely moved this part
from mem_cgroup_protection():
-	if (in_low_reclaim)
-		return READ_ONCE(memcg->memory.emin);
Even before, if we retried due to just one (possibly insignificant)
cgroup below low, we'd ignore proportional reclaim and partially
breach ALL protected cgroups, while only counting a low event for the
one group whose usage is below low.
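To make the two passes concrete, here is a minimal userspace sketch of
the selection logic in the quoted hunk. It is not the kernel code:
struct scan_model and pick_protection() are hypothetical names, and
the fallback to min on the retry pass is my reading of the else-branch
that the quoted hunk elides.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for the kernel's struct scan_control. */
struct scan_model {
	bool memcg_low_reclaim;	/* true on the retry pass that may breach memory.low */
	bool memcg_low_skipped;	/* set when memory.low was honored on this pass */
};

/*
 * Pick the protection (in pages) to apply to one cgroup.
 *
 * First pass: honor low when it exceeds min, and record that a retry
 * without low protection is still available before declaring OOM.
 * Retry pass: fall back to min only (assumed else-branch, not shown in
 * the quoted hunk).
 */
static unsigned long pick_protection(struct scan_model *sc,
				     unsigned long min, unsigned long low)
{
	assert(sc != NULL);

	if (!sc->memcg_low_reclaim && low > min) {
		sc->memcg_low_skipped = true;
		return low;
	}
	return min;
}
```

With min = 10 and low = 100, the first pass returns 100 and sets
memcg_low_skipped; the retry pass returns 10 for every cgroup, which is
the "partially breach ALL protected cgroups" effect described above.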
> Admittedly, this patch's behavior adheres to the current documentation
> (Documentation/admin-guide/cgroup-v2.rst):
>
> > The number of times the cgroup is reclaimed due to high memory
> > pressure even though its usage is under the low boundary,
>
> however, that definition might not be what the useful indicator would
> be now.
> Is it worth including these partial breaches into memory.events:low?
I think it is. How about:
"The number of times the cgroup's memory.low-protected memory was
reclaimed in order to avoid OOM during high memory pressure."
And adding a MEMCG_LOW event to partial breaches. BTW, the comment
block above this code is also out-of-date, because it says we're
honoring memory.low on the retries, but that's not the case.
I'll prepare a follow-up patch for these 3 things as well as the more
verbose comment that Michal Hocko asked for on the retry logic.
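For discussion, one possible accounting rule for partial breaches could
look like the sketch below. This is only an illustration, not the
follow-up patch: struct cgroup_model and note_low_breach() are
hypothetical, and the rule itself (count an event on the retry pass for
any cgroup that has low above min and usage above min) is an assumption
about what "partial breach" should mean.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-cgroup state, all sizes in pages. */
struct cgroup_model {
	unsigned long usage;
	unsigned long min;
	unsigned long low;
	unsigned long low_events;	/* models the memory.events:low counter */
};

/*
 * Count a low event if this pass may reclaim memory that memory.low
 * would otherwise protect. Returns true when an event was counted.
 */
static bool note_low_breach(struct cgroup_model *cg, bool low_reclaim_pass)
{
	assert(cg != NULL);

	/* Only the retry pass ignores memory.low. */
	if (!low_reclaim_pass)
		return false;
	/* Without low above min there is no low-only protection to breach. */
	if (cg->low <= cg->min)
		return false;
	/* Usage at or below min remains fully protected by memory.min. */
	if (cg->usage <= cg->min)
		return false;

	cg->low_events++;
	return true;
}
```

Under this rule a cgroup with usage above low would also be counted on
the retry pass, since its proportional protection is dropped from low
to min just like for the group that is below low.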