Message-ID: <20231129173540.gl2pufeo6ciubcny@CAB-WSD-L081021>
Date: Wed, 29 Nov 2023 20:35:40 +0300
From: Dmitry Rokosov <ddrokosov@...utedevices.com>
To: Michal Hocko <mhocko@...e.com>
CC: <akpm@...ux-foundation.org>, <rostedt@...dmis.org>,
<mhiramat@...nel.org>, <hannes@...xchg.org>,
<roman.gushchin@...ux.dev>, <shakeelb@...gle.com>,
<muchun.song@...ux.dev>, <kernel@...rdevices.ru>,
<rockosov@...il.com>, <cgroups@...r.kernel.org>,
<linux-mm@...ck.org>, <linux-kernel@...r.kernel.org>,
<bpf@...r.kernel.org>
Subject: Re: [PATCH v3 2/2] mm: memcg: introduce new event to trace
shrink_memcg
On Wed, Nov 29, 2023 at 06:10:33PM +0100, Michal Hocko wrote:
> On Wed 29-11-23 19:57:52, Dmitry Rokosov wrote:
> > On Wed, Nov 29, 2023 at 05:06:37PM +0100, Michal Hocko wrote:
> > > On Wed 29-11-23 18:20:57, Dmitry Rokosov wrote:
> > > > On Tue, Nov 28, 2023 at 10:32:50AM +0100, Michal Hocko wrote:
> > > > > On Mon 27-11-23 19:16:37, Dmitry Rokosov wrote:
> > > [...]
> > > > > > 2) With this approach, we will not have the ability to trace a situation
> > > > > > where the kernel is requesting reclaim for a specific memcg, but due to
> > > > > > limits issues, we are unable to run it.
> > > > >
> > > > > I do not follow. Could you be more specific please?
> > > > >
> > > >
> > > > I'm referring to a situation where kswapd() or another kernel mm code
> > > > requests some reclaim pages from memcg, but memcg rejects it due to
> > > > limits checkers. This occurs in the shrink_node_memcgs() function.
> > >
> > > Ohh, you mean reclaim protection
> > >
> > > > ===
> > > > mem_cgroup_calculate_protection(target_memcg, memcg);
> > > >
> > > > if (mem_cgroup_below_min(target_memcg, memcg)) {
> > > > 	/*
> > > > 	 * Hard protection.
> > > > 	 * If there is no reclaimable memory, OOM.
> > > > 	 */
> > > > 	continue;
> > > > } else if (mem_cgroup_below_low(target_memcg, memcg)) {
> > > > 	/*
> > > > 	 * Soft protection.
> > > > 	 * Respect the protection only as long as
> > > > 	 * there is an unprotected supply
> > > > 	 * of reclaimable memory from other cgroups.
> > > > 	 */
> > > > 	if (!sc->memcg_low_reclaim) {
> > > > 		sc->memcg_low_skipped = 1;
> > > > 		continue;
> > > > 	}
> > > > 	memcg_memory_event(memcg, MEMCG_LOW);
> > > > }
> > > > ===
> > > >
> > > > With separate shrink begin()/end() tracepoints we can detect such
> > > > problem.
> > >
> > > How? You are only reporting the number of reclaimed pages and no
> > > reclaimed pages could be not just because of low/min limits but
> > > generally because of other reasons. You would need to report also the
> > > number of scanned/isolated pages.
> > >
> >
> > From my perspective, if memory control group (memcg) protection
> > restrictions occur, we can identify them by the absence of the end()
> > pair of begin(). Other reasons will have both tracepoints raised.
>
> That is not really great way to detect that TBH. Trace events could be
> lost and then you simply do not know what has happened.

I see, thank you very much for the detailed review! I will prepare a new
patchset with memcg names in the lruvec and slab paths, and will be back soon.
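
To illustrate the point about lost events, here is a minimal userspace
sketch (not kernel code). The names struct ev, memcg_id and
count_unpaired_begins() are hypothetical and exist only for this example;
the real trace record layout is different. It counts begin() events that
have no matching end() for the same memcg, which is the detection rule
discussed above.

```c
#include <stddef.h>

/* Hypothetical event record; not the kernel's trace format. */
enum ev_type { EV_BEGIN, EV_END };

struct ev {
	enum ev_type type;
	int memcg_id;	/* hypothetical identifier of the memcg */
};

/*
 * Count begin() events that are not followed by an end() for the
 * same memcg_id. The first later event with the same memcg_id
 * decides: an end() pairs the begin(), anything else leaves it
 * unpaired.
 */
size_t count_unpaired_begins(const struct ev *evs, size_t n)
{
	size_t unpaired = 0;

	for (size_t i = 0; i < n; i++) {
		int matched = 0;

		if (evs[i].type != EV_BEGIN)
			continue;

		for (size_t j = i + 1; j < n; j++) {
			if (evs[j].memcg_id != evs[i].memcg_id)
				continue;
			if (evs[j].type == EV_END)
				matched = 1;
			break;
		}
		if (!matched)
			unpaired++;
	}
	return unpaired;
}
```

A stream such as begin(1), begin(2), end(2) yields one unpaired begin().
The same stream is produced both when memcg 1 was skipped due to
protection and when its end() event was simply dropped, so the count
alone cannot tell the two cases apart.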
--
Thank you,
Dmitry