Date:   Wed, 29 Nov 2023 18:10:33 +0100
From:   Michal Hocko <mhocko@...e.com>
To:     Dmitry Rokosov <ddrokosov@...utedevices.com>
Cc:     akpm@...ux-foundation.org, rostedt@...dmis.org,
        mhiramat@...nel.org, hannes@...xchg.org, roman.gushchin@...ux.dev,
        shakeelb@...gle.com, muchun.song@...ux.dev, kernel@...rdevices.ru,
        rockosov@...il.com, cgroups@...r.kernel.org, linux-mm@...ck.org,
        linux-kernel@...r.kernel.org, bpf@...r.kernel.org
Subject: Re: [PATCH v3 2/2] mm: memcg: introduce new event to trace
 shrink_memcg

On Wed 29-11-23 19:57:52, Dmitry Rokosov wrote:
> On Wed, Nov 29, 2023 at 05:06:37PM +0100, Michal Hocko wrote:
> > On Wed 29-11-23 18:20:57, Dmitry Rokosov wrote:
> > > On Tue, Nov 28, 2023 at 10:32:50AM +0100, Michal Hocko wrote:
> > > > On Mon 27-11-23 19:16:37, Dmitry Rokosov wrote:
> > [...]
> > > > > 2) With this approach, we will not have the ability to trace a situation
> > > > > where the kernel is requesting reclaim for a specific memcg, but due to
> > > > > limits issues, we are unable to run it.
> > > > 
> > > > I do not follow. Could you be more specific please?
> > > > 
> > > 
> > > I'm referring to a situation where kswapd() or other kernel mm code
> > > requests reclaim of some pages from a memcg, but the memcg rejects it
> > > due to the limit checks. This occurs in the shrink_node_memcgs() function.
> > 
> > Ohh, you mean reclaim protection
> > 
> > > ===
> > > 		mem_cgroup_calculate_protection(target_memcg, memcg);
> > > 
> > > 		if (mem_cgroup_below_min(target_memcg, memcg)) {
> > > 			/*
> > > 			 * Hard protection.
> > > 			 * If there is no reclaimable memory, OOM.
> > > 			 */
> > > 			continue;
> > > 		} else if (mem_cgroup_below_low(target_memcg, memcg)) {
> > > 			/*
> > > 			 * Soft protection.
> > > 			 * Respect the protection only as long as
> > > 			 * there is an unprotected supply
> > > 			 * of reclaimable memory from other cgroups.
> > > 			 */
> > > 			if (!sc->memcg_low_reclaim) {
> > > 				sc->memcg_low_skipped = 1;
> > > 				continue;
> > > 			}
> > > 			memcg_memory_event(memcg, MEMCG_LOW);
> > > 		}
> > > ===
> > > 
> > > With separate shrink begin()/end() tracepoints we can detect such
> > > a problem.
> > 
> > How? You are only reporting the number of reclaimed pages, and no
> > pages might be reclaimed not just because of low/min limits but
> > also for other reasons. You would also need to report the
> > number of scanned/isolated pages.
> >  
> 
> From my perspective, if memory control group (memcg) protection
> restrictions apply, we can identify them by the absence of the end()
> counterpart of begin(). Other reasons will have both tracepoints raised.

That is not really a great way to detect that, TBH. Trace events can be
lost, and then you simply do not know what has happened.
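
To illustrate the concern, here is a minimal user-space sketch in C (not
kernel code). The event kinds, the struct, the memcg ids and both sample
streams are hypothetical; they only stand in for the proposed begin()/end()
tracepoints. A consumer that treats "begin() without end()" as a protection
skip gets a different answer once a single end() event is dropped:

```c
#include <stddef.h>

/* Hypothetical event kinds standing in for the proposed tracepoints. */
enum ev_kind { EV_SHRINK_BEGIN, EV_SHRINK_END };

struct trace_ev {
	enum ev_kind kind;
	int memcg_id;	/* hypothetical id of the memcg being reclaimed */
};

/*
 * Count begin() events that are not immediately followed by an end()
 * event for the same memcg. A consumer relying on the "missing end()"
 * heuristic would report each of these as a protection skip.
 */
size_t count_unpaired_begins(const struct trace_ev *ev, size_t n)
{
	size_t i, unpaired = 0;

	for (i = 0; i < n; i++) {
		if (ev[i].kind != EV_SHRINK_BEGIN)
			continue;
		if (i + 1 < n && ev[i + 1].kind == EV_SHRINK_END &&
		    ev[i + 1].memcg_id == ev[i].memcg_id)
			i++;		/* consume the matching end() */
		else
			unpaired++;	/* looks like a protection skip */
	}
	return unpaired;
}

/* Assumed ground truth: memcg 2 is protected, memcg 1 and 3 are reclaimed. */
static const struct trace_ev complete_stream[] = {
	{ EV_SHRINK_BEGIN, 1 }, { EV_SHRINK_END, 1 },
	{ EV_SHRINK_BEGIN, 2 },
	{ EV_SHRINK_BEGIN, 3 }, { EV_SHRINK_END, 3 },
};

/* Same workload, but the end() event for memcg 3 was lost in transit. */
static const struct trace_ev lossy_stream[] = {
	{ EV_SHRINK_BEGIN, 1 }, { EV_SHRINK_END, 1 },
	{ EV_SHRINK_BEGIN, 2 },
	{ EV_SHRINK_BEGIN, 3 },
};
```

With the complete stream the heuristic counts one skip (memcg 2). With the
lossy stream it counts two, and nothing in the data tells the consumer that
memcg 3 was in fact reclaimed.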

-- 
Michal Hocko
SUSE Labs
