Message-ID: <xtcmz6b66wayqxzfio4funmrja7ezgmp3mvudjodt5xfx64rot@s6whj735oimb>
Date: Wed, 17 Jul 2024 14:55:08 +0300
From: "Kirill A. Shutemov" <kirill.shutemov@...ux.intel.com>
To: Michal Hocko <mhocko@...e.com>
Cc: Andrew Morton <akpm@...ux-foundation.org>, 
	"Borislav Petkov (AMD)" <bp@...en8.de>, Mel Gorman <mgorman@...e.de>, Vlastimil Babka <vbabka@...e.cz>, 
	Tom Lendacky <thomas.lendacky@....com>, Mike Rapoport <rppt@...nel.org>, linux-mm@...ck.org, 
	linux-kernel@...r.kernel.org, Jianxiong Gao <jxgao@...gle.com>, stable@...r.kernel.org
Subject: Re: [PATCH] mm: Fix endless reclaim on machines with unaccepted
 memory.

On Wed, Jul 17, 2024 at 09:19:12AM +0200, Michal Hocko wrote:
> On Tue 16-07-24 16:00:13, Kirill A. Shutemov wrote:
> > Unaccepted memory is considered unusable free memory, which is not
> > counted as free on the zone watermark check. This causes
> > get_page_from_freelist() to accept more memory to hit the high
> > watermark, but it creates problems in the reclaim path.
> > 
> > The reclaim path encounters a failed zone watermark check and attempts
> > to reclaim memory. This is usually successful, but if there is little or
> > no reclaimable memory, it can result in endless reclaim with little to
> > no progress. This can occur early in the boot process, just after start
> > of the init process when the only reclaimable memory is the page cache
> > of the init executable and its libraries.
> 
> How does this happen when try_to_accept_memory is the first thing to do
> when wmark check fails in the allocation path?

Good question.

I've lost access to the test setup and cannot check it directly right now.

From reading the code, it looks like __alloc_pages_bulk() bypasses
get_page_from_freelist(), where we usually accept more pages, and goes
directly to __rmqueue_pcplist() -> rmqueue_bulk() -> __rmqueue().

Will look more into it when I have access to the test setup.
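
To make the suspicion above concrete, here is a toy userspace model of the
two paths. It is only a sketch, not kernel code: every name in it
(toy_zone, toy_try_to_accept(), TOY_ACCEPT_CHUNK, and so on) is made up for
illustration, and it simplifies the watermark check to "free pages, with
unaccepted pages excluded, compared against the high watermark".

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only; all names are hypothetical, none are kernel APIs. */
struct toy_zone {
	unsigned long free_pages;       /* accepted pages on the free lists */
	unsigned long unaccepted_pages; /* pages not yet accepted */
	unsigned long high_wmark;       /* high watermark, in pages */
};

#define TOY_ACCEPT_CHUNK 512UL /* pages accepted per step, arbitrary */

/* Unaccepted pages are not counted as free by the watermark check. */
static bool toy_watermark_ok(const struct toy_zone *z)
{
	return z->free_pages >= z->high_wmark;
}

/* Move one chunk from unaccepted to free; false if nothing is left. */
static bool toy_try_to_accept(struct toy_zone *z)
{
	unsigned long n = z->unaccepted_pages < TOY_ACCEPT_CHUNK ?
			  z->unaccepted_pages : TOY_ACCEPT_CHUNK;

	if (!n)
		return false;
	z->unaccepted_pages -= n;
	z->free_pages += n;
	return true;
}

/* Path that accepts memory whenever the watermark check fails. */
static bool toy_alloc_with_accept(struct toy_zone *z)
{
	while (!toy_watermark_ok(z)) {
		if (!toy_try_to_accept(z))
			return false; /* this is where reclaim would start */
	}
	z->free_pages--;
	return true;
}

/* Path that takes a page directly and never accepts memory. */
static bool toy_alloc_bypass(struct toy_zone *z)
{
	if (!z->free_pages)
		return false;
	z->free_pages--;
	return true;
}

/* Runs both paths on one zone; returns the unaccepted pages left over. */
static unsigned long toy_demo(void)
{
	struct toy_zone z = {
		.free_pages = 100,
		.unaccepted_pages = 2048,
		.high_wmark = 1000,
	};
	bool ok;

	/* The bypass path succeeds but the zone stays below the watermark. */
	ok = toy_alloc_bypass(&z);
	assert(ok && !toy_watermark_ok(&z));

	/* The accepting path converts unaccepted pages until the check passes. */
	ok = toy_alloc_with_accept(&z);
	assert(ok && toy_watermark_ok(&z));

	return z.unaccepted_pages;
}
```

In this model, a caller that only ever uses the bypass path keeps seeing a
failed watermark check while plenty of unaccepted memory is still
available, which is the situation a reclaim path would then react to.
Whether that is what actually happens on the real system is still to be
confirmed.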

> Could you describe what was the initial configuration of the system? How
> much of the unaccepted memory was there to trigger this?

This is a large TDX guest VM: 176 vCPUs and ~800GiB of memory.

One thing I noticed is that the problem is only triggered when LRU_GEN is
enabled, but I failed to identify why.

The system hangs (or makes very little progress) shortly after systemd
starts.

> > To address this issue, teach shrink_node() and shrink_zones() to accept
> > memory before attempting to reclaim.
> > 
> > Signed-off-by: Kirill A. Shutemov <kirill.shutemov@...ux.intel.com>
> > Reported-by: Jianxiong Gao <jxgao@...gle.com>
> > Fixes: dcdfdd40fa82 ("mm: Add support for unaccepted memory")
> > Cc: stable@...r.kernel.org # v6.5+
> [...]
> >  static void shrink_node(pg_data_t *pgdat, struct scan_control *sc)
> >  {
> >  	unsigned long nr_reclaimed, nr_scanned, nr_node_reclaimed;
> >  	struct lruvec *target_lruvec;
> >  	bool reclaimable = false;
> >  
> > +	/* Try to accept memory before going for reclaim */
> > +	if (node_try_to_accept_memory(pgdat, sc)) {
> > +		if (!should_continue_reclaim(pgdat, 0, sc))
> > +			return;
> > +	}
> > +
> 
> This would need an exemption from the memcg reclaim.

Hm. Could you elaborate why?

-- 
  Kiryl Shutsemau / Kirill A. Shutemov
