Message-ID: <ZpdwcOv9WiILZNvz@tiehlicka>
Date: Wed, 17 Jul 2024 09:19:12 +0200
From: Michal Hocko <mhocko@...e.com>
To: "Kirill A. Shutemov" <kirill.shutemov@...ux.intel.com>
Cc: Andrew Morton <akpm@...ux-foundation.org>,
	"Borislav Petkov (AMD)" <bp@...en8.de>,
	Mel Gorman <mgorman@...e.de>, Vlastimil Babka <vbabka@...e.cz>,
	Tom Lendacky <thomas.lendacky@....com>,
	Mike Rapoport <rppt@...nel.org>, linux-mm@...ck.org,
	linux-kernel@...r.kernel.org, Jianxiong Gao <jxgao@...gle.com>,
	stable@...r.kernel.org
Subject: Re: [PATCH] mm: Fix endless reclaim on machines with unaccepted
 memory.

On Tue 16-07-24 16:00:13, Kirill A. Shutemov wrote:
> Unaccepted memory is considered unusable free memory, which is not
> counted as free on the zone watermark check. This causes
> get_page_from_freelist() to accept more memory to hit the high
> watermark, but it creates problems in the reclaim path.
> 
> The reclaim path encounters a failed zone watermark check and attempts
> to reclaim memory. This is usually successful, but if there is little or
> no reclaimable memory, it can result in endless reclaim with little to
> no progress. This can occur early in the boot process, just after start
> of the init process when the only reclaimable memory is the page cache
> of the init executable and its libraries.

How does this happen when try_to_accept_memory() is the first thing
tried when the watermark check fails in the allocation path?
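
To make the ordering in question concrete, here is a minimal userspace
sketch of it. This is a toy model under stated assumptions, not the
kernel code: the toy_* names, the struct fields and the chunk size are
all hypothetical, and only accepted pages are counted as free.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for a zone; the fields are hypothetical, not struct zone. */
struct toy_zone {
	unsigned long free_pages;       /* accepted pages that are free */
	unsigned long unaccepted_pages; /* free, but not yet accepted */
	unsigned long watermark;        /* pages that must remain free */
};

enum toy_result { TOY_ALLOC_OK, TOY_NEEDS_RECLAIM };

/* Assumed acceptance granularity, chosen only for illustration. */
#define TOY_ACCEPT_CHUNK 4UL

/* Unaccepted pages do not count towards the watermark check. */
bool toy_watermark_ok(const struct toy_zone *z, unsigned long nr)
{
	return z->free_pages >= z->watermark + nr;
}

/* Move up to one chunk from unaccepted to free; false if none is left. */
bool toy_try_to_accept(struct toy_zone *z)
{
	unsigned long n = z->unaccepted_pages < TOY_ACCEPT_CHUNK ?
			  z->unaccepted_pages : TOY_ACCEPT_CHUNK;

	if (n == 0)
		return false;
	z->unaccepted_pages -= n;
	z->free_pages += n;
	return true;
}

/* On a failed watermark check, accept memory first; reclaim comes last. */
enum toy_result toy_alloc(struct toy_zone *z, unsigned long nr)
{
	assert(nr > 0);
	while (!toy_watermark_ok(z, nr)) {
		if (!toy_try_to_accept(z))
			return TOY_NEEDS_RECLAIM;
	}
	z->free_pages -= nr;
	return TOY_ALLOC_OK;
}
```

In this model the reclaim result is only reachable once the unaccepted
pool is empty, which is why the reported scenario is surprising.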

Could you describe the initial configuration of the system? How much
unaccepted memory was there to trigger this?

> To address this issue, teach shrink_node() and shrink_zones() to accept
> memory before attempting to reclaim.
> 
> Signed-off-by: Kirill A. Shutemov <kirill.shutemov@...ux.intel.com>
> Reported-by: Jianxiong Gao <jxgao@...gle.com>
> Fixes: dcdfdd40fa82 ("mm: Add support for unaccepted memory")
> Cc: stable@...r.kernel.org # v6.5+
[...]
>  static void shrink_node(pg_data_t *pgdat, struct scan_control *sc)
>  {
>  	unsigned long nr_reclaimed, nr_scanned, nr_node_reclaimed;
>  	struct lruvec *target_lruvec;
>  	bool reclaimable = false;
>  
> +	/* Try to accept memory before going for reclaim */
> +	if (node_try_to_accept_memory(pgdat, sc)) {
> +		if (!should_continue_reclaim(pgdat, 0, sc))
> +			return;
> +	}
> +

This would need an exemption for memcg reclaim.
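
A rough userspace sketch of what such an exemption could look like,
presumably wanted because memcg reclaim is driven by a cgroup limit
rather than by zone watermarks. Everything here is a hypothetical
stand-in (the toy_* types and helpers are not the kernel's), and it only
illustrates the shape of the check:

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; not the kernel's mem_cgroup or scan_control. */
struct toy_memcg { int id; };

struct toy_scan_control {
	struct toy_memcg *target_mem_cgroup; /* NULL means global reclaim */
};

struct toy_node {
	unsigned long unaccepted_pages;
	unsigned long free_pages;
};

/* True when reclaim was triggered on behalf of a specific cgroup. */
bool toy_cgroup_reclaim(const struct toy_scan_control *sc)
{
	return sc->target_mem_cgroup != NULL;
}

/* Accept everything that is left on the node; false if nothing was. */
bool toy_node_try_to_accept(struct toy_node *node)
{
	if (node->unaccepted_pages == 0)
		return false;
	node->free_pages += node->unaccepted_pages;
	node->unaccepted_pages = 0;
	return true;
}

/* Only global reclaim tries to accept memory before reclaiming. */
bool toy_accept_before_reclaim(struct toy_node *node,
			       const struct toy_scan_control *sc)
{
	if (toy_cgroup_reclaim(sc))
		return false;
	return toy_node_try_to_accept(node);
}
```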

>  	if (lru_gen_enabled() && root_reclaim(sc)) {
>  		lru_gen_shrink_node(pgdat, sc);
>  		return;

-- 
Michal Hocko
SUSE Labs
