Message-ID: <46f1b2ab-2903-4cde-9e68-e334a0d0df22@suse.cz>
Date: Fri, 11 Apr 2025 10:19:58 +0200
From: Vlastimil Babka <vbabka@...e.cz>
To: Johannes Weiner <hannes@...xchg.org>,
 Andrew Morton <akpm@...ux-foundation.org>
Cc: Mel Gorman <mgorman@...hsingularity.net>, Zi Yan <ziy@...dia.com>,
 linux-mm@...ck.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH 5/5] mm: page_alloc: defrag_mode kswapd/kcompactd
 watermarks

On 3/13/25 22:05, Johannes Weiner wrote:
> The previous patch added pageblock_order reclaim to kswapd/kcompactd,
> which helps, but produces only one block at a time. Allocation stalls
> and THP failure rates are still higher than they could be.
> 
> To adequately reflect ALLOC_NOFRAGMENT demand for pageblocks, change
> the watermarking for kswapd & kcompactd: instead of targeting the high
> watermark in order-0 pages and checking for one suitable block, simply
> require that the high watermark is entirely met in pageblocks.

Hrm.

> ---
>  include/linux/mmzone.h |  1 +
>  mm/compaction.c        | 37 ++++++++++++++++++++++++++++++-------
>  mm/internal.h          |  1 +
>  mm/page_alloc.c        | 29 +++++++++++++++++++++++------
>  mm/vmscan.c            | 15 ++++++++++++++-
>  mm/vmstat.c            |  1 +
>  6 files changed, 70 insertions(+), 14 deletions(-)
> 
> diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
> index dbb0ad69e17f..37c29f3fbca8 100644
> --- a/include/linux/mmzone.h
> +++ b/include/linux/mmzone.h
> @@ -138,6 +138,7 @@ enum numa_stat_item {
>  enum zone_stat_item {
>  	/* First 128 byte cacheline (assuming 64 bit words) */
>  	NR_FREE_PAGES,
> +	NR_FREE_PAGES_BLOCKS,
>  	NR_ZONE_LRU_BASE, /* Used only for compaction and reclaim retry */
>  	NR_ZONE_INACTIVE_ANON = NR_ZONE_LRU_BASE,
>  	NR_ZONE_ACTIVE_ANON,
> diff --git a/mm/compaction.c b/mm/compaction.c
> index 036353ef1878..4a2ccb82d0b2 100644
> --- a/mm/compaction.c
> +++ b/mm/compaction.c
> @@ -2329,6 +2329,22 @@ static enum compact_result __compact_finished(struct compact_control *cc)
>  	if (!pageblock_aligned(cc->migrate_pfn))
>  		return COMPACT_CONTINUE;
>  
> +	/*
> +	 * When defrag_mode is enabled, make kcompactd target
> +	 * watermarks in whole pageblocks. Because they can be stolen
> +	 * without polluting, no further fallback checks are needed.
> +	 */
> +	if (defrag_mode && !cc->direct_compaction) {
> +		if (__zone_watermark_ok(cc->zone, cc->order,
> +					high_wmark_pages(cc->zone),
> +					cc->highest_zoneidx, cc->alloc_flags,
> +					zone_page_state(cc->zone,
> +							NR_FREE_PAGES_BLOCKS)))
> +			return COMPACT_SUCCESS;
> +
> +		return COMPACT_CONTINUE;
> +	}

Wonder if this ever succeeds in practice. Is high_wmark_pages() even aligned
to pageblock size? If not, and it's X and a half pageblocks, we will rarely
have NR_FREE_PAGES_BLOCKS cover all of it. Also, concurrent allocations can
quickly put us below the high watermark, and then we never satisfy this?

Doesn't it then happen that with defrag_mode, kcompactd in practice basically
always runs until the scanners meet?

Maybe the check could instead compare NR_FREE_PAGES (aligned down to
pageblock size, or even with some extra slack?) against NR_FREE_PAGES_BLOCKS?

> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -6724,11 +6724,24 @@ static bool pgdat_balanced(pg_data_t *pgdat, int order, int highest_zoneidx)
>  	 * meet watermarks.
>  	 */
>  	for_each_managed_zone_pgdat(zone, pgdat, i, highest_zoneidx) {
> +		unsigned long free_pages;
> +
>  		if (sysctl_numa_balancing_mode & NUMA_BALANCING_MEMORY_TIERING)
>  			mark = promo_wmark_pages(zone);
>  		else
>  			mark = high_wmark_pages(zone);
> -		if (zone_watermark_ok_safe(zone, order, mark, highest_zoneidx))

I think you just removed the only user of this _safe() function. Is the
per-cpu drift handling it does no longer necessary?

> +
> +		/*
> +		 * In defrag_mode, watermarks must be met in whole
> +		 * blocks to avoid polluting allocator fallbacks.
> +		 */
> +		if (defrag_mode)
> +			free_pages = zone_page_state(zone, NR_FREE_PAGES_BLOCKS);
> +		else
> +			free_pages = zone_page_state(zone, NR_FREE_PAGES);
> +
> +		if (__zone_watermark_ok(zone, order, mark, highest_zoneidx,
> +					0, free_pages))
>  			return true;
>  	}
>  
