Message-ID: <20250410201718.GA366747@cmpxchg.org>
Date: Thu, 10 Apr 2025 16:17:18 -0400
From: Johannes Weiner <hannes@...xchg.org>
To: Vlastimil Babka <vbabka@...e.cz>
Cc: Andrew Morton <akpm@...ux-foundation.org>,
	Mel Gorman <mgorman@...hsingularity.net>, Zi Yan <ziy@...dia.com>,
	linux-mm@...ck.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH 1/5] mm: compaction: push watermark into
	compaction_suitable() callers

On Thu, Apr 10, 2025 at 05:19:06PM +0200, Vlastimil Babka wrote:
> On 3/13/25 22:05, Johannes Weiner wrote:
> > compaction_suitable() hardcodes the min watermark, with a boost to the
> > low watermark for costly orders. However, compaction_ready() requires
> > order-0 at the high watermark. It currently checks the marks twice.
> >
> > Make the watermark a parameter to compaction_suitable() and have the
> > callers pass in what they require:
> >
> > - compaction_zonelist_suitable() is used by the direct reclaim path,
> > so use the min watermark.
> >
> > - compaction_suit_allocation_order() has a watermark in context derived
> > from cc->alloc_flags.
> >
> > The only quirk is that kcompactd doesn't initialize cc->alloc_flags
> > explicitly. There is a direct check in kcompactd_do_work() that
> > passes ALLOC_WMARK_MIN, but there is another check downstack in
> > compact_zone() that ends up passing the unset alloc_flags. Since
> > they default to 0, and that coincides with ALLOC_WMARK_MIN, it is
> > correct. But it's subtle. Set cc->alloc_flags explicitly.
> >
> > - should_continue_reclaim() is direct reclaim, use the min watermark.
> >
> > - Finally, consolidate the two checks in compaction_ready() to a
> > single compaction_suitable() call passing the high watermark.
> >
> > There is a tiny change in behavior: before, compaction_suitable()
> > would check order-0 against min or low, depending on costly
> > order. Then there'd be another high watermark check.
> >
> > Now, the high watermark is passed to compaction_suitable(), and the
> > costly order-boost (low - min) is added on top. This means
> > compaction_ready() sets a marginally higher target for free pages.
> >
> > In a kernelbuild + THP pressure test, though, this didn't show any
> > measurable negative effects on memory pressure or reclaim rates. As
> > the comment above the check says, reclaim is usually stopped short
> > on should_continue_reclaim(), and this just defines the worst-case
> > reclaim cutoff in case compaction is not making any headway.
> >
> > Signed-off-by: Johannes Weiner <hannes@...xchg.org>
>
> <snip>
>
> > @@ -2513,13 +2516,13 @@ compaction_suit_allocation_order(struct zone *zone, unsigned int order,
> >  	 */
> >  	if (order > PAGE_ALLOC_COSTLY_ORDER && async &&
> >  	    !(alloc_flags & ALLOC_CMA)) {
> > -		watermark = low_wmark_pages(zone) + compact_gap(order);
> > -		if (!__zone_watermark_ok(zone, 0, watermark, highest_zoneidx,
> > -					 0, zone_page_state(zone, NR_FREE_PAGES)))
> > +		if (!__zone_watermark_ok(zone, 0, watermark + compact_gap(order),
> > +					 highest_zoneidx, 0,
> > +					 zone_page_state(zone, NR_FREE_PAGES)))
> >  			return COMPACT_SKIPPED;
>
> The watermark here is no longer recalculated as low_wmark_pages() but the
> value from above based on alloc_flags is reused.
> It's probably ok, maybe it's even more correct, just wasn't mentioned in the
> changelog as another tiny change of behavior so I wanted to point it out.

Ah yes, it would have made sense to point out.

I was wondering about this check. It was introduced to bail on
compaction if there are not enough free non-CMA pages. But if there
are, we still fall through and check the superset of regular + CMA
pages against the watermarks as well. We know this will succeed, so
this seems moot.
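
To spell out what I mean - this is me paraphrasing the post-patch flow
in compaction_suit_allocation_order() from memory, so the details may
be slightly off against the actual tree:

	/*
	 * watermark was derived from the caller's alloc_flags, i.e.
	 * wmark_pages(zone, alloc_flags & ALLOC_WMARK_MASK)
	 */
	if (order > PAGE_ALLOC_COSTLY_ORDER && async &&
	    !(alloc_flags & ALLOC_CMA)) {
		/* alloc_flags=0: __zone_watermark_ok() discounts free CMA pages */
		if (!__zone_watermark_ok(zone, 0, watermark + compact_gap(order),
					 highest_zoneidx, 0,
					 zone_page_state(zone, NR_FREE_PAGES)))
			return COMPACT_SKIPPED;
	}

	/*
	 * Same watermark again (plus the costly-order boost), but
	 * compaction_suitable() hardcodes ALLOC_CMA, i.e. it checks the
	 * superset that includes free CMA pages.
	 */
	if (!compaction_suitable(zone, order, watermark, highest_zoneidx))
		return COMPACT_SKIPPED;
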
It's also a little odd that compaction_suitable() hardcodes ALLOC_CMA
with the explanation that "CMA are migration targets", but then this
check says "actually, it doesn't help us if blocks are formed in CMA".
Does it make more sense to plumb alloc_flags to compaction_suitable()?
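
Something like this, roughly mirroring what __compaction_suitable()
looks like with this series but with the caller's alloc_flags instead
of a hardcoded ALLOC_CMA - untested, just to sketch the direction:

static bool __compaction_suitable(struct zone *zone, int order,
				  unsigned long watermark,
				  int highest_zoneidx,
				  unsigned int alloc_flags,
				  unsigned long free_pages)
{
	/* order-0 check: are there enough free pages to serve as migration targets? */
	if (order > PAGE_ALLOC_COSTLY_ORDER)
		watermark += low_wmark_pages(zone) - min_wmark_pages(zone);
	watermark += compact_gap(order);

	/* count free CMA pages only if the caller can actually allocate from CMA */
	return __zone_watermark_ok(zone, 0, watermark, highest_zoneidx,
				   alloc_flags, free_pages);
}

Callers that can't use CMA then wouldn't count free CMA pages as
migration targets to begin with, which might make the special-cased
bail-out above redundant. Just thinking out loud.
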
There is more head-scratching, though. The check is meant to test
whether compaction has a chance of forming non-CMA blocks. But free
pages are targets. You could have plenty of non-contiguous, free
non-CMA memory - compaction will then form blocks in CMA by moving CMA
pages into those non-CMA targets.

The longer I look at this, the more I feel like this just hard-coded
the very specific scenario the patch author had a problem with: CMA is
massive. The page allocator fills up regular memory first. Once
regular memory is full, non-CMA requests stall on compaction making
CMA blocks. So just bail on compaction then.

It's a valid problem, but I don't see how this code makes any general
sense outside of this exact sequence of events. Especially once
compaction has moved stuff around between regular and CMA memory, the
issue will be back, and the check does nothing to prevent it.