lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20230426152206.GA30064@cmpxchg.org>
Date:   Wed, 26 Apr 2023 11:22:06 -0400
From:   Johannes Weiner <hannes@...xchg.org>
To:     "Huang, Ying" <ying.huang@...el.com>
Cc:     linux-mm@...ck.org, Kaiyang Zhao <kaiyang2@...cmu.edu>,
        Mel Gorman <mgorman@...hsingularity.net>,
        Vlastimil Babka <vbabka@...e.cz>,
        David Rientjes <rientjes@...gle.com>,
        linux-kernel@...r.kernel.org, kernel-team@...com
Subject: Re: [RFC PATCH 20/26] mm: vmscan: use compaction_suitable() check in
 kswapd

On Wed, Apr 26, 2023 at 09:30:23AM +0800, Huang, Ying wrote:
> Johannes Weiner <hannes@...xchg.org> writes:
> 
> > On Tue, Apr 25, 2023 at 11:12:28AM +0800, Huang, Ying wrote:
> >> Johannes Weiner <hannes@...xchg.org> writes:
> >> 
> >> > Kswapd currently bails on higher-order allocations with an open-coded
> >> > check for whether it's reclaimed the compaction gap.
> >> >
> >> > compaction_suitable() is the customary interface to coordinate reclaim
> >> > with compaction.
> >> >
> >> > Signed-off-by: Johannes Weiner <hannes@...xchg.org>
> >> > ---
> >> >  mm/vmscan.c | 67 ++++++++++++++++++-----------------------------------
> >> >  1 file changed, 23 insertions(+), 44 deletions(-)
> >> >
> >> > diff --git a/mm/vmscan.c b/mm/vmscan.c
> >> > index ee8c8ca2e7b5..723705b9e4d9 100644
> >> > --- a/mm/vmscan.c
> >> > +++ b/mm/vmscan.c
> >> > @@ -6872,12 +6872,18 @@ static bool pgdat_balanced(pg_data_t *pgdat, int order, int highest_zoneidx)
> >> >  		if (!managed_zone(zone))
> >> >  			continue;
> >> >  
> >> > +		/* Allocation can succeed in any zone, done */
> >> >  		if (sysctl_numa_balancing_mode & NUMA_BALANCING_MEMORY_TIERING)
> >> >  			mark = wmark_pages(zone, WMARK_PROMO);
> >> >  		else
> >> >  			mark = high_wmark_pages(zone);
> >> >  		if (zone_watermark_ok_safe(zone, order, mark, highest_zoneidx))
> >> >  			return true;
> >> > +
> >> > +		/* Allocation can't succeed, but enough order-0 to compact */
> >> > +		if (compaction_suitable(zone, order,
> >> > +					highest_zoneidx) == COMPACT_CONTINUE)
> >> > +			return true;
> >> 
> >> Should we check the following first?
> >> 
> >>         order > 0 && zone_watermark_ok_safe(zone, 0, mark, highest_zoneidx)
> >
> > That's what compaction_suitable() does. It checks whether there are
> > enough migration targets for compaction (COMPACT_CONTINUE) or whether
> > reclaim needs to do some more work (COMPACT_SKIPPED).
> 
> Yes.  And I found that the watermark used in compaction_suitable() is
> low_wmark_pages() or min_wmark_pages(), which doesn't match the
> watermark here.  Or did I miss something?

Ahh, you're right, kswapd will bail prematurely. Compaction cannot
reliably meet the high watermark with a low watermark scratch space.

I'll add the order check before the suitable test, for clarity, and so
that order-0 requests don't check the same thing twice.

For the watermark, I'd make it an arg to compaction_suitable() and use
whatever the reclaimer targets (high for kswapd, min for direct).

However, there is a minor snag. compaction_suitable() currently has
its own smarts regarding the watermark:

	/*
	 * Watermarks for order-0 must be met for compaction to be able to
	 * isolate free pages for migration targets. This means that the
	 * watermark and alloc_flags have to match, or be more pessimistic than
	 * the check in __isolate_free_page(). We don't use the direct
	 * compactor's alloc_flags, as they are not relevant for freepage
	 * isolation. We however do use the direct compactor's highest_zoneidx
	 * to skip over zones where lowmem reserves would prevent allocation
	 * even if compaction succeeds.
	 * For costly orders, we require low watermark instead of min for
	 * compaction to proceed to increase its chances.
	 * ALLOC_CMA is used, as pages in CMA pageblocks are considered
	 * suitable migration targets
	 */
	watermark = (order > PAGE_ALLOC_COSTLY_ORDER) ?
				low_wmark_pages(zone) : min_wmark_pages(zone);

Historically it has always checked low instead of min. Then Vlastimil
changed it to min for non-costly orders here:

commit 8348faf91f56371d4bada6fc5915e19580a15ffe
Author: Vlastimil Babka <vbabka@...e.cz>
Date:   Fri Oct 7 16:58:00 2016 -0700

    mm, compaction: require only min watermarks for non-costly orders
    
    The __compaction_suitable() function checks the low watermark plus a
    compact_gap() gap to decide if there's enough free memory to perform
    compaction.  Then __isolate_free_page uses low watermark check to decide
    if particular free page can be isolated.  In the latter case, using low
    watermark is needlessly pessimistic, as the free page isolations are
    only temporary.  For __compaction_suitable() the higher watermark makes
    sense for high-order allocations where more freepages increase the
    chance of success, and we can typically fail with some order-0 fallback
    when the system is struggling to reach that watermark.  But for
    low-order allocation, forming the page should not be that hard.  So
    using low watermark here might just prevent compaction from even trying,
    and eventually lead to OOM killer even if we are above min watermarks.
    
    So after this patch, we use min watermark for non-costly orders in
    __compaction_suitable(), and for all orders in __isolate_free_page().

Lowering to min wasn't an issue for non-costly, but AFAICS there was
no explicit testing for whether min would work for costly orders too.

I'd propose trying it with min even for costly and see what happens.

If it does regress, a better place to boost scratch space for costly
orders might be compact_gap(), so I'd move it there.

Does that sound reasonable?

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ