lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Thu, 27 Apr 2023 13:41:38 +0800
From:   "Huang, Ying" <ying.huang@...el.com>
To:     Johannes Weiner <hannes@...xchg.org>
Cc:     linux-mm@...ck.org, Kaiyang Zhao <kaiyang2@...cmu.edu>,
        Mel Gorman <mgorman@...hsingularity.net>,
        Vlastimil Babka <vbabka@...e.cz>,
        David Rientjes <rientjes@...gle.com>,
        linux-kernel@...r.kernel.org, kernel-team@...com
Subject: Re: [RFC PATCH 20/26] mm: vmscan: use compaction_suitable() check
 in kswapd

Johannes Weiner <hannes@...xchg.org> writes:

> On Wed, Apr 26, 2023 at 09:30:23AM +0800, Huang, Ying wrote:
>> Johannes Weiner <hannes@...xchg.org> writes:
>> 
>> > On Tue, Apr 25, 2023 at 11:12:28AM +0800, Huang, Ying wrote:
>> >> Johannes Weiner <hannes@...xchg.org> writes:
>> >> 
>> >> > Kswapd currently bails on higher-order allocations with an open-coded
>> >> > check for whether it's reclaimed the compaction gap.
>> >> >
>> >> > compaction_suitable() is the customary interface to coordinate reclaim
>> >> > with compaction.
>> >> >
>> >> > Signed-off-by: Johannes Weiner <hannes@...xchg.org>
>> >> > ---
>> >> >  mm/vmscan.c | 67 ++++++++++++++++++-----------------------------------
>> >> >  1 file changed, 23 insertions(+), 44 deletions(-)
>> >> >
>> >> > diff --git a/mm/vmscan.c b/mm/vmscan.c
>> >> > index ee8c8ca2e7b5..723705b9e4d9 100644
>> >> > --- a/mm/vmscan.c
>> >> > +++ b/mm/vmscan.c
>> >> > @@ -6872,12 +6872,18 @@ static bool pgdat_balanced(pg_data_t *pgdat, int order, int highest_zoneidx)
>> >> >  		if (!managed_zone(zone))
>> >> >  			continue;
>> >> >  
>> >> > +		/* Allocation can succeed in any zone, done */
>> >> >  		if (sysctl_numa_balancing_mode & NUMA_BALANCING_MEMORY_TIERING)
>> >> >  			mark = wmark_pages(zone, WMARK_PROMO);
>> >> >  		else
>> >> >  			mark = high_wmark_pages(zone);
>> >> >  		if (zone_watermark_ok_safe(zone, order, mark, highest_zoneidx))
>> >> >  			return true;
>> >> > +
>> >> > +		/* Allocation can't succeed, but enough order-0 to compact */
>> >> > +		if (compaction_suitable(zone, order,
>> >> > +					highest_zoneidx) == COMPACT_CONTINUE)
>> >> > +			return true;
>> >> 
>> >> Should we check the following first?
>> >> 
>> >>         order > 0 && zone_watermark_ok_safe(zone, 0, mark, highest_zoneidx)
>> >
>> > That's what compaction_suitable() does. It checks whether there are
>> > enough migration targets for compaction (COMPACT_CONTINUE) or whether
>> > reclaim needs to do some more work (COMPACT_SKIPPED).
>> 
>> Yes.  And I found that the watermark used in compaction_suitable() is
>> low_wmark_pages() or min_wmark_pages(), which doesn't match the
>> watermark here.  Or did I miss something?
>
> Ahh, you're right, kswapd will bail prematurely. Compaction cannot
> reliably meet the high watermark with a low watermark scratch space.
>
> I'll add the order check before the suitable test, for clarity, and so
> that order-0 requests don't check the same thing twice.
>
> For the watermark, I'd make it an arg to compaction_suitable() and use
> whatever the reclaimer targets (high for kswapd, min for direct).
>
> However, there is a minor snag. compaction_suitable() currently has
> its own smarts regarding the watermark:
>
> 	/*
> 	 * Watermarks for order-0 must be met for compaction to be able to
> 	 * isolate free pages for migration targets. This means that the
> 	 * watermark and alloc_flags have to match, or be more pessimistic than
> 	 * the check in __isolate_free_page(). We don't use the direct
> 	 * compactor's alloc_flags, as they are not relevant for freepage
> 	 * isolation. We however do use the direct compactor's highest_zoneidx
> 	 * to skip over zones where lowmem reserves would prevent allocation
> 	 * even if compaction succeeds.
> 	 * For costly orders, we require low watermark instead of min for
> 	 * compaction to proceed to increase its chances.
> 	 * ALLOC_CMA is used, as pages in CMA pageblocks are considered
> 	 * suitable migration targets
> 	 */
> 	watermark = (order > PAGE_ALLOC_COSTLY_ORDER) ?
> 				low_wmark_pages(zone) : min_wmark_pages(zone);
>
> Historically it has always checked low instead of min. Then Vlastimil
> changed it to min for non-costly orders here:
>
> commit 8348faf91f56371d4bada6fc5915e19580a15ffe
> Author: Vlastimil Babka <vbabka@...e.cz>
> Date:   Fri Oct 7 16:58:00 2016 -0700
>
>     mm, compaction: require only min watermarks for non-costly orders
>     
>     The __compaction_suitable() function checks the low watermark plus a
>     compact_gap() gap to decide if there's enough free memory to perform
>     compaction.  Then __isolate_free_page uses low watermark check to decide
>     if particular free page can be isolated.  In the latter case, using low
>     watermark is needlessly pessimistic, as the free page isolations are
>     only temporary.  For __compaction_suitable() the higher watermark makes
>     sense for high-order allocations where more freepages increase the
>     chance of success, and we can typically fail with some order-0 fallback
>     when the system is struggling to reach that watermark.  But for
>     low-order allocation, forming the page should not be that hard.  So
>     using low watermark here might just prevent compaction from even trying,
>     and eventually lead to OOM killer even if we are above min watermarks.
>     
>     So after this patch, we use min watermark for non-costly orders in
>     __compaction_suitable(), and for all orders in __isolate_free_page().
>
> Lowering to min wasn't an issue for non-costly, but AFAICS there was
> no explicit testing for whether min would work for costly orders too.
>
> I'd propose trying it with min even for costly and see what happens.
>
> If it does regress, a better place to boost scratch space for costly
> orders might be compact_gap(), so I'd move it there.
>
> Does that sound reasonable?

Sounds good to me, Thanks!

Best Regards,
Huang, Ying

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ