linux-kernel - Re: [RFC PATCH 02/26] mm: compaction: avoid GFP


Message-ID: <20230421122743.d7xfvzyhiunbphh3@techsingularity.net>
Date:   Fri, 21 Apr 2023 13:27:43 +0100
From:   Mel Gorman <mgorman@...hsingularity.net>
To:     Johannes Weiner <hannes@...xchg.org>
Cc:     linux-mm@...ck.org, Kaiyang Zhao <kaiyang2@...cmu.edu>,
        Vlastimil Babka <vbabka@...e.cz>,
        David Rientjes <rientjes@...gle.com>,
        linux-kernel@...r.kernel.org, kernel-team@...com
Subject: Re: [RFC PATCH 02/26] mm: compaction: avoid GFP_NOFS deadlocks

On Tue, Apr 18, 2023 at 03:12:49PM -0400, Johannes Weiner wrote:
> During stress testing, two deadlock scenarios were observed:
> 
> 1. One GFP_NOFS allocation was sleeping on too_many_isolated(), and
>    all CPUs were busy with compactors that appeared to be spinning on
>    buffer locks.
> 
>    Give GFP_NOFS compactors additional isolation headroom, the same
>    way we do during reclaim, to eliminate this deadlock scenario.
> 
> 2. In a more pernicious scenario, the GFP_NOFS allocation was
>    busy-spinning in compaction, but seemingly never making
>    progress. Upon closer inspection, memory was dominated by file
>    pages, which the fs compactor isn't allowed to touch. The remaining
>    anon pages didn't have the contiguity to satisfy the request.
> 
>    Allow GFP_NOFS allocations to bypass watermarks when compaction
>    failed at the highest priority.
> 
> While these deadlocks were encountered only in tests with the
> subsequent patches (which put a lot more demand on compaction), in
> theory these problems already exist in the code today. Fix them now.
> 
> Signed-off-by: Johannes Weiner <hannes@...xchg.org>

Definitely needs to be split out.


> ---
>  mm/compaction.c | 15 +++++++++++++--
>  mm/page_alloc.c | 10 +++++++++-
>  2 files changed, 22 insertions(+), 3 deletions(-)
> 
> diff --git a/mm/compaction.c b/mm/compaction.c
> index 8238e83385a7..84db84e8fd3a 100644
> --- a/mm/compaction.c
> +++ b/mm/compaction.c
> @@ -745,8 +745,9 @@ isolate_freepages_range(struct compact_control *cc,
>  }
>  
>  /* Similar to reclaim, but different enough that they don't share logic */
> -static bool too_many_isolated(pg_data_t *pgdat)
> +static bool too_many_isolated(struct compact_control *cc)
>  {
> +	pg_data_t *pgdat = cc->zone->zone_pgdat;
>  	bool too_many;
>  
>  	unsigned long active, inactive, isolated;
> @@ -758,6 +759,16 @@ static bool too_many_isolated(pg_data_t *pgdat)
>  	isolated = node_page_state(pgdat, NR_ISOLATED_FILE) +
>  			node_page_state(pgdat, NR_ISOLATED_ANON);
>  
> +	/*
> +	 * GFP_NOFS callers are allowed to isolate more pages, so they
> +	 * won't get blocked by normal direct-reclaimers, forming a
> +	 * circular deadlock. GFP_NOIO won't get here.
> +	 */
> +	if (cc->gfp_mask & __GFP_FS) {
> +		inactive >>= 3;
> +		active >>= 3;
> +	}
> +

This comment needs to explain why GFP_NOFS gets special treatment: it
should say that a GFP_NOFS context may not be able to migrate pages, and
why.
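
To make the headroom arithmetic in the hunk above concrete, here is a
minimal user-space sketch. It is not the kernel function: the struct, the
function name and the allows_fs parameter are hypothetical, and the final
comparison against half of the LRU pages is an assumption, since the quoted
hunk does not show that line.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the counters read via node_page_state(). */
struct lru_counts {
	unsigned long inactive;
	unsigned long active;
	unsigned long isolated;
};

/*
 * User-space model of the patched check. allows_fs mirrors
 * (cc->gfp_mask & __GFP_FS). Callers that may enter the filesystem
 * have their LRU counts shrunk by 8x, so they hit the limit sooner;
 * GFP_NOFS callers keep the full counts and therefore get more
 * isolation headroom.
 *
 * ASSUMPTION: the "isolated > (inactive + active) / 2" threshold is
 * not part of the quoted hunk.
 */
static bool too_many_isolated_model(struct lru_counts c, bool allows_fs)
{
	if (allows_fs) {
		c.inactive >>= 3;
		c.active >>= 3;
	}
	return c.isolated > (c.inactive + c.active) / 2;
}
```

With 8000 inactive, 8000 active and 2000 isolated pages, a caller with
__GFP_FS set is throttled under this model (limit 1000), while a GFP_NOFS
caller is not (limit 8000).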

As a follow-up, if GFP_NOFS cannot deal with the majority of the
migration contexts then it should bail out of compaction entirely. The
changelog doesn't say why but maybe SYNC_LIGHT is the issue?

-- 
Mel Gorman
SUSE Labs