linux-kernel - Re: [PATCH] mm, vmscan: prevent infinite loop for costly GFP_NOIO

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <CAJZwx_niaTD+n7mvKbzBQeEEki591Rg=W1cJpJew-iTo8P8X8g@mail.gmail.com>
Date: Wed, 28 Feb 2024 12:33:38 -0700
From: Karthikeyan Ramasubramanian <kramasub@...omium.org>
To: Vlastimil Babka <vbabka@...e.cz>
Cc: Andrew Morton <akpm@...ux-foundation.org>, svenva@...omium.org, bgeffon@...gle.com, 
	cujomalainey@...omium.org, linux-kernel@...r.kernel.org, linux-mm@...ck.org, 
	linux-sound@...r.kernel.org, perex@...ex.cz, stable@...r.kernel.org, 
	tiwai@...e.com, tiwai@...e.de, Michal Hocko <mhocko@...nel.org>, 
	Mel Gorman <mgorman@...hsingularity.net>
Subject: Re: [PATCH] mm, vmscan: prevent infinite loop for costly GFP_NOIO |
 __GFP_RETRY_MAYFAIL allocations

On Wed, Feb 21, 2024 at 4:44 AM Vlastimil Babka <vbabka@...e.cz> wrote:
>
> Sven reports an infinite loop in __alloc_pages_slowpath() for costly
> order __GFP_RETRY_MAYFAIL allocations that are also GFP_NOIO. Such
> combination can happen in a suspend/resume context where a GFP_KERNEL
> allocation can have __GFP_IO masked out via gfp_allowed_mask.
>
> Quoting Sven:
>
> 1. try to do a "costly" allocation (order > PAGE_ALLOC_COSTLY_ORDER)
>    with __GFP_RETRY_MAYFAIL set.
>
> 2. page alloc's __alloc_pages_slowpath tries to get a page from the
>    freelist. This fails because there is nothing free of that costly
>    order.
>
> 3. page alloc tries to reclaim by calling __alloc_pages_direct_reclaim,
>    which bails out because a zone is ready to be compacted; it pretends
>    to have made a single page of progress.
>
> 4. page alloc tries to compact, but this always bails out early because
>    __GFP_IO is not set (it's not passed by the snd allocator, and even
>    if it were, we are suspending so the __GFP_IO flag would be cleared
>    anyway).
>
> 5. page alloc believes reclaim progress was made (because of the
>    pretense in item 3) and so it checks whether it should retry
>    compaction. The compaction retry logic thinks it should try again,
>    because:
>     a) reclaim is needed because of the early bail-out in item 4
>     b) a zonelist is suitable for compaction
>
> 6. goto 2. indefinite stall.
>
> (end quote)
>
> The immediate root cause is confusing the COMPACT_SKIPPED returned from
> __alloc_pages_direct_compact() (step 4) due to lack of __GFP_IO to be
> indicating a lack of order-0 pages, and in step 5 evaluating that in
> should_compact_retry() as a reason to retry, before incrementing and
> limiting the number of retries. There are however other places that
> wrongly assume that compaction can happen while we lack __GFP_IO.
>
> To fix this, introduce gfp_compaction_allowed() to abstract the __GFP_IO
> evaluation and switch the open-coded test in try_to_compact_pages() to
> use it.
>
> Also use the new helper in:
> - compaction_ready(), which will make reclaim not bail out in step 3, so
>   there's at least one attempt to actually reclaim, even if chances are
>   small for a costly order
> - in_reclaim_compaction() which will make should_continue_reclaim()
>   return false and we don't over-reclaim unnecessarily
> - in __alloc_pages_slowpath() to set a local variable can_compact,
>   which is then used to avoid retrying reclaim/compaction for costly
>   allocations (step 5) if we can't compact and also to skip the early
>   compaction attempt that we do in some cases
>
> Reported-by: Sven van Ashbrook <svenva@...omium.org>
> Closes: https://lore.kernel.org/all/CAG-rBihs_xMKb3wrMO1%2B-%2Bp4fowP9oy1pa_OTkfxBzPUVOZF%2Bg@mail.gmail.com/
> Fixes: 3250845d0526 ("Revert "mm, oom: prevent premature OOM killer invocation for high order request"")
> Cc: <stable@...r.kernel.org>
> Signed-off-by: Vlastimil Babka <vbabka@...e.cz>
Tested-by: Karthikeyan Ramasubramanian <kramasub@...omium.org>
> ---
>  include/linux/gfp.h |  9 +++++++++
>  mm/compaction.c     |  7 +------
>  mm/page_alloc.c     | 10 ++++++----
>  mm/vmscan.c         |  5 ++++-
>  4 files changed, 20 insertions(+), 11 deletions(-)
>
> diff --git a/include/linux/gfp.h b/include/linux/gfp.h
> index de292a007138..e2a916cf29c4 100644
> --- a/include/linux/gfp.h
> +++ b/include/linux/gfp.h
> @@ -353,6 +353,15 @@ static inline bool gfp_has_io_fs(gfp_t gfp)
>         return (gfp & (__GFP_IO | __GFP_FS)) == (__GFP_IO | __GFP_FS);
>  }
>
> +/*
> + * Check if the gfp flags allow compaction - GFP_NOIO is a really
> + * tricky context because the migration might require IO.
> + */
> +static inline bool gfp_compaction_allowed(gfp_t gfp_mask)
> +{
> +       return IS_ENABLED(CONFIG_COMPACTION) && (gfp_mask & __GFP_IO);
> +}
> +
>  extern gfp_t vma_thp_gfp_mask(struct vm_area_struct *vma);
>
>  #ifdef CONFIG_CONTIG_ALLOC
> diff --git a/mm/compaction.c b/mm/compaction.c
> index 4add68d40e8d..b961db601df4 100644
> --- a/mm/compaction.c
> +++ b/mm/compaction.c
> @@ -2723,16 +2723,11 @@ enum compact_result try_to_compact_pages(gfp_t gfp_mask, unsigned int order,
>                 unsigned int alloc_flags, const struct alloc_context *ac,
>                 enum compact_priority prio, struct page **capture)
>  {
> -       int may_perform_io = (__force int)(gfp_mask & __GFP_IO);
>         struct zoneref *z;
>         struct zone *zone;
>         enum compact_result rc = COMPACT_SKIPPED;
>
> -       /*
> -        * Check if the GFP flags allow compaction - GFP_NOIO is really
> -        * tricky context because the migration might require IO
> -        */
> -       if (!may_perform_io)
> +       if (!gfp_compaction_allowed(gfp_mask))
>                 return COMPACT_SKIPPED;
>
>         trace_mm_compaction_try_to_compact_pages(order, gfp_mask, prio);
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 150d4f23b010..a663202045dc 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -4041,6 +4041,7 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
>                                                 struct alloc_context *ac)
>  {
>         bool can_direct_reclaim = gfp_mask & __GFP_DIRECT_RECLAIM;
> +       bool can_compact = gfp_compaction_allowed(gfp_mask);
>         const bool costly_order = order > PAGE_ALLOC_COSTLY_ORDER;
>         struct page *page = NULL;
>         unsigned int alloc_flags;
> @@ -4111,7 +4112,7 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
>          * Don't try this for allocations that are allowed to ignore
>          * watermarks, as the ALLOC_NO_WATERMARKS attempt didn't yet happen.
>          */
> -       if (can_direct_reclaim &&
> +       if (can_direct_reclaim && can_compact &&
>                         (costly_order ||
>                            (order > 0 && ac->migratetype != MIGRATE_MOVABLE))
>                         && !gfp_pfmemalloc_allowed(gfp_mask)) {
> @@ -4209,9 +4210,10 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
>
>         /*
>          * Do not retry costly high order allocations unless they are
> -        * __GFP_RETRY_MAYFAIL
> +        * __GFP_RETRY_MAYFAIL and we can compact
>          */
> -       if (costly_order && !(gfp_mask & __GFP_RETRY_MAYFAIL))
> +       if (costly_order && (!can_compact ||
> +                            !(gfp_mask & __GFP_RETRY_MAYFAIL)))
>                 goto nopage;
>
>         if (should_reclaim_retry(gfp_mask, order, ac, alloc_flags,
> @@ -4224,7 +4226,7 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
>          * implementation of the compaction depends on the sufficient amount
>          * of free memory (see __compaction_suitable)
>          */
> -       if (did_some_progress > 0 &&
> +       if (did_some_progress > 0 && can_compact &&
>                         should_compact_retry(ac, order, alloc_flags,
>                                 compact_result, &compact_priority,
>                                 &compaction_retries))
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index 4f9c854ce6cc..4255619a1a31 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -5753,7 +5753,7 @@ static void shrink_lruvec(struct lruvec *lruvec, struct scan_control *sc)
>  /* Use reclaim/compaction for costly allocs or under memory pressure */
>  static bool in_reclaim_compaction(struct scan_control *sc)
>  {
> -       if (IS_ENABLED(CONFIG_COMPACTION) && sc->order &&
> +       if (gfp_compaction_allowed(sc->gfp_mask) && sc->order &&
>                         (sc->order > PAGE_ALLOC_COSTLY_ORDER ||
>                          sc->priority < DEF_PRIORITY - 2))
>                 return true;
> @@ -5998,6 +5998,9 @@ static inline bool compaction_ready(struct zone *zone, struct scan_control *sc)
>  {
>         unsigned long watermark;
>
> +       if (!gfp_compaction_allowed(sc->gfp_mask))
> +               return false;
> +
>         /* Allocation can already succeed, nothing to do */
>         if (zone_watermark_ok(zone, sc->order, min_wmark_pages(zone),
>                               sc->reclaim_idx, 0))
> --
> 2.43.1
>