Message-ID: <20241201041124.472908-1-snishika@redhat.com>
Date: Sun,  1 Dec 2024 13:11:23 +0900
From: Seiji Nishikawa <snishika@...hat.com>
To: akpm@...ux-foundation.org
Cc: linux-mm@...ck.org,
	linux-kernel@...r.kernel.org,
	mgorman@...hsingularity.net,
	snishika@...hat.com
Subject: Re: [PATCH] mm: vmscan: account for free pages to prevent infinite Loop in throttle_direct_reclaim()

On Sun, Dec 1, 2024 at 11:40 AM Andrew Morton <akpm@...ux-foundation.org> wrote:
>
> On Sun,  1 Dec 2024 01:12:34 +0900 Seiji Nishikawa <snishika@...hat.com> wrote:
>
> > The kernel hangs due to a task stuck in throttle_direct_reclaim(),
> > caused by a node being incorrectly deemed balanced despite pressure in
> > certain zones, such as ZONE_NORMAL. This issue arises from
> > zone_reclaimable_pages() returning 0 for zones without reclaimable file-
> > backed or anonymous pages, causing zones like ZONE_DMA32 with sufficient
> > free pages to be skipped.
> >
> > The lack of swap or reclaimable pages results in ZONE_DMA32 being
> > ignored during reclaim, masking pressure in other zones. Consequently,
> > pgdat->kswapd_failures remains 0 in balance_pgdat(), preventing fallback
> > mechanisms in allow_direct_reclaim() from being triggered, leading to an
> > infinite loop in throttle_direct_reclaim().
> >
> > This patch modifies zone_reclaimable_pages() to account for free pages
> > (NR_FREE_PAGES) when no other reclaimable pages exist. This ensures
> > zones with sufficient free pages are not skipped, enabling proper
> > balancing and reclaim behavior.
>
> We'll want to backport a fix for this into -stable kernels.  For that
> it's best to be able to identify a suitable Fixes: target, to tell
> others whether their kernel needs the fix.  Are you able to help
> identify that commit?

Based on my analysis, the issue is rooted in the original design of
zone_reclaimable_pages(). The later change a2a36488a61c ("mm/vmscan:
Consider anonymous pages without swap") does not fundamentally alter
that behavior; it only refines the handling of anonymous pages, and it
still does not account for zones with sufficient free pages but no
reclaimable file-backed or anonymous pages. The commit that introduced
this function is:

Fixes: 5a1c84b404a7 ("mm: remove reclaim and compaction retry approximations")

This commit seems to be the most appropriate target for the Fixes: tag,
as it introduced the logic that my patch modifies to address the 
observed kernel hang.
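
To make the failure mode concrete, here is a minimal userspace sketch of
the skip logic described above. It is not kernel code: struct zone_model
and the model_*() helpers are hypothetical names made up for this
illustration, the page counts used with it are invented, and the
kswapd_failures fallback is deliberately left out. It only models how a
zone whose reclaimable count is 0 drops out of the watermark check in
allow_direct_reclaim(), and how the proposed fallback to NR_FREE_PAGES
changes the outcome.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for struct zone (illustration only). */
struct zone_model {
	unsigned long managed;     /* managed pages; 0 means the zone is absent */
	unsigned long reclaimable; /* file-backed + anonymous pages that can be reclaimed */
	unsigned long free;        /* models NR_FREE_PAGES */
	unsigned long min_wmark;   /* models the zone's min watermark */
};

/* Models zone_reclaimable_pages(); `patched` enables the proposed fallback. */
unsigned long model_reclaimable_pages(const struct zone_model *z, bool patched)
{
	unsigned long nr = z->reclaimable;

	/* Proposed change: fall back to free pages when nothing is reclaimable. */
	if (patched && nr == 0)
		nr = z->free;
	return nr;
}

/*
 * Models the watermark check in allow_direct_reclaim().
 * Returns true when the caller would NOT be throttled.
 * Assumption: the kswapd_failures shortcut is omitted, since the report
 * says it stays at 0 in this scenario.
 */
bool model_allow_direct_reclaim(const struct zone_model *zones, size_t n,
				bool patched)
{
	unsigned long reserve = 0, free_pages = 0;

	for (size_t i = 0; i < n; i++) {
		if (!zones[i].managed)
			continue;
		/* Zones reporting 0 reclaimable pages are skipped entirely. */
		if (!model_reclaimable_pages(&zones[i], patched))
			continue;
		reserve += zones[i].min_wmark;
		free_pages += zones[i].free;
	}

	/* No reserves at all: do not throttle. */
	if (!reserve)
		return true;

	return free_pages > reserve / 2;
}
```

With a made-up node consisting of a DMA32-like zone
{ 500000, 0, 400000, 3000 } and a NORMAL-like zone
{ 3000000, 1000, 2000, 12000 }, the unpatched model skips the first zone
and compares 2000 free pages against half of a 12000-page reserve, so the
caller would be throttled. The patched model counts the first zone's
400000 free pages and lets the caller proceed.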

>
> Thanks.
>
> > --- a/mm/vmscan.c
> > +++ b/mm/vmscan.c
> > @@ -374,7 +374,14 @@ unsigned long zone_reclaimable_pages(struct zone *zone)
> >       if (can_reclaim_anon_pages(NULL, zone_to_nid(zone), NULL))
> >               nr += zone_page_state_snapshot(zone, NR_ZONE_INACTIVE_ANON) +
> >                       zone_page_state_snapshot(zone, NR_ZONE_ACTIVE_ANON);
> > -
> > +     /*
> > +      * If there are no reclaimable file-backed or anonymous pages,
> > +      * ensure zones with sufficient free pages are not skipped.
> > +      * This prevents zones like DMA32 from being ignored in reclaim
> > +      * scenarios where they can still help alleviate memory pressure.
> > +      */
> > +     if (nr == 0)
> > +         nr = zone_page_state_snapshot(zone, NR_FREE_PAGES);
> >       return nr;
> >  }
> >
> > --
> > 2.47.0
>

