lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20250323005823.GB1894930@cmpxchg.org>
Date: Sat, 22 Mar 2025 20:58:23 -0400
From: Johannes Weiner <hannes@...xchg.org>
To: Brendan Jackman <jackmanb@...gle.com>
Cc: Andrew Morton <akpm@...ux-foundation.org>,
	Vlastimil Babka <vbabka@...e.cz>,
	Mel Gorman <mgorman@...hsingularity.net>, Zi Yan <ziy@...dia.com>,
	linux-mm@...ck.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH 3/5] mm: page_alloc: defrag_mode

On Sat, Mar 22, 2025 at 04:05:52PM +0100, Brendan Jackman wrote:
> On Thu Mar 13, 2025 at 10:05 PM CET, Johannes Weiner wrote:
> > +	/* Reclaim/compaction failed to prevent the fallback */
> > +	if (defrag_mode) {
> > +		alloc_flags &= ALLOC_NOFRAGMENT;
> > +		goto retry;
> > +	}
> 
> I can't see where ALLOC_NOFRAGMENT gets cleared, is it supposed to be
> here (i.e. should this be ~ALLOC_NOFRAGMENT)?

Yes, it should be. Thanks for catching that.

Note that this happens right before OOM, and __alloc_pages_may_oom()
does another allocation attempt without the flag set. In fact, I was
briefly debating whether I need the explicit retry here at all, but
then decided it's clearer and more future proof than quietly relying
on that OOM attempt, which is really only there to check for racing
frees. But this is most likely what hid this during testing.

What might be more of an issue is retrying without ALLOC_CPUSET and
then potentially violating cgroup placement rules too readily -
e.g. OOM only does that for __GFP_NOFAIL.

---

>From e81c2086ee8e4b9f2750b821e104d3b5174b81f2 Mon Sep 17 00:00:00 2001
From: Johannes Weiner <hannes@...xchg.org>
Date: Sat, 22 Mar 2025 19:21:45 -0400
Subject: [PATCH] mm: page_alloc: fix defrag_mode's last allocation before OOM

Brendan points out that defrag_mode doesn't properly clear
ALLOC_NOFRAGMENT on its last-ditch attempt to allocate.

This is not necessarily a practical issue because it's followed by
__alloc_pages_may_oom(), which does its own attempt at the freelist
without ALLOC_NOFRAGMENT set. However, this is restricted to the high
watermark instead of the usual min mark (since it's merely to check
for racing frees). While this usually works - we just ran a full set
of reclaim/compaction, after all, and likely failed due to a lack of
pageblocks rather than watermarks - it's not as reliable as intended.

A more practical implication is retrying with the other flags cleared,
which means ALLOC_CPUSET is cleared, which can violate placement rules
defined by cgroup policy - OOM usually only does this for GFP_NOFAIL.

Reported-by: Brendan Jackman <jackmanb@...gle.com>
Signed-off-by: Johannes Weiner <hannes@...xchg.org>
---
 mm/page_alloc.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 0c01998cb3a0..b9ee0c00eea5 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -4544,7 +4544,7 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
 
 	/* Reclaim/compaction failed to prevent the fallback */
 	if (defrag_mode) {
-		alloc_flags &= ALLOC_NOFRAGMENT;
+		alloc_flags &= ~ALLOC_NOFRAGMENT;
 		goto retry;
 	}
 
-- 
2.49.0


Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ