linux-kernel - Re: [PATCH 6/7] mm/page_alloc: Give GFP_ATOMIC and non-blocking allocations access to reserves

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20230112092452.rtvo6tkp4rpmxm7v@techsingularity.net>
Date:   Thu, 12 Jan 2023 09:24:52 +0000
From:   Mel Gorman <mgorman@...hsingularity.net>
To:     Michal Hocko <mhocko@...e.com>
Cc:     Linux-MM <linux-mm@...ck.org>,
        Andrew Morton <akpm@...ux-foundation.org>,
        NeilBrown <neilb@...e.de>,
        Thierry Reding <thierry.reding@...il.com>,
        Matthew Wilcox <willy@...radead.org>,
        Vlastimil Babka <vbabka@...e.cz>,
        LKML <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH 6/7] mm/page_alloc: Give GFP_ATOMIC and non-blocking
 allocations access to reserves

On Thu, Jan 12, 2023 at 09:11:06AM +0100, Michal Hocko wrote:
> On Wed 11-01-23 17:05:52, Mel Gorman wrote:
> > On Wed, Jan 11, 2023 at 04:58:02PM +0100, Michal Hocko wrote:
> > > On Mon 09-01-23 15:16:30, Mel Gorman wrote:
> > > > Explicit GFP_ATOMIC allocations get flagged ALLOC_HARDER which is a bit
> > > > vague. In preparation for removing __GFP_ATOMIC, give GFP_ATOMIC and
> > > > other non-blocking allocation requests equal access to reserve.  Rename
> > > > ALLOC_HARDER to ALLOC_NON_BLOCK to make it more clear what the flag
> > > > means.
> > > 
> > > GFP_NOWAIT can be also used for opportunistic allocations which can and
> > > should fail quickly if the memory is tight and more elaborate path
> > > should be taken (e.g. try higher order allocation first but fall back to
> > > smaller request if the memory is fragmented). Do we really want to give
> > > those access to memory reserves as well?
> > 
> > Good question. Without __GFP_ATOMIC, GFP_NOWAIT only differs from GFP_ATOMIC
> > by __GFP_HIGH but that is not enough to distinguish between a caller that
> > cannot sleep versus one that is speculatively attempting an allocation but
> > has other options. That changelog is misleading, it's not equal access
> > as GFP_NOWAIT ends up with 25% of the reserves which is less than what
> > GFP_ATOMIC gets.
> > 
> > Because it becomes impossible to distinguish between non-blocking and
> > atomic without __GFP_ATOMIC, there is some justification for allowing
> > access to reserves for GFP_NOWAIT. bio for example attempts an allocation
> > (clears __GFP_DIRECT_RECLAIM) before falling back to mempool but delays
> > in IO can also lead to further allocation pressure. mmu gather failing
> > GFP_WAIT slows the rate memory can be freed. NFS failing GFP_NOWAIT will
> > have to retry IOs multiple times. The examples were picked at random but
> > the point is that there are cases where failing GFP_NOWAIT can degrade
> > the system, particularly delay the cleaning of pages before reclaim.
> 
> Fair points.
> 
> > A lot of the truly speculative users appear to use GFP_NOWAIT | __GFP_NOWARN
> > so one compromise would be to avoid using reserves if __GFP_NOWARN is
> > also specified.
> > 
> > Something like this as a separate patch?
> 
> I cannot say I would be happy about adding more side effects to
> __GFP_NOWARN. You are right that it should be used for those optimistic
> allocation requests but historically all many of these subtle side effects
> have kicked back at some point.

True.

> Wouldn't it make sense to explicitly
> mark those places which really benefit from reserves instead?

That would be __GFP_HIGH and would require context from every caller on
whether they need reserves or not and to determine what the consequences
are if there is a stall. Is there immediate local fallout or wider fallout
such as a variable delay before pages can be cleaned?

> This is
> more work but it should pay off long term. Your examples above would use
> GFP_ATOMIC instead of GFP_NOWAIT.
> 

Yes, although it would confuse the meaning of GFP_ATOMIC as a result.
It's described as "%GFP_ATOMIC users can not sleep and need the allocation to
succeed" and something like the bio callsite does not *need* the allocation
to succeed. It can fallback to the mempool and performance simply degrades
temporarily. No doubt there are a few abuses of GFP_ATOMIC just to get
non-blocking behaviour already.

> The semantic would be easier to explain as well. GFP_ATOMIC - non
> sleeping allocations which are important so they have access to memory
> reserves. GFP_NOWAIT - non sleeping allocations.
> 

People's definition of "important" will vary wildly. The following would
avoid reserve access for GFP_NOWAIT for now. It would need to be folded
into this patch and a new changelog

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 7244ab522028..aa20165224cf 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -3989,18 +3989,19 @@ bool __zone_watermark_ok(struct zone *z, unsigned int order, unsigned long mark,
 		 * __GFP_HIGH allows access to 50% of the min reserve as well
 		 * as OOM.
 		 */
-		if (alloc_flags & ALLOC_MIN_RESERVE)
+		if (alloc_flags & ALLOC_MIN_RESERVE) {
 			min -= min / 2;
 
-		/*
-		 * Non-blocking allocations can access some of the reserve
-		 * with more access if also __GFP_HIGH. The reasoning is that
-		 * a non-blocking caller may incur a more severe penalty
-		 * if it cannot get memory quickly, particularly if it's
-		 * also __GFP_HIGH.
-		 */
-		if (alloc_flags & ALLOC_NON_BLOCK)
-			min -= min / 4;
+			/*
+			 * Non-blocking allocations (e.g. GFP_ATOMIC) can
+			 * access more reserves than just __GFP_HIGH. Other
+			 * non-blocking allocations requests such as GFP_NOWAIT
+			 * or (GFP_KERNEL & ~__GFP_DIRECT_RECLAIM) do not get
+			 * access to the min reserve.
+			 */
+			if (alloc_flags & ALLOC_NON_BLOCK)
+				min -= min / 4;
+		}
 
 		/*
 		 * OOM victims can try even harder than the normal reserve





-- 
Mel Gorman
SUSE Labs