linux-kernel - Re: [PATCH 06/10] mm: page_alloc: Distinguish between being unable to sleep, unwilling to unwilling and avoiding waking kswapd

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <55D49658.50802@suse.cz>
Date:	Wed, 19 Aug 2015 16:44:40 +0200
From:	Vlastimil Babka <vbabka@...e.cz>
To:	Mel Gorman <mgorman@...hsingularity.net>,
	Linux-MM <linux-mm@...ck.org>
Cc:	Johannes Weiner <hannes@...xchg.org>,
	Rik van Riel <riel@...hat.com>,
	David Rientjes <rientjes@...gle.com>,
	Joonsoo Kim <iamjoonsoo.kim@....com>,
	Michal Hocko <mhocko@...nel.org>,
	LKML <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH 06/10] mm: page_alloc: Distinguish between being unable to
 sleep, unwilling to unwilling and avoiding waking kswapd

On 08/12/2015 12:45 PM, Mel Gorman wrote:
> __GFP_WAIT has been used to identify atomic context in callers that hold
> spinlocks or are in interrupts. They are expected to be high priority and
> have access one of two watermarks lower than "min". __GFP_HIGH users get
> access to the first lower watermark and can be called the "high priority
> reserve". Atomic users and interrupts access yet another lower watermark
> that can be called the "atomic reserve".
> 
> Over time, callers had a requirement to not block when fallback options
> were available. Some have abused __GFP_WAIT leading to a situation where
> an optimisitic allocation with a fallback option can access atomic reserves.
> 
> This patch uses __GFP_ATOMIC to identify callers that are truely atomic,
> cannot sleep and have no alternative. High priority users continue to use
> __GFP_HIGH. __GFP_DIRECT_RECLAIM identifies callers that can sleep and are
> willing to enter direct reclaim. __GFP_KSWAPD_RECLAIM to identify callers
> that want to wake kswapd for background reclaim. __GFP_WAIT is redefined
> as a caller that is willing to enter direct reclaim and wake kswapd for
> background reclaim.
> 
> This patch then converts a number of sites
> 
> o __GFP_ATOMIC is used by callers that are high priority and have memory
>   pools for those requests. GFP_ATOMIC uses this flag. Callers with
>   interrupts disabled still automatically use the atomic reserves.
> 
> o Callers that have a limited mempool to guarantee forward progress use
>   __GFP_DIRECT_RECLAIM. bio allocations fall into this category where
>   kswapd will still be woken but atomic reserves are not used as there
>   is a one-entry mempool to guarantee progress.
> 
> o Callers that are checking if they are non-blocking should use the
>   helper gfpflags_allows_blocking() where possible. This is because
>   checking for __GFP_WAIT as was done historically now can trigger false
>   positives. Some exceptions like dm-crypt.c exist where the code intent
>   is clearer if __GFP_DIRECT_RECLAIM is used instead of the helper due to
>   flag manipulations.
> 
> The key hazard to watch out for is callers that removed __GFP_WAIT and
> was depending on access to atomic reserves for inconspicuous reasons.
> In some cases it may be appropriate for them to use __GFP_HIGH.
> 
> Signed-off-by: Mel Gorman <mgorman@...hsingularity.net>

I like the approach, it makes things much clearer. Note that this isn't full
review, just something that crossed my mind.

> @@ -117,9 +132,9 @@ struct vm_area_struct;
>  #define GFP_HIGHUSER	(GFP_USER | __GFP_HIGHMEM)
>  #define GFP_HIGHUSER_MOVABLE	(GFP_HIGHUSER | __GFP_MOVABLE)
>  #define GFP_IOFS	(__GFP_IO | __GFP_FS)
> -#define GFP_TRANSHUGE	(GFP_HIGHUSER_MOVABLE | __GFP_COMP | \
> -			 __GFP_NOMEMALLOC | __GFP_NORETRY | __GFP_NOWARN | \
> -			 __GFP_NO_KSWAPD)
> +#define GFP_TRANSHUGE	((GFP_HIGHUSER_MOVABLE | __GFP_COMP | \
> +			 __GFP_NOMEMALLOC | __GFP_NORETRY | __GFP_NOWARN) & \
> +			 ~__GFP_KSWAPD_RECLAIM)

Unfortunately this is not as simple for all uses of GFP_TRANSHUGE.
Namely in __alloc_pages_slowpath() the checks could use __GFP_NO_KSWAPD as one
of the distinguishing flags, but to test for lack of __GFP_KSWAPD_RECLAIM, they
should be adjusted in order to be functionally equivalent.
Yes, it would be better if we could get rid of them, but that's out of scope
here. So, something like this?

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index beda417..e019a89 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -3073,7 +3073,8 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
 		goto got_pg;
 
 	/* Checks for THP-specific high-order allocations */
-	if ((gfp_mask & GFP_TRANSHUGE) == GFP_TRANSHUGE) {
+	if ((gfp_mask & (GFP_TRANSHUGE | __GFP_KSWAPD_RECLAIM))
+							== GFP_TRANSHUGE) {
 		/*
 		 * If compaction is deferred for high-order allocations, it is
 		 * because sync compaction recently failed. If this is the case
@@ -3108,8 +3109,8 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
 	 * fault, so use asynchronous memory compaction for THP unless it is
 	 * khugepaged trying to collapse.
 	 */
-	if ((gfp_mask & GFP_TRANSHUGE) != GFP_TRANSHUGE ||
-						(current->flags & PF_KTHREAD))
+	if ((gfp_mask & (GFP_TRANSHUGE | __GFP_KSWAPD_RECLAIM)
+			!= GFP_TRANSHUGE) || (current->flags & PF_KTHREAD))
 		migration_mode = MIGRATE_SYNC_LIGHT;
 
 	/* Try direct reclaim and then allocating */



--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/