linux-kernel - Re: [PATCH v2] mm/page_alloc: call check_new_pages() while zone spinlock is not held

Message-ID: <862170fd-a325-a158-36b8-eb73b15c2629@suse.cz>
Date:   Mon, 7 Mar 2022 10:24:43 +0100
From:   Vlastimil Babka <vbabka@...e.cz>
To:     Eric Dumazet <eric.dumazet@...il.com>,
        Andrew Morton <akpm@...ux-foundation.org>
Cc:     linux-kernel <linux-kernel@...r.kernel.org>,
        linux-mm <linux-mm@...ck.org>,
        Eric Dumazet <edumazet@...gle.com>,
        Mel Gorman <mgorman@...hsingularity.net>,
        Michal Hocko <mhocko@...nel.org>,
        Shakeel Butt <shakeelb@...gle.com>,
        Wei Xu <weixugc@...gle.com>, Greg Thelen <gthelen@...gle.com>,
        Hugh Dickins <hughd@...gle.com>,
        David Rientjes <rientjes@...gle.com>
Subject: Re: [PATCH v2] mm/page_alloc: call check_new_pages() while zone
 spinlock is not held

On 3/4/22 18:02, Eric Dumazet wrote:
> From: Eric Dumazet <edumazet@...gle.com>
> 
> For high order pages not using pcp, rmqueue() is currently calling
> the costly check_new_pages() while zone spinlock is held,
> and hard irqs masked.
> 
> This is not needed, we can release the spinlock sooner to reduce
> zone spinlock contention.
> 
> Note that after this patch, we call __mod_zone_freepage_state()
> before deciding to leak the page because it is in bad state.

Which is arguably an accounting fix on its own, because when we remove a page
from the free list, we should decrease the respective counter(s) even if we
find the page is in a bad state and discard (effectively leak) it.
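
To illustrate the ordering, here is a minimal userspace sketch, not kernel
code. All names (toy_zone, toy_page, toy_rmqueue) are hypothetical, a pthread
mutex stands in for zone->lock, and a boolean flag stands in for whatever
check_new_pages() would detect. The page is unlinked and the counter is
decremented under the lock; the check runs only after the lock is dropped.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model; none of these names are kernel APIs. */
struct toy_page {
	struct toy_page *next;
	bool bad;		/* stands in for what check_new_pages() detects */
};

struct toy_zone {
	pthread_mutex_t lock;	/* stands in for zone->lock */
	struct toy_page *free_list;
	long nr_free;		/* stands in for the free page counter */
};

/* Pop a page and account for it under the lock; validate it after unlock. */
static struct toy_page *toy_rmqueue(struct toy_zone *z)
{
	struct toy_page *page;

	do {
		pthread_mutex_lock(&z->lock);
		page = z->free_list;
		if (!page) {
			pthread_mutex_unlock(&z->lock);
			return NULL;
		}
		z->free_list = page->next;
		z->nr_free--;	/* accounted even if the page turns out bad */
		pthread_mutex_unlock(&z->lock);
		page->next = NULL;
	} while (page->bad);	/* check runs unlocked; bad pages are leaked */

	return page;
}
```

In this model the counter always matches the length of the free list, whether
or not a removed page is later discarded.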

> 
> v2: We need to keep interrupts disabled to call __mod_zone_freepage_state()
> 
> Signed-off-by: Eric Dumazet <edumazet@...gle.com>

Reviewed-by: Vlastimil Babka <vbabka@...e.cz>

> Cc: Mel Gorman <mgorman@...hsingularity.net>
> Cc: Vlastimil Babka <vbabka@...e.cz>
> Cc: Michal Hocko <mhocko@...nel.org>
> Cc: Shakeel Butt <shakeelb@...gle.com>
> Cc: Wei Xu <weixugc@...gle.com>
> Cc: Greg Thelen <gthelen@...gle.com>
> Cc: Hugh Dickins <hughd@...gle.com>
> Cc: David Rientjes <rientjes@...gle.com>
> ---
>  mm/page_alloc.c | 18 +++++++++---------
>  1 file changed, 9 insertions(+), 9 deletions(-)
> 
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 3589febc6d31928f850ebe5a4015ddc40e0469f3..1804287c1b792b8aa0e964b17eb002b6b1115258 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -3706,10 +3706,10 @@ struct page *rmqueue(struct zone *preferred_zone,
>  	 * allocate greater than order-1 page units with __GFP_NOFAIL.
>  	 */
>  	WARN_ON_ONCE((gfp_flags & __GFP_NOFAIL) && (order > 1));
> -	spin_lock_irqsave(&zone->lock, flags);
>  
>  	do {
>  		page = NULL;
> +		spin_lock_irqsave(&zone->lock, flags);
>  		/*
>  		 * order-0 request can reach here when the pcplist is skipped
>  		 * due to non-CMA allocation context. HIGHATOMIC area is
> @@ -3721,15 +3721,15 @@ struct page *rmqueue(struct zone *preferred_zone,
>  			if (page)
>  				trace_mm_page_alloc_zone_locked(page, order, migratetype);
>  		}
> -		if (!page)
> +		if (!page) {
>  			page = __rmqueue(zone, order, migratetype, alloc_flags);
> -	} while (page && check_new_pages(page, order));
> -	if (!page)
> -		goto failed;
> -
> -	__mod_zone_freepage_state(zone, -(1 << order),
> -				  get_pcppage_migratetype(page));
> -	spin_unlock_irqrestore(&zone->lock, flags);
> +			if (!page)
> +				goto failed;
> +		}
> +		__mod_zone_freepage_state(zone, -(1 << order),
> +					  get_pcppage_migratetype(page));
> +		spin_unlock_irqrestore(&zone->lock, flags);
> +	} while (check_new_pages(page, order));
>  
>  	__count_zid_vm_events(PGALLOC, page_zonenum(page), 1 << order);
>  	zone_statistics(preferred_zone, zone, 1);