Message-ID: <20170124165200.GB30832@dhcp22.suse.cz>
Date:   Tue, 24 Jan 2017 17:52:00 +0100
From:   Michal Hocko <mhocko@...nel.org>
To:     Jia He <hejianet@...il.com>
Cc:     linux-mm@...ck.org, linux-kernel@...r.kernel.org,
        Andrew Morton <akpm@...ux-foundation.org>,
        Naoya Horiguchi <n-horiguchi@...jp.nec.com>,
        Mike Kravetz <mike.kravetz@...cle.com>,
        "Aneesh Kumar K.V" <aneesh.kumar@...ux.vnet.ibm.com>,
        Gerald Schaefer <gerald.schaefer@...ibm.com>,
        zhong jiang <zhongjiang@...wei.com>,
        "Kirill A. Shutemov" <kirill.shutemov@...ux.intel.com>,
        Vaishali Thakkar <vaishali.thakkar@...cle.com>,
        Johannes Weiner <hannes@...xchg.org>,
        Mel Gorman <mgorman@...hsingularity.net>,
        Vlastimil Babka <vbabka@...e.cz>,
        Minchan Kim <minchan@...nel.org>,
        Rik van Riel <riel@...hat.com>
Subject: Re: [PATCH RFC 1/3] mm/hugetlb: split alloc_fresh_huge_page_node
 into fast and slow path

On Tue 24-01-17 15:49:02, Jia He wrote:
> This patch splits alloc_fresh_huge_page_node into 2 parts:
> - a fast path without the __GFP_REPEAT flag
> - a slow path with the __GFP_REPEAT flag
> 
> Thus, on a server with an uneven NUMA memory layout:
> available: 7 nodes (0-6)
> node 0 cpus: 0 1 2 3 4 5 6 7
> node 0 size: 6603 MB
> node 0 free: 91 MB
> node 1 cpus:
> node 1 size: 12527 MB
> node 1 free: 157 MB
> node 2 cpus:
> node 2 size: 15087 MB
> node 2 free: 189 MB
> node 3 cpus:
> node 3 size: 16111 MB
> node 3 free: 205 MB
> node 4 cpus: 8 9 10 11 12 13 14 15
> node 4 size: 24815 MB
> node 4 free: 310 MB
> node 5 cpus:
> node 5 size: 4095 MB
> node 5 free: 61 MB
> node 6 cpus:
> node 6 size: 22750 MB
> node 6 free: 283 MB
> node distances:
> node   0   1   2   3   4   5   6
>   0:  10  20  40  40  40  40  40
>   1:  20  10  40  40  40  40  40
>   2:  40  40  10  20  40  40  40
>   3:  40  40  20  10  40  40  40
>   4:  40  40  40  40  10  20  40
>   5:  40  40  40  40  20  10  40
>   6:  40  40  40  40  40  40  10
> 
> In this case node 5 has less memory, and we allocate the hugepages
> from these nodes one by one.
> After this patch, we will no longer trigger direct memory reclaim or
> kswapd too early on node 5 if there is enough memory on the other nodes.
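
For context, the pool is grown by a caller that round-robins over the
allowed nodes, so a nearly-full node like node 5 gets visited on every
pass. Roughly (a sketch paraphrasing the mm/hugetlb.c of this era, not
part of the patch):

	/* Abbreviated from alloc_fresh_huge_page(); the real function
	 * also bumps the HTLB_BUDDY_PGALLOC{,_FAIL} vmstat counters. */
	static int alloc_fresh_huge_page(struct hstate *h,
					 nodemask_t *nodes_allowed)
	{
		struct page *page;
		int nr_nodes, node;

		/* visit each allowed node once per pass, node 5 included */
		for_each_node_mask_to_alloc(h, nr_nodes, node, nodes_allowed) {
			page = alloc_fresh_huge_page_node(h, node);
			if (page)
				return 1;
		}
		return 0;
	}

With only ~61 MB free, node 5 fails such an attempt almost immediately,
which is the situation the changelog is aiming at.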

This description doesn't explain what the problem is, why it matters,
and how the fix actually works. Moreover, the patch does the opposite
of what it claims. Which brings me to another question: how has this
been tested?

> Signed-off-by: Jia He <hejianet@...il.com>
> ---
>  mm/hugetlb.c | 9 +++++++++
>  1 file changed, 9 insertions(+)
> 
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index c7025c1..f2415ce 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -1364,10 +1364,19 @@ static struct page *alloc_fresh_huge_page_node(struct hstate *h, int nid)
>  {
>  	struct page *page;
>  
> +	/* fast path without __GFP_REPEAT */
>  	page = __alloc_pages_node(nid,
>  		htlb_alloc_mask(h)|__GFP_COMP|__GFP_THISNODE|
>  						__GFP_REPEAT|__GFP_NOWARN,
>  		huge_page_order(h));

This does the opposite of what the comment says.
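
Presumably the intent was the reverse: the optimistic first attempt
drops __GFP_REPEAT and only the retry carries it. Something like this
(a sketch of the described intent, not the posted patch):

	/* fast path: no __GFP_REPEAT, so a short-on-memory node fails
	 * quickly instead of triggering reclaim */
	page = __alloc_pages_node(nid,
		htlb_alloc_mask(h)|__GFP_COMP|__GFP_THISNODE|__GFP_NOWARN,
		huge_page_order(h));

	/* slow path: retry with __GFP_REPEAT, letting the allocator try
	 * harder before giving up */
	if (!page)
		page = __alloc_pages_node(nid,
			htlb_alloc_mask(h)|__GFP_COMP|__GFP_THISNODE|
					__GFP_REPEAT|__GFP_NOWARN,
			huge_page_order(h));

That ordering would at least match the changelog: fail fast on a tight
node, and only then ask the allocator to work harder.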

> +
> +	/* slow path with __GFP_REPEAT*/
> +	if (!page)
> +		page = __alloc_pages_node(nid,
> +			htlb_alloc_mask(h)|__GFP_COMP|__GFP_THISNODE|
> +					__GFP_NOWARN,
> +			huge_page_order(h));
> +
>  	if (page) {
>  		prep_new_huge_page(h, page, nid);
>  	}
> -- 
> 2.5.5
> 

-- 
Michal Hocko
SUSE Labs
