Message-ID: <20170124165200.GB30832@dhcp22.suse.cz>
Date:   Tue, 24 Jan 2017 17:52:00 +0100
From:   Michal Hocko <mhocko@...nel.org>
To:     Jia He <hejianet@...il.com>
Cc:     linux-mm@...ck.org, linux-kernel@...r.kernel.org,
        Andrew Morton <akpm@...ux-foundation.org>,
        Naoya Horiguchi <n-horiguchi@...jp.nec.com>,
        Mike Kravetz <mike.kravetz@...cle.com>,
        "Aneesh Kumar K.V" <aneesh.kumar@...ux.vnet.ibm.com>,
        Gerald Schaefer <gerald.schaefer@...ibm.com>,
        zhong jiang <zhongjiang@...wei.com>,
        "Kirill A. Shutemov" <kirill.shutemov@...ux.intel.com>,
        Vaishali Thakkar <vaishali.thakkar@...cle.com>,
        Johannes Weiner <hannes@...xchg.org>,
        Mel Gorman <mgorman@...hsingularity.net>,
        Vlastimil Babka <vbabka@...e.cz>,
        Minchan Kim <minchan@...nel.org>,
        Rik van Riel <riel@...hat.com>
Subject: Re: [PATCH RFC 1/3] mm/hugetlb: split alloc_fresh_huge_page_node
 into fast and slow path

On Tue 24-01-17 15:49:02, Jia He wrote:
> This patch splits alloc_fresh_huge_page_node into 2 parts:
> - a fast path without the __GFP_REPEAT flag
> - a slow path with the __GFP_REPEAT flag
> 
> Thus, on a server with an uneven NUMA memory layout:
> available: 7 nodes (0-6)
> node 0 cpus: 0 1 2 3 4 5 6 7
> node 0 size: 6603 MB
> node 0 free: 91 MB
> node 1 cpus:
> node 1 size: 12527 MB
> node 1 free: 157 MB
> node 2 cpus:
> node 2 size: 15087 MB
> node 2 free: 189 MB
> node 3 cpus:
> node 3 size: 16111 MB
> node 3 free: 205 MB
> node 4 cpus: 8 9 10 11 12 13 14 15
> node 4 size: 24815 MB
> node 4 free: 310 MB
> node 5 cpus:
> node 5 size: 4095 MB
> node 5 free: 61 MB
> node 6 cpus:
> node 6 size: 22750 MB
> node 6 free: 283 MB
> node distances:
> node   0   1   2   3   4   5   6
>   0:  10  20  40  40  40  40  40
>   1:  20  10  40  40  40  40  40
>   2:  40  40  10  20  40  40  40
>   3:  40  40  20  10  40  40  40
>   4:  40  40  40  40  10  20  40
>   5:  40  40  40  40  20  10  40
>   6:  40  40  40  40  40  40  10
> 
> In this case node 5 has less memory, and we allocate the hugepages
> from these nodes one by one.
> After this patch, we will no longer trigger direct memory reclaim or
> kswapd too early on node 5 if there is enough memory on the other nodes.
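
For context, the pool is grown by a caller that round-robins over the
allowed nodes, so a nearly-full node like node 5 gets visited on every
pass. Roughly (a sketch paraphrasing the mm/hugetlb.c of this era, not
part of the patch):

	/* Abbreviated from alloc_fresh_huge_page(); the real function
	 * also bumps the HTLB_BUDDY_PGALLOC{,_FAIL} vmstat counters. */
	static int alloc_fresh_huge_page(struct hstate *h,
					 nodemask_t *nodes_allowed)
	{
		struct page *page;
		int nr_nodes, node;

		/* visit each allowed node once per pass, node 5 included */
		for_each_node_mask_to_alloc(h, nr_nodes, node, nodes_allowed) {
			page = alloc_fresh_huge_page_node(h, node);
			if (page)
				return 1;
		}
		return 0;
	}

With only ~61 MB free, node 5 fails such an attempt almost immediately,
which is the situation the changelog is aiming at.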

This description doesn't explain what the problem is, why it matters,
and how the fix actually works. Moreover, the patch does the opposite
of what it claims. Which brings me to another question: how has this
been tested?

> Signed-off-by: Jia He <hejianet@...il.com>
> ---
>  mm/hugetlb.c | 9 +++++++++
>  1 file changed, 9 insertions(+)
> 
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index c7025c1..f2415ce 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -1364,10 +1364,19 @@ static struct page *alloc_fresh_huge_page_node(struct hstate *h, int nid)
>  {
>  	struct page *page;
>  
> +	/* fast path without __GFP_REPEAT */
>  	page = __alloc_pages_node(nid,
>  		htlb_alloc_mask(h)|__GFP_COMP|__GFP_THISNODE|
>  						__GFP_REPEAT|__GFP_NOWARN,
>  		huge_page_order(h));

This does the opposite of what the comment says.
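
Presumably the intent was the reverse: the optimistic first attempt
drops __GFP_REPEAT and only the retry carries it. Something like this
(a sketch of the described intent, not the posted patch):

	/* fast path: no __GFP_REPEAT, so a short-on-memory node fails
	 * quickly instead of triggering reclaim */
	page = __alloc_pages_node(nid,
		htlb_alloc_mask(h)|__GFP_COMP|__GFP_THISNODE|__GFP_NOWARN,
		huge_page_order(h));

	/* slow path: retry with __GFP_REPEAT, letting the allocator try
	 * harder before giving up */
	if (!page)
		page = __alloc_pages_node(nid,
			htlb_alloc_mask(h)|__GFP_COMP|__GFP_THISNODE|
					__GFP_REPEAT|__GFP_NOWARN,
			huge_page_order(h));

That ordering would at least match the changelog: fail fast on a tight
node, and only then ask the allocator to work harder.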

> +
> +	/* slow path with __GFP_REPEAT*/
> +	if (!page)
> +		page = __alloc_pages_node(nid,
> +			htlb_alloc_mask(h)|__GFP_COMP|__GFP_THISNODE|
> +					__GFP_NOWARN,
> +			huge_page_order(h));
> +
>  	if (page) {
>  		prep_new_huge_page(h, page, nid);
>  	}
> -- 
> 2.5.5
> 

-- 
Michal Hocko
SUSE Labs
