linux-kernel - Re: [PATCH 17/18] mm, hugetlb: retry if we fail to allocate a hugepage with use

Message-ID: <20130729072823.GD29970@voom.fritz.box>
Date:	Mon, 29 Jul 2013 17:28:23 +1000
From:	David Gibson <david@...son.dropbear.id.au>
To:	Joonsoo Kim <iamjoonsoo.kim@....com>
Cc:	Andrew Morton <akpm@...ux-foundation.org>,
	Rik van Riel <riel@...hat.com>, Mel Gorman <mgorman@...e.de>,
	Michal Hocko <mhocko@...e.cz>,
	"Aneesh Kumar K.V" <aneesh.kumar@...ux.vnet.ibm.com>,
	KAMEZAWA Hiroyuki <kamezawa.hiroyu@...fujitsu.com>,
	Hugh Dickins <hughd@...gle.com>,
	Davidlohr Bueso <davidlohr.bueso@...com>, linux-mm@...ck.org,
	linux-kernel@...r.kernel.org, Joonsoo Kim <js1304@...il.com>,
	Wanpeng Li <liwanp@...ux.vnet.ibm.com>,
	Naoya Horiguchi <n-horiguchi@...jp.nec.com>,
	Hillf Danton <dhillf@...il.com>
Subject: Re: [PATCH 17/18] mm, hugetlb: retry if we fail to allocate a
 hugepage with use_reserve

On Mon, Jul 29, 2013 at 02:32:08PM +0900, Joonsoo Kim wrote:
> If parallel faults occur, we can fail to allocate a hugepage,
> because many threads dequeue a hugepage to handle a fault on the same
> address. This causes a brief shortage in the reserved pool, and as a
> result a faulting thread that is guaranteed enough reserved hugepages
> gets a SIGBUS signal.

It's not just about reserved pages.  The same race can happen
perfectly well when you're really, truly allocating the last hugepage
in the system.
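
To make the interleaving concrete, here is a minimal userspace sketch of
two threads faulting on the same address while one hugepage is left and
no reservation is involved. It is a toy model, not kernel code: every
name in it (toy_pool, toy_mapping, toy_dequeue, toy_install,
toy_racing_fault_b) is hypothetical, and the race is replayed as one
fixed interleaving rather than with real threads.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy model of the race; none of these names exist in mm/hugetlb.c. */
enum fault_result { FAULT_OK, FAULT_SIGBUS };

struct toy_pool    { int free_pages; };   /* free hugepages, no reservations */
struct toy_mapping { bool populated; };   /* state of the one shared address */

/* Take a page off the free pool; fails when the pool is empty. */
static bool toy_dequeue(struct toy_pool *pool)
{
    assert(pool->free_pages >= 0);
    if (pool->free_pages == 0)
        return false;
    pool->free_pages--;
    return true;
}

/* Install the page, or hand it back if another thread already populated the address. */
static void toy_install(struct toy_pool *pool, struct toy_mapping *map)
{
    if (map->populated)
        pool->free_pages++;
    else
        map->populated = true;
}

/*
 * Replay one interleaving without the instantiation mutex: threads A and B
 * both see the address as unpopulated, A dequeues first, B dequeues second,
 * then A installs its page. Returns what thread B's fault would report.
 */
enum fault_result toy_racing_fault_b(struct toy_pool *pool, struct toy_mapping *map)
{
    bool a_has_page = toy_dequeue(pool);   /* A takes the last hugepage */
    bool b_has_page = toy_dequeue(pool);   /* B finds the pool empty */

    if (a_has_page)
        toy_install(pool, map);
    if (b_has_page) {
        toy_install(pool, map);
        return FAULT_OK;
    }
    /* B fails even though the address it wanted ends up populated by A. */
    return FAULT_SIGBUS;
}
```

Starting from a pool with a single free page, thread B's fault reports
SIGBUS in this model even though the address is populated by the time
thread A finishes, which is the same failure with no reserved pages
anywhere in the picture.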

> 
> To solve this problem, we already have a nice solution, that is,
> a hugetlb_instantiation_mutex. This blocks other threads from entering
> the fault handler. This solves the problem cleanly, but it introduces
> performance degradation, because it serializes all fault handling.
> 
> Now, I try to remove the hugetlb_instantiation_mutex to get rid of
> this performance degradation. A prerequisite is that other threads
> should not get a SIGBUS if they are guaranteed enough reserved pages.
> 
> For this purpose, if we fail to allocate a new hugepage with use_reserve,
> we just return 0 instead of VM_FAULT_SIGBUS. use_reserve
> indicates that this user is a legitimate one that is guaranteed enough
> reserved pages. This prevents these threads from getting a SIGBUS signal
> and makes them retry the fault handling.

This is not sufficient, since the same race can happen without reserved
pages.

Also, I think there are edge cases where even reserved mappings can
run out, in particular with the interaction between MAP_PRIVATE,
fork() and reservations.  In this case, when you have a genuine
out-of-memory condition, you will spin forever on the fault.
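
A rough userspace sketch of that spin, assuming the patched error path
behaves as in the quoted diff (return 0 when use_reserve is set and the
allocation fails). All names here are hypothetical (toy_state,
toy_handle_fault, toy_touch_page, TOY_FAULT_SIGBUS), and the max_faults
cap exists only so the model terminates; a real task would have no such
cap.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy model of the patched error path, not kernel code. */
#define TOY_FAULT_SIGBUS 1   /* stand-in for VM_FAULT_SIGBUS */

struct toy_state {
    int  free_pages;   /* hugepages left in the pool */
    bool mapped;       /* has the faulting address been populated? */
};

/*
 * One pass through the patched handler: a failed allocation with
 * use_reserve set reports success (0) without mapping anything.
 */
int toy_handle_fault(struct toy_state *s, bool use_reserve)
{
    if (s->free_pages == 0)
        return use_reserve ? 0 : TOY_FAULT_SIGBUS;
    s->free_pages--;
    s->mapped = true;
    return 0;
}

/*
 * Model of the faulting access being re-executed: a return of 0 with the
 * address still unmapped simply faults again.
 * Returns the number of faults taken until the address is mapped,
 * -2 if the task would receive SIGBUS, or -1 if it is still faulting
 * after max_faults attempts.
 */
int toy_touch_page(struct toy_state *s, bool use_reserve, int max_faults)
{
    assert(max_faults > 0);
    for (int n = 1; n <= max_faults; n++) {
        if (toy_handle_fault(s, use_reserve) != 0)
            return -2;
        if (s->mapped)
            return n;
    }
    return -1;
}
```

With an empty pool that never refills, the use_reserve case never leaves
the loop on its own, whereas the unpatched SIGBUS path terminates on the
first fault.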

> 
> Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@....com>
> 
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index 6a9ec69..909075b 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -2623,7 +2623,10 @@ retry_avoidcopy:
>  			WARN_ON_ONCE(1);
>  		}
>  
> -		ret = VM_FAULT_SIGBUS;
> +		if (use_reserve)
> +			ret = 0;
> +		else
> +			ret = VM_FAULT_SIGBUS;
>  		goto out_lock;
>  	}
>  
> @@ -2741,7 +2744,10 @@ retry:
>  
>  		page = alloc_huge_page(vma, address, use_reserve);
>  		if (IS_ERR(page)) {
> -			ret = VM_FAULT_SIGBUS;
> +			if (use_reserve)
> +				ret = 0;
> +			else
> +				ret = VM_FAULT_SIGBUS;
>  			goto out;
>  		}
>  		clear_huge_page(page, address, pages_per_huge_page(h));

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson
