lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:	Wed, 31 Jul 2013 14:37:53 +0900
From:	Joonsoo Kim <iamjoonsoo.kim@....com>
To:	David Gibson <david@...son.dropbear.id.au>
Cc:	Andrew Morton <akpm@...ux-foundation.org>,
	Rik van Riel <riel@...hat.com>, Mel Gorman <mgorman@...e.de>,
	Michal Hocko <mhocko@...e.cz>,
	"Aneesh Kumar K.V" <aneesh.kumar@...ux.vnet.ibm.com>,
	KAMEZAWA Hiroyuki <kamezawa.hiroyu@...fujitsu.com>,
	Hugh Dickins <hughd@...gle.com>,
	Davidlohr Bueso <davidlohr.bueso@...com>, linux-mm@...ck.org,
	linux-kernel@...r.kernel.org,
	Wanpeng Li <liwanp@...ux.vnet.ibm.com>,
	Naoya Horiguchi <n-horiguchi@...jp.nec.com>,
	Hillf Danton <dhillf@...il.com>
Subject: Re: [PATCH 17/18] mm, hugetlb: retry if we fail to allocate a
 hugepage with use_reserve

Hello, David.

On Mon, Jul 29, 2013 at 05:28:23PM +1000, David Gibson wrote:
> On Mon, Jul 29, 2013 at 02:32:08PM +0900, Joonsoo Kim wrote:
> > If parallel fault occur, we can fail to allocate a hugepage,
> > because many threads dequeue a hugepage to handle a fault of same address.
> > This makes reserved pool shortage just for a little while and this cause
> > faulting thread who is ensured to have enough reserved hugepages
> > to get a SIGBUS signal.
> 
> It's not just about reserved pages.  The same race can happen
> perfectly well when you're really, truly allocating the last hugepage
> in the system.

Yes, you are right.
This is a critical comment to this patchset :(

IIUC, the case you mentioned is about tasks which have a mapping
with MAP_NORESERVE. Should we ensure them to allocate the last hugepage?
They map a region with MAP_NORESERVE, so don't assume that their requests
always succeed.

> 
> > 
> > To solve this problem, we already have a nice solution, that is,
> > a hugetlb_instantiation_mutex. This blocks other threads to dive into
> > a fault handler. This solve the problem clearly, but it introduce
> > performance degradation, because it serialize all fault handling.
> > 
> > Now, I try to remove a hugetlb_instantiation_mutex to get rid of
> > performance degradation. A prerequisite is that other thread should
> > not get a SIGBUS if they are ensured to have enough reserved pages.
> > 
> > For this purpose, if we fail to allocate a new hugepage with use_reserve,
> > we return just 0, instead of VM_FAULT_SIGBUS. use_reserve
> > represent that this user is legimate one who are ensured to have enough
> > reserved pages. This prevent these thread not to get a SIGBUS signal and
> > make these thread retrying fault handling.
> 
> Not sufficient, since it can happen without reserved pages.

Ditto.

> 
> Also, I think there are edge cases where even reserved mappings can
> run out, in particular with the interaction between MAP_PRIVATE,
> fork() and reservations.  In this case, when you have a genuine out of
> memory condition, you will spin forever on the fault.

If there are edge cases, we can fix it. It doesn't matter.
If you find it, please tell me in more detail.

Thanks.

> 
> > 
> > Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@....com>
> > 
> > diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> > index 6a9ec69..909075b 100644
> > --- a/mm/hugetlb.c
> > +++ b/mm/hugetlb.c
> > @@ -2623,7 +2623,10 @@ retry_avoidcopy:
> >  			WARN_ON_ONCE(1);
> >  		}
> >  
> > -		ret = VM_FAULT_SIGBUS;
> > +		if (use_reserve)
> > +			ret = 0;
> > +		else
> > +			ret = VM_FAULT_SIGBUS;
> >  		goto out_lock;
> >  	}
> >  
> > @@ -2741,7 +2744,10 @@ retry:
> >  
> >  		page = alloc_huge_page(vma, address, use_reserve);
> >  		if (IS_ERR(page)) {
> > -			ret = VM_FAULT_SIGBUS;
> > +			if (use_reserve)
> > +				ret = 0;
> > +			else
> > +				ret = VM_FAULT_SIGBUS;
> >  			goto out;
> >  		}
> >  		clear_huge_page(page, address, pages_per_huge_page(h));
> 
> -- 
> David Gibson			| I'll have my music baroque, and my code
> david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
> 				| _way_ _around_!
> http://www.ozlabs.org/~dgibson


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ