linux-kernel - Re: [PATCH v3 13/14] mm, hugetlb: retry if failed to allocate and there is concurrent user

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-Id: <20131219170202.0df2d82a2adefa3ab616bdaa@linux-foundation.org>
Date:	Thu, 19 Dec 2013 17:02:02 -0800
From:	Andrew Morton <akpm@...ux-foundation.org>
To:	Joonsoo Kim <iamjoonsoo.kim@....com>
Cc:	Rik van Riel <riel@...hat.com>, Mel Gorman <mgorman@...e.de>,
	Michal Hocko <mhocko@...e.cz>,
	"Aneesh Kumar K.V" <aneesh.kumar@...ux.vnet.ibm.com>,
	KAMEZAWA Hiroyuki <kamezawa.hiroyu@...fujitsu.com>,
	Hugh Dickins <hughd@...gle.com>,
	Davidlohr Bueso <davidlohr.bueso@...com>,
	David Gibson <david@...son.dropbear.id.au>, linux-mm@...ck.org,
	linux-kernel@...r.kernel.org, Joonsoo Kim <js1304@...il.com>,
	Wanpeng Li <liwanp@...ux.vnet.ibm.com>,
	Naoya Horiguchi <n-horiguchi@...jp.nec.com>,
	Hillf Danton <dhillf@...il.com>
Subject: Re: [PATCH v3 13/14] mm, hugetlb: retry if failed to allocate and
 there is concurrent user

On Wed, 18 Dec 2013 15:53:59 +0900 Joonsoo Kim <iamjoonsoo.kim@....com> wrote:

> If parallel fault occur, we can fail to allocate a hugepage,
> because many threads dequeue a hugepage to handle a fault of same address.
> This makes reserved pool shortage just for a little while and this cause
> faulting thread who can get hugepages to get a SIGBUS signal.
> 
> To solve this problem, we already have a nice solution, that is,
> a hugetlb_instantiation_mutex. This blocks other threads to dive into
> a fault handler. This solve the problem clearly, but it introduce
> performance degradation, because it serialize all fault handling.
> 
> Now, I try to remove a hugetlb_instantiation_mutex to get rid of
> performance degradation.

So the whole point of the patch is to improve performance, but the
changelog doesn't include any performance measurements!

Please, run some quantitative tests and include a nice summary of the
results in the changelog.

This is terribly important, because if the performance benefit is
infinitesimally small or negative, the patch goes into the bit bucket ;)

> For achieving it, at first, we should ensure that
> no one get a SIGBUS if there are enough hugepages.
> 
> For this purpose, if we fail to allocate a new hugepage when there is
> concurrent user, we return just 0, instead of VM_FAULT_SIGBUS. With this,
> these threads defer to get a SIGBUS signal until there is no
> concurrent user, and so, we can ensure that no one get a SIGBUS if there
> are enough hugepages.

So if I'm understanding this correctly...  if N threads all generate a
fault against the same address, they will all dive in and allocate a
hugepage, will then do an enormous memcpy into that page and will then
attempt to instantiate the page in pagetables.  All threads except one
will lose the race and will free the page again!  This sounds terribly
inefficient; it would be useful to write a microbenchmark which
triggers this scenario so we can explore the impact.

I'm wondering if a better solution to all of this would be to make
hugetlb_instantiation_mutex an array of, say, 1024 mutexes and index it
with a hash of the faulting address.  That will 99.9% solve the
performance issue which you believe exists without introducing this new
performance issue?
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/