linux-kernel - Re: [PATCH 0/3] Guarantee faults for processes that call mmap(MAP

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20080509000249.GA8552@yookeroo.seuss>
Date:	Fri, 9 May 2008 10:02:49 +1000
From:	David Gibson <dwg@....ibm.com>
To:	Andy Whitcroft <apw@...dowen.org>
Cc:	Mel Gorman <mel@....ul.ie>, linux-mm@...ck.org, dean@...tic.org,
	linux-kernel@...r.kernel.org, wli@...omorphy.com,
	andi@...stfloor.org, kenchen@...gle.com, agl@...ibm.com
Subject: Re: [PATCH 0/3] Guarantee faults for processes that call
	mmap(MAP_PRIVATE) on hugetlbfs v2

On Thu, May 08, 2008 at 12:14:08PM +0100, Andy Whitcroft wrote:
> On Thu, May 08, 2008 at 11:48:22AM +1000, David Gibson wrote:
> > On Wed, May 07, 2008 at 08:38:26PM +0100, Mel Gorman wrote:
> > > MAP_SHARED mappings on hugetlbfs reserve huge pages at mmap() time.
> > > This guarantees all future faults against the mapping will succeed.
> > > This allows local allocations at first use improving NUMA locality whilst
> > > retaining reliability.
> > > 
> > > MAP_PRIVATE mappings do not reserve pages. This can result in an application
> > > being SIGKILLed later if a huge page is not available at fault time. This
> > > makes huge pages usage very ill-advised in some cases as the unexpected
> > > application failure cannot be detected and handled as it is immediately fatal.
> > > Although an application may force instantiation of the pages using mlock(),
> > > this may lead to poor memory placement and the process may still be killed
> > > when performing COW.
> > > 
> > > This patchset introduces a reliability guarantee for the process which creates
> > > a private mapping, i.e. the process that calls mmap() on a hugetlbfs file
> > > successfully.  The first patch of the set is purely mechanical code move to
> > > make later diffs easier to read. The second patch will guarantee faults up
> > > until the process calls fork(). After patch two, as long as the child keeps
> > > the mappings, the parent is no longer guaranteed to be reliable. Patch
> > > 3 guarantees that the parent will always successfully COW by unmapping
> > > the pages from the child in the event there are insufficient pages in the
> > > hugepage pool in allocate a new page, be it via a static or dynamic pool.
> > 
> > I don't think patch 3 is a good idea.  It's a fair bit of code to
> > implement a pretty bizarre semantic that I really don't think is all
> > that useful.  Patches 1-2 are already sufficient to cover the
> > fork()/exec() case and a fair proportion of fork()/minor
> > frobbing/exit() cases.  If the child also needs to write the hugepage
> > area, chances are it's doing real work and we care about its
> > reliability too.
> 
> Without patch 3 the parent is still vunerable during the period the
> child exists.  Even if that child does nothing with the pages not even
> referencing them, and then execs immediatly.  As soon as we fork any
> reference from the parent will trigger a COW, at which point there may
> be no pages available and the parent will have to be killed.  That is
> regardless of the fact the child is not going to reference the page and
> leave the address space shortly.  With patch 3 on COW if we find no memory
> available the page may be stolen for the parent saving it, and the _risk_
> of reference death moves to the child; the child is killed only should it
> then re-reference the page.

Yes, thinko, sorry.  Forgot that a COW would be triggered even if the
child never wrote the pages.  I see the point of patch 3 now.  Damn,
but it's still a weird semantic to be implementing though.

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/