linux-kernel - Re: [PATCH 3/3] Guarantee that COW faults for a process that called mmap(MAP

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-Id: <1211998567.12036.65.camel@localhost.localdomain>
Date:	Wed, 28 May 2008 13:16:07 -0500
From:	Adam Litke <agl@...ibm.com>
To:	Mel Gorman <mel@....ul.ie>
Cc:	akpm@...ux-foundation.org, dean@...tic.org,
	linux-kernel@...r.kernel.org, wli@...omorphy.com, dwg@....ibm.com,
	apw@...dowen.org, linux-mm@...ck.org, andi@...stfloor.org,
	kenchen@...gle.com, abh@...y.com
Subject: Re: [PATCH 3/3] Guarantee that COW faults for a process that
	called mmap(MAP_PRIVATE) on hugetlbfs will succeed

On Tue, 2008-05-27 at 19:51 +0100, Mel Gorman wrote:
> After patch 2 in this series, a process that successfully calls mmap()
> for a MAP_PRIVATE mapping will be guaranteed to successfully fault until a
> process calls fork().  At that point, the next write fault from the parent
> could fail due to COW if the child still has a reference.
> 
> We only reserve pages for the parent but a copy must be made to avoid leaking
> data from the parent to the child after fork(). Reserves could be taken for
> both parent and child at fork time to guarantee faults but if the mapping
> is large it is highly likely we will not have sufficient pages for the
> reservation, and it is common to fork only to exec() immediatly after. A
> failure here would be very undesirable.
> 
> Note that the current behaviour of mainline with MAP_PRIVATE pages is
> pretty bad.  The following situation is allowed to occur today.
> 
> 1. Process calls mmap(MAP_PRIVATE)
> 2. Process calls mlock() to fault all pages and makes sure it succeeds
> 3. Process forks()
> 4. Process writes to MAP_PRIVATE mapping while child still exists
> 5. If the COW fails at this point, the process gets SIGKILLed even though it
>    had taken care to ensure the pages existed
> 
> This patch improves the situation by guaranteeing the reliability of the
> process that successfully calls mmap(). When the parent performs COW, it
> will try to satisfy the allocation without using reserves. If that fails the
> parent will steal the page leaving any children without a page. Faults from
> the child after that point will result in failure. If the child COW happens
> first, an attempt will be made to allocate the page without reserves and
> the child will get SIGKILLed on failure.
> 
> To summarise the new behaviour:
> 
> 1. If the original mapper performs COW on a private mapping with multiple
>    references, it will attempt to allocate a hugepage from the pool or
>    the buddy allocator without using the existing reserves. On fail, VMAs
>    mapping the same area are traversed and the page being COW'd is unmapped
>    where found. It will then steal the original page as the last mapper in
>    the normal way.
> 
> 2. The VMAs the pages were unmapped from are flagged to note that pages
>    with data no longer exist. Future no-page faults on those VMAs will
>    terminate the process as otherwise it would appear that data was corrupted.
>    A warning is printed to the console that this situation occured.
> 
> 2. If the child performs COW first, it will attempt to satisfy the COW
>    from the pool if there are enough pages or via the buddy allocator if
>    overcommit is allowed and the buddy allocator can satisfy the request. If
>    it fails, the child will be killed.
> 
> If the pool is large enough, existing applications will not notice that the
> reserves were a factor. Existing applications depending on the no-reserves
> been set are unlikely to exist as for much of the history of hugetlbfs,
> pages were prefaulted at mmap(), allocating the pages at that point or failing
> the mmap().
> 
> Signed-off-by: Mel Gorman <mel@....ul.ie>

Acked-by: Adam Litke <agl@...ibm.com>

-- 
Adam Litke - (agl at us.ibm.com)
IBM Linux Technology Center

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/