Message-ID: <alpine.DEB.2.10.1503261250430.9410@chino.kir.corp.google.com>
Date:	Thu, 26 Mar 2015 13:03:12 -0700 (PDT)
From:	David Rientjes <rientjes@...gle.com>
To:	Davide Libenzi <davidel@...ilserver.org>
cc:	Hugh Dickins <hughd@...gle.com>,
	Andrew Morton <akpm@...ux-foundation.org>,
	KOSAKI Motohiro <kosaki.motohiro@...fujitsu.com>,
	Andrea Arcangeli <aarcange@...hat.com>,
	Joern Engel <joern@...fs.org>,
	Jianguo Wu <wujianguo@...wei.com>,
	Eric B Munson <emunson@...mai.com>, linux-mm@...ck.org,
	Linux Kernel Mailing List <linux-kernel@...r.kernel.org>
Subject: Re: [patch][resend] MAP_HUGETLB munmap fails with size not 2MB
 aligned

On Thu, 26 Mar 2015, Davide Libenzi wrote:

> > Yes, this munmap() behavior of lengths <= hugepage_size - PAGE_SIZE for a 
> > hugetlb vma is long standing and there may be applications that break as a 
> > result of changing the behavior: a database that reserves all allocated 
> > hugetlb memory with mmap() so that it always has exclusive access to those 
> > hugepages, whether they are faulted or not, and maintains its own hugepage 
> > pool (which is common), may test the return value of munmap() and depend 
> > on it returning -EINVAL to determine if it is freeing memory that was 
> > either dynamically allocated or mapped from the hugetlb reserved pool.
> 
> You went a long way to create such a case.
> But, in your case, that application will erroneously consider hugepage 
> mmaped memory as dynamically allocated, since it will always get EINVAL 
> unless it passes an aligned size. An aligned size is something that a fix 
> like the one posted in the patch will still leave as success.

There was a patch proposed last week to add reserved pools to the 
hugetlbfs mount option specifically for the case where a large database 
wants sole reserved access to the hugepage pool.  This is why hugetlbfs 
pages become reserved on mmap().  In that case, the database never wants 
to do munmap() and instead maintains its own hugepage pool.

That makes the usual database case, where the database mmap()s all the 
hugetlb pages it needs in order to reserve them, even easier, since 
databases have historically had to maintain this pool among various 
processes themselves.

Is there a process out there that tests whether munmap() on ptr fails with 
EINVAL and, if so, returns ptr to its hugepage pool?  I can't say for 
certain that none exist, which is why the potential for breakage exists.

> OTOH, an application, which might be more common than the one you posted,
> which calls munmap() to release a pointer which it validly got from a 
> previous mmap(), will leak huge pages as all the issued munmaps will fail.
> 

That application would have to be ignoring an EINVAL return value.

> > If we were to go back in time and decide this when the munmap() behavior 
> > for hugetlb vmas was originally introduced, that would be valid.  The 
> > problem is that it could lead to userspace breakage and that's a 
> > non-starter.
> > 
> > What we can do is improve the documentation and man-page to clearly 
> > specify the long-standing behavior so that nobody encounters unexpected 
> > results in the future.
> 
> This way you will leave the mmap API with broken semantics.
> In any case, I am done arguing.
> I will leave to Andrew to sort it out, and to Michael Kerrisk to update 
> the mmap man pages with the new funny behaviour.
> 

The behavior is certainly not new; it has always been the case for 
munmap() on hugetlb vmas.

In a strict POSIX interpretation, munmap() is specified only in terms of 
pages in the sense of what is returned by sysconf(_SC_PAGESIZE).  Hugetlb 
vmas are not backed by any pages of size sysconf(_SC_PAGESIZE), so this 
behavior is undefined.  It would be best to modify the man page to 
explicitly state this for MAP_HUGETLB.
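
The distinction between the two page sizes can be sketched as follows. 
The helper names are made up for illustration, and the 2MB hugepage size 
used in the comment is only an assumed example, not something 
sysconf() reports.

```c
#include <stdbool.h>
#include <stddef.h>
#include <unistd.h>

/* Base page size as POSIX reports it; this is not the hugepage size. */
static size_t base_page_size(void)
{
    long sz = sysconf(_SC_PAGESIZE);

    return sz > 0 ? (size_t)sz : 0;
}

/*
 * True if len is a non-zero multiple of unit (unit must be non-zero).
 * With an assumed 2MB hugepage, a length of 2MB minus one base page is
 * a multiple of the base page size but not of the hugepage size, which
 * is exactly the kind of length the thread is about.
 */
static bool is_multiple_of(size_t len, size_t unit)
{
    return len != 0 && len % unit == 0;
}
```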