Date:	Thu, 1 Nov 2012 09:50:52 +0900
From:	Minchan Kim <minchan@...nel.org>
To:	Paul Turner <pjt@...gle.com>
Cc:	Andrew Morton <akpm@...ux-foundation.org>,
	linux-kernel@...r.kernel.org, linux-mm@...ck.org,
	John Stultz <john.stultz@...aro.org>,
	Christoph Lameter <cl@...ux.com>,
	Android Kernel Team <kernel-team@...roid.com>,
	Robert Love <rlove@...gle.com>, Mel Gorman <mel@....ul.ie>,
	Hugh Dickins <hughd@...gle.com>,
	Dave Hansen <dave@...ux.vnet.ibm.com>,
	Rik van Riel <riel@...hat.com>,
	Dave Chinner <david@...morbit.com>, Neil Brown <neilb@...e.de>,
	Mike Hommey <mh@...ndium.org>, Taras Glek <tglek@...illa.com>,
	KOSAKI Motohiro <kosaki.motohiro@...il.com>,
	KAMEZAWA Hiroyuki <kamezawa.hiroyu@...fujitsu.com>,
	sanjay@...gle.com, David Rientjes <rientjes@...gle.com>
Subject: Re: [RFC v2] Support volatile range for anon vma

Hello,

On Wed, Oct 31, 2012 at 02:59:07PM -0700, Paul Turner wrote:
> On Wed, Oct 31, 2012 at 2:35 PM, Andrew Morton
> <akpm@...ux-foundation.org> wrote:
> >
> > On Tue, 30 Oct 2012 10:29:54 +0900
> > Minchan Kim <minchan@...nel.org> wrote:
> >
> > > > This patch introduces new madvise behaviors MADV_VOLATILE and
> > > > MADV_NOVOLATILE for anonymous pages. It differs from
> > > > John Stultz's version, which considers only tmpfs, while this patch
> > > > considers only anonymous pages, so this cannot cover John's one.
> > > > If the idea below proves reasonable, I hope we can unify both
> > > > concepts by madvise/fadvise.
> > >
> > > > The rationale is as follows.
> > > > Many allocators call munmap(2) when the user calls free(3) if ptr is
> > > > in an mmaped area. But munmap isn't cheap because it has to clean up
> > > > all pte entries and unlink a vma, so the overhead increases
> > > > linearly with the mmaped area's size.
> >
> > Presumably the userspace allocator will internally manage memory in
> > large chunks, so the munmap() call frequency will be much lower than
> > the free() call frequency.  So the performance gains from this change
> > might be very small.
> 
> I don't think I strictly understand the motivation from a
> malloc-standpoint here.
> 
> These days we (tcmalloc) use madvise(..., MADV_DONTNEED) when we want
> to perform discards on Linux.    For any reasonable allocator (short
> of binding malloc --> mmap, free --> unmap) this seems a better
> choice.
> 
> Note also from a performance stand-point I doubt any allocator (which
> cares about performance) is going to want to pay the cost of even a
> null syscall around typical malloc/free usage (consider: a tcmalloc

Good point.

> malloc/free pair is currently <20ns).  Given then that this cost is
> amortized once you start doing discards on larger blocks MADV_DONTNEED
> seems a preferable interface:
> - You don't need to reconstruct an arena when you do want to allocate
> since there's no munmap/mmap for the region to change about
> - There are no syscalls involved in later reallocating the block.

The benefits above apply to MADV_VOLATILE, too.
But as you pointed out, it has a little more overhead than DONTNEED
because the allocator must call madvise(MADV_NOVOLATILE) before allocation.
Although madvise(NOVOLATILE) just marks a vma flag, it does need mmap_sem,
which could be a problem for parallel malloc/free workloads, as KOSAKI pointed out.

In that case, we can change the semantics so that malloc doesn't need to call
madvise(NOVOLATILE) before allocating. Then the page fault handler has to
check whether the page fault was caused by an access to a volatile vma. If so,
it could return a zero page instead of SIGBUS and mark the vma as no longer
volatile.

> 
> The only real additional cost is address-space.  Are you strongly
> concerned about the 32-bit case?

No. I believe allocators have logic to clean them up once the address space is
almost full.

Thanks, Paul.

-- 
Kind regards,
Minchan Kim