Message-Id: <20070403135154.61e1b5f3.akpm@linux-foundation.org>
Date:	Tue, 3 Apr 2007 13:51:54 -0700
From:	Andrew Morton <akpm@...ux-foundation.org>
To:	Ulrich Drepper <drepper@...hat.com>
Cc:	Andi Kleen <andi@...stfloor.org>, Rik van Riel <riel@...hat.com>,
	Linux Kernel <linux-kernel@...r.kernel.org>,
	Jakub Jelinek <jakub@...hat.com>, linux-mm@...ck.org,
	Hugh Dickins <hugh@...itas.com>
Subject: Re: missing madvise functionality

On Tue, 03 Apr 2007 13:17:09 -0700
Ulrich Drepper <drepper@...hat.com> wrote:

> Andrew Morton wrote:
> > Ulrich, could you suggest a little test app which would demonstrate this
> > behaviour?
> 
> It's not really reliably possible to demonstrate this with a small
> program using malloc.  You'd need something like this mysql test case
> which Rik said is not hard to run by yourself.
> 
> If somebody adds a kernel interface I can easily produce a glibc patch
> so that the test can be run in the new environment.
> 
> But it's of course easy enough to simulate the specific problem in a
> micro benchmark.  If you want that let me know.
> 
> 
> > Question:
> > 
> >>   - if an access to a page in the range happens in the future it must
> >>     succeed.  The old page content can be provided or a new, empty page
> >>     can be provided
> > 
> > How important is this "use the old page if it is available" feature?  If we
> > were to simply implement a fast unconditional-free-the-page, so that
> > subsequent accesses always returned a new, zeroed page, do we expect that
> > this will be a 90%-good-enough thing, or will it be significantly
> > inefficient?
> 
> My guess is that the page fault you'd get for every single page is a
> huge part of the problem.  If you don't free the pages and just leave
> them in the process, processes which quickly reuse the memory pool will
> experience no noticeable slowdown.  The only difference between not
> freeing the memory and doing it is that one madvise() syscall.
> 
> If you unconditionally free the page you save the later mprotect() call
> (one mmap_sem lock saved).  But does every page fault then later
> require the semaphore?  Even if not, the additional kernel entry is a
> killer.

Oh.  I was assuming that we'd want to unmap these pages from pagetables and
mark them super-easily-reclaimable.  So a later touch would incur a minor
fault.
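
The per-page fault cost can be observed from userspace with the existing
MADV_DONTNEED, which is the closest current analogue of the "unmap and
refault" behaviour described above.  The following is a minimal sketch, not
the proposed interface: the helper name refault_after_dontneed() is
hypothetical, and it assumes Linux semantics for private anonymous mappings
(a touch after MADV_DONTNEED yields a zero-filled page).

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/resource.h>
#include <unistd.h>

/*
 * Hypothetical helper (illustration only): dirty npages of anonymous
 * memory, drop them with MADV_DONTNEED, then touch every page again.
 * Returns the number of minor faults taken during the re-touch, or -1
 * on error.  *all_zero is set to 1 if every page read back as zero.
 */
static long refault_after_dontneed(size_t npages, int *all_zero)
{
    assert(npages > 0);

    size_t psz = (size_t)sysconf(_SC_PAGESIZE);
    size_t len = npages * psz;
    unsigned char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                            MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -1;

    memset(p, 0xaa, len);       /* fault every page in and dirty it */

    if (madvise(p, len, MADV_DONTNEED) != 0) {
        munmap(p, len);
        return -1;
    }

    struct rusage before, after;
    getrusage(RUSAGE_SELF, &before);

    *all_zero = 1;
    for (size_t i = 0; i < npages; i++) {
        volatile unsigned char *q = p + i * psz;
        if (*q != 0)            /* old contents are gone */
            *all_zero = 0;
        *q = 1;                 /* reuse the page, as an allocator would */
    }

    getrusage(RUSAGE_SELF, &after);
    munmap(p, len);
    return after.ru_minflt - before.ru_minflt;
}
```

Calling refault_after_dontneed(64, &z) and printing the result should
typically show at least one minor fault per page on Linux, which is the
extra kernel entry being objected to in the quoted text.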

But you think that we should leave them mapped into pagetables so no such
fault occurs.

I guess we can still do that - if we follow the "this is just like clean
swapcache" concept, things should just work.

Leaving the pages mapped into pagetables means that they are considerably
less likely to be reclaimed.
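
Later kernels (Linux 4.5 onward) gained MADV_FREE, whose semantics match
the "old page or new zeroed page" requirement quoted above: the pages stay
mapped and are only discarded under memory pressure.  The sketch below
assumes such a kernel and headers that define MADV_FREE; the helper name
read_after_madv_free() is hypothetical, and it reports -1 where the flag is
unavailable rather than guessing.

```c
#define _DEFAULT_SOURCE
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical helper (illustration only): fill one anonymous page with
 * `fill`, mark it MADV_FREE, then read the first byte back.
 * Returns the byte read (either `fill` or 0, depending on whether the
 * kernel has reclaimed the page), or -1 if MADV_FREE is unavailable or
 * a call fails.
 */
static int read_after_madv_free(unsigned char fill)
{
#ifndef MADV_FREE
    (void)fill;
    return -1;                  /* headers predate MADV_FREE */
#else
    size_t psz = (size_t)sysconf(_SC_PAGESIZE);
    unsigned char *p = mmap(NULL, psz, PROT_READ | PROT_WRITE,
                            MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -1;

    memset(p, fill, psz);       /* page is now present and dirty */

    if (madvise(p, psz, MADV_FREE) != 0) {
        munmap(p, psz);
        return -1;              /* e.g. kernel without MADV_FREE support */
    }

    /* The page is still mapped; this read sees old data or zeroes. */
    int value = *(volatile unsigned char *)p;

    munmap(p, psz);
    return value;
#endif
}
```

With no memory pressure the read will usually return the old fill byte
without the allocator having paid for a fresh zeroed page, but the caller
must be prepared for either outcome.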

But whatever we do, with the current MM design we need to at least take the
mmap_sem for reading so we can descend the vma tree and locate the
pageframes.  And if that locking is the main problem then none of this is
likely to help.