Message-ID: <4612B645.7030902@redhat.com>
Date:	Tue, 03 Apr 2007 13:17:09 -0700
From:	Ulrich Drepper <drepper@...hat.com>
To:	Andrew Morton <akpm@...ux-foundation.org>
CC:	Andi Kleen <andi@...stfloor.org>, Rik van Riel <riel@...hat.com>,
	Linux Kernel <linux-kernel@...r.kernel.org>,
	Jakub Jelinek <jakub@...hat.com>, linux-mm@...ck.org,
	Hugh Dickins <hugh@...itas.com>
Subject: Re: missing madvise functionality

Andrew Morton wrote:
> Ulrich, could you suggest a little test app which would demonstrate this
> behaviour?

It's not really possible to reliably demonstrate this with a small
program using malloc.  You'd need something like the mysql test case
which Rik said is not hard to run yourself.

If somebody adds a kernel interface I can easily produce a glibc patch
so that the test can be run in the new environment.

But it's of course easy enough to simulate the specific problem in a
micro benchmark.  If you want that let me know.


> Question:
> 
>>   - if an access to a page in the range happens in the future it must
>>     succeed.  The old page content can be provided or a new, empty page
>>     can be provided
> 
> How important is this "use the old page if it is available" feature?  If we
> were to simply implement a fast unconditional-free-the-page, so that
> subsequent accesses always returned a new, zeroed page, do we expect that
> this will be a 90%-good-enough thing, or will it be significantly
> inefficient?

My guess is that the page fault you'd get for every single page is a
huge part of the problem.  If you don't free the pages and just leave
them in the process, processes which quickly reuse the memory pool will
experience no noticeable slowdown.  The only difference between not
freeing the memory and doing it is that one madvise() syscall.

If you unconditionally free the pages you save the later mprotect()
call (one mmap_sem lock saved).  But does every page fault then later
require the semaphore?  Even if not, the additional kernel entry is a
killer.


> So perhaps we can do something like chop swapper_space in half: the lower
> 50% represent offsets which have a swap mapping and the upper 50% are fake
> swapcache pages which don't actually consume swapspace.  These pages are
> unmapped from pagetables, marked clean, added to the fake part of
> swapper_space and are deactivated.  Teach the low-level swap code to ignore
> the request to free physical swapspace when these pages are released.
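If I understand the split correctly, it amounts to stealing the top bit
of the swap offset as a "fake" marker.  A rough sketch, with made-up
names and the 27-bit i386 figure from below:

```c
/* Sketch of the swapper_space split: the top bit of a (hypothetically
 * 27-bit) swap offset marks a fake swapcache entry that consumes no
 * real swap space.  All names here are illustrative, not kernel code. */
#include <stdint.h>

#define SWP_OFFSET_BITS 27
#define SWP_FAKE_BIT    (UINT32_C(1) << (SWP_OFFSET_BITS - 1))
#define SWP_OFFSET_MASK (SWP_FAKE_BIT - 1)

static inline uint32_t make_fake(uint32_t off) { return off | SWP_FAKE_BIT; }
static inline int      is_fake(uint32_t e)     { return (e & SWP_FAKE_BIT) != 0; }
static inline uint32_t raw_offset(uint32_t e)  { return e & SWP_OFFSET_MASK; }
```

The low-level swap code would then check is_fake() before releasing
physical swap space.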

Sounds good to me.


> This would all halve the maximum amount of swap which can be used.  iirc
> i386 supports 27 bits of swapcache indexing, and 26 bits is 274GB, which
> is hopefully enough..

Boo hoo, poor 32-bit machines.  People with demands of > 274G should get
a real machine instead.

-- 
➧ Ulrich Drepper ➧ Red Hat, Inc. ➧ 444 Castro St ➧ Mountain View, CA ❖

