[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date: Wed, 04 Apr 2007 18:04:30 +1000
From: Nick Piggin <nickpiggin@...oo.com.au>
To: Nick Piggin <nickpiggin@...oo.com.au>
CC: Ulrich Drepper <drepper@...hat.com>,
Rik van Riel <riel@...hat.com>,
Andrew Morton <akpm@...ux-foundation.org>,
Linux Kernel <linux-kernel@...r.kernel.org>,
Jakub Jelinek <jakub@...hat.com>,
Linux Memory Management <linux-mm@...ck.org>
Subject: Re: missing madvise functionality
Nick Piggin wrote:
> Ulrich Drepper wrote:
>
>> People might remember the thread about mysql not scaling and pointing
>> the finger quite happily at glibc. Well, the situation is not like that.
>>
>> The problem is glibc has to work around kernel limitations. If the
>> malloc implementation detects that a large chunk of previously allocated
>> memory is now free and unused it wants to return the memory to the
>> system. What we currently have to do is this:
>>
>> to free: mmap(PROT_NONE) over the area
>> to reuse: mprotect(PROT_READ|PROT_WRITE)
>>
>> Yep, that's expensive, both operations need to get locks preventing
>> other threads from doing the same.
>>
>> Some people were quick to suggest that we simply avoid the freeing in
>> many situations (that's what the patch submitted by Yanmin Zhang
>> basically does). That's no solution. One of the very good properties
>> of the current allocator is that it does not use much memory.
>
>
> Does mmap(PROT_NONE) actually free the memory?
>
>
>> A solution for this problem is a madvise() operation with the following
>> property:
>>
>> - the content of the address range can be discarded
>>
>> - if an access to a page in the range happens in the future it must
>> succeed. The old page content can be provided or a new, empty page
>> can be provided
>>
>> That's it. The current MADV_DONTNEED doesn't cut it because it zaps the
>> pages, causing *all* future reuses to create page faults. This is what
>> I guess happens in the mysql test case where the pages where unused and
>> freed but then almost immediately reused. The page faults erased all
>> the benefits of using one mprotect() call vs a pair of mmap()/mprotect()
>> calls.
>
>
> Two questions.
>
> In the case of pages being unused then almost immediately reused, why is
> it a bad solution to avoid freeing? Is it that you want to avoid
> heuristics because in some cases they could fail and end up using memory?
>
> Secondly, why is MADV_DONTNEED bad? How much more expensive is a pagefault
> than a syscall? (including the cost of the TLB fill for the memory access
> after the syscall, of course).
>
> zapping the pages puts them on a nice LIFO cache hot list of pages that
> can be quickly used when the next fault comes in, or used for any other
> allocation in the kernel. Putting them on some sort of reclaim list seems
> a bit pointless.
>
> Oh, also: something like this patch would help out MADV_DONTNEED, as it
> means it can run concurrently with page faults. I think the locking will
> work (but needs forward porting).
BTW. and this way it becomes much more attractive than using mmap/mprotect
can ever be, because they must take mmap_sem for writing always.
You don't actually need to protect the ranges unless running with use after
free debugging turned on, do you?
--
SUSE Labs, Novell Inc.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Powered by blists - more mailing lists