[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <a1b36c3a0908121204q1b59df1fk86afec9d05ec16dc@mail.gmail.com>
Date: Wed, 12 Aug 2009 15:04:51 -0400
From: Bill Speirs <bill.speirs@...il.com>
To: Hugh Dickins <hugh.dickins@...cali.co.uk>
Cc: Nick Piggin <npiggin@...e.de>, linux-kernel@...r.kernel.org,
linux-mm@...ck.org
Subject: Re: vma_merge issue
On Wed, Aug 12, 2009 at 2:26 PM, Hugh Dickins<hugh.dickins@...cali.co.uk> wrote:
> On Mon, 10 Aug 2009, Bill Speirs wrote:
>>
>> I came across an issue where adjacent pages are not properly coalesced
>> together when changing protections on them. This can be shown by doing
>> the following:
>>
>> 1) Map 3 pages with PROT_NONE and MAP_PRIVATE | MAP_ANONYMOUS
>> 2) Set the middle page's protection to PROT_READ | PROT_WRITE
>> 3) Set the middle page's protection back to PROT_NONE
>>
>> You are left with 3 entries in /proc/self/map where you should only
>> have 1. If you only change the protection to PROT_READ in step 2, then
>> it is properly merged together. I noticed in mprotect.c the following
>> comment in the function mprotect_fixup; I'm not sure if it applies or
>> not:
>> /*
>> * If we make a private mapping writable we increase our commit;
>> * but (without finer accounting) cannot reduce our commit if we
>> * make it unwritable again.
> [ the following lines of the comment are not relevant here so I'll delete ]
>> */
>>
>> I think this only applies to setting charged = nrpages; however,
>> VM_ACCOUNT is also added to newflags. Could it be that the adjacent
>> blocks don't have VM_ACCOUNT and so the call to vma_merge cannot merge
>> because the flags for the adjacent vma are not the same?
>
> That's right, and it is working as intended.
>
> To allow people to set up enormous PROT_READ,MAP_PRIVATE mappings
> "for free", we don't account those initially, but only as parts
> are mprotected writable later: at that point they're accounted,
> and marked VM_ACCOUNT so that we know it's been done (and don't
> double account later on).
>
> So your middle page has been accounted (one page added to
> /proc/meminfo's Committed_AS, which isn't allowed to exceed CommitLimit
> if /proc/sys/vm/overcommit_memory is 2 to disable overcommit), but the
> neighbouring pages have not been accounted: so we need separate vmas
> for them, I'm afraid, since that accounting is done per vma.
>
>>
>> Can anyone shed some light on this? While it isn't an issue for 3
>> pages, I'm mmaping 200K+ pages and changing the perms on random pages
>> throughout and then back but I quickly run into the max_map_count when
>> I don't actually need that many mappings.
>
> But that's easily dealt with: just make your mmap PROT_READ|PROT_WRITE,
> which will account for the whole extent; then mprotect it all PROT_NONE,
> which will take you to your previous starting position; then proceed as
> before - the vmas should get merged as they are reset back to PROT_NONE.
> That works, doesn't it?
Unfortunately, that doesn't work. When I mmap pages as PROT_WRITE it
is checked against the CommitLimit and returns with ENOMEM as I'm
mmaping a lot of pages. However, I don't actually want to be charged
for that memory, as I won't be using all of it. This is why I mmap as
PROT_NONE as I'm not charged for it. Then when I set a page to
PROT_WRITE I get charged (which is expected and OK), but then going
back to PROT_NONE I don't get "uncharged". This makes sense as I could
simply PROT_WRITE that page again and I should be charged. However, I
have no way (that I know of) to tell the kernel "I'm done with this
page, don't charge me for it, and set it's protection to PROT_NONE."
I've tried madvise with MADV_DONTNEED but that doesn't seem to remove
the VM_ACCOUNT flag.
I have seen an mm patch that introduces MADV_FREE, which I believe
removes the VM_ACCOUNT flag and decrements the commit charge. Does it
make sense to have this type of functionality? Can I get this same
type of functionality (start without being charged for a page, use it,
then un-use it and remove the charge for it?) currently?
> (I must offer a big thank you: replying to your mail just after writing
> a mail about the ZERO_PAGE, brings me to realize - if I'm not mistaken -
> that we broke the accounting of initially non-writable anonymous areas
> when we stopped using the ZERO_PAGE there, but marked readfaulted pages
> as dirty. Looks like another argument to bring them back.)
I'm not 100% sure what you're talking about with respect to ZERO_PAGE,
but I'm happy to help :-)
Bill-
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Powered by blists - more mailing lists