Message-ID: <CALCETrVHgvhAN3neoOpJEk94uM7QKm2izZpp+=1UA6qieaQiTQ@mail.gmail.com>
Date: Tue, 30 Sep 2014 10:49:36 -0700
From: Andy Lutomirski <luto@...capital.net>
To: Daniel Micay <danielmicay@...il.com>
Cc: "linux-mm@...ck.org" <linux-mm@...ck.org>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
Andrew Morton <akpm@...ux-foundation.org>,
Jason Evans <jasone@...onware.com>,
Linux API <linux-api@...r.kernel.org>
Subject: Re: [PATCH v3] mm: add mremap flag for preserving the old mapping
On Sep 30, 2014 2:36 AM, "Daniel Micay" <danielmicay@...il.com> wrote:
>
> On 30/09/14 01:53 AM, Andy Lutomirski wrote:
> > On Mon, Sep 29, 2014 at 9:55 PM, Daniel Micay <danielmicay@...il.com> wrote:
> >> This introduces the MREMAP_RETAIN flag for preserving the source mapping
> >> when MREMAP_MAYMOVE moves the pages to a new destination. Accesses to
> >> the source location will fault and cause fresh pages to be mapped in.
> >>
> >> For consistency, the old_len >= new_len case could decommit the pages
> >> instead of unmapping. However, userspace can accomplish the same thing
> >> via madvise and a coherent definition of the flag is possible without
> >> the extra complexity.
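
For contrast with the proposed flag, here is a minimal sketch of what mremap does today when a mapping moves: the source range is simply unmapped. It relies only on existing Linux interfaces. MREMAP_FIXED is used purely to force a move, and mincore is used to probe whether the old address is still mapped. The helper names page_is_mapped and demo_move_unmaps_source are hypothetical and are not part of the patch.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: 1 if the page at addr is mapped, 0 if it is not,
 * -1 on any other error. mincore fails with ENOMEM on unmapped ranges. */
static int page_is_mapped(void *addr)
{
    unsigned char vec;
    long page = sysconf(_SC_PAGESIZE);
    assert(page > 0);
    if (mincore(addr, (size_t)page, &vec) == 0)
        return 1;
    return errno == ENOMEM ? 0 : -1;
}

/* Returns 0 if a forced move keeps the data and unmaps the source. */
int demo_move_unmaps_source(void)
{
    size_t size = 1024 * 1024;
    char *orig = mmap(NULL, size, PROT_READ | PROT_WRITE,
                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (orig == MAP_FAILED)
        return -1;
    memset(orig, 5, size);

    /* Reserve a destination so that MREMAP_FIXED has a safe target. */
    char *target = mmap(NULL, size * 2, PROT_NONE,
                        MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (target == MAP_FAILED)
        return -1;

    char *moved = mremap(orig, size, size * 2,
                         MREMAP_MAYMOVE | MREMAP_FIXED, target);
    if (moved == MAP_FAILED)
        return -1;

    int ok = moved == target
          && moved[0] == 5 && moved[size - 1] == 5  /* contents moved */
          && moved[size] == 0                       /* extension is zero-filled */
          && page_is_mapped(orig) == 0;             /* source is gone */

    munmap(moved, size * 2);
    return ok ? 0 : 1;
}
```

With the flag under discussion, the last check is the part that would differ: the source would stay mapped and fault in fresh pages rather than disappear.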
> >
> > IMO this needs very clear documentation of exactly what it does.
>
> Agreed, and thanks for the review. I'll post a slightly modified version
> of the patch soon (mostly more commit message changes).
>
> > Does it preserve the contents of the source pages? (If so, why?
> > Aren't you wasting a bunch of time on page faults and possibly
> > unnecessary COWs?)
>
> The source will act as if it was just created. For an anonymous memory
> mapping, it will fault on any accesses and bring in new zeroed pages.
>
> In jemalloc, it replaces an enormous memcpy(dst, src, size) followed by
> madvise(src, size, MADV_DONTNEED) with mremap. Using mremap also ends up
> eliding page faults from writes at the destination.
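
The copy-then-purge pattern described above can be sketched with existing calls only. This is a minimal illustration for private anonymous memory, not jemalloc's actual code. The names grow_by_copy and demo_copy_and_purge are hypothetical.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>

/* Hypothetical helper: grow a region by allocating a new mapping, copying
 * the old contents, and purging the source with MADV_DONTNEED.
 * Returns the new region, or NULL on failure. */
static void *grow_by_copy(void *src, size_t old_size, size_t new_size)
{
    assert(old_size <= new_size);
    void *dst = mmap(NULL, new_size, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (dst == MAP_FAILED)
        return NULL;

    /* The copy writes to, and therefore faults in, every destination page. */
    memcpy(dst, src, old_size);

    /* Drop the source pages. The range stays mapped, and for private
     * anonymous memory the next access faults in zero-filled pages. */
    if (madvise(src, old_size, MADV_DONTNEED) != 0) {
        munmap(dst, new_size);
        return NULL;
    }
    return dst;
}

/* Returns 0 if the destination holds the data and the source reads as zero. */
int demo_copy_and_purge(void)
{
    size_t size = 1024 * 1024;
    char *src = mmap(NULL, size, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (src == MAP_FAILED)
        return -1;
    memset(src, 5, size);

    char *dst = grow_by_copy(src, size, size * 2);
    if (dst == NULL)
        return -1;

    int ok = 1;
    for (size_t i = 0; i < size; i++)
        if (dst[i] != 5 || src[i] != 0)
            ok = 0;

    munmap(src, size);
    munmap(dst, size * 2);
    return ok ? 0 : 1;
}
```

The end state here (data at the destination, zero-filled source still mapped) is what the flag is meant to reach with a single mremap call, without the copy and without the write faults at the destination.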
>
> TCMalloc has nearly the same page allocation design, although it tries
> to throttle the purging so it won't always gain as much.
>
> > Does it work on file mappings? Can it extend file mappings while it moves them?
>
> It works on file mappings. If a move occurs, there will be the usual
> extended destination mapping but with the source mapping left intact.
>
> It wouldn't be useful with existing allocators, but in theory a general
> purpose allocator could expose an MMIO API in order to reuse the same
> address space via MAP_FIXED/MREMAP_FIXED to reduce VM fragmentation.
>
> > If you MREMAP_RETAIN a partially COWed private mapping, what happens?
>
> The original mapping is zeroed in the following test, as it would be
> without fork:
>
> #define _GNU_SOURCE
>
> #include <string.h>
> #include <stdlib.h>
> #include <sys/mman.h>
> #include <unistd.h>
> #include <sys/wait.h>
>
> int main(void) {
>     size_t size = 1024 * 1024;
>     char *orig = mmap(NULL, size, PROT_READ|PROT_WRITE,
>                       MAP_PRIVATE|MAP_ANONYMOUS, -1, 0);
>     memset(orig, 5, size);
>     int pid = fork();
>     if (pid == -1)
>         return 1;
>     if (pid == 0) {
>         /* Write in the child so that part of the mapping is COWed. */
>         memset(orig, 5, 1024);
>         /* 4 is presumably the value of the proposed MREMAP_RETAIN flag. */
>         char *new = mremap(orig, size, size * 128, MREMAP_MAYMOVE|4);
>         if (new == MAP_FAILED || new == orig) return 1;
>         /* The destination must hold the original contents. */
>         for (size_t i = 0; i < size; i++)
>             if (new[i] != 5)
>                 return 1;
>         /* The retained source must read back as zeroes. */
>         for (size_t i = 0; i < size; i++)
>             if (orig[i] != 0)
>                 return 1;
>         return 0;
>     }
>     int status;
>     if (wait(&status) == -1) return 1;
>     if (WIFEXITED(status))
>         return WEXITSTATUS(status);
>     return 1;
> }
>
> Hopefully this is the case you're referring to. :)

What about private file mappings?

>
> > Does it work on special mappings? If so, please prevent it from doing
> > so. mremapping x86's vdso is a thing, and duplicating x86's vdso
> > should not become a thing, because x86_32 in particular will become
> > extremely confused.
>
> I'll add a check for arch_vma_name(vma) == NULL.

Careful! That function is deprecated in favor of vm_ops->name.

I think it might pay to add an explicit vm_op to authorize
duplication, especially for non-cow mappings. IOW this kind of
extension seems quite magical for anything that doesn't have the
normal COW semantics, including for plain old read-only mappings.

>
> There's an existing check for VM_DONTEXPAND | VM_PFNMAP when expanding
> allocations (the only case this flag impacts). Are there other kinds of
> special mappings that you're referring to?

I was referring to special mappings in the install_special_mapping
sense. Those may or may not have VM_PFNMAP set.

If VM_DONTEXPAND blocks this new feature entirely, that's probably good.

--Andy