linux-kernel - Re: [PATCH v3] mm: add mremap flag for preserving the old mapping


Message-ID: <CALCETrVHgvhAN3neoOpJEk94uM7QKm2izZpp+=1UA6qieaQiTQ@mail.gmail.com>
Date:	Tue, 30 Sep 2014 10:49:36 -0700
From:	Andy Lutomirski <luto@...capital.net>
To:	Daniel Micay <danielmicay@...il.com>
Cc:	"linux-mm@...ck.org" <linux-mm@...ck.org>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
	Andrew Morton <akpm@...ux-foundation.org>,
	Jason Evans <jasone@...onware.com>,
	Linux API <linux-api@...r.kernel.org>
Subject: Re: [PATCH v3] mm: add mremap flag for preserving the old mapping

On Sep 30, 2014 2:36 AM, "Daniel Micay" <danielmicay@...il.com> wrote:
>
> On 30/09/14 01:53 AM, Andy Lutomirski wrote:
> > On Mon, Sep 29, 2014 at 9:55 PM, Daniel Micay <danielmicay@...il.com> wrote:
> >> This introduces the MREMAP_RETAIN flag for preserving the source mapping
> >> when MREMAP_MAYMOVE moves the pages to a new destination. Accesses to
> >> the source location will fault and cause fresh pages to be mapped in.
> >>
> >> For consistency, the old_len >= new_len case could decommit the pages
> >> instead of unmapping. However, userspace can accomplish the same thing
> >> via madvise and a coherent definition of the flag is possible without
> >> the extra complexity.
> >
> > IMO this needs very clear documentation of exactly what it does.
>
> Agreed, and thanks for the review. I'll post a slightly modified version
> of the patch soon (mostly more commit message changes).
>
> > Does it preserve the contents of the source pages?  (If so, why?
> > Aren't you wasting a bunch of time on page faults and possibly
> > unnecessary COWs?)
>
> The source will act as if it was just created. For an anonymous memory
> mapping, it will fault on any accesses and bring in new zeroed pages.
>
> In jemalloc, it replaces an enormous memcpy(dst, src, size) followed by
> madvise(src, size, MADV_DONTNEED) with mremap. Using mremap also ends up
> eliding page faults from writes at the destination.
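
For reference, the copy-then-purge sequence described above can be sketched in userspace roughly as follows. This is a minimal illustration and not jemalloc's actual code; the names move_by_copy and demo_move_by_copy are hypothetical, and it assumes Linux semantics for MADV_DONTNEED on private anonymous memory (purged pages read back as zeros).

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>

/* Hypothetical helper (not jemalloc code): move `size` bytes into a fresh
 * anonymous mapping by copying, then purge the source pages.
 * Returns the new mapping, or NULL on failure. */
static void *move_by_copy(void *src, size_t size) {
  assert(size > 0);
  void *dst = mmap(NULL, size, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
  if (dst == MAP_FAILED)
    return NULL;
  memcpy(dst, src, size); /* writes, and so faults in, every destination page */
  if (madvise(src, size, MADV_DONTNEED) != 0) {
    munmap(dst, size);
    return NULL;
  }
  return dst;
}

/* Returns 0 if the destination holds the data and the purged source
 * reads back as zeros. */
static int demo_move_by_copy(void) {
  size_t size = 1024 * 1024;
  unsigned char *src = mmap(NULL, size, PROT_READ | PROT_WRITE,
                            MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
  if (src == MAP_FAILED)
    return 1;
  memset(src, 5, size);

  unsigned char *dst = move_by_copy(src, size);
  if (dst == NULL) {
    munmap(src, size);
    return 1;
  }

  int rc = 0;
  for (size_t i = 0; i < size; i++)
    if (dst[i] != 5 || src[i] != 0)
      rc = 1;

  munmap(src, size);
  munmap(dst, size);
  return rc;
}
```

The end state (data at the destination, a zero-filled source) is the same one the test program later in this message checks for the proposed flag; the difference is that here every byte is copied and every destination page is faulted in by the copy.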
>
> TCMalloc has nearly the same page allocation design, although it tries
> to throttle the purging so it won't always gain as much.
>
> > Does it work on file mappings?  Can it extend file mappings while it moves them?
>
> It works on file mappings. If a move occurs, there will be the usual
> extended destination mapping but with the source mapping left intact.
>
> It wouldn't be useful with existing allocators, but in theory a general
> purpose allocator could expose an MMIO API in order to reuse the same
> address space via MAP_FIXED/MREMAP_FIXED to reduce VM fragmentation.
>
> > If you MREMAP_RETAIN a partially COWed private mapping, what happens?
>
> The original mapping is zeroed in the following test, as it would be
> without fork:
>
> #define _GNU_SOURCE
>
> #include <string.h>
> #include <stdlib.h>
> #include <sys/mman.h>
> #include <unistd.h>
> #include <sys/wait.h>
>
> int main(void) {
>   size_t size = 1024 * 1024;
>   char *orig = mmap(NULL, size, PROT_READ|PROT_WRITE,
>                     MAP_PRIVATE|MAP_ANONYMOUS, -1, 0);
>   memset(orig, 5, size);
>   int pid = fork();
>   if (pid == -1)
>     return 1;
>   if (pid == 0) {
>     memset(orig, 5, 1024);
>     /* 4: presumably the value this patch gives MREMAP_RETAIN */
>     char *new = mremap(orig, size, size * 128, MREMAP_MAYMOVE|4);
>     if (new == MAP_FAILED || new == orig) return 1;
>     for (size_t i = 0; i < size; i++)
>       if (new[i] != 5)
>         return 1;
>     for (size_t i = 0; i < size; i++)
>       if (orig[i] != 0)
>         return 1;
>     return 0;
>   }
>   int status;
>   if (wait(&status) == -1) return 1;
>   if (WIFEXITED(status))
>     return WEXITSTATUS(status);
>   return 1;
> }
>
> Hopefully this is the case you're referring to. :)

What about private file mappings?
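
A starting point for such a test might look like the sketch below. It only sets up the case in question, a MAP_PRIVATE file mapping with one page already COWed, and moves it with plain mremap; it does not use the proposed flag, so it says nothing about what a retained source mapping would contain. The function name and the scratch file path are hypothetical.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Returns 0 if the private (COWed) page and the still file-backed page
 * both survive a plain mremap and the file itself is left unmodified. */
static int demo_private_file_cow(void) {
  long page = sysconf(_SC_PAGESIZE);
  assert(page > 0);
  size_t size = 2 * (size_t)page;

  char path[] = "/tmp/mremap-private-XXXXXX"; /* hypothetical scratch file */
  int fd = mkstemp(path);
  if (fd == -1)
    return 1;
  unlink(path); /* the file lives on until fd is closed */

  /* Fill two pages of the file with 'A'. */
  char *buf = malloc(size);
  if (buf == NULL) { close(fd); return 1; }
  memset(buf, 'A', size);
  ssize_t written = write(fd, buf, size);
  free(buf);
  if (written != (ssize_t)size) { close(fd); return 1; }

  char *map = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, 0);
  if (map == MAP_FAILED) { close(fd); return 1; }
  map[0] = 'B'; /* COW the first page only; the second stays file-backed */

  /* Grow the mapping; the kernel may move it or extend it in place. */
  char *moved = mremap(map, size, 2 * size, MREMAP_MAYMOVE);
  if (moved == MAP_FAILED) { munmap(map, size); close(fd); return 1; }

  /* Only the first `size` bytes are backed by file data, so stay within them. */
  char on_disk = 0;
  int ok = moved[0] == 'B' && moved[page] == 'A' &&
           pread(fd, &on_disk, 1, 0) == 1 && on_disk == 'A';

  munmap(moved, 2 * size);
  close(fd);
  return ok ? 0 : 1;
}
```

With the new flag, the open question is what the source range shows afterwards for each of the two pages: the file contents, zeros, or the private copy.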

>
> > Does it work on special mappings?  If so, please prevent it from doing
> > so.  mremapping x86's vdso is a thing, and duplicating x86's vdso
> > should not become a thing, because x86_32 in particular will become
> > extremely confused.
>
> I'll add a check for arch_vma_name(vma) == NULL.

Careful!  That function is deprecated in favor of vm_ops->name.

I think it might pay to add an explicit vm_op to authorize
duplication, especially for non-cow mappings.  IOW this kind of
extension seems quite magical for anything that doesn't have the
normal COW semantics, including for plain old read-only mappings.

>
> There's an existing check for VM_DONTEXPAND | VM_PFNMAP when expanding
> allocations (the only case this flag impacts). Are there other kinds of
> special mappings that you're referring to?

I was referring to special mappings in the install_special_mapping
sense.  Those may or may not have VM_PFNMAP set.

If VM_DONTEXPAND blocks this new feature entirely, that's probably good.

--Andy