[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <1502483265.6577.52.camel@redhat.com>
Date: Fri, 11 Aug 2017 16:27:45 -0400
From: Rik van Riel <riel@...hat.com>
To: Linus Torvalds <torvalds@...ux-foundation.org>
Cc: Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
Michal Hocko <mhocko@...nel.org>,
Mike Kravetz <mike.kravetz@...cle.com>,
linux-mm <linux-mm@...ck.org>,
Florian Weimer <fweimer@...hat.com>, colm@...costs.net,
Andrew Morton <akpm@...ux-foundation.org>,
Kees Cook <keescook@...omium.org>,
Andy Lutomirski <luto@...capital.net>,
Will Drewry <wad@...omium.org>, Ingo Molnar <mingo@...nel.org>,
"Kirill A. Shutemov" <kirill@...temov.name>,
Dave Hansen <dave.hansen@...el.com>,
Linux API <linux-api@...r.kernel.org>,
Matthew Wilcox <willy@...radead.org>
Subject: Re: [PATCH 2/2] mm,fork: introduce MADV_WIPEONFORK
On Fri, 2017-08-11 at 12:42 -0700, Linus Torvalds wrote:
> On Fri, Aug 11, 2017 at 12:19 PM, <riel@...hat.com> wrote:
> > diff --git a/mm/memory.c b/mm/memory.c
> > index 0e517be91a89..f9b0ad7feb57 100644
> > --- a/mm/memory.c
> > +++ b/mm/memory.c
> > @@ -1134,6 +1134,16 @@ int copy_page_range(struct mm_struct
> > *dst_mm, struct mm_struct *src_mm,
> > !vma->anon_vma)
> > return 0;
> >
> > + /*
> > + * With VM_WIPEONFORK, the child inherits the VMA from the
> > + * parent, but not its contents.
> > + *
> > + * A child accessing VM_WIPEONFORK memory will see all
> > zeroes;
> > + * a child accessing VM_DONTCOPY memory receives a
> > segfault.
> > + */
> > + if (vma->vm_flags & VM_WIPEONFORK)
> > + return 0;
> > +
>
> Is this right?
>
> Yes, you don't do the page table copies. Fine. But you leave vma with
> the the anon_vma pointer - doesn't that mean that it's still
> connected
> to the original anonvma chain, and we might end up swapping something
> in?
Swapping something in would require there to be a swap entry in
the page table entries, which we are not copying, so this should
not be a correctness issue.
> And even if that ends up not being an issue, I'd expect that you'd
> want to break the anon_vma chain just to not make it grow
> unnecessarily.
This is a good point. I can send a v4 that skips the anon_vma_fork()
call if VM_WIPEONFORK, and calls anon_vma_prepare(), instead.
> So my gut feel is that doing this in "copy_page_range()" is wrong,
> and
> the logic should be moved up to dup_mmap(), where we can also
> short-circuit the anon_vma chain entirely.
>
> No?
There is another test in copy_page_range already which ends up
skipping the page table copy when it should not be done.
If you want, I can move that test into a should_copy_page_range()
function, and call that from dup_mmap(), skipping the call to
copy_page_range() if should_copy_page_range() returns false.
Having only one of the two sets of tests in dup_mmap(), and
the other in copy_page_range() seems wrong.
Just let me know what you prefer, and I'll put that in v4.
> The madvice() interface looks fine to me.
That was the main reason for adding you to the thread :)
kind regards,
Rik
Powered by blists - more mailing lists