[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20200918164032.GA5962@xz-x1>
Date: Fri, 18 Sep 2020 12:40:32 -0400
From: Peter Xu <peterx@...hat.com>
To: Linus Torvalds <torvalds@...ux-foundation.org>
Cc: Jason Gunthorpe <jgg@...pe.ca>, John Hubbard <jhubbard@...dia.com>,
Leon Romanovsky <leonro@...dia.com>,
Linux-MM <linux-mm@...ck.org>,
Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
"Maya B . Gokhale" <gokhale2@...l.gov>,
Yang Shi <yang.shi@...ux.alibaba.com>,
Marty Mcfadden <mcfadden8@...l.gov>,
Kirill Shutemov <kirill@...temov.name>,
Oleg Nesterov <oleg@...hat.com>, Jann Horn <jannh@...gle.com>,
Jan Kara <jack@...e.cz>, Kirill Tkhai <ktkhai@...tuozzo.com>,
Andrea Arcangeli <aarcange@...hat.com>,
Christoph Hellwig <hch@....de>,
Andrew Morton <akpm@...ux-foundation.org>
Subject: Re: [PATCH 1/4] mm: Trial do_wp_page() simplification
On Thu, Sep 17, 2020 at 12:51:49PM -0700, Linus Torvalds wrote:
> On Thu, Sep 17, 2020 at 12:38 PM Jason Gunthorpe <jgg@...pe.ca> wrote:
> >
> > Looking for awhile, this now looks reasonable and
> > doable. page_maybe_dma_pinned() was created for exactly this kind of
> > case.
> >
> > I've attached a dumb sketch for the pte level (surely wrong! I have
> > never looked at this part of the mm before!) at the end of this
> > message.
>
> This looks conceptually fine to me.
>
> But as mentioned, I think I'd be even happier if we added a "thsi vma
> has seen a page pin event" flag to the vma flags, and didn't rely
> _just_ on the page_maybe_dma_pinned() check, which migth be triggered
> by those fork-happy loads.
>
> Side note: I wonder if that COW mapping check could be entirely within
> that vm_normal_page() path.
>
> Because how could a non-normal page be a COW page and not already
> write-protected?
I feel like we still need to keep those to cover !page case for PFNMAP.
I tried to draft a patch, however I do see some other issues, so I'd like to
discuss here before mindlessly going on with it.
Firstly in the draft patch mm->has_pinned is introduced and it's written to 1
as long as FOLL_GUP is called once. It's never reset after set.
One more thing (I think) we need to do is to pass the new vma from
copy_page_range() down into the end because if we want to start cow during
fork() then we need to operate on that new vma too when new page linked to it
rather than the parent's.
One issue is when we charge for cgroup we probably can't do that onto the new
mm/task, since copy_namespaces() is called after copy_mm(). I don't know
enough about cgroup, I thought the child will inherit the parent's, but I'm not
sure. Or, can we change that order of copy_namespaces() && copy_mm()? I don't
see a problem so far but I'd like to ask first..
The other thing is on how to fail. E.g., when COW failed due to either
charging of cgroup or ENOMEM, ideally we should fail fork() too. Though that
might need more changes - current patch silently kept the shared page for
simplicity.
The draft patch is attached for reference. It should cover the small pages,
but at least it needs to cover huge pages and also a well written commit log.
Thanks,
--
Peter Xu
View attachment "0001-mm-Do-early-cow-for-FOLL_PIN-pages-during-fork.patch" of type "text/plain" (10674 bytes)
Powered by blists - more mailing lists