Message-ID: <20200915145040.GA2949@xz-x1>
Date: Tue, 15 Sep 2020 10:50:40 -0400
From: Peter Xu <peterx@...hat.com>
To: Jason Gunthorpe <jgg@...pe.ca>
Cc: Linus Torvalds <torvalds@...ux-foundation.org>,
Leon Romanovsky <leonro@...dia.com>,
Linux-MM <linux-mm@...ck.org>,
Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
"Maya B . Gokhale" <gokhale2@...l.gov>,
Yang Shi <yang.shi@...ux.alibaba.com>,
Marty Mcfadden <mcfadden8@...l.gov>,
Kirill Shutemov <kirill@...temov.name>,
Oleg Nesterov <oleg@...hat.com>, Jann Horn <jannh@...gle.com>,
Jan Kara <jack@...e.cz>, Kirill Tkhai <ktkhai@...tuozzo.com>,
Andrea Arcangeli <aarcange@...hat.com>,
Christoph Hellwig <hch@....de>,
Andrew Morton <akpm@...ux-foundation.org>
Subject: Re: [PATCH 1/4] mm: Trial do_wp_page() simplification
Hi, all,
I've attached another version of the FOLL_PIN enforced COW patch, just in case
it is still anywhere close to useful (though I now highly doubt it, considering
the below...). I took care of !USERFAULTFD as suggested by Leon, and also the
fast gup path.
However...
On Mon, Sep 14, 2020 at 08:28:51PM -0300, Jason Gunthorpe wrote:
> Yes, this stuff does pin_user_pages_fast() and MADV_DONTFORK
> together. It sets FOLL_FORCE and FOLL_WRITE to get an exclusive copy
> of the page and MADV_DONTFORK was needed to ensure that a future fork
> doesn't establish a COW that would break the DMA by moving the
> physical page over to the fork. DMA should stay with the process that
> called pin_user_pages_fast() (Is MADV_DONTFORK still needed with
> recent years work to GUP/etc? It is a pretty terrible ancient thing)
... Now I'm more confused about what has happened.
If we're using FORCE|WRITE, iiuc it should guarantee that the page triggers
COW during gup even if it is shared, so there is no problem on the gup side.
Then I'm quite confused about why the write bit is not set once the COW has
triggered.
E.g., in wp_page_copy(), if I'm not wrong, the write bit is only controlled by
the line below (leaving aside the fix patch, though I believe the rdma test
has nothing to do with uffd-wp after all, so it should be the same anyway):
    entry = maybe_mkwrite(pte_mkdirty(entry), vma);
It means that as long as the rdma region has VM_WRITE set (and I can think of
no reason why it wouldn't...), the COWed page entry should have the write bit
set. If so, the page should be stable, and I don't understand why another COW
could even trigger, or how the code path in the "trial cow" patch is reached.
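
To make the reasoning above concrete, here is a minimal userspace sketch of
that logic. It is not the kernel code: every model_* name and the
MODEL_VM_WRITE value are hypothetical stand-ins for illustration, and the only
thing it assumes about the real maybe_mkwrite() is that it sets the write bit
only when the VMA has VM_WRITE.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for VM_WRITE; the value is illustrative only. */
#define MODEL_VM_WRITE 0x2UL

/* Toy VMA and PTE: only the bits relevant to this discussion. */
struct model_vma { unsigned long vm_flags; };
struct model_pte { bool dirty; bool write; };

static struct model_pte model_pte_mkdirty(struct model_pte pte)
{
    pte.dirty = true;
    return pte;
}

static struct model_pte model_pte_mkwrite(struct model_pte pte)
{
    pte.write = true;
    return pte;
}

/* Assumed behaviour: set the write bit only if the VMA allows writes. */
static struct model_pte model_maybe_mkwrite(struct model_pte pte,
                                            const struct model_vma *vma)
{
    if (vma->vm_flags & MODEL_VM_WRITE)
        pte = model_pte_mkwrite(pte);
    return pte;
}

/* Models: entry = maybe_mkwrite(pte_mkdirty(entry), vma); */
static struct model_pte model_cow_entry(const struct model_vma *vma)
{
    struct model_pte entry = { false, false };

    assert(vma != NULL);
    return model_maybe_mkwrite(model_pte_mkdirty(entry), vma);
}
```

With a VMA whose flags include MODEL_VM_WRITE, model_cow_entry() returns an
entry that is both dirty and writable; with flags of 0 the entry is dirty but
not writable, which is the only case in this model where a later write would
fault again.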
Or is the VMA without VM_WRITE for some reason? Sorry, I know next to nothing
about RDMA, so more information on that side might help too. E.g., does the
hardware also walk the software process page table when doing RDMA (or is an
IOMMU page table used, or neither)?
Thanks,
--
Peter Xu