Message-ID: <20200917112538.GD8409@ziepe.ca>
Date: Thu, 17 Sep 2020 08:25:38 -0300
From: Jason Gunthorpe <jgg@...pe.ca>
To: Peter Xu <peterx@...hat.com>
Cc: John Hubbard <jhubbard@...dia.com>,
Linus Torvalds <torvalds@...ux-foundation.org>,
Leon Romanovsky <leonro@...dia.com>,
Linux-MM <linux-mm@...ck.org>,
Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
"Maya B . Gokhale" <gokhale2@...l.gov>,
Yang Shi <yang.shi@...ux.alibaba.com>,
Marty Mcfadden <mcfadden8@...l.gov>,
Kirill Shutemov <kirill@...temov.name>,
Oleg Nesterov <oleg@...hat.com>, Jann Horn <jannh@...gle.com>,
Jan Kara <jack@...e.cz>, Kirill Tkhai <ktkhai@...tuozzo.com>,
Andrea Arcangeli <aarcange@...hat.com>,
Christoph Hellwig <hch@....de>,
Andrew Morton <akpm@...ux-foundation.org>
Subject: Re: [PATCH 1/4] mm: Trial do_wp_page() simplification
On Wed, Sep 16, 2020 at 02:46:19PM -0400, Peter Xu wrote:
> My understanding is that this may only work when the fork()ed child has
> exited before we reach here (so we still have mapcount==1 for the
> page).
Yes
> What if not? Then mapcount will be greater than 1, and cow will
> still trigger. Is that what we want?
That case doesn't work today anyhow, so it is fine for it to remain broken.
> Another problem is that, aiui, one of the major changes the previous patch
> proposed is to avoid using lock_page() so that we never block in this path.
I saw you mention this before, but it looks like the change was to
lift some of the atomic reads out of the lock and avoid the lock if
they indicate failure. Also checking page_maybe_dma_pinned() outside
the lock just means that in the rare FOLL_PIN case we will take the
lock again.
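To make that ordering concrete, here is a minimal userspace sketch of "cheap checks first, then trylock, then decide under the lock". Everything named toy_* is hypothetical and only stands in for the kernel helpers; the maybe_pinned flag is a plain boolean rather than anything derived from the refcount, and the check under the lock is simplified to mapcount only.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for struct page; not the kernel layout. */
struct toy_page {
	atomic_int refcount;		/* plays the role of page_count() */
	atomic_int mapcount;		/* plays the role of page_mapcount() */
	atomic_bool maybe_pinned;	/* plays the role of page_maybe_dma_pinned() */
	atomic_flag locked;		/* plays the role of the page lock */
};

enum toy_wp { TOY_WP_COPY, TOY_WP_REUSE };

static void toy_page_init(struct toy_page *p, int refcount, int mapcount,
			  bool pinned)
{
	assert(refcount >= 1 && mapcount >= 1);
	atomic_init(&p->refcount, refcount);
	atomic_init(&p->mapcount, mapcount);
	atomic_init(&p->maybe_pinned, pinned);
	atomic_flag_clear(&p->locked);
}

/* test_and_set returns the old value, so "false" means we took the lock. */
static bool toy_trylock_page(struct toy_page *p)
{
	return !atomic_flag_test_and_set_explicit(&p->locked,
						  memory_order_acquire);
}

static void toy_unlock_page(struct toy_page *p)
{
	atomic_flag_clear_explicit(&p->locked, memory_order_release);
}

static enum toy_wp toy_wp_decide(struct toy_page *p)
{
	enum toy_wp ret;

	/* Lockless pre-check: extra references and no pin means copy. */
	if (atomic_load(&p->refcount) != 1 && !atomic_load(&p->maybe_pinned))
		return TOY_WP_COPY;

	/* Never sleep here: a contended lock also falls back to copying. */
	if (!toy_trylock_page(p))
		return TOY_WP_COPY;

	/* Simplified check under the lock: reuse only if we are the sole mapper. */
	ret = atomic_load(&p->mapcount) == 1 ? TOY_WP_REUSE : TOY_WP_COPY;
	toy_unlock_page(p);
	return ret;
}
```

In this model an unpinned page with extra references never touches the lock, and only a page that looks pinned pays for the trylock, which is the rare case described above.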
> Maybe even more complicated, because "correctness" should be even harder
> than "best effort reuse" since it can cause data corruption if we didn't do it
> right...
The only correct way is for the application to avoid write protect on
FOLL_PIN pages. The purpose here is to allow applications that have not
hit "bad luck" and failed so far to keep working.
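For a fork()ing application, one existing way to keep a buffer from being write protected at fork() is madvise(MADV_DONTFORK), which leaves the range out of the child entirely. The sketch below assumes that this is the kind of avoidance meant here. The helper name is hypothetical, and no real pinning is done (the place where a driver would take a FOLL_PIN reference is only marked by a comment).

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical demo: mark a buffer MADV_DONTFORK, fork, then write to it
 * in the parent. Returns 0 on success and -1 on any failure.
 */
static int dontfork_demo(void)
{
	long pagesz = sysconf(_SC_PAGESIZE);
	size_t len;
	char *buf;
	pid_t pid;
	int status, ok;

	assert(pagesz > 0);
	len = (size_t)pagesz;

	buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
		   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (buf == MAP_FAILED)
		return -1;
	memset(buf, 0xAB, len);		/* fault the page in */

	/* Keep this range out of any future child, so fork() does not share it. */
	if (madvise(buf, len, MADV_DONTFORK) != 0) {
		munmap(buf, len);
		return -1;
	}

	/* A real program would hand buf to a driver for pinning here. */

	pid = fork();
	if (pid < 0) {
		munmap(buf, len);
		return -1;
	}
	if (pid == 0)
		_exit(0);		/* the child has no mapping of buf at all */

	buf[0] = 0x42;			/* parent write after fork() */

	if (waitpid(pid, &status, 0) != pid) {
		munmap(buf, len);
		return -1;
	}
	ok = WIFEXITED(status) && WEXITSTATUS(status) == 0 &&
	     buf[0] == 0x42 && (unsigned char)buf[1] == 0xAB;
	munmap(buf, len);
	return ok ? 0 : -1;
}
```

The demo only shows that the parent keeps working on the buffer after fork(). It does not prove anything about the fault path itself.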
Another thought: should we also insert a warning print here saying that
the program is working improperly? At least it would give a transition
period to evaluate the extent of the problem.
We are thinking it is going to be a notable regression.
I botched the last version of the patch, here is something a bit
better.
Does it seem like it could be OK? I know very little about this part
of the kernel.
Thanks,
Jason
diff --git a/mm/memory.c b/mm/memory.c
index 469af373ae76e1..332de777854f8b 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -2889,6 +2889,24 @@ static vm_fault_t wp_page_shared(struct vm_fault *vmf)
 	return ret;
 }

+static bool cow_needed(struct vm_fault *vmf)
+{
+	int total_map_swapcount;
+
+	if (!reuse_swap_page(vmf->page, &total_map_swapcount))
+		return true;
+
+	if (total_map_swapcount == 1) {
+		/*
+		 * The page is all ours. Move it to our anon_vma so the rmap
+		 * code will not search our parent or siblings. Protected
+		 * against the rmap code by the page lock.
+		 */
+		page_move_anon_rmap(vmf->page, vmf->vma);
+	}
+	return false;
+}
+
 /*
  * This routine handles present pages, when users try to write
  * to a shared page. It is done by copying the page to a new address
@@ -2942,13 +2960,27 @@ static vm_fault_t do_wp_page(struct vm_fault *vmf)
 		struct page *page = vmf->page;

 		/* PageKsm() doesn't necessarily raise the page refcount */
-		if (PageKsm(page) || page_count(page) != 1)
+		if (PageKsm(page))
 			goto copy;
+		if (page_count(page) != 1) {
+			/*
+			 * If the page is DMA pinned we can't rely on the
+			 * above to know if there are other CPU references as
+			 * page_count() will be elevated by the
+			 * pin. Needlessly copying the page will cause the DMA
+			 * pin to break, try harder to avoid that.
+			 */
+			if (!page_maybe_dma_pinned(page))
+				goto copy;
+		}
+
 		if (!trylock_page(page))
 			goto copy;
 		if (PageKsm(page) || page_mapcount(page) != 1 || page_count(page) != 1) {
-			unlock_page(page);
-			goto copy;
+			if (cow_needed(vmf)) {
+				unlock_page(page);
+				goto copy;
+			}
 		}
 		/*
 		 * Ok, we've got the only map reference, and the only
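
The comment in the second hunk relies on page_count() being elevated by the pin. A small userspace model of that, assuming the bias-based scheme used for ordinary (non-huge) pages, where a FOLL_PIN reference adds GUP_PIN_COUNTING_BIAS (taken to be 1024 here) to the refcount and the "maybe pinned" test is a threshold compare. The pin_model_* names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed value of the kernel's GUP_PIN_COUNTING_BIAS (1U << 10). */
#define PIN_MODEL_BIAS 1024

/* Hypothetical model: only the refcount matters for this heuristic. */
struct pin_model {
	int refcount;
};

/* An ordinary reference, as taken by get_page() or FOLL_GET. */
static void pin_model_get(struct pin_model *p)
{
	p->refcount += 1;
}

/* A FOLL_PIN style reference: one pin adds the whole bias. */
static void pin_model_pin(struct pin_model *p)
{
	p->refcount += PIN_MODEL_BIAS;
}

/*
 * Threshold test: true for any pinned page, but also true for a page
 * that merely collected PIN_MODEL_BIAS ordinary references. That false
 * positive is why the answer is only "maybe".
 */
static bool pin_model_maybe_pinned(const struct pin_model *p)
{
	assert(p->refcount >= 0);
	return p->refcount >= PIN_MODEL_BIAS;
}
```

With one mapping reference plus one pin the count is 1025, so a page_count(page) != 1 test alone would always choose the copy path for a pinned page. That is the case the extra page_maybe_dma_pinned() check is meant to catch.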