Message-ID: <20200922175415.GI19098@xz-x1>
Date: Tue, 22 Sep 2020 13:54:15 -0400
From: Peter Xu <peterx@...hat.com>
To: Jason Gunthorpe <jgg@...pe.ca>
Cc: John Hubbard <jhubbard@...dia.com>, linux-mm@...ck.org,
linux-kernel@...r.kernel.org,
Andrew Morton <akpm@...ux-foundation.org>,
Jan Kara <jack@...e.cz>, Michal Hocko <mhocko@...e.com>,
Kirill Tkhai <ktkhai@...tuozzo.com>,
Kirill Shutemov <kirill@...temov.name>,
Hugh Dickins <hughd@...gle.com>,
Christoph Hellwig <hch@....de>,
Andrea Arcangeli <aarcange@...hat.com>,
Oleg Nesterov <oleg@...hat.com>,
Leon Romanovsky <leonro@...dia.com>,
Linus Torvalds <torvalds@...ux-foundation.org>,
Jann Horn <jannh@...gle.com>
Subject: Re: [PATCH 1/5] mm: Introduce mm_struct.has_pinned
On Tue, Sep 22, 2020 at 01:10:46PM -0300, Jason Gunthorpe wrote:
> On Tue, Sep 22, 2020 at 11:17:36AM -0400, Peter Xu wrote:
>
> > > But it's admittedly a cosmetic point, combined with my perennial fear that
> > > I'm missing something when I look at a READ_ONCE()/WRITE_ONCE() pair. :)
> >
> > Yeah, but I hope I'm using it right. :) I used READ_ONCE/WRITE_ONCE explicitly
> > because I think they're cheaper than atomic operations (which, iiuc, will lock
> > the bus).
>
> It is worth thinking a bit about racing fork with
> pin_user_pages(). The desired outcome is:
>
> If fork wins the page is write protected, and pin_user_pages_fast()
> will COW it.
>
> If pin_user_pages_fast() wins then fork must see the READ_ONCE and
> the pin.
>
> As get_user_pages_fast() is lockless it looks like the ordering has to
> be like this:
>
> pin_user_pages_fast()                 fork()
>  atomic_set(has_pinned, 1);
>  [..]
>  atomic_add(page->_refcount)
>  ordered check write protect()
>                                        ordered set write protect()
>                                        atomic_read(page->_refcount)
>                                        atomic_read(has_pinned)
>
> Such that in all the degenerate racy cases the outcome is that both
> sides COW, never neither.
>
> Thus I think it does have to be atomics purely from an ordering
> perspective: observing an increased _refcount requires that has_pinned
> != 0 if we are pinning.
>
> So, to make this 100% correct, this ordering will need to be touched up too.
Thanks for spotting this. So something like the change below should work, right?
diff --git a/mm/memory.c b/mm/memory.c
index 8f3521be80ca..6591f3f33299 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -888,8 +888,8 @@ copy_one_pte(struct mm_struct *dst_mm, struct mm_struct *src_mm,
* Because we'll need to release the locks before doing cow,
* pass this work to upper layer.
*/
- if (READ_ONCE(src_mm->has_pinned) && wp &&
- page_maybe_dma_pinned(page)) {
+ if (wp && page_maybe_dma_pinned(page) &&
+ READ_ONCE(src_mm->has_pinned)) {
/* We've got the page already; we're safe */
data->cow_old_page = page;
data->cow_oldpte = *src_pte;
I can also add a comment to emphasize this ordering.
I think the WRITE_ONCE/READ_ONCE can actually be kept, because the atomic ops
should already contain the proper memory barriers, so the ordering of the memory
accesses should be guaranteed (e.g., atomic_add() will have an implicit wmb(),
with an rmb() on the other side). However, it may be even simpler to change
has_pinned into an atomic, as John suggested. Thanks,
--
Peter Xu