Message-ID: <20200922175415.GI19098@xz-x1>
Date:   Tue, 22 Sep 2020 13:54:15 -0400
From:   Peter Xu <peterx@...hat.com>
To:     Jason Gunthorpe <jgg@...pe.ca>
Cc:     John Hubbard <jhubbard@...dia.com>, linux-mm@...ck.org,
        linux-kernel@...r.kernel.org,
        Andrew Morton <akpm@...ux-foundation.org>,
        Jan Kara <jack@...e.cz>, Michal Hocko <mhocko@...e.com>,
        Kirill Tkhai <ktkhai@...tuozzo.com>,
        Kirill Shutemov <kirill@...temov.name>,
        Hugh Dickins <hughd@...gle.com>,
        Christoph Hellwig <hch@....de>,
        Andrea Arcangeli <aarcange@...hat.com>,
        Oleg Nesterov <oleg@...hat.com>,
        Leon Romanovsky <leonro@...dia.com>,
        Linus Torvalds <torvalds@...ux-foundation.org>,
        Jann Horn <jannh@...gle.com>
Subject: Re: [PATCH 1/5] mm: Introduce mm_struct.has_pinned

On Tue, Sep 22, 2020 at 01:10:46PM -0300, Jason Gunthorpe wrote:
> On Tue, Sep 22, 2020 at 11:17:36AM -0400, Peter Xu wrote:
> 
> > > But it's admittedly a cosmetic point, combined with my perennial fear that
> > > I'm missing something when I look at a READ_ONCE()/WRITE_ONCE() pair. :)
> > 
> > Yeah but I hope I'm using it right.. :) I used READ_ONCE/WRITE_ONCE explicitly
> > because I think they're cheaper than atomic operations, (which will, iiuc, lock
> > the bus).
> 
> It is worth thinking a bit about racing fork with
> pin_user_pages(). The desired outcome is:
> 
>   If fork wins the page is write protected, and pin_user_pages_fast()
>   will COW it.
> 
>   If pin_user_pages_fast() wins then fork must see the READ_ONCE and
>   the pin.
> 
> As get_user_pages_fast() is lockless it looks like the ordering has to
> be like this:
> 
>   pin_user_pages_fast()                   fork()
>    atomic_set(has_pinned, 1);
>    [..]
>    atomic_add(page->_refcount)
>    ordered check write protect()
>                                           ordered set write protect()
>                                           atomic_read(page->_refcount)
>                                           atomic_read(has_pinned)
> 
> Such that in all the degenerate racy cases the outcome is that both
> sides COW, never neither.
> 
> Thus I think it does have to be atomics purely from an ordering
> perspective, observing an increased _refcount requires that has_pinned
> != 0 if we are pinning.
> 
> So, to make this 100% this ordering will need to be touched up too.

Thanks for spotting this.  So something like below should work, right?

diff --git a/mm/memory.c b/mm/memory.c
index 8f3521be80ca..6591f3f33299 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -888,8 +888,8 @@ copy_one_pte(struct mm_struct *dst_mm, struct mm_struct *src_mm,
                 * Because we'll need to release the locks before doing cow,
                 * pass this work to upper layer.
                 */
-               if (READ_ONCE(src_mm->has_pinned) && wp &&
-                   page_maybe_dma_pinned(page)) {
+               if (wp && page_maybe_dma_pinned(page) &&
+                   READ_ONCE(src_mm->has_pinned)) {
                        /* We've got the page already; we're safe */
                        data->cow_old_page = page;
                        data->cow_oldpte = *src_pte;

I can also add some more comments to emphasize this.

I think the WRITE_ONCE/READ_ONCE can actually be kept, because the atomic ops
should already contain proper memory barriers, so the order of the memory
accesses should be guaranteed (e.g., atomic_add() will have an implicit wmb(),
with an rmb() on the other side).  However, maybe it's even simpler to change
has_pinned into an atomic, as John suggested.  Thanks,
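
To make the ordering argument from Jason's diagram concrete, here is a minimal
userspace sketch.  It is only a toy model, not kernel code: struct toy_mm,
struct toy_page, toy_pin(), toy_fork() and TOY_PIN_BIAS are hypothetical names
invented for illustration (the bias is loosely modelled on how
page_maybe_dma_pinned() looks at the refcount).  It deliberately uses
sequentially consistent C11 atomics everywhere, so it says nothing about which
kernel primitives imply which barriers.  It only shows the order of the steps
on each side.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the pin counting bias; not the kernel macro. */
#define TOY_PIN_BIAS 1024

struct toy_mm {
	atomic_int has_pinned;
};

struct toy_page {
	atomic_int refcount;
	atomic_bool write_protected;
};

static void toy_reset(struct toy_mm *mm, struct toy_page *page)
{
	atomic_store(&mm->has_pinned, 0);
	atomic_store(&page->refcount, 1);	/* one ordinary mapping reference */
	atomic_store(&page->write_protected, false);
}

/*
 * Pin side: publish has_pinned, then take the pin reference, then check
 * write protection.  Returns true if the pinner has to COW the page.
 */
static bool toy_pin(struct toy_mm *mm, struct toy_page *page)
{
	atomic_store(&mm->has_pinned, 1);
	atomic_fetch_add(&page->refcount, TOY_PIN_BIAS);
	return atomic_load(&page->write_protected);
}

/*
 * Fork side: write protect first, then read the refcount, then read
 * has_pinned.  Returns true if fork has to copy the page itself.
 */
static bool toy_fork(struct toy_mm *mm, struct toy_page *page)
{
	atomic_store(&page->write_protected, true);
	return atomic_load(&page->refcount) >= TOY_PIN_BIAS &&
	       atomic_load(&mm->has_pinned);
}

/* Run the two fully serialized orders and check that exactly one side COWs. */
static void toy_check_orders(void)
{
	struct toy_mm mm;
	struct toy_page page;
	bool pin_cows, fork_cows;

	/* fork wins: the page is write protected, so the pinner COWs */
	toy_reset(&mm, &page);
	fork_cows = toy_fork(&mm, &page);
	pin_cows = toy_pin(&mm, &page);
	assert(!fork_cows && pin_cows);

	/* pin wins: fork sees the raised refcount and has_pinned */
	toy_reset(&mm, &page);
	pin_cows = toy_pin(&mm, &page);
	fork_cows = toy_fork(&mm, &page);
	assert(!pin_cows && fork_cows);
}
```

Calling toy_check_orders() only exercises the two serialized orders.  It does
not test real concurrent interleavings, where the interesting outcome is that
both sides may COW but never neither.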

-- 
Peter Xu