Message-ID: <20201221195357.GI6640@xz-x1>
Date: Mon, 21 Dec 2020 14:53:57 -0500
From: Peter Xu <peterx@...hat.com>
To: Nadav Amit <nadav.amit@...il.com>
Cc: Yu Zhao <yuzhao@...gle.com>,
Andrea Arcangeli <aarcange@...hat.com>,
linux-mm <linux-mm@...ck.org>,
lkml <linux-kernel@...r.kernel.org>,
Pavel Emelyanov <xemul@...nvz.org>,
Mike Kravetz <mike.kravetz@...cle.com>,
Mike Rapoport <rppt@...ux.vnet.ibm.com>,
stable@...r.kernel.org, minchan@...nel.org,
Andy Lutomirski <luto@...nel.org>,
Will Deacon <will@...nel.org>,
Peter Zijlstra <peterz@...radead.org>
Subject: Re: [PATCH] mm/userfaultfd: fix memory corruption due to writeprotect
On Mon, Dec 21, 2020 at 10:31:57AM -0800, Nadav Amit wrote:
> > On Dec 21, 2020, at 9:27 AM, Peter Xu <peterx@...hat.com> wrote:
> >
> > Hi, Nadav,
> >
> > On Sun, Dec 20, 2020 at 12:06:38AM -0800, Nadav Amit wrote:
> >
> > [...]
> >
> >> So to correct myself, I think that what I really encountered was actually
> >> during MM_CP_UFFD_WP_RESOLVE (i.e., when the protection is removed). The
> >> problem was that in this case the “write”-bit was removed during unprotect.
> >> Sorry for the strange formatting to fit within 80 columns:
> >
> > I assume I can ignore the race mentioned in the commit message but only refer
> > to this one below. However I'm still confused. Please see below.
> >
> >> [ Start: PTE is writable ]
> >>
> >> cpu0                        cpu1                    cpu2
> >> ----                        ----                    ----
> >>                                                     [ Writable PTE
> >>                                                       cached in TLB ]
> >
> > Here cpu2 got writable pte in tlb. But why?
> >
> > If below is an unprotect, it means it must have been protected once by
> > userfaultfd, right? If so, the previous change_protection_range() which did
> > the wr-protect should have done a tlb flush already before it returns (since
> > pages>0 - we protected one pte at least). Then I can't see why cpu2 tlb has
> > stale data.
>
> Thanks, Peter. Just as you can munprotect() a region which was not protected
> before, you can uffd-unprotect a region that was not protected before. It
> might be that the user tried to unprotect a large region, which was
> partially protected and partially unprotected.
>
> The selftest obviously blindly unprotects some regions to check for bugs.
>
> So to your question - it was not write-protected (think about initial copy
> without write-protecting).

If that's the only case, how about we don't touch the ptes at all? Instead of
playing with preserve_write, I'm thinking something like this right before
ptep_modify_prot_start(), even for uffd_wp==true:

  if (uffd_wp && pte_uffd_wp(old_pte)) {
      WARN_ON_ONCE(pte_write(old_pte));
      continue;
  }

  if (uffd_wp_resolve && !pte_uffd_wp(old_pte))
      continue;

Then we can also avoid the heavy operations on changing ptes back and forth.
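
To make the skip idea concrete, here is a rough userspace sketch, not kernel
code. The pte is modelled as a plain bitmask, and MODEL_PTE_WRITE,
MODEL_PTE_UFFD_WP, model_pte_t and model_change_range() are all made-up names
for illustration. Only the two early "continue" checks mirror the snippet
above; the WARN_ON_ONCE() is left out, and the assumption that resolve only
clears the uffd-wp marker (leaving the write bit to the fault path) is part of
the model, not a statement about any particular kernel version:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flag bits for a modelled PTE; not a real hardware layout. */
#define MODEL_PTE_WRITE   (1u << 0)
#define MODEL_PTE_UFFD_WP (1u << 1)

typedef uint32_t model_pte_t;

static bool model_pte_uffd_wp(model_pte_t pte)
{
    return (pte & MODEL_PTE_UFFD_WP) != 0;
}

/*
 * Walk n modelled PTEs and apply a uffd-wp protect or resolve request.
 * Returns the number of entries actually modified, i.e. the count that
 * would decide whether a TLB flush is needed at all.
 */
static size_t model_change_range(model_pte_t *ptes, size_t n,
                                 bool uffd_wp, bool uffd_wp_resolve)
{
    size_t pages = 0;

    /* A request is either protect or resolve, never both. */
    assert(!(uffd_wp && uffd_wp_resolve));

    for (size_t i = 0; i < n; i++) {
        model_pte_t old_pte = ptes[i];

        /* Already write-protected by uffd: nothing to do. */
        if (uffd_wp && model_pte_uffd_wp(old_pte))
            continue;

        /* Never uffd-protected: leave the PTE untouched. */
        if (uffd_wp_resolve && !model_pte_uffd_wp(old_pte))
            continue;

        if (uffd_wp)
            ptes[i] = (old_pte & ~MODEL_PTE_WRITE) | MODEL_PTE_UFFD_WP;
        else if (uffd_wp_resolve)
            ptes[i] = old_pte & ~MODEL_PTE_UFFD_WP; /* model assumption */
        else
            continue;

        pages++;
    }
    return pages;
}
```

In this model, a blind unprotect over a range that was never protected
returns 0 modified entries, so a writable pte is never rewritten and no
deferred flush is involved for it.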
>
> > If I assume cpu2 doesn't have that cached tlb, then "write to old page" won't
> > happen either, because cpu1/cpu2 will all go through the cow path and pgtable
> > lock should serialize them.
> >
> >> userfaultfd_writeprotect()
> >> [ write-*unprotect* ]
> >> mwriteprotect_range()
> >> mmap_read_lock()
> >> change_protection()
> >>
> >> change_protection_range()
> >> ...
> >> change_pte_range()
> >> [ *clear* “write”-bit ]
> >> [ defer TLB flushes ]
> >>                             [ page-fault ]
> >>                             …
> >>                             wp_page_copy()
> >>                              cow_user_page()
> >>                               [ copy page ]
> >>                                                     [ write to old
> >>                                                       page ]
> >>                             …
> >>                              set_pte_at_notify()
> >>
> >> [ End: cpu2 write not copied from old to new page. ]
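
For anyone following the interleaving quoted above, it can be replayed
deterministically in a small single-threaded userspace model. This is only a
sketch: struct model_mm, struct model_tlb and replay_race() are invented
names, the "TLB" is just a cached copy of the page pointer and write
permission, and the flush_before_copy switch stands in for flushing cpu2's
TLB before the COW copy. None of it is kernel code:

```c
#include <stdbool.h>

/* Hypothetical model of one virtual page and its single PTE. */
struct model_mm {
    char old_page;      /* contents of the original physical page */
    char new_page;      /* contents of the COW copy */
    char *pte_target;   /* page the PTE currently maps */
    bool pte_write;     /* "write" bit in the PTE */
};

/* Hypothetical model of cpu2's cached translation. */
struct model_tlb {
    char *target;
    bool write;
    bool valid;
};

/*
 * Replay the interleaving from the diagram. Returns true if cpu2's
 * write is visible through the final mapping, false if it was lost.
 */
static bool replay_race(bool flush_before_copy)
{
    struct model_mm mm = { .old_page = 'A', .new_page = 0 };
    mm.pte_target = &mm.old_page;
    mm.pte_write = true;

    /* Start: cpu2 caches the writable PTE in its TLB. */
    struct model_tlb tlb2 = { mm.pte_target, mm.pte_write, true };

    /* cpu0: the unprotect path clears the write bit, flush deferred. */
    mm.pte_write = false;
    if (flush_before_copy)
        tlb2.valid = false;

    /* cpu1: write fault on the read-only PTE, copy the page (COW). */
    mm.new_page = mm.old_page;

    /* cpu2: write 'B' while cpu1 is between the copy and the PTE update. */
    bool cpu2_blocked = false;
    if (tlb2.valid && tlb2.write)
        *tlb2.target = 'B';     /* stale TLB entry: hits the old page */
    else
        cpu2_blocked = true;    /* TLB miss: cpu2 faults and has to wait */

    /* cpu1: install the new page (set_pte_at_notify() in the diagram). */
    mm.pte_target = &mm.new_page;
    mm.pte_write = true;

    /* cpu2 retries once the fault is resolved and hits the new page. */
    if (cpu2_blocked)
        *mm.pte_target = 'B';

    return *mm.pte_target == 'B';
}
```

Calling replay_race(false) models the deferred flush and reports the write as
lost; replay_race(true) models flushing first, where cpu2 faults and its write
lands in the new page.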
> >
> > Could you share how to reproduce the problem? I would be glad to give it a
> > shot as well.
>
> You can run the selftests/userfaultfd with my small patch [1]. I ran it with
> the following parameters: “./userfaultfd anon 100 100”. I think that it is
> more easily reproducible with “mitigations=off idle=poll” as kernel
> parameters.
>
> [1] https://lore.kernel.org/patchwork/patch/1346386/
Thanks.
>
> >
> >> [1] https://lore.kernel.org/patchwork/patch/1346386
> >
> > PS: Sorry to not have read the other series of yours. It seems to need some
> > chunk of time so I postponed it a bit due to other things; but I'll read at
> > least the fixes very soon.
>
> Thanks again, I will post RFCv2 with some numbers soon.

I read patch 1/3 of the series. Would it be better to post them separately,
just in case Andrew would like to pick them up earlier?

Since you seem to be working heavily on uffd-wp: I still have a few uffd-wp
fixes locally, even for anonymous memory. I think they're related to some
corner cases like thp or migration entry conversions, but anyway I'll see
whether I should post them even earlier (I planned to add smap/pagemap support
for uffd-wp, so maybe I can even write some test cases to verify some of
them). Just an FYI...

Thanks,

--
Peter Xu