Message-ID: <X/3MWWYY0FlBpH9r@hirez.programming.kicks-ass.net>
Date: Tue, 12 Jan 2021 17:20:41 +0100
From: Peter Zijlstra <peterz@...radead.org>
To: Andrea Arcangeli <aarcange@...hat.com>
Cc: Linus Torvalds <torvalds@...ux-foundation.org>,
Andy Lutomirski <luto@...nel.org>,
Peter Xu <peterx@...hat.com>,
Nadav Amit <nadav.amit@...il.com>, Yu Zhao <yuzhao@...gle.com>,
linux-mm <linux-mm@...ck.org>,
lkml <linux-kernel@...r.kernel.org>,
Pavel Emelyanov <xemul@...nvz.org>,
Mike Kravetz <mike.kravetz@...cle.com>,
Mike Rapoport <rppt@...ux.vnet.ibm.com>,
stable <stable@...r.kernel.org>,
Minchan Kim <minchan@...nel.org>, Will Deacon <will@...nel.org>
Subject: Re: [PATCH] mm/userfaultfd: fix memory corruption due to writeprotect
On Tue, Jan 05, 2021 at 01:03:48PM -0500, Andrea Arcangeli wrote:
> On Tue, Jan 05, 2021 at 04:37:27PM +0100, Peter Zijlstra wrote:
> > (your other email clarified this point; the COW needs to copy while
> > holding the PTL and we need TLBI under PTL if we're to change this)
>
> The COW doesn't need to hold the PT lock, the TLBI broadcast doesn't
> need to be delivered under PT lock either.
>
> There simply needs to be a TLBI broadcast before the copy. The patch I
> sent here https://lkml.kernel.org/r/X+QLr1WmGXMs33Ld@redhat.com that
> needs to be cleaned up with some abstraction and better commentary
> also misses a smp_mb() in the case flush_tlb_page is not called, but
> that's a small detail.
That's horrific crap. All of that tlb-pending stuff is batshit, and this
makes it worse.
> > And I'm thinking the speculative page fault series steps right into all
> > this, it fundamentally avoids mmap_sem and entirely relies on the PTL.
>
> I thought about that but that only applies to some kind of "anon" page
> fault.
That must be something new; it used to handle all faults. I specifically
spent quite a bit of time getting the file crud right (which Linus
initially fingered for being horribly broken).
SPF fundamentally elides the mmap_sem, which Linus said must serialize
faults.
> Here the problem isn't just the page fault, the problem is not to
> regress clear_refs to block on page fault I/O, and all
IIRC we do the actual reads without any locks held, just like
VM_FAULT_RETRY does today. You take the fault, find you need IO, drop
locks, do IO, retake fault.
> MAP_PRIVATE/MAP_SHARED filebacked faults hitting the disk to read
> /usr/ will still prevent clear_refs from running (and the other way
> around) if it has to take the mmap_sem for writing.
>
> I haven't looked at the speculative page fault series for a while but last I
> checked there was nothing there that can tame the above major
> regression from CPU speed to disk I/O speed that would be inflicted on
> both clear_refs on huge mm and on uffd-wp.
All of the clear_refs nonsense is immaterial to SPF. Also, who again
cares about clear_refs? Why is it important?