Message-Id: <3A6A1049-24C6-4B2D-8C59-21B549F742B4@gmail.com>
Date: Wed, 23 Dec 2020 19:09:10 -0800
From: Nadav Amit <nadav.amit@...il.com>
To: Andrea Arcangeli <aarcange@...hat.com>
Cc: Andy Lutomirski <luto@...capital.net>, Yu Zhao <yuzhao@...gle.com>,
Andy Lutomirski <luto@...nel.org>,
Linus Torvalds <torvalds@...ux-foundation.org>,
Peter Xu <peterx@...hat.com>, linux-mm <linux-mm@...ck.org>,
lkml <linux-kernel@...r.kernel.org>,
Pavel Emelyanov <xemul@...nvz.org>,
Mike Kravetz <mike.kravetz@...cle.com>,
Mike Rapoport <rppt@...ux.vnet.ibm.com>,
stable <stable@...r.kernel.org>,
Minchan Kim <minchan@...nel.org>,
Will Deacon <will@...nel.org>,
Peter Zijlstra <peterz@...radead.org>
Subject: Re: [PATCH] mm/userfaultfd: fix memory corruption due to writeprotect
> On Dec 23, 2020, at 6:00 PM, Andrea Arcangeli <aarcange@...hat.com> wrote:
>
> On Wed, Dec 23, 2020 at 05:21:43PM -0800, Andy Lutomirski wrote:
>> I don’t love this as a long term fix. AFAICT we can have mm_tlb_flush_pending set for quite a while — mprotect seems like it can wait in IO while splitting a huge page, for example. That gives us a window in which every write fault turns into a TLB flush.
>
> mprotect can't run concurrently with a page fault in the first place.
>
> One other near-zero-cost improvement that is easy to add would be to make
> this "if (vma->vm_flags & (VM_SOFTDIRTY|VM_UFFD_WP))", and it could be
> made conditional on the two config options too.
>
> Still I don't mind doing it in some other way, uffd-wp has much easier
> time doing it in another way in fact.
>
> Whatever performs better is fine, but queuing up pending invalidate
> ranges doesn't look very attractive, since it'd be a fixed cost that we'd
> always have to pay even when there's no fault (and there can't be any
> fault, at least for mprotect).

I think there are other cases in which Andy’s concern is relevant
(e.g., MADV_PAGEOUT).

Perhaps keeping a small bitmap derived from the addresses of the pages
whose flushes are deferred (e.g., bits 12-17 of the address, or some other
single-hash-function Bloom filter) would avoid most unnecessary TLB flushes
at a lower cost. The bitmap would be cleared before a TLB flush and set
while holding the PTL.
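
A rough sketch of such a filter, assuming 4 KiB pages and a single 64-bit
word indexed by address bits 12-17. It models only the filter operations
(record, clear, check); how clearing is ordered against the flush itself
and against concurrent faults is left open here, as in the message. All
names are hypothetical. A set bit means "maybe pending" (aliasing can give
false positives); a clear bit means no deferred flush was recorded for
that bucket since the last clear.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define PAGE_SHIFT_SKETCH 12 /* assumes 4 KiB pages */
#define FILTER_BITS 6        /* address bits 12..17 -> 64 buckets */

/* The "hash": bits 12..17 of the virtual address. */
static unsigned filter_bucket(uintptr_t addr)
{
    return (addr >> PAGE_SHIFT_SKETCH) & ((1u << FILTER_BITS) - 1);
}

/* Mask of buckets for the pages in [start, end), built locally so the
 * shared word is touched only once per page table. */
static uint64_t filter_mask(uintptr_t start, uintptr_t end)
{
    uint64_t mask = 0;
    assert(start <= end);
    for (uintptr_t a = start; a < end; a += (uintptr_t)1 << PAGE_SHIFT_SKETCH) {
        mask |= (uint64_t)1 << filter_bucket(a);
        if (mask == UINT64_MAX)
            break; /* every bucket already set */
    }
    return mask;
}

/* Called with the PTL held after deferring the flushes of the PTEs in
 * [start, end) within one page table: a single atomic OR. */
static void filter_record(_Atomic uint64_t *filter, uintptr_t start, uintptr_t end)
{
    atomic_fetch_or(filter, filter_mask(start, end));
}

/* Called before the deferred TLB flush is issued. */
static void filter_clear(_Atomic uint64_t *filter)
{
    atomic_store(filter, 0);
}

/* Fault path, under the PTL: a single load decides whether to flush. */
static bool filter_may_need_flush(_Atomic uint64_t *filter, uintptr_t addr)
{
    return (atomic_load(filter) >> filter_bucket(addr)) & 1;
}
```

For example, recording pages 0x1000 and 0x2000 sets buckets 1 and 2, so a
fault at 0x41000 (also bucket 1) is a false positive and still flushes.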

Checking whether a flush is needed, under the PTL, would require a single
memory access (although potentially a cache miss). It would, however,
require one atomic operation for each page table whose PTE flushes are
deferred, in contrast to the current scheme, which requires two atomic
operations for the *entire* operation.
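
To put numbers on that trade-off, a small sketch counting atomics per
operation, assuming x86-64 with 4 KiB pages (one page table maps 2 MiB)
and, as a worst case, that every spanned page table has deferred flushes.
The current-scheme count of two reflects the pending counter being
incremented once before the operation and decremented once after it
(inc_tlb_flush_pending()/dec_tlb_flush_pending() in kernels of this
period). The helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define PMD_SHIFT_SKETCH 21 /* assumes one page table maps 2 MiB */

/* Number of last-level page tables covering [start, end). */
static unsigned long page_tables_spanned(uintptr_t start, uintptr_t end)
{
    assert(start < end);
    return ((end - 1) >> PMD_SHIFT_SKETCH) - (start >> PMD_SHIFT_SKETCH) + 1;
}

/* Current scheme: one increment and one decrement of the pending
 * counter, independent of the size of the range. */
static unsigned long atomics_counter_scheme(uintptr_t start, uintptr_t end)
{
    (void)start;
    (void)end;
    return 2;
}

/* Bitmap scheme, worst case: one atomic OR per spanned page table. */
static unsigned long atomics_bitmap_scheme(uintptr_t start, uintptr_t end)
{
    return page_tables_spanned(start, end);
}
```

Under these assumptions, an operation over 1 GiB costs 512 atomic ORs
instead of 2, while a single-page operation costs 1 instead of 2.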