linux-kernel - Re: [PATCH] mm/userfaultfd: fix memory corruption due to writeprotect

Message-ID: <X+QMKC7jPEeThjB1@google.com>
Date:   Wed, 23 Dec 2020 20:34:00 -0700
From:   Yu Zhao <yuzhao@...gle.com>
To:     Nadav Amit <nadav.amit@...il.com>
Cc:     Andrea Arcangeli <aarcange@...hat.com>,
        Andy Lutomirski <luto@...capital.net>,
        Andy Lutomirski <luto@...nel.org>,
        Linus Torvalds <torvalds@...ux-foundation.org>,
        Peter Xu <peterx@...hat.com>, linux-mm <linux-mm@...ck.org>,
        lkml <linux-kernel@...r.kernel.org>,
        Pavel Emelyanov <xemul@...nvz.org>,
        Mike Kravetz <mike.kravetz@...cle.com>,
        Mike Rapoport <rppt@...ux.vnet.ibm.com>,
        stable <stable@...r.kernel.org>,
        Minchan Kim <minchan@...nel.org>,
        Will Deacon <will@...nel.org>,
        Peter Zijlstra <peterz@...radead.org>
Subject: Re: [PATCH] mm/userfaultfd: fix memory corruption due to writeprotect

On Wed, Dec 23, 2020 at 07:09:10PM -0800, Nadav Amit wrote:
> > On Dec 23, 2020, at 6:00 PM, Andrea Arcangeli <aarcange@...hat.com> wrote:
> > 
> > On Wed, Dec 23, 2020 at 05:21:43PM -0800, Andy Lutomirski wrote:
> >> I don’t love this as a long term fix. AFAICT we can have mm_tlb_flush_pending set for quite a while — mprotect seems like it can wait in IO while splitting a huge page, for example. That gives us a window in which every write fault turns into a TLB flush.
> > 
> > mprotect can't run concurrently with a page fault in the first place.
> > 
> > One other near zero cost improvement easy to add if this would be "if
> > (vma->vm_flags & (VM_SOFTDIRTY|VM_UFFD_WP))" and it could be made
> > conditional to the two config options too.
> > 
> > Still I don't mind doing it in some other way, uffd-wp has much easier
> > time doing it in another way in fact.
> > 
> > Whatever performs better is fine, but queuing up pending invalidate
> > ranges doesn't look very attractive since it'd be a fixed cost that we'd
> > always have to pay even when there's no fault (and there can't be any
> > fault at least for mprotect).
> 
> I think there are other cases in which Andy’s concern is relevant
> (MADV_PAGEOUT).

That patch only demonstrates a rough idea, and I should have
elaborated: if we ever decide to go in that direction, we only need to
worry about "jumping through hoops", because the final patch (set)
I have in mind would include not only the build-time optimization
Andrea suggested but also runtime optimizations, such as skipping
the do_swap_page() path and the (!PageAnon() || page_mapcount > 1)
cases. Rest assured, the performance impact on do_wp_page() of an
occasional additional TLB flush on top of a page copy is negligible.
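
As a rough userspace model of the filtering described above (not kernel
code), the extra flush would be taken only when a flush is pending, the
VMA has one of the two flags Andrea mentioned, the fault is not on the
swap-in path, and the page is anonymous and not shared. Every name, the
struct layout, and the flag values below are hypothetical placeholders
chosen for illustration; the real patch may look quite different.

```c
#include <stdbool.h>

/* Placeholder bit values for this model only; not the kernel's definitions. */
#define MODEL_VM_SOFTDIRTY (1UL << 0)
#define MODEL_VM_UFFD_WP   (1UL << 1)

/* Hypothetical snapshot of the state a write-protect fault would inspect. */
struct wp_fault_model {
    bool flush_pending;       /* stands in for mm_tlb_flush_pending() */
    unsigned long vma_flags;  /* stands in for vma->vm_flags */
    bool is_swap_fault;       /* fault is being handled via the swap-in path */
    bool page_is_anon;        /* stands in for PageAnon() */
    int mapcount;             /* stands in for page_mapcount() */
};

/* Returns true only when the additional TLB flush would not be skipped. */
static inline bool model_needs_extra_flush(const struct wp_fault_model *f)
{
    if (!f->flush_pending)
        return false;         /* no deferred flush in flight */
    if (!(f->vma_flags & (MODEL_VM_SOFTDIRTY | MODEL_VM_UFFD_WP)))
        return false;         /* VMA uses neither soft-dirty nor uffd-wp */
    if (f->is_swap_fault)
        return false;         /* swap-in path is skipped */
    if (!f->page_is_anon || f->mapcount > 1)
        return false;         /* file-backed or shared pages are skipped */
    return true;
}
```

In this model the common cases fall out early, so the flush is reached
only for an unshared anonymous page in a flagged VMA while a deferred
flush is still pending.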

> Perhaps holding some small bitmap based on part of the deferred flushed
> pages (e.g., bits 12-17 of the address or some other kind of a single
> hash-function bloom-filter) would be more performant to avoid (most)
> unnecessary TLB flushes. It will be cleared before a TLB flush and set while
> holding the PTL.
> 
> Checking if a flush is needed, under the PTL, would require a single memory
> access (although potentially cache miss). It will however require one atomic
> operation for each page-table whose PTEs’ flushes are deferred - in contrast
> to the current scheme which requires two atomic operations for the *entire*
> operation.
>
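
The bitmap idea quoted above can be sketched in userspace C as a 64-bit,
single-hash filter indexed by bits 12-17 of the address. This is only an
illustration under assumptions: the struct, the function names, and the
use of C11 atomics are hypothetical, and the locking (the PTL) is left
to the caller rather than modeled.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical filter: one bit for each value of address bits 12-17. */
struct flush_filter {
    _Atomic uint64_t bits;
};

/* Map an address to one of 64 slots using bits 12-17. */
static inline unsigned int filter_slot(uintptr_t addr)
{
    return (unsigned int)((addr >> 12) & 0x3f);
}

/* Record a deferred flush; the caller is assumed to hold the PTL. */
static inline void filter_mark_deferred(struct flush_filter *f, uintptr_t addr)
{
    atomic_fetch_or(&f->bits, UINT64_C(1) << filter_slot(addr));
}

/*
 * Single load to decide whether a flush may be needed. False positives are
 * possible (addresses sharing bits 12-17 collide); false negatives are not.
 */
static inline bool filter_may_need_flush(struct flush_filter *f, uintptr_t addr)
{
    return (atomic_load(&f->bits) & (UINT64_C(1) << filter_slot(addr))) != 0;
}

/* Reset the filter; per the description, this happens before the TLB flush. */
static inline void filter_clear(struct flush_filter *f)
{
    atomic_store(&f->bits, 0);
}
```

A fault on an address whose slot is clear can skip the flush, while two
addresses such as 0x1000 and 0x41000 share a slot, so the second would
still flush after only the first was marked.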