Date: Wed, 23 Dec 2020 20:37:33 -0800
From: Nadav Amit <nadav.amit@...il.com>
To: Yu Zhao <yuzhao@...gle.com>
Cc: Andrea Arcangeli <aarcange@...hat.com>,
	Andy Lutomirski <luto@...capital.net>,
	Andy Lutomirski <luto@...nel.org>,
	Linus Torvalds <torvalds@...ux-foundation.org>,
	Peter Xu <peterx@...hat.com>,
	linux-mm <linux-mm@...ck.org>,
	lkml <linux-kernel@...r.kernel.org>,
	Pavel Emelyanov <xemul@...nvz.org>,
	Mike Kravetz <mike.kravetz@...cle.com>,
	Mike Rapoport <rppt@...ux.vnet.ibm.com>,
	stable <stable@...r.kernel.org>,
	Minchan Kim <minchan@...nel.org>,
	Will Deacon <will@...nel.org>,
	Peter Zijlstra <peterz@...radead.org>
Subject: Re: [PATCH] mm/userfaultfd: fix memory corruption due to writeprotect

> On Dec 23, 2020, at 7:34 PM, Yu Zhao <yuzhao@...gle.com> wrote:
>
> On Wed, Dec 23, 2020 at 07:09:10PM -0800, Nadav Amit wrote:
>>> On Dec 23, 2020, at 6:00 PM, Andrea Arcangeli <aarcange@...hat.com> wrote:
>>>
>>> On Wed, Dec 23, 2020 at 05:21:43PM -0800, Andy Lutomirski wrote:
>>>> I don’t love this as a long term fix. AFAICT we can have mm_tlb_flush_pending set for quite a while — mprotect seems like it can wait in IO while splitting a huge page, for example. That gives us a window in which every write fault turns into a TLB flush.
>>>
>>> mprotect can't run concurrently with a page fault in the first place.
>>>
>>> One other near zero cost improvement easy to add if this would be "if
>>> (vma->vm_flags & (VM_SOFTDIRTY|VM_UFFD_WP))" and it could be made
>>> conditional to the two config options too.
>>>
>>> Still I don't mind doing it in some other way, uffd-wp has much easier
>>> time doing it in another way in fact.
>>>
>>> Whatever performs better is fine, but queuing up pending invalidate
>>> ranges don't look very attractive since it'd be a fixed cost that we'd
>>> always have to pay even when there's no fault (and there can't be any
>>> fault at least for mprotect).
>>
>> I think there are other cases in which Andy’s concern is relevant
>> (MADV_PAGEOUT).
>
> That patch only demonstrate a rough idea and I should have been
> elaborate: if we ever decide to go that direction, we only need to
> worry about "jumping through hoops", because the final patch (set)
> I have in mind would not only have the build time optimization Andrea
> suggested but also include runtime optimizations like skipping
> do_swap_page() path and (!PageAnon() || page_mapcount > 1). Rest
> assured, the performance impact on do_wp_page() from occasionally an
> additional TLB flush on top of a page copy is negligible.

I agree with you to a certain extent, since there is another TLB flush in
this path anyway, when the PTE is set after copying.

Yet, I think that having a combined and efficient central mechanism for
pending TLB flushes is important even for robustness: it would prevent the
development of new independent deferred flushing schemes. I specifically do
not like tlb_flush_batched, which confuses me every time I look at it. For
example, the following code completely confuses me:

	void flush_tlb_batched_pending(struct mm_struct *mm)
	{
		if (data_race(mm->tlb_flush_batched)) {
			flush_tlb_mm(mm);

			/*
			 * Do not allow the compiler to re-order the clearing of
			 * tlb_flush_batched before the tlb is flushed.
			 */
			barrier();
			mm->tlb_flush_batched = false;
		}
	}

… and then I ask myself (no answer):

1. What prevents a concurrent flush_tlb_batched_pending(), called for
   instance from madvise_free_pte_range(), from clearing a new deferred
   flush indication that was just set by set_tlb_ubc_flush_pending()? Can
   it cause a missed TLB flush later?

2. Why is the write to tlb_flush_batched not done with WRITE_ONCE()?

3. Should we have smp_wmb() instead of barrier()? (Probably the barrier()
   is not needed at all, since flush_tlb_mm() serializes if a flush is
   needed.)

4. Why do we need 2 deferred TLB flushing mechanisms?
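
The interleaving behind question 1 can be sketched in userspace C. This is only an illustration under stated assumptions, not kernel code: `struct mm_sim` and every `sim_*` function are hypothetical stand-ins, the two "CPUs" are replayed sequentially in one thread, and the sketch assumes nothing serializes the setter against the check-flush-clear sequence. Whether the kernel's locking actually rules out this ordering is exactly the open question.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the one mm_struct field under discussion. */
struct mm_sim {
	bool tlb_flush_batched;	/* deferred-flush indication */
	int flushes_done;	/* count of simulated flush_tlb_mm() calls */
};

/* Models set_tlb_ubc_flush_pending(): record that a flush was deferred. */
static void sim_set_pending(struct mm_sim *mm)
{
	mm->tlb_flush_batched = true;
}

/* Models the first half of flush_tlb_batched_pending(): test and flush. */
static bool sim_check_and_flush(struct mm_sim *mm)
{
	if (!mm->tlb_flush_batched)
		return false;
	mm->flushes_done++;
	return true;
}

/* Models the second half: clear the indication after the flush. */
static void sim_clear(struct mm_sim *mm)
{
	mm->tlb_flush_batched = false;
}

/*
 * Replays the suspected ordering and returns true if the second deferral
 * ends up with neither a flush nor a pending indication.
 */
static bool sim_lost_indication(void)
{
	struct mm_sim mm = { .tlb_flush_batched = false, .flushes_done = 0 };
	bool flushed;

	sim_set_pending(&mm);			/* CPU B: first deferral */
	flushed = sim_check_and_flush(&mm);	/* CPU A: sees flag, flushes */
	assert(flushed);
	sim_set_pending(&mm);			/* CPU B: second deferral */
	sim_clear(&mm);				/* CPU A: clears B's new flag */

	/* Two deferrals, one flush, and nothing left pending. */
	return mm.flushes_done == 1 && !mm.tlb_flush_batched;
}
```

Under these assumptions `sim_lost_indication()` returns true: the second deferral is covered by no flush, and the flag that should trigger one has been cleared.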