Date:   Wed, 23 Dec 2020 20:37:33 -0800
From:   Nadav Amit <>
To:     Yu Zhao <>
Cc:     Andrea Arcangeli <>,
        Andy Lutomirski <>,
        Linus Torvalds <>,
        Peter Xu <>, linux-mm <>,
        lkml <>,
        Pavel Emelyanov <>,
        Mike Kravetz <>,
        Mike Rapoport <>,
        stable <>,
        Minchan Kim <>,
        Will Deacon <>,
        Peter Zijlstra <>
Subject: Re: [PATCH] mm/userfaultfd: fix memory corruption due to writeprotect

> On Dec 23, 2020, at 7:34 PM, Yu Zhao <> wrote:
> On Wed, Dec 23, 2020 at 07:09:10PM -0800, Nadav Amit wrote:
>>> On Dec 23, 2020, at 6:00 PM, Andrea Arcangeli <> wrote:
>>> On Wed, Dec 23, 2020 at 05:21:43PM -0800, Andy Lutomirski wrote:
>>>> I don’t love this as a long term fix. AFAICT we can have mm_tlb_flush_pending set for quite a while — mprotect seems like it can wait in IO while splitting a huge page, for example. That gives us a window in which every write fault turns into a TLB flush.
>>> mprotect can't run concurrently with a page fault in the first place.
>>> One other near zero cost improvement easy to add if this would be "if
>>> (vma->vm_flags & (VM_SOFTDIRTY|VM_UFFD_WP))" and it could be made
>>> conditional to the two config options too.
>>> Still I don't mind doing it in some other way, uffd-wp has much easier
>>> time doing it in another way in fact.
>>> Whatever performs better is fine, but queuing up pending invalidate
>>> ranges don't look very attractive since it'd be a fixed cost that we'd
>>> always have to pay even when there's no fault (and there can't be any
>>> fault at least for mprotect).
>> I think there are other cases in which Andy’s concern is relevant
> That patch only demonstrates a rough idea, and I should have
> elaborated: if we ever decide to go in that direction, we only need to
> worry about "jumping through hoops", because the final patch (set)
> I have in mind would not only have the build-time optimization Andrea
> suggested but also include runtime optimizations like skipping the
> do_swap_page() path and (!PageAnon() || page_mapcount > 1). Rest
> assured, the performance impact on do_wp_page() from an occasional
> additional TLB flush on top of a page copy is negligible.

I agree with you to a certain extent, since there is anyhow another TLB
flush in this path when the PTE is set after copying.

Yet, I think that having a combined and efficient central mechanism for
pending TLB flushes is important even for robustness: to prevent the
development of new independent deferred flushing schemes. I specifically do
not like tlb_flush_batched, which confuses me every time I look at it.
For example, the following code completely confuses me:

  void flush_tlb_batched_pending(struct mm_struct *mm)
  {
        if (data_race(mm->tlb_flush_batched)) {
                flush_tlb_mm(mm);
                /*
                 * Do not allow the compiler to re-order the clearing of
                 * tlb_flush_batched before the tlb is flushed.
                 */
                barrier();
                mm->tlb_flush_batched = false;
        }
  }

… and then I ask myself (no answer):

1. What prevents a concurrent flush_tlb_batched_pending(), which is called
for instance from madvise_free_pte_range(), from clearing a new deferred
flush indication that was just set by set_tlb_ubc_flush_pending()? Can it
cause a missed TLB flush later?

2. Why is the write to tlb_flush_batched not done with WRITE_ONCE()?

3. Should we have smp_wmb() instead of barrier()? (probably the barrier() is
not needed at all since flush_tlb_mm() serializes if a flush is needed).

4. Why do we need 2 deferred TLB flushing mechanisms?
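
To make question 1 concrete, here is a minimal single-threaded user-space
sketch that replays the interleaving I am worried about. It is only a model:
struct mm_model and all model_*() helpers are hypothetical names made up for
illustration, the "TLB" is just a counter, and the model ignores any locking
that might rule this ordering out in the real code, which is exactly the part
I cannot answer.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical user-space model; this is not kernel code. */
struct mm_model {
	bool tlb_flush_batched;	/* models mm->tlb_flush_batched */
	int stale_entries;	/* models TLB entries still awaiting a flush */
};

/* Models set_tlb_ubc_flush_pending(): a flush is deferred and flagged. */
static void model_set_pending(struct mm_model *mm)
{
	mm->stale_entries++;
	mm->tlb_flush_batched = true;
}

/* First half of flush_tlb_batched_pending(): test the flag, then flush. */
static bool model_flush_begin(struct mm_model *mm)
{
	if (!mm->tlb_flush_batched)
		return false;
	mm->stale_entries = 0;	/* models flush_tlb_mm() */
	return true;
}

/* Second half of flush_tlb_batched_pending(): clear the flag. */
static void model_flush_end(struct mm_model *mm)
{
	mm->tlb_flush_batched = false;
}

/*
 * Replays the assumed interleaving of two CPUs in program order.
 * Returns the number of stale entries left behind with the flag clear.
 */
static int model_lost_flush(void)
{
	struct mm_model mm = { false, 0 };

	model_set_pending(&mm);			/* CPU A: defers a flush */
	if (model_flush_begin(&mm)) {		/* CPU B: sees the flag, flushes */
		assert(mm.stale_entries == 0);	/* the flush itself did happen */
		model_set_pending(&mm);		/* CPU A: defers another flush */
		model_flush_end(&mm);		/* CPU B: clears A's new indication */
	}
	return mm.tlb_flush_batched ? 0 : mm.stale_entries;
}
```

In this model, model_lost_flush() returns 1: one deferred flush is left with
no flag recording it. Whether the real code can reach this state depends on
the serialization around these calls, which the sketch does not capture.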
