lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Wed, 23 Dec 2020 14:45:59 -0800
From:   Nadav Amit <nadav.amit@...il.com>
To:     Andrea Arcangeli <aarcange@...hat.com>
Cc:     Yu Zhao <yuzhao@...gle.com>, Peter Zijlstra <peterz@...radead.org>,
        Minchan Kim <minchan@...nel.org>,
        Linus Torvalds <torvalds@...ux-foundation.org>,
        Peter Xu <peterx@...hat.com>, linux-mm <linux-mm@...ck.org>,
        lkml <linux-kernel@...r.kernel.org>,
        Pavel Emelyanov <xemul@...nvz.org>,
        Mike Kravetz <mike.kravetz@...cle.com>,
        Mike Rapoport <rppt@...ux.vnet.ibm.com>,
        stable <stable@...r.kernel.org>,
        Andy Lutomirski <luto@...nel.org>,
        Will Deacon <will@...nel.org>
Subject: Re: [PATCH] mm/userfaultfd: fix memory corruption due to writeprotect

> On Dec 23, 2020, at 2:05 PM, Andrea Arcangeli <aarcange@...hat.com> wrote:
> 
> On Tue, Dec 22, 2020 at 04:40:32AM -0800, Nadav Amit wrote:
>>> On Dec 21, 2020, at 1:24 PM, Yu Zhao <yuzhao@...gle.com> wrote:
>>> 
>>> On Mon, Dec 21, 2020 at 12:26:22PM -0800, Linus Torvalds wrote:
>>>> On Mon, Dec 21, 2020 at 12:23 PM Nadav Amit <nadav.amit@...il.com> wrote:
>>>>> Using mmap_write_lock() was my initial fix and there was a strong pushback
>>>>> on this approach due to its potential impact on performance.
>>>> 
>>>> From whom?
>>>> 
>>>> Somebody who doesn't understand that correctness is more important
>>>> than performance? And that userfaultfd is not the most important part
>>>> of the system?
>>>> 
>>>> The fact is, userfaultfd is CLEARLY BUGGY.
>>>> 
>>>>         Linus
>>> 
>>> Fair enough.
>>> 
>>> Nadav, for your patch (you might want to update the commit message).
>>> 
>>> Reviewed-by: Yu Zhao <yuzhao@...gle.com>
>>> 
>>> While we are all here, there is also clear_soft_dirty() that could
>>> use a similar fix…
>> 
>> Just an update as for why I have still not sent v2: I fixed
>> clear_soft_dirty(), created a reproducer, and the reproducer kept failing.
>> 
>> So after some debugging, it appears that clear_refs_write() does not flush
>> the TLB. It indeed calls tlb_finish_mmu() but since 0758cd830494
>> ("asm-generic/tlb: avoid potential double flush”), tlb_finish_mmu() does not
>> flush the TLB since there is clear_refs_write() does not call to
>> __tlb_adjust_range() (unless there are nested TLBs are pending).
>> 
>> So I have a patch for this issue too: arguably the tlb_gather interface is
>> not the right one for clear_refs_write() that does not clear PTEs but
>> changes them.
>> 
>> Yet, sadly, my reproducer keeps falling (less frequently, but still). So I
>> will keep debugging to see what goes wrong. I will send v2 once I figure out
>> what the heck is wrong in the code or my reproducer.
> 
> If you put the page_mapcount check back in do_wp_page instead of
> page_count, it'll stop reproducing but the bug is still very much
> there...

I know. I designed the (re)producer just to be able to hit the bug without
changing the kernel (and test my fix), but I am fully aware that it would
not work on older kernels although the bug is still there.

> And then we'll have to enforce uffd-wp cannot be registered if
> VM_SOFTDIRTY is set or the other way around so that VM_UFFD* is
> mutually exclusive with VM_SOFTDIRTY. So then we can also unify the
> bit so they all use the same software bit in the pgtable (that's
> something I considered anyway originally since it doesn't make whole
> lot of sense to use the two features on the same vma at the same
> time).

I think it may be reasonable.

Just a proposal: At some point we can also ask ourselves whether the
“artificial" limitation of the number of software bits per PTE should really
limit us, or do we want to hold some additional metadata per-PTE by either
putting it in an adjacent page (holding 64-bits of additional software-bits
per PTE) or by finding some place in the page-struct to link to this
metadata (and have the liberty of number of bits per PTE). One of the PTE
software-bits can be repurposed to say whether there is “extra-metadata”
associated with the PTE.

I am fully aware that there will be some overhead associated, but it
can be limited to less-common use-cases.

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ