linux-kernel - Re: [PATCH] mm/userfaultfd: fix memory corruption due to writeprotect

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-Id: <9E301C7C-882A-4E0F-8D6D-1170E792065A@gmail.com>
Date:   Mon, 21 Dec 2020 12:23:06 -0800
From:   Nadav Amit <nadav.amit@...il.com>
To:     Linus Torvalds <torvalds@...ux-foundation.org>
Cc:     Yu Zhao <yuzhao@...gle.com>, Peter Xu <peterx@...hat.com>,
        Andrea Arcangeli <aarcange@...hat.com>,
        linux-mm <linux-mm@...ck.org>,
        lkml <linux-kernel@...r.kernel.org>,
        Pavel Emelyanov <xemul@...nvz.org>,
        Mike Kravetz <mike.kravetz@...cle.com>,
        Mike Rapoport <rppt@...ux.vnet.ibm.com>,
        stable <stable@...r.kernel.org>,
        Minchan Kim <minchan@...nel.org>,
        Andy Lutomirski <luto@...nel.org>,
        Will Deacon <will@...nel.org>,
        Peter Zijlstra <peterz@...radead.org>
Subject: Re: [PATCH] mm/userfaultfd: fix memory corruption due to writeprotect

> On Dec 21, 2020, at 11:55 AM, Linus Torvalds <torvalds@...ux-foundation.org> wrote:
> 
> On Mon, Dec 21, 2020 at 11:16 AM Yu Zhao <yuzhao@...gle.com> wrote:
>> Nadav Amit found memory corruptions when running userfaultfd test above.
>> It seems to me the problem is related to commit 09854ba94c6a ("mm:
>> do_wp_page() simplification"). Can you please take a look? Thanks.
>> 
>> TL;DR: it may not safe to make copies of singly mapped (non-COW) pages
>> when it's locked or has additional ref count because concurrent
>> clear_soft_dirty or change_pte_range may have removed pte_write but yet
>> to flush tlb.
> 
> Hmm. The TLB flush shouldn't actually matter, because anything that
> changes the writable bit had better be serialized by the page table
> lock.
> 
> Yes, we often load the page table value without holding the page table
> lock (in order to know what we are going to do), but then before we
> finalize the operation, we then re-check - undet the page table lock -
> that the value we loaded still matches.
> 
> But I think I see what *MAY* be going on.  The userfaultfd
> mwriteprotect_range() code takes the mm lock for _reading_. Which
> means that you can have
> 
> Thread A     Thread B
> 
> - fault starts. Sees write-protected pte, allocates memory, copies data
> 
>                   - userfaultfd makes the regions writable
> 
>                   - usefaultfd case writes to the region
> 
>                   - userfaultfd makes region non-writable
> 
> - fault continues, gets the page table lock, sees that the pte is the
> same, uses old copied data
> 
> But if this is what's happening, I think it's a userfaultfd bug. I
> think the mmap_read_lock(dst_mm) in mwriteprotect_range() needs to be
> a mmap_write_lock().
> 
> mprotect() does this right, it looks like userfaultfd does not. You
> cannot just change the writability of a page willy-nilly without the
> correct locking.
> 
> Maybe there are other causes, but this one stands out to me as one
> possible cause.
> 
> Comments?

Using mmap_write_lock() was my initial fix and there was a strong pushback
on this approach due to its potential impact on performance.

So I think there are 4 possible directions for solutions that I thought of
or others have raised/hinted:

1. mmap_write_lock()

2. Copy the page in cow_user_page() while holding the PTL and after flushing
   has been done. I am not sure if there are potential problems with
   special-pages (2 flushes might be necessary for special pages).

3. Always reuse the page and never COW on userfaultfd/soft-dirty.
   I do not know if it is feasible.

4. Retry the page-fault if mm->tlb_flush_pending is set.

Hopefully I did not miss any other suggestion.