linux-kernel - Re: [PATCH] mm/userfaultfd: fix memory corruption due to writeprotect


Message-ID: <CAHk-=wg_UBuo7ro1fpEGkMyFKA1+PxrE85f9J_AhUfr-nJPpLQ@mail.gmail.com>
Date:   Mon, 21 Dec 2020 11:55:02 -0800
From:   Linus Torvalds <torvalds@...ux-foundation.org>
To:     Yu Zhao <yuzhao@...gle.com>
Cc:     Peter Xu <peterx@...hat.com>,
        Andrea Arcangeli <aarcange@...hat.com>,
        linux-mm <linux-mm@...ck.org>,
        lkml <linux-kernel@...r.kernel.org>,
        Pavel Emelyanov <xemul@...nvz.org>,
        Mike Kravetz <mike.kravetz@...cle.com>,
        Mike Rapoport <rppt@...ux.vnet.ibm.com>,
        stable <stable@...r.kernel.org>,
        Minchan Kim <minchan@...nel.org>,
        Andy Lutomirski <luto@...nel.org>,
        Will Deacon <will@...nel.org>,
        Peter Zijlstra <peterz@...radead.org>,
        Nadav Amit <nadav.amit@...il.com>
Subject: Re: [PATCH] mm/userfaultfd: fix memory corruption due to writeprotect

On Mon, Dec 21, 2020 at 11:16 AM Yu Zhao <yuzhao@...gle.com> wrote:
>
> Nadav Amit found memory corruptions when running userfaultfd test above.
> It seems to me the problem is related to commit 09854ba94c6a ("mm:
> do_wp_page() simplification"). Can you please take a look? Thanks.
>
> TL;DR: it may not safe to make copies of singly mapped (non-COW) pages
> when it's locked or has additional ref count because concurrent
> clear_soft_dirty or change_pte_range may have removed pte_write but yet
> to flush tlb.

Hmm. The TLB flush shouldn't actually matter, because anything that
changes the writable bit had better be serialized by the page table
lock.

Yes, we often load the page table value without holding the page table
lock (in order to know what we are going to do), but before we
finalize the operation, we re-check, under the page table lock, that
the value we loaded still matches.
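
The load-then-recheck pattern described above can be sketched in plain
userspace C. This is only an illustration under stated assumptions:
`struct fake_pte_slot`, `peek_unlocked()` and `install_if_unchanged()` are
hypothetical names, a pthread mutex stands in for the page table lock, and
none of this is kernel code.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for one page table entry plus its lock. */
struct fake_pte_slot {
    pthread_mutex_t lock;   /* plays the role of the page table lock */
    _Atomic uint64_t val;   /* plays the role of the pte value */
};

/* Step 1: read the value without the lock, only to decide what to do. */
static uint64_t peek_unlocked(struct fake_pte_slot *slot)
{
    return atomic_load_explicit(&slot->val, memory_order_relaxed);
}

/*
 * Step 2: take the lock and install new_val only if the slot still holds
 * the snapshot taken in step 1. Returns false if the value changed, in
 * which case the caller has to throw away its work and retry.
 */
static bool install_if_unchanged(struct fake_pte_slot *slot,
                                 uint64_t snapshot, uint64_t new_val)
{
    bool unchanged;

    pthread_mutex_lock(&slot->lock);
    unchanged = (atomic_load_explicit(&slot->val, memory_order_relaxed)
                 == snapshot);
    if (unchanged)
        atomic_store_explicit(&slot->val, new_val, memory_order_relaxed);
    pthread_mutex_unlock(&slot->lock);

    return unchanged;
}
```

Note that the recheck only compares values. It detects that the entry
differs from the snapshot, not that the entry changed and then changed
back in between.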

But I think I see what *MAY* be going on.  The userfaultfd
mwriteprotect_range() code takes the mm lock for _reading_. Which
means that you can have

Thread A     Thread B

 - fault starts. Sees write-protected pte, allocates memory, copies data

                   - userfaultfd makes the region writable

                   - userfaultfd case writes to the region

                   - userfaultfd makes region non-writable

 - fault continues, gets the page table lock, sees that the pte is the
same, uses old copied data
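
The interleaving above can be replayed deterministically in one thread to
show why the "pte is the same" check is not enough. This is a toy model
under stated assumptions: `struct fake_page`, `struct fake_mapping` and
`replay_interleaving()` are hypothetical names, the "pte" is reduced to a
writable flag plus a page pointer, and locking is left out because the
steps run strictly in the order shown.

```c
#include <stdbool.h>
#include <string.h>

enum { PAGE_BYTES = 8 };

/* Hypothetical stand-ins for a page and the pte that maps it. */
struct fake_page { char data[PAGE_BYTES]; };
struct fake_mapping { bool writable; struct fake_page *page; };

/*
 * Replays the Thread A / Thread B steps in order. Returns true if
 * Thread A's recheck passed and the page it installed is missing
 * Thread B's write, i.e. the write was lost.
 */
static bool replay_interleaving(void)
{
    struct fake_page orig = { .data = "old" };
    struct fake_page copy;
    struct fake_mapping m = { .writable = false, .page = &orig };

    /* Thread A: fault starts, sees write-protected pte, copies data. */
    struct fake_mapping seen = m;
    memcpy(&copy, m.page, sizeof(copy));

    /* Thread B: make writable, write, make non-writable again. */
    m.writable = true;
    memcpy(m.page->data, "new", sizeof("new"));
    m.writable = false;

    /* Thread A: fault continues, rechecks the pte, finds it unchanged. */
    bool same = (m.writable == seen.writable && m.page == seen.page);
    if (same)
        m.page = &copy;     /* installs the stale copy */

    return same && strcmp(m.page->data, "new") != 0;
}
```

In this model the recheck passes because the entry went from non-writable
to writable and back, so it compares equal to the snapshot even though the
page contents changed in between.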

But if this is what's happening, I think it's a userfaultfd bug. I
think the mmap_read_lock(dst_mm) in mwriteprotect_range() needs to be
a mmap_write_lock().

mprotect() does this right; it looks like userfaultfd does not. You
cannot just change the writability of a page willy-nilly without the
correct locking.

Maybe there are other causes, but this one stands out to me as one
possible cause.

Comments?

              Linus