linux-kernel - Re: [PATCH] mm/userfaultfd: fix memory corruption due to writeprotect


Message-ID: <CAHk-=wjNedWcjAfPW7pdPTM0-gSXABsv9AA+wCebXbh3tuRTRQ@mail.gmail.com>
Date:   Tue, 22 Dec 2020 16:20:47 -0800
From:   Linus Torvalds <torvalds@...ux-foundation.org>
To:     Yu Zhao <yuzhao@...gle.com>
Cc:     Andrea Arcangeli <aarcange@...hat.com>,
        Andy Lutomirski <luto@...nel.org>,
        Peter Xu <peterx@...hat.com>,
        Nadav Amit <nadav.amit@...il.com>,
        linux-mm <linux-mm@...ck.org>,
        lkml <linux-kernel@...r.kernel.org>,
        Pavel Emelyanov <xemul@...nvz.org>,
        Mike Kravetz <mike.kravetz@...cle.com>,
        Mike Rapoport <rppt@...ux.vnet.ibm.com>,
        stable <stable@...r.kernel.org>,
        Minchan Kim <minchan@...nel.org>,
        Will Deacon <will@...nel.org>,
        Peter Zijlstra <peterz@...radead.org>
Subject: Re: [PATCH] mm/userfaultfd: fix memory corruption due to writeprotect

On Tue, Dec 22, 2020 at 3:50 PM Linus Torvalds
<torvalds@...ux-foundation.org> wrote:
>
> The rule is that the TLB flush has to be done before the page table
> lock is released.

I take that back. I guess it's ok as long as the mmap_sem is held for
writing. Then the TLB flush can be delayed until just before releasing
the mmap_sem. I think.

The stale TLB entries still mean that somebody else can write through
them in another thread, but as long as anybody who actually unmaps the
page (and frees it - think rmap etc) is being careful, mprotect()
itself can probably afford to be a bit laissez-faire.

So mprotect() itself should be ok, I think, because it takes the
mmap_sem for writing.

Even with the mmap_sem held for writing, truncate and friends can see
the read-only page table entries (because they can look things up
using the file i_mmap thing instead), but then they rely on the page
table lock and they'll also be careful if they then change that PTE
and will force their own TLB flushes.

So I think a pending TLB flush outside the page table lock is fine -
but once again only if you hold the mmap_sem for writing. Not for
reading, because then the page tables need to be synchronized with the
TLB so that other readers don't see the not-yet-synchronized state.

It once again looks like it's just userfaultfd that would trigger this
due to the read-lock on the mmap_sem. And mprotect() itself is fine.

Am I missing something?

But apparently Nadav sees problems even with that lock changed to a
write lock. Nadav?

           Linus