linux-kernel - Re: [PATCH] mm/userfaultfd: fix memory corruption due to writeprotect

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <X+PE38s2Egq4nzKv@google.com>
Date:   Wed, 23 Dec 2020 15:29:51 -0700
From:   Yu Zhao <yuzhao@...gle.com>
To:     Andrea Arcangeli <aarcange@...hat.com>
Cc:     Andy Lutomirski <luto@...nel.org>,
        Linus Torvalds <torvalds@...ux-foundation.org>,
        Peter Xu <peterx@...hat.com>,
        Nadav Amit <nadav.amit@...il.com>,
        linux-mm <linux-mm@...ck.org>,
        lkml <linux-kernel@...r.kernel.org>,
        Pavel Emelyanov <xemul@...nvz.org>,
        Mike Kravetz <mike.kravetz@...cle.com>,
        Mike Rapoport <rppt@...ux.vnet.ibm.com>,
        stable <stable@...r.kernel.org>,
        Minchan Kim <minchan@...nel.org>,
        Will Deacon <will@...nel.org>,
        Peter Zijlstra <peterz@...radead.org>
Subject: Re: [PATCH] mm/userfaultfd: fix memory corruption due to writeprotect

On Wed, Dec 23, 2020 at 04:39:00PM -0500, Andrea Arcangeli wrote:
> On Tue, Dec 22, 2020 at 08:36:04PM -0700, Yu Zhao wrote:
> > Thanks for the details.
> 
> I hope we can find a way put the page_mapcount back where there's a
> page_count right now.
> 
> If you're so worried about having to maintain a all defined well
> documented (or to be documented even better if you ACK it)
> marker/catcher for userfaultfd_writeprotect, I can't see how you could
> consider to maintain the page fault safe against any random code
> leaving too permissive TLB entries out of sync of the more restrictive
> pte permissions as it was happening with clear_refs_write, which
> worked by luck until page_mapcount was changed to page_count.
> 
> page_count is far from optimal, but it is a feature it finally allowed
> us to notice that various code (clear_refs_write included apparently
> even after the fix) leaves stale too permissive TLB entries when it
> shouldn't.
> 
> The question is only which way you prefer to fix clear_refs_write and
> I don't think we can deviate from those 3 methods that already exist
> today. So clear_refs_write will have to pick one of those and
> currently it's not falling in the same category with mprotect even
> after the fix.
> 
> I think if clear_refs_write starts to take the mmap_write_lock and
> really start to operate like mprotect, only then we can consider to
> make userfaultfd_writeprotect also operate like mprotect.
> 
> Even then I'd hope we can at least be allowed to make it operate like
> KSM write_protect_page for len <= HPAGE_PMD_SIZE, something that
> clear_refs_write cannot do since it works in O(N) and tends to scan
> everything at once, so there would be no point to optimize not to
> defer the flush, for a process with a tiny amount of virtual memory
> mapped.
> 
> vm86 also should be fixed to fall in the same category with mprotect,
> since performance there is irrelevant.

I was hesitant to suggest the following because it isn't that straight
forward. But since you seem to be less concerned with the complexity,
I'll just bring it on the table -- it would take care of both ufd and
clear_refs_write, wouldn't it?

diff --git a/mm/memory.c b/mm/memory.c
index 5e9ca612d7d7..af38c5ee327e 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -4403,8 +4403,11 @@ static vm_fault_t handle_pte_fault(struct vm_fault *vmf)
 		goto unlock;
 	}
 	if (vmf->flags & FAULT_FLAG_WRITE) {
-		if (!pte_write(entry))
+		if (!pte_write(entry)) {
+			if (mm_tlb_flush_pending(vmf->vma->vm_mm))
+				flush_tlb_page(vmf->vma, vmf->address);
 			return do_wp_page(vmf);
+		}
 		entry = pte_mkdirty(entry);
 	}
 	entry = pte_mkyoung(entry);