lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  PHC 
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Wed, 23 Dec 2020 15:29:51 -0700
From:   Yu Zhao <>
To:     Andrea Arcangeli <>
Cc:     Andy Lutomirski <>,
        Linus Torvalds <>,
        Peter Xu <>,
        Nadav Amit <>,
        linux-mm <>,
        lkml <>,
        Pavel Emelyanov <>,
        Mike Kravetz <>,
        Mike Rapoport <>,
        stable <>,
        Minchan Kim <>,
        Will Deacon <>,
        Peter Zijlstra <>
Subject: Re: [PATCH] mm/userfaultfd: fix memory corruption due to writeprotect

On Wed, Dec 23, 2020 at 04:39:00PM -0500, Andrea Arcangeli wrote:
> On Tue, Dec 22, 2020 at 08:36:04PM -0700, Yu Zhao wrote:
> > Thanks for the details.
> I hope we can find a way put the page_mapcount back where there's a
> page_count right now.
> If you're so worried about having to maintain a all defined well
> documented (or to be documented even better if you ACK it)
> marker/catcher for userfaultfd_writeprotect, I can't see how you could
> consider to maintain the page fault safe against any random code
> leaving too permissive TLB entries out of sync of the more restrictive
> pte permissions as it was happening with clear_refs_write, which
> worked by luck until page_mapcount was changed to page_count.
> page_count is far from optimal, but it is a feature it finally allowed
> us to notice that various code (clear_refs_write included apparently
> even after the fix) leaves stale too permissive TLB entries when it
> shouldn't.
> The question is only which way you prefer to fix clear_refs_write and
> I don't think we can deviate from those 3 methods that already exist
> today. So clear_refs_write will have to pick one of those and
> currently it's not falling in the same category with mprotect even
> after the fix.
> I think if clear_refs_write starts to take the mmap_write_lock and
> really start to operate like mprotect, only then we can consider to
> make userfaultfd_writeprotect also operate like mprotect.
> Even then I'd hope we can at least be allowed to make it operate like
> KSM write_protect_page for len <= HPAGE_PMD_SIZE, something that
> clear_refs_write cannot do since it works in O(N) and tends to scan
> everything at once, so there would be no point to optimize not to
> defer the flush, for a process with a tiny amount of virtual memory
> mapped.
> vm86 also should be fixed to fall in the same category with mprotect,
> since performance there is irrelevant.

I was hesitant to suggest the following because it isn't that straight
forward. But since you seem to be less concerned with the complexity,
I'll just bring it on the table -- it would take care of both ufd and
clear_refs_write, wouldn't it?

diff --git a/mm/memory.c b/mm/memory.c
index 5e9ca612d7d7..af38c5ee327e 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -4403,8 +4403,11 @@ static vm_fault_t handle_pte_fault(struct vm_fault *vmf)
 		goto unlock;
 	if (vmf->flags & FAULT_FLAG_WRITE) {
-		if (!pte_write(entry))
+		if (!pte_write(entry)) {
+			if (mm_tlb_flush_pending(vmf->vma->vm_mm))
+				flush_tlb_page(vmf->vma, vmf->address);
 			return do_wp_page(vmf);
+		}
 		entry = pte_mkdirty(entry);
 	entry = pte_mkyoung(entry);

Powered by blists - more mailing lists