linux-kernel - Re: [PATCH] mm/userfaultfd: fix memory corruption due to writeprotect

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-Id: <55080C83-4047-4AA9-B6A3-785A1E4E17B1@gmail.com>
Date:   Tue, 22 Dec 2020 13:14:58 -0800
From:   Nadav Amit <nadav.amit@...il.com>
To:     Andrea Arcangeli <aarcange@...hat.com>
Cc:     Andy Lutomirski <luto@...nel.org>,
        Linus Torvalds <torvalds@...ux-foundation.org>,
        Peter Xu <peterx@...hat.com>, Yu Zhao <yuzhao@...gle.com>,
        linux-mm <linux-mm@...ck.org>,
        lkml <linux-kernel@...r.kernel.org>,
        Pavel Emelyanov <xemul@...nvz.org>,
        Mike Kravetz <mike.kravetz@...cle.com>,
        Mike Rapoport <rppt@...ux.vnet.ibm.com>,
        stable <stable@...r.kernel.org>,
        Minchan Kim <minchan@...nel.org>,
        Will Deacon <will@...nel.org>,
        Peter Zijlstra <peterz@...radead.org>
Subject: Re: [PATCH] mm/userfaultfd: fix memory corruption due to writeprotect

> On Dec 22, 2020, at 11:31 AM, Andrea Arcangeli <aarcange@...hat.com> wrote:
> 
> From 4ace4d1b53f5cb3b22a5c2dc33facc4150b112d6 Mon Sep 17 00:00:00 2001
> From: Andrea Arcangeli <aarcange@...hat.com>
> Date: Tue, 22 Dec 2020 14:30:16 -0500
> Subject: [PATCH 1/1] mm: userfaultfd: avoid leaving stale TLB after
> userfaultfd_writeprotect()
> 
> change_protection() is called by userfaultfd_writeprotect() with the
> mmap_lock_read like in change_prot_numa().
> 
> The page fault code in wp_copy_page() rightfully assumes if the CPU
> issued a write fault and the write bit in the pagetable is not set, no
> CPU can write to the page. That's wrong assumption after
> userfaultfd_writeprotect(). That's also wrong assumption after
> change_prot_numa() where the regular page fault code also would assume
> that if the present bit is not set and the page fault is running,
> there should be no stale TLB entry, but there is still.
> 
> So to stay safe, the page fault code must be prevented to run as long
> as long as the TLB flush remains pending. That is already achieved by
> the do_numa_page() path for change_prot_numa() and by the
> userfaultfd_pte_wp() path for userfaultfd_writeprotect().
> 
> The problem that needs fixing is that an un-wrprotect
> (i.e. userfaultfd_writeprotect() with UFFDIO_WRITEPROTECT_MODE_WP not
> set) could run in between the original wrprotect
> (i.e. userfaultfd_writeprotect() with UFFDIO_WRITEPROTECT_MODE_WP set)
> and wp_copy_page, while the TLB flush remains pending.

I may need to read your patch more carefully, but in general I do not like
the approach. You are much more experienced than I am, but IMHO the TLB
flushing logic needs to be further simplified and generalized and not the
other way around.

The complexity is already too high. We have tlb_flush_batched and
tlb_flush_pending, which I think should be (somehow) combined.
tlb_gather_mmu() is designed for zapping, but can’t it be modified to suit
protection changes to avoid buggy use-cases (as the wrong use in
clear_refs_write() ) ?

So adding new userfaultfd specific code, which potentially does not address
all the interactions (now or the future), is concerning.

In this regard, a similar problem to the one in userfaultfd
(mmap_read_lock() while write-protecting) already exists with soft-dirty
clearing, so any solution should also address the soft-dirty issue.