linux-kernel - Re: [PATCH] mm/userfaultfd: fix memory corruption due to writeprotect

Open Source and information security mailing list archives

Message-ID: <20201221172711.GE6640@xz-x1>
Date:   Mon, 21 Dec 2020 12:27:11 -0500
From:   Peter Xu <peterx@...hat.com>
To:     Nadav Amit <nadav.amit@...il.com>
Cc:     Yu Zhao <yuzhao@...gle.com>,
        Andrea Arcangeli <aarcange@...hat.com>,
        linux-mm <linux-mm@...ck.org>,
        lkml <linux-kernel@...r.kernel.org>,
        Pavel Emelyanov <xemul@...nvz.org>,
        Mike Kravetz <mike.kravetz@...cle.com>,
        Mike Rapoport <rppt@...ux.vnet.ibm.com>,
        stable@...r.kernel.org, minchan@...nel.org,
        Andy Lutomirski <luto@...nel.org>,
        Will Deacon <will@...nel.org>,
        Peter Zijlstra <peterz@...radead.org>
Subject: Re: [PATCH] mm/userfaultfd: fix memory corruption due to writeprotect

Hi, Nadav,

On Sun, Dec 20, 2020 at 12:06:38AM -0800, Nadav Amit wrote:

[...]

> So to correct myself, I think that what I really encountered was actually
> during MM_CP_UFFD_WP_RESOLVE (i.e., when the protection is removed). The
> problem was that in this case the “write”-bit was removed during unprotect.
> Sorry for the strange formatting to fit within 80 columns:

I assume I can ignore the race mentioned in the commit message and refer only
to the one below.  However, I'm still confused.  Please see below.

> 
> 
> [ Start: PTE is writable ]
> 
> cpu0				cpu1			cpu2
> ----				----			----
> 							[ Writable PTE 
> 							  cached in TLB ]

Here cpu2 has a writable PTE cached in its TLB.  But why?

If the operation below is an unprotect, the page must have been write-protected
by userfaultfd at some point, right?  If so, the earlier change_protection_range()
call that did the write-protect should already have flushed the TLB before
returning (since pages > 0: at least one PTE was protected).  So I can't see why
cpu2's TLB would hold stale data.

If cpu2 does not have that cached TLB entry, then the "write to old page" can't
happen either, because cpu1 and cpu2 will both go through the COW path, and the
page-table lock should serialize them.

> userfaultfd_writeprotect()				
> [ write-*unprotect* ]
> mwriteprotect_range()
> mmap_read_lock()
> change_protection()
> 
> change_protection_range()
>  ...
>  change_pte_range()
>  [ *clear* “write”-bit ]
>  [ defer TLB flushes ]
> 				[ page-fault ]
> 				…
> 				wp_page_copy()
> 				 cow_user_page()
> 				  [ copy page ]
> 							[ write to old
> 							  page ]
> 				…
> 				 set_pte_at_notify()
> 
> [ End: cpu2 write not copied from old to new page. ]

Could you share how to reproduce the problem?  I would be glad to give it a
shot as well.

> [1] https://lore.kernel.org/patchwork/patch/1346386

PS: Sorry for not having read your other series yet.  It needs a decent chunk
of time, so I postponed it a bit due to other things, but I'll read at least
the fixes very soon.

Thanks,

-- 
Peter Xu