lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <ba208d76-7992-4c70-be8f-49082001f194@suse.cz>
Date: Thu, 5 Jun 2025 09:33:24 +0200
From: Vlastimil Babka <vbabka@...e.cz>
To: Jann Horn <jannh@...gle.com>, Andrew Morton <akpm@...ux-foundation.org>,
 David Hildenbrand <david@...hat.com>,
 Lorenzo Stoakes <lorenzo.stoakes@...cle.com>,
 "Liam R. Howlett" <Liam.Howlett@...cle.com>, Mike Rapoport
 <rppt@...nel.org>, Suren Baghdasaryan <surenb@...gle.com>,
 Michal Hocko <mhocko@...e.com>, linux-mm@...ck.org,
 Pedro Falcato <pfalcato@...e.de>
Cc: Peter Xu <peterx@...hat.com>, linux-kernel@...r.kernel.org,
 stable@...r.kernel.org
Subject: Re: [PATCH 1/2] mm/memory: ensure fork child sees coherent memory
 snapshot

On 6/3/25 20:21, Jann Horn wrote:
> When fork() encounters possibly-pinned pages, those pages are immediately
> copied instead of just marking PTEs to make CoW happen later. If the parent
> is multithreaded, this can cause the child to see memory contents that are
> inconsistent in multiple ways:
> 
> 1. We are copying the contents of a page with a memcpy() while userspace
>    may be writing to it. This can cause the resulting data in the child to
>    be inconsistent.
> 2. After we've copied this page, future writes to other pages may
>    continue to be visible to the child while future writes to this page are
>    no longer visible to the child.
> 
> This means the child could theoretically see incoherent states where
> allocator freelists point to objects that are actually in use or stuff like
> that. A mitigating factor is that, unless userspace already has a deadlock
> bug, userspace can pretty much only observe such issues when fancy lockless
> data structures are used (because if another thread was in the middle of
> mutating data during fork() and the post-fork child tried to take the mutex
> protecting that data, it might wait forever).
> 
> On top of that, this issue is only observable when pages are either
> DMA-pinned or appear false-positive-DMA-pinned due to a page having >=1024
> references and the parent process having used DMA-pinning at least once
> before.

Seems the changelog seems to be missing the part describing what it's doing
to fix the issue? Some details are not immediately obvious (the writing
threads become blocked in page fault) as the conversation has shown.

> Fixes: 70e806e4e645 ("mm: Do early cow for pinned pages during fork() for ptes")
> Cc: stable@...r.kernel.org
> Signed-off-by: Jann Horn <jannh@...gle.com>

Given how the fix seems to be localized to the already rare slowpath and
doesn't require us to pessimize every trivial fork(), it seems reasonable to
me even if don't have a concrete example of a sane code in the wild that's
broken by the current behavior, so:

Acked-by: Vlastimil Babka <vbabka@...e.cz>

> ---
>  mm/memory.c | 18 ++++++++++++++++++
>  1 file changed, 18 insertions(+)
> 
> diff --git a/mm/memory.c b/mm/memory.c
> index 49199410805c..b406dfda976b 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -917,7 +917,25 @@ copy_present_page(struct vm_area_struct *dst_vma, struct vm_area_struct *src_vma
>  	/*
>  	 * We have a prealloc page, all good!  Take it
>  	 * over and copy the page & arm it.
> +	 *
> +	 * One nasty aspect is that we could be in a multithreaded process or
> +	 * such, where another thread is in the middle of writing to memory
> +	 * while this thread is forking. As long as we're just marking PTEs as
> +	 * read-only to make copy-on-write happen *later*, that's easy; we just
> +	 * need to do a single TLB flush before dropping the mmap/VMA locks, and
> +	 * that's enough to guarantee that the child gets a coherent snapshot of
> +	 * memory.
> +	 * But here, where we're doing an immediate copy, we must ensure that
> +	 * threads in the parent process can no longer write into the page being
> +	 * copied until we're done forking.
> +	 * This means that we still need to mark the source PTE as read-only,
> +	 * with an immediate TLB flush.
> +	 * (To make the source PTE writable again after fork() is done, we can
> +	 * rely on the page fault handler to do that lazily, thanks to
> +	 * PageAnonExclusive().)
>  	 */
> +	ptep_set_wrprotect(src_vma->vm_mm, addr, src_pte);
> +	flush_tlb_page(src_vma, addr);
>  
>  	if (copy_mc_user_highpage(&new_folio->page, page, addr, src_vma))
>  		return -EHWPOISON;
> 


Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ