linux-kernel - Re: [PATCH 1/2] mm/memory: ensure fork child sees coherent memory snapshot


Message-ID: <db2268f0-7885-471d-94a3-8ae4641ba2e5@redhat.com>
Date: Tue, 3 Jun 2025 20:37:29 +0200
From: David Hildenbrand <david@...hat.com>
To: Matthew Wilcox <willy@...radead.org>, Jann Horn <jannh@...gle.com>
Cc: Andrew Morton <akpm@...ux-foundation.org>,
 Lorenzo Stoakes <lorenzo.stoakes@...cle.com>,
 "Liam R. Howlett" <Liam.Howlett@...cle.com>, Vlastimil Babka
 <vbabka@...e.cz>, Mike Rapoport <rppt@...nel.org>,
 Suren Baghdasaryan <surenb@...gle.com>, Michal Hocko <mhocko@...e.com>,
 linux-mm@...ck.org, Peter Xu <peterx@...hat.com>,
 linux-kernel@...r.kernel.org, stable@...r.kernel.org
Subject: Re: [PATCH 1/2] mm/memory: ensure fork child sees coherent memory
 snapshot

On 03.06.25 20:29, Matthew Wilcox wrote:
> On Tue, Jun 03, 2025 at 08:21:02PM +0200, Jann Horn wrote:
>> When fork() encounters possibly-pinned pages, those pages are immediately
>> copied instead of just marking PTEs to make CoW happen later. If the parent
>> is multithreaded, this can cause the child to see memory contents that are
>> inconsistent in multiple ways:
>>
>> 1. We are copying the contents of a page with a memcpy() while userspace
>>     may be writing to it. This can cause the resulting data in the child to
>>     be inconsistent.
>> 2. After we've copied this page, future writes to other pages may
>>     continue to be visible to the child while future writes to this page are
>>     no longer visible to the child.
>>
>> This means the child could theoretically see incoherent states where
>> allocator freelists point to objects that are actually in use or stuff like
>> that. A mitigating factor is that, unless userspace already has a deadlock
>> bug, userspace can pretty much only observe such issues when fancy lockless
>> data structures are used (because if another thread was in the middle of
>> mutating data during fork() and the post-fork child tried to take the mutex
>> protecting that data, it might wait forever).
> 
> Um, OK, but isn't that expected behaviour?  POSIX says:
> 
> : A process shall be created with a single thread. If a multi-threaded
> : process calls fork(), the new process shall contain a replica of the
> : calling thread and its entire address space, possibly including the
> : states of mutexes and other resources. Consequently, the application
> : shall ensure that the child process only executes async-signal-safe
> : operations until such time as one of the exec functions is successful.
> 
> It's always been my understanding that you really, really shouldn't call
> fork() from a multithreaded process.

I have the same recollection, but rather because of concurrent O_DIRECT 
and locking (pthread_atfork ...).

Using the allocator example above: what makes sure that no other thread 
is halfway through modifying allocator state? You really have to sync 
somehow before calling fork() -- e.g., grabbing allocator locks in 
pthread_atfork().
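
A minimal sketch of that pthread_atfork() pattern, assuming a toy freelist
allocator guarded by a single mutex. All names here (toy_alloc, toy_free,
alloc_lock, fork_with_quiesced_allocator) are hypothetical and are not taken
from the patch or from any real allocator. Strictly speaking, the child is
only allowed async-signal-safe calls, so this illustrates the synchronization
idea rather than a portable guarantee.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical toy allocator: a freelist guarded by one mutex. */
struct node { struct node *next; };
static pthread_mutex_t alloc_lock = PTHREAD_MUTEX_INITIALIZER;
static struct node *freelist;
static struct node pool[4];
static atomic_bool stop;

static void toy_free(struct node *n) {
    pthread_mutex_lock(&alloc_lock);
    n->next = freelist;
    freelist = n;
    pthread_mutex_unlock(&alloc_lock);
}

static struct node *toy_alloc(void) {
    pthread_mutex_lock(&alloc_lock);
    struct node *n = freelist;
    if (n)
        freelist = n->next;
    pthread_mutex_unlock(&alloc_lock);
    return n;
}

/* prepare runs in the parent before fork(): it waits until no other
 * thread is halfway through a freelist update, then holds the lock. */
static void atfork_prepare(void) { pthread_mutex_lock(&alloc_lock); }
/* parent and child handlers release the lock on both sides afterwards. */
static void atfork_parent(void) { pthread_mutex_unlock(&alloc_lock); }
static void atfork_child(void) { pthread_mutex_unlock(&alloc_lock); }

/* Worker that keeps mutating allocator state while the main thread forks. */
static void *worker(void *arg) {
    (void)arg;
    while (!atomic_load(&stop)) {
        struct node *n = toy_alloc();
        if (n)
            toy_free(n);
    }
    return NULL;
}

/* Returns the child's exit status (0 if the child could allocate),
 * or -1 on a setup or fork failure. */
int fork_with_quiesced_allocator(void) {
    static bool registered;
    pthread_t t;
    int status;

    freelist = NULL;
    for (size_t i = 0; i < sizeof pool / sizeof pool[0]; i++)
        toy_free(&pool[i]);
    if (!registered) {
        if (pthread_atfork(atfork_prepare, atfork_parent, atfork_child) != 0)
            return -1;
        registered = true;
    }
    atomic_store(&stop, false);
    if (pthread_create(&t, NULL, worker, NULL) != 0)
        return -1;

    pid_t pid = fork();
    if (pid == 0) {
        /* Child: the lock was held across fork(), so the copied
         * freelist was not caught in the middle of an update. */
        _exit(toy_alloc() != NULL ? 0 : 1);
    }

    atomic_store(&stop, true);
    pthread_join(t, NULL);
    if (pid < 0 || waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Calling fork_with_quiesced_allocator() from main() should return 0 when the
child finds the lock free and the list intact. Without the prepare handler,
the worker could own alloc_lock at the moment of fork(), and the child would
then block forever on a mutex whose owner does not exist in the child.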

For Linux, we document this in the fork(2) man page:

"After a fork() in a multithreaded program, the child can safely call 
only async-signal-safe functions (see signal-safety(7)) until such time 
as it calls execve(2)."
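
As a rough illustration of what that rule permits, here is a sketch of a
child that touches nothing but async-signal-safe calls (write(), execve(),
_exit()) between fork() and exec. The helper name spawn_shell_noop and the
choice of /bin/sh as the exec target are assumptions made for the example.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: fork, then exec "/bin/sh -c 'exit 0'" in the child.
 * Returns the child's exit status, or -1 on failure. */
int spawn_shell_noop(void) {
    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        /* Child: no malloc(), no stdio, no mutexes. Only calls listed
         * as async-signal-safe in signal-safety(7). */
        static const char msg[] = "child: about to exec\n";
        char *const argv[] = { "sh", "-c", "exit 0", NULL };
        char *const envp[] = { NULL };

        if (write(STDERR_FILENO, msg, sizeof msg - 1) < 0)
            _exit(126);
        execve("/bin/sh", argv, envp);
        _exit(127); /* reached only if execve() failed */
    }

    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

A child written this way never depends on locks or heap state that another
parent thread might have been mutating at the time of fork().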

-- 
Cheers,

David / dhildenb