linux-kernel - Re: [PATCH 1/2] mm/memory: ensure fork child sees coherent memory snapshot


Message-ID: <ddad0a93-e9f2-4b3b-afa9-53f0c8315ac1@redhat.com>
Date: Tue, 3 Jun 2025 22:17:49 +0200
From: David Hildenbrand <david@...hat.com>
To: Jann Horn <jannh@...gle.com>
Cc: Matthew Wilcox <willy@...radead.org>,
 Andrew Morton <akpm@...ux-foundation.org>,
 Lorenzo Stoakes <lorenzo.stoakes@...cle.com>,
 "Liam R. Howlett" <Liam.Howlett@...cle.com>, Vlastimil Babka
 <vbabka@...e.cz>, Mike Rapoport <rppt@...nel.org>,
 Suren Baghdasaryan <surenb@...gle.com>, Michal Hocko <mhocko@...e.com>,
 linux-mm@...ck.org, Peter Xu <peterx@...hat.com>,
 linux-kernel@...r.kernel.org, stable@...r.kernel.org
Subject: Re: [PATCH 1/2] mm/memory: ensure fork child sees coherent memory
 snapshot

On 03.06.25 21:09, Jann Horn wrote:
> On Tue, Jun 3, 2025 at 8:37 PM David Hildenbrand <david@...hat.com> wrote:
>> On 03.06.25 20:29, Matthew Wilcox wrote:
>>> On Tue, Jun 03, 2025 at 08:21:02PM +0200, Jann Horn wrote:
>>>> When fork() encounters possibly-pinned pages, those pages are immediately
>>>> copied instead of just marking PTEs to make CoW happen later. If the parent
>>>> is multithreaded, this can cause the child to see memory contents that are
>>>> inconsistent in multiple ways:
>>>>
>>>> 1. We are copying the contents of a page with a memcpy() while userspace
>>>>      may be writing to it. This can cause the resulting data in the child to
>>>>      be inconsistent.
>>>> 2. After we've copied this page, future writes to other pages may
>>>>      continue to be visible to the child while future writes to this page are
>>>>      no longer visible to the child.
>>>>
>>>> This means the child could theoretically see incoherent states where
>>>> allocator freelists point to objects that are actually in use or stuff like
>>>> that. A mitigating factor is that, unless userspace already has a deadlock
>>>> bug, userspace can pretty much only observe such issues when fancy lockless
>>>> data structures are used (because if another thread was in the middle of
>>>> mutating data during fork() and the post-fork child tried to take the mutex
>>>> protecting that data, it might wait forever).
>>>
>>> Um, OK, but isn't that expected behaviour?  POSIX says:
>>>
>>> : A process shall be created with a single thread. If a multi-threaded
>>> : process calls fork(), the new process shall contain a replica of the
>>> : calling thread and its entire address space, possibly including the
>>> : states of mutexes and other resources. Consequently, the application
>>> : shall ensure that the child process only executes async-signal-safe
>>> : operations until such time as one of the exec functions is successful.
>>>
>>> It's always been my understanding that you really, really shouldn't call
>>> fork() from a multithreaded process.
>>
>> I have the same recollection, but rather because of concurrent O_DIRECT
>> and locking (pthread_atfork ...).
>>
>> Using the allocator example above: what makes sure that no other thread
>> is halfway through modifying allocator state? You really have to sync
>> somehow before calling fork() -- e.g., grabbing allocator locks in
>> pthread_atfork().
> 
> Yeah, like what glibc does for its malloc implementation to prevent
> allocator calls from racing with fork(), so that malloc() keeps
> working after fork(), even though POSIX says that the libc doesn't
> have to guarantee that.
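
For reference, the pthread_atfork() locking pattern discussed above can be
sketched roughly as follows. This is a minimal userspace illustration, not
glibc's actual malloc code: state_lock, state_a, state_b, writer() and
fork_sees_consistent_state() are hypothetical names, and the invariant
(state_a == state_b) merely stands in for "allocator state is consistent".
It assumes that unlocking a default mutex in the child handler is fine,
which holds with glibc but is not something POSIX spells out clearly.

```c
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical shared state; invariant under state_lock: state_a == state_b. */
static pthread_mutex_t state_lock = PTHREAD_MUTEX_INITIALIZER;
static long state_a, state_b;

/* Fork handlers: hold the lock across fork() so no thread is mid-update. */
static void prepare(void)   { pthread_mutex_lock(&state_lock); }
static void in_parent(void) { pthread_mutex_unlock(&state_lock); }
static void in_child(void)  { pthread_mutex_unlock(&state_lock); }

static pthread_once_t register_once = PTHREAD_ONCE_INIT;
static int register_err;

/* Register the handlers exactly once; registering twice would make
 * prepare() lock the non-recursive mutex twice and deadlock. */
static void register_handlers(void)
{
	register_err = pthread_atfork(prepare, in_parent, in_child);
}

/* Second thread: keeps mutating both fields under the lock until told to stop. */
static void *writer(void *arg)
{
	atomic_int *stop = arg;

	while (!atomic_load(stop)) {
		pthread_mutex_lock(&state_lock);
		state_a++;
		state_b++;
		pthread_mutex_unlock(&state_lock);
		sched_yield();	/* give the forking thread a chance at the lock */
	}
	return NULL;
}

/* Returns 1 if the child saw the invariant intact, 0 if not, -1 on error. */
int fork_sees_consistent_state(void)
{
	static atomic_int stop;
	pthread_t thread;
	pid_t pid;
	int status;

	pthread_once(&register_once, register_handlers);
	if (register_err != 0)
		return -1;

	atomic_store(&stop, 0);
	if (pthread_create(&thread, NULL, writer, &stop) != 0)
		return -1;

	pid = fork();	/* prepare() runs before, in_parent()/in_child() after */
	if (pid == 0) {
		/* Child: single-threaded replica; only check and exit. */
		_exit(state_a == state_b ? 0 : 1);
	}

	atomic_store(&stop, 1);
	pthread_join(thread, NULL);

	if (pid < 0 || waitpid(pid, &status, 0) != pid)
		return -1;
	return WIFEXITED(status) && WEXITSTATUS(status) == 0;
}
```

Because prepare() takes state_lock before fork() runs, the writer thread
is parked outside its critical section while the address space is
duplicated, so this scheme does not depend on whether the kernel copies
a page eagerly or sets it up for CoW. The issue in the patch description
only concerns data that is modified without such a lock.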

I mean, the patch here is simple, and there is already a performance 
penalty when allocating+copying the page, so it's not really the common 
hot path.

The question is merely whether this was ever officially supported and
thus warrants a "Fixes:" tag.

-- 
Cheers,

David / dhildenb