lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAG48ez3cgG-PikyO7a84CFdPFvPY9BSNJOZ7wZVQ7Q9Qju_6Ng@mail.gmail.com>
Date: Wed, 4 Jun 2025 20:11:08 +0200
From: Jann Horn <jannh@...gle.com>
To: Peter Xu <peterx@...hat.com>
Cc: Andrew Morton <akpm@...ux-foundation.org>, David Hildenbrand <david@...hat.com>, 
	Lorenzo Stoakes <lorenzo.stoakes@...cle.com>, "Liam R. Howlett" <Liam.Howlett@...cle.com>, 
	Vlastimil Babka <vbabka@...e.cz>, Mike Rapoport <rppt@...nel.org>, Suren Baghdasaryan <surenb@...gle.com>, 
	Michal Hocko <mhocko@...e.com>, linux-mm@...ck.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH 2/2] mm/memory: Document how we make a coherent memory snapshot

On Wed, Jun 4, 2025 at 7:04 PM Peter Xu <peterx@...hat.com> wrote:
> On Tue, Jun 03, 2025 at 08:21:03PM +0200, Jann Horn wrote:
> > It is not currently documented that the child of fork() should receive a
> > coherent snapshot of the parent's memory, or how we get such a snapshot.
> > Add a comment block to explain this.
> >
> > Signed-off-by: Jann Horn <jannh@...gle.com>
> > ---
> >  kernel/fork.c | 34 ++++++++++++++++++++++++++++++++++
> >  1 file changed, 34 insertions(+)
> >
> > diff --git a/kernel/fork.c b/kernel/fork.c
> > index 85afccfdf3b1..f78f5df596a9 100644
> > --- a/kernel/fork.c
> > +++ b/kernel/fork.c
> > @@ -604,6 +604,40 @@ static void dup_mm_exe_file(struct mm_struct *mm, struct mm_struct *oldmm)
> >  }
> >
> >  #ifdef CONFIG_MMU
> > +/*
> > + * Anonymous memory inherited by the child MM must, on success, contain a
> > + * coherent snapshot of corresponding anonymous memory in the parent MM.
>
> Should we better define what is a coherent snapshot?  Or maybe avoid using
> this term which seems to apply to the whole mm?
>
> I think it's at least not a snapshot of whole mm at a specific time,
> because as long as there can be more than one concurrent writers (hence, it
> needs to be at least 3 threads in the parent process, 1 in charge of fork),
> this can happen:
>
>   parent writer 1      parent writer 2    parent fork thr
>   ---------------      ---------------    ---------------
>                                           wr-protect P1
>   write P1                                                  <---- T1
>   (trapped, didn't happen)
>                        write PN                             <---- T2
>                        (went through)
>                                           ...
>                                           wr-protect PN
>
> The result of above would be that child process will see a mixture of old
> P1 (at timestamp T1) but updated P2 (timestamp T2).  I don't think it's
> impossible that the userapp could try to serialize "write P1" and "write
> PN" operations in a way that it would also get a surprise seeing in the
> child PN updated but P1 didn't.

If the write at T1 hits a page fault, then it doesn't actually happen
at T1. The write instruction starts doing something at T1, but it does
not fully retire, and the architectural register state does not
change, and in particular the instruction pointer does not advance
past this instruction; just like when speculative execution is aborted
after a branch misprediction, except that the CPU raises an exception
and we enter the page fault handler. The write actually happens when
the instruction is executed a second time after page fault handling
has completed after the mmap lock is dropped. (Unless something during
page fault handling raises a signal, in which case the instruction
might never architecturally execute.)

(There is a caveat to what I just said, which makes this more complex
but does not fundamentally change the outcome: An instruction that
performs multiple memory writes without specific atomicity guarantees
can successfully do some writes and then fail on a later write. In
this case, after the page fault handler resolves the fault, the entire
instruction will run from the start again, including re-doing the
writes that were already done on the first execution, and this works
because such instructions are designed to be idempotent in this regard
and they don't make atomicity guarantees.)

> I do agree it at least recovered the per-page coherence, though, no matter
> what is the POSIX definition of that.  IIUC an userapp can always fix such
> problem, but maybe it's too complicated in some cases, and if Linux used to
> at least maintain per-page coherency, then it may make sense to recover the
> behavior especially when it only affects pinned.
>
> Said that, maybe we still want to be specific on the goal of the change.

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ