Message-ID: <CAJuCfpG+0zV3P-P+yr_bnGKJVkNHVznfcVmfcsWbUcW4Bw4LzQ@mail.gmail.com>
Date: Fri, 25 Apr 2025 10:26:47 -0700
From: Suren Baghdasaryan <surenb@...gle.com>
To: Kees Cook <kees@...nel.org>
Cc: "Liam R. Howlett" <Liam.Howlett@...cle.com>, Lorenzo Stoakes <lorenzo.stoakes@...cle.com>,
Andrew Morton <akpm@...ux-foundation.org>, Vlastimil Babka <vbabka@...e.cz>, Jann Horn <jannh@...gle.com>,
Pedro Falcato <pfalcato@...e.de>, David Hildenbrand <david@...hat.com>,
Alexander Viro <viro@...iv.linux.org.uk>, Christian Brauner <brauner@...nel.org>, Jan Kara <jack@...e.cz>,
linux-mm@...ck.org, linux-fsdevel@...r.kernel.org,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH 2/4] mm: perform VMA allocation, freeing, duplication in mm

On Fri, Apr 25, 2025 at 10:12 AM Kees Cook <kees@...nel.org> wrote:
>
> On Fri, Apr 25, 2025 at 08:32:48AM -0700, Suren Baghdasaryan wrote:
> > On Fri, Apr 25, 2025 at 6:55 AM Liam R. Howlett <Liam.Howlett@...cle.com> wrote:
> > >
> > > * Lorenzo Stoakes <lorenzo.stoakes@...cle.com> [250425 06:40]:
> > > > On Thu, Apr 24, 2025 at 08:15:26PM -0700, Kees Cook wrote:
> > > > >
> > > > >
> > > > > On April 24, 2025 2:15:27 PM PDT, Lorenzo Stoakes <lorenzo.stoakes@...cle.com> wrote:
> > > > > >+static void vm_area_init_from(const struct vm_area_struct *src,
> > > > > >+ struct vm_area_struct *dest)
> > > > > >+{
> > > > > >+ dest->vm_mm = src->vm_mm;
> > > > > >+ dest->vm_ops = src->vm_ops;
> > > > > >+ dest->vm_start = src->vm_start;
> > > > > >+ dest->vm_end = src->vm_end;
> > > > > >+ dest->anon_vma = src->anon_vma;
> > > > > >+ dest->vm_pgoff = src->vm_pgoff;
> > > > > >+ dest->vm_file = src->vm_file;
> > > > > >+ dest->vm_private_data = src->vm_private_data;
> > > > > >+ vm_flags_init(dest, src->vm_flags);
> > > > > >+ memcpy(&dest->vm_page_prot, &src->vm_page_prot,
> > > > > >+ sizeof(dest->vm_page_prot));
> > > > > >+ /*
> > > > > >+ * src->shared.rb may be modified concurrently when called from
> > > > > >+ * dup_mmap(), but the clone will reinitialize it.
> > > > > >+ */
> > > > > >+ data_race(memcpy(&dest->shared, &src->shared, sizeof(dest->shared)));
> > > > > >+ memcpy(&dest->vm_userfaultfd_ctx, &src->vm_userfaultfd_ctx,
> > > > > >+ sizeof(dest->vm_userfaultfd_ctx));
> > > > > >+#ifdef CONFIG_ANON_VMA_NAME
> > > > > >+ dest->anon_name = src->anon_name;
> > > > > >+#endif
> > > > > >+#ifdef CONFIG_SWAP
> > > > > >+ memcpy(&dest->swap_readahead_info, &src->swap_readahead_info,
> > > > > >+ sizeof(dest->swap_readahead_info));
> > > > > >+#endif
> > > > > >+#ifdef CONFIG_NUMA
> > > > > >+ dest->vm_policy = src->vm_policy;
> > > > > >+#endif
> > > > > >+}
> > > > >
> > > > > I know you're doing a big cut/paste here, but why in the world is this function written this way? Why not just:
> > > > >
> > > > > *dest = *src;
> > > > >
> > > > > And then do any one-off cleanups?
> > > >
> > > > Yup I find it odd, and error prone to be honest. We'll end up with uninitialised
> > > > state for some fields if we miss them here, seems unwise...
> > > >
> > > > Presumably for performance?
> > > >
> > > > This is, as you say, me simply propagating what exists, but I do wonder.
> > >
> > > Two things come to mind:
> > >
> > > 1. How ctors are done. (v3 of Suren's RCU safe patch series, willy made
> > > a comment.. I think)
> > >
> > > 2. Some race that Vlastimil came up with the copy and the RCU safeness.
> > > IIRC it had to do with the ordering of the setting of things?
> > >
> > > Also, looking at it again...
> > >
> > > How is it safe to do dest->anon_name = src->anon_name? Isn't that ref
> > > counted?
> >
> > dest->anon_name = src->anon_name is fine here because right after
> > vm_area_init_from() we call dup_anon_vma_name() which will bump up the
> > refcount. I don't recall why this is done this way but now looking at
> > it I wonder if I could call dup_anon_vma_name() directly instead of
> > this assignment. Might be just an overlooked legacy from the time we
> > memcpy'd the entire structure. I'll need to double-check.
>
> Oh, is "dest" accessible to other CPU threads? I hadn't looked and was
> assuming this was like process creation where everything gets built in
> isolation and then attached to the main process tree. I was thinking
> this was similar.

Yeah, it's process creation time, but this structure is allocated from a
SLAB_TYPESAFE_BY_RCU cache, which adds complexity. A newly allocated
object from this cache might still be accessible from another thread
holding a reference to its earlier incarnation. That other thread needs
an indication that "this object has been released, so the reference you
are holding points to a freed or reallocated/wrong object". vm_refcnt is
that indication here, and we are careful not to overwrite it, even
temporarily, during object initialization. Well, in truth we overwrite
it later with 0, but to the other thread that still means this object
is not the one it wants.

I suspect you know this already, but just in case,
https://elixir.bootlin.com/linux/v6.14.3/source/include/linux/slab.h#L101
has a more detailed explanation of SLAB_TYPESAFE_BY_RCU.
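
To make the point about not touching the refcount concrete, here is a
minimal single-threaded userspace sketch. It is not the kernel code:
struct obj, obj_init_from(), obj_tryget(), obj_lookup() and demo() are
hypothetical names, C11 atomics stand in for the kernel's refcount
helpers, and "publish by storing 1" is an assumption of the model. It
only shows why a field-by-field copy that skips the refcount keeps a
stale reader out, and why the reader must revalidate after pinning.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for a VMA-like object; not the kernel layout. */
struct obj {
	unsigned long start;
	unsigned long end;
	atomic_int refcnt;	/* 0 means "freed or not yet published" */
};

/* Copy payload fields one by one; refcnt is deliberately left alone. */
static void obj_init_from(const struct obj *src, struct obj *dest)
{
	dest->start = src->start;
	dest->end = src->end;
}

/* Take a reference only if the object is live (refcnt != 0). */
static bool obj_tryget(struct obj *o)
{
	int old = atomic_load(&o->refcnt);

	while (old != 0) {
		/* On failure, 'old' is reloaded with the current value. */
		if (atomic_compare_exchange_weak(&o->refcnt, &old, old + 1))
			return true;
	}
	return false;
}

static void obj_put(struct obj *o)
{
	atomic_fetch_sub(&o->refcnt, 1);
}

/*
 * Reader holding a possibly stale pointer: pin the object first, then
 * check that it is still the object being looked for. The memory may
 * have been reused for a different object of the same type.
 */
static bool obj_lookup(struct obj *o, unsigned long addr)
{
	if (!obj_tryget(o))
		return false;
	if (addr < o->start || addr >= o->end) {
		obj_put(o);
		return false;
	}
	return true;
}

/* Walk one slot through free, reuse, and publish. Returns 0 on success. */
static int demo(void)
{
	struct obj slot = { .start = 0x1000, .end = 0x2000 };
	struct obj src = { .start = 0x8000, .end = 0x9000 };

	atomic_init(&slot.refcnt, 1);
	atomic_init(&src.refcnt, 1);

	/* "Free" the slot; a reader may still hold a pointer to it. */
	atomic_store(&slot.refcnt, 0);
	assert(!obj_tryget(&slot));

	/* Reuse the slot: the payload changes, refcnt stays at 0. */
	obj_init_from(&src, &slot);
	assert(atomic_load(&slot.refcnt) == 0);
	assert(!obj_tryget(&slot));

	/* Publish the new incarnation (model assumption: store 1). */
	atomic_store(&slot.refcnt, 1);

	/* A stale reader can now pin it, but revalidation rejects it. */
	assert(!obj_lookup(&slot, 0x1800));
	assert(atomic_load(&slot.refcnt) == 1);

	/* A reader looking for the new range succeeds and holds a ref. */
	assert(obj_lookup(&slot, 0x8800));
	obj_put(&slot);
	return 0;
}
```

Calling demo() from a main() exercises the whole sequence. In this
model, a whole-structure copy from src would also copy src's non-zero
refcnt into the slot, so a stale reader could pin it before
initialization had finished, which is the window the field-by-field
copy avoids.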
>
> --
> Kees Cook