Message-ID: <CAJuCfpG+Zypy+_83UMLpFk9xJca2ptAb9ANV6tp7TG497vaGBQ@mail.gmail.com>
Date: Thu, 13 Feb 2025 14:56:01 -0800
From: Suren Baghdasaryan <surenb@...gle.com>
To: Vlastimil Babka <vbabka@...e.cz>
Cc: Wei Yang <richard.weiyang@...il.com>, willy@...radead.org, 
	akpm@...ux-foundation.org, peterz@...radead.org, liam.howlett@...cle.com, 
	lorenzo.stoakes@...cle.com, david.laight.linux@...il.com, mhocko@...e.com, 
	hannes@...xchg.org, mjguzik@...il.com, oliver.sang@...el.com, 
	mgorman@...hsingularity.net, david@...hat.com, peterx@...hat.com, 
	oleg@...hat.com, dave@...olabs.net, paulmck@...nel.org, brauner@...nel.org, 
	dhowells@...hat.com, hdanton@...a.com, hughd@...gle.com, 
	lokeshgidra@...gle.com, minchan@...gle.com, jannh@...gle.com, 
	shakeel.butt@...ux.dev, souravpanda@...gle.com, pasha.tatashin@...een.com, 
	klarasmodin@...il.com, corbet@....net, linux-doc@...r.kernel.org, 
	linux-mm@...ck.org, linux-kernel@...r.kernel.org, kernel-team@...roid.com
Subject: Re: [PATCH v9 16/17] mm: make vma cache SLAB_TYPESAFE_BY_RCU

On Wed, Jan 15, 2025 at 7:10 AM Suren Baghdasaryan <surenb@...gle.com> wrote:
>
> On Tue, Jan 14, 2025 at 11:58 PM Vlastimil Babka <vbabka@...e.cz> wrote:
> >
> > On 1/15/25 04:15, Suren Baghdasaryan wrote:
> > > On Tue, Jan 14, 2025 at 6:27 PM Wei Yang <richard.weiyang@...il.com> wrote:
> > >>
> > >> On Fri, Jan 10, 2025 at 08:26:03PM -0800, Suren Baghdasaryan wrote:
> > >>
> > >> >diff --git a/kernel/fork.c b/kernel/fork.c
> > >> >index 9d9275783cf8..151b40627c14 100644
> > >> >--- a/kernel/fork.c
> > >> >+++ b/kernel/fork.c
> > >> >@@ -449,6 +449,42 @@ struct vm_area_struct *vm_area_alloc(struct mm_struct *mm)
> > >> >       return vma;
> > >> > }
> > >> >
> > >> >+static void vm_area_init_from(const struct vm_area_struct *src,
> > >> >+                            struct vm_area_struct *dest)
> > >> >+{
> > >> >+      dest->vm_mm = src->vm_mm;
> > >> >+      dest->vm_ops = src->vm_ops;
> > >> >+      dest->vm_start = src->vm_start;
> > >> >+      dest->vm_end = src->vm_end;
> > >> >+      dest->anon_vma = src->anon_vma;
> > >> >+      dest->vm_pgoff = src->vm_pgoff;
> > >> >+      dest->vm_file = src->vm_file;
> > >> >+      dest->vm_private_data = src->vm_private_data;
> > >> >+      vm_flags_init(dest, src->vm_flags);
> > >> >+      memcpy(&dest->vm_page_prot, &src->vm_page_prot,
> > >> >+             sizeof(dest->vm_page_prot));
> > >> >+      /*
> > >> >+       * src->shared.rb may be modified concurrently when called from
> > >> >+       * dup_mmap(), but the clone will reinitialize it.
> > >> >+       */
> > >> >+      data_race(memcpy(&dest->shared, &src->shared, sizeof(dest->shared)));
> > >> >+      memcpy(&dest->vm_userfaultfd_ctx, &src->vm_userfaultfd_ctx,
> > >> >+             sizeof(dest->vm_userfaultfd_ctx));
> > >> >+#ifdef CONFIG_ANON_VMA_NAME
> > >> >+      dest->anon_name = src->anon_name;
> > >> >+#endif
> > >> >+#ifdef CONFIG_SWAP
> > >> >+      memcpy(&dest->swap_readahead_info, &src->swap_readahead_info,
> > >> >+             sizeof(dest->swap_readahead_info));
> > >> >+#endif
> > >> >+#ifndef CONFIG_MMU
> > >> >+      dest->vm_region = src->vm_region;
> > >> >+#endif
> > >> >+#ifdef CONFIG_NUMA
> > >> >+      dest->vm_policy = src->vm_policy;
> > >> >+#endif
> > >> >+}
> > >>
> > >> Would this be difficult to maintain? We should make sure not to miss or
> > >> overwrite anything.
> > >
> > > Yeah, it is less maintainable than a simple memcpy() but I did not
> > > find a better alternative.
> >
> > Willy knows one but refuses to share it :(
>
> Ah, that reminds me why I dropped this approach :) But to be honest,
> back then we also had vma_clear() and that added to the ugliness. Now
> I could simply do this without all those macros:
>
> static inline void vma_copy(struct vm_area_struct *new,
>                             struct vm_area_struct *orig)
> {
>         /* Copy the vma while preserving vma->vm_lock */
>         data_race(memcpy(new, orig, offsetof(struct vm_area_struct, vm_lock)));
>         data_race(memcpy((char *)new + offsetofend(struct vm_area_struct, vm_lock),
>                 (char *)orig + offsetofend(struct vm_area_struct, vm_lock),
>                 sizeof(struct vm_area_struct) -
>                 offsetofend(struct vm_area_struct, vm_lock)));
> }
>
> Would that be better than the current approach?

I discussed the proposed alternatives with Willy and he prefers the
current field-by-field copy approach. I also tried using
kmsan_check_memory() to check for uninitialized memory in the
vm_area_struct, but unfortunately KMSAN stumbles on the holes in this
structure, and there are 4 of them (I attached pahole output at the end
of this email). I tried unpoisoning the holes, but that gets very ugly
very fast. So, I posted v10
(https://lore.kernel.org/all/20250213224655.1680278-18-surenb@google.com/)
without changing this part.

struct vm_area_struct {
        union {
                struct {
                        unsigned long vm_start;          /*     0     8 */
                        unsigned long vm_end;            /*     8     8 */
                };                                       /*     0    16 */
                freeptr_t          vm_freeptr;           /*     0     8 */
        };                                               /*     0    16 */
        struct mm_struct *         vm_mm;                /*    16     8 */
        pgprot_t                   vm_page_prot;         /*    24     8 */
        union {
                const vm_flags_t   vm_flags;             /*    32     8 */
                vm_flags_t         __vm_flags;           /*    32     8 */
        };                                               /*    32     8 */
        unsigned int               vm_lock_seq;          /*    40     4 */

        /* XXX 4 bytes hole, try to pack */

        struct list_head           anon_vma_chain;       /*    48    16 */
        /* --- cacheline 1 boundary (64 bytes) --- */
        struct anon_vma *          anon_vma;             /*    64     8 */
        const struct vm_operations_struct  * vm_ops;     /*    72     8 */
        unsigned long              vm_pgoff;             /*    80     8 */
        struct file *              vm_file;              /*    88     8 */
        void *                     vm_private_data;      /*    96     8 */
        atomic_long_t              swap_readahead_info;  /*   104     8 */
        struct mempolicy *         vm_policy;            /*   112     8 */

        /* XXX 8 bytes hole, try to pack */

        /* --- cacheline 2 boundary (128 bytes) --- */
        refcount_t                 vm_refcnt __attribute__((__aligned__(64))); /*   128     4 */

        /* XXX 4 bytes hole, try to pack */

        struct {
                struct rb_node     rb __attribute__((__aligned__(8))); /*   136    24 */
                unsigned long      rb_subtree_last;      /*   160     8 */
        } shared;                                        /*   136    32 */
        struct vm_userfaultfd_ctx  vm_userfaultfd_ctx;   /*   168     0 */

        /* size: 192, cachelines: 3, members: 16 */
        /* sum members: 152, holes: 3, sum holes: 16 */
        /* padding: 24 */
        /* forced alignments: 1, forced holes: 1, sum forced holes: 8 */
};

>
> >
> > > I added a warning above the struct
> > > vm_area_struct definition to update this function every time we change
> > > that structure. Not sure if there is anything else I can do to help
> > > with this.
> > >
> > >>
> > >> --
> > >> Wei Yang
> > >> Help you, Help me
> >
