Message-ID: <CAEf4BzbOXrbixQA=fpg17QPBv+4myAQrHvCX42hVye0Ww9W2Aw@mail.gmail.com>
Date: Thu, 17 Oct 2024 11:55:35 -0700
From: Andrii Nakryiko <andrii.nakryiko@...il.com>
To: Suren Baghdasaryan <surenb@...gle.com>
Cc: Shakeel Butt <shakeel.butt@...ux.dev>, Andrii Nakryiko <andrii@...nel.org>,
linux-trace-kernel@...r.kernel.org, linux-mm@...ck.org, peterz@...radead.org,
oleg@...hat.com, rostedt@...dmis.org, mhiramat@...nel.org,
bpf@...r.kernel.org, linux-kernel@...r.kernel.org, jolsa@...nel.org,
paulmck@...nel.org, willy@...radead.org, akpm@...ux-foundation.org,
mjguzik@...il.com, brauner@...nel.org, jannh@...gle.com, mhocko@...nel.org,
vbabka@...e.cz, hannes@...xchg.org, Liam.Howlett@...cle.com,
lorenzo.stoakes@...cle.com
Subject: Re: [PATCH v3 tip/perf/core 2/4] mm: switch to 64-bit mm_lock_seq/vm_lock_seq on 64-bit architectures
On Wed, Oct 16, 2024 at 7:02 PM Suren Baghdasaryan <surenb@...gle.com> wrote:
>
> On Sun, Oct 13, 2024 at 12:56 AM Shakeel Butt <shakeel.butt@...ux.dev> wrote:
> >
> > On Thu, Oct 10, 2024 at 01:56:42PM GMT, Andrii Nakryiko wrote:
> > > To increase mm->mm_lock_seq robustness, switch it from int to long, so
> > > that it's a 64-bit counter on 64-bit systems and we can stop worrying
> > > about it wrapping around in just ~4 billion iterations. Same goes for
> > > VMA's matching vm_lock_seq, which is derived from mm_lock_seq.
>
> vm_lock_seq does not need to be long but for consistency I guess that
> makes sense.

How come? We literally assign vm_lock_seq from mm_lock_seq and do
direct comparisons. They have to be exactly the same type, no?

> While at it, can you please change these seq counters to
> be unsigned?

There is `vma->vm_lock_seq = -1;` in kernel/fork.c, so should it be
switched to ULONG_MAX then? In general, unless this is critical for
correctness, I'd very much like stuff like this to be done in the mm
tree afterwards. But it seems trivial enough, so if you insist, I'll do
it.
> Also, did you check with pahole if the vm_area_struct layout change
> pushes some members into a difference cacheline or creates new gaps?
>
Just did. We had a 3-byte hole after `bool detached;`; it has now grown
to 7 bytes (so +4), and vm_lock_seq itself is now 8 bytes (so another
+4), which pushes rb and rb_subtree_last into *THE SAME* cache line
(which sounds like an improvement to me). vm_lock_seq and vm_lock stay
in the same cache line. vm_pgoff and vm_file are now in the same cache
line too, and given they are probably always accessed together, that
seems like a good accidental change as well. See the pahole output
before and after below.
That singular detached bool looks like a complete waste, tbh. Maybe it
would be better to roll it into vm_flags and save 8 bytes? (not that I
want to do those mm changes in this patch set, of course...).
vm_area_struct is otherwise nicely tightly packed.
tl;dr, seems fine, and detached would be best to get rid of, if
possible (but that's a completely separate thing)
BEFORE
======
struct vm_area_struct {
	union {
		struct {
			long unsigned int vm_start;      /*   0   8 */
			long unsigned int vm_end;        /*   8   8 */
		};                                       /*   0  16 */
		struct callback_head vm_rcu;             /*   0  16 */
	} __attribute__((__aligned__(8)));               /*   0  16 */
	struct mm_struct * vm_mm;                        /*  16   8 */
	pgprot_t vm_page_prot;                           /*  24   8 */
	union {
		const vm_flags_t vm_flags;               /*  32   8 */
		vm_flags_t __vm_flags;                   /*  32   8 */
	};                                               /*  32   8 */
	bool detached;                                   /*  40   1 */

	/* XXX 3 bytes hole, try to pack */

	int vm_lock_seq;                                 /*  44   4 */
	struct vma_lock * vm_lock;                       /*  48   8 */
	struct {
		struct rb_node rb;                       /*  56  24 */
		/* --- cacheline 1 boundary (64 bytes) was 16 bytes ago --- */
		long unsigned int rb_subtree_last;       /*  80   8 */
	};                                               /*  56  32 */
	struct list_head anon_vma_chain;                 /*  88  16 */
	struct anon_vma * anon_vma;                      /* 104   8 */
	const struct vm_operations_struct * vm_ops;      /* 112   8 */
	long unsigned int vm_pgoff;                      /* 120   8 */
	/* --- cacheline 2 boundary (128 bytes) --- */
	struct file * vm_file;                           /* 128   8 */
	void * vm_private_data;                          /* 136   8 */
	atomic_long_t swap_readahead_info;               /* 144   8 */
	struct mempolicy * vm_policy;                    /* 152   8 */
	struct vma_numab_state * numab_state;            /* 160   8 */
	struct vm_userfaultfd_ctx vm_userfaultfd_ctx;    /* 168   8 */

	/* size: 176, cachelines: 3, members: 18 */
	/* sum members: 173, holes: 1, sum holes: 3 */
	/* forced alignments: 2 */
	/* last cacheline: 48 bytes */
} __attribute__((__aligned__(8)));
AFTER
=====
struct vm_area_struct {
	union {
		struct {
			long unsigned int vm_start;      /*   0   8 */
			long unsigned int vm_end;        /*   8   8 */
		};                                       /*   0  16 */
		struct callback_head vm_rcu;             /*   0  16 */
	} __attribute__((__aligned__(8)));               /*   0  16 */
	struct mm_struct * vm_mm;                        /*  16   8 */
	pgprot_t vm_page_prot;                           /*  24   8 */
	union {
		const vm_flags_t vm_flags;               /*  32   8 */
		vm_flags_t __vm_flags;                   /*  32   8 */
	};                                               /*  32   8 */
	bool detached;                                   /*  40   1 */

	/* XXX 7 bytes hole, try to pack */

	long int vm_lock_seq;                            /*  48   8 */
	struct vma_lock * vm_lock;                       /*  56   8 */
	/* --- cacheline 1 boundary (64 bytes) --- */
	struct {
		struct rb_node rb;                       /*  64  24 */
		long unsigned int rb_subtree_last;       /*  88   8 */
	};                                               /*  64  32 */
	struct list_head anon_vma_chain;                 /*  96  16 */
	struct anon_vma * anon_vma;                      /* 112   8 */
	const struct vm_operations_struct * vm_ops;      /* 120   8 */
	/* --- cacheline 2 boundary (128 bytes) --- */
	long unsigned int vm_pgoff;                      /* 128   8 */
	struct file * vm_file;                           /* 136   8 */
	void * vm_private_data;                          /* 144   8 */
	atomic_long_t swap_readahead_info;               /* 152   8 */
	struct mempolicy * vm_policy;                    /* 160   8 */
	struct vma_numab_state * numab_state;            /* 168   8 */
	struct vm_userfaultfd_ctx vm_userfaultfd_ctx;    /* 176   8 */

	/* size: 184, cachelines: 3, members: 18 */
	/* sum members: 177, holes: 1, sum holes: 7 */
	/* forced alignments: 2 */
	/* last cacheline: 56 bytes */
} __attribute__((__aligned__(8)));
> > >
> > > I didn't use __u64 outright to keep 32-bit architectures unaffected, but
> > > if it seems important enough, I have nothing against using __u64.
> > >
> > > Suggested-by: Jann Horn <jannh@...gle.com>
> > > Signed-off-by: Andrii Nakryiko <andrii@...nel.org>
> >
> > Reviewed-by: Shakeel Butt <shakeel.butt@...ux.dev>
>