Message-ID: <c82d75d1-5795-4401-92f8-58df6ac8dbd3@lucifer.local>
Date: Fri, 21 Nov 2025 17:44:43 +0000
From: Lorenzo Stoakes <lorenzo.stoakes@...cle.com>
To: Andrew Morton <akpm@...ux-foundation.org>
Cc: Muchun Song <muchun.song@...ux.dev>, Oscar Salvador <osalvador@...e.de>,
David Hildenbrand <david@...hat.com>,
"Liam R . Howlett" <Liam.Howlett@...cle.com>,
Vlastimil Babka <vbabka@...e.cz>, Mike Rapoport <rppt@...nel.org>,
Suren Baghdasaryan <surenb@...gle.com>, Michal Hocko <mhocko@...e.com>,
Axel Rasmussen <axelrasmussen@...gle.com>,
Yuanchu Xie <yuanchu@...gle.com>, Wei Xu <weixugc@...gle.com>,
Peter Xu <peterx@...hat.com>, Ingo Molnar <mingo@...hat.com>,
Peter Zijlstra <peterz@...radead.org>,
Juri Lelli <juri.lelli@...hat.com>,
Vincent Guittot <vincent.guittot@...aro.org>,
Dietmar Eggemann <dietmar.eggemann@....com>,
Steven Rostedt <rostedt@...dmis.org>, Ben Segall <bsegall@...gle.com>,
Mel Gorman <mgorman@...e.de>, Valentin Schneider <vschneid@...hat.com>,
Kees Cook <kees@...nel.org>, Matthew Wilcox <willy@...radead.org>,
Jason Gunthorpe <jgg@...pe.ca>, John Hubbard <jhubbard@...dia.com>,
Leon Romanovsky <leon@...nel.org>, Zi Yan <ziy@...dia.com>,
Baolin Wang <baolin.wang@...ux.alibaba.com>,
Nico Pache <npache@...hat.com>, Ryan Roberts <ryan.roberts@....com>,
Dev Jain <dev.jain@....com>, Barry Song <baohua@...nel.org>,
Lance Yang <lance.yang@...ux.dev>, Xu Xin <xu.xin16@....com.cn>,
Chengming Zhou <chengming.zhou@...ux.dev>,
Jann Horn <jannh@...gle.com>, Matthew Brost <matthew.brost@...el.com>,
Joshua Hahn <joshua.hahnjy@...il.com>, Rakie Kim <rakie.kim@...com>,
Byungchul Park <byungchul@...com>, Gregory Price <gourry@...rry.net>,
Ying Huang <ying.huang@...ux.alibaba.com>,
Alistair Popple <apopple@...dia.com>, Pedro Falcato <pfalcato@...e.de>,
Shakeel Butt <shakeel.butt@...ux.dev>,
David Rientjes <rientjes@...gle.com>, Rik van Riel <riel@...riel.com>,
Harry Yoo <harry.yoo@...cle.com>,
Kemeng Shi <shikemeng@...weicloud.com>,
Kairui Song <kasong@...cent.com>, Nhat Pham <nphamcs@...il.com>,
Baoquan He <bhe@...hat.com>, Chris Li <chrisl@...nel.org>,
Johannes Weiner <hannes@...xchg.org>,
Qi Zheng <zhengqi.arch@...edance.com>, linux-kernel@...r.kernel.org,
linux-fsdevel@...r.kernel.org, linux-mm@...ck.org,
Miguel Ojeda <ojeda@...nel.org>, Alex Gaynor <alex.gaynor@...il.com>,
Boqun Feng <boqun.feng@...il.com>, Gary Guo <gary@...yguo.net>,
Bjorn Roy Baron <bjorn3_gh@...tonmail.com>,
Benno Lossin <lossin@...nel.org>,
Andreas Hindborg <a.hindborg@...nel.org>,
Alice Ryhl <aliceryhl@...gle.com>, Trevor Gross <tmgross@...ch.edu>,
Danilo Krummrich <dakr@...nel.org>, rust-for-linux@...r.kernel.org
Subject: Re: [PATCH v2 4/4] mm: introduce VMA flags bitmap type

As Vlastimil noticed, something has gone fairly horribly wrong in the actual
commit [0] compared with the patch here for tools/testing/vma/vma_internal.h.

We should only have the delta shown here. Let me know if I need to help with a
conflict resolution! :)

Thanks, Lorenzo

[0]: https://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm.git/commit/?h=mm-stable&id=c3f7c506e8f122a31b9cc01d234e7fcda46b0eca

On Fri, Nov 14, 2025 at 01:26:11PM +0000, Lorenzo Stoakes wrote:
> It is useful to transition to using a bitmap for VMA flags so we can avoid
> running out of flags, especially for 32-bit kernels which are constrained
> to 32 flags, necessitating some features to be limited to 64-bit kernels
> only.
>
> By doing so, we remove any constraint on the number of VMA flags moving
> forwards no matter the platform and can decide in future to extend beyond
> 64 if required.
>
> We start by declaring an opaque type, vma_flags_t (which resembles
> mm_struct flags of type mm_flags_t), setting it to precisely the same size
> as vm_flags_t, and place it in union with vm_flags in the VMA declaration.
>
> We additionally update struct vm_area_desc equivalently placing the new
> opaque type in union with vm_flags.
>
> This change therefore does not impact the size of struct vm_area_struct or
> struct vm_area_desc.
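
The size claim above can be checked with a minimal userland sketch. This is
not the kernel code: the macros below are simplified stand-ins for the kernel
definitions, vm_flags_t is assumed to be unsigned long, and struct demo_vma is
a hypothetical cut-down structure holding only the flags union.

```c
#include <assert.h>
#include <limits.h>

/* Simplified userland stand-ins for the kernel macros (assumptions). */
#define BITS_PER_LONG		(sizeof(unsigned long) * CHAR_BIT)
#define BITS_TO_LONGS(nbits)	(((nbits) + BITS_PER_LONG - 1) / BITS_PER_LONG)
#define DECLARE_BITMAP(name, bits)	unsigned long name[BITS_TO_LONGS(bits)]
#define NUM_VMA_FLAG_BITS	BITS_PER_LONG

/* Assumed to match the kernel typedef. */
typedef unsigned long vm_flags_t;

typedef struct {
	DECLARE_BITMAP(__vma_flags, NUM_VMA_FLAG_BITS);
} vma_flags_t;

/* Hypothetical cut-down VMA containing only the flags union. */
struct demo_vma {
	union {
		const vm_flags_t vm_flags;
		vma_flags_t flags;
	};
};

/* A one-word bitmap occupies exactly the space of vm_flags_t. */
static_assert(sizeof(vma_flags_t) == sizeof(vm_flags_t),
	      "bitmap must not grow the flags field");
/* Placing both in a union therefore leaves the structure size unchanged. */
static_assert(sizeof(struct demo_vma) == sizeof(vm_flags_t),
	      "union must not grow the structure");
```

If NUM_VMA_FLAG_BITS were later raised above BITS_PER_LONG, both assertions
in this sketch would fail, which is the point at which the union with
vm_flags would no longer be size-neutral.
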
>
> In order for the change to be iterative and to avoid impacting performance,
> we designate VM_xxx declared bitmap flag values as those which must exist
> in the first system word of the VMA flags bitmap.
>
> We therefore declare vma_flags_clear_all(), vma_flags_overwrite_word(),
> vma_flags_overwrite_word_once(), vma_flags_set_word() and
> vma_flags_clear_word() in order to allow us to update the existing
> vm_flags_*() functions to utilise these helpers.
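
As a rough illustration of how the word helpers listed above reproduce the old
plain-integer behaviour, here is a self-contained userland sketch. It omits
ACCESS_PRIVATE() and the locking calls, uses memset() in place of
bitmap_zero(), and the DEMO_VM_* values and demo_first_word() are hypothetical
names introduced only for this example.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical flag values, used only for illustration. */
#define DEMO_VM_READ	0x1UL
#define DEMO_VM_WRITE	0x2UL

/* Simplified stand-in: a bitmap of exactly one system word. */
typedef struct {
	unsigned long __vma_flags[1];
} vma_flags_t;

static_assert(sizeof(vma_flags_t) == sizeof(unsigned long),
	      "sketch assumes a single-word bitmap");

/* Clear every word of the bitmap, non-atomically. */
static inline void vma_flags_clear_all(vma_flags_t *flags)
{
	memset(flags->__vma_flags, 0, sizeof(flags->__vma_flags));
}

/* Overwrite only the first word; later words (if any) are untouched. */
static inline void vma_flags_overwrite_word(vma_flags_t *flags, unsigned long value)
{
	flags->__vma_flags[0] = value;
}

/* Set bits in the first word only. */
static inline void vma_flags_set_word(vma_flags_t *flags, unsigned long value)
{
	flags->__vma_flags[0] |= value;
}

/* Clear bits in the first word only. */
static inline void vma_flags_clear_word(vma_flags_t *flags, unsigned long value)
{
	flags->__vma_flags[0] &= ~value;
}

/* Mirrors the old "flags = init; flags |= set; flags &= ~clear" sequence. */
static unsigned long demo_first_word(unsigned long init, unsigned long set,
				     unsigned long clear)
{
	vma_flags_t flags;

	vma_flags_clear_all(&flags);
	vma_flags_overwrite_word(&flags, init);
	vma_flags_set_word(&flags, set);
	vma_flags_clear_word(&flags, clear);
	return flags.__vma_flags[0];
}
```

In this sketch the clear-all step before the overwrite is redundant for a
one-word bitmap, but it mirrors the patch's vm_flags_init() and would matter
once the bitmap spans more than one word.
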
>
> This is a stepping stone towards converting users to the VMA flags bitmap
> and behaves precisely as before.
>
> By doing this, we can eliminate the existing private vma->__vm_flags field
> in the vma->vm_flags union and replace it with a field of the newly
> introduced opaque type vma_flags_t, which we name flags, so the new bitmap
> field is referred to as vma->flags.
>
> We update vma_flag_[test, set]_atomic() to account for the change also.
>
> We additionally update the VMA userland test declarations to implement the
> same changes there.
>
> Finally, we update the Rust code to reference vma->vm_flags on update
> rather than vma->__vm_flags, which has been removed. This is safe for now,
> although it implicitly performs a const cast.
>
> Once we introduce flag helpers we can improve this more.
>
> No functional change intended.
>
> Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@...cle.com>
> ---
> include/linux/mm.h | 18 ++--
> include/linux/mm_types.h | 64 +++++++++++++-
> rust/kernel/mm/virt.rs | 2 +-
> tools/testing/vma/vma_internal.h | 143 ++++++++++++++++++++++++++-----
> 4 files changed, 196 insertions(+), 31 deletions(-)
>
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index ad000c472bd5..79345c44a350 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -919,7 +919,8 @@ static inline void vma_init(struct vm_area_struct *vma, struct mm_struct *mm)
> static inline void vm_flags_init(struct vm_area_struct *vma,
> vm_flags_t flags)
> {
> - ACCESS_PRIVATE(vma, __vm_flags) = flags;
> + vma_flags_clear_all(&vma->flags);
> + vma_flags_overwrite_word(&vma->flags, flags);
> }
>
> /*
> @@ -938,21 +939,26 @@ static inline void vm_flags_reset_once(struct vm_area_struct *vma,
> vm_flags_t flags)
> {
> vma_assert_write_locked(vma);
> - WRITE_ONCE(ACCESS_PRIVATE(vma, __vm_flags), flags);
> + /*
> + * The user should only be interested in avoiding reordering of
> + * assignment to the first word.
> + */
> + vma_flags_clear_all(&vma->flags);
> + vma_flags_overwrite_word_once(&vma->flags, flags);
> }
>
> static inline void vm_flags_set(struct vm_area_struct *vma,
> vm_flags_t flags)
> {
> vma_start_write(vma);
> - ACCESS_PRIVATE(vma, __vm_flags) |= flags;
> + vma_flags_set_word(&vma->flags, flags);
> }
>
> static inline void vm_flags_clear(struct vm_area_struct *vma,
> vm_flags_t flags)
> {
> vma_start_write(vma);
> - ACCESS_PRIVATE(vma, __vm_flags) &= ~flags;
> + vma_flags_clear_word(&vma->flags, flags);
> }
>
> /*
> @@ -995,12 +1001,14 @@ static inline bool __vma_flag_atomic_valid(struct vm_area_struct *vma,
> static inline void vma_flag_set_atomic(struct vm_area_struct *vma,
> vma_flag_t bit)
> {
> + unsigned long *bitmap = ACCESS_PRIVATE(&vma->flags, __vma_flags);
> +
> /* mmap read lock/VMA read lock must be held. */
> if (!rwsem_is_locked(&vma->vm_mm->mmap_lock))
> vma_assert_locked(vma);
>
> if (__vma_flag_atomic_valid(vma, bit))
> - set_bit((__force int)bit, &ACCESS_PRIVATE(vma, __vm_flags));
> + set_bit((__force int)bit, bitmap);
> }
>
> /*
> diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
> index 3550672e0f9e..b71625378ce3 100644
> --- a/include/linux/mm_types.h
> +++ b/include/linux/mm_types.h
> @@ -848,6 +848,15 @@ struct mmap_action {
> bool hide_from_rmap_until_complete :1;
> };
>
> +/*
> + * Opaque type representing current VMA (vm_area_struct) flag state. Must be
> + * accessed via vma_flags_xxx() helper functions.
> + */
> +#define NUM_VMA_FLAG_BITS BITS_PER_LONG
> +typedef struct {
> + DECLARE_BITMAP(__vma_flags, NUM_VMA_FLAG_BITS);
> +} __private vma_flags_t;
> +
> /*
> * Describes a VMA that is about to be mmap()'ed. Drivers may choose to
> * manipulate mutable fields which will cause those fields to be updated in the
> @@ -865,7 +874,10 @@ struct vm_area_desc {
> /* Mutable fields. Populated with initial state. */
> pgoff_t pgoff;
> struct file *vm_file;
> - vm_flags_t vm_flags;
> + union {
> + vm_flags_t vm_flags;
> + vma_flags_t vma_flags;
> + };
> pgprot_t page_prot;
>
> /* Write-only fields. */
> @@ -910,10 +922,12 @@ struct vm_area_struct {
> /*
> * Flags, see mm.h.
> * To modify use vm_flags_{init|reset|set|clear|mod} functions.
> + * Preferably, use vma_flags_xxx() functions.
> */
> union {
> + /* Temporary while VMA flags are being converted. */
> const vm_flags_t vm_flags;
> - vm_flags_t __private __vm_flags;
> + vma_flags_t flags;
> };
>
> #ifdef CONFIG_PER_VMA_LOCK
> @@ -994,6 +1008,52 @@ struct vm_area_struct {
> #endif
> } __randomize_layout;
>
> +/* Clears all bits in the VMA flags bitmap, non-atomically. */
> +static inline void vma_flags_clear_all(vma_flags_t *flags)
> +{
> + bitmap_zero(ACCESS_PRIVATE(flags, __vma_flags), NUM_VMA_FLAG_BITS);
> +}
> +
> +/*
> + * Copy value to the first system word of VMA flags, non-atomically.
> + *
> + * IMPORTANT: This does not overwrite bytes past the first system word. The
> + * caller must account for this.
> + */
> +static inline void vma_flags_overwrite_word(vma_flags_t *flags, unsigned long value)
> +{
> + *ACCESS_PRIVATE(flags, __vma_flags) = value;
> +}
> +
> +/*
> + * Copy value to the first system word of VMA flags ONCE, non-atomically.
> + *
> + * IMPORTANT: This does not overwrite bytes past the first system word. The
> + * caller must account for this.
> + */
> +static inline void vma_flags_overwrite_word_once(vma_flags_t *flags, unsigned long value)
> +{
> + unsigned long *bitmap = ACCESS_PRIVATE(flags, __vma_flags);
> +
> + WRITE_ONCE(*bitmap, value);
> +}
> +
> +/* Update the first system word of VMA flags setting bits, non-atomically. */
> +static inline void vma_flags_set_word(vma_flags_t *flags, unsigned long value)
> +{
> + unsigned long *bitmap = ACCESS_PRIVATE(flags, __vma_flags);
> +
> + *bitmap |= value;
> +}
> +
> +/* Update the first system word of VMA flags clearing bits, non-atomically. */
> +static inline void vma_flags_clear_word(vma_flags_t *flags, unsigned long value)
> +{
> + unsigned long *bitmap = ACCESS_PRIVATE(flags, __vma_flags);
> +
> + *bitmap &= ~value;
> +}
> +
> #ifdef CONFIG_NUMA
> #define vma_policy(vma) ((vma)->vm_policy)
> #else
> diff --git a/rust/kernel/mm/virt.rs b/rust/kernel/mm/virt.rs
> index a1bfa4e19293..da21d65ccd20 100644
> --- a/rust/kernel/mm/virt.rs
> +++ b/rust/kernel/mm/virt.rs
> @@ -250,7 +250,7 @@ unsafe fn update_flags(&self, set: vm_flags_t, unset: vm_flags_t) {
> // SAFETY: This is not a data race: the vma is undergoing initial setup, so it's not yet
> // shared. Additionally, `VmaNew` is `!Sync`, so it cannot be used to write in parallel.
> // The caller promises that this does not set the flags to an invalid value.
> - unsafe { (*self.as_ptr()).__bindgen_anon_2.__vm_flags = flags };
> + unsafe { (*self.as_ptr()).__bindgen_anon_2.vm_flags = flags };
> }
>
> /// Set the `VM_MIXEDMAP` flag on this vma.
> diff --git a/tools/testing/vma/vma_internal.h b/tools/testing/vma/vma_internal.h
> index 18659214e262..13ee825bdfcf 100644
> --- a/tools/testing/vma/vma_internal.h
> +++ b/tools/testing/vma/vma_internal.h
> @@ -528,6 +528,15 @@ typedef struct {
> __private DECLARE_BITMAP(__mm_flags, NUM_MM_FLAG_BITS);
> } mm_flags_t;
>
> +/*
> + * Opaque type representing current VMA (vm_area_struct) flag state. Must be
> + * accessed via vma_flags_xxx() helper functions.
> + */
> +#define NUM_VMA_FLAG_BITS BITS_PER_LONG
> +typedef struct {
> + DECLARE_BITMAP(__vma_flags, NUM_VMA_FLAG_BITS);
> +} __private vma_flags_t;
> +
> struct mm_struct {
> struct maple_tree mm_mt;
> int map_count; /* number of VMAs */
> @@ -612,7 +621,10 @@ struct vm_area_desc {
> /* Mutable fields. Populated with initial state. */
> pgoff_t pgoff;
> struct file *vm_file;
> - vm_flags_t vm_flags;
> + union {
> + vm_flags_t vm_flags;
> + vma_flags_t vma_flags;
> + };
> pgprot_t page_prot;
>
> /* Write-only fields. */
> @@ -658,7 +670,7 @@ struct vm_area_struct {
> */
> union {
> const vm_flags_t vm_flags;
> - vm_flags_t __private __vm_flags;
> + vma_flags_t flags;
> };
>
> #ifdef CONFIG_PER_VMA_LOCK
> @@ -1372,26 +1384,6 @@ static inline bool may_expand_vm(struct mm_struct *mm, vm_flags_t flags,
> return true;
> }
>
> -static inline void vm_flags_init(struct vm_area_struct *vma,
> - vm_flags_t flags)
> -{
> - vma->__vm_flags = flags;
> -}
> -
> -static inline void vm_flags_set(struct vm_area_struct *vma,
> - vm_flags_t flags)
> -{
> - vma_start_write(vma);
> - vma->__vm_flags |= flags;
> -}
> -
> -static inline void vm_flags_clear(struct vm_area_struct *vma,
> - vm_flags_t flags)
> -{
> - vma_start_write(vma);
> - vma->__vm_flags &= ~flags;
> -}
> -
> static inline int shmem_zero_setup(struct vm_area_struct *vma)
> {
> return 0;
> @@ -1548,13 +1540,118 @@ static inline void userfaultfd_unmap_complete(struct mm_struct *mm,
> {
> }
>
> -# define ACCESS_PRIVATE(p, member) ((p)->member)
> +#define ACCESS_PRIVATE(p, member) ((p)->member)
> +
> +#define bitmap_size(nbits) (ALIGN(nbits, BITS_PER_LONG) / BITS_PER_BYTE)
> +
> +static __always_inline void bitmap_zero(unsigned long *dst, unsigned int nbits)
> +{
> + unsigned int len = bitmap_size(nbits);
> +
> + if (small_const_nbits(nbits))
> + *dst = 0;
> + else
> + memset(dst, 0, len);
> +}
>
> static inline bool mm_flags_test(int flag, const struct mm_struct *mm)
> {
> return test_bit(flag, ACCESS_PRIVATE(&mm->flags, __mm_flags));
> }
>
> +/* Clears all bits in the VMA flags bitmap, non-atomically. */
> +static inline void vma_flags_clear_all(vma_flags_t *flags)
> +{
> + bitmap_zero(ACCESS_PRIVATE(flags, __vma_flags), NUM_VMA_FLAG_BITS);
> +}
> +
> +/*
> + * Copy value to the first system word of VMA flags, non-atomically.
> + *
> + * IMPORTANT: This does not overwrite bytes past the first system word. The
> + * caller must account for this.
> + */
> +static inline void vma_flags_overwrite_word(vma_flags_t *flags, unsigned long value)
> +{
> + *ACCESS_PRIVATE(flags, __vma_flags) = value;
> +}
> +
> +/*
> + * Copy value to the first system word of VMA flags ONCE, non-atomically.
> + *
> + * IMPORTANT: This does not overwrite bytes past the first system word. The
> + * caller must account for this.
> + */
> +static inline void vma_flags_overwrite_word_once(vma_flags_t *flags, unsigned long value)
> +{
> + unsigned long *bitmap = ACCESS_PRIVATE(flags, __vma_flags);
> +
> + WRITE_ONCE(*bitmap, value);
> +}
> +
> +/* Update the first system word of VMA flags setting bits, non-atomically. */
> +static inline void vma_flags_set_word(vma_flags_t *flags, unsigned long value)
> +{
> + unsigned long *bitmap = ACCESS_PRIVATE(flags, __vma_flags);
> +
> + *bitmap |= value;
> +}
> +
> +/* Update the first system word of VMA flags clearing bits, non-atomically. */
> +static inline void vma_flags_clear_word(vma_flags_t *flags, unsigned long value)
> +{
> + unsigned long *bitmap = ACCESS_PRIVATE(flags, __vma_flags);
> +
> + *bitmap &= ~value;
> +}
> +
> +
> +/* Use when VMA is not part of the VMA tree and needs no locking */
> +static inline void vm_flags_init(struct vm_area_struct *vma,
> + vm_flags_t flags)
> +{
> + vma_flags_clear_all(&vma->flags);
> + vma_flags_overwrite_word(&vma->flags, flags);
> +}
> +
> +/*
> + * Use when VMA is part of the VMA tree and modifications need coordination
> + * Note: vm_flags_reset and vm_flags_reset_once do not lock the vma and
> + * it should be locked explicitly beforehand.
> + */
> +static inline void vm_flags_reset(struct vm_area_struct *vma,
> + vm_flags_t flags)
> +{
> + vma_assert_write_locked(vma);
> + vm_flags_init(vma, flags);
> +}
> +
> +static inline void vm_flags_reset_once(struct vm_area_struct *vma,
> + vm_flags_t flags)
> +{
> + vma_assert_write_locked(vma);
> + /*
> + * The user should only be interested in avoiding reordering of
> + * assignment to the first word.
> + */
> + vma_flags_clear_all(&vma->flags);
> + vma_flags_overwrite_word_once(&vma->flags, flags);
> +}
> +
> +static inline void vm_flags_set(struct vm_area_struct *vma,
> + vm_flags_t flags)
> +{
> + vma_start_write(vma);
> + vma_flags_set_word(&vma->flags, flags);
> +}
> +
> +static inline void vm_flags_clear(struct vm_area_struct *vma,
> + vm_flags_t flags)
> +{
> + vma_start_write(vma);
> + vma_flags_clear_word(&vma->flags, flags);
> +}
> +
> /*
> * Denies creating a writable executable mapping or gaining executable permissions.
> *
> --
> 2.51.0
>