Message-ID: <20251029190228.GS760669@ziepe.ca>
Date: Wed, 29 Oct 2025 16:02:28 -0300
From: Jason Gunthorpe <jgg@...pe.ca>
To: Lorenzo Stoakes <lorenzo.stoakes@...cle.com>
Cc: Andrew Morton <akpm@...ux-foundation.org>,
	Muchun Song <muchun.song@...ux.dev>,
	Oscar Salvador <osalvador@...e.de>,
	David Hildenbrand <david@...hat.com>,
	"Liam R . Howlett" <Liam.Howlett@...cle.com>,
	Vlastimil Babka <vbabka@...e.cz>, Mike Rapoport <rppt@...nel.org>,
	Suren Baghdasaryan <surenb@...gle.com>,
	Michal Hocko <mhocko@...e.com>,
	Axel Rasmussen <axelrasmussen@...gle.com>,
	Yuanchu Xie <yuanchu@...gle.com>, Wei Xu <weixugc@...gle.com>,
	Peter Xu <peterx@...hat.com>, Ingo Molnar <mingo@...hat.com>,
	Peter Zijlstra <peterz@...radead.org>,
	Juri Lelli <juri.lelli@...hat.com>,
	Vincent Guittot <vincent.guittot@...aro.org>,
	Dietmar Eggemann <dietmar.eggemann@....com>,
	Steven Rostedt <rostedt@...dmis.org>,
	Ben Segall <bsegall@...gle.com>, Mel Gorman <mgorman@...e.de>,
	Valentin Schneider <vschneid@...hat.com>,
	Kees Cook <kees@...nel.org>, Matthew Wilcox <willy@...radead.org>,
	John Hubbard <jhubbard@...dia.com>,
	Leon Romanovsky <leon@...nel.org>, Zi Yan <ziy@...dia.com>,
	Baolin Wang <baolin.wang@...ux.alibaba.com>,
	Nico Pache <npache@...hat.com>, Ryan Roberts <ryan.roberts@....com>,
	Dev Jain <dev.jain@....com>, Barry Song <baohua@...nel.org>,
	Lance Yang <lance.yang@...ux.dev>, Xu Xin <xu.xin16@....com.cn>,
	Chengming Zhou <chengming.zhou@...ux.dev>,
	Jann Horn <jannh@...gle.com>,
	Matthew Brost <matthew.brost@...el.com>,
	Joshua Hahn <joshua.hahnjy@...il.com>, Rakie Kim <rakie.kim@...com>,
	Byungchul Park <byungchul@...com>,
	Gregory Price <gourry@...rry.net>,
	Ying Huang <ying.huang@...ux.alibaba.com>,
	Alistair Popple <apopple@...dia.com>,
	Pedro Falcato <pfalcato@...e.de>,
	Shakeel Butt <shakeel.butt@...ux.dev>,
	David Rientjes <rientjes@...gle.com>,
	Rik van Riel <riel@...riel.com>, Harry Yoo <harry.yoo@...cle.com>,
	Kemeng Shi <shikemeng@...weicloud.com>,
	Kairui Song <kasong@...cent.com>, Nhat Pham <nphamcs@...il.com>,
	Baoquan He <bhe@...hat.com>, Chris Li <chrisl@...nel.org>,
	Johannes Weiner <hannes@...xchg.org>,
	Qi Zheng <zhengqi.arch@...edance.com>, linux-kernel@...r.kernel.org,
	linux-fsdevel@...r.kernel.org, linux-mm@...ck.org
Subject: Re: [PATCH 1/4] mm: declare VMA flags by bit

On Wed, Oct 29, 2025 at 05:49:35PM +0000, Lorenzo Stoakes wrote:
> We declare a sparse-bitwise type vma_flag_t which ensures that users can't
> pass around invalid VMA flags by accident and prepares for future work
> towards VMA flags being a bitmap where we want to ensure bit values are
> type safe.

Does sparse attach the type to the enum item? Normal C says the enum
item's type is always 'int' if the value fits in int..
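
For reference, a minimal userspace sketch of the plain C rule mentioned above. It says nothing about what sparse itself tracks. The names demo_flag_t, DEMO_READ_BIT and IS_PLAIN_INT are hypothetical, and __bitwise/__force are assumed to expand to nothing, as they would in a non-sparse build:

```c
/* Outside sparse the annotations are assumed to be empty macros. */
#ifndef __bitwise
#define __bitwise
#endif
#ifndef __force
#define __force
#endif

typedef int __bitwise demo_flag_t;	/* hypothetical stand-in for vma_flag_t */

enum {
	DEMO_READ_BIT = (__force demo_flag_t)0,
	DEMO_WRITE_BIT = (__force demo_flag_t)1,
};

/* Evaluates to 1 when the expression has type int, 0 otherwise. */
#define IS_PLAIN_INT(x) _Generic((x), int: 1, default: 0)

/* The cast in the initializer does not change the enumerator's type. */
_Static_assert(IS_PLAIN_INT(DEMO_READ_BIT), "enumerator has type int");
_Static_assert(IS_PLAIN_INT(DEMO_WRITE_BIT), "enumerator has type int");
```

As far as the C compiler is concerned, the enumerators are ordinary ints, so any extra type checking has to come from the checker.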

And I'm not sure bitwise rules work quite the way you'd like for this
enum, it was meant for things that are |'d..

I have seen an aggressively abuse-resistant technique before, I don't
really recommend it, but FYI:

struct vma_bits {
  u8 VMA_READ_BIT;
  u8 VMA_WRITE_BIT;
  ..
};
#define VMA_BIT(bit_name) BIT(offsetof(struct vma_bits, bit_name))
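
A self-contained userspace sketch of the same trick, in case it helps to see it compile. The struct, macro and member names here are hypothetical, and DEMO_BIT() is only a stand-in for the kernel's BIT():

```c
#include <stddef.h>
#include <stdint.h>

/* One byte-sized member per flag, so offsetof() yields the bit number. */
struct demo_vma_bits {
	uint8_t DEMO_READ_BIT;
	uint8_t DEMO_WRITE_BIT;
	uint8_t DEMO_EXEC_BIT;
};

/* Userspace stand-in for the kernel's BIT() macro. */
#define DEMO_BIT(nr) (1UL << (nr))

/* Only member names of struct demo_vma_bits are accepted here. */
#define DEMO_VMA_BIT(bit_name) \
	DEMO_BIT(offsetof(struct demo_vma_bits, bit_name))

_Static_assert(DEMO_VMA_BIT(DEMO_READ_BIT) == 0x1, "bit 0");
_Static_assert(DEMO_VMA_BIT(DEMO_WRITE_BIT) == 0x2, "bit 1");
_Static_assert(DEMO_VMA_BIT(DEMO_EXEC_BIT) == 0x4, "bit 2");
```

Something like DEMO_VMA_BIT(3) does not compile, because 3 is not a member of the struct. That is where the abuse resistance comes from.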

> Finally, we have to update some rather silly if-deffery found in
> fs/proc/task_mmu.c which would otherwise break.
> 
> Additionally, update the VMA userland testing vma_internal.h header to
> include these changes.
> 
> Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@...cle.com>
> ---
>  fs/proc/task_mmu.c               |   4 +-
>  include/linux/mm.h               | 286 +++++++++++++++++---------
>  tools/testing/vma/vma_internal.h | 341 +++++++++++++++++++++++++++----

Maybe take the moment to put them in some vma_flags.h, so that it can
be included from tools/testing to avoid this copying?

> +/**
> + * vma_flag_t - specifies an individual VMA flag by bit number.
> + *
> + * This value is made type safe by sparse to avoid passing invalid flag values
> + * around.
> + */
> +typedef int __bitwise vma_flag_t;
> +
> +enum {
> +	/* currently active flags */
> +	VMA_READ_BIT = (__force vma_flag_t)0,
> +	VMA_WRITE_BIT = (__force vma_flag_t)1,
> +	VMA_EXEC_BIT = (__force vma_flag_t)2,
> +	VMA_SHARED_BIT = (__force vma_flag_t)3,
> +
> +	/* mprotect() hardcodes VM_MAYREAD >> 4 == VM_READ, and so for r/w/x bits. */
> +	VMA_MAYREAD_BIT = (__force vma_flag_t)4, /* limits for mprotect() etc */
> +	VMA_MAYWRITE_BIT = (__force vma_flag_t)5,
> +	VMA_MAYEXEC_BIT = (__force vma_flag_t)6,
> +	VMA_MAYSHARE_BIT = (__force vma_flag_t)7,
> +
> +	VMA_GROWSDOWN_BIT = (__force vma_flag_t)8, /* general info on the segment */
> +#ifdef CONFIG_MMU
> +	VMA_UFFD_MISSING_BIT = (__force vma_flag_t)9, /* missing pages tracking */
> +#else
> +	/* nommu: R/O MAP_PRIVATE mapping that might overlay a file mapping */
> +	VMA_MAYOVERLAY_BIT = (__force vma_flag_t)9,
> +#endif
> +	/* Page-ranges managed without "struct page", just pure PFN */
> +	VMA_PFNMAP_BIT = (__force vma_flag_t)10,
> +
> +	VMA_MAYBE_GUARD_BIT = (__force vma_flag_t)11,
> +
> +	VMA_UFFD_WP_BIT = (__force vma_flag_t)12, /* wrprotect pages tracking */
> +
> +	VMA_LOCKED_BIT = (__force vma_flag_t)13,
> +	VMA_IO_BIT = (__force vma_flag_t)14, /* Memory mapped I/O or similar */
> +
> +	/* Used by madvise() */
> +	VMA_SEQ_READ_BIT = (__force vma_flag_t)15, /* App will access data sequentially */
> +	VMA_RAND_READ_BIT = (__force vma_flag_t)16, /* App will not benefit from clustered reads */
> +
> +	VMA_DONTCOPY_BIT = (__force vma_flag_t)17, /* Do not copy this vma on fork */
> +	VMA_DONTEXPAND_BIT = (__force vma_flag_t)18, /* Cannot expand with mremap() */
> +	VMA_LOCKONFAULT_BIT = (__force vma_flag_t)19, /* Lock pages covered when faulted in */
> +	VMA_ACCOUNT_BIT = (__force vma_flag_t)20, /* Is a VM accounted object */
> +	VMA_NORESERVE_BIT = (__force vma_flag_t)21, /* should the VM suppress accounting */
> +	VMA_HUGETLB_BIT = (__force vma_flag_t)22, /* Huge TLB Page VM */
> +	VMA_SYNC_BIT = (__force vma_flag_t)23, /* Synchronous page faults */
> +	VMA_ARCH_1_BIT = (__force vma_flag_t)24, /* Architecture-specific flag */
> +	VMA_WIPEONFORK_BIT = (__force vma_flag_t)25, /* Wipe VMA contents in child. */
> +	VMA_DONTDUMP_BIT = (__force vma_flag_t)26, /* Do not include in the core dump */
> +
> +#ifdef CONFIG_MEM_SOFT_DIRTY
> +	VMA_SOFTDIRTY_BIT = (__force vma_flag_t)27, /* Not soft dirty clean area */
> +#endif
> +
> +	VMA_MIXEDMAP_BIT = (__force vma_flag_t)28, /* Can contain struct page and pure PFN pages */
> +	VMA_HUGEPAGE_BIT = (__force vma_flag_t)29, /* MADV_HUGEPAGE marked this vma */
> +	VMA_NOHUGEPAGE_BIT = (__force vma_flag_t)30, /* MADV_NOHUGEPAGE marked this vma */
> +	VMA_MERGEABLE_BIT = (__force vma_flag_t)31, /* KSM may merge identical pages */
> +
> +#ifdef CONFIG_64BIT
> +	/* These bits are reused, we define specific uses below. */
> +#ifdef CONFIG_ARCH_USES_HIGH_VMA_FLAGS
> +	VMA_HIGH_ARCH_0_BIT = (__force vma_flag_t)32,
> +	VMA_HIGH_ARCH_1_BIT = (__force vma_flag_t)33,
> +	VMA_HIGH_ARCH_2_BIT = (__force vma_flag_t)34,
> +	VMA_HIGH_ARCH_3_BIT = (__force vma_flag_t)35,
> +	VMA_HIGH_ARCH_4_BIT = (__force vma_flag_t)36,
> +	VMA_HIGH_ARCH_5_BIT = (__force vma_flag_t)37,
> +	VMA_HIGH_ARCH_6_BIT = (__force vma_flag_t)38,
> +#endif
> +
> +	VMA_ALLOW_ANY_UNCACHED_BIT = (__force vma_flag_t)39,
> +	VMA_DROPPABLE_BIT = (__force vma_flag_t)40,
> +
> +#ifdef CONFIG_HAVE_ARCH_USERFAULTFD_MINOR
> +	VMA_UFFD_MINOR_BIT = (__force vma_flag_t)41,
> +#endif
> +
> +	VMA_SEALED_BIT = (__force vma_flag_t)42,
> +#endif /* CONFIG_64BIT */
> +};
> +
> +#define VMA_BIT(bit)	BIT((__force int)bit)

> -/* mprotect() hardcodes VM_MAYREAD >> 4 == VM_READ, and so for r/w/x bits. */
> -#define VM_MAYREAD	0x00000010	/* limits for mprotect() etc */
> -#define VM_MAYWRITE	0x00000020
> -#define VM_MAYEXEC	0x00000040
> -#define VM_MAYSHARE	0x00000080
> +#define VM_MAYREAD	VMA_BIT(VMA_MAYREAD_BIT)
> +#define VM_MAYWRITE	VMA_BIT(VMA_MAYWRITE_BIT)
> +#define VM_MAYEXEC	VMA_BIT(VMA_MAYEXEC_BIT)
> +#define VM_MAYSHARE	VMA_BIT(VMA_MAYSHARE_BIT)

I suggest removing some of this duplication..

#define DECLARE_VMA_BIT(name, bitno) \
    name ## _BIT = (__force vma_flag_t)bitno, \
    name = BIT(bitno)

enum {
   DECLARE_VMA_BIT(VMA_READ, 0),
};

Especially since the #defines and enum need to have matching #ifdefs.

It is OK to abuse the enum like the above, C won't get mad and it works
better in gdb/clangd.
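
A userspace sketch of how that could look with a conditional flag, so a single #ifdef covers both the bit number and the mask. All DEMO_* names and the DEMO_FEATURE switch are hypothetical, DEMO_BIT() stands in for the kernel's BIT(), and the bit numbers are kept low so every value fits in int:

```c
/* Outside sparse the annotation is assumed to be an empty macro. */
#ifndef __force
#define __force
#endif

typedef int demo_flag_t;		/* hypothetical stand-in for vma_flag_t */
#define DEMO_BIT(nr) (1U << (nr))	/* stand-in for the kernel's BIT() */

#define DEMO_FEATURE 1			/* hypothetical config switch */

/* Declares both the bit number (name_BIT) and the mask (name). */
#define DECLARE_DEMO_BIT(name, bitno) \
	name ## _BIT = (__force demo_flag_t)(bitno), \
	name = DEMO_BIT(bitno)

enum {
	DECLARE_DEMO_BIT(DEMO_READ, 0),
	DECLARE_DEMO_BIT(DEMO_WRITE, 1),
#ifdef DEMO_FEATURE
	/* One #ifdef now guards the bit number and the mask together. */
	DECLARE_DEMO_BIT(DEMO_OPTIONAL, 2),
#endif
};
```

Because the macro body has no trailing comma, each use in the enum is followed by an ordinary comma, the same as a hand-written enumerator.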

Later you can have a variation of the macro for your first system
word/second system word idea.

Otherwise I think this is a great thing to do, thanks!

Jason
