Open Source and information security mailing list archives
 
Message-ID: <f1d67c7b-5e08-43b3-b98c-8a35a5095052@lucifer.local>
Date: Thu, 30 Oct 2025 09:07:19 +0000
From: Lorenzo Stoakes <lorenzo.stoakes@...cle.com>
To: Jason Gunthorpe <jgg@...pe.ca>
Cc: Andrew Morton <akpm@...ux-foundation.org>,
        Muchun Song <muchun.song@...ux.dev>,
        Oscar Salvador <osalvador@...e.de>,
        David Hildenbrand <david@...hat.com>,
        "Liam R . Howlett" <Liam.Howlett@...cle.com>,
        Vlastimil Babka <vbabka@...e.cz>, Mike Rapoport <rppt@...nel.org>,
        Suren Baghdasaryan <surenb@...gle.com>, Michal Hocko <mhocko@...e.com>,
        Axel Rasmussen <axelrasmussen@...gle.com>,
        Yuanchu Xie <yuanchu@...gle.com>, Wei Xu <weixugc@...gle.com>,
        Peter Xu <peterx@...hat.com>, Ingo Molnar <mingo@...hat.com>,
        Peter Zijlstra <peterz@...radead.org>,
        Juri Lelli <juri.lelli@...hat.com>,
        Vincent Guittot <vincent.guittot@...aro.org>,
        Dietmar Eggemann <dietmar.eggemann@....com>,
        Steven Rostedt <rostedt@...dmis.org>, Ben Segall <bsegall@...gle.com>,
        Mel Gorman <mgorman@...e.de>, Valentin Schneider <vschneid@...hat.com>,
        Kees Cook <kees@...nel.org>, Matthew Wilcox <willy@...radead.org>,
        John Hubbard <jhubbard@...dia.com>, Leon Romanovsky <leon@...nel.org>,
        Zi Yan <ziy@...dia.com>, Baolin Wang <baolin.wang@...ux.alibaba.com>,
        Nico Pache <npache@...hat.com>, Ryan Roberts <ryan.roberts@....com>,
        Dev Jain <dev.jain@....com>, Barry Song <baohua@...nel.org>,
        Lance Yang <lance.yang@...ux.dev>, Xu Xin <xu.xin16@....com.cn>,
        Chengming Zhou <chengming.zhou@...ux.dev>,
        Jann Horn <jannh@...gle.com>, Matthew Brost <matthew.brost@...el.com>,
        Joshua Hahn <joshua.hahnjy@...il.com>, Rakie Kim <rakie.kim@...com>,
        Byungchul Park <byungchul@...com>, Gregory Price <gourry@...rry.net>,
        Ying Huang <ying.huang@...ux.alibaba.com>,
        Alistair Popple <apopple@...dia.com>, Pedro Falcato <pfalcato@...e.de>,
        Shakeel Butt <shakeel.butt@...ux.dev>,
        David Rientjes <rientjes@...gle.com>, Rik van Riel <riel@...riel.com>,
        Harry Yoo <harry.yoo@...cle.com>,
        Kemeng Shi <shikemeng@...weicloud.com>,
        Kairui Song <kasong@...cent.com>, Nhat Pham <nphamcs@...il.com>,
        Baoquan He <bhe@...hat.com>, Chris Li <chrisl@...nel.org>,
        Johannes Weiner <hannes@...xchg.org>,
        Qi Zheng <zhengqi.arch@...edance.com>, linux-kernel@...r.kernel.org,
        linux-fsdevel@...r.kernel.org, linux-mm@...ck.org
Subject: Re: [PATCH 1/4] mm: declare VMA flags by bit

On Wed, Oct 29, 2025 at 04:02:28PM -0300, Jason Gunthorpe wrote:
> On Wed, Oct 29, 2025 at 05:49:35PM +0000, Lorenzo Stoakes wrote:
> > We declare a sparse-bitwise type vma_flag_t which ensures that users can't
> > pass around invalid VMA flags by accident and prepares for future work
> > towards VMA flags being a bitmap where we want to ensure bit values are
> > type safe.
>
> Does sparse attach the type to the enum item? Normal C says the enum
> item's type is always 'int' if the value fits in int..

It does, I have tested this. I'm not sure whether that's due to sparse doing
extra work to make it happen or GNU C doing more there.

For instance, the sparse docs (see [0]) use an anonymous enum for this in
their examples, so it seems to be something of a 'thing'.

I also tested this by intentionally passing a non-flag value to functions
which accept vma_flag_t, and it got picked up right away. I checked via:

make C=2 -j $(nproc) 2>&1 | grep vma_flag_t

[0]: https://docs.kernel.org/dev-tools/sparse.html
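For readers unfamiliar with sparse's bitwise types, here is a minimal standalone sketch of the pattern being tested. It assumes the kernel convention that `__bitwise` and `__force` expand to sparse attributes only when `__CHECKER__` is defined, and to nothing under a normal compiler. `struct vma_sketch` and `vma_test_sketch()` are hypothetical stand-ins, not the real VMA API.

```c
#include <assert.h>
#include <stdbool.h>

/* Sparse sees the attributes; a normal compiler sees plain int. */
#ifdef __CHECKER__
#define __bitwise __attribute__((bitwise))
#define __force   __attribute__((force))
#else
#define __bitwise
#define __force
#endif

typedef int __bitwise vma_flag_t;

/* Anonymous enum, as in the patch: each enumerator is a bit number. */
enum {
	VMA_READ_BIT  = (__force vma_flag_t)0,
	VMA_WRITE_BIT = (__force vma_flag_t)1,
};

/* Hypothetical stand-in for a VMA: just the flags word. */
struct vma_sketch {
	unsigned long vm_flags;
};

/* Hypothetical helper: test a single flag given by its bit number. */
static bool vma_test_sketch(const struct vma_sketch *vma, vma_flag_t bit)
{
	int nr = (__force int)bit;	/* explicit, sparse-approved conversion */

	assert(nr >= 0 && nr < (int)(8 * sizeof(unsigned long)));
	return vma->vm_flags & (1UL << nr);
}
```

With a plain compiler this builds cleanly because `vma_flag_t` is just `int`. Run through sparse, a call such as `vma_test_sketch(&v, 2)` should be reported as an incorrect type, which is the class of warning the grep above filters for.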

>
> And I'm not sure bitwise rules work quite the way you'd like for this
> enum; it was meant for things that are |'d.
>
> I have seen an aggressively abuse-resistant technique before. I don't
> really recommend it, but FYI:
>
> struct vma_bits {
>   u8 VMA_READ_BIT;
>   u8 VMA_WRITE_BIT;
>   ..
> };
> #define VMA_BIT(bit_name) BIT(offsetof(struct vma_bits, bit_name))
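A sketch of why that trick is hard to misuse, assuming plain C11 with userspace `uint8_t` standing in for the kernel's `u8`: every member is one byte, so its `offsetof()` equals its position in the struct, which doubles as the bit number. `VMA_BIT()` only accepts a member name, so a misspelled name or a bare integer fails to compile. The three members are illustrative only.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

#define BIT(nr) (1UL << (nr))

/* One byte per member: offsetof() yields 0, 1, 2, ... in declaration order. */
struct vma_bits {
	uint8_t VMA_READ_BIT;
	uint8_t VMA_WRITE_BIT;
	uint8_t VMA_EXEC_BIT;
};

/* Only a member name works here; VMA_BIT(3) or a typo will not compile. */
#define VMA_BIT(bit_name) BIT(offsetof(struct vma_bits, bit_name))

/* Every declared bit must still fit in a single unsigned long. */
static_assert(sizeof(struct vma_bits) <= CHAR_BIT * sizeof(unsigned long),
	      "too many VMA bits for one word");
```

The struct is never instantiated; it exists purely as a compile-time namespace of bit positions, which is likely what makes it feel like abuse.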

Oh my eyes! :P I mean kinda clever but also lord above :)

I don't think we need this afaict. The idea is to catch accidental
instances of e.g.:

	vma_test(vma, VM_WRITE);

The goal is to catch mistakes rather than abuse. Making the mistake above
is _very easy_, so I explicitly wanted the bots to moan when people make it.

If only C had a stronger type system...
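To make the mistake concrete, here is a standalone sketch. It assumes the long-standing mask values VM_READ = 0x1, VM_WRITE = 0x2, VM_EXEC = 0x4, and a hypothetical `flags_test_bit()` helper that, like `vma_test()`, takes a bit number. Without a distinct type, passing the mask VM_WRITE (value 2) is silently treated as bit 2, i.e. VM_EXEC.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

#define BIT(nr)   (1UL << (nr))

/* Masks, as with the existing VM_* definitions. */
#define VM_READ   0x00000001UL
#define VM_WRITE  0x00000002UL
#define VM_EXEC   0x00000004UL

/* Bit numbers, mirroring VMA_*_BIT from the patch (plain ints here). */
enum { VMA_READ_BIT = 0, VMA_WRITE_BIT = 1, VMA_EXEC_BIT = 2 };

/* Hypothetical helper: expects a bit number, not a mask. */
static bool flags_test_bit(unsigned long flags, int nr)
{
	assert(nr >= 0 && (unsigned long)nr < CHAR_BIT * sizeof(unsigned long));
	return flags & BIT(nr);
}
```

Given `flags = VM_READ | VM_WRITE`, `flags_test_bit(flags, VMA_WRITE_BIT)` is true, while `flags_test_bit(flags, VM_WRITE)` quietly checks VM_EXEC and returns false. With `vma_flag_t` declared `__bitwise`, that second call is the one sparse should flag.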

>
> > Finally, we have to update some rather silly if-deffery found in
> > fs/proc/task_mmu.c which would otherwise break.
> >
> > Additionally, update the VMA userland testing vma_internal.h header to
> > include these changes.
> >
> > Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@...cle.com>
> > ---
> >  fs/proc/task_mmu.c               |   4 +-
> >  include/linux/mm.h               | 286 +++++++++++++++++---------
> >  tools/testing/vma/vma_internal.h | 341 +++++++++++++++++++++++++++----
>
> Maybe take the opportunity to put them in a vma_flags.h, which could
> then be included from tools/testing to avoid this copying?

Yeah, it sucks to have this copy/paste. The problem is that, to make the VMA
userland testing work, we intentionally isolate vma.h/vma.c dependencies
into vma_internal.h in mm/, and do the same in the userland component, so
that we can #include vma.c/vma.h in the userland code.

So we'd need a strict requirement that vma_flags.h doesn't include any
other headers, or at least none that aren't substituted somehow in the
tools/include directory.

The issue is people might quite reasonably update include/linux/vma_flags.h
to do more later and then break all of the VMA userland testing...

It's a bit of a delicate thing to keep it all working.
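One way a shared header could stay usable from both trees, sketched purely as an assumption (the guard name and the fallback approach are hypothetical, not something settled here): the header includes nothing and only supplies `#ifndef` fallbacks for the few macros it needs, so tools/testing could include it without the kernel's header graph. The fragility described above is that any later `#include` added to it would quietly reintroduce the dependency.

```c
#include <assert.h>

/* ---- hypothetical vma_flags.h: deliberately includes no other headers ---- */
#ifndef VMA_FLAGS_SKETCH_H
#define VMA_FLAGS_SKETCH_H

/* Fallbacks, used only when the including environment lacks these. */
#ifndef __bitwise
#define __bitwise
#endif
#ifndef __force
#define __force
#endif
#ifndef BIT
#define BIT(nr) (1UL << (nr))
#endif

typedef int __bitwise vma_flag_t;

enum {
	VMA_READ_BIT  = (__force vma_flag_t)0,
	VMA_WRITE_BIT = (__force vma_flag_t)1,
};

#define VMA_BIT(bit) BIT((__force int)(bit))

#endif /* VMA_FLAGS_SKETCH_H */
/* ---- end of header ---- */

/* A consumer-side compile-time check that the header is self-sufficient. */
static_assert(VMA_BIT(VMA_WRITE_BIT) == 0x2UL, "VMA_BIT must map bit to mask");
```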

>
> > +/**
> > + * vma_flag_t - specifies an individual VMA flag by bit number.
> > + *
> > + * This value is made type safe by sparse to avoid passing invalid flag values
> > + * around.
> > + */
> > +typedef int __bitwise vma_flag_t;
> > +
> > +enum {
> > +	/* currently active flags */
> > +	VMA_READ_BIT = (__force vma_flag_t)0,
> > +	VMA_WRITE_BIT = (__force vma_flag_t)1,
> > +	VMA_EXEC_BIT = (__force vma_flag_t)2,
> > +	VMA_SHARED_BIT = (__force vma_flag_t)3,
> > +
> > +	/* mprotect() hardcodes VM_MAYREAD >> 4 == VM_READ, and so for r/w/x bits. */
> > +	VMA_MAYREAD_BIT = (__force vma_flag_t)4, /* limits for mprotect() etc */
> > +	VMA_MAYWRITE_BIT = (__force vma_flag_t)5,
> > +	VMA_MAYEXEC_BIT = (__force vma_flag_t)6,
> > +	VMA_MAYSHARE_BIT = (__force vma_flag_t)7,
> > +
> > +	VMA_GROWSDOWN_BIT = (__force vma_flag_t)8, /* general info on the segment */
> > +#ifdef CONFIG_MMU
> > +	VMA_UFFD_MISSING_BIT = (__force vma_flag_t)9, /* missing pages tracking */
> > +#else
> > +	/* nommu: R/O MAP_PRIVATE mapping that might overlay a file mapping */
> > +	VMA_MAYOVERLAY_BIT = (__force vma_flag_t)9,
> > +#endif
> > +	/* Page-ranges managed without "struct page", just pure PFN */
> > +	VMA_PFNMAP_BIT = (__force vma_flag_t)10,
> > +
> > +	VMA_MAYBE_GUARD_BIT = (__force vma_flag_t)11,
> > +
> > +	VMA_UFFD_WP_BIT = (__force vma_flag_t)12, /* wrprotect pages tracking */
> > +
> > +	VMA_LOCKED_BIT = (__force vma_flag_t)13,
> > +	VMA_IO_BIT = (__force vma_flag_t)14, /* Memory mapped I/O or similar */
> > +
> > +	/* Used by madvise() */
> > +	VMA_SEQ_READ_BIT = (__force vma_flag_t)15, /* App will access data sequentially */
> > +	VMA_RAND_READ_BIT = (__force vma_flag_t)16, /* App will not benefit from clustered reads */
> > +
> > +	VMA_DONTCOPY_BIT = (__force vma_flag_t)17, /* Do not copy this vma on fork */
> > +	VMA_DONTEXPAND_BIT = (__force vma_flag_t)18, /* Cannot expand with mremap() */
> > +	VMA_LOCKONFAULT_BIT = (__force vma_flag_t)19, /* Lock pages covered when faulted in */
> > +	VMA_ACCOUNT_BIT = (__force vma_flag_t)20, /* Is a VM accounted object */
> > +	VMA_NORESERVE_BIT = (__force vma_flag_t)21, /* should the VM suppress accounting */
> > +	VMA_HUGETLB_BIT = (__force vma_flag_t)22, /* Huge TLB Page VM */
> > +	VMA_SYNC_BIT = (__force vma_flag_t)23, /* Synchronous page faults */
> > +	VMA_ARCH_1_BIT = (__force vma_flag_t)24, /* Architecture-specific flag */
> > +	VMA_WIPEONFORK_BIT = (__force vma_flag_t)25, /* Wipe VMA contents in child. */
> > +	VMA_DONTDUMP_BIT = (__force vma_flag_t)26, /* Do not include in the core dump */
> > +
> > +#ifdef CONFIG_MEM_SOFT_DIRTY
> > +	VMA_SOFTDIRTY_BIT = (__force vma_flag_t)27, /* Not soft dirty clean area */
> > +#endif
> > +
> > +	VMA_MIXEDMAP_BIT = (__force vma_flag_t)28, /* Can contain struct page and pure PFN pages */
> > +	VMA_HUGEPAGE_BIT = (__force vma_flag_t)29, /* MADV_HUGEPAGE marked this vma */
> > +	VMA_NOHUGEPAGE_BIT = (__force vma_flag_t)30, /* MADV_NOHUGEPAGE marked this vma */
> > +	VMA_MERGEABLE_BIT = (__force vma_flag_t)31, /* KSM may merge identical pages */
> > +
> > +#ifdef CONFIG_64BIT
> > +	/* These bits are reused, we define specific uses below. */
> > +#ifdef CONFIG_ARCH_USES_HIGH_VMA_FLAGS
> > +	VMA_HIGH_ARCH_0_BIT = (__force vma_flag_t)32,
> > +	VMA_HIGH_ARCH_1_BIT = (__force vma_flag_t)33,
> > +	VMA_HIGH_ARCH_2_BIT = (__force vma_flag_t)34,
> > +	VMA_HIGH_ARCH_3_BIT = (__force vma_flag_t)35,
> > +	VMA_HIGH_ARCH_4_BIT = (__force vma_flag_t)36,
> > +	VMA_HIGH_ARCH_5_BIT = (__force vma_flag_t)37,
> > +	VMA_HIGH_ARCH_6_BIT = (__force vma_flag_t)38,
> > +#endif
> > +
> > +	VMA_ALLOW_ANY_UNCACHED_BIT = (__force vma_flag_t)39,
> > +	VMA_DROPPABLE_BIT = (__force vma_flag_t)40,
> > +
> > +#ifdef CONFIG_HAVE_ARCH_USERFAULTFD_MINOR
> > +	VMA_UFFD_MINOR_BIT = (__force vma_flag_t)41,
> > +#endif
> > +
> > +	VMA_SEALED_BIT = (__force vma_flag_t)42,
> > +#endif /* CONFIG_64BIT */
> > +};
> > +
> > +#define VMA_BIT(bit)	BIT((__force int)(bit))
>
> > -/* mprotect() hardcodes VM_MAYREAD >> 4 == VM_READ, and so for r/w/x bits. */
> > -#define VM_MAYREAD	0x00000010	/* limits for mprotect() etc */
> > -#define VM_MAYWRITE	0x00000020
> > -#define VM_MAYEXEC	0x00000040
> > -#define VM_MAYSHARE	0x00000080
> > +#define VM_MAYREAD	VMA_BIT(VMA_MAYREAD_BIT)
> > +#define VM_MAYWRITE	VMA_BIT(VMA_MAYWRITE_BIT)
> > +#define VM_MAYEXEC	VMA_BIT(VMA_MAYEXEC_BIT)
> > +#define VM_MAYSHARE	VMA_BIT(VMA_MAYSHARE_BIT)
>
> I suggest removing some of this duplication..
>
> #define DECLARE_VMA_BIT(name, bitno) \
>     name ## _BIT = (__force vma_flag_t)(bitno), \
>     name = BIT(bitno)
>
> enum {
>    DECLARE_VMA_BIT(VMA_READ, 0),
> };
>
> Especially since the #defines and enum need to have matching #ifdefs.
>
> It is OK to abuse the enum like the above; C won't get mad, and it works
> better in gdb/clangd.

Yes, and I think keeping the enum anonymous avoids the issues I've been
concerned about with named enums containing flags being used as parameters.
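A compilable version of the de-duplication macro suggested above might look like this. It is a sketch under assumptions: `__bitwise`/`__force` are stubbed out as under a normal (non-sparse) compiler, and the `VMA_READ`/`VMA_MAYREAD` names simply follow the macro's `name ## _BIT` / `name` scheme. Because each use expands to two comma-separated enumerators, the macro must not end in a comma of its own. The checks at the end restate the `VM_MAYREAD >> 4 == VM_READ` relationship that mprotect() relies on.

```c
#include <assert.h>

/* Non-sparse build: the annotations expand to nothing. */
#define __bitwise
#define __force
#define BIT(nr) (1UL << (nr))

typedef int __bitwise vma_flag_t;

/* Declares both NAME_BIT (bit number) and NAME (mask) from one line. */
#define DECLARE_VMA_BIT(name, bitno)			\
	name ## _BIT = (__force vma_flag_t)(bitno),	\
	name = BIT(bitno)

enum {
	DECLARE_VMA_BIT(VMA_READ, 0),
	DECLARE_VMA_BIT(VMA_WRITE, 1),
	DECLARE_VMA_BIT(VMA_EXEC, 2),
	DECLARE_VMA_BIT(VMA_MAYREAD, 4),
	DECLARE_VMA_BIT(VMA_MAYWRITE, 5),
	DECLARE_VMA_BIT(VMA_MAYEXEC, 6),
};

/* mprotect() relies on the MAY* masks sitting exactly 4 bits above r/w/x. */
static_assert(VMA_MAYREAD >> 4 == VMA_READ, "MAYREAD/READ mismatch");
static_assert(VMA_MAYWRITE >> 4 == VMA_WRITE, "MAYWRITE/WRITE mismatch");
static_assert(VMA_MAYEXEC >> 4 == VMA_EXEC, "MAYEXEC/EXEC mismatch");
```

Since bit and mask come from the same line, a single `#ifdef` around that line guards both, which addresses the matching-#ifdefs concern.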

>
> Later you can have a variation of the macro for your first system
> word/second system word idea.

Well I think we'd probably want to name the macro accordingly.

DECLARE_VMA_BIT_AND_FLAG() maybe? And mention in the comment that it's for
the system word size.
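A hedged sketch of how that naming could separate the two cases. `DECLARE_VMA_BIT_AND_FLAG()` is the name floated above. `DECLARE_VMA_BIT()` here is a hypothetical bit-only variant for bits beyond the first system word, where an `unsigned long` mask would not exist on 32-bit. The 32-bit cut-off and the bit numbers are illustrative assumptions.

```c
#include <assert.h>

#define __bitwise
#define __force
#define BIT(nr) (1UL << (nr))

typedef int __bitwise vma_flag_t;

/* Bit number only: valid for any bit, including those past the first word. */
#define DECLARE_VMA_BIT(name, bitno) \
	name ## _BIT = (__force vma_flag_t)(bitno)

/*
 * Bit number plus legacy mask. Only for bits within the first system word
 * (0..31 here, so the mask also exists on 32-bit).
 */
#define DECLARE_VMA_BIT_AND_FLAG(name, bitno) \
	DECLARE_VMA_BIT(name, bitno),          \
	name = BIT(bitno)

enum {
	DECLARE_VMA_BIT_AND_FLAG(VMA_READ, 0),
	DECLARE_VMA_BIT_AND_FLAG(VMA_WRITE, 1),
	DECLARE_VMA_BIT(VMA_SEALED, 42),	/* no mask: beyond a 32-bit word */
};

/* Bits that were given a mask must fit within a 32-bit word. */
static_assert(VMA_WRITE_BIT < 32, "masked bit outside the first word");
```

Code needing a high bit then has to go through the bit number, since no `VMA_SEALED` mask is ever declared.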

>
> Otherwise I think this is a great thing to do, thanks!

Thanks :)

To give due credit: Matthew suggested this a while ago, and I've been
working towards it, tackling the mm flags first as an easier case.

It came out of my assumption that the VM_MAYBE_GUARD work had no free flag
to use in the 32-bit space. As part of this work it became apparent that I
was wrong, so I implemented and sent that series yesterday (doh!), but this
change is still useful, as it's beyond silly that we're constrained like
this.
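The constraint here is that `vm_flags` is a single `unsigned long`, so bits 32 and above exist only on 64-bit (hence the `CONFIG_64BIT` block in the patch). The commit message mentions moving towards VMA flags being a bitmap. As an illustration only, assuming a hypothetical fixed capacity of 64 bits and helper names invented here, a word-size-independent flag set indexed by bit number could look like this. The kernel already provides `DECLARE_BITMAP()`, `set_bit()` and `test_bit()` for this; the sketch spells the idea out in plain C.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Hypothetical capacity; the real number of VMA flag bits may differ. */
#define NR_VMA_FLAG_BITS  64
#define BITS_PER_ULONG    (CHAR_BIT * sizeof(unsigned long))
#define VMA_FLAG_WORDS    ((NR_VMA_FLAG_BITS + BITS_PER_ULONG - 1) / BITS_PER_ULONG)

/* Two words on 32-bit, one on 64-bit: bit 42 is usable either way. */
typedef struct {
	unsigned long words[VMA_FLAG_WORDS];
} vma_flags_sketch_t;

static void vma_flags_set(vma_flags_sketch_t *flags, int nr)
{
	assert(nr >= 0 && nr < NR_VMA_FLAG_BITS);
	flags->words[nr / BITS_PER_ULONG] |= 1UL << (nr % BITS_PER_ULONG);
}

static bool vma_flags_test(const vma_flags_sketch_t *flags, int nr)
{
	assert(nr >= 0 && nr < NR_VMA_FLAG_BITS);
	return flags->words[nr / BITS_PER_ULONG] & (1UL << (nr % BITS_PER_ULONG));
}
```

A real version would presumably take `vma_flag_t` rather than `int`, so the sparse checking discussed earlier carries over.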

I should probably add a Suggested-by for this too; I didn't even think to,
sorry Matthew! :)

>
> Jason

Cheers, Lorenzo
