Message-ID: <cover.1755012943.git.lorenzo.stoakes@oracle.com>
Date: Tue, 12 Aug 2025 16:44:09 +0100
From: Lorenzo Stoakes <lorenzo.stoakes@...cle.com>
To: Andrew Morton <akpm@...ux-foundation.org>
Cc: Alexander Gordeev <agordeev@...ux.ibm.com>,
Gerald Schaefer <gerald.schaefer@...ux.ibm.com>,
Heiko Carstens <hca@...ux.ibm.com>, Vasily Gorbik <gor@...ux.ibm.com>,
Christian Borntraeger <borntraeger@...ux.ibm.com>,
Sven Schnelle <svens@...ux.ibm.com>,
"David S . Miller" <davem@...emloft.net>,
Andreas Larsson <andreas@...sler.com>,
Dave Hansen <dave.hansen@...ux.intel.com>,
Andy Lutomirski <luto@...nel.org>,
Peter Zijlstra <peterz@...radead.org>,
Thomas Gleixner <tglx@...utronix.de>, Ingo Molnar <mingo@...hat.com>,
Borislav Petkov <bp@...en8.de>, "H . Peter Anvin" <hpa@...or.com>,
Alexander Viro <viro@...iv.linux.org.uk>,
Christian Brauner <brauner@...nel.org>, Jan Kara <jack@...e.cz>,
Kees Cook <kees@...nel.org>, David Hildenbrand <david@...hat.com>,
Zi Yan <ziy@...dia.com>, Baolin Wang <baolin.wang@...ux.alibaba.com>,
"Liam R . Howlett" <Liam.Howlett@...cle.com>,
Nico Pache <npache@...hat.com>, Ryan Roberts <ryan.roberts@....com>,
Dev Jain <dev.jain@....com>, Barry Song <baohua@...nel.org>,
Xu Xin <xu.xin16@....com.cn>,
Chengming Zhou <chengming.zhou@...ux.dev>,
Vlastimil Babka <vbabka@...e.cz>, Mike Rapoport <rppt@...nel.org>,
Suren Baghdasaryan <surenb@...gle.com>, Michal Hocko <mhocko@...e.com>,
David Rientjes <rientjes@...gle.com>,
Shakeel Butt <shakeel.butt@...ux.dev>,
Arnaldo Carvalho de Melo <acme@...nel.org>,
Namhyung Kim <namhyung@...nel.org>,
Mark Rutland <mark.rutland@....com>,
Alexander Shishkin <alexander.shishkin@...ux.intel.com>,
Jiri Olsa <jolsa@...nel.org>, Ian Rogers <irogers@...gle.com>,
Adrian Hunter <adrian.hunter@...el.com>,
Kan Liang <kan.liang@...ux.intel.com>,
Masami Hiramatsu <mhiramat@...nel.org>,
Oleg Nesterov <oleg@...hat.com>, Juri Lelli <juri.lelli@...hat.com>,
Vincent Guittot <vincent.guittot@...aro.org>,
Dietmar Eggemann <dietmar.eggemann@....com>,
Steven Rostedt <rostedt@...dmis.org>, Ben Segall <bsegall@...gle.com>,
Mel Gorman <mgorman@...e.de>, Valentin Schneider <vschneid@...hat.com>,
Jason Gunthorpe <jgg@...pe.ca>, John Hubbard <jhubbard@...dia.com>,
Peter Xu <peterx@...hat.com>, Jann Horn <jannh@...gle.com>,
Pedro Falcato <pfalcato@...e.de>, Matthew Wilcox <willy@...radead.org>,
Mateusz Guzik <mjguzik@...il.com>, linux-s390@...r.kernel.org,
linux-kernel@...r.kernel.org, sparclinux@...r.kernel.org,
linux-fsdevel@...r.kernel.org, linux-mm@...ck.org,
linux-trace-kernel@...r.kernel.org, linux-perf-users@...r.kernel.org
Subject: [PATCH 00/10] mm: make mm->flags a bitmap and 64-bit on all arches
We are currently in the bizarre situation where we are constrained on the
number of flags we can set in an mm_struct based on whether this is a
32-bit or 64-bit kernel.
This is because mm->flags is an unsigned long field, which is 32 bits on a
32-bit system and 64 bits on a 64-bit system.
In order to keep things functional across both architectures, we do not
permit mm flag bits to be set above flag 31 (i.e. the 32nd bit).
This is a silly situation, especially given how profligate we are in
storing metadata in mm_struct, so let's convert mm->flags into a bitmap and
allow ourselves as many bits as we like.
In order to execute this change, we introduce a new opaque type -
mm_flags_t - which wraps a bitmap.
We go further and mark the bitmap field __private, which forces users to go
through accessors. This allows us to enforce atomicity rules around
mm->flags (except on those occasions where they are not required - fork,
etc.) and makes it far easier to keep track of how mm flags are being
utilised.
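As a rough userspace illustration of the idea (not the kernel
implementation), the sketch below wraps a fixed-size bitmap in an opaque
struct and only exposes it through accessors. Everything prefixed sketch_
or SKETCH_ is hypothetical, C11 atomics stand in for the kernel's atomic
bitops, and the __private annotation (which is checked by sparse) has no
userspace equivalent here, so the "privacy" is by convention only.

```c
#include <assert.h>
#include <limits.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Assumption: 64 flag bits regardless of the system word size. */
#define SKETCH_NUM_FLAG_BITS	64u
#define SKETCH_BITS_PER_WORD	(sizeof(unsigned long) * CHAR_BIT)
#define SKETCH_NUM_WORDS \
	((SKETCH_NUM_FLAG_BITS + SKETCH_BITS_PER_WORD - 1) / SKETCH_BITS_PER_WORD)

/* Opaque wrapper: callers are expected to use only the accessors below. */
typedef struct {
	atomic_ulong words[SKETCH_NUM_WORDS];
} sketch_flags_t;

/* Zero every word of the bitmap. */
static void sketch_flags_init(sketch_flags_t *f)
{
	for (unsigned int i = 0; i < SKETCH_NUM_WORDS; i++)
		atomic_init(&f->words[i], 0UL);
}

/* Atomically read one flag bit. */
static bool sketch_flags_test(sketch_flags_t *f, unsigned int bit)
{
	assert(bit < SKETCH_NUM_FLAG_BITS);
	unsigned long mask = 1UL << (bit % SKETCH_BITS_PER_WORD);

	return atomic_load(&f->words[bit / SKETCH_BITS_PER_WORD]) & mask;
}

/* Atomically set one flag bit. */
static void sketch_flags_set(sketch_flags_t *f, unsigned int bit)
{
	assert(bit < SKETCH_NUM_FLAG_BITS);
	unsigned long mask = 1UL << (bit % SKETCH_BITS_PER_WORD);

	atomic_fetch_or(&f->words[bit / SKETCH_BITS_PER_WORD], mask);
}

/* Atomically clear one flag bit. */
static void sketch_flags_clear(sketch_flags_t *f, unsigned int bit)
{
	assert(bit < SKETCH_NUM_FLAG_BITS);
	unsigned long mask = 1UL << (bit % SKETCH_BITS_PER_WORD);

	atomic_fetch_and(&f->words[bit / SKETCH_BITS_PER_WORD], ~mask);
}

/* Atomically set one flag bit and report whether it was already set. */
static bool sketch_flags_test_and_set(sketch_flags_t *f, unsigned int bit)
{
	assert(bit < SKETCH_NUM_FLAG_BITS);
	unsigned long mask = 1UL << (bit % SKETCH_BITS_PER_WORD);

	return atomic_fetch_or(&f->words[bit / SKETCH_BITS_PER_WORD], mask) & mask;
}
```

Because the bit index is split into a word index and an offset within
that word, the same accessors work for bits above 31 on a 32-bit build,
which is the property a plain unsigned long field cannot offer.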
In order to implement this change sensibly and in an iterative way, we
start by introducing the type with the same bitsize as the current mm flags
(system word size) and place it in union with mm->flags.
We are then able to gradually update users as we go without being forced to
do everything in a single patch.
In the course of working on this series I noticed that the MMF_* flag masks
are subject to a sign-extension bug that, due to the 32-bit limit on
mm->flags thus far, has not caused any issues in practice, but which
required fixing for this series.
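To illustrate the class of bug, here is a minimal userspace sketch. It is
not the actual MMF_* definitions: it simply assumes a mask that has bit 31
set and a signed 32-bit type, and the helper names are hypothetical.
Widening such a value to 64 bits replicates the sign bit into the upper
half, whereas building the mask from an unsigned 64-bit constant does not.

```c
#include <stdint.h>

/* Hypothetical flag number sitting in the sign bit of a 32-bit int. */
#define SKETCH_TOP_BIT	31

/*
 * A signed 32-bit value with only bit 31 set (INT32_MIN has exactly this
 * bit pattern). Converting it to a 64-bit unsigned type sign-extends it,
 * so bits 32-63 all become set.
 */
static uint64_t sketch_widen_signed_mask(void)
{
	int32_t mask = INT32_MIN;

	return (uint64_t)mask;
}

/*
 * Building the mask from an unsigned 64-bit constant avoids the problem:
 * only bit 31 is set in the result.
 */
static uint64_t sketch_widen_unsigned_mask(void)
{
	return UINT64_C(1) << SKETCH_TOP_BIT;
}
```

While the flags are limited to 32 bits the stray upper bits are never
observed, which matches the description above of a bug with no practical
impact so far.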
We must make special dispensation for two cases - coredump and
initialisation on fork, both of which use masks extensively.
Since coredump flags are set in stone, we can safely assume they will
remain in the first 32 bits of the flags. We therefore provide special
non-atomic accessors for this case that access the first system word of
flags, keeping everything there essentially the same.
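A first-word accessor of this kind might look roughly like the following
userspace sketch. The struct, the function names and the mask value are
all hypothetical (the mask simply pretends the coredump-related bits
occupy the low 11 bits); the point is only that mask-based code can keep
operating on a single unsigned long.

```c
#include <limits.h>

#define DUMP_SKETCH_BITS_PER_WORD	(sizeof(unsigned long) * CHAR_BIT)
#define DUMP_SKETCH_NUM_WORDS \
	((64 + DUMP_SKETCH_BITS_PER_WORD - 1) / DUMP_SKETCH_BITS_PER_WORD)

/* Hypothetical mask: assume the coredump bits live in bits 0-10. */
#define DUMP_SKETCH_MASK	0x7ffUL

struct dump_sketch_flags {
	unsigned long words[DUMP_SKETCH_NUM_WORDS];
};

/* Non-atomic read of the first system word of the bitmap. */
static unsigned long dump_sketch_word0(const struct dump_sketch_flags *f)
{
	return f->words[0];
}

/* Mask-based code carries on unchanged, working on that one word. */
static unsigned long dump_sketch_bits(const struct dump_sketch_flags *f)
{
	return dump_sketch_word0(f) & DUMP_SKETCH_MASK;
}
```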
For mm->flags initialisation on fork, we adjust the logic to ensure all
bits are cleared correctly, and then adjust the existing initialisation
logic, dubbing the implementation utilising flags as legacy.
This means we get the same fast operations as we do now, but in future we
can also choose to update the forking logic to additionally propagate flags
beyond 32 bits across fork.
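The shape of that fork-time logic can be sketched in userspace as below.
This is an assumption-laden illustration rather than the code in
kernel/fork.c: the struct, the function name and the inherit mask are
hypothetical. The child's bitmap is cleared in full first, and then only
the first word is populated from the parent using the existing mask-based
("legacy") approach.

```c
#include <limits.h>
#include <string.h>

#define FORK_SKETCH_BITS_PER_WORD	(sizeof(unsigned long) * CHAR_BIT)
#define FORK_SKETCH_NUM_WORDS \
	((64 + FORK_SKETCH_BITS_PER_WORD - 1) / FORK_SKETCH_BITS_PER_WORD)

/* Hypothetical mask of first-word bits that a child inherits. */
#define FORK_SKETCH_INIT_MASK	0xffffUL

struct fork_sketch_flags {
	unsigned long words[FORK_SKETCH_NUM_WORDS];
};

/*
 * Initialise the child's flags. No atomics are used because, in this
 * sketch, nothing else can observe the child yet.
 */
static void fork_sketch_init(struct fork_sketch_flags *child,
			     const struct fork_sketch_flags *parent)
{
	/* Step 1: clear every word, not just the first one. */
	memset(child->words, 0, sizeof(child->words));

	/* Step 2: legacy path, inherit masked bits from the first word. */
	if (parent)
		child->words[0] = parent->words[0] & FORK_SKETCH_INIT_MASK;
}
```

Propagating higher bits later would only require extending step 2; step 1
already guarantees that they start out cleared.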
With this change in place we can, in future, decide to have as many bits as
we please.
Since the size of the bitmap will scale in system word multiples, there
should be no issues with changes in alignment in mm_struct. Additionally,
the really sensitive field (mmap_lock) is located prior to the flags field
so this should have no impact on that either.
Lorenzo Stoakes (10):
mm: add bitmap mm->flags field
mm: convert core mm to mm_flags_*() accessors
mm: convert prctl to mm_flags_*() accessors
mm: convert arch-specific code to mm_flags_*() accessors
mm: convert uprobes to mm_flags_*() accessors
mm: update coredump logic to correctly use bitmap mm flags
mm: correct sign-extension issue in MMF_* flag masks
mm: update fork mm->flags initialisation to use bitmap
mm: convert remaining users to mm_flags_*() accessors
mm: replace mm->flags with bitmap entirely and set to 64 bits
arch/s390/mm/mmap.c | 4 +-
arch/sparc/kernel/sys_sparc_64.c | 4 +-
arch/x86/mm/mmap.c | 4 +-
fs/coredump.c | 4 +-
fs/exec.c | 2 +-
fs/pidfs.c | 7 +++-
fs/proc/array.c | 2 +-
fs/proc/base.c | 12 +++---
fs/proc/task_mmu.c | 2 +-
include/linux/huge_mm.h | 2 +-
include/linux/khugepaged.h | 6 ++-
include/linux/ksm.h | 6 +--
include/linux/mm.h | 34 +++++++++++++++-
include/linux/mm_types.h | 67 +++++++++++++++++++++++++-------
include/linux/mman.h | 2 +-
include/linux/oom.h | 2 +-
include/linux/sched/coredump.h | 21 +++++++++-
kernel/events/uprobes.c | 32 +++++++--------
kernel/fork.c | 9 +++--
kernel/sys.c | 16 ++++----
mm/debug.c | 4 +-
mm/gup.c | 10 ++---
mm/huge_memory.c | 8 ++--
mm/khugepaged.c | 10 ++---
mm/ksm.c | 32 +++++++--------
mm/mmap.c | 8 ++--
mm/oom_kill.c | 26 ++++++-------
mm/util.c | 6 +--
tools/testing/vma/vma_internal.h | 19 ++++++++-
29 files changed, 239 insertions(+), 122 deletions(-)
--
2.50.1