Message-ID: <20250313-asi-page-alloc-v1-0-04972e046cea@google.com>
Date: Thu, 13 Mar 2025 18:11:19 +0000
From: Brendan Jackman <jackmanb@...gle.com>
To: Thomas Gleixner <tglx@...utronix.de>, Ingo Molnar <mingo@...hat.com>, Borislav Petkov <bp@...en8.de>,
Dave Hansen <dave.hansen@...ux.intel.com>, x86@...nel.org,
Andrew Morton <akpm@...ux-foundation.org>, David Rientjes <rientjes@...gle.com>,
Vlastimil Babka <vbabka@...e.cz>, David Hildenbrand <david@...hat.com>
Cc: linux-kernel@...r.kernel.org, linux-mm@...ck.org,
Mike Rapoport <rppt@...nel.org>, Junaid Shahid <junaids@...gle.com>, Reiji Watanabe <reijiw@...gle.com>,
Patrick Bellasi <derkling@...gle.com>, Brendan Jackman <jackmanb@...gle.com>,
Yosry Ahmed <yosry.ahmed@...ux.dev>
Subject: [PATCH RFC 00/11] mm: ASI integration for the page allocator
.:: Intro
This code illustrates the idea I'm proposing at LSF/MM/BPF [0].
Sorry it's so close to the conference. I was initially quite ambitious
about what I wanted to show here and tried to implement a more complete
patch series. I've since run out of time, so I've had to reduce the scope
and just hack some minimal stuff together. This series is _only_
supposed to be about page_alloc.c; everything else is just there as
scaffolding so that the allocator code can be discussed.
I've marked the most incomplete patches with [HACKS] in the title to
indicate which aspects are less worthy of attention.
See [0] and also [1] for broader context on the ASI/page_alloc topic.
See [2] for context about ASI itself. For this RFC the most important
fact is: ASI requires creating another kernel address space (the
"restricted address space") that is a subset of the normal one (i.e.
the "unrestricted address space"). That is, an address space just like
the normal one, but with holes in it. Pages that are unmapped from the
restricted address space are called "sensitive".
.:: The Idea
What is sensitive (i.e. where the holes are) is decided at allocation
time. This series illustrates an initial implementation of that capability for
the direct map. The basic idea of this implementation is to operate at
pageblock-granularity, and use migratetypes to track sensitivity. The
key advantages of this approach are:
- Migratetypes exist to avoid fragmentation. Using them to index pages
by sensitivity takes advantage of this, so that the physmap doesn't
get fragmented with respect to sensitivity. This means we can use
large TLB entries for the restricted physmap.
- Since pageblocks are never smaller than a PMD mapping, if the
restricted physmap is always made of PMDs, we never have to break down
mappings while changing sensitivity. This means we don't have
difficulties with needing to allocate pagetables in the middle of the
allocator.
- Migratetypes already offer indexing capability - that is, there are
separate freelists for each migratetype. This means when the user
allocates a page with a given sensitivity, all the infrastructure is
already in place to look up a page that is already mapped/unmapped as
needed (if it exists). This minimizes unnecessary TLB flushes.
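
To make the third point concrete, here is a rough userspace toy model, not code
from the series. All of the toy_* names are hypothetical, and it assumes
that every block starts out sensitive. Each block remembers its
sensitivity across free/alloc cycles, so an allocation first looks for a
free block that is already in the right state and only converts one (the
step that would involve changing the restricted physmap) when none
exists.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* All toy_* names are hypothetical; none of this is kernel code. */
enum toy_migratetype {
	TOY_UNMOVABLE_SENSITIVE,	/* a hole in the restricted physmap */
	TOY_UNMOVABLE_NONSENSITIVE,	/* mapped in the restricted physmap */
};

#define TOY_NR_BLOCKS 4

struct toy_block {
	enum toy_migratetype type;
	bool free;
};

struct toy_zone {
	struct toy_block blocks[TOY_NR_BLOCKS];
	unsigned int conversions;	/* times a block changed sensitivity */
};

/* Assumption for this sketch: everything starts out free and sensitive. */
static void toy_zone_init(struct toy_zone *z)
{
	assert(z != NULL);
	for (size_t i = 0; i < TOY_NR_BLOCKS; i++) {
		z->blocks[i].type = TOY_UNMOVABLE_SENSITIVE;
		z->blocks[i].free = true;
	}
	z->conversions = 0;
}

/* Returns the index of the allocated block, or -1 if none is free. */
static int toy_alloc(struct toy_zone *z, bool sensitive)
{
	enum toy_migratetype want = sensitive ? TOY_UNMOVABLE_SENSITIVE
					      : TOY_UNMOVABLE_NONSENSITIVE;

	/* Fast path: a free block that is already mapped/unmapped as needed. */
	for (size_t i = 0; i < TOY_NR_BLOCKS; i++) {
		if (z->blocks[i].free && z->blocks[i].type == want) {
			z->blocks[i].free = false;
			return (int)i;
		}
	}

	/* Slow path: convert any free block to the wanted sensitivity. */
	for (size_t i = 0; i < TOY_NR_BLOCKS; i++) {
		if (z->blocks[i].free) {
			z->blocks[i].type = want;
			z->conversions++;
			z->blocks[i].free = false;
			return (int)i;
		}
	}

	return -1;
}

/* Freeing keeps the block's type, so it can be reused without conversion. */
static void toy_free(struct toy_zone *z, int idx)
{
	assert(idx >= 0 && idx < TOY_NR_BLOCKS);
	z->blocks[idx].free = true;
}
```

In this model, allocating a nonsensitive block, freeing it and allocating
a nonsensitive block again costs one conversion in total, because the
second allocation finds the block already in the nonsensitive state.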
This differs from Mike Rapoport's work on __GFP_UNMAPPED [3] in that,
instead of having a totally separate free area for the pages that are
unmapped, it aims to pervade the allocator. If it turns out that, for all
nonsensitive (or all sensitive, which seems highly unlikely) pages,
access to the full feature set of the page allocator is not needed for a
performant system, we could certainly do something like Mike's patchset.
But we don't have any reason to expect a correlation between
sensitivity and performance needs.
.:: Patchset overview
- Patch 1 adds a minimal subset of the base ASI framework that was
introduced by the RFCv2 [2].
- Patches 2-5 add the necessary framework for creating and manipulating
the ASI physmap. This is the area where I have had to reduce the scope
of this series. I had hoped to present a proper integration here, but
instead I've had to just hack something together that kinda works.
You can probably skip over this section.
- Patches 6-8 are preparatory hacks and changes to the generic mm code.
- Patches 9-11 are the important bit. The new migratetypes are created.
Then logic is added to create nonsensitive pageblocks when needed.
Then logic is added to change them back to sensitive pageblocks when
needed.
.:: TODOs
- This doesn't let you allocate from MIGRATE_HIGHATOMIC pageblocks
unless you have __GFP_SENSITIVE. We probably need to make the
pageblock type and per-freelist logic more advanced to be able to
account for this.
- When pages transition from sensitive to nonsensitive, they need to be
zeroed to prevent any leftover data being leaked. This series doesn't
address that requirement at all.
- Although I think the abstract design is OK, the actual implementation
of calling asi_map()/asi_unmap() from page_alloc.c is pretty
confusing: asi_map() is implicit when calling
set_pageblock_migratetype() but asi_unmap() is up to the caller. This
requires some refactoring.
- Changes to the unrestricted physmap (page protection changes, memory
hotplug) are not properly mirrored into the restricted physmap.
- There's no integration with CMA. The branch at [4] has some minimal
integration into alloc_contig_range().
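
As a rough illustration of the zeroing TODO above (this is not in the
series), here is a userspace toy model. The toy_* names are hypothetical,
and it assumes the scrub happens before the page becomes visible in the
restricted address space, which is an ordering I'm assuming rather than
something the series defines.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define TOY_PAGE_SIZE 4096

/* Hypothetical stand-in for a page plus its sensitivity state. */
struct toy_page {
	unsigned char data[TOY_PAGE_SIZE];
	bool sensitive;
};

/*
 * Hypothetical helper: scrub a sensitive page, then mark it
 * nonsensitive. Clearing the flag stands in for mapping the page into
 * the restricted physmap. Already-nonsensitive pages are left alone.
 */
static void toy_make_nonsensitive(struct toy_page *p)
{
	assert(p != NULL);
	if (!p->sensitive)
		return;

	/* Assumed ordering: zero first, so stale data is never exposed. */
	memset(p->data, 0, sizeof(p->data));
	p->sensitive = false;
}
```

A page that is already nonsensitive keeps its contents here, since only
the sensitive to nonsensitive transition needs the scrub.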
.:: References
[0] https://lore.kernel.org/linux-mm/CA+i-1C169s8pyqZDx+iSnFmftmGfssdQA29+pYm-gqySAYWgpg@mail.gmail.com/
[1] Some slides I presented in an earlier discussion of this topic:
https://docs.google.com/presentation/d/1Ozuan7E4z2YWm4V6uE_fe7YoF2BdS3m5jXjDKO7DVy0/edit#slide=id.g32d28ea451a_0_43
[2] https://lore.kernel.org/linux-mm/20250110-asi-rfc-v2-v2-0-8419288bc805@google.com/
[3] https://lore.kernel.org/all/20230308094106.227365-1-rppt@kernel.org/
[5] https://lore.kernel.org/linux-mm/20250129144320.2675822-1-jackmanb@google.com/
This series is available as a branch with some additional testing here:
[4] https://github.com/bjackman/linux/tree/asi/page-alloc-lsfmmbpf25
This applies to mm-unstable.
Signed-off-by: Brendan Jackman <jackmanb@...gle.com>
---
Brendan Jackman (11):
x86/mm: Bare minimum ASI API for page_alloc integration
x86/mm: Factor out phys_pgd_init()
x86/mm: Add lookup_pgtable_in_pgd()
x86/mm/asi: Sync physmap into ASI_GLOBAL_NONSENSITIVE
[RFC HACKS] Add asi_map() and asi_unmap()
mm/page_alloc: Add __GFP_SENSITIVE and always set it
[RFC HACKS] mm/slub: Set __GFP_SENSITIVE for reclaimable slabs
[RFC HACKS] mm/page_alloc: Simplify gfp_migratetype()
mm/page_alloc: Split MIGRATE_UNMOVABLE by sensitivity
mm/page_alloc: Add support for nonsensitive allocations
mm/page_alloc: Add support for ASI-unmapping pages
arch/Kconfig | 14 ++++
arch/x86/Kconfig | 1 +
arch/x86/include/asm/asi.h | 36 ++++++++
arch/x86/include/asm/pgtable_types.h | 2 +
arch/x86/mm/Makefile | 1 +
arch/x86/mm/asi.c | 85 +++++++++++++++++++
arch/x86/mm/init.c | 3 +-
arch/x86/mm/init_64.c | 53 ++++++++++--
arch/x86/mm/pat/set_memory.c | 34 ++++++++
include/linux/asi.h | 20 +++++
include/linux/gfp.h | 30 ++++---
include/linux/gfp_types.h | 15 +++-
include/linux/mmzone.h | 19 ++++-
include/linux/vmalloc.h | 4 +
mm/internal.h | 5 ++
mm/memory_hotplug.c | 2 +-
mm/page_alloc.c | 158 +++++++++++++++++++++++++++++++----
mm/show_mem.c | 13 +--
mm/slub.c | 6 +-
mm/vmalloc.c | 32 ++++---
20 files changed, 475 insertions(+), 58 deletions(-)
---
base-commit: 5ee93e1a769230377c3d44edd4917e8df77be566
change-id: 20250310-asi-page-alloc-80ea1f8307d0
Best regards,
--
Brendan Jackman <jackmanb@...gle.com>