Message-ID: <20250313-asi-page-alloc-v1-0-04972e046cea@google.com>
Date: Thu, 13 Mar 2025 18:11:19 +0000
From: Brendan Jackman <jackmanb@...gle.com>
To: Thomas Gleixner <tglx@...utronix.de>, Ingo Molnar <mingo@...hat.com>, Borislav Petkov <bp@...en8.de>, 
	Dave Hansen <dave.hansen@...ux.intel.com>, x86@...nel.org, 
	Andrew Morton <akpm@...ux-foundation.org>, David Rientjes <rientjes@...gle.com>, 
	Vlastimil Babka <vbabka@...e.cz>, David Hildenbrand <david@...hat.com>
Cc: linux-kernel@...r.kernel.org, linux-mm@...ck.org, 
	Mike Rapoport <rppt@...nel.org>, Junaid Shahid <junaids@...gle.com>, Reiji Watanabe <reijiw@...gle.com>, 
	Patrick Bellasi <derkling@...gle.com>, Brendan Jackman <jackmanb@...gle.com>, 
	Yosry Ahmed <yosry.ahmed@...ux.dev>
Subject: [PATCH RFC 00/11] mm: ASI integration for the page allocator

.:: Intro

This code illustrates the idea I'm proposing at LSF/MM/BPF [0].
Sorry it's so close to the conference. I was initially quite ambitious
in what I wanted to show here and tried to implement a more complete
patch series, but I ran out of time and had to reduce the scope and
hack some minimal stuff together. This series is _only_ supposed to be
about page_alloc.c; everything else is just there as scaffolding so
that the allocator code can be discussed.

I've marked the most incomplete patches with [HACKS] in the title to
indicate which parts are less worthy of attention.

See [0] and also [1] for broader context on the ASI/page_alloc topic.
See [2] for context about ASI itself. For this RFC the most important
fact is: ASI requires creating another kernel address space (the
"restricted address space") that is a subset of the normal one (i.e.
the "unrestricted address space"). That is, an address space just like
the normal one, but with holes in it. Pages that are unmapped from the
restricted address space are called "sensitive".
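
As a rough illustration of that definition, here is a toy userspace
model (not kernel code). It assumes a tiny machine with at most 64
pages and one bit per page; every name in it is hypothetical and none
of it comes from the series.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy model only: bit N set means page N is mapped. Names are hypothetical. */
struct toy_address_spaces {
	uint64_t unrestricted;	/* the normal kernel address space */
	uint64_t restricted;	/* the same thing, but with holes in it */
};

/* The restricted address space must be a subset of the unrestricted one. */
bool toy_is_valid(const struct toy_address_spaces *as)
{
	return (as->restricted & ~as->unrestricted) == 0;
}

/*
 * A page is "sensitive" if it is mapped in the unrestricted address
 * space but is a hole in the restricted one. 'page' must be below 64.
 */
bool toy_page_is_sensitive(const struct toy_address_spaces *as,
			   unsigned int page)
{
	uint64_t bit = UINT64_C(1) << page;

	return (as->unrestricted & bit) && !(as->restricted & bit);
}
```

With unrestricted = 0xFF and restricted = 0x0F, pages 4-7 are the holes
and so count as sensitive, while pages 0-3 do not.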

.:: The Idea

What is sensitive (i.e. where the holes are) is decided at allocation
time. This series illustrates an initial implementation of that
capability for the direct map. The basic idea of this implementation is
to operate at pageblock granularity, and use migratetypes to track
sensitivity. The key advantages of this approach are:

- Migratetypes exist to avoid fragmentation. Using them to index pages
  by sensitivity takes advantage of this, so that the physmap doesn't
  get fragmented with respect to sensitivity. This means we can use
  large TLB entries for the restricted physmap.

- Since pageblocks are never smaller than a PMD mapping, if the
  restricted physmap is always made of PMDs, we never have to break down
  mappings while changing sensitivity. This means we don't have
  difficulties with needing to allocate pagetables in the middle of the
  allocator.

- Migratetypes already offer indexing capability - that is, there are
  separate freelists for each migratetype. This means when the user
  allocates a page with a given sensitivity, all the infrastructure is
  already in place to look up a page that is already mapped/unmapped as
  needed (if it exists). This minimizes unnecessary TLB flushes.
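
The indexing point in the last bullet can be sketched with a toy
userspace model. This is not the code from the series: it allocates
whole blocks instead of pages, uses a linear scan in place of real
per-migratetype freelists, assumes every block starts out sensitive,
and all names are hypothetical. The `conversions` counter stands in for
the map/unmap work and TLB flush that a sensitivity change would cost.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only, hypothetical names throughout. */
enum toy_migratetype {
	TOY_UNMOVABLE_SENSITIVE,	/* hole in the restricted physmap */
	TOY_UNMOVABLE_NONSENSITIVE,	/* mapped in the restricted physmap */
};

#define TOY_NR_BLOCKS 8

struct toy_allocator {
	enum toy_migratetype block_type[TOY_NR_BLOCKS];
	bool block_free[TOY_NR_BLOCKS];
	unsigned int conversions;	/* sensitivity changes performed so far */
};

void toy_init(struct toy_allocator *a)
{
	for (int i = 0; i < TOY_NR_BLOCKS; i++) {
		a->block_type[i] = TOY_UNMOVABLE_SENSITIVE;
		a->block_free[i] = true;
	}
	a->conversions = 0;
}

/* Returns the index of the allocated block, or -1 if none is free. */
int toy_alloc(struct toy_allocator *a, bool sensitive)
{
	enum toy_migratetype want = sensitive ? TOY_UNMOVABLE_SENSITIVE
					      : TOY_UNMOVABLE_NONSENSITIVE;

	/* Fast path: a free block that already has the wanted sensitivity. */
	for (int i = 0; i < TOY_NR_BLOCKS; i++) {
		if (a->block_free[i] && a->block_type[i] == want) {
			a->block_free[i] = false;
			return i;
		}
	}

	/* Slow path: convert any free block, paying for the remapping. */
	for (int i = 0; i < TOY_NR_BLOCKS; i++) {
		if (a->block_free[i]) {
			a->block_type[i] = want;
			a->conversions++;
			a->block_free[i] = false;
			return i;
		}
	}
	return -1;
}

/* Freeing keeps the block's sensitivity, so reusing it later is cheap. */
void toy_free(struct toy_allocator *a, int block)
{
	assert(block >= 0 && block < TOY_NR_BLOCKS);
	a->block_free[block] = true;
}
```

In this model, allocating a nonsensitive block, freeing it and
allocating a nonsensitive block again costs one conversion rather than
two, because the second allocation finds a block that is already in the
right state.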

This differs from Mike Rapoport's work on __GFP_UNMAPPED [3] in that,
instead of having a totally separate free area for the pages that are
unmapped, it aims to pervade the allocator. If it turns out that all
nonsensitive pages (or all sensitive pages, which seems highly
unlikely) can do without access to the full feature set of the page
allocator on a performant system, we could certainly do something like
Mike's patchset. But we don't have any reason to expect a correlation
between sensitivity and performance needs.

.:: Patchset overview

- Patch 1 adds a minimal subset of the base ASI framework that was
  introduced by the RFCv2 [2].

- Patches 2-5 add the necessary framework for creating and manipulating
  the ASI physmap. This is the area where I have had to reduce the scope
  of this series. I had hoped to present a proper integration here, but
  instead I've had to just hack something together that kinda works.
  You can probably skip over this section.

- Patches 6-8 are preparatory hacks and changes to the generic mm code.

- Patches 9-11 are the important bit. The new migratetypes are created.
  Then logic is added to create nonsensitive pageblocks when needed.
  Then logic is added to change them back to sensitive pageblocks when
  needed.

.:: TODOs

 - This doesn't let you allocate from MIGRATE_HIGHATOMIC pageblocks
   unless you have __GFP_SENSITIVE. We probably need to make the
   pageblock type and per-freelist logic more advanced to be able to
   account for this.

 - When pages transition from sensitive to nonsensitive, they need to be
   zeroed to prevent any leftover data being leaked. This series doesn't
   address that requirement at all.

 - Although I think the abstract design is OK, the actual implementation
   of calling asi_map()/asi_unmap() from page_alloc.c is pretty
   confusing: asi_map() is implicit when calling
   set_pageblock_migratetype() but asi_unmap() is up to the caller. This
   requires some refactoring.

 - Changes to the unrestricted physmap (page protection changes, memory
   hotplug) are not properly mirrored into the restricted physmap.

 - There's no integration with CMA. The branch at [4] has some minimal
   integration into alloc_contig_range().
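
For the zeroing TODO above, the required ordering can be sketched in a
toy userspace model. This is only an assumption about what a fix might
look like, not something the series implements; the struct, the field
and the function name are all hypothetical.

```c
#include <stdbool.h>
#include <string.h>

#define TOY_PAGE_SIZE 4096

/* Toy model only, hypothetical names throughout. */
struct toy_page {
	unsigned char data[TOY_PAGE_SIZE];
	bool mapped_in_restricted;	/* false means the page is sensitive */
};

/*
 * Transition a page from sensitive to nonsensitive. The contents are
 * scrubbed before the page is marked as mapped, so stale sensitive
 * data is never visible through the restricted address space.
 */
void toy_make_nonsensitive(struct toy_page *page)
{
	if (page->mapped_in_restricted)
		return;		/* already nonsensitive, nothing to scrub */

	memset(page->data, 0, sizeof(page->data));
	page->mapped_in_restricted = true;
}
```

Calling the function on a page that is already nonsensitive leaves its
contents alone, so only the actual transition pays the zeroing cost.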

.:: References

[0] https://lore.kernel.org/linux-mm/CA+i-1C169s8pyqZDx+iSnFmftmGfssdQA29+pYm-gqySAYWgpg@mail.gmail.com/
[1] Some slides I presented in an earlier discussion of this topic:
    https://docs.google.com/presentation/d/1Ozuan7E4z2YWm4V6uE_fe7YoF2BdS3m5jXjDKO7DVy0/edit#slide=id.g32d28ea451a_0_43
[2] https://lore.kernel.org/linux-mm/20250110-asi-rfc-v2-v2-0-8419288bc805@google.com/
[3] https://lore.kernel.org/all/20230308094106.227365-1-rppt@kernel.org/
[5] https://lore.kernel.org/linux-mm/20250129144320.2675822-1-jackmanb@google.com/

This series is available as a branch with some additional testing here:

[4] https://github.com/bjackman/linux/tree/asi/page-alloc-lsfmmbpf25

This applies to mm-unstable.

Signed-off-by: Brendan Jackman <jackmanb@...gle.com>
---
Brendan Jackman (11):
      x86/mm: Bare minimum ASI API for page_alloc integration
      x86/mm: Factor out phys_pgd_init()
      x86/mm: Add lookup_pgtable_in_pgd()
      x86/mm/asi: Sync physmap into ASI_GLOBAL_NONSENSITIVE
      [RFC HACKS] Add asi_map() and asi_unmap()
      mm/page_alloc: Add __GFP_SENSITIVE and always set it
      [RFC HACKS] mm/slub: Set __GFP_SENSITIVE for reclaimable slabs
      [RFC HACKS] mm/page_alloc: Simplify gfp_migratetype()
      mm/page_alloc: Split MIGRATE_UNMOVABLE by sensitivity
      mm/page_alloc: Add support for nonsensitive allocations
      mm/page_alloc: Add support for ASI-unmapping pages

 arch/Kconfig                         |  14 ++++
 arch/x86/Kconfig                     |   1 +
 arch/x86/include/asm/asi.h           |  36 ++++++++
 arch/x86/include/asm/pgtable_types.h |   2 +
 arch/x86/mm/Makefile                 |   1 +
 arch/x86/mm/asi.c                    |  85 +++++++++++++++++++
 arch/x86/mm/init.c                   |   3 +-
 arch/x86/mm/init_64.c                |  53 ++++++++++--
 arch/x86/mm/pat/set_memory.c         |  34 ++++++++
 include/linux/asi.h                  |  20 +++++
 include/linux/gfp.h                  |  30 ++++---
 include/linux/gfp_types.h            |  15 +++-
 include/linux/mmzone.h               |  19 ++++-
 include/linux/vmalloc.h              |   4 +
 mm/internal.h                        |   5 ++
 mm/memory_hotplug.c                  |   2 +-
 mm/page_alloc.c                      | 158 +++++++++++++++++++++++++++++++----
 mm/show_mem.c                        |  13 +--
 mm/slub.c                            |   6 +-
 mm/vmalloc.c                         |  32 ++++---
 20 files changed, 475 insertions(+), 58 deletions(-)
---
base-commit: 5ee93e1a769230377c3d44edd4917e8df77be566
change-id: 20250310-asi-page-alloc-80ea1f8307d0

Best regards,
-- 
Brendan Jackman <jackmanb@...gle.com>

