Message-ID: <20250520104138.2734372-9-ardb+git@google.com>
Date: Tue, 20 May 2025 12:41:39 +0200
From: Ard Biesheuvel <ardb+git@...gle.com>
To: linux-kernel@...r.kernel.org
Cc: x86@...nel.org, Ard Biesheuvel <ardb@...nel.org>, Ingo Molnar <mingo@...nel.org>, 
	Linus Torvalds <torvalds@...ux-foundation.org>, Brian Gerst <brgerst@...il.com>, 
	"Kirill A. Shutemov" <kirill@...temov.name>, Borislav Petkov <bp@...en8.de>
Subject: [PATCH v5 0/7] x86: Robustify pgtable_l5_enabled()

From: Ard Biesheuvel <ardb@...nel.org>

This is a follow-up to the discussion at [0], broken out of that series
so we can progress while the SEV changes are being reviewed and tested.

The current implementation of pgtable_l5_enabled() is problematic
because it has two implementations, and source files need to opt into
the correct one if they contain code that might be called very early.
Other related global pseudo-constants exist that assume different values
based on the number of paging levels, and it is hard to reason about
whether or not all memory mapping and page table code is guaranteed to
observe consistent values of all of these at all times during the boot.
Case in point: currently, KASAN needs to be disabled during alternatives
patching because otherwise, it will reliably produce false positive
reports due to such inconsistencies.

This revision of the series still provides a single implementation of
pgtable_l5_enabled(), but no longer based on cpu_feature_enabled(), for
a number of reasons:
- fiddling with the early CPU feature detection code is not risk-free,
  and may cause regressions that are difficult to debug;
- Boris objected to the use of a separate capability flag, and using the
  existing one is trickier, as it gets set and cleared during the boot
  by the feature detection code a couple of times, even if 5-level
  paging is not in use;
- by their very nature, manipulations of level 4 and level 5 page
  tables occur rarely compared to lower levels, so it is not obvious
  that the code patching in cpu_feature_enabled() is needed.

So instead, collapse the various 5-level paging related global variables
into a single byte-wide pgdir_shift variable, and move it into the
cache-hot per-CPU section where it can be accessed cheaply. Set it from
asm code so C will always see the same value, and derive
pgtable_l5_enabled() and PTRS_PER_P4D from it directly, ensuring that
all these quantities are always mutually consistent.
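
To illustrate the derivation, here is a minimal user-space sketch. It is
not the code from the patches: pgdir_shift_value, set_paging_levels() and
ptrs_per_p4d() are hypothetical stand-ins, and a plain static variable
takes the place of the per-CPU variable that the series sets from asm.
It relies only on the x86-64 facts that P4D_SHIFT is 39 and that the PGD
shift is 39 with 4-level paging and 48 with 5-level paging.

```c
#include <assert.h>
#include <stdbool.h>

/* x86-64: each p4d entry covers 2^39 bytes regardless of paging mode. */
#define P4D_SHIFT 39

/*
 * Hypothetical stand-in for the byte-wide per-CPU variable. In the
 * series it is written once from asm; here it is an ordinary static.
 */
static unsigned char pgdir_shift_value = 39;

/* Hypothetical helper that plays the role of the early asm store. */
static void set_paging_levels(int levels)
{
	assert(levels == 4 || levels == 5);
	pgdir_shift_value = (levels == 5) ? 48 : 39;
}

static inline unsigned int pgdir_shift(void)
{
	return pgdir_shift_value;
}

/* 5-level paging is active iff the PGD sits above the P4D level. */
static inline bool pgtable_l5_enabled(void)
{
	return pgdir_shift() > P4D_SHIFT;
}

/* 1 entry when the p4d level is folded (4-level), 512 otherwise. */
static inline unsigned int ptrs_per_p4d(void)
{
	return 1u << (pgdir_shift() - P4D_SHIFT);
}
```

Because both quantities are computed from the same byte, there is no
window in which one reflects 4-level paging and the other 5-level
paging, which is the consistency property described above.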

If pgtable_l5_enabled() requires more optimization, we can consider
alternatives, runtime constants, etc., but whether this is actually
necessary is TBD. Suggestions welcome for (micro-)benchmarks that
illustrate the perf delta.

Build and boot tested using QEMU with LA57 emulation.

Changes since v4:
- Add patch to fix MAX_PHYSMEM_BITS (and drop an occurrence of
  pgtable_l5_enabled())
- Re-order the changes and split across more patches so any potential
  performance hit is bisectable.

Changes since v3:
- Drop asm-offsets patch which has been merged already
- Rebase onto tip/x86/core which now carries some related changes by
  Kirill
- Avoid adding new instances of '#ifdef CONFIG_X86_5LEVEL' where
  possible, as it is going to be removed soon
- Move cap override arrays straight to __ro_after_init
- Drop KVM changes entirely - they were wrong and unnecessary
- Drop the new "la57_hw" capability flag for now - we can always add it
  later if there is a need.

Changes since v2:
- Drop first patch which has been merged
- Rename existing "la57" CPU flag to "la57_hw" and use "la57" to
  indicate that 5 level paging is being used
- Move memset() out of identify_cpu()
- Make set/clear cap override arrays ro_after_init
- Split off asm-offsets update

[0] https://lore.kernel.org/all/20250504095230.2932860-28-ardb+git@google.com/

Cc: Ingo Molnar <mingo@...nel.org>
Cc: Linus Torvalds <torvalds@...ux-foundation.org>
Cc: Brian Gerst <brgerst@...il.com>
Cc: Kirill A. Shutemov <kirill@...temov.name>
Cc: Borislav Petkov <bp@...en8.de>

Ard Biesheuvel (7):
  x86/mm: Decouple MAX_PHYSMEM_BITS from LA57 state
  x86/mm: Use a single cache hot per-CPU variable to record pgdir_shift
  x86/mm: Define PTRS_PER_P4D in terms of pgdir_shift()
  x86/mm: Derive pgtable_l5_enabled() from pgdir_shift()
  x86/boot: Drop USE_EARLY_PGTABLE_L5 definitions
  x86/boot: Drop 5-level paging related global variable
  x86/boot: Remove KASAN workaround for 4/5 level paging switch

 arch/x86/boot/compressed/misc.h         |  8 +++---
 arch/x86/boot/compressed/pgtable_64.c   | 10 --------
 arch/x86/boot/startup/map_kernel.c      | 18 +------------
 arch/x86/boot/startup/sme.c             |  9 -------
 arch/x86/include/asm/page_64_types.h    |  2 +-
 arch/x86/include/asm/pgtable_64_types.h | 27 ++++++++------------
 arch/x86/include/asm/sparsemem.h        |  2 +-
 arch/x86/kernel/alternative.c           | 12 ---------
 arch/x86/kernel/cpu/common.c            |  3 ---
 arch/x86/kernel/head64.c                |  9 -------
 arch/x86/kernel/head_64.S               |  5 ++++
 arch/x86/mm/kasan_init_64.c             |  3 ---
 arch/x86/mm/pgtable.c                   |  4 +++
 13 files changed, 26 insertions(+), 86 deletions(-)


base-commit: 54c2c688cd9305bdbab4883b9da6ff63f4deca5d
-- 
2.49.0.1101.gccaa498523-goog

