Message-ID: <20250520104138.2734372-9-ardb+git@google.com>
Date: Tue, 20 May 2025 12:41:39 +0200
From: Ard Biesheuvel <ardb+git@...gle.com>
To: linux-kernel@...r.kernel.org
Cc: x86@...nel.org, Ard Biesheuvel <ardb@...nel.org>, Ingo Molnar <mingo@...nel.org>,
Linus Torvalds <torvalds@...ux-foundation.org>, Brian Gerst <brgerst@...il.com>,
"Kirill A. Shutemov" <kirill@...temov.name>, Borislav Petkov <bp@...en8.de>
Subject: [PATCH v5 0/7] x86: Robustify pgtable_l5_enabled()

From: Ard Biesheuvel <ardb@...nel.org>

This is a follow-up to the discussion at [0], broken out of that series
so we can progress while the SEV changes are being reviewed and tested.

pgtable_l5_enabled() is problematic because it currently has two
implementations, and source files need to opt into the correct one if
they contain code that might be called very early.

Other related global pseudo-constants exist that assume different
values depending on the number of paging levels, and it is hard to
reason about whether or not all memory mapping and page table code is
guaranteed to observe consistent values of all of these at all times
during boot.

Case in point: currently, KASAN needs to be disabled during alternatives
patching because otherwise, it will reliably produce false positive
reports due to such inconsistencies.
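
For illustration, the current pattern is roughly the following
(simplified sketch, not a verbatim quote of the headers; the
__pgtable_l5_enabled variable and X86_FEATURE_LA57 names are taken
from current mainline and may differ in detail):

  #ifdef USE_EARLY_PGTABLE_L5
  /* Early boot code cannot use cpu_feature_enabled() yet */
  static inline bool pgtable_l5_enabled(void)
  {
          return __pgtable_l5_enabled;
  }
  #else
  #define pgtable_l5_enabled() cpu_feature_enabled(X86_FEATURE_LA57)
  #endif

Every translation unit containing code that may run before CPU feature
detection has to remember to define USE_EARLY_PGTABLE_L5, which is
easy to get wrong.
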
This revision of the series still provides a single implementation of
pgtable_l5_enabled(), but no longer based on cpu_feature_enabled(), for
a number of reasons:
- fiddling with the early CPU feature detection code is not risk-free,
  and may cause regressions that are difficult to debug;
- Boris objected to the use of a separate capability flag, and using the
  existing one is trickier, as it gets set and cleared a couple of times
  during boot by the feature detection code, even if 5-level paging is
  not in use;
- by their very nature, manipulations of level 4 and level 5 page
  tables occur rarely compared to lower levels, so it is not obvious
  that the code patching in cpu_feature_enabled() is needed.

So instead, collapse the various 5-level paging related global
variables into a single byte-wide pgdir_shift variable, and move it
into the cache-hot per-CPU section where it can be accessed cheaply.
Set it from asm code so that C code always observes the same value, and
derive pgtable_l5_enabled() and PTRS_PER_P4D from it directly, ensuring
that all of these quantities are always mutually consistent.
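
As a minimal sketch of the idea (illustrative only, not the literal
patches; the per-CPU declaration and accessor details here are
assumptions, with the pgdir shift being 39 for 4-level and 48 for
5-level paging):

  DECLARE_PER_CPU(u8, pgdir_shift);     /* set once from early asm code */

  #define pgtable_l5_enabled()  (__this_cpu_read(pgdir_shift) != 39)
  #define PTRS_PER_P4D          (1U << (__this_cpu_read(pgdir_shift) - 39))

This way a single early store determines all of these quantities, so
they cannot get out of sync with each other.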

If pgtable_l5_enabled() requires more optimization, we can consider
alternatives, runtime constants, etc., but whether this is actually
necessary is TBD. Suggestions welcome for (micro-)benchmarks that
illustrate the perf delta.

Build and boot tested using QEMU with LA57 emulation.

Changes since v4:
- Add patch to fix MAX_PHYSMEM_BITS (and drop an occurrence of
pgtable_l5_enabled())
- Re-order the changes and split across more patches so any potential
performance hit is bisectable.

Changes since v3:
- Drop asm-offsets patch which has been merged already
- Rebase onto tip/x86/core which now carries some related changes by
Kirill
- Avoid adding new instances of '#ifdef CONFIG_X86_5LEVEL' where
possible, as it is going to be removed soon
- Move cap override arrays straight to __ro_after_init
- Drop KVM changes entirely - they were wrong and unnecessary
- Drop the new "la57_hw" capability flag for now - we can always add it
later if there is a need.

Changes since v2:
- Drop first patch which has been merged
- Rename existing "la57" CPU flag to "la57_hw" and use "la57" to
indicate that 5 level paging is being used
- Move memset() out of identify_cpu()
- Make set/clear cap override arrays ro_after_init
- Split off asm-offsets update

[0] https://lore.kernel.org/all/20250504095230.2932860-28-ardb+git@google.com/

Cc: Ingo Molnar <mingo@...nel.org>
Cc: Linus Torvalds <torvalds@...ux-foundation.org>
Cc: Brian Gerst <brgerst@...il.com>
Cc: Kirill A. Shutemov <kirill@...temov.name>
Cc: Borislav Petkov <bp@...en8.de>

Ard Biesheuvel (7):
  x86/mm: Decouple MAX_PHYSMEM_BITS from LA57 state
  x86/mm: Use a single cache hot per-CPU variable to record pgdir_shift
  x86/mm: Define PTRS_PER_P4D in terms of pgdir_shift()
  x86/mm: Derive pgtable_l5_enabled() from pgdir_shift()
  x86/boot: Drop USE_EARLY_PGTABLE_L5 definitions
  x86/boot: Drop 5-level paging related global variable
  x86/boot: Remove KASAN workaround for 4/5 level paging switch

arch/x86/boot/compressed/misc.h | 8 +++---
arch/x86/boot/compressed/pgtable_64.c | 10 --------
arch/x86/boot/startup/map_kernel.c | 18 +------------
arch/x86/boot/startup/sme.c | 9 -------
arch/x86/include/asm/page_64_types.h | 2 +-
arch/x86/include/asm/pgtable_64_types.h | 27 ++++++++------------
arch/x86/include/asm/sparsemem.h | 2 +-
arch/x86/kernel/alternative.c | 12 ---------
arch/x86/kernel/cpu/common.c | 3 ---
arch/x86/kernel/head64.c | 9 -------
arch/x86/kernel/head_64.S | 5 ++++
arch/x86/mm/kasan_init_64.c | 3 ---
arch/x86/mm/pgtable.c | 4 +++
13 files changed, 26 insertions(+), 86 deletions(-)

base-commit: 54c2c688cd9305bdbab4883b9da6ff63f4deca5d
--
2.49.0.1101.gccaa498523-goog