[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAMj1kXFuJRsdWxL70S9Hisgye0dci7KOxrSzcLGnFFuUvjk2Mg@mail.gmail.com>
Date: Sun, 4 May 2025 16:46:07 +0200
From: Ard Biesheuvel <ardb@...nel.org>
To: Ingo Molnar <mingo@...nel.org>
Cc: Ard Biesheuvel <ardb+git@...gle.com>, linux-kernel@...r.kernel.org,
linux-efi@...r.kernel.org, x86@...nel.org, Borislav Petkov <bp@...en8.de>,
Dionna Amalie Glaze <dionnaglaze@...gle.com>, Kevin Loughlin <kevinloughlin@...gle.com>,
Tom Lendacky <thomas.lendacky@....com>, Linus Torvalds <torvalds@...ux-foundation.org>
Subject: Re: [RFT PATCH v2 03/23] x86/boot: Drop global variables keeping
track of LA57 state
On Sun, 4 May 2025 at 15:51, Ingo Molnar <mingo@...nel.org> wrote:
>
>
> * Ard Biesheuvel <ardb+git@...gle.com> wrote:
>
> > From: Ard Biesheuvel <ardb@...nel.org>
> >
> > On x86_64, the core kernel is entered in long mode, which implies that
> > paging is enabled. This means that the CR4.LA57 control bit is
> > guaranteed to be in sync with the number of paging levels used by the
> > kernel, and there is no need to store this in a variable.
> >
> > There is also no need to use variables for storing the calculations of
> > pgdir_shift and ptrs_per_p4d, as they are easily determined on the fly.
> >
> > This removes the need for two different sources of truth (i.e., early
> > and late) for determining whether 5-level paging is in use: CR4.LA57
> > always reflects the actual state, and never changes from the point of
> > view of the 64-bit core kernel. It also removes the need for exposing
> > the associated variables to the startup code. The only potential concern
> > is the cost of CR4 accesses, which can be mitigated using alternatives
> > patching based on feature detection.
> >
> > Note that even the decompressor does not manipulate any page tables
> > before updating CR4.LA57, so it can also avoid the associated global
> > variables entirely. However, as it does not implement alternatives
> > patching, the associated ELF sections need to be discarded.
> >
> > Signed-off-by: Ard Biesheuvel <ardb@...nel.org>
> > ---
> > arch/x86/boot/compressed/misc.h | 4 --
> > arch/x86/boot/compressed/pgtable_64.c | 12 ------
> > arch/x86/boot/compressed/vmlinux.lds.S | 1 +
> > arch/x86/boot/startup/map_kernel.c | 12 +-----
> > arch/x86/boot/startup/sme.c | 9 ----
> > arch/x86/include/asm/pgtable_64_types.h | 43 ++++++++++----------
> > arch/x86/kernel/cpu/common.c | 2 -
> > arch/x86/kernel/head64.c | 11 -----
> > arch/x86/mm/kasan_init_64.c | 3 --
> > 9 files changed, 24 insertions(+), 73 deletions(-)
>
> So this patch breaks the build & creates header dependency hell on
> x86-64 allnoconfig:
>
Ugh
...
> Plus I'm not sure I'm happy about this kind of complexity getting
> embedded deep within low level MM primitives:
>
> static __always_inline __pure bool pgtable_l5_enabled(void)
> {
> unsigned long r;
> bool ret;
>
> if (!IS_ENABLED(CONFIG_X86_5LEVEL))
> return false;
>
> asm(ALTERNATIVE_TERNARY(
> "movq %%cr4, %[reg] \n\t btl %[la57], %k[reg]" CC_SET(c),
> %P[feat], "stc", "clc")
> : [reg] "=&r" (r), CC_OUT(c) (ret)
> : [feat] "i" (X86_FEATURE_LA57),
> [la57] "i" (X86_CR4_LA57_BIT)
> : "cc");
>
> return ret;
> }
>
...
>
> Inlined approximately a gazillion times. (449 times on x86 defconfig.
> Yes, I just counted it.)
>
> And it's not even worth it, as it generates horrendous code:
>
> 154: 0f 20 e0 mov %cr4,%rax
> 157: 0f ba e0 0c bt $0xc,%eax
>
> ... while CR4 access might be faster these days, it's certainly not as
> fast as simple percpu access. Plus it clobbers a register (RAX in the
> example above), which is unnecessary for a flag test.
>
It's an alternative, so this will be patched into stc or clc. But it
will still clobber a register.
> Cannot pgtable_l5_enabled() be a single, simple percpu flag or so?
>
We can just drop this patch, and I'll work around it by adding another
couple of SYM_PIC_ALIAS()es for these variables.
> And yes, this creates another layer for these values - but thus
> decouples low level MM from detection & implementation complexities,
> which is a plus ...
>
If you prefer to retain the early vs late distinction, where the late
one is more efficient, we could replace the early variant with the CR4
access. But frankly, if we go down that road, I'd prefer to just share
these early variables with the startup code.
Powered by blists - more mailing lists