lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAMj1kXFuJRsdWxL70S9Hisgye0dci7KOxrSzcLGnFFuUvjk2Mg@mail.gmail.com>
Date: Sun, 4 May 2025 16:46:07 +0200
From: Ard Biesheuvel <ardb@...nel.org>
To: Ingo Molnar <mingo@...nel.org>
Cc: Ard Biesheuvel <ardb+git@...gle.com>, linux-kernel@...r.kernel.org, 
	linux-efi@...r.kernel.org, x86@...nel.org, Borislav Petkov <bp@...en8.de>, 
	Dionna Amalie Glaze <dionnaglaze@...gle.com>, Kevin Loughlin <kevinloughlin@...gle.com>, 
	Tom Lendacky <thomas.lendacky@....com>, Linus Torvalds <torvalds@...ux-foundation.org>
Subject: Re: [RFT PATCH v2 03/23] x86/boot: Drop global variables keeping
 track of LA57 state

On Sun, 4 May 2025 at 15:51, Ingo Molnar <mingo@...nel.org> wrote:
>
>
> * Ard Biesheuvel <ardb+git@...gle.com> wrote:
>
> > From: Ard Biesheuvel <ardb@...nel.org>
> >
> > On x86_64, the core kernel is entered in long mode, which implies that
> > paging is enabled. This means that the CR4.LA57 control bit is
> > guaranteed to be in sync with the number of paging levels used by the
> > kernel, and there is no need to store this in a variable.
> >
> > There is also no need to use variables for storing the calculations of
> > pgdir_shift and ptrs_per_p4d, as they are easily determined on the fly.
> >
> > This removes the need for two different sources of truth (i.e., early
> > and late) for determining whether 5-level paging is in use: CR4.LA57
> > always reflects the actual state, and never changes from the point of
> > view of the 64-bit core kernel. It also removes the need for exposing
> > the associated variables to the startup code. The only potential concern
> > is the cost of CR4 accesses, which can be mitigated using alternatives
> > patching based on feature detection.
> >
> > Note that even the decompressor does not manipulate any page tables
> > before updating CR4.LA57, so it can also avoid the associated global
> > variables entirely. However, as it does not implement alternatives
> > patching, the associated ELF sections need to be discarded.
> >
> > Signed-off-by: Ard Biesheuvel <ardb@...nel.org>
> > ---
> >  arch/x86/boot/compressed/misc.h         |  4 --
> >  arch/x86/boot/compressed/pgtable_64.c   | 12 ------
> >  arch/x86/boot/compressed/vmlinux.lds.S  |  1 +
> >  arch/x86/boot/startup/map_kernel.c      | 12 +-----
> >  arch/x86/boot/startup/sme.c             |  9 ----
> >  arch/x86/include/asm/pgtable_64_types.h | 43 ++++++++++----------
> >  arch/x86/kernel/cpu/common.c            |  2 -
> >  arch/x86/kernel/head64.c                | 11 -----
> >  arch/x86/mm/kasan_init_64.c             |  3 --
> >  9 files changed, 24 insertions(+), 73 deletions(-)
>
> So this patch breaks the build & creates header dependency hell on
> x86-64 allnoconfig:
>

Ugh

...
> Plus I'm not sure I'm happy about this kind of complexity getting
> embedded deep within low level MM primitives:
>
>   static __always_inline __pure bool pgtable_l5_enabled(void)
>   {
>         unsigned long r;
>         bool ret;
>
>         if (!IS_ENABLED(CONFIG_X86_5LEVEL))
>                 return false;
>
>         asm(ALTERNATIVE_TERNARY(
>                  "movq %%cr4, %[reg] \n\t btl %[la57], %k[reg]" CC_SET(c),
>                  %P[feat], "stc", "clc")
>                  : [reg] "=&r" (r), CC_OUT(c) (ret)
>                  : [feat] "i"  (X86_FEATURE_LA57),
>                    [la57] "i"  (X86_CR4_LA57_BIT)
>                  : "cc");
>
>         return ret;
>   }
>
...
>
> Inlined approximately a gazillion times. (449 times on x86 defconfig.
> Yes, I just counted it.)
>
> And it's not even worth it, as it generates horrendous code:
>
>    154:   0f 20 e0                mov    %cr4,%rax
>    157:   0f ba e0 0c             bt     $0xc,%eax
>
> ... while CR4 access might be faster these days, it's certainly not as
> fast as simple percpu access. Plus it clobbers a register (RAX in the
> example above), which is unnecessary for a flag test.
>

It's an alternative, so this will be patched into stc or clc. But it
will still clobber a register.

> Cannot pgtable_l5_enabled() be a single, simple percpu flag or so?
>

We can just drop this patch, and I'll work around it by adding another
couple of SYM_PIC_ALIAS()es for these variables.

> And yes, this creates another layer for these values - but thus
> decouples low level MM from detection & implementation complexities,
> which is a plus ...
>

If you prefer to retain the early vs late distinction, where the late
one is more efficient, we could replace the early variant with the CR4
access. But frankly, if we go down that road, I'd prefer to just share
these early variables with the startup code.

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ