Message-ID: <aBdwwR52hI37bW9a@gmail.com>
Date: Sun, 4 May 2025 15:50:57 +0200
From: Ingo Molnar <mingo@...nel.org>
To: Ard Biesheuvel <ardb+git@...gle.com>
Cc: linux-kernel@...r.kernel.org, linux-efi@...r.kernel.org, x86@...nel.org,
Ard Biesheuvel <ardb@...nel.org>, Borislav Petkov <bp@...en8.de>,
Dionna Amalie Glaze <dionnaglaze@...gle.com>,
Kevin Loughlin <kevinloughlin@...gle.com>,
Tom Lendacky <thomas.lendacky@....com>,
Linus Torvalds <torvalds@...ux-foundation.org>
Subject: Re: [RFT PATCH v2 03/23] x86/boot: Drop global variables keeping
track of LA57 state
* Ard Biesheuvel <ardb+git@...gle.com> wrote:
> From: Ard Biesheuvel <ardb@...nel.org>
>
> On x86_64, the core kernel is entered in long mode, which implies that
> paging is enabled. This means that the CR4.LA57 control bit is
> guaranteed to be in sync with the number of paging levels used by the
> kernel, and there is no need to store this in a variable.
>
> There is also no need to use variables for storing the calculations of
> pgdir_shift and ptrs_per_p4d, as they are easily determined on the fly.
>
> This removes the need for two different sources of truth (i.e., early
> and late) for determining whether 5-level paging is in use: CR4.LA57
> always reflects the actual state, and never changes from the point of
> view of the 64-bit core kernel. It also removes the need for exposing
> the associated variables to the startup code. The only potential concern
> is the cost of CR4 accesses, which can be mitigated using alternatives
> patching based on feature detection.
>
> Note that even the decompressor does not manipulate any page tables
> before updating CR4.LA57, so it can also avoid the associated global
> variables entirely. However, as it does not implement alternatives
> patching, the associated ELF sections need to be discarded.
>
> Signed-off-by: Ard Biesheuvel <ardb@...nel.org>
> ---
>  arch/x86/boot/compressed/misc.h         |  4 --
>  arch/x86/boot/compressed/pgtable_64.c   | 12 ------
>  arch/x86/boot/compressed/vmlinux.lds.S  |  1 +
>  arch/x86/boot/startup/map_kernel.c      | 12 +-----
>  arch/x86/boot/startup/sme.c             |  9 ----
>  arch/x86/include/asm/pgtable_64_types.h | 43 ++++++++++----------
>  arch/x86/kernel/cpu/common.c            |  2 -
>  arch/x86/kernel/head64.c                | 11 -----
>  arch/x86/mm/kasan_init_64.c             |  3 --
>  9 files changed, 24 insertions(+), 73 deletions(-)
So this patch breaks the build & creates header dependency hell on
x86-64 allnoconfig:
starship:~/tip> m kernel/pid.o
  DESCEND objtool
  CC      arch/x86/kernel/asm-offsets.s
  INSTALL libsubcmd_headers
In file included from ./arch/x86/include/asm/pgtable_64_types.h:5,
                 from ./arch/x86/include/asm/pgtable_types.h:283,
                 from ./arch/x86/include/asm/processor.h:21,
                 from ./arch/x86/include/asm/cpufeature.h:5,
                 from ./arch/x86/include/asm/thread_info.h:59,
                 from ./include/linux/thread_info.h:60,
                 from ./include/linux/spinlock.h:60,
                 from ./include/linux/swait.h:7,
                 from ./include/linux/completion.h:12,
                 from ./include/linux/crypto.h:15,
                 from arch/x86/kernel/asm-offsets.c:9:
./arch/x86/include/asm/sparsemem.h:29:34: warning: "pgtable_l5_enabled" is not defined, evaluates to 0 [-Wundef]
   29 | # define MAX_PHYSMEM_BITS (pgtable_l5_enabled() ? 52 : 46)
      |                            ^~~~~~~~~~~~~~~~~~
./include/linux/page-flags-layout.h:31:26: note: in expansion of macro ‘MAX_PHYSMEM_BITS’
   31 | #define SECTIONS_SHIFT (MAX_PHYSMEM_BITS - SECTION_SIZE_BITS)
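The -Wundef warning is the usual preprocessor trap: MAX_PHYSMEM_BITS ends up
being evaluated in #if expressions, where an inline C function is just an
undefined identifier that silently evaluates to 0. Roughly this pattern
(illustration only - the #if below is hypothetical, not the actual header):

# define MAX_PHYSMEM_BITS       (pgtable_l5_enabled() ? 52 : 46)

#if MAX_PHYSMEM_BITS > 46       /* pgtable_l5_enabled is not a macro here: -Wundef, treated as 0 */
...
#endif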
Plus I'm not sure I'm happy about this kind of complexity getting
embedded deep within low-level MM primitives:
static __always_inline __pure bool pgtable_l5_enabled(void)
{
        unsigned long r;
        bool ret;

        if (!IS_ENABLED(CONFIG_X86_5LEVEL))
                return false;

        asm(ALTERNATIVE_TERNARY(
                "movq %%cr4, %[reg] \n\t btl %[la57], %k[reg]" CC_SET(c),
                %P[feat], "stc", "clc")
                : [reg] "=&r" (r), CC_OUT(c) (ret)
                : [feat] "i" (X86_FEATURE_LA57),
                  [la57] "i" (X86_CR4_LA57_BIT)
                : "cc");

        return ret;
}
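(For reference, roughly the logic the ALTERNATIVE_TERNARY sequence above
implements - a C sketch of the semantics, not the actual macro expansion;
alternatives_patched / boot_cpu_has() are used here purely for illustration:)

static __always_inline bool pgtable_l5_enabled_sketch(void)
{
        if (!IS_ENABLED(CONFIG_X86_5LEVEL))
                return false;

        /* early boot, before alternatives patching: test the live CR4 bit */
        if (!alternatives_patched)
                return __read_cr4() & X86_CR4_LA57;

        /* after patching, the asm above collapses to a bare stc or clc */
        return boot_cpu_has(X86_FEATURE_LA57);
}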
And it's used basically everywhere:
arch/x86/include/asm/page_64_types.h:#define __VIRTUAL_MASK_SHIFT (pgtable_l5_enabled() ? 56 : 47)
arch/x86/include/asm/paravirt.h: if (pgtable_l5_enabled()) \
arch/x86/include/asm/paravirt.h: if (pgtable_l5_enabled()) \
arch/x86/include/asm/pgalloc.h: if (!pgtable_l5_enabled())
arch/x86/include/asm/pgalloc.h: if (!pgtable_l5_enabled())
arch/x86/include/asm/pgalloc.h: if (pgtable_l5_enabled())
arch/x86/include/asm/pgtable.h:#define pgd_clear(pgd) (pgtable_l5_enabled() ? native_pgd_clear(pgd) : 0)
arch/x86/include/asm/pgtable.h: if (!pgtable_l5_enabled())
arch/x86/include/asm/pgtable.h: if (!pgtable_l5_enabled())
arch/x86/include/asm/pgtable.h: if (!pgtable_l5_enabled())
arch/x86/include/asm/pgtable.h: if (!pgtable_l5_enabled())
arch/x86/include/asm/pgtable_32_types.h:#define pgtable_l5_enabled() 0
arch/x86/include/asm/pgtable_64.h: return !pgtable_l5_enabled();
arch/x86/include/asm/pgtable_64.h: if (pgtable_l5_enabled() ||
arch/x86/include/asm/pgtable_64_types.h:static __always_inline __pure bool pgtable_l5_enabled(void)
arch/x86/include/asm/pgtable_64_types.h:#define PGDIR_SHIFT (pgtable_l5_enabled() ? 48 : 39)
arch/x86/include/asm/pgtable_64_types.h:#define PTRS_PER_P4D (pgtable_l5_enabled() ? 512 : 1)
arch/x86/include/asm/pgtable_64_types.h:# define VMALLOC_SIZE_TB (pgtable_l5_enabled() ? VMALLOC_SIZE_TB_L5 : VMALLOC_SIZE_TB_L4)
arch/x86/include/asm/sparsemem.h:# define MAX_PHYSMEM_BITS (pgtable_l5_enabled() ? 52 : 46)
Inlined approximately a gazillion times. (449 times on x86 defconfig.
Yes, I just counted it.)
And it's not even worth it, as it generates horrendous code:
 154:   0f 20 e0                mov    %cr4,%rax
 157:   0f ba e0 0c             bt     $0xc,%eax
... and while CR4 reads might be faster these days than they used to be,
they are certainly not as fast as a simple percpu access. Plus the
sequence clobbers a register (RAX in the example above), which is
unnecessary for a mere flag test.
Couldn't pgtable_l5_enabled() be a single, simple percpu flag or so?
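Something along these lines, say (just a sketch of the idea - the percpu
variable name below is made up):

/* hypothetical percpu flag, set once while bringing up a CPU */
DECLARE_PER_CPU(bool, pgtable_l5_flag);

static __always_inline bool pgtable_l5_enabled(void)
{
        if (!IS_ENABLED(CONFIG_X86_5LEVEL))
                return false;

        /* a single percpu load: no CR4 access, no clobbered GPR */
        return this_cpu_read(pgtable_l5_flag);
}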
And yes, this creates another layer for these values - but it also
decouples the low-level MM code from detection & implementation
complexities, which is a plus ...
Thanks,
Ingo