Message-ID: <ct3z5lyozftd2tkzfksc6ylbh7ubeonuww2t77voymuy5egyo2@ocqfhd6gnbti>
Date: Thu, 15 May 2025 11:07:04 +0300
From: "Kirill A. Shutemov" <kirill@...temov.name>
To: Ingo Molnar <mingo@...nel.org>
Cc: Ard Biesheuvel <ardb+git@...gle.com>, linux-kernel@...r.kernel.org,
x86@...nel.org, Ard Biesheuvel <ardb@...nel.org>,
Linus Torvalds <torvalds@...ux-foundation.org>, Brian Gerst <brgerst@...il.com>
Subject: Re: [PATCH v3 1/7] x86/cpu: Use a new feature flag for 5 level paging

On Thu, May 15, 2025 at 09:45:52AM +0200, Ingo Molnar wrote:
>
> * Ard Biesheuvel <ardb+git@...gle.com> wrote:
>
> > From: Ard Biesheuvel <ardb@...nel.org>
> >
> > Currently, the LA57 CPU feature flag is taken to mean two different
> > things at once:
> > - whether the CPU implements the LA57 extension, and is therefore
> > capable of supporting 5 level paging;
> > - whether 5 level paging is currently in use.
> >
> > This means the LA57 capability of the hardware is hidden when a LA57
> > capable CPU is forced to run with 4 levels of paging. It also means the
> > ordinary CPU capability detection code will happily set the LA57
> > capability and it needs to be cleared explicitly afterwards to avoid
> > inconsistencies.
> >
> > Separate the two so that the CPU hardware capability can be identified
> > unambiguously in all cases.
> >
> > To avoid breaking existing users that might assume that 5 level paging
> > is being used when the "la57" string is visible in /proc/cpuinfo,
> > repurpose that string to mean that 5-level paging is in use, and add a
> > new string la57_capable to indicate that the CPU feature is implemented
> > by the hardware.
> >
> > Signed-off-by: Ard Biesheuvel <ardb@...nel.org>
> > ---
> > arch/x86/include/asm/cpufeatures.h | 3 ++-
> > arch/x86/include/asm/page_64.h | 2 +-
> > arch/x86/include/asm/pgtable_64_types.h | 2 +-
> > arch/x86/kernel/cpu/common.c | 16 ++--------------
> > arch/x86/kvm/x86.h | 4 ++--
> > drivers/iommu/amd/init.c | 4 ++--
> > drivers/iommu/intel/svm.c | 4 ++--
> > tools/testing/selftests/kvm/x86/set_sregs_test.c | 2 +-
> > 8 files changed, 13 insertions(+), 24 deletions(-)
> >
> > diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
> > index f67a93fc9391..d59bee5907e7 100644
> > --- a/arch/x86/include/asm/cpufeatures.h
> > +++ b/arch/x86/include/asm/cpufeatures.h
> > @@ -395,7 +395,7 @@
> > #define X86_FEATURE_AVX512_BITALG (16*32+12) /* "avx512_bitalg" Support for VPOPCNT[B,W] and VPSHUF-BITQMB instructions */
> > #define X86_FEATURE_TME (16*32+13) /* "tme" Intel Total Memory Encryption */
> > #define X86_FEATURE_AVX512_VPOPCNTDQ (16*32+14) /* "avx512_vpopcntdq" POPCNT for vectors of DW/QW */
> > -#define X86_FEATURE_LA57 (16*32+16) /* "la57" 5-level page tables */
> > +#define X86_FEATURE_LA57 (16*32+16) /* "la57_hw" 5-level page tables */
> > #define X86_FEATURE_RDPID (16*32+22) /* "rdpid" RDPID instruction */
> > #define X86_FEATURE_BUS_LOCK_DETECT (16*32+24) /* "bus_lock_detect" Bus Lock detect */
> > #define X86_FEATURE_CLDEMOTE (16*32+25) /* "cldemote" CLDEMOTE instruction */
> > @@ -483,6 +483,7 @@
> > #define X86_FEATURE_PREFER_YMM (21*32+ 8) /* Avoid ZMM registers due to downclocking */
> > #define X86_FEATURE_APX (21*32+ 9) /* Advanced Performance Extensions */
> > #define X86_FEATURE_INDIRECT_THUNK_ITS (21*32+10) /* Use thunk for indirect branches in lower half of cacheline */
> > +#define X86_FEATURE_5LEVEL_PAGING (21*32+11) /* "la57" Whether 5 levels of page tables are in use */
>
> So there's a new complication here, KVM doesn't like the use of
> synthetic CPU flags, for understandable reasons:
>
> inlined from ‘intel_pmu_set_msr’ at arch/x86/kvm/vmx/pmu_intel.c:369:7:
> ...
> ./arch/x86/kvm/reverse_cpuid.h:102:9: note: in expansion of macro ‘BUILD_BUG_ON’
> 102 | BUILD_BUG_ON(x86_leaf == CPUID_LNX_5);
> | ^~~~~~~~~~~~
>
> (See x86-64 allmodconfig)
>
> Even though previously X86_FEATURE_LA57 was effectively a synthetic
> CPU flag (it got artificially turned off by the Linux kernel if 5-level
> paging was disabled) ...
>
> So I think the most straightforward solution would be to do the change
> below, and pass through LA57 flag if 5-level paging is enabled in the
> host kernel. This is similar to as if the firmware turned off LA57, and
> it doesn't bring in all the early-boot complications bare metal has. It
> should also match the previous behavior I think.
>
> Thoughts?
>
> Thanks,
>
> Ingo
>
> =================>
>
> arch/x86/kvm/cpuid.c | 6 ++++++
> arch/x86/kvm/x86.h | 4 ++--
> 2 files changed, 8 insertions(+), 2 deletions(-)
>
> diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
> index 571c906ffcbf..d951d71aea3b 100644
> --- a/arch/x86/kvm/cpuid.c
> +++ b/arch/x86/kvm/cpuid.c
> @@ -1221,6 +1221,12 @@ void kvm_set_cpu_caps(void)
> kvm_cpu_cap_clear(X86_FEATURE_RDTSCP);
> kvm_cpu_cap_clear(X86_FEATURE_RDPID);
> }
> + /*
> + * Clear the LA57 flag in the guest if the host kernel
> + * does not have 5-level paging support:
> + */
> + if (kvm_cpu_cap_has(X86_FEATURE_LA57) && !pgtable_l5_enabled())
The kvm_cpu_cap_has(X86_FEATURE_LA57) check seems redundant: calling
kvm_cpu_cap_clear() on a bit that is already clear is a no-op.
> + kvm_cpu_cap_clear(X86_FEATURE_LA57);
> }
> EXPORT_SYMBOL_GPL(kvm_set_cpu_caps);
>
> diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
> index d2c093f17ae5..9dc32a409076 100644
> --- a/arch/x86/kvm/x86.h
> +++ b/arch/x86/kvm/x86.h
> @@ -243,7 +243,7 @@ static inline u8 vcpu_virt_addr_bits(struct kvm_vcpu *vcpu)
>
> static inline u8 max_host_virt_addr_bits(void)
> {
> - return kvm_cpu_cap_has(X86_FEATURE_5LEVEL_PAGING) ? 57 : 48;
> + return kvm_cpu_cap_has(X86_FEATURE_LA57) ? 57 : 48;
> }
>
> /*
> @@ -603,7 +603,7 @@ static inline bool __kvm_is_valid_cr4(struct kvm_vcpu *vcpu, unsigned long cr4)
> __reserved_bits |= X86_CR4_FSGSBASE; \
> if (!__cpu_has(__c, X86_FEATURE_PKU)) \
> __reserved_bits |= X86_CR4_PKE; \
> - if (!__cpu_has(__c, X86_FEATURE_5LEVEL_PAGING)) \
> + if (!__cpu_has(__c, X86_FEATURE_LA57)) \
> __reserved_bits |= X86_CR4_LA57; \
> if (!__cpu_has(__c, X86_FEATURE_UMIP)) \
> __reserved_bits |= X86_CR4_UMIP; \
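
To illustrate the comment on the kvm_cpu_cap_has() check above, here is a
minimal standalone sketch. It is not kernel code: the bitmap, the
MODEL_FEATURE_LA57 bit number and the model_*() helpers are hypothetical
stand-ins for kvm_cpu_caps, kvm_cpu_cap_has() and kvm_cpu_cap_clear(), and
l5_enabled stands in for pgtable_l5_enabled(). Assuming the clear helper
only masks the bit out, the guarded and unguarded forms end up with the
same capability mask for every input:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit position; only meaningful inside this model. */
enum { MODEL_FEATURE_LA57 = 16 };

/* Stand-in for kvm_cpu_cap_has(): test one bit of a capability mask. */
static bool model_cap_has(uint32_t caps, unsigned int bit)
{
	return (caps & (UINT32_C(1) << bit)) != 0;
}

/* Stand-in for kvm_cpu_cap_clear(): mask one bit out, no-op if already clear. */
static uint32_t model_cap_clear(uint32_t caps, unsigned int bit)
{
	return caps & ~(UINT32_C(1) << bit);
}

/* Form used in the quoted hunk: test the bit before clearing it. */
static uint32_t clear_la57_guarded(uint32_t caps, bool l5_enabled)
{
	if (model_cap_has(caps, MODEL_FEATURE_LA57) && !l5_enabled)
		caps = model_cap_clear(caps, MODEL_FEATURE_LA57);
	return caps;
}

/* Suggested form: clear unconditionally when 5-level paging is off. */
static uint32_t clear_la57_unguarded(uint32_t caps, bool l5_enabled)
{
	if (!l5_enabled)
		caps = model_cap_clear(caps, MODEL_FEATURE_LA57);
	return caps;
}
```

Both functions return the same mask whether or not the LA57 bit was set on
entry, so in this model the extra test only adds a branch.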
--
Kiryl Shutsemau / Kirill A. Shutemov