lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <aCZ3vvKFMcKcc3h-@google.com>
Date: Thu, 15 May 2025 16:24:46 -0700
From: Sean Christopherson <seanjc@...gle.com>
To: Ard Biesheuvel <ardb@...nel.org>
Cc: Ingo Molnar <mingo@...nel.org>, Ard Biesheuvel <ardb+git@...gle.com>, linux-kernel@...r.kernel.org, 
	x86@...nel.org, Linus Torvalds <torvalds@...ux-foundation.org>, 
	Brian Gerst <brgerst@...il.com>
Subject: Re: [PATCH v3 1/7] x86/cpu: Use a new feature flag for 5 level paging

On Thu, May 15, 2025, Ard Biesheuvel wrote:
> On Thu, 15 May 2025 at 08:45, Ingo Molnar <mingo@...nel.org> wrote:
> > > diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
> > > index f67a93fc9391..d59bee5907e7 100644
> > > --- a/arch/x86/include/asm/cpufeatures.h
> > > +++ b/arch/x86/include/asm/cpufeatures.h
> > > @@ -395,7 +395,7 @@
> > >  #define X86_FEATURE_AVX512_BITALG    (16*32+12) /* "avx512_bitalg" Support for VPOPCNT[B,W] and VPSHUF-BITQMB instructions */
> > >  #define X86_FEATURE_TME                      (16*32+13) /* "tme" Intel Total Memory Encryption */
> > >  #define X86_FEATURE_AVX512_VPOPCNTDQ (16*32+14) /* "avx512_vpopcntdq" POPCNT for vectors of DW/QW */
> > > -#define X86_FEATURE_LA57             (16*32+16) /* "la57" 5-level page tables */
> > > +#define X86_FEATURE_LA57             (16*32+16) /* "la57_hw" 5-level page tables */
> > >  #define X86_FEATURE_RDPID            (16*32+22) /* "rdpid" RDPID instruction */
> > >  #define X86_FEATURE_BUS_LOCK_DETECT  (16*32+24) /* "bus_lock_detect" Bus Lock detect */
> > >  #define X86_FEATURE_CLDEMOTE         (16*32+25) /* "cldemote" CLDEMOTE instruction */
> > > @@ -483,6 +483,7 @@
> > >  #define X86_FEATURE_PREFER_YMM               (21*32+ 8) /* Avoid ZMM registers due to downclocking */
> > >  #define X86_FEATURE_APX                      (21*32+ 9) /* Advanced Performance Extensions */
> > >  #define X86_FEATURE_INDIRECT_THUNK_ITS       (21*32+10) /* Use thunk for indirect branches in lower half of cacheline */
> > > +#define X86_FEATURE_5LEVEL_PAGING    (21*32+11) /* "la57" Whether 5 levels of page tables are in use */
> >
> > So there's a new complication here, KVM doesn't like the use of
> > synthethic CPU flags, for understandable reasons:
> >
> >   inlined from ‘intel_pmu_set_msr’ at arch/x86/kvm/vmx/pmu_intel.c:369:7:
> >   ...
> >   ./arch/x86/kvm/reverse_cpuid.h:102:9: note: in expansion of macro ‘BUILD_BUG_ON’
> >     102 |         BUILD_BUG_ON(x86_leaf == CPUID_LNX_5);
> >         |         ^~~~~~~~~~~~
> >
> > (See x86-64 allmodconfig)
> >
> > Even though previously X86_FEATURE_LA57 was effectively a synthethic
> > CPU flag (it got artificially turned off by the Linux kernel if 5-level
> > paging was disabled) ...

KVM doesn't care if the kernel clears CPUID feature flags.  In fact, KVM depends
on it in many cases.  What KVM cares about is that the bit number matches CPUID
(hardware defined bit) for features that are exposed to guests.

> > So I think the most straightforward solution would be to do the change
> > below, and pass through LA57 flag if 5-level paging is enabled in the
> > host kernel. This is similar to as if the firmware turned off LA57, and
> > it doesn't bring in all the early-boot complications bare metal has. It
> > should also match the previous behavior I think.
> >
> > Thoughts?
> >
> > Thanks,
> >
> >         Ingo
> >
> > =================>
> >
> >  arch/x86/kvm/cpuid.c | 6 ++++++
> >  arch/x86/kvm/x86.h   | 4 ++--
> >  2 files changed, 8 insertions(+), 2 deletions(-)
> >
> > diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
> > index 571c906ffcbf..d951d71aea3b 100644
> > --- a/arch/x86/kvm/cpuid.c
> > +++ b/arch/x86/kvm/cpuid.c
> > @@ -1221,6 +1221,12 @@ void kvm_set_cpu_caps(void)
> >                 kvm_cpu_cap_clear(X86_FEATURE_RDTSCP);
> >                 kvm_cpu_cap_clear(X86_FEATURE_RDPID);
> >         }
> > +       /*
> > +        * Clear the LA57 flag in the guest if the host kernel
> > +        * does not have 5-level paging support:

No, KVM very intentionally goes out of it's way to support LA57 in the guest
irrespective of host kernel support.  I.e. KVM already looks at CPUID directly
and supports virtualizing LA57 if it's supported in hardware, the kernel's
feature flags are ignored (for LA57).

> > +        */
> > +       if (kvm_cpu_cap_has(X86_FEATURE_LA57) && !pgtable_l5_enabled())
> > +               kvm_cpu_cap_clear(X86_FEATURE_LA57);
> >  }
> >  EXPORT_SYMBOL_GPL(kvm_set_cpu_caps);
> >
> > diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
> > index d2c093f17ae5..9dc32a409076 100644
> > --- a/arch/x86/kvm/x86.h
> > +++ b/arch/x86/kvm/x86.h
> > @@ -243,7 +243,7 @@ static inline u8 vcpu_virt_addr_bits(struct kvm_vcpu *vcpu)
> >
> >  static inline u8 max_host_virt_addr_bits(void)
> >  {
> > -       return kvm_cpu_cap_has(X86_FEATURE_5LEVEL_PAGING) ? 57 : 48;
> > +       return kvm_cpu_cap_has(X86_FEATURE_LA57) ? 57 : 48;
> 
> Based on the comment that documents is_noncanonical_address(), it
> seems to me that this should be using cpu_has(X86_FEATURE_LA57) rather
> than kvm_cpu_cap_has(X86_FEATURE_LA57). Whether the host actually runs
> with 5-level paging should be irrelevant, it only matters whether it
> is capable of doing so.

Ard's patches are completely wrong for KVM, and amusingly, completely unecessary.

Simply drop all of the KVM changes, including the selftest change.  And if you
want to save yourselves some time in the future, use scripts/get_maintainer.pl.

> >  }
> >
> >  /*
> > @@ -603,7 +603,7 @@ static inline bool __kvm_is_valid_cr4(struct kvm_vcpu *vcpu, unsigned long cr4)
> >                 __reserved_bits |= X86_CR4_FSGSBASE;    \
> >         if (!__cpu_has(__c, X86_FEATURE_PKU))           \
> >                 __reserved_bits |= X86_CR4_PKE;         \
> > -       if (!__cpu_has(__c, X86_FEATURE_5LEVEL_PAGING))          \
> > +       if (!__cpu_has(__c, X86_FEATURE_LA57))          \
> >                 __reserved_bits |= X86_CR4_LA57;        \
> 
> This could be
> 
> if (!__cpu_has(__c, X86_FEATURE_LA57) || !pgtable_l5_enabled())
>     __reserved_bits |= X86_CR4_LA57;

No, all of this is wrong.  __kvm_is_valid_cr4() is used check guest CR4 against
KVM's supported set of features CPUID, and also to check guest CR4 against the
vCPU's supported set of features.

> 
> >         if (!__cpu_has(__c, X86_FEATURE_UMIP))          \
> >                 __reserved_bits |= X86_CR4_UMIP;        \

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ