Message-ID: <Z-09TLXNWv-msJ4O@google.com>
Date: Wed, 2 Apr 2025 06:36:12 -0700
From: Sean Christopherson <seanjc@...gle.com>
To: Jon Kohler <jon@...anix.com>
Cc: Paolo Bonzini <pbonzini@...hat.com>, Thomas Gleixner <tglx@...utronix.de>,
Ingo Molnar <mingo@...hat.com>, Borislav Petkov <bp@...en8.de>,
Dave Hansen <dave.hansen@...ux.intel.com>, x86@...nel.org,
"H. Peter Anvin" <hpa@...or.com>, kvm@...r.kernel.org, linux-kernel@...r.kernel.org,
Emanuele Giuseppe Esposito <eesposit@...hat.com>, Pawan Gupta <pawan.kumar.gupta@...ux.intel.com>
Subject: Re: [PATCH] KVM: x86: Expose ARCH_CAP_FB_CLEAR when invulnerable to MDS

On Mon, Mar 31, 2025, Jon Kohler wrote:
> Expose FB_CLEAR in arch_capabilities for certain MDS-invulnerable cases
> to support live migration from older hardware (e.g., Cascade Lake, Ice
> Lake) to newer hardware (e.g., Sapphire Rapids or higher). This ensures
> compatibility when user space has previously configured vCPUs to see
> FB_CLEAR (ARCH_CAPABILITIES Bit 17).
>
> Newer hardware sets the following bits but does not set FB_CLEAR, which
> can prevent user space from configuring a matching setup:

I looked at this again right after PUCK, and KVM does NOT actually prevent
userspace from matching the original, pre-SPR configuration. KVM effectively
treats ARCH_CAPABILITIES like a CPUID leaf, and lets userspace shove in any
value. I.e. userspace can still migrate+stuff FB_CLEAR irrespective of hardware
support, and thus there is no need for KVM to lie to userspace.

So in effect, this is a userspace problem where it's being too aggressive in its
sanity checks.

FWIW, even if KVM did reject unsupported ARCH_CAPABILITIES bits, I would still
say this is userspace's problem to solve. E.g. by using MSR filtering to
intercept and emulate RDMSR(ARCH_CAPABILITIES) in userspace.
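
As a rough illustration of what a less aggressive userspace sanity check could
look like, here is a minimal sketch. It is not code from any real VMM: the
helper names host_is_mds_immune() and arch_caps_acceptable() are hypothetical,
and the policy (tolerate a guest-visible FB_CLEAR whenever the host enumerates
all five *_NO bits quoted below) is only an assumption that mirrors the
condition in the patch. The bit positions are the ones I believe the kernel's
msr-index.h uses; only bit 17 for FB_CLEAR is stated in this thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* IA32_ARCH_CAPABILITIES bits (assumed to match the kernel's msr-index.h). */
#define ARCH_CAP_MDS_NO        (1ULL << 5)
#define ARCH_CAP_TAA_NO        (1ULL << 8)
#define ARCH_CAP_SBDR_SSDP_NO  (1ULL << 13)
#define ARCH_CAP_FBSDP_NO      (1ULL << 14)
#define ARCH_CAP_PSDP_NO       (1ULL << 15)
#define ARCH_CAP_FB_CLEAR      (1ULL << 17)

/* The five "not affected" bits that the patch description lists. */
#define MDS_IMMUNE_MASK (ARCH_CAP_MDS_NO | ARCH_CAP_TAA_NO |     \
                         ARCH_CAP_PSDP_NO | ARCH_CAP_FBSDP_NO |  \
                         ARCH_CAP_SBDR_SSDP_NO)

/* Hypothetical helper: true if the host enumerates all five *_NO bits. */
static bool host_is_mds_immune(uint64_t host_caps)
{
	return (host_caps & MDS_IMMUNE_MASK) == MDS_IMMUNE_MASK;
}

/*
 * Hypothetical userspace policy: decide whether guest_caps may be stuffed
 * into a vCPU on a host that reports host_caps.  Any guest bit that the
 * host lacks is rejected, except FB_CLEAR on an MDS-immune host.
 */
static bool arch_caps_acceptable(uint64_t host_caps, uint64_t guest_caps)
{
	uint64_t missing = guest_caps & ~host_caps;

	if (host_is_mds_immune(host_caps))
		missing &= ~ARCH_CAP_FB_CLEAR;

	return missing == 0;
}
```

With a check along these lines, a guest value that carries FB_CLEAR from an
older host would be accepted on a host that sets the five *_NO bits, and
userspace would then hand that value to KVM unchanged.
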
> ARCH_CAP_MDS_NO
> ARCH_CAP_TAA_NO
> ARCH_CAP_PSDP_NO
> ARCH_CAP_FBSDP_NO
> ARCH_CAP_SBDR_SSDP_NO
>
> This change has minimal impact, as these bit combinations already mark
> the host as MMIO immune (via arch_cap_mmio_immune()) and set
> disable_fb_clear in vmx_update_fb_clear_dis(), resulting in no
> additional overhead.
>
> Cc: Emanuele Giuseppe Esposito <eesposit@...hat.com>
> Cc: Paolo Bonzini <pbonzini@...hat.com>
> Cc: Pawan Gupta <pawan.kumar.gupta@...ux.intel.com>
> Signed-off-by: Jon Kohler <jon@...anix.com>
>
> ---
> arch/x86/kvm/x86.c | 14 ++++++++++++++
> 1 file changed, 14 insertions(+)
>
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index c841817a914a..2a4337aa78cd 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -1641,6 +1641,20 @@ static u64 kvm_get_arch_capabilities(void)
>  	if (!boot_cpu_has_bug(X86_BUG_GDS) || gds_ucode_mitigated())
>  		data |= ARCH_CAP_GDS_NO;
>  
> +	/*
> +	 * User space might set FB_CLEAR when starting a vCPU on a system
> +	 * that does not enumerate FB_CLEAR but is also invulnerable to
> +	 * other various MDS related bugs. To allow live migration from
> +	 * hosts that do implement FB_CLEAR, leave it enabled.
> +	 */
> +	if ((data & ARCH_CAP_MDS_NO) &&
> +	    (data & ARCH_CAP_TAA_NO) &&
> +	    (data & ARCH_CAP_PSDP_NO) &&
> +	    (data & ARCH_CAP_FBSDP_NO) &&
> +	    (data & ARCH_CAP_SBDR_SSDP_NO)) {
> +		data |= ARCH_CAP_FB_CLEAR;
> +	}
> +
>  	return data;
>  }
>
> --
> 2.43.0
>