Message-ID: <Z-09TLXNWv-msJ4O@google.com>
Date: Wed, 2 Apr 2025 06:36:12 -0700
From: Sean Christopherson <seanjc@...gle.com>
To: Jon Kohler <jon@...anix.com>
Cc: Paolo Bonzini <pbonzini@...hat.com>, Thomas Gleixner <tglx@...utronix.de>, 
	Ingo Molnar <mingo@...hat.com>, Borislav Petkov <bp@...en8.de>, 
	Dave Hansen <dave.hansen@...ux.intel.com>, x86@...nel.org, 
	"H. Peter Anvin" <hpa@...or.com>, kvm@...r.kernel.org, linux-kernel@...r.kernel.org, 
	Emanuele Giuseppe Esposito <eesposit@...hat.com>, Pawan Gupta <pawan.kumar.gupta@...ux.intel.com>
Subject: Re: [PATCH] KVM: x86: Expose ARCH_CAP_FB_CLEAR when invulnerable to MDS

On Mon, Mar 31, 2025, Jon Kohler wrote:
> Expose FB_CLEAR in arch_capabilities for certain MDS-invulnerable cases 
> to support live migration from older hardware (e.g., Cascade Lake, Ice 
> Lake) to newer hardware (e.g., Sapphire Rapids or higher). This ensures 
> compatibility when user space has previously configured vCPUs to see 
> FB_CLEAR (ARCH_CAPABILITIES Bit 17).
> 
> Newer hardware sets the following bits but does not set FB_CLEAR, which 
> can prevent user space from configuring a matching setup:

I looked at this again right after PUCK, and KVM does NOT actually prevent
userspace from matching the original, pre-SPR configuration.  KVM effectively
treats ARCH_CAPABILITIES like a CPUID leaf, and lets userspace shove in any
value.  I.e. userspace can still migrate+stuff FB_CLEAR irrespective of hardware
support, and thus there is no need for KVM to lie to userspace.

So in effect, this is a userspace problem where it's being too aggressive in its
sanity checks.
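
For illustration, a minimal sketch of what a less aggressive userspace check might look like. The bit positions follow the kernel's ARCH_CAPABILITIES definitions (FB_CLEAR is bit 17, as noted in the quoted changelog). The helper names `strict_check()` and `relaxed_check()`, the `ARCH_CAP_FB_CLEAR_MOOT_BITS` mask, and the policy itself are hypothetical; they are not taken from KVM or from any particular VMM.

```c
#include <stdbool.h>
#include <stdint.h>

/* ARCH_CAPABILITIES bits relevant to this thread. */
#define ARCH_CAP_MDS_NO        (1ULL << 5)
#define ARCH_CAP_TAA_NO        (1ULL << 8)
#define ARCH_CAP_SBDR_SSDP_NO  (1ULL << 13)
#define ARCH_CAP_FBSDP_NO      (1ULL << 14)
#define ARCH_CAP_PSDP_NO       (1ULL << 15)
#define ARCH_CAP_FB_CLEAR      (1ULL << 17)

/*
 * Hypothetical mask: the same five bits the quoted patch tests before
 * setting FB_CLEAR.
 */
#define ARCH_CAP_FB_CLEAR_MOOT_BITS \
	(ARCH_CAP_MDS_NO | ARCH_CAP_TAA_NO | ARCH_CAP_SBDR_SSDP_NO | \
	 ARCH_CAP_FBSDP_NO | ARCH_CAP_PSDP_NO)

/* Overly strict: reject any guest bit the destination does not report. */
static bool strict_check(uint64_t guest_caps, uint64_t host_caps)
{
	return (guest_caps & ~host_caps) == 0;
}

/*
 * Relaxed (assumed policy): tolerate a missing FB_CLEAR when the
 * destination reports all five *_NO bits; still reject anything else
 * the destination lacks.
 */
static bool relaxed_check(uint64_t guest_caps, uint64_t host_caps)
{
	uint64_t missing = guest_caps & ~host_caps;

	if ((host_caps & ARCH_CAP_FB_CLEAR_MOOT_BITS) == ARCH_CAP_FB_CLEAR_MOOT_BITS)
		missing &= ~ARCH_CAP_FB_CLEAR;

	return missing == 0;
}
```

With a destination that reports only the five *_NO bits, `strict_check()` rejects a guest configured with FB_CLEAR while `relaxed_check()` accepts it, after which userspace can write the original value into the vCPU's ARCH_CAPABILITIES.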

FWIW, even if KVM did reject unsupported ARCH_CAPABILITIES bits, I would still
say this is userspace's problem to solve, e.g. by using MSR filtering to
intercept and emulate RDMSR(ARCH_CAPABILITIES) in userspace.
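
A rough sketch of the emulation half of that idea: once a filtered RDMSR has been bounced to userspace, the VMM supplies the value the guest should see. `struct rdmsr_result` and `emulate_rdmsr()` are hypothetical stand-ins for the data/error fields a VMM fills in on such an exit, not a real KVM API; only the MSR index (0x10a) is taken from the architecture. Installing the filter itself is not shown.

```c
#include <stdint.h>

/* Architectural index of IA32_ARCH_CAPABILITIES. */
#define ARCH_CAPABILITIES_MSR_INDEX 0x10aU

/* Hypothetical result of emulating one RDMSR in userspace. */
struct rdmsr_result {
	uint64_t data;	/* value returned to the guest */
	int error;	/* non-zero: ask for a fault to be injected instead */
};

/*
 * Return the VMM-chosen ARCH_CAPABILITIES value for the guest,
 * regardless of what the host hardware enumerates. Any other MSR is
 * reported as an error, since this sketch only intercepts one index.
 */
static struct rdmsr_result emulate_rdmsr(uint32_t index, uint64_t guest_arch_caps)
{
	struct rdmsr_result res = { .data = 0, .error = 1 };

	if (index == ARCH_CAPABILITIES_MSR_INDEX) {
		res.data = guest_arch_caps;
		res.error = 0;
	}

	return res;
}
```

Here `guest_arch_caps` would be the value carried over from the migration source, e.g. one with bit 17 (FB_CLEAR) still set.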

>     ARCH_CAP_MDS_NO
>     ARCH_CAP_TAA_NO
>     ARCH_CAP_PSDP_NO
>     ARCH_CAP_FBSDP_NO
>     ARCH_CAP_SBDR_SSDP_NO
> 
> This change has minimal impact, as these bit combinations already mark 
> the host as MMIO immune (via arch_cap_mmio_immune()) and set 
> disable_fb_clear in vmx_update_fb_clear_dis(), resulting in no 
> additional overhead.
> 
> Cc: Emanuele Giuseppe Esposito <eesposit@...hat.com>
> Cc: Paolo Bonzini <pbonzini@...hat.com>
> Cc: Pawan Gupta <pawan.kumar.gupta@...ux.intel.com>
> Signed-off-by: Jon Kohler <jon@...anix.com>
> 
> ---
>  arch/x86/kvm/x86.c | 14 ++++++++++++++
>  1 file changed, 14 insertions(+)
> 
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index c841817a914a..2a4337aa78cd 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -1641,6 +1641,20 @@ static u64 kvm_get_arch_capabilities(void)
>  	if (!boot_cpu_has_bug(X86_BUG_GDS) || gds_ucode_mitigated())
>  		data |= ARCH_CAP_GDS_NO;
>  
> +	/*
> +	 * User space might set FB_CLEAR when starting a vCPU on a system
> +	 * that does not enumerate FB_CLEAR but is also invulnerable to
> +	 * other various MDS related bugs. To allow live migration from
> +	 * hosts that do implement FB_CLEAR, leave it enabled.
> +	 */
> +	if ((data & ARCH_CAP_MDS_NO) &&
> +	    (data & ARCH_CAP_TAA_NO) &&
> +	    (data & ARCH_CAP_PSDP_NO) &&
> +	    (data & ARCH_CAP_FBSDP_NO) &&
> +	    (data & ARCH_CAP_SBDR_SSDP_NO)) {
> +		data |= ARCH_CAP_FB_CLEAR;
> +	}
> +
>  	return data;
>  }
>  
> -- 
> 2.43.0
>