linux-kernel - Re: [PATCH] KVM: x86: Expose ARCH_CAP_FB

Message-ID: <BA4DD4E6-D507-4BB9-8CC4-50043049DC86@nutanix.com>
Date: Wed, 2 Apr 2025 13:46:31 +0000
From: Jon Kohler <jon@...anix.com>
To: Sean Christopherson <seanjc@...gle.com>
CC: Paolo Bonzini <pbonzini@...hat.com>, Thomas Gleixner <tglx@...utronix.de>,
        Ingo Molnar <mingo@...hat.com>, Borislav Petkov <bp@...en8.de>,
        Dave Hansen <dave.hansen@...ux.intel.com>,
        "x86@...nel.org" <x86@...nel.org>, "H. Peter Anvin" <hpa@...or.com>,
        "kvm@...r.kernel.org" <kvm@...r.kernel.org>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        Emanuele Giuseppe Esposito <eesposit@...hat.com>,
        Pawan Gupta <pawan.kumar.gupta@...ux.intel.com>
Subject: Re: [PATCH] KVM: x86: Expose ARCH_CAP_FB_CLEAR when invulnerable to MDS



> On Apr 2, 2025, at 9:36 AM, Sean Christopherson <seanjc@...gle.com> wrote:
> 
> On Mon, Mar 31, 2025, Jon Kohler wrote:
>> Expose FB_CLEAR in arch_capabilities for certain MDS-invulnerable cases 
>> to support live migration from older hardware (e.g., Cascade Lake, Ice 
>> Lake) to newer hardware (e.g., Sapphire Rapids or higher). This ensures 
>> compatibility when user space has previously configured vCPUs to see 
>> FB_CLEAR (ARCH_CAPABILITIES Bit 17).
>> 
>> Newer hardware sets the following bits but does not set FB_CLEAR, which 
>> can prevent user space from configuring a matching setup:
> 
> I looked at this again right after PUCK, and KVM does NOT actually prevent
> userspace from matching the original, pre-SPR configuration.  KVM effectively
> treats ARCH_CAPABILITIES like a CPUID leaf, and lets userspace shove in any
> value.  I.e. userspace can still migrate+stuff FB_CLEAR irrespective of hardware
> support, and thus there is no need for KVM to lie to userspace.
> 
> So in effect, this is a userspace problem where it's being too aggressive in its
> sanity checks.
> 
> FWIW, even if KVM did reject unsupported ARCH_CAPABILITIES bits, I would still
> say this is userspace's problem to solve.  E.g. by using MSR filtering to
> intercept and emulate RDMSR(ARCH_CAPABILITIES) in userspace.

Thanks, Sean, I appreciate it. I’ll see what sort of trouble I can get into on the
userspace side of the house with QEMU, and whether there is a clean way to
special-case this.

Cheers, Jon
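Sean's two userspace options can be sketched from the VMM side with the KVM uAPI in <linux/kvm.h>: write whatever ARCH_CAPABILITIES value the migration source used (including FB_CLEAR) with KVM_SET_MSRS, or, if that were ever refused, enable KVM_CAP_X86_USER_SPACE_MSR with KVM_MSR_EXIT_REASON_FILTER and install a KVM_X86_SET_MSR_FILTER range so that guest RDMSRs of the MSR exit to userspace, where the VMM supplies the value. This is a minimal sketch under stated assumptions, not QEMU's implementation: the helper names are hypothetical, MSR_IA32_ARCH_CAPABILITIES (0x10a) is defined locally because the uAPI headers do not export it, the vCPU's CPUID is assumed to already advertise ARCH_CAPABILITIES, and the installed headers must be new enough (Linux 5.10 or later) to define the MSR-filter API.

```c
#include <linux/kvm.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <sys/ioctl.h>

/* Architectural MSR index; assumed here, not exported by the uAPI headers. */
#define MSR_IA32_ARCH_CAPABILITIES 0x10a

/*
 * Option 1 (hypothetical helper): write an arbitrary ARCH_CAPABILITIES
 * value, e.g. one carrying FB_CLEAR from the source host. 0 on success.
 */
int set_vcpu_arch_caps(int vcpu_fd, uint64_t value)
{
	struct kvm_msrs *msrs;
	int ret;

	msrs = calloc(1, sizeof(*msrs) + sizeof(struct kvm_msr_entry));
	if (!msrs)
		return -1;
	msrs->nmsrs = 1;
	msrs->entries[0].index = MSR_IA32_ARCH_CAPABILITIES;
	msrs->entries[0].data = value;

	/* KVM_SET_MSRS returns the number of MSRs it actually set. */
	ret = ioctl(vcpu_fd, KVM_SET_MSRS, msrs);
	free(msrs);
	return ret == 1 ? 0 : -1;
}

/* Bitmap for a one-MSR range: a clear bit denies, i.e. forces an exit. */
static uint8_t arch_cap_bitmap[1];

/* Default-allow filter whose single range denies guest reads of the MSR. */
void build_arch_cap_filter(struct kvm_msr_filter *filter)
{
	memset(filter, 0, sizeof(*filter));
	arch_cap_bitmap[0] = 0;
	filter->flags = KVM_MSR_FILTER_DEFAULT_ALLOW;
	filter->ranges[0].flags = KVM_MSR_FILTER_READ;
	filter->ranges[0].nmsrs = 1;
	filter->ranges[0].base = MSR_IA32_ARCH_CAPABILITIES;
	filter->ranges[0].bitmap = arch_cap_bitmap;
}

/* Option 2: route filter-denied MSR accesses to userspace, then filter. */
int install_arch_cap_filter(int vm_fd)
{
	struct kvm_enable_cap cap;
	struct kvm_msr_filter filter;

	memset(&cap, 0, sizeof(cap));
	cap.cap = KVM_CAP_X86_USER_SPACE_MSR;
	cap.args[0] = KVM_MSR_EXIT_REASON_FILTER;
	if (ioctl(vm_fd, KVM_ENABLE_CAP, &cap) < 0)
		return -1;

	build_arch_cap_filter(&filter);
	return ioctl(vm_fd, KVM_X86_SET_MSR_FILTER, &filter) < 0 ? -1 : 0;
}

/*
 * Call after KVM_RUN returns. Completes a guest RDMSR(ARCH_CAPABILITIES)
 * with the emulated value and returns 1; returns 0 for any other exit.
 */
int handle_arch_cap_rdmsr(struct kvm_run *run, uint64_t emulated)
{
	if (run->exit_reason != KVM_EXIT_X86_RDMSR ||
	    run->msr.index != MSR_IA32_ARCH_CAPABILITIES)
		return 0;
	run->msr.data = emulated;
	run->msr.error = 0;	/* nonzero would inject #GP into the guest */
	return 1;
}
```

Option 1 is the path Sean describes as already working; option 2 is his fallback, and keeps the emulated value entirely in the VMM, at the cost of a userspace exit on every guest read of the MSR.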

> 
>>    ARCH_CAP_MDS_NO
>>    ARCH_CAP_TAA_NO
>>    ARCH_CAP_PSDP_NO
>>    ARCH_CAP_FBSDP_NO
>>    ARCH_CAP_SBDR_SSDP_NO
>> 
>> This change has minimal impact, as these bit combinations already mark 
>> the host as MMIO immune (via arch_cap_mmio_immune()) and set 
>> disable_fb_clear in vmx_update_fb_clear_dis(), resulting in no 
>> additional overhead.
>> 
>> Cc: Emanuele Giuseppe Esposito <eesposit@...hat.com>
>> Cc: Paolo Bonzini <pbonzini@...hat.com>
>> Cc: Pawan Gupta <pawan.kumar.gupta@...ux.intel.com>
>> Signed-off-by: Jon Kohler <jon@...anix.com>
>> 
>> ---
>> arch/x86/kvm/x86.c | 14 ++++++++++++++
>> 1 file changed, 14 insertions(+)
>> 
>> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
>> index c841817a914a..2a4337aa78cd 100644
>> --- a/arch/x86/kvm/x86.c
>> +++ b/arch/x86/kvm/x86.c
>> @@ -1641,6 +1641,20 @@ static u64 kvm_get_arch_capabilities(void)
>>  	if (!boot_cpu_has_bug(X86_BUG_GDS) || gds_ucode_mitigated())
>>  		data |= ARCH_CAP_GDS_NO;
>> 
>> +	/*
>> +	 * User space might set FB_CLEAR when starting a vCPU on a system
>> +	 * that does not enumerate FB_CLEAR but is also invulnerable to
>> +	 * other various MDS related bugs. To allow live migration from
>> +	 * hosts that do implement FB_CLEAR, leave it enabled.
>> +	 */
>> +	if ((data & ARCH_CAP_MDS_NO) &&
>> +	    (data & ARCH_CAP_TAA_NO) &&
>> +	    (data & ARCH_CAP_PSDP_NO) &&
>> +	    (data & ARCH_CAP_FBSDP_NO) &&
>> +	    (data & ARCH_CAP_SBDR_SSDP_NO)) {
>> +		data |= ARCH_CAP_FB_CLEAR;
>> +	}
>> +
>>  	return data;
>>  }
>> 
>> -- 
>> 2.43.0
>>
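The condition in the patch reduces to a single mask comparison, and the same check could be applied in userspace to the value read from KVM before writing it back with KVM_SET_MSRS, which is the direction Sean points to. The sketch below restates it as plain C. Only FB_CLEAR's position (bit 17) is stated in the thread; the other bit positions are taken from the kernel's msr-index.h as I understand it and should be checked against it. The function names are hypothetical, and mmio_immune() is an assumed restatement of the three-bit test in arch_cap_mmio_immune(), not a copy of it.

```c
#include <stdbool.h>
#include <stdint.h>

/* IA32_ARCH_CAPABILITIES bits (assumed from msr-index.h; bit 17 per the patch). */
#define ARCH_CAP_MDS_NO        (1ULL << 5)
#define ARCH_CAP_TAA_NO        (1ULL << 8)
#define ARCH_CAP_SBDR_SSDP_NO  (1ULL << 13)
#define ARCH_CAP_FBSDP_NO      (1ULL << 14)
#define ARCH_CAP_PSDP_NO       (1ULL << 15)
#define ARCH_CAP_FB_CLEAR      (1ULL << 17)

/* Assumed equivalent of arch_cap_mmio_immune(): all three MMIO "_NO" bits. */
bool mmio_immune(uint64_t caps)
{
	return (caps & ARCH_CAP_FBSDP_NO) &&
	       (caps & ARCH_CAP_PSDP_NO) &&
	       (caps & ARCH_CAP_SBDR_SSDP_NO);
}

/* The patch's condition: all five "_NO" bits set. */
bool mds_invulnerable(uint64_t caps)
{
	const uint64_t need = ARCH_CAP_MDS_NO | ARCH_CAP_TAA_NO |
			      ARCH_CAP_PSDP_NO | ARCH_CAP_FBSDP_NO |
			      ARCH_CAP_SBDR_SSDP_NO;

	return (caps & need) == need;
}

/* Hypothetical userspace counterpart of the proposed kernel change. */
uint64_t add_fb_clear_if_invulnerable(uint64_t caps)
{
	if (mds_invulnerable(caps))
		caps |= ARCH_CAP_FB_CLEAR;
	return caps;
}
```

Because the five-bit mask contains all three MMIO bits, any value that passes mds_invulnerable() also passes mmio_immune(), which is the reasoning behind the patch's "minimal impact" claim.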